Basic Usage
The delve CLI provides a simple interface for taxonomy generation.
Command: run
Generate a taxonomy from a data source and categorize documents.
Arguments
Path or URI to your data source. Can be:
- Path to CSV file (e.g., data.csv)
- Path to JSON/JSONL file (e.g., data.json)
- LangSmith URI (e.g., langsmith://project-name)
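As a rough illustration of how the three source kinds above can be told apart, the sketch below maps a path or URI to a source type. The function name `detect_source_type` is hypothetical and this is not Delve's actual detection code; it only assumes that LangSmith sources use the `langsmith://` scheme and that files are identified by extension.

```python
from pathlib import PurePosixPath


def detect_source_type(source: str) -> str:
    """Guess the source type from a path or URI (illustrative, not Delve's code)."""
    # LangSmith sources are given as a URI, e.g. langsmith://project-name
    if source.startswith("langsmith://"):
        return "langsmith"
    # Everything else is treated as a file path and classified by extension
    suffix = PurePosixPath(source).suffix.lower()
    by_suffix = {".csv": "csv", ".json": "json", ".jsonl": "jsonl"}
    if suffix in by_suffix:
        return by_suffix[suffix]
    raise ValueError(f"cannot infer source type from {source!r}")


print(detect_source_type("data.csv"))                  # csv
print(detect_source_type("langsmith://project-name"))  # langsmith
```

When a file has an unusual extension, a guess like this fails, which is the situation where forcing the source type explicitly would be useful.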
Options
Data Source Options
Column name containing text data (required for CSV files)
Column name for document IDs (optional). If not specified, auto-generated IDs will be used.
JSONPath expression for extracting text from nested JSON structures.
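To make the idea concrete, here is a minimal sketch of what an expression such as `$.data[*].attributes.text` selects. It is a hand-written traversal for that one shape, not a general JSONPath engine and not Delve's implementation; the helper name `extract_texts` and the sample payload are made up for illustration.

```python
from typing import Any


def extract_texts(payload: dict[str, Any]) -> list[str]:
    """Hand-written equivalent of $.data[*].attributes.text for this one shape."""
    texts = []
    # $.data[*] -> every element of the top-level "data" array
    for item in payload.get("data", []):
        # .attributes.text -> the nested "text" field, skipped when absent
        text = item.get("attributes", {}).get("text")
        if text is not None:
            texts.append(text)
    return texts


sample = {
    "data": [
        {"attributes": {"text": "first"}},
        {"attributes": {"text": "second"}},
        {"attributes": {}},  # no text field, so it yields no match
    ]
}
print(extract_texts(sample))  # ['first', 'second']
```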
JSONPath allows you to access nested fields. Example: $.data[*].attributes.text extracts text from deeply nested objects.
Force specific data source type. Options: csv, json, jsonl, langsmith, auto. By default, Delve auto-detects the source type from file extensions.
Model Options
Main LLM model for taxonomy generation and reasoning.
Supported models:
- anthropic/claude-sonnet-4-5-20250929 (default)
- anthropic/claude-opus-4
- Any model supported by LiteLLM
Fast LLM model for document summarization.
Use a faster, cheaper model for summarization to reduce costs.
Processing Options
Number of documents to sample for taxonomy generation.
Number of documents per minibatch during iterative clustering.
Smaller batches (50-100) produce more refined taxonomies through more iterations. Larger batches (200-300) are faster but may be less precise.
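The trade-off can be made concrete with a little arithmetic. The sketch below assumes that clustering runs one iteration per minibatch over the sampled documents, so the iteration count is the sample size divided by the batch size, rounded up. The sample size of 1000 is a made-up figure, and `minibatch_count` is a hypothetical helper, not part of Delve.

```python
import math


def minibatch_count(sample_size: int, batch_size: int) -> int:
    """Number of minibatches needed to cover the sample (assumed: one iteration each)."""
    if sample_size < 0 or batch_size <= 0:
        raise ValueError("sample_size must be >= 0 and batch_size must be > 0")
    return math.ceil(sample_size / batch_size)


# Hypothetical sample of 1000 documents
for batch_size in (50, 100, 200, 300):
    print(batch_size, minibatch_count(1000, batch_size))
# 50 20
# 100 10
# 200 5
# 300 4
```

Under that assumption, going from a batch size of 50 to 300 cuts the number of refinement passes from 20 to 4, which is why larger batches finish sooner but give the taxonomy fewer chances to be revised.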
Maximum number of clusters/categories to generate in the taxonomy.
Output Options
Directory for saving results.
Creates the directory if it doesn’t exist.
Output formats to generate. Can specify multiple times.
Available formats:
- json - Machine-readable taxonomy and labeled documents
- csv - Spreadsheet format for analysis
- markdown - Human-readable reports
Customization Options
Custom description of your taxonomy use case. Helps guide the LLM to generate relevant categories.
LangSmith Options
LangSmith API key for accessing LangSmith data sources.
Can also be set via the LANGSMITH_API_KEY environment variable.
Number of days to look back when fetching LangSmith runs.
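The two LangSmith settings can be pictured as follows. This is a sketch under stated assumptions, not Delve's code: it assumes an explicitly supplied key takes precedence over `LANGSMITH_API_KEY`, and that the lookback is measured back from the current UTC time. The helper names `resolve_api_key` and `lookback_start` are hypothetical.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> str:
    """Return the explicit key if given, else fall back to LANGSMITH_API_KEY (assumed precedence)."""
    key = explicit or os.environ.get("LANGSMITH_API_KEY")
    if not key:
        raise RuntimeError("no LangSmith API key: pass one or set LANGSMITH_API_KEY")
    return key


def lookback_start(days: int, now: Optional[datetime] = None) -> datetime:
    """Earliest timestamp covered by a lookback of `days` days (assumed: measured from now, UTC)."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


# A 7-day lookback from a fixed reference time
reference = datetime(2025, 1, 8, tzinfo=timezone.utc)
print(lookback_start(7, reference).isoformat())  # 2025-01-01T00:00:00+00:00
```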
Output Control
Control output verbosity level.
Levels:
- No flag: NORMAL - Spinners and completion checkmarks
- -q: QUIET - Errors only
- -v: VERBOSE - Progress bars with throughput-based ETA
- -vv: DEBUG - Full debug output including warnings
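The flag-to-level mapping above can be summarized in a few lines. This sketch is illustrative only: `verbosity_level` is a hypothetical helper, and two behaviours are assumptions not stated in this page, namely that `-q` wins if combined with `-v` and that more than two `-v` flags are treated as DEBUG.

```python
def verbosity_level(quiet: bool = False, verbose_count: int = 0) -> str:
    """Map -q / -v / -vv to a level name (illustrative, not Delve's code)."""
    if quiet:
        # Assumption: -q takes precedence over any -v flags
        return "QUIET"
    if verbose_count >= 2:
        # Assumption: -vvv and beyond behave like -vv
        return "DEBUG"
    if verbose_count == 1:
        return "VERBOSE"
    return "NORMAL"


print(verbosity_level())                 # NORMAL
print(verbosity_level(quiet=True))       # QUIET
print(verbosity_level(verbose_count=1))  # VERBOSE
print(verbosity_level(verbose_count=2))  # DEBUG
```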
Verbosity Output Examples
NORMAL (default)
VERBOSE (-v)
DEBUG (-vv)
Examples
Output Files
Delve generates multiple output files in your specified output directory:taxonomy.json
Complete taxonomy with category descriptions and metadata:labeled_data.csv
Spreadsheet format for easy analysis:report.md
Human-readable Markdown report with:- Taxonomy overview
- Category descriptions
- Document distribution statistics
- Sample documents per category
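After a run, a quick way to confirm which of these files were written is to check the output directory. The sketch below uses only the three file names mentioned above; the helper `check_outputs` is hypothetical, and which files actually appear presumably depends on the output formats you selected.

```python
import tempfile
from pathlib import Path

# File names as listed in this section
EXPECTED_OUTPUTS = ("taxonomy.json", "labeled_data.csv", "report.md")


def check_outputs(output_dir: str) -> dict[str, bool]:
    """Report, per expected file name, whether it exists in output_dir."""
    root = Path(output_dir)
    return {name: (root / name).is_file() for name in EXPECTED_OUTPUTS}


# Demo against a throwaway directory containing only taxonomy.json
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "taxonomy.json").write_text("{}")
    status = check_outputs(tmp)

print(status)
# {'taxonomy.json': True, 'labeled_data.csv': False, 'report.md': False}
```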
Getting Help
Environment Variables
Set the required environment variables before running Delve.
The OpenAI API key is needed for generating embeddings when training the classifier. If your dataset is small enough that all documents are labeled by the LLM (no classifier needed), you can skip the OpenAI key.
Next Steps
SDK Reference
Use Delve programmatically
Examples
See more code examples
