Basic Usage

The delve CLI provides a simple interface for taxonomy generation:
delve run DATA_SOURCE [OPTIONS]

Command: run

Generate a taxonomy from a data source and categorize documents.
delve run DATA_SOURCE [OPTIONS]

Arguments

DATA_SOURCE
string
required
Path or URI of your data source. One of:
  • Path to a CSV file (e.g., data.csv)
  • Path to a JSON or JSONL file (e.g., data.json)
  • LangSmith URI (e.g., langsmith://project-name)

Options

Data Source Options

--text-column
string
Column name containing text data (required for CSV files)
delve run data.csv --text-column conversation
--id-column
string
Column name for document IDs (optional). If not specified, auto-generated IDs will be used.
delve run data.csv --text-column text --id-column doc_id
--json-path
string
JSONPath expression for extracting text from nested JSON structures.
delve run data.json --json-path "$.messages[*].content"
JSONPath allows you to access nested fields. Example: $.data[*].attributes.text extracts text from deeply nested objects.
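To make the path semantics concrete, here is a minimal sketch of what $.data[*].attributes.text selects, written as plain Python rather than with a JSONPath library. The sample document and the extract_texts helper are hypothetical and only illustrate the expression; they are not part of Delve.

```python
import json

# Hypothetical input shaped to match the example path $.data[*].attributes.text
raw = """
{"data": [
  {"attributes": {"text": "How do I reset my password?"}},
  {"attributes": {"text": "What are your pricing plans?"}}
]}
"""


def extract_texts(doc):
    """Plain-Python equivalent of the JSONPath $.data[*].attributes.text."""
    # $.data            -> doc["data"]
    # [*]               -> every element of that list
    # .attributes.text  -> nested key lookups on each element
    return [item["attributes"]["text"] for item in doc["data"]]


texts = extract_texts(json.loads(raw))
print(texts)
```

Each selected string would become one document's text, so the expression should resolve to exactly the field you want categorized.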
--source-type
string
default:"auto"
Force a specific data source type. Options: csv, json, jsonl, langsmith, auto
delve run data.txt --source-type json
By default, Delve auto-detects the source type from file extensions.
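As a rough illustration of extension-based detection, the sketch below maps a data source string to one of the documented types. The mapping and the guess_source_type function are assumptions made for this example, not Delve's actual implementation; in particular, what Delve does with an unrecognized extension is not documented here.

```python
from pathlib import PurePosixPath

# Assumed mapping: the documentation only states that detection uses file extensions.
EXTENSION_TO_TYPE = {".csv": "csv", ".json": "json", ".jsonl": "jsonl"}


def guess_source_type(data_source):
    """Return a source type name, or None if the extension is not recognized."""
    if data_source.startswith("langsmith://"):
        return "langsmith"
    suffix = PurePosixPath(data_source).suffix.lower()
    return EXTENSION_TO_TYPE.get(suffix)


for source in ["data.csv", "data.jsonl", "langsmith://my-project", "data.txt"]:
    print(source, "->", guess_source_type(source))
```

A file such as data.txt has no recognizable extension in this sketch, which is the situation where passing --source-type explicitly is useful.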

Model Options

--model
string
default:"anthropic/claude-sonnet-4-5-20250929"
Main LLM for taxonomy generation and reasoning.
delve run data.csv --text-column text --model anthropic/claude-opus-4
Supported models:
  • anthropic/claude-sonnet-4-5-20250929 (default)
  • anthropic/claude-opus-4
  • Any model supported by LiteLLM
--fast-llm
string
default:"anthropic/claude-haiku-4-5-20251001"
Fast LLM for document summarization.
delve run data.csv --text-column text --fast-llm anthropic/claude-haiku-4-5-20251001
Use a faster, cheaper model for summarization to reduce costs.

Processing Options

--sample-size
integer
default:"100"
Number of documents to sample for taxonomy generation.
delve run data.csv --text-column text --sample-size 200
Larger samples (200-500) produce more comprehensive taxonomies but take longer and cost more. Start with 100 for quick iterations.
--batch-size
integer
default:"200"
Number of documents per minibatch during iterative clustering.
delve run data.csv --text-column text --batch-size 50
Smaller batches (50-100) produce more refined taxonomies through more iterations. Larger batches (200-300) are faster but may be less precise.
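The trade-off between batch size and iteration count can be estimated with simple arithmetic. The sketch below assumes that minibatches are drawn from the sampled documents, so the number of iterations is the sample size divided by the batch size, rounded up. That relationship is an assumption for illustration, and num_minibatches is a hypothetical helper, not a Delve function.

```python
import math


def num_minibatches(sample_size, batch_size):
    """Estimate clustering iterations, assuming one iteration per minibatch."""
    if sample_size <= 0 or batch_size <= 0:
        raise ValueError("sample_size and batch_size must be positive")
    return math.ceil(sample_size / batch_size)


# Defaults from this page: --sample-size 100, --batch-size 200
print(num_minibatches(100, 200))
# A larger sample with small batches gives more refinement passes
print(num_minibatches(200, 50))
```

Under this assumption the defaults yield a single pass, so lowering --batch-size or raising --sample-size is what adds iterations.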
--max-clusters
integer
default:"5"
Maximum number of clusters/categories to generate in the taxonomy.
delve run data.csv --text-column text --max-clusters 10
Start with a smaller number (5-10) for focused taxonomies. Increase for more granular categorization of diverse datasets.

Output Options

--output-dir
path
default:"./results"
Directory for saving results.
delve run data.csv --text-column text --output-dir ./my-results
Creates the directory if it doesn’t exist.
--output-format
string[]
default:"['json', 'csv', 'markdown']"
Output formats to generate. Can be specified multiple times.
# Only JSON
delve run data.csv --text-column text --output-format json

# JSON and CSV
delve run data.csv --text-column text --output-format json --output-format csv

# All formats (default)
delve run data.csv --text-column text --output-format json --output-format csv --output-format markdown
Available formats:
  • json - Machine-readable taxonomy and labeled documents
  • csv - Spreadsheet format for analysis
  • markdown - Human-readable reports

Customization Options

--use-case
string
Custom description of your taxonomy use case. Helps guide the LLM to generate relevant categories.
delve run data.csv --text-column text \
  --use-case "Categorize customer feedback into product features and sentiment"
Providing a use case improves taxonomy quality by giving the model context about your domain and goals.

LangSmith Options

--langsmith-key
string
LangSmith API key for accessing LangSmith data sources.
delve run langsmith://my-project --langsmith-key $LANGSMITH_API_KEY
Can also be set via the LANGSMITH_API_KEY environment variable.
--days
integer
default:"7"
Number of days to look back when fetching LangSmith runs.
delve run langsmith://my-project --langsmith-key $KEY --days 14

Output Control

-q / -v / -vv
verbosity flags
Control output verbosity level.
# Normal (default) - spinners and checkmarks
delve run data.csv --text-column text

# Quiet - errors only
delve run data.csv --text-column text -q

# Verbose - progress bars with ETA
delve run data.csv --text-column text -v

# Debug - everything including internal state
delve run data.csv --text-column text -vv
Levels:
  • No flag: NORMAL - Spinners and completion checkmarks
  • -q: QUIET - Errors only
  • -v: VERBOSE - Progress bars with throughput-based ETA
  • -vv: DEBUG - Full debug output including warnings

Verbosity Output Examples

Normal (default):
⠹ Validating API keys...
✓ API keys validated
⠹ Loading data from data.csv...
✓ Loaded 5,000 documents
⠹ Generating taxonomy...
✓ Generated 12 categories
⠹ Labeling documents...
✓ Labeled 5,000 documents
✓ Results saved to ./results/

Verbose (-v):
✓ API keys validated
✓ Loaded 5,000 documents

Labeling documents with LLM ━━━━━━━━━━━━━━━━ 100% 100/100 0:01:45 0:00:00
✓ Classifier trained - Test F1: 0.847, Test Accuracy: 0.85
✓ Total labeled: 5,000 documents
  - 100 by LLM
  - 4,900 by classifier

✓ Results saved to ./results/

Debug (-vv):
==================================================
Delve Configuration:
  Model: anthropic/claude-sonnet-4-5-20250929
  Fast LLM: anthropic/claude-haiku-4-5-20251001
  Sample size: 100
  Batch size: 200
  Embedding model: text-embedding-3-large
  Output dir: ./results
  Use case: Generate taxonomy...
==================================================

✓ Loaded 5,000 documents
Taxonomy has 12 categories:
  [1] Technical Support
  [2] Billing Inquiry
  ...

Training set: 95 samples, 10 classes
Class distribution:
  [0] Technical Support: 15 samples
  [2] Billing Inquiry: 12 samples
  ...

! Warning: 5 documents labeled as 'Other', skipped
✓ Results saved to ./results/

Examples

# CSV with required text column
delve run data.csv --text-column message

# JSON with JSONPath for nested data
delve run messages.json --json-path "$.conversations[*].text"

# LangSmith project
delve run langsmith://my-project --langsmith-key $LANGSMITH_API_KEY --days 7

# Full configuration example
delve run data.csv \
  --text-column feedback \
  --sample-size 200 \
  --output-dir ./results \
  --use-case "Categorize support tickets by issue type"

Output Files

Delve generates multiple output files in your specified output directory:
results/
├── taxonomy.json              # Machine-readable taxonomy with metadata
├── labeled_documents.json     # All documents with assigned categories
├── labeled_data.csv           # Spreadsheet format with categories
├── taxonomy_reference.csv     # Category lookup table
├── report.md                  # Human-readable summary with statistics
└── metadata.json              # Run configuration and metadata

taxonomy.json

Complete taxonomy with category descriptions and metadata:
{
  "taxonomy": [
    {
      "id": "1",
      "name": "Technical Support",
      "description": "Questions about technical issues and troubleshooting"
    },
    {
      "id": "2",
      "name": "Billing Inquiry",
      "description": "Questions about pricing, payments, and invoices"
    }
  ],
  "metadata": {
    "num_documents": 100,
    "num_categories": 5,
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "timestamp": "2024-01-15T10:30:00Z"
  }
}
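A short sketch of consuming this file from Python, assuming the structure shown above. The JSON is embedded as a string so the example is self-contained; in practice you would read it from your output directory (for example ./results/taxonomy.json). The category_index helper is hypothetical.

```python
import json

# Abbreviated copy of the example above; normally loaded from results/taxonomy.json
sample = """
{
  "taxonomy": [
    {"id": "1", "name": "Technical Support",
     "description": "Questions about technical issues and troubleshooting"},
    {"id": "2", "name": "Billing Inquiry",
     "description": "Questions about pricing, payments, and invoices"}
  ],
  "metadata": {"num_documents": 100, "num_categories": 5}
}
"""


def category_index(taxonomy_doc):
    """Map category id -> category name for quick lookups."""
    return {cat["id"]: cat["name"] for cat in taxonomy_doc["taxonomy"]}


index = category_index(json.loads(sample))
for cat_id, name in index.items():
    print(cat_id, name)
```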

labeled_data.csv

Spreadsheet format for easy analysis:
id,content,category,explanation
doc1,"How do I reset my password?","Technical Support","User asking about account recovery"
doc2,"What are your pricing plans?","Billing Inquiry","Question about product pricing"
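For quick analysis without a spreadsheet, the CSV can be tallied with the Python standard library. This sketch assumes the column names shown above (id, content, category, explanation) and uses an in-memory copy of the two sample rows; count_by_category is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# In-memory copy of the sample rows; normally open("results/labeled_data.csv", newline="")
sample_csv = '''id,content,category,explanation
doc1,"How do I reset my password?","Technical Support","User asking about account recovery"
doc2,"What are your pricing plans?","Billing Inquiry","Question about product pricing"
'''


def count_by_category(file_obj):
    """Count documents per category using the 'category' column."""
    return Counter(row["category"] for row in csv.DictReader(file_obj))


counts = count_by_category(io.StringIO(sample_csv))
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

Using csv.DictReader rather than splitting on commas matters here because the content and explanation fields are quoted and may themselves contain commas.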

report.md

Human-readable Markdown report with:
  • Taxonomy overview
  • Category descriptions
  • Document distribution statistics
  • Sample documents per category

Getting Help

# Get general help
delve --help

# Get help for run command
delve run --help

# Check version
delve --version

Environment Variables

Set these environment variables before running Delve:
# Required
export ANTHROPIC_API_KEY="your-anthropic-key"

# Required when the dataset is larger than the sample (sample_size > 0 and docs > sample_size), to train the classifier
export OPENAI_API_KEY="your-openai-key"

# Optional
export LANGSMITH_API_KEY="your-langsmith-key"
The OpenAI API key is needed for generating embeddings when training the classifier. If your dataset is small enough that all documents are labeled by the LLM (no classifier needed), you can skip the OpenAI key.
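The rule above can be turned into a small pre-flight check. The following sketch encodes only the conditions stated on this page: the Anthropic key is always required, and the OpenAI key is required when sample_size > 0 and the dataset has more documents than the sample. The function names are hypothetical, and the check does not account for using a non-Anthropic model via --model.

```python
import os


def required_keys(num_docs, sample_size):
    """List the environment variables needed for a run of this size."""
    keys = ["ANTHROPIC_API_KEY"]
    # A classifier (and therefore embeddings) is only needed when some
    # documents are left over after the LLM labels the sample.
    if sample_size > 0 and num_docs > sample_size:
        keys.append("OPENAI_API_KEY")
    return keys


def missing_keys(num_docs, sample_size, env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in required_keys(num_docs, sample_size) if not env.get(key)]


# 5,000 documents with the default sample size of 100
print(missing_keys(5000, 100))
```

An empty list means every key required for that dataset size is set.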

Next Steps

  • SDK Reference: Use Delve programmatically
  • Examples: See more code examples