CLI Reference

Basic Usage

The delve CLI provides a simple interface for taxonomy generation:

delve run DATA_SOURCE [OPTIONS]

Command: run

Generate a taxonomy from a data source and categorize documents.

delve run DATA_SOURCE [OPTIONS]

Arguments

DATA_SOURCE (string, required)

Path or URI to your data source. Can be:

Path to CSV file (e.g., data.csv)
Path to JSON/JSONL file (e.g., data.json)
LangSmith URI (e.g., langsmith://project-name)

Options

Data Source Options

--text-column (string)

Name of the column containing the text data (required for CSV files).

delve run data.csv --text-column conversation

--id-column (string)

Column name for document IDs (optional). If not specified, auto-generated IDs will be used.

delve run data.csv --text-column text --id-column doc_id

--json-path (string)

JSONPath expression for extracting text from nested JSON structures.

delve run data.json --json-path "$.messages[*].content"

JSONPath allows you to access nested fields. Example: $.data[*].attributes.text extracts text from deeply nested objects.
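To make the selection concrete, the following Python sketch walks a nested document by hand and collects the same values that $.data[*].attributes.text would match. It does not use Delve or a JSONPath library; the sample document and the helper name extract_texts are made up for illustration.

```python
# Hand-rolled equivalent of the JSONPath "$.data[*].attributes.text".
# Illustrative only: this is not how Delve evaluates --json-path.


def extract_texts(document: dict) -> list:
    """Collect attributes.text from every element of the top-level "data" array."""
    texts = []
    for item in document.get("data", []):        # $.data[*]
        attributes = item.get("attributes", {})  # .attributes
        if "text" in attributes:                 # .text
            texts.append(attributes["text"])
    return texts


# Hypothetical input document, not taken from a real dataset.
sample = {
    "data": [
        {"id": 1, "attributes": {"text": "How do I reset my password?"}},
        {"id": 2, "attributes": {"text": "What are your pricing plans?"}},
        {"id": 3, "attributes": {}},  # no text field, so nothing is matched
    ]
}

print(extract_texts(sample))
```

Each matched value becomes one text entry; elements without the field are simply skipped in this sketch.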

--source-type (string, default: "auto")

Force a specific data source type. Options: csv, json, jsonl, langsmith, auto.

delve run data.txt --source-type json

By default, Delve auto-detects the source type from file extensions.
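As a rough mental model of extension-based detection, the sketch below maps a DATA_SOURCE string to one of the documented source types. The function name guess_source_type and the exact rules (including the error for unknown extensions) are assumptions for illustration, not Delve's actual implementation.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the documented --source-type values.
EXTENSION_TO_TYPE = {".csv": "csv", ".json": "json", ".jsonl": "jsonl"}


def guess_source_type(data_source: str) -> str:
    """Guess a source type the way an extension-based auto-detector might."""
    if data_source.startswith("langsmith://"):
        return "langsmith"
    suffix = Path(data_source).suffix.lower()
    if suffix in EXTENSION_TO_TYPE:
        return EXTENSION_TO_TYPE[suffix]
    # An unrecognized extension is the case where --source-type is needed.
    raise ValueError(
        f"Cannot infer source type for {data_source!r}; pass --source-type"
    )


print(guess_source_type("data.csv"))
print(guess_source_type("langsmith://project-name"))
```

Under these assumed rules, a file such as data.txt cannot be classified from its name alone, which is the situation the --source-type example above addresses.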

Model Options

--model (string, default: "anthropic/claude-sonnet-4-5-20250929")

Main LLM used for taxonomy generation and reasoning.

delve run data.csv --text-column text --model anthropic/claude-opus-4

Supported models:

anthropic/claude-sonnet-4-5-20250929 (default)
anthropic/claude-opus-4
Any model supported by LiteLLM

--fast-llm (string, default: "anthropic/claude-haiku-4-5-20251001")

Fast LLM used for document summarization.

delve run data.csv --text-column text --fast-llm anthropic/claude-haiku-4-5-20251001

Use a faster, cheaper model for summarization to reduce costs.

Processing Options

--sample-size (integer, default: 100)

Number of documents to sample for taxonomy generation.

delve run data.csv --text-column text --sample-size 200

Larger samples (200-500) produce more comprehensive taxonomies but take longer and cost more. Start with 100 for quick iterations.

--batch-size (integer, default: 200)

Number of documents per minibatch during iterative clustering.

delve run data.csv --text-column text --batch-size 50

Smaller batches (50-100) produce more refined taxonomies through more iterations. Larger batches (200-300) are faster but may be less precise.
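The trade-off can be estimated with simple arithmetic. The sketch below assumes the sampled documents are split into consecutive minibatches of at most --batch-size documents, so the iteration count is the ceiling of sample size divided by batch size. That splitting rule is an assumption rather than something this page states, and num_minibatches is a hypothetical helper.

```python
import math


def num_minibatches(sample_size: int, batch_size: int) -> int:
    """Number of minibatches if the sample is split into chunks of batch_size."""
    if sample_size < 0 or batch_size <= 0:
        raise ValueError("sample_size must be >= 0 and batch_size must be > 0")
    return math.ceil(sample_size / batch_size)


# Defaults from this page: --sample-size 100, --batch-size 200
print(num_minibatches(100, 200))  # 1
print(num_minibatches(200, 50))   # 4
print(num_minibatches(500, 200))  # 3
```

Under this assumption, the default settings fit the whole sample into a single minibatch, and lowering --batch-size is what adds refinement iterations.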

--max-clusters (integer, default: 5)

Maximum number of clusters/categories to generate in the taxonomy.

delve run data.csv --text-column text --max-clusters 10

Start with a smaller number (5-10) for focused taxonomies. Increase for more granular categorization of diverse datasets.

Output Options

--output-dir (path, default: "./results")

Directory for saving results.

delve run data.csv --text-column text --output-dir ./my-results

Creates the directory if it doesn’t exist.

--output-format (string[], default: ['json', 'csv', 'markdown'])

Output formats to generate. Repeat the flag to request more than one format.

# Only JSON
delve run data.csv --text-column text --output-format json

# JSON and CSV
delve run data.csv --text-column text --output-format json --output-format csv

# All formats (default)
delve run data.csv --text-column text --output-format json --output-format csv --output-format markdown

Available formats:

json - Machine-readable taxonomy and labeled documents
csv - Spreadsheet format for analysis
markdown - Human-readable reports
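When Delve is launched from a script, the repeated flag is easy to generate from a list. The sketch below only builds the argument list and prints it; it does not run Delve. build_run_args is a hypothetical helper, and the flags are the ones documented on this page.

```python
import shlex


def build_run_args(data_source, text_column, formats):
    """Build a `delve run` argument list with one --output-format per format."""
    allowed = {"json", "csv", "markdown"}  # formats listed on this page
    unknown = [f for f in formats if f not in allowed]
    if unknown:
        raise ValueError(f"Unsupported output format(s): {unknown}")
    args = ["delve", "run", data_source, "--text-column", text_column]
    for fmt in formats:
        args += ["--output-format", fmt]
    return args


args = build_run_args("data.csv", "text", ["json", "csv"])
print(shlex.join(args))
```

The resulting list can be handed to subprocess.run(args, check=True) if the delve executable is on the PATH.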

Customization Options

--use-case (string)

Custom description of your taxonomy use case. Helps guide the LLM to generate relevant categories.

delve run data.csv --text-column text \
  --use-case "Categorize customer feedback into product features and sentiment"

Providing a use case improves taxonomy quality by giving the model context about your domain and goals.

LangSmith Options

--langsmith-key (string)

LangSmith API key for accessing LangSmith data sources.

delve run langsmith://my-project --langsmith-key $LANGSMITH_API_KEY

Can also be set via the LANGSMITH_API_KEY environment variable.

--days (integer, default: 7)

Number of days to look back when fetching LangSmith runs.

delve run langsmith://my-project --langsmith-key $LANGSMITH_API_KEY --days 14

Output Control

-q / -v / -vv (verbosity flags)

Control output verbosity level.

# Normal (default) - spinners and checkmarks
delve run data.csv --text-column text

# Quiet - errors only
delve run data.csv --text-column text -q

# Verbose - progress bars with ETA
delve run data.csv --text-column text -v

# Debug - everything including internal state
delve run data.csv --text-column text -vv

Levels:

No flag: NORMAL - Spinners and completion checkmarks
-q: QUIET - Errors only
-v: VERBOSE - Progress bars with throughput-based ETA
-vv: DEBUG - Full debug output including warnings

Verbosity Output Examples

NORMAL (default)

⠹ Validating API keys...
✓ API keys validated
⠹ Loading data from data.csv...
✓ Loaded 5,000 documents
⠹ Generating taxonomy...
✓ Generated 12 categories
⠹ Labeling documents...
✓ Labeled 5,000 documents
✓ Results saved to ./results/

VERBOSE (-v)

✓ API keys validated
✓ Loaded 5,000 documents

Labeling documents with LLM ━━━━━━━━━━━━━━━━ 100% 100/100 0:01:45 0:00:00
✓ Classifier trained - Test F1: 0.847, Test Accuracy: 0.85
✓ Total labeled: 5,000 documents
  - 100 by LLM
  - 4,900 by classifier

✓ Results saved to ./results/

DEBUG (-vv)

==================================================
Delve Configuration:
  Model: anthropic/claude-sonnet-4-5-20250929
  Fast LLM: anthropic/claude-haiku-4-5-20251001
  Sample size: 100
  Batch size: 200
  Embedding model: text-embedding-3-large
  Output dir: ./results
  Use case: Generate taxonomy...
==================================================

✓ Loaded 5,000 documents
Taxonomy has 12 categories:
  [1] Technical Support
  [2] Billing Inquiry
  ...

Training set: 95 samples, 10 classes
Class distribution:
  [0] Technical Support: 15 samples
  [2] Billing Inquiry: 12 samples
  ...

! Warning: 5 documents labeled as 'Other', skipped
✓ Results saved to ./results/

Examples

# CSV with required text column
delve run data.csv --text-column message

# JSON with JSONPath for nested data
delve run messages.json --json-path "$.conversations[*].text"

# LangSmith project
delve run langsmith://my-project --langsmith-key $LANGSMITH_API_KEY --days 7

# Full configuration example
delve run data.csv \
  --text-column feedback \
  --sample-size 200 \
  --output-dir ./results \
  --use-case "Categorize support tickets by issue type"

Output Files

Delve generates multiple output files in your specified output directory:

results/
├── taxonomy.json              # Machine-readable taxonomy with metadata
├── labeled_documents.json     # All documents with assigned categories
├── labeled_data.csv           # Spreadsheet format with categories
├── taxonomy_reference.csv     # Category lookup table
├── report.md                  # Human-readable summary with statistics
└── metadata.json              # Run configuration and metadata

taxonomy.json

Complete taxonomy with category descriptions and metadata:

{
  "taxonomy": [
    {
      "id": "1",
      "name": "Technical Support",
      "description": "Questions about technical issues and troubleshooting"
    },
    {
      "id": "2",
      "name": "Billing Inquiry",
      "description": "Questions about pricing, payments, and invoices"
    }
  ],
  "metadata": {
    "num_documents": 100,
    "num_categories": 5,
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "timestamp": "2024-01-15T10:30:00Z"
  }
}
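A downstream script might load this file and build a lookup from category ID to name. The sketch below assumes the file has exactly the shape of the example above; it parses an inline copy so that it runs without a Delve results directory.

```python
import json

# Inline copy of the example above. For a real run, read results/taxonomy.json instead.
raw = """
{
  "taxonomy": [
    {"id": "1", "name": "Technical Support",
     "description": "Questions about technical issues and troubleshooting"},
    {"id": "2", "name": "Billing Inquiry",
     "description": "Questions about pricing, payments, and invoices"}
  ],
  "metadata": {
    "num_documents": 100,
    "num_categories": 5,
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "timestamp": "2024-01-15T10:30:00Z"
  }
}
"""

data = json.loads(raw)

# Map each category ID to its display name.
names_by_id = {category["id"]: category["name"] for category in data["taxonomy"]}

print(names_by_id)
print(data["metadata"]["model"])
```

Note that the IDs in the example are strings, so lookups should use "1" rather than 1.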

labeled_data.csv

Spreadsheet format for easy analysis:

id,content,category,explanation
doc1,"How do I reset my password?","Technical Support","User asking about account recovery"
doc2,"What are your pricing plans?","Billing Inquiry","Question about product pricing"
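For a quick look at how documents are distributed across categories, the CSV can be tallied with the Python standard library. The sketch below assumes the column names shown in the example (id, content, category, explanation) and reads an inline copy of those rows; category_counts is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Inline copy of the example rows above.
sample_csv = '''id,content,category,explanation
doc1,"How do I reset my password?","Technical Support","User asking about account recovery"
doc2,"What are your pricing plans?","Billing Inquiry","Question about product pricing"
'''


def category_counts(csv_file):
    """Count rows per value of the "category" column."""
    reader = csv.DictReader(csv_file)
    return Counter(row["category"] for row in reader)


counts = category_counts(io.StringIO(sample_csv))
for category, count in counts.most_common():
    print(f"{category}: {count}")
```

For a real run, replace the StringIO object with open("results/labeled_data.csv", newline="", encoding="utf-8").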

report.md

Human-readable Markdown report with:

Taxonomy overview
Category descriptions
Document distribution statistics
Sample documents per category

Getting Help

# Get general help
delve --help

# Get help for run command
delve run --help

# Check version
delve --version

Environment Variables

Set these environment variables before running Delve:

# Required
export ANTHROPIC_API_KEY="your-anthropic-key"

# Required when the sample size is greater than 0 and the dataset has more documents than the sample size (embeddings for the classifier)
export OPENAI_API_KEY="your-openai-key"

# Optional
export LANGSMITH_API_KEY="your-langsmith-key"

The OpenAI API key is needed for generating embeddings when training the classifier. If your dataset is small enough that all documents are labeled by the LLM (no classifier needed), you can skip the OpenAI key.
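The rule above can be turned into a small pre-flight check. The sketch below encodes only what this section states: the Anthropic key is always required, and the OpenAI key is required when the dataset is larger than a non-zero sample size. The helper names required_env_vars and missing_env_vars are hypothetical and are not part of Delve.

```python
import os


def required_env_vars(num_docs: int, sample_size: int) -> list:
    """Return the environment variables needed for a run of the given size."""
    required = ["ANTHROPIC_API_KEY"]
    # Documents beyond the sample are labeled by a classifier that needs embeddings.
    if sample_size > 0 and num_docs > sample_size:
        required.append("OPENAI_API_KEY")
    return required


def missing_env_vars(names, environ) -> list:
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not environ.get(name)]


needed = required_env_vars(num_docs=5000, sample_size=100)
print("Needed:", needed)
print("Missing:", missing_env_vars(needed, os.environ))
```

With 5,000 documents and the default sample size of 100, both keys are needed; with 80 documents, only the Anthropic key is.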

Next Steps

SDK Reference

Examples
