This guide explains how Delve’s configuration parameters affect taxonomy quality, cost, and performance. Understanding these tradeoffs helps you tune Delve for your specific use case.
Model Selection
model - Main LLM
The main model handles the “thinking” tasks: taxonomy generation, iterative refinement, and quality review.
| Model | Strengths | Best For |
|---|---|---|
| anthropic/claude-sonnet-4-5-20250929 | Excellent balance of capability and cost | Default choice, most use cases |
| anthropic/claude-opus-4 | Highest reasoning capability | Complex domains, nuanced categories |
| anthropic/claude-haiku-4-5-20251001 | Fast and cheap | Quick iterations, simple data |
How it affects results:
- More capable models → Better category definitions, more nuanced distinctions
- Faster models → Quicker iterations but potentially less refined taxonomies
fast_llm - Summarization & Labeling Model
The fast LLM handles high-volume tasks: document summarization and individual document labeling.
How it affects results:
- Summary quality impacts downstream taxonomy quality (garbage in, garbage out)
- Labeling accuracy directly affects your final results
- Cost scales with document count, so model choice matters more here
Claude Haiku is the recommended choice for most use cases. It’s significantly cheaper while maintaining good quality for summarization and labeling tasks.
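Model choices are typically set alongside the rest of the run configuration. A minimal sketch (the dict shape and the way it is passed to Delve are assumptions; the parameter names and model IDs come from this guide):

```python
# Hypothetical config mapping: keys are the parameter names from this guide.
# How the mapping is actually passed to Delve depends on the SDK you use.
config = {
    # Main model: taxonomy generation, iterative refinement, quality review
    "model": "anthropic/claude-sonnet-4-5-20250929",
    # Fast model: high-volume summarization and per-document labeling
    "fast_llm": "anthropic/claude-haiku-4-5-20251001",
}
```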
Processing Parameters
sample_size - Taxonomy Discovery Sample
What it controls: How many documents are used to discover and validate the taxonomy.
How it affects results:
| Sample Size | Effect on Taxonomy | Effect on Labeling |
|---|---|---|
| Smaller (50-100) | May miss rare categories | Smaller training set for classifier |
| Medium (100-200) | Good coverage for typical data | Balanced accuracy |
| Larger (200-500) | Comprehensive coverage | Better classifier accuracy |
| Very large (500+) | Diminishing returns | Excellent but expensive |
Cost impact:
- Each sampled document requires LLM summarization
- Each sampled document requires LLM labeling
- Larger samples = proportionally higher costs for taxonomy discovery
When to increase sample_size:
- Your data is highly diverse (many potential categories)
- Initial runs are missing important categories
- Classifier accuracy is below expectations
When to decrease sample_size:
- Your data is homogeneous
- You’re iterating on taxonomy design
- Budget is constrained
batch_size - Minibatch Size for Clustering
What it controls: How many documents the LLM sees at once during taxonomy generation.
How it affects results:
| Batch Size | Iterations | Taxonomy Quality |
|---|---|---|
| Smaller (50-100) | More iterations | More refined, potentially more categories discovered |
| Medium (150-200) | Moderate | Balanced |
| Larger (200-300) | Fewer iterations | Faster but may miss nuances |
Example with sample_size=200:
- batch_size=50 → 4 iterations of refinement
- batch_size=200 → 1 iteration (no refinement)
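The relationship above is just the sample divided into minibatches, with each pass beyond the first acting as a refinement step. The arithmetic (plain Python, not Delve code):

```python
import math

def refinement_iterations(sample_size: int, batch_size: int) -> int:
    """Number of minibatch passes the taxonomy generator makes over the sample."""
    return math.ceil(sample_size / batch_size)

# With sample_size=200:
refinement_iterations(200, 50)   # → 4 iterations of refinement
refinement_iterations(200, 200)  # → 1 iteration (no refinement)
```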
max_num_clusters - Category Limit
What it controls: The maximum number of categories the LLM will generate.
How it affects results:
| Max Clusters | Result |
|---|---|
| Small (3-5) | High-level, broad categories |
| Medium (5-10) | Balanced granularity |
| Large (10-20) | Fine-grained categories |
| Very large (20+) | Risk of overlapping or sparse categories |
How to choose:
- Consider how you’ll use the taxonomy
- More categories = more specific insights but harder to analyze
- Fewer categories = easier to understand but less detail
| Use Case | Recommended max_num_clusters |
|---|---|
| Quick overview | 3-5 |
| Support ticket triage | 5-10 |
| Detailed content analysis | 10-15 |
| Research categorization | 10-20 |
Classification Parameters
embedding_model - Classifier Embeddings
What it controls: Which OpenAI model generates embeddings for classifier training.
Available options:
- text-embedding-3-large (default) - Highest quality, 3072 dimensions
- text-embedding-3-small - Faster, cheaper, 1536 dimensions
- text-embedding-ada-002 - Legacy, 1536 dimensions
How it affects results:
- Better embeddings → Better classifier accuracy
- Larger embeddings → Slightly slower training and inference
The difference in classifier accuracy between embedding models is usually small (1-3%). For most use cases, the default is fine.
classifier_confidence_threshold - Uncertainty Handling
What it controls: When to flag classifier predictions as uncertain.
How it works:
- Classifier outputs probability scores for each category
- If the top probability is below the threshold, the prediction is flagged as uncertain
- What happens to uncertain predictions is controlled by low_confidence_action
Example values:
- 0.0 = Never flag (trust classifier for everything, default)
- 0.7 = Flag predictions below 70% confidence
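The flagging rule amounts to a max-probability check. A minimal sketch of the idea (illustrative only, not Delve’s internal code; the example category names are made up):

```python
def flag_uncertain(probabilities: dict[str, float], threshold: float) -> bool:
    """Return True when the classifier's top score falls below the threshold."""
    return max(probabilities.values()) < threshold

scores = {"Billing": 0.55, "Bug Report": 0.30, "Feature Request": 0.15}
flag_uncertain(scores, 0.7)  # → True: top score 0.55 < 0.70, so flagged
flag_uncertain(scores, 0.0)  # → False: threshold 0.0 never flags (the default)
```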
low_confidence_action - What to Do with Uncertain Predictions
What it controls: How to handle documents where the classifier’s confidence is below the threshold.
Available options:
| Action | Behavior | Cost Impact |
|---|---|---|
| "other" (default) | Label as “Other” category | None |
| "llm" | Re-label with LLM (max 20 docs) | Low-Medium |
| "keep" | Keep the classifier’s prediction | None |
- Only applies when classifier_confidence_threshold > 0
- "other": Honest about uncertainty - the classifier doesn’t know, so label as “Other”
- "llm": Re-label with LLM for better accuracy, but capped at 20 documents
- "keep": Accept the classifier’s best guess despite low confidence
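Taken together, handling a low-confidence prediction is a three-way switch. A sketch of the behavior described above (the function and its arguments are illustrative; only the action names come from this guide):

```python
def resolve_label(predicted: str, action: str, relabel_with_llm) -> str:
    """Decide the final label for a prediction flagged as low-confidence."""
    if action == "other":
        return "Other"             # honest about uncertainty (default)
    if action == "llm":
        return relabel_with_llm()  # re-label with the LLM (capped at 20 docs)
    return predicted               # "keep": accept the classifier's best guess

resolve_label("Billing", "other", lambda: "Refunds")  # → "Other"
resolve_label("Billing", "keep", lambda: "Refunds")   # → "Billing"
```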
min_examples_per_category - Sample Augmentation
What it controls: Minimum number of training examples required per category.
How it works:
- After initial LLM labeling, Delve checks category distribution
- For categories below the minimum, it finds similar documents using embedding search
- Those candidates are labeled by LLM and added to the training set
- 0 = Disabled (default)
- 5 = Ensure at least 5 training examples per category
When to enable:
- Your data has significant class imbalance
- Some categories are rare (< 1% of data)
- You’re using a predefined taxonomy with many categories
| Value | Effect | LLM Cost Increase |
|---|---|---|
| 0 (default) | No augmentation | None |
| 3 | Light balancing | +10-20% |
| 5 | Moderate balancing | +20-40% |
| 10 | Heavy balancing | +50-100% |
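The augmentation step is essentially a nearest-neighbor search around the few examples a rare category already has. A simplified sketch with toy vectors (cosine similarity via NumPy; this is an illustration of the idea, not Delve’s implementation):

```python
import numpy as np

def find_candidates(seed_vecs, pool_vecs, k):
    """Rank unlabeled pool documents by similarity to a rare category's seeds."""
    seed = np.mean(seed_vecs, axis=0)                       # category centroid
    seed = seed / np.linalg.norm(seed)
    pool = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    sims = pool @ seed                                      # cosine similarities
    return np.argsort(-sims)[:k]                            # top-k candidate indices

seeds = np.array([[1.0, 0.0], [0.9, 0.1]])                  # 2 known rare-category docs
pool = np.array([[0.95, 0.05], [0.0, 1.0], [0.8, 0.2]])     # unlabeled embeddings
find_candidates(seeds, pool, k=2)  # → indices 0 and 2, the closest to the seeds
```

The selected candidates would then be labeled by the LLM and, if confirmed, added to the training set.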
See the Handling Class Imbalance guide for a detailed explanation of when and how to use this parameter.
sampling_strategy - Sampling Mode
What it controls: How documents are selected for the initial sample.
Available options:
- random (default) - Simple random sampling
- stratified - Reserved for future use
Customization Parameters
use_case - Domain Context
What it controls: Provides context to the LLM about your specific use case.
How it affects results:
- More specific use cases → More relevant category names and descriptions
- Guides the LLM to focus on distinctions that matter for your domain
predefined_taxonomy - Skip Discovery
What it controls: Use an existing taxonomy instead of discovering one.
When to use:
- You already have categories you want to apply
- You’re labeling new data with an established taxonomy
- You want consistent categories across multiple runs
How it affects results:
- Skips phases 3-6 (minibatch generation, taxonomy generation, update, review)
- Goes directly to document labeling
- Much faster for large datasets with known categories
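A predefined taxonomy is typically a list of named categories with descriptions. A hypothetical shape for illustration (the exact schema Delve expects may differ; check the SDK reference):

```python
# Hypothetical category schema; names and descriptions are example data.
predefined_taxonomy = [
    {"name": "Billing", "description": "Payments, invoices, refunds"},
    {"name": "Bug Report", "description": "Errors, crashes, unexpected behavior"},
    {"name": "Feature Request", "description": "Suggestions for new functionality"},
]
```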
Output Configuration
output_formats - Export Types
Available formats:
- json - Machine-readable, good for integrations
- csv - Spreadsheet-compatible, good for analysis
- markdown - Human-readable reports, good for sharing
verbosity - Progress Output
| Level | What You See | Best For |
|---|---|---|
| SILENT | Nothing | SDK integration, pipelines |
| QUIET | Errors only | Background jobs |
| NORMAL | Spinners, checkmarks | Interactive use |
| VERBOSE | Progress bars with ETA | Monitoring long runs |
| DEBUG | Everything + internal state | Troubleshooting |
Recommended Configurations
Quick Exploration
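A plausible configuration for this preset, assembled from the recommendations above (parameter names come from this guide; the dict shape and how it is passed to Delve are assumptions):

```python
# Fast, cheap iteration while you explore the data.
quick_exploration = {
    "model": "anthropic/claude-haiku-4-5-20251001",  # fast and cheap
    "fast_llm": "anthropic/claude-haiku-4-5-20251001",
    "sample_size": 100,       # small sample, quick turnaround
    "max_num_clusters": 5,    # high-level overview
}
```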
Balanced Production
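A sketch of a balanced production setup under the same assumptions (values are reconstructed from the medium ranges recommended above, not copied from Delve’s docs):

```python
# Default-ish middle ground: good quality at moderate cost.
balanced_production = {
    "model": "anthropic/claude-sonnet-4-5-20250929",  # balanced capability/cost
    "fast_llm": "anthropic/claude-haiku-4-5-20251001",
    "sample_size": 200,       # good coverage for typical data
    "batch_size": 150,        # moderate number of refinement iterations
    "max_num_clusters": 10,   # balanced granularity
}
```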
High-Quality Analysis
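A sketch favoring quality over cost, again assembled from the tables above (the choice of a Sonnet fast_llm here is my assumption; the guide’s general recommendation is Haiku):

```python
# Maximize taxonomy and labeling quality; expect higher LLM spend.
high_quality = {
    "model": "anthropic/claude-opus-4",               # highest reasoning capability
    "fast_llm": "anthropic/claude-sonnet-4-5-20250929",
    "sample_size": 400,                               # comprehensive coverage
    "batch_size": 100,                                # more refinement iterations
    "max_num_clusters": 15,                           # fine-grained categories
    "classifier_confidence_threshold": 0.7,           # flag uncertain predictions
    "low_confidence_action": "llm",                   # re-label flagged docs
}
```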
Cost-Optimized at Scale
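A sketch that trades a little accuracy for lower spend at scale, under the same assumptions about config shape:

```python
# Keep LLM and embedding costs down on large datasets.
cost_optimized = {
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "fast_llm": "anthropic/claude-haiku-4-5-20251001",
    "sample_size": 150,                               # modest discovery sample
    "embedding_model": "text-embedding-3-small",      # cheaper, usually within 1-3%
    "classifier_confidence_threshold": 0.0,           # never fall back to the LLM
}
```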
Classifier Export & Reuse
After a successful run, you can save the trained classifier for later use without any LLM costs.
Saving a Classifier
The saved classifier includes:
- The trained RandomForest model
- Category mappings
- Embedding model configuration
- Training metrics for reference
Classifying New Documents
Classification only requires OpenAI embedding API calls - no LLM costs. This makes it very cost-effective for production use.
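To make the embed-then-predict flow concrete, here is a self-contained sketch using scikit-learn, with synthetic vectors standing in for OpenAI embeddings and a freshly fitted RandomForest standing in for a loaded classifier (this is not Delve’s saved-classifier API; category names are example data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic "embeddings": two well-separated clusters, one per category.
train_X = np.vstack([rng.normal(0, 0.1, (20, 8)), rng.normal(1, 0.1, (20, 8))])
train_y = ["Billing"] * 20 + ["Bug Report"] * 20

clf = RandomForestClassifier(random_state=0).fit(train_X, train_y)

# New documents: embed them the same way, then predict. No LLM calls needed.
new_X = np.vstack([rng.normal(0, 0.1, (1, 8)), rng.normal(1, 0.1, (1, 8))])
clf.predict(new_X)  # predicts ['Billing', 'Bug Report']
```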
Training from Labeled Data
If you have your own labeled dataset (or corrected Delve output), you can train a classifier directly.
Next Steps
- How It Works - Understand the pipeline in depth
- Examples - See complete code examples
