This guide explains how Delve’s configuration parameters affect taxonomy quality, cost, and performance. Understanding these tradeoffs helps you tune Delve for your specific use case.

Model Selection

model - Main LLM

The main model handles the “thinking” tasks: taxonomy generation, iterative refinement, and quality review.
| Model | Strengths | Best For |
| --- | --- | --- |
| anthropic/claude-sonnet-4-5-20250929 | Excellent balance of capability and cost | Default choice, most use cases |
| anthropic/claude-opus-4 | Highest reasoning capability | Complex domains, nuanced categories |
| anthropic/claude-haiku-4-5-20251001 | Fast and cheap | Quick iterations, simple data |
How it affects results:
  • More capable models → Better category definitions, more nuanced distinctions
  • Faster models → Quicker iterations but potentially less refined taxonomies
```python
# High-quality taxonomy for complex data
delve = Delve(model="anthropic/claude-opus-4")

# Quick iteration during development
delve = Delve(model="anthropic/claude-haiku-4-5-20251001")
```
Start with Claude Sonnet (default) and only upgrade to Opus if you need more nuanced category distinctions. The quality difference is often subtle for straightforward categorization tasks.

fast_llm - Summarization & Labeling Model

The fast LLM handles high-volume tasks: document summarization and individual document labeling.

How it affects results:
  • Summary quality impacts downstream taxonomy quality (garbage in, garbage out)
  • Labeling accuracy directly affects your final results
  • Cost scales with document count, so model choice matters more here
```python
# Default: fast and cost-effective
delve = Delve(fast_llm="anthropic/claude-haiku-4-5-20251001")

# Higher quality labeling at higher cost
delve = Delve(fast_llm="anthropic/claude-sonnet-4-5-20250929")
```
Claude Haiku is the recommended choice for most use cases. It’s significantly cheaper while maintaining good quality for summarization and labeling tasks.

Processing Parameters

sample_size - Taxonomy Discovery Sample

What it controls: How many documents are used to discover and validate the taxonomy.

How it affects results:

| Sample Size | Effect on Taxonomy | Effect on Labeling |
| --- | --- | --- |
| Smaller (50-100) | May miss rare categories | Smaller training set for classifier |
| Medium (100-200) | Good coverage for typical data | Balanced accuracy |
| Larger (200-500) | Comprehensive coverage | Better classifier accuracy |
| Very large (500+) | Diminishing returns | Excellent but expensive |
Cost implications:
  • Each sampled document requires LLM summarization
  • Each sampled document requires LLM labeling
  • Larger samples = proportionally higher costs for taxonomy discovery
```python
# Quick exploration
delve = Delve(sample_size=50)

# Production use
delve = Delve(sample_size=150)

# Comprehensive analysis
delve = Delve(sample_size=300)
```
Setting sample_size=0 means ALL documents are labeled by the LLM. This is expensive for large datasets but guarantees every document gets LLM-quality labeling.
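For example, to disable sampling and have every document labeled directly by the LLM:

```python
# Label ALL documents with the LLM (expensive for large datasets)
delve = Delve(sample_size=0)
```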
When to increase sample size:
  • Your data is highly diverse (many potential categories)
  • Initial runs are missing important categories
  • Classifier accuracy is below expectations
When to decrease sample size:
  • Your data is homogeneous
  • You’re iterating on taxonomy design
  • Budget is constrained

batch_size - Minibatch Size for Clustering

What it controls: How many documents the LLM sees at once during taxonomy generation.

How it affects results:

| Batch Size | Iterations | Taxonomy Quality |
| --- | --- | --- |
| Smaller (50-100) | More iterations | More refined, potentially more categories discovered |
| Medium (150-200) | Moderate | Balanced |
| Larger (200-300) | Fewer iterations | Faster but may miss nuances |
Example: With sample_size=200:
  • batch_size=50 → 4 iterations of refinement
  • batch_size=200 → 1 iteration (no refinement)
```python
# More refined taxonomy (more iterations)
delve = Delve(sample_size=200, batch_size=50)

# Faster processing
delve = Delve(sample_size=200, batch_size=200)
```
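To make the relationship concrete, here is a minimal sketch of the refinement arithmetic implied by the example above (the exact internal scheduling is an assumption):

```python
import math

def refinement_iterations(sample_size: int, batch_size: int) -> int:
    # Per the example above: 200 / 50 -> 4 iterations, 200 / 200 -> 1 iteration
    return math.ceil(sample_size / batch_size)

print(refinement_iterations(200, 50))   # 4
print(refinement_iterations(200, 200))  # 1
```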
If your taxonomy seems to be missing categories, try reducing batch_size to allow more refinement iterations.

max_num_clusters - Category Limit

What it controls: The maximum number of categories the LLM will generate.

How it affects results:

| Max Clusters | Result |
| --- | --- |
| Small (3-5) | High-level, broad categories |
| Medium (5-10) | Balanced granularity |
| Large (10-20) | Fine-grained categories |
| Very large (20+) | Risk of overlapping or sparse categories |
Choosing the right value:
  • Consider how you’ll use the taxonomy
  • More categories = more specific insights but harder to analyze
  • Fewer categories = easier to understand but less detail
```python
# Executive summary level
delve = Delve(max_num_clusters=5)

# Detailed analysis
delve = Delve(max_num_clusters=15)
```
Setting this too high can lead to overlapping categories or categories with very few documents. The LLM may also create artificial distinctions to fill the quota.
Recommendations by use case:
| Use Case | Recommended max_num_clusters |
| --- | --- |
| Quick overview | 3-5 |
| Support ticket triage | 5-10 |
| Detailed content analysis | 10-15 |
| Research categorization | 10-20 |

Classification Parameters

embedding_model - Classifier Embeddings

What it controls: Which OpenAI model generates embeddings for classifier training.

Available options:
  • text-embedding-3-large (default) - Highest quality, 3072 dimensions
  • text-embedding-3-small - Faster, cheaper, 1536 dimensions
  • text-embedding-ada-002 - Legacy, 1536 dimensions
How it affects results:
  • Better embeddings → Better classifier accuracy
  • Larger embeddings → Slightly slower training and inference
```python
# Best accuracy (default)
delve = Delve(embedding_model="text-embedding-3-large")

# Cost-effective alternative
delve = Delve(embedding_model="text-embedding-3-small")
```
The difference in classifier accuracy between embedding models is usually small (1-3%). For most use cases, the default is fine.

classifier_confidence_threshold - Uncertainty Handling

What it controls: When to flag classifier predictions as uncertain.

How it works:
  • Classifier outputs probability scores for each category
  • If the top probability is below the threshold, the prediction is flagged as uncertain
  • What happens to uncertain predictions is controlled by low_confidence_action

Values:
  • 0.0 = Never flag (trust classifier for everything, default)
  • 0.7 = Flag predictions below 70% confidence
```python
# Trust classifier completely (default)
delve = Delve(classifier_confidence_threshold=0.0)

# Flag uncertain predictions (below 70% confidence)
delve = Delve(
    classifier_confidence_threshold=0.7,
    low_confidence_action="other",  # Label as "Other"
)
```
Start with the default (0.0). If you notice miscategorized documents, try increasing the threshold to 0.6-0.8 and using low_confidence_action="other" to be honest about uncertainty.

low_confidence_action - What to Do with Uncertain Predictions

What it controls: How to handle documents where the classifier’s confidence is below the threshold.

Available options:

| Action | Behavior | Cost Impact |
| --- | --- | --- |
| "other" (default) | Label as “Other” category | None |
| "llm" | Re-label with LLM (max 20 docs) | Low-Medium |
| "keep" | Keep the classifier’s prediction | None |
How it works:
  • Only applies when classifier_confidence_threshold > 0
  • "other": Honest about uncertainty - the classifier doesn’t know, so label as “Other”
  • "llm": Re-label with LLM for better accuracy, but capped at 20 documents
  • "keep": Accept the classifier’s best guess despite low confidence
```python
# Label uncertain predictions as "Other" (recommended for large datasets)
delve = Delve(
    classifier_confidence_threshold=0.7,
    low_confidence_action="other",
)

# Re-label with LLM (small datasets only, max 20 docs)
delve = Delve(
    classifier_confidence_threshold=0.7,
    low_confidence_action="llm",
)

# Keep classifier predictions regardless of confidence
delve = Delve(
    classifier_confidence_threshold=0.7,
    low_confidence_action="keep",
)
```
Safeguard for "llm" action: If more than 20 documents need re-labeling, Delve automatically falls back to "other" and logs a warning. This prevents unexpected costs on imbalanced datasets.
Don’t use this as a replacement for “Other” in your taxonomy. Low classifier confidence usually means uncertainty between valid categories, not that the document doesn’t fit any category. In testing, inferring “Other” from confidence achieved only 45% accuracy vs 89% when “Other” was included in the taxonomy. See the Class Imbalance guide for details.
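If your data genuinely contains out-of-scope documents, make “Other” an explicit category instead. One way to do that is via the predefined_taxonomy parameter described below; the categories here are hypothetical placeholders:

```python
# Sketch: include "Other" as a real category in a predefined taxonomy,
# rather than inferring it from low classifier confidence
delve = Delve(predefined_taxonomy=[
    {"id": "1", "name": "Bug Report", "description": "Reports of software bugs or defects"},
    {"id": "2", "name": "Feature Request", "description": "Requests for new features or enhancements"},
    {"id": "3", "name": "Other", "description": "Documents that fit none of the above categories"},
])
```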

min_examples_per_category - Sample Augmentation

What it controls: Minimum number of training examples required per category.

How it works:
  • After initial LLM labeling, Delve checks category distribution
  • For categories below the minimum, it finds similar documents using embedding search
  • Those candidates are labeled by LLM and added to the training set

Values:
  • 0 = Disabled (default)
  • 5 = Ensure at least 5 training examples per category
When to use:
  • Your data has significant class imbalance
  • Some categories are rare (< 1% of data)
  • You’re using a predefined taxonomy with many categories
Tradeoffs:
| Value | Effect | LLM Cost Increase |
| --- | --- | --- |
| 0 (default) | No augmentation | None |
| 3 | Light balancing | +10-20% |
| 5 | Moderate balancing | +20-40% |
| 10 | Heavy balancing | +50-100% |
```python
# Disabled (default)
delve = Delve(min_examples_per_category=0)

# Ensure minimum coverage for imbalanced data
delve = Delve(min_examples_per_category=5)

# Aggressive balancing for highly skewed data
delve = Delve(min_examples_per_category=10)
```
See the Handling Class Imbalance guide for a detailed explanation of when and how to use this parameter.

sampling_strategy - Sampling Mode

What it controls: How documents are selected for the initial sample.

Available options:
  • random (default) - Simple random sampling
  • stratified - Reserved for future use
```python
# Random sampling (default)
delve = Delve(sampling_strategy="random")
```
For handling imbalanced data, use min_examples_per_category rather than changing the sampling strategy. The sample augmentation approach is more effective because it uses embedding similarity to find good candidates.
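For example, keeping the default strategy and letting augmentation handle rare categories:

```python
# Random sampling plus sample augmentation for rare categories
delve = Delve(
    sampling_strategy="random",
    min_examples_per_category=5,
)
```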

Customization Parameters

use_case - Domain Context

What it controls: Provides context to the LLM about your specific use case.

How it affects results:
  • More specific use cases → More relevant category names and descriptions
  • Guides the LLM to focus on distinctions that matter for your domain
Examples:
```python
# Generic (less helpful)
delve = Delve(use_case="Categorize documents")

# Specific (better results)
delve = Delve(use_case="Categorize customer support tickets by issue type and urgency for routing to appropriate teams")

# Domain-specific (best results)
delve = Delve(use_case="Categorize e-commerce product reviews by: product quality issues, shipping problems, customer service interactions, and feature requests")
```
Be specific about:
  • What kind of documents you have
  • What distinctions matter to you
  • How you’ll use the categories

predefined_taxonomy - Skip Discovery

What it controls: Use an existing taxonomy instead of discovering one.

When to use:
  • You already have categories you want to apply
  • You’re labeling new data with an established taxonomy
  • You want consistent categories across multiple runs
How it affects the pipeline:
  • Skips phases 3-6 (minibatch generation, taxonomy generation, update, review)
  • Goes directly to document labeling
  • Much faster for large datasets with known categories
```python
# From a file
delve = Delve(predefined_taxonomy="categories.json")

# Inline definition
delve = Delve(predefined_taxonomy=[
    {"id": "1", "name": "Bug Report", "description": "Reports of software bugs or defects"},
    {"id": "2", "name": "Feature Request", "description": "Requests for new features or enhancements"},
    {"id": "3", "name": "Question", "description": "General questions about usage or functionality"},
])
```
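This page doesn’t show the expected on-disk format; a reasonable assumption is that categories.json mirrors the inline structure above. For example, you could generate such a file like this (the schema is assumed, not confirmed here):

```python
import json

# Assumed schema: a list of {"id", "name", "description"} objects,
# matching the inline definition above
categories = [
    {"id": "1", "name": "Bug Report", "description": "Reports of software bugs or defects"},
    {"id": "2", "name": "Feature Request", "description": "Requests for new features or enhancements"},
]
with open("categories.json", "w") as f:
    json.dump(categories, f, indent=2)
```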

Output Configuration

output_formats - Export Types

Available formats:
  • json - Machine-readable, good for integrations
  • csv - Spreadsheet-compatible, good for analysis
  • markdown - Human-readable reports, good for sharing
```python
# All formats (default)
delve = Delve(output_formats=["json", "csv", "markdown"])

# Just what you need
delve = Delve(output_formats=["csv"])  # For spreadsheet analysis
delve = Delve(output_formats=["json"])  # For API integration
```

verbosity - Progress Output

| Level | What You See | Best For |
| --- | --- | --- |
| SILENT | Nothing | SDK integration, pipelines |
| QUIET | Errors only | Background jobs |
| NORMAL | Spinners, checkmarks | Interactive use |
| VERBOSE | Progress bars with ETA | Monitoring long runs |
| DEBUG | Everything + internal state | Troubleshooting |
```python
from delve import Delve, Verbosity

# Silent for scripts
delve = Delve(verbosity=Verbosity.SILENT)

# Visual feedback for interactive use
delve = Delve(verbosity=Verbosity.VERBOSE)
```

Quick Exploration

```python
delve = Delve(
    sample_size=50,
    batch_size=50,
    max_num_clusters=5,
    fast_llm="anthropic/claude-haiku-4-5-20251001",
    verbosity=Verbosity.NORMAL,
)
```

Balanced Production

```python
delve = Delve(
    sample_size=150,
    batch_size=100,
    max_num_clusters=10,
    use_case="Your specific use case here",
    verbosity=Verbosity.VERBOSE,
)
```

High-Quality Analysis

```python
delve = Delve(
    model="anthropic/claude-opus-4",
    sample_size=300,
    batch_size=75,
    max_num_clusters=15,
    classifier_confidence_threshold=0.7,
    use_case="Detailed domain-specific description",
    verbosity=Verbosity.VERBOSE,
)
```

Cost-Optimized at Scale

```python
delve = Delve(
    sample_size=100,
    batch_size=200,
    max_num_clusters=8,
    fast_llm="anthropic/claude-haiku-4-5-20251001",
    embedding_model="text-embedding-3-small",
    classifier_confidence_threshold=0.0,
    verbosity=Verbosity.QUIET,
)
```

Classifier Export & Reuse

After a successful run, you can save the trained classifier for later use without any LLM costs.

Saving a Classifier

```python
result = delve.run_sync("data.csv", text_column="text")
result.save_classifier("my_classifier.joblib")
```
The saved bundle includes:
  • The trained RandomForest model
  • Category mappings
  • Embedding model configuration
  • Training metrics for reference

Classifying New Documents

```python
predictions = Delve.classify(
    "new_documents.csv",
    classifier_path="my_classifier.joblib",
    text_column="text",
)

for doc in predictions.documents:
    print(f"{doc.id}: {doc.category} ({doc.confidence:.0%})")
```
Classification only requires OpenAI embedding API calls - no LLM costs. This makes it very cost-effective for production use.

Training from Labeled Data

If you have your own labeled dataset (or corrected Delve output), train a classifier directly:
```python
result = Delve.train_from_labeled(
    "labeled_data.csv",
    text_column="text",
    label_column="category",
)
print(f"Test F1: {result.metrics['test_f1']:.2%}")
result.save_classifier("production_classifier.joblib")
```
This is perfect for human-in-the-loop workflows: run Delve, review and correct labels, then train an improved classifier.
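Sketched end to end (the corrected CSV, its filename, and the bare Delve() configuration are illustrative placeholders):

```python
from delve import Delve

delve = Delve()  # configure as needed

# 1. Run Delve and review its labels
result = delve.run_sync("data.csv", text_column="text")

# 2. ...correct labels offline, saving them to corrected_labels.csv...

# 3. Train an improved classifier from the corrected labels
improved = Delve.train_from_labeled(
    "corrected_labels.csv",
    text_column="text",
    label_column="category",
)
improved.save_classifier("production_classifier.joblib")
```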
See the Classifier Export & Training guide for complete documentation.

Next Steps

  • How It Works: Understand the pipeline in depth
  • Examples: See complete code examples