This guide explains how Delve’s configuration parameters affect taxonomy quality, cost, and performance. Understanding these tradeoffs helps you tune Delve for your specific use case.
Model Selection
model - Main LLM
The main model handles the “thinking” tasks: taxonomy generation, iterative refinement, and quality review.
| Model | Strengths | Best For |
|---|---|---|
| anthropic/claude-sonnet-4-5-20250929 | Excellent balance of capability and cost | Default choice, most use cases |
| anthropic/claude-opus-4 | Highest reasoning capability | Complex domains, nuanced categories |
| anthropic/claude-haiku-4-5-20251001 | Fast and cheap | Quick iterations, simple data |
How it affects results:
- More capable models → Better category definitions, more nuanced distinctions
- Faster models → Quicker iterations but potentially less refined taxonomies
fast_llm - Summarization & Labeling Model
The fast LLM handles high-volume tasks: document summarization and individual document labeling.
How it affects results:
- Summary quality impacts downstream taxonomy quality (garbage in, garbage out)
- Labeling accuracy directly affects your final results
- Cost scales with document count, so model choice matters more here
Claude Haiku is the recommended choice for most use cases. It’s significantly cheaper while maintaining good quality for summarization and labeling tasks.
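Model choices are typically set alongside the rest of the run configuration. A minimal sketch (the dict shape and the way it is passed to Delve are assumptions; the parameter names and model IDs come from this guide):

```python
# Hypothetical config mapping: keys are the parameter names from this guide.
# How the mapping is actually passed to Delve depends on the SDK you use.
config = {
    # Main model: taxonomy generation, iterative refinement, quality review
    "model": "anthropic/claude-sonnet-4-5-20250929",
    # Fast model: high-volume summarization and per-document labeling
    "fast_llm": "anthropic/claude-haiku-4-5-20251001",
}
```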
Processing Parameters
sample_size - Taxonomy Discovery Sample
What it controls: How many documents are used to discover and validate the taxonomy.
How it affects results:
| Sample Size | Effect on Taxonomy | Effect on Labeling |
|---|---|---|
| Smaller (50-100) | May miss rare categories | Smaller training set for classifier |
| Medium (100-200) | Good coverage for typical data | Balanced accuracy |
| Larger (200-500) | Comprehensive coverage | Better classifier accuracy |
| Very large (500+) | Diminishing returns | Excellent but expensive |
Cost impact:
- Each sampled document requires LLM summarization
- Each sampled document requires LLM labeling
- Larger samples = proportionally higher costs for taxonomy discovery
When to increase sample_size:
- Your data is highly diverse (many potential categories)
- Initial runs are missing important categories
- Classifier accuracy is below expectations
When to decrease sample_size:
- Your data is homogeneous
- You’re iterating on taxonomy design
- Budget is constrained
batch_size - Minibatch Size for Clustering
What it controls: How many documents the LLM sees at once during taxonomy generation.
How it affects results:
| Batch Size | Iterations | Taxonomy Quality |
|---|---|---|
| Smaller (50-100) | More iterations | More refined, potentially more categories discovered |
| Medium (150-200) | Moderate | Balanced |
| Larger (200-300) | Fewer iterations | Faster but may miss nuances |
Example with sample_size=200:
- batch_size=50 → 4 iterations of refinement
- batch_size=200 → 1 iteration (no refinement)
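The relationship above is just the sample divided into minibatches, with each pass beyond the first acting as a refinement step. The arithmetic (plain Python, not Delve code):

```python
import math

def refinement_iterations(sample_size: int, batch_size: int) -> int:
    """Number of minibatch passes the taxonomy generator makes over the sample."""
    return math.ceil(sample_size / batch_size)

# With sample_size=200:
refinement_iterations(200, 50)   # → 4 iterations of refinement
refinement_iterations(200, 200)  # → 1 iteration (no refinement)
```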
max_num_clusters - Category Limit
What it controls: The maximum number of categories the LLM will generate.
How it affects results:
| Max Clusters | Result |
|---|---|
| Small (3-5) | High-level, broad categories |
| Medium (5-10) | Balanced granularity |
| Large (10-20) | Fine-grained categories |
| Very large (20+) | Risk of overlapping or sparse categories |
How to choose:
- Consider how you’ll use the taxonomy
- More categories = more specific insights but harder to analyze
- Fewer categories = easier to understand but less detail
| Use Case | Recommended max_num_clusters |
|---|---|
| Quick overview | 3-5 |
| Support ticket triage | 5-10 |
| Detailed content analysis | 10-15 |
| Research categorization | 10-20 |
Classification Parameters
embedding_model - Classifier Embeddings
What it controls: Which OpenAI model generates embeddings for classifier training.
Available options:
- text-embedding-3-large (default) - Highest quality, 3072 dimensions
- text-embedding-3-small - Faster, cheaper, 1536 dimensions
- text-embedding-ada-002 - Legacy, 1536 dimensions
How it affects results:
- Better embeddings → Better classifier accuracy
- Larger embeddings → Slightly slower training and inference
The difference in classifier accuracy between embedding models is usually small (1-3%). For most use cases, the default is fine.
classifier_confidence_threshold - Uncertainty Handling
What it controls: When to flag classifier predictions as uncertain.
How it works:
- Classifier outputs probability scores for each category
- If the top probability is below the threshold, the prediction is flagged as uncertain
- What happens to uncertain predictions is controlled by low_confidence_action
Example values:
- 0.0 = Never flag (trust classifier for everything, default)
- 0.7 = Flag predictions below 70% confidence
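The flagging rule amounts to a max-probability check. A minimal sketch of the idea (illustrative only, not Delve’s internal code; the example category names are made up):

```python
def flag_uncertain(probabilities: dict[str, float], threshold: float) -> bool:
    """Return True when the classifier's top score falls below the threshold."""
    return max(probabilities.values()) < threshold

scores = {"Billing": 0.55, "Bug Report": 0.30, "Feature Request": 0.15}
flag_uncertain(scores, 0.7)  # → True: top score 0.55 < 0.70, so flagged
flag_uncertain(scores, 0.0)  # → False: threshold 0.0 never flags (the default)
```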
low_confidence_action - What to Do with Uncertain Predictions
What it controls: How to handle documents where the classifier’s confidence is below the threshold.
Available options:
| Action | Behavior | Cost Impact |
|---|---|---|
| "other" (default) | Label as “Other” category | None |
| "llm" | Re-label with LLM (max 20 docs) | Low-Medium |
| "keep" | Keep the classifier’s prediction | None |
- Only applies when classifier_confidence_threshold > 0
- "other": Honest about uncertainty - the classifier doesn’t know, so label as “Other”
- "llm": Re-label with LLM for better accuracy, but capped at 20 documents
- "keep": Accept the classifier’s best guess despite low confidence
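Taken together, handling a low-confidence prediction is a three-way switch. A sketch of the behavior described above (the function and its arguments are illustrative; only the action names come from this guide):

```python
def resolve_label(predicted: str, action: str, relabel_with_llm) -> str:
    """Decide the final label for a prediction flagged as low-confidence."""
    if action == "other":
        return "Other"             # honest about uncertainty (default)
    if action == "llm":
        return relabel_with_llm()  # re-label with the LLM (capped at 20 docs)
    return predicted               # "keep": accept the classifier's best guess

resolve_label("Billing", "other", lambda: "Refunds")  # → "Other"
resolve_label("Billing", "keep", lambda: "Refunds")   # → "Billing"
```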
min_examples_per_category - Sample Augmentation
What it controls: Minimum number of training examples required per category.
How it works:
- After initial LLM labeling, Delve checks category distribution
- For categories below the minimum, it finds similar documents using embedding search
- Those candidates are labeled by LLM and added to the training set
- 0 = Disabled (default)
- 5 = Ensure at least 5 training examples per category
When to enable:
- Your data has significant class imbalance
- Some categories are rare (< 1% of data)
- You’re using a predefined taxonomy with many categories
| Value | Effect | LLM Cost Increase |
|---|---|---|
| 0 (default) | No augmentation | None |
| 3 | Light balancing | +10-20% |
| 5 | Moderate balancing | +20-40% |
| 10 | Heavy balancing | +50-100% |
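The augmentation step is essentially a nearest-neighbor search around the few examples a rare category already has. A simplified sketch with toy vectors (cosine similarity via NumPy; this is an illustration of the idea, not Delve’s implementation):

```python
import numpy as np

def find_candidates(seed_vecs, pool_vecs, k):
    """Rank unlabeled pool documents by similarity to a rare category's seeds."""
    seed = np.mean(seed_vecs, axis=0)                       # category centroid
    seed = seed / np.linalg.norm(seed)
    pool = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    sims = pool @ seed                                      # cosine similarities
    return np.argsort(-sims)[:k]                            # top-k candidate indices

seeds = np.array([[1.0, 0.0], [0.9, 0.1]])                  # 2 known rare-category docs
pool = np.array([[0.95, 0.05], [0.0, 1.0], [0.8, 0.2]])     # unlabeled embeddings
find_candidates(seeds, pool, k=2)  # → indices 0 and 2, the closest to the seeds
```

The selected candidates would then be labeled by the LLM and, if confirmed, added to the training set.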
See the Handling Class Imbalance guide for a detailed explanation of when and how to use this parameter.
sampling_strategy - Sampling Mode
What it controls: How documents are selected for the initial sample.
Available options:
- random (default) - Simple random sampling
- stratified - Reserved for future use
Customization Parameters
use_case - Domain Context
What it controls: Provides context to the LLM about your specific use case.
How it affects results:
- More specific use cases → More relevant category names and descriptions
- Guides the LLM to focus on distinctions that matter for your domain
predefined_taxonomy - Skip Discovery
What it controls: Use an existing taxonomy instead of discovering one.
When to use:
- You already have categories you want to apply
- You’re labeling new data with an established taxonomy
- You want consistent categories across multiple runs
How it affects results:
- Skips phases 3-6 (minibatch generation, taxonomy generation, update, review)
- Goes directly to document labeling
- Much faster for large datasets with known categories
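A predefined taxonomy is typically a list of named categories with descriptions. A hypothetical shape for illustration (the exact schema Delve expects may differ; check the SDK reference):

```python
# Hypothetical category schema; names and descriptions are example data.
predefined_taxonomy = [
    {"name": "Billing", "description": "Payments, invoices, refunds"},
    {"name": "Bug Report", "description": "Errors, crashes, unexpected behavior"},
    {"name": "Feature Request", "description": "Suggestions for new functionality"},
]
```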
Output Configuration
output_formats - Export Types
Available formats:
- json - Machine-readable, good for integrations
- csv - Spreadsheet-compatible, good for analysis
- markdown - Human-readable reports, good for sharing
verbosity - Progress Output
| Level | What You See | Best For |
|---|---|---|
| SILENT | Nothing | SDK integration, pipelines |
| QUIET | Errors only | Background jobs |
| NORMAL | Spinners, checkmarks | Interactive use |
| VERBOSE | Progress bars with ETA | Monitoring long runs |
| DEBUG | Everything + internal state | Troubleshooting |
Recommended Configurations
Quick Exploration
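A plausible configuration for this preset, assembled from the recommendations above (parameter names come from this guide; the dict shape and how it is passed to Delve are assumptions):

```python
# Fast, cheap iteration while you explore the data.
quick_exploration = {
    "model": "anthropic/claude-haiku-4-5-20251001",  # fast and cheap
    "fast_llm": "anthropic/claude-haiku-4-5-20251001",
    "sample_size": 100,       # small sample, quick turnaround
    "max_num_clusters": 5,    # high-level overview
}
```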
Balanced Production
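A sketch of a balanced production setup under the same assumptions (values are reconstructed from the medium ranges recommended above, not copied from Delve’s docs):

```python
# Default-ish middle ground: good quality at moderate cost.
balanced_production = {
    "model": "anthropic/claude-sonnet-4-5-20250929",  # balanced capability/cost
    "fast_llm": "anthropic/claude-haiku-4-5-20251001",
    "sample_size": 200,       # good coverage for typical data
    "batch_size": 150,        # moderate number of refinement iterations
    "max_num_clusters": 10,   # balanced granularity
}
```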
High-Quality Analysis
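A sketch favoring quality over cost, again assembled from the tables above (the choice of a Sonnet fast_llm here is my assumption; the guide’s general recommendation is Haiku):

```python
# Maximize taxonomy and labeling quality; expect higher LLM spend.
high_quality = {
    "model": "anthropic/claude-opus-4",               # highest reasoning capability
    "fast_llm": "anthropic/claude-sonnet-4-5-20250929",
    "sample_size": 400,                               # comprehensive coverage
    "batch_size": 100,                                # more refinement iterations
    "max_num_clusters": 15,                           # fine-grained categories
    "classifier_confidence_threshold": 0.7,           # flag uncertain predictions
    "low_confidence_action": "llm",                   # re-label flagged docs
}
```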
Cost-Optimized at Scale
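A sketch that trades a little accuracy for lower spend at scale, under the same assumptions about config shape:

```python
# Keep LLM and embedding costs down on large datasets.
cost_optimized = {
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "fast_llm": "anthropic/claude-haiku-4-5-20251001",
    "sample_size": 150,                               # modest discovery sample
    "embedding_model": "text-embedding-3-small",      # cheaper, usually within 1-3%
    "classifier_confidence_threshold": 0.0,           # never fall back to the LLM
}
```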
Classifier Export & Reuse
After a successful run, you can save the trained classifier for later use without any LLM costs.
Saving a Classifier
The saved classifier includes:
- The trained RandomForest model
- Category mappings
- Embedding model configuration
- Training metrics for reference
Classifying New Documents
Classification only requires OpenAI embedding API calls - no LLM costs. This makes it very cost-effective for production use.
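To make the embed-then-predict flow concrete, here is a self-contained sketch using scikit-learn, with synthetic vectors standing in for OpenAI embeddings and a freshly fitted RandomForest standing in for a loaded classifier (this is not Delve’s saved-classifier API; category names are example data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic "embeddings": two well-separated clusters, one per category.
train_X = np.vstack([rng.normal(0, 0.1, (20, 8)), rng.normal(1, 0.1, (20, 8))])
train_y = ["Billing"] * 20 + ["Bug Report"] * 20

clf = RandomForestClassifier(random_state=0).fit(train_X, train_y)

# New documents: embed them the same way, then predict. No LLM calls needed.
new_X = np.vstack([rng.normal(0, 0.1, (1, 8)), rng.normal(1, 0.1, (1, 8))])
clf.predict(new_X)  # predicts ['Billing', 'Bug Report']
```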
Training from Labeled Data
If you have your own labeled dataset (or corrected Delve output), you can train a classifier directly.
Next Steps
- How It Works - Understand the pipeline in depth
- Examples - See complete code examples
