Installation
Delve Client
The main class for interacting with Delve programmatically.

Basic Usage
Initialization
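The initialization snippet did not survive extraction. A minimal sketch of what client construction likely looks like; the import path, class name, and keyword arguments are assumptions inferred from the configuration options documented below, not confirmed API:

```python
# Sketch only: `delve`, `Delve`, and these parameter names are assumptions.
from delve import Delve

client = Delve(
    model="anthropic/claude-sonnet-4-5-20250929",  # main LLM for taxonomy generation
    sample_size=1000,                               # documents sampled for taxonomy discovery
    output_dir="./delve_output",                    # created if it doesn't exist
)
```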
Configuration Options
Main LLM model for taxonomy generation and reasoning. Supported models:
- anthropic/claude-sonnet-4-5-20250929 (recommended)
- anthropic/claude-opus-4 (most capable)
- anthropic/claude-haiku-4-5-20251001 (fastest/cheapest)
- Any model supported by LiteLLM
Model used for the document summarization step. Choose a faster, cheaper model here to reduce costs.
Number of documents to sample for taxonomy generation.
Number of documents per minibatch during iterative clustering.
Smaller batches (50-100) produce more refined taxonomies. Larger batches (200-300) are faster but may be less precise.
Maximum number of clusters/categories to generate in the taxonomy.
Custom description of your taxonomy use case.
Directory for saving output files. Creates the directory if it doesn't exist.
List of output formats to generate. Available formats:
- json - Machine-readable taxonomy and labeled documents
- csv - Spreadsheet format for analysis
- markdown - Human-readable reports
Output verbosity level. Controls how much progress information is displayed. Levels:
- SILENT (0) - No output, ideal for SDK usage in scripts
- QUIET (1) - Errors only
- NORMAL (2) - Spinners and success checkmarks
- VERBOSE (3) - Progress bars with item counts and ETA
- DEBUG (4) - All output plus warnings and debug info
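Because the levels are ordered integers, a message is shown whenever its level is at or below the configured setting. A self-contained sketch of that idea (the enum name and `should_log` helper are illustrative, not Delve's API):

```python
from enum import IntEnum

class Verbosity(IntEnum):
    """Hypothetical mirror of the five verbosity levels described above."""
    SILENT = 0   # no output, ideal for SDK usage in scripts
    QUIET = 1    # errors only
    NORMAL = 2   # spinners and success checkmarks
    VERBOSE = 3  # progress bars with item counts and ETA
    DEBUG = 4    # everything, plus warnings and debug info

def should_log(message_level: Verbosity, setting: Verbosity) -> bool:
    """Emit a message only if its level fits within the configured verbosity."""
    return message_level <= setting
```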
Use an existing taxonomy instead of generating one. Useful when you want to label documents with known categories. When provided, Delve skips taxonomy discovery and directly labels documents using the given categories.
OpenAI embedding model for classifier training. Used when sample_size < total documents to train an efficient classifier for labeling the remaining documents.

Minimum confidence for classifier predictions. Documents below this threshold fall back to LLM labeling. Set to 0 to use the classifier for all documents (no fallback).
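The fallback rule can be sketched as: accept the classifier's prediction when it is confident enough, otherwise send the document to the LLM. The function and callback names here (`classify`, `label_with_llm`) are illustrative, not Delve's internals:

```python
def label_documents(docs, classify, label_with_llm, min_confidence=0.7):
    """Label each doc with the cheap classifier; fall back to the LLM when
    classifier confidence is below min_confidence. With min_confidence=0 the
    classifier's answer is always accepted (no fallback)."""
    labels = {}
    for doc_id, text in docs.items():
        category, confidence = classify(text)
        if confidence < min_confidence:
            category = label_with_llm(text)  # slower but more reliable path
        labels[doc_id] = category
    return labels
```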
Methods
run_sync()
Synchronous method for taxonomy generation (recommended for most use cases). Data source to process. Can be:
- Path to a CSV file ("data.csv")
- Path to a JSON/JSONL file ("data.json")
- LangSmith URI ("langsmith://project-name")
- pandas DataFrame
Column/field name containing text content (required for CSV/DataFrame).
Column/field name for document IDs (optional).
Force a specific adapter type: csv, json, jsonl, langsmith, or dataframe.

Additional adapter-specific parameters:

For JSON:
- json_path - JSONPath expression for nested data
- text_field - Field name containing text

For LangSmith:
- api_key - LangSmith API key
- days - Days to look back (default: 7)
- max_runs - Maximum runs to fetch
- filter_expr - LangSmith filter expression
DelveResult object with taxonomy, labeled documents, and metadata.
Example:
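The original example was lost in extraction. A plausible reconstruction, assuming the import path shown and that `run_sync` accepts the parameters documented above (treat names as assumptions, not confirmed API):

```python
# Sketch only: import path and exact signatures are assumptions.
from delve import Delve

client = Delve(model="anthropic/claude-sonnet-4-5-20250929")

result = client.run_sync(
    "data.csv",             # source: CSV path, JSON/JSONL path, langsmith:// URI, or DataFrame
    text_column="message",  # column containing the text to categorize
)

# DelveResult exposes the taxonomy and labeled documents (see tables below).
for category in result.taxonomy:
    print(category.name, "-", category.description)
```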
run()
Asynchronous version of run_sync(). Use for async applications.
run_with_docs() / run_with_docs_sync()
Process pre-created Doc objects directly, useful for programmatic document creation or testing.
find_matches() / find_matches_async()
Fast, lightweight binary detection for finding documents matching a single category. Uses hybrid semantic + keyword matching without running the full taxonomy pipeline.

Category definition with name, description, and an optional keywords list.

Minimum score (0-1) for a document to be considered a match.
Weight for semantic (embedding) similarity.
Weight for keyword matching. Set to 0 for pure semantic matching.
MatchResult with all documents scored, plus matched_documents and unmatched_documents properties.
Binary detection is much faster (2-4 min for 30K docs) and cheaper ($1-2) than full taxonomy generation. See Binary Detection for full documentation.
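The hybrid score described above amounts to a weighted sum of the two signals. A self-contained sketch under assumed semantics (the exact keyword heuristic Delve uses is not documented here):

```python
def hybrid_score(semantic_sim: float, keyword_hits: int, total_keywords: int,
                 semantic_weight: float = 0.7, keyword_weight: float = 0.3) -> float:
    """Combine embedding similarity (0-1) with the fraction of category
    keywords found in the document. keyword_weight=0 gives pure semantic matching."""
    keyword_score = keyword_hits / total_keywords if total_keywords else 0.0
    return semantic_weight * semantic_sim + keyword_weight * keyword_score

def is_match(score: float, threshold: float = 0.5) -> bool:
    """A document counts as matched when its score reaches the minimum threshold."""
    return score >= threshold
```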
Data Sources
Delve supports multiple input formats. The source_type is auto-detected from file extensions, or you can specify it explicitly.
| Format | Extension | Required Parameters |
|---|---|---|
| CSV | .csv | text_column |
| JSON | .json | text_field or json_path |
| JSONL | .jsonl | (auto-extracts text) |
| DataFrame | (in-memory) | text_column |
| LangSmith | langsmith:// URI | api_key |
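The auto-detection in the table above can be sketched as a simple dispatch on the source's shape (a simplification; Delve's actual detection logic is not shown on this page):

```python
def detect_source_type(source) -> str:
    """Guess the adapter from a source string or object, mirroring the table above."""
    if not isinstance(source, str):
        return "dataframe"  # in-memory pandas DataFrame
    if source.startswith("langsmith://"):
        return "langsmith"
    if source.endswith(".csv"):
        return "csv"
    if source.endswith(".jsonl"):
        return "jsonl"
    if source.endswith(".json"):
        return "json"
    raise ValueError(f"Cannot auto-detect source type for {source!r}; pass source_type explicitly")
```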
Working with Results
The DelveResult object provides access to all outputs:
TaxonomyCategory
| Attribute | Type | Description |
|---|---|---|
| id | str | Unique category identifier |
| name | str | Category name |
| description | str | Category description |
Doc (labeled document)
| Attribute | Type | Description |
|---|---|---|
| id | str | Document identifier |
| content | str | Original text content |
| category | str | Assigned category name |
| explanation | str \| None | Why this category was assigned |
| summary | str \| None | LLM-generated summary |
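The two tables map naturally onto dataclasses. This sketch mirrors the documented attributes; it is not Delve's actual class definitions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaxonomyCategory:
    id: str            # unique category identifier
    name: str          # category name
    description: str   # category description

@dataclass
class Doc:
    id: str                            # document identifier
    content: str                       # original text content
    category: str                      # assigned category name
    explanation: Optional[str] = None  # why this category was assigned
    summary: Optional[str] = None      # LLM-generated summary
```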
Metadata
The result.metadata dictionary contains comprehensive run statistics:
The classifier_metrics key is only present when sample_size < total documents, meaning a classifier was trained to label the remaining documents.

Error Handling
Environment Variables
Set these before running your code. The OpenAI API key is required for generating embeddings when training the classifier; if you set sample_size=0, all documents are labeled by the LLM and no OpenAI key is needed.

Next Steps
Examples
See working code examples
CLI Reference
Learn CLI commands
