Email Sorter - Supplementary Tools
This directory contains optional standalone tools that complement the main ML classification pipeline without interfering with it.
Tools
batch_llm_classifier.py
Purpose: Ask custom questions across batches of emails using vLLM server
Prerequisite: vLLM server must be running at configured endpoint
When to use this:
- One-off batch analysis with custom questions
- Exploratory queries ("find all emails mentioning budget cuts")
- Custom classification criteria not in trained ML model
- Quick ad-hoc analysis without retraining
When to use RAG instead:
- Searching across large email corpus (10k+ emails)
- Finding specific topics/keywords with semantic search
- Building knowledge base from email content
- Multi-step reasoning across many documents
When to use main ML pipeline:
- Regular ongoing classification of incoming emails
- High-volume processing (100k+ emails)
- Consistent categories that don't change
- Maximum speed (pure ML with no LLM calls)
batch_llm_classifier.py Usage
Check vLLM Server Status
python tools/batch_llm_classifier.py check
Expected output:
✓ vLLM server is running and ready
✓ Max concurrent requests: 4
✓ Estimated throughput: ~4.4 emails/sec
Ask Custom Question
python tools/batch_llm_classifier.py ask \
--source enron \
--limit 100 \
--question "Does this email contain any financial numbers or budget information?" \
--output financial_emails.txt
Parameters:
- --source: Email provider (gmail, enron)
- --credentials: Path to credentials (for Gmail)
- --limit: Number of emails to process
- --question: Custom question to ask about each email
- --output: Output file for results
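For orientation, these options map onto a click command roughly as in the sketch below. The option names mirror the list above, but the function and its body are illustrative only, not the tool's actual source.

```python
# Illustrative sketch of the CLI surface -- mirrors the documented options,
# not the real implementation in tools/batch_llm_classifier.py.
import click

@click.group()
def cli():
    """Batch LLM email classifier (sketch)."""

@cli.command()
@click.option('--source', type=click.Choice(['gmail', 'enron']), required=True,
              help='Email provider to read from.')
@click.option('--credentials', type=click.Path(), default=None,
              help='Path to credentials file (Gmail only).')
@click.option('--limit', type=int, default=100,
              help='Number of emails to process.')
@click.option('--question', required=True,
              help='Custom question to ask about each email.')
@click.option('--output', type=click.Path(), required=True,
              help='Output file for results.')
def ask(source, credentials, limit, question, output):
    """Ask a custom question across a batch of emails."""
    click.echo(f"Would process up to {limit} emails from {source!r} "
               f"and write answers to {output}")

if __name__ == '__main__':
    cli()
```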
Example Questions
Finding specific content:
--question "Is this email about a meeting or calendar event? Answer yes/no and provide date if found."
Sentiment analysis:
--question "What is the tone of this email? Professional/Casual/Urgent/Friendly?"
Categorization with custom criteria:
--question "Should this email be archived or kept for reference? Explain why."
Data extraction:
--question "Extract all names, dates, and dollar amounts mentioned in this email."
Configuration
vLLM server settings are in batch_llm_classifier.py:
```python
VLLM_CONFIG = {
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
    'batch_size': 4,        # Tested optimal - 100% success rate
    'temperature': 0.1,
    'max_tokens': 500
}
```
Note: batch_size: 4 is the tested optimum. The tool uses batch pooling: it sends 4 requests, waits for all of them to complete, then sends the next 4. Higher values cause 503 errors.
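The pooling pattern itself is only a few lines of asyncio. The sketch below shows the idea using httpx and the VLLM_CONFIG values documented above; the helper names are illustrative, not the script's real code.

```python
# Minimal batch-pooling sketch: fire batch_size requests concurrently,
# wait for all of them, then start the next batch. Not the tool's actual code.
import asyncio
import httpx

async def classify_one(client: httpx.AsyncClient, config: dict, prompt: str) -> str:
    resp = await client.post(
        f"{config['base_url']}/chat/completions",
        headers={'Authorization': f"Bearer {config['api_key']}"},
        json={
            'model': config['model'],
            'messages': [{'role': 'user', 'content': prompt}],
            'temperature': config['temperature'],
            'max_tokens': config['max_tokens'],
        },
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()['choices'][0]['message']['content']

async def classify_batched(config: dict, prompts: list[str]) -> list[str]:
    results = []
    async with httpx.AsyncClient() as client:
        for i in range(0, len(prompts), config['batch_size']):
            batch = prompts[i:i + config['batch_size']]
            # Pool: the whole batch runs concurrently, and the next batch does not
            # start until every request has returned, so concurrency never exceeds
            # batch_size -- the property that keeps the server from returning 503s.
            results.extend(await asyncio.gather(
                *(classify_one(client, config, p) for p in batch)
            ))
    return results
```

With batch_size=4 this keeps at most four requests in flight, matching the benchmark configuration below.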
Performance Benchmarks
Tested on rtx3090.bobai.com.au with qwen3-coder-30b:
| Emails | Batch Size | Time | Throughput | Success Rate |
|---|---|---|---|---|
| 500 | 4 (pooled) | 108s | 4.65/sec | 100% |
| 500 | 8 (pooled) | 62s | 8.10/sec | 60% |
| 500 | 20 (pooled) | 23s | 21.8/sec | 23% |
Conclusion: batch_size=4 with proper batch pooling is optimal (100% reliability, ~4.7 req/sec)
Architecture Notes
Prompt Caching Optimization
Prompts are structured with static content first, variable content last:
STATIC (cached):
- System instructions
- Question
- Output format guidelines
VARIABLE (not cached):
- Email subject
- Email sender
- Email body
This allows vLLM to cache the static portion across all emails in the batch.
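In message terms the ordering looks roughly like the sketch below. The wording is illustrative and not the exact prompt text the script uses; only the static-first, variable-last layout matters for prefix caching.

```python
# Sketch of the static-first / variable-last prompt layout described above.
def build_messages(question: str, email: dict) -> list[dict]:
    static_system = (
        "You are analysing one email at a time. "               # system instructions
        f"Question: {question} "                                 # same question for every email in the run
        "Answer concisely; start with yes/no where possible."    # output format guidelines
    )
    variable_user = (
        f"Subject: {email['subject']}\n"
        f"From: {email['sender']}\n"
        f"Body:\n{email['body']}"
    )
    # The static prefix is identical for every email in the batch, so vLLM can reuse
    # its cached prefix; only the per-email user message changes.
    return [
        {'role': 'system', 'content': static_system},
        {'role': 'user', 'content': variable_user},
    ]
```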
Separation from Main Pipeline
This tool is completely independent from the main classification pipeline:
- Main pipeline (src/cli.py run):
  - Uses calibrated LightGBM model
  - Fast pure ML classification
  - Optional LLM fallback for low-confidence cases
  - Processes 10k emails in ~24s (pure ML) or ~5 min (with LLM fallback)
- Batch LLM tool (tools/batch_llm_classifier.py):
  - Uses vLLM server exclusively
  - Custom questions per run
  - ~4.4 emails/sec throughput
  - For ad-hoc analysis, not production classification
No Interference Guarantee
The batch LLM tool:
- ✓ Does NOT modify any files in src/
- ✓ Does NOT touch trained models in src/models/
- ✓ Does NOT affect config files
- ✓ Does NOT interfere with existing workflows
- ✓ Uses separate vLLM endpoint (not Ollama)
Comparison: Batch LLM vs RAG
| Feature | Batch LLM (this tool) | RAG (rag-search) |
|---|---|---|
| Speed | 4.4 emails/sec | Instant (pre-indexed) |
| Flexibility | Custom questions | Semantic search queries |
| Best for | 50-500 email batches | 10k+ email corpus |
| Prerequisite | vLLM server running | RAG collection indexed |
| Use case | "Does this mention X?" | "Find all emails about X" |
| Reasoning | Per-email LLM analysis | Similarity + ranking |
Rule of thumb:
- < 500 emails + custom question = Use Batch LLM
- > 1000 emails + topic search = Use RAG
- Regular classification = Use main ML pipeline
Prerequisites
- vLLM server must be running
  - Endpoint: https://rtx3090.bobai.com.au/v1
  - Model loaded: qwen3-coder-30b
  - Check with: python tools/batch_llm_classifier.py check
- Python dependencies: pip install httpx click
- Email provider setup
  - Enron: No setup needed (uses local maildir)
  - Gmail: Requires credentials file
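If you want to verify the endpoint from your own code rather than via the check subcommand, something like the sketch below works. It uses the same /v1/models route and key shown in Troubleshooting; it is not the script's actual check logic.

```python
# Rough stand-in for the `check` subcommand: confirm the server answers
# and the expected model is loaded. Endpoint and key match VLLM_CONFIG above.
import httpx

def check_vllm(base_url: str, api_key: str, model: str) -> bool:
    resp = httpx.get(f'{base_url}/models',
                     headers={'Authorization': f'Bearer {api_key}'},
                     timeout=10.0)
    resp.raise_for_status()
    loaded = [m['id'] for m in resp.json().get('data', [])]
    print('Models loaded:', loaded)
    return model in loaded

if __name__ == '__main__':
    ok = check_vllm('https://rtx3090.bobai.com.au/v1',
                    'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
                    'qwen3-coder-30b')
    print('✓ ready' if ok else '✗ model not loaded')
```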
Troubleshooting
"vLLM server not available"
Check server status:
curl https://rtx3090.bobai.com.au/v1/models \
-H "Authorization: Bearer rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092"
Verify model is loaded:
python tools/batch_llm_classifier.py check
High error rate (503 errors)
Reduce batch_size in VLLM_CONFIG (it caps the number of concurrent requests):
'batch_size': 2,  # Lower if getting 503s
Slow processing
- Check vLLM server isn't overloaded
- Verify network latency to rtx3090.bobai.com.au
- Consider using main ML pipeline for large batches
Future Enhancements
Potential additions (not implemented):
- Support for custom prompt templates
- JSON output mode for structured extraction
- Progress bar for large batches
- Retry logic for transient failures
- Multi-server load balancing
- Streaming responses for real-time feedback
Remember: This tool is supplementary. For production email classification, use the main ML pipeline (src/cli.py run).