- Created standalone `batch_llm_classifier.py` for custom email queries
- Optimized all LLM prompts for caching (static instructions first, variables last)
- Configured rtx3090 vLLM endpoint (qwen3-coder-30b)
- Tested batch_size=4 as optimal (100% success, 4.65 req/sec)
- Added comprehensive documentation (tools/README.md, BATCH_LLM_QUICKSTART.md)

The tool is completely separate from the main ML pipeline - no interference. Prerequisite: a vLLM server must be running at rtx3090.bobai.com.au.
# Batch LLM Classifier - Quick Start

## Prerequisite Check

```bash
python tools/batch_llm_classifier.py check
```

Expected: `✓ vLLM server is running and ready`

If the check fails, start the vLLM server at rtx3090.bobai.com.au first.
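The `check` subcommand's internals aren't shown here, but an equivalent standalone probe against the standard OpenAI-compatible `/v1/models` endpoint that vLLM exposes would look like this (a minimal sketch; `check_server` is a hypothetical helper, not the tool's actual code):

```python
import requests

def check_server(base_url: str, api_key: str, timeout: float = 5.0) -> bool:
    """Probe the OpenAI-compatible /v1/models endpoint exposed by vLLM."""
    try:
        resp = requests.get(
            f"{base_url}/models",
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=timeout,
        )
        return resp.ok
    except requests.RequestException:
        return False

if __name__ == "__main__":
    ok = check_server("https://rtx3090.bobai.com.au/v1", "YOUR_API_KEY")
    print("✓ vLLM server is running and ready" if ok else "✗ vLLM server unreachable")
```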
## Basic Usage

```bash
python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 50 \
  --question "YOUR QUESTION HERE" \
  --output results.txt
```
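Per the commit notes above, prompts are ordered for cache reuse: static instructions first, the per-email body last, so vLLM's prefix cache can be shared across requests in a batch. A minimal sketch of that layout (the helper name and system text are illustrative, not the tool's actual code):

```python
SYSTEM_PROMPT = (  # static across all emails, so the cached prefix is reused
    "You are an email analyst. Answer the user's question about the "
    "email below concisely and factually."
)

def build_messages(question: str, email_body: str) -> list[dict]:
    """Static instructions first, per-email content last, to maximize prefix-cache hits."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        # The question is constant for the whole run; the email body is the
        # only part that changes between requests, so it goes at the end.
        {"role": "user", "content": f"Question: {question}\n\nEmail:\n{email_body}"},
    ]
```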
## Example Questions

- **Find urgent emails:**
  `--question "Is this email urgent or time-sensitive? Answer yes/no and explain."`
- **Extract financial data:**
  `--question "List any dollar amounts, budgets, or financial numbers in this email."`
- **Meeting detection:**
  `--question "Does this email mention a meeting? If yes, extract date/time/location."`
- **Sentiment analysis:**
  `--question "What is the tone? Professional/Casual/Urgent/Frustrated? Explain."`
- **Custom classification:**
  `--question "Should this email be archived or kept active? Why?"`
## Performance

- Throughput: 4.65 requests/sec
- Batch size: 4 (proper batch pooling)
- Reliability: 100% success rate
- Example: 500 requests in 108 seconds
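(Sanity check: 500 requests ÷ 4.65 req/sec ≈ 107.5 s, consistent with the 108-second figure.)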
## When To Use

✅ **Use the batch LLM tool for:**

- Custom questions on 50-500 emails
- One-off exploratory analysis
- Flexible classification criteria
- Data extraction tasks

❌ **Use RAG instead for:**

- Searching a 10k+ email corpus
- Semantic topic search
- Multi-document reasoning

❌ **Use the main ML pipeline for:**

- Regular, ongoing classification
- High-volume processing (10k+ emails)
- Consistent categories
- Maximum speed
## Quick Test

```bash
# Check server
python tools/batch_llm_classifier.py check

# Process 10 emails
python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 10 \
  --question "Summarize this email in one sentence." \
  --output test.txt

# Check results
cat test.txt
```
## Files Created

- `tools/batch_llm_classifier.py` - main tool (executable)
- `tools/README.md` - full documentation
- `test_llm_concurrent.py` - performance testing script (repo root)

No files in `src/` were modified - the existing ML pipeline is untouched.
## Configuration

Edit `VLLM_CONFIG` in `batch_llm_classifier.py`:

```python
VLLM_CONFIG = {
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
    'batch_size': 4,  # don't increase - causes 503 errors
}
```
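For reference, "batch pooling" here presumably means keeping at most `batch_size` requests in flight at once against the OpenAI-compatible endpoint. A minimal sketch of that pattern using the `openai` async client (illustrative only - not the tool's actual internals):

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://rtx3090.bobai.com.au/v1",
    api_key="YOUR_API_KEY",  # placeholder - use the key from VLLM_CONFIG
)

async def ask_one(sem: asyncio.Semaphore, question: str, email: str) -> str:
    async with sem:  # at most batch_size requests in flight at once
        resp = await client.chat.completions.create(
            model="qwen3-coder-30b",
            messages=[{"role": "user",
                       "content": f"Question: {question}\n\nEmail:\n{email}"}],
        )
        return resp.choices[0].message.content

async def ask_batch(question: str, emails: list[str], batch_size: int = 4) -> list[str]:
    sem = asyncio.Semaphore(batch_size)  # larger values caused 503s in testing
    return await asyncio.gather(*(ask_one(sem, question, e) for e in emails))

# results = asyncio.run(ask_batch("Summarize this email.", emails))
```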
## Troubleshooting

**Server not available:**

```bash
curl https://rtx3090.bobai.com.au/v1/models -H "Authorization: Bearer rtx3090_..."
```

**503 errors:** lower `batch_size` to 2 in the config (4 is the tested optimum, but reduce it if the server is overloaded).

**Slow processing:** check the vLLM server load - it may be handling other requests.
Done! Ready to ask custom questions across email batches.