# Batch LLM Classifier - Quick Start

## Prerequisite Check

```bash
python tools/batch_llm_classifier.py check
```

Expected output: `✓ vLLM server is running and ready`

If the check fails, start the vLLM server at rtx3090.bobai.com.au first.
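If you want to script the same check yourself, a plain GET against the server's OpenAI-compatible `/v1/models` endpoint (the same endpoint the troubleshooting section curls) is enough. A minimal sketch, assuming the `requests` library and the credentials from `VLLM_CONFIG`:

```python
# Standalone health check against the OpenAI-compatible /v1/models endpoint.
# BASE_URL and API_KEY mirror VLLM_CONFIG; substitute your own values.
import requests

BASE_URL = "https://rtx3090.bobai.com.au/v1"
API_KEY = "rtx3090_..."  # full key as in VLLM_CONFIG

def server_ready(timeout: float = 5.0) -> bool:
    try:
        resp = requests.get(
            f"{BASE_URL}/models",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=timeout,
        )
        return resp.ok
    except requests.RequestException:
        return False

print("ready" if server_ready() else "not reachable")
```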


## Basic Usage

```bash
python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 50 \
  --question "YOUR QUESTION HERE" \
  --output results.txt
```
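Roughly: `--source` selects the email corpus to read (here the Enron set), `--limit` caps how many emails are processed, `--question` is the prompt asked of each email, and `--output` is the file the answers are written to. See `tools/README.md` for the full documentation.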

## Example Questions

### Find Urgent Emails

```bash
--question "Is this email urgent or time-sensitive? Answer yes/no and explain."
```

### Extract Financial Data

```bash
--question "List any dollar amounts, budgets, or financial numbers in this email."
```

### Meeting Detection

```bash
--question "Does this email mention a meeting? If yes, extract date/time/location."
```

### Sentiment Analysis

```bash
--question "What is the tone? Professional/Casual/Urgent/Frustrated? Explain."
```

### Custom Classification

```bash
--question "Should this email be archived or kept active? Why?"
```

## Performance

- Throughput: 4.65 requests/sec
- Batch size: 4 (proper batch pooling)
- Reliability: 100% success rate
- Example: 500 requests in 108 seconds
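Those numbers are consistent: 500 requests / 108 s ≈ 4.6 requests/sec. Batch pooling here means keeping at most `batch_size` requests in flight at once. A minimal sketch of that pattern against the OpenAI-compatible chat endpoint (assumed helper names, not the tool's exact code):

```python
# Minimal batch-pooling sketch: at most 4 requests in flight at a time,
# matching the tested batch_size. Illustrative, not the tool's exact code.
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "https://rtx3090.bobai.com.au/v1"
HEADERS = {"Authorization": "Bearer rtx3090_..."}  # key from VLLM_CONFIG

def ask_one(prompt: str) -> str:
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        headers=HEADERS,
        json={
            "model": "qwen3-coder-30b",
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def ask_batch(prompts: list[str]) -> list[str]:
    # pool.map preserves input order, so answers line up with prompts
    with ThreadPoolExecutor(max_workers=4) as pool:  # batch_size = 4
        return list(pool.map(ask_one, prompts))
```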

## When To Use

**Use Batch LLM for:**

- Custom questions on 50-500 emails
- One-off exploratory analysis
- Flexible classification criteria
- Data extraction tasks

**Use RAG instead for:**

- Searching a 10k+ email corpus
- Semantic topic search
- Multi-document reasoning

**Use the main ML pipeline for:**

- Regular, ongoing classification
- High-volume processing (10k+ emails)
- Consistent categories
- Maximum speed

## Quick Test

```bash
# Check server
python tools/batch_llm_classifier.py check

# Process 10 emails
python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 10 \
  --question "Summarize this email in one sentence." \
  --output test.txt

# Check results
cat test.txt
```

## Files Created

- `tools/batch_llm_classifier.py` - the main tool (executable)
- `tools/README.md` - full documentation
- `test_llm_concurrent.py` - performance testing script (repo root)

No files in `src/` were modified; the existing ML pipeline is untouched.


## Configuration

Edit `VLLM_CONFIG` in `tools/batch_llm_classifier.py`:

```python
VLLM_CONFIG = {
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
    'batch_size': 4,  # don't increase; larger values cause 503 errors
}
```

## Troubleshooting

**Server not available:**

```bash
curl https://rtx3090.bobai.com.au/v1/models -H "Authorization: Bearer rtx3090_..."
```

**503 errors:** lower `batch_size` to 2 in the config (4 is the tested optimum; see the retry sketch at the end of this section).

**Slow processing:** check the vLLM server load; it may be handling other requests.
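If 503s persist even at a lower `batch_size`, a client-side retry with exponential backoff can absorb transient overload. A hypothetical wrapper (not built into the tool):

```python
# Hypothetical retry helper for transient 503s; not part of the tool.
import time

import requests

def post_with_backoff(url: str, retries: int = 3, **kwargs) -> requests.Response:
    resp = requests.post(url, **kwargs)
    for attempt in range(retries):
        if resp.status_code != 503:
            break
        time.sleep(2 ** attempt)  # 1s, 2s, 4s pauses between retries
        resp = requests.post(url, **kwargs)
    return resp
```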


Done! Ready to ask custom questions across email batches.