Email Sorter - Supplementary Tools

This directory contains optional standalone tools that complement the main ML classification pipeline without interfering with it.

Tools

batch_llm_classifier.py

Purpose: Ask custom questions across batches of emails using a vLLM server

Prerequisite: vLLM server must be running at configured endpoint

When to use this:

  • One-off batch analysis with custom questions
  • Exploratory queries ("find all emails mentioning budget cuts")
  • Custom classification criteria not covered by the trained ML model
  • Quick ad-hoc analysis without retraining

When to use RAG instead:

  • Searching across large email corpus (10k+ emails)
  • Finding specific topics/keywords with semantic search
  • Building knowledge base from email content
  • Multi-step reasoning across many documents

When to use main ML pipeline:

  • Regular ongoing classification of incoming emails
  • High-volume processing (100k+ emails)
  • Consistent categories that don't change
  • Maximum speed (pure ML with no LLM calls)

batch_llm_classifier.py Usage

Check vLLM Server Status

python tools/batch_llm_classifier.py check

Expected output:

✓ vLLM server is running and ready
✓ Max concurrent requests: 4
✓ Estimated throughput: ~4.4 emails/sec
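
Under the hood, the check presumably queries the server's /v1/models endpoint (the same endpoint the Troubleshooting section hits with curl). A minimal sketch of such a check, assuming the VLLM_CONFIG dict shown in the Configuration section (the function name is illustrative, not the tool's actual API):

import httpx

def check_server(config):
    # Query the OpenAI-compatible /models endpoint to confirm the server
    # is reachable and the expected model is loaded.
    resp = httpx.get(
        f"{config['base_url']}/models",
        headers={'Authorization': f"Bearer {config['api_key']}"},
        timeout=10.0,
    )
    resp.raise_for_status()
    loaded = [m['id'] for m in resp.json()['data']]
    return config['model'] in loaded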

Ask Custom Question

python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 100 \
  --question "Does this email contain any financial numbers or budget information?" \
  --output financial_emails.txt

Parameters:

  • --source: Email provider (gmail, enron)
  • --credentials: Path to credentials (for Gmail)
  • --limit: Number of emails to process
  • --question: Custom question to ask about each email
  • --output: Output file for results
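
For Gmail, a run might look like this (the credentials path and question are illustrative):

python tools/batch_llm_classifier.py ask \
  --source gmail \
  --credentials credentials.json \
  --limit 50 \
  --question "Is this email a newsletter or promotional mailing?" \
  --output newsletters.txt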

Example Questions

Finding specific content:

--question "Is this email about a meeting or calendar event? Answer yes/no and provide date if found."

Sentiment analysis:

--question "What is the tone of this email? Professional/Casual/Urgent/Friendly?"

Categorization with custom criteria:

--question "Should this email be archived or kept for reference? Explain why."

Data extraction:

--question "Extract all names, dates, and dollar amounts mentioned in this email."

Configuration

vLLM server settings are in batch_llm_classifier.py:

VLLM_CONFIG = {
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
    'batch_size': 4,  # Tested optimal - 100% success rate
    'temperature': 0.1,
    'max_tokens': 500
}

Note: batch_size: 4 is the tested optimal setting. The tool uses batch pooling: it sends 4 requests concurrently, waits for all of them to complete, then sends the next 4. Higher values cause 503 errors (see Performance Benchmarks below).
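
The pooling loop amounts to something like the sketch below (asyncio + httpx; function names are illustrative, not the tool's actual API):

import asyncio
import httpx

async def ask_one(client, config, prompt):
    # One chat-completion request against the OpenAI-compatible vLLM API.
    resp = await client.post(
        f"{config['base_url']}/chat/completions",
        headers={'Authorization': f"Bearer {config['api_key']}"},
        json={
            'model': config['model'],
            'temperature': config['temperature'],
            'max_tokens': config['max_tokens'],
            'messages': [{'role': 'user', 'content': prompt}],
        },
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()['choices'][0]['message']['content']

async def classify_all(config, prompts):
    results = []
    async with httpx.AsyncClient() as client:
        # Batch pooling: send batch_size requests concurrently, wait for
        # all of them to complete, then send the next pool.
        for i in range(0, len(prompts), config['batch_size']):
            pool = prompts[i:i + config['batch_size']]
            results += await asyncio.gather(
                *(ask_one(client, config, p) for p in pool)
            )
    return results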


Performance Benchmarks

Tested on rtx3090.bobai.com.au with qwen3-coder-30b:

Emails   Batch Size    Time   Throughput   Success Rate
500      4 (pooled)    108s   4.65/sec     100%
500      8 (pooled)    62s    8.10/sec     60%
500      20 (pooled)   23s    21.8/sec     23%

Conclusion: batch_size=4 with batch pooling is optimal (100% reliability, ~4.7 req/sec).


Architecture Notes

Prompt Caching Optimization

Prompts are structured with static content first, variable content last:

STATIC (cached):
  - System instructions
  - Question
  - Output format guidelines

VARIABLE (not cached):
  - Email subject
  - Email sender
  - Email body

This allows vLLM to cache the static portion across all emails in the batch.
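
A sketch of that layout, with illustrative field and function names:

SYSTEM_INSTRUCTIONS = (
    'You are an email analyst. Answer the question about the email below, '
    'following the output format exactly.'
)

def build_prompt(question, email):
    # Static prefix: identical for every email in the batch, so vLLM can
    # serve it from the prompt cache.
    static = f"{SYSTEM_INSTRUCTIONS}\n\nQuestion: {question}\n\n"
    # Variable suffix: changes per email, so it goes last.
    variable = (
        f"Subject: {email['subject']}\n"
        f"From: {email['sender']}\n"
        f"Body:\n{email['body']}"
    )
    return static + variable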

Separation from Main Pipeline

This tool is completely independent from the main classification pipeline:

  • Main pipeline (src/cli.py run):
    • Uses calibrated LightGBM model
    • Fast pure ML classification
    • Optional LLM fallback for low-confidence cases
    • Processes 10k emails in ~24s (pure ML) or ~5min (with LLM fallback)

  • Batch LLM tool (tools/batch_llm_classifier.py):
    • Uses vLLM server exclusively
    • Custom questions per run
    • ~4.4 emails/sec throughput
    • For ad-hoc analysis, not production classification

No Interference Guarantee

The batch LLM tool:

  • ✓ Does NOT modify any files in src/
  • ✓ Does NOT touch trained models in src/models/
  • ✓ Does NOT affect config files
  • ✓ Does NOT interfere with existing workflows
  • ✓ Uses a separate vLLM endpoint (not Ollama)

Comparison: Batch LLM vs RAG

Feature        Batch LLM (this tool)       RAG (rag-search)
Speed          4.4 emails/sec              Instant (pre-indexed)
Flexibility    Custom questions            Semantic search queries
Best for       50-500 email batches        10k+ email corpus
Prerequisite   vLLM server running         RAG collection indexed
Use case       "Does this mention X?"      "Find all emails about X"
Reasoning      Per-email LLM analysis      Similarity + ranking

Rule of thumb:

  • < 500 emails + custom question = Use Batch LLM
  • 1000+ emails + topic search = Use RAG
  • Regular classification = Use main ML pipeline

Prerequisites

  1. vLLM server must be running

  2. Python dependencies

    pip install httpx click
    
  3. Email provider setup

    • Enron: No setup needed (uses local maildir)
    • Gmail: Requires credentials file

Troubleshooting

"vLLM server not available"

Check server status:

curl https://rtx3090.bobai.com.au/v1/models \
  -H "Authorization: Bearer rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092"

Verify model is loaded:

python tools/batch_llm_classifier.py check

High error rate (503 errors)

Reduce the batch size (i.e. concurrent requests) in VLLM_CONFIG:

'batch_size': 2,  # Lower if getting 503s

Slow processing

  • Check vLLM server isn't overloaded
  • Verify network latency to rtx3090.bobai.com.au
  • Consider using main ML pipeline for large batches

Future Enhancements

Potential additions (not implemented):

  • Support for custom prompt templates
  • JSON output mode for structured extraction
  • Progress bar for large batches
  • Retry logic for transient failures
  • Multi-server load balancing
  • Streaming responses for real-time feedback

Remember: This tool is supplementary. For production email classification, use the main ML pipeline (src/cli.py run).