# Email Sorter - Supplementary Tools

This directory contains **optional** standalone tools that complement the main ML classification pipeline without interfering with it.

## Tools

### batch_llm_classifier.py

**Purpose**: Ask custom questions across batches of emails using vLLM server

**Prerequisite**: vLLM server must be running at configured endpoint

**When to use this:**
- One-off batch analysis with custom questions
- Exploratory queries ("find all emails mentioning budget cuts")
- Custom classification criteria not in trained ML model
- Quick ad-hoc analysis without retraining

**When to use RAG instead:**
- Searching across large email corpus (10k+ emails)
- Finding specific topics/keywords with semantic search
- Building knowledge base from email content
- Multi-step reasoning across many documents

**When to use main ML pipeline:**
- Regular ongoing classification of incoming emails
- High-volume processing (100k+ emails)
- Consistent categories that don't change
- Maximum speed (pure ML with no LLM calls)

---

## batch_llm_classifier.py Usage

### Check vLLM Server Status

```bash
python tools/batch_llm_classifier.py check
```

Expected output:

```
✓ vLLM server is running and ready
✓ Max concurrent requests: 4
✓ Estimated throughput: ~4.4 emails/sec
```

### Ask Custom Question

```bash
python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 100 \
  --question "Does this email contain any financial numbers or budget information?" \
  --output financial_emails.txt
```

**Parameters:**
- `--source`: Email provider (gmail, enron)
- `--credentials`: Path to credentials (for Gmail)
- `--limit`: Number of emails to process
- `--question`: Custom question to ask about each email
- `--output`: Output file for results

### Example Questions

**Finding specific content:**
```bash
--question "Is this email about a meeting or calendar event? Answer yes/no and provide date if found."
```

**Sentiment analysis:**
```bash
--question "What is the tone of this email? Professional/Casual/Urgent/Friendly?"
```

**Categorization with custom criteria:**
```bash
--question "Should this email be archived or kept for reference? Explain why."
```

**Data extraction:**
```bash
--question "Extract all names, dates, and dollar amounts mentioned in this email."
```

---

## Configuration

vLLM server settings are in `batch_llm_classifier.py`:

```python
VLLM_CONFIG = {
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
    'batch_size': 4,     # Tested optimal - 100% success rate
    'temperature': 0.1,
    'max_tokens': 500
}
```

**Note**: `batch_size: 4` is the tested optimal setting. The tool uses proper batch pooling (send 4, wait for completion, send next 4). Higher values cause 503 errors.
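For readers curious what "send 4, wait for completion, send next 4" looks like in code, here is a minimal sketch of the pooling loop, assuming `httpx` (already a dependency) and the OpenAI-compatible `/chat/completions` endpoint that vLLM exposes. The function names, prompt text, and error handling are illustrative, not the tool's actual implementation:

```python
# batch_pooling_sketch.py - illustrative only, not the tool's real code.
import asyncio
import httpx

# Same settings as documented in batch_llm_classifier.py.
VLLM_CONFIG = {
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
    'batch_size': 4,
    'temperature': 0.1,
    'max_tokens': 500,
}

async def ask_one(client: httpx.AsyncClient, prompt: str) -> str:
    """Send a single chat-completion request to the vLLM OpenAI-compatible API."""
    resp = await client.post(
        "/chat/completions",
        json={
            "model": VLLM_CONFIG["model"],
            "messages": [{"role": "user", "content": prompt}],
            "temperature": VLLM_CONFIG["temperature"],
            "max_tokens": VLLM_CONFIG["max_tokens"],
        },
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

async def ask_pooled(prompts: list[str]) -> list[str]:
    """Process prompts in pools of batch_size: send 4, wait for all 4, send the next 4."""
    results: list[str] = []
    async with httpx.AsyncClient(
        base_url=VLLM_CONFIG["base_url"],
        headers={"Authorization": f"Bearer {VLLM_CONFIG['api_key']}"},
        timeout=120.0,
    ) as client:
        size = VLLM_CONFIG["batch_size"]
        for i in range(0, len(prompts), size):
            pool = prompts[i:i + size]
            # The whole pool completes before the next pool is submitted,
            # so the server never sees more than batch_size in-flight requests.
            results.extend(await asyncio.gather(*(ask_one(client, p) for p in pool)))
    return results

if __name__ == "__main__":
    answers = asyncio.run(ask_pooled([
        "Does this email mention budget cuts?\n\nSubject: Q3 planning\nBody: We need to cut the Q3 budget."
    ]))
    print(answers[0])
```

Keeping each pool to `batch_size` in-flight requests is what holds the success rate at 100% in the benchmarks below.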
---

## Performance Benchmarks

Tested on rtx3090.bobai.com.au with qwen3-coder-30b:

| Emails | Batch Size  | Time | Throughput | Success Rate |
|--------|-------------|------|------------|--------------|
| 500    | 4 (pooled)  | 108s | 4.65/sec   | 100%         |
| 500    | 8 (pooled)  | 62s  | 8.10/sec   | 60%          |
| 500    | 20 (pooled) | 23s  | 21.8/sec   | 23%          |

**Conclusion**: batch_size=4 with proper batch pooling is optimal (100% reliability, ~4.7 req/sec).

---

## Architecture Notes

### Prompt Caching Optimization

Prompts are structured with static content first, variable content last:

```
STATIC (cached):
- System instructions
- Question
- Output format guidelines

VARIABLE (not cached):
- Email subject
- Email sender
- Email body
```

This allows vLLM to cache the static portion across all emails in the batch.
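As a concrete illustration of that ordering, a prompt builder can put the run-wide question into the system message and the per-email fields at the end of the user message, so the shared prefix is identical for every email and stays in the server's prompt cache. This is a sketch only; the function name, field names, and exact prompt wording are assumptions, not the tool's actual prompts:

```python
# prompt_assembly_sketch.py - illustrative only.
def build_messages(question: str, email: dict) -> list[dict]:
    """Build chat messages with the static portion first so vLLM can reuse
    its prefix cache across every email in the batch."""
    # STATIC portion: identical for all emails in the run, processed once
    # by the server and then served from cache.
    system = (
        "You are an email analysis assistant.\n"
        f"Question to answer for each email: {question}\n"
        "Answer concisely in plain text."
    )
    # VARIABLE portion: changes per email, so it goes last.
    user = (
        f"Subject: {email['subject']}\n"
        f"From: {email['sender']}\n\n"
        f"{email['body']}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

# Example:
messages = build_messages(
    "Does this email contain any financial numbers or budget information?",
    {"subject": "Q3 budget", "sender": "jeff@example.com", "body": "The Q3 budget is $1.2M."},
)
```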
### Separation from Main Pipeline

This tool is **completely independent** from the main classification pipeline:

- **Main pipeline** (`src/cli.py run`):
  - Uses calibrated LightGBM model
  - Fast pure ML classification
  - Optional LLM fallback for low-confidence cases
  - Processes 10k emails in ~24s (pure ML) or ~5min (with LLM fallback)

- **Batch LLM tool** (`tools/batch_llm_classifier.py`):
  - Uses vLLM server exclusively
  - Custom questions per run
  - ~4.4 emails/sec throughput
  - For ad-hoc analysis, not production classification

### No Interference Guarantee

The batch LLM tool:

- ✓ Does NOT modify any files in `src/`
- ✓ Does NOT touch trained models in `src/models/`
- ✓ Does NOT affect config files
- ✓ Does NOT interfere with existing workflows
- ✓ Uses a separate vLLM endpoint (not Ollama)

---

## Comparison: Batch LLM vs RAG

| Feature | Batch LLM (this tool) | RAG (rag-search) |
|---------|-----------------------|------------------|
| **Speed** | 4.4 emails/sec | Instant (pre-indexed) |
| **Flexibility** | Custom questions | Semantic search queries |
| **Best for** | 50-500 email batches | 10k+ email corpus |
| **Prerequisite** | vLLM server running | RAG collection indexed |
| **Use case** | "Does this mention X?" | "Find all emails about X" |
| **Reasoning** | Per-email LLM analysis | Similarity + ranking |

**Rule of thumb:**
- < 500 emails + custom question = Use Batch LLM
- > 1000 emails + topic search = Use RAG
- Regular classification = Use main ML pipeline

---

## Prerequisites

1. **vLLM server must be running**
   - Endpoint: https://rtx3090.bobai.com.au/v1
   - Model loaded: qwen3-coder-30b
   - Check with: `python tools/batch_llm_classifier.py check`

2. **Python dependencies**
   ```bash
   pip install httpx click
   ```

3. **Email provider setup**
   - Enron: No setup needed (uses local maildir)
   - Gmail: Requires credentials file

---

## Troubleshooting

### "vLLM server not available"

Check server status:

```bash
curl https://rtx3090.bobai.com.au/v1/models \
  -H "Authorization: Bearer rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092"
```

Verify the model is loaded:

```bash
python tools/batch_llm_classifier.py check
```

### High error rate (503 errors)

Reduce the batch size in `VLLM_CONFIG`:

```python
'batch_size': 2,  # Lower if getting 503s
```

### Slow processing

- Check the vLLM server isn't overloaded
- Verify network latency to rtx3090.bobai.com.au
- Consider using the main ML pipeline for large batches

---

## Future Enhancements

Potential additions (not implemented):

- Support for custom prompt templates
- JSON output mode for structured extraction
- Progress bar for large batches
- Retry logic for transient failures
- Multi-server load balancing
- Streaming responses for real-time feedback

---

**Remember**: This tool is supplementary. For production email classification, use the main ML pipeline (`src/cli.py run`).