# Email Sorter - Supplementary Tools

This directory contains **optional** standalone tools that complement the main ML classification pipeline without interfering with it.
## Tools

### batch_llm_classifier.py

**Purpose**: Ask custom questions across batches of emails using a vLLM server.

**Prerequisite**: The vLLM server must be running at the configured endpoint.

**When to use this:**

- One-off batch analysis with custom questions
- Exploratory queries ("find all emails mentioning budget cuts")
- Custom classification criteria not covered by the trained ML model
- Quick ad-hoc analysis without retraining

**When to use RAG instead:**

- Searching across a large email corpus (10k+ emails)
- Finding specific topics/keywords with semantic search
- Building a knowledge base from email content
- Multi-step reasoning across many documents

**When to use the main ML pipeline:**

- Regular ongoing classification of incoming emails
- High-volume processing (100k+ emails)
- Consistent categories that don't change
- Maximum speed (pure ML with no LLM calls)

---
## batch_llm_classifier.py Usage

### Check vLLM Server Status

```bash
python tools/batch_llm_classifier.py check
```

Expected output:

```
✓ vLLM server is running and ready
✓ Max concurrent requests: 4
✓ Estimated throughput: ~4.4 emails/sec
```
### Ask Custom Question

```bash
python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 100 \
  --question "Does this email contain any financial numbers or budget information?" \
  --output financial_emails.txt
```

**Parameters:**

- `--source`: Email provider (`gmail`, `enron`)
- `--credentials`: Path to credentials file (Gmail only)
- `--limit`: Number of emails to process
- `--question`: Custom question to ask about each email
- `--output`: Output file for results
### Example Questions

**Finding specific content:**

```bash
--question "Is this email about a meeting or calendar event? Answer yes/no and provide date if found."
```

**Sentiment analysis:**

```bash
--question "What is the tone of this email? Professional/Casual/Urgent/Friendly?"
```

**Categorization with custom criteria:**

```bash
--question "Should this email be archived or kept for reference? Explain why."
```

**Data extraction:**

```bash
--question "Extract all names, dates, and dollar amounts mentioned in this email."
```

---
## Configuration

vLLM server settings live in `batch_llm_classifier.py`:

```python
VLLM_CONFIG = {
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
    'batch_size': 4,  # Tested optimal - 100% success rate
    'temperature': 0.1,
    'max_tokens': 500
}
```

**Note**: `batch_size: 4` is the tested optimal setting. The tool uses batch pooling: it sends 4 requests, waits for all of them to complete, then sends the next 4. Higher values cause 503 errors.

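For reference, here is a minimal sketch of that pooling pattern, assuming the OpenAI-compatible `/v1/chat/completions` route that vLLM exposes. The helper names (`ask_one`, `ask_pooled`) are illustrative, not the tool's actual internals:

```python
# Illustrative sketch of batch pooling (hypothetical helpers, not the
# tool's real internals). Assumes vLLM's OpenAI-compatible API.
import asyncio
import httpx

async def ask_one(client: httpx.AsyncClient, cfg: dict, prompt: str) -> str:
    resp = await client.post(
        f"{cfg['base_url']}/chat/completions",
        headers={"Authorization": f"Bearer {cfg['api_key']}"},
        json={
            "model": cfg["model"],
            "messages": [{"role": "user", "content": prompt}],
            "temperature": cfg["temperature"],
            "max_tokens": cfg["max_tokens"],
        },
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

async def ask_pooled(cfg: dict, prompts: list[str]) -> list[str]:
    results: list[str] = []
    async with httpx.AsyncClient() as client:
        # Send batch_size requests, wait for ALL of them to finish, then
        # send the next batch - this keeps the server under its
        # concurrency limit instead of flooding it with requests.
        for i in range(0, len(prompts), cfg["batch_size"]):
            batch = prompts[i : i + cfg["batch_size"]]
            results += await asyncio.gather(
                *(ask_one(client, cfg, p) for p in batch)
            )
    return results
```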
---
## Performance Benchmarks

Tested on rtx3090.bobai.com.au with qwen3-coder-30b:

| Emails | Batch Size  | Time | Throughput | Success Rate |
|--------|-------------|------|------------|--------------|
| 500    | 4 (pooled)  | 108s | 4.65/sec   | 100%         |
| 500    | 8 (pooled)  | 62s  | 8.10/sec   | 60%          |
| 500    | 20 (pooled) | 23s  | 21.8/sec   | 23%          |

**Conclusion**: `batch_size=4` with proper batch pooling is optimal (100% reliability, ~4.7 req/sec).

---
## Architecture Notes

### Prompt Caching Optimization

Prompts are structured with static content first and variable content last:

```
STATIC (cached):
- System instructions
- Question
- Output format guidelines

VARIABLE (not cached):
- Email subject
- Email sender
- Email body
```

Because the static prefix is identical for every request, vLLM can cache it once and reuse it across all emails in the batch.

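As an illustration, a prompt builder following this layout might look like the sketch below (`build_prompt` and the email field names are hypothetical, not the tool's actual code):

```python
# Hypothetical prompt builder showing the static-first layout.
def build_prompt(question: str, email: dict) -> str:
    # STATIC part: identical for every email in the run (the question is
    # fixed per run), so vLLM's prefix cache can reuse its computed state.
    static = (
        "You are an email analyst. Answer the question about the email "
        "below concisely and factually.\n"
        f"Question: {question}\n"
        "Format: a short direct answer, then one line of evidence.\n\n"
    )
    # VARIABLE part: changes per email, so it goes last.
    variable = (
        f"Subject: {email['subject']}\n"
        f"From: {email['sender']}\n"
        f"Body:\n{email['body']}\n"
    )
    return static + variable
```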
### Separation from Main Pipeline

This tool is **completely independent** of the main classification pipeline:

- **Main pipeline** (`src/cli.py run`):
  - Uses a calibrated LightGBM model
  - Fast pure-ML classification
  - Optional LLM fallback for low-confidence cases
  - Processes 10k emails in ~24s (pure ML) or ~5min (with LLM fallback)

- **Batch LLM tool** (`tools/batch_llm_classifier.py`):
  - Uses the vLLM server exclusively
  - Custom questions per run
  - ~4.4 emails/sec throughput
  - For ad-hoc analysis, not production classification

### No Interference Guarantee

The batch LLM tool:

- ✓ Does NOT modify any files in `src/`
- ✓ Does NOT touch trained models in `src/models/`
- ✓ Does NOT affect config files
- ✓ Does NOT interfere with existing workflows
- ✓ Uses a separate vLLM endpoint (not Ollama)

---
## Comparison: Batch LLM vs RAG

| Feature | Batch LLM (this tool) | RAG (rag-search) |
|---------|-----------------------|------------------|
| **Speed** | 4.4 emails/sec | Instant (pre-indexed) |
| **Flexibility** | Custom questions | Semantic search queries |
| **Best for** | 50-500 email batches | 10k+ email corpus |
| **Prerequisite** | vLLM server running | RAG collection indexed |
| **Use case** | "Does this mention X?" | "Find all emails about X" |
| **Reasoning** | Per-email LLM analysis | Similarity + ranking |

**Rule of thumb:**

- Fewer than 500 emails + custom question = Use Batch LLM
- More than 1,000 emails + topic search = Use RAG
- Regular classification = Use main ML pipeline

---
## Prerequisites

1. **vLLM server must be running**
   - Endpoint: https://rtx3090.bobai.com.au/v1
   - Model loaded: qwen3-coder-30b
   - Check with: `python tools/batch_llm_classifier.py check`

2. **Python dependencies**

   ```bash
   pip install httpx click
   ```

3. **Email provider setup**
   - Enron: No setup needed (uses the local maildir)
   - Gmail: Requires a credentials file

---
## Troubleshooting

### "vLLM server not available"

Check server status:

```bash
curl https://rtx3090.bobai.com.au/v1/models \
  -H "Authorization: Bearer rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092"
```

Verify the model is loaded:

```bash
python tools/batch_llm_classifier.py check
```
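If you want to script this check, a minimal Python equivalent might look like the following (assuming the standard OpenAI-compatible `/v1/models` response shape; `server_has_model` is an illustrative helper, not part of the tool):

```python
# Scripted health check (illustrative; assumes the OpenAI-compatible
# /v1/models response: {"data": [{"id": "<model>"}, ...]}).
import httpx

def server_has_model(base_url: str, api_key: str, model: str) -> bool:
    resp = httpx.get(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10.0,
    )
    resp.raise_for_status()
    return any(m.get("id") == model for m in resp.json().get("data", []))
```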
### High error rate (503 errors)

Reduce the batch size in `VLLM_CONFIG`:

```python
'batch_size': 2,  # Lower from 4 if you are getting 503s
```

### Slow processing

- Check that the vLLM server isn't overloaded
- Verify network latency to rtx3090.bobai.com.au
- Consider using the main ML pipeline for large batches

---
## Future Enhancements

Potential additions (not implemented):

- Support for custom prompt templates
- JSON output mode for structured extraction
- Progress bar for large batches
- Retry logic for transient failures
- Multi-server load balancing
- Streaming responses for real-time feedback

---

**Remember**: This tool is supplementary. For production email classification, use the main ML pipeline (`src/cli.py run`).