# Email Sorter - Supplementary Tools

This directory contains **optional** standalone tools that complement the main ML classification pipeline without interfering with it.
## Tools

### batch_llm_classifier.py

**Purpose**: Ask custom questions across batches of emails using a vLLM server.

**Prerequisite**: The vLLM server must be running at the configured endpoint.

**When to use this:**

- One-off batch analysis with custom questions
- Exploratory queries ("find all emails mentioning budget cuts")
- Custom classification criteria not covered by the trained ML model
- Quick ad-hoc analysis without retraining

**When to use RAG instead:**

- Searching across a large email corpus (10k+ emails)
- Finding specific topics/keywords with semantic search
- Building a knowledge base from email content
- Multi-step reasoning across many documents

**When to use the main ML pipeline:**

- Regular ongoing classification of incoming emails
- High-volume processing (100k+ emails)
- Consistent categories that don't change
- Maximum speed (pure ML with no LLM calls)

---
## batch_llm_classifier.py Usage

### Check vLLM Server Status

```bash
python tools/batch_llm_classifier.py check
```

Expected output:

```
✓ vLLM server is running and ready
✓ Max concurrent requests: 4
✓ Estimated throughput: ~4.4 emails/sec
```
### Ask Custom Question

```bash
python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 100 \
  --question "Does this email contain any financial numbers or budget information?" \
  --output financial_emails.txt
```

**Parameters:**

- `--source`: Email provider (`gmail`, `enron`)
- `--credentials`: Path to credentials file (Gmail only)
- `--limit`: Number of emails to process
- `--question`: Custom question to ask about each email
- `--output`: Output file for results
### Example Questions

**Finding specific content:**

```bash
--question "Is this email about a meeting or calendar event? Answer yes/no and provide date if found."
```

**Sentiment analysis:**

```bash
--question "What is the tone of this email? Professional/Casual/Urgent/Friendly?"
```

**Categorization with custom criteria:**

```bash
--question "Should this email be archived or kept for reference? Explain why."
```

**Data extraction:**

```bash
--question "Extract all names, dates, and dollar amounts mentioned in this email."
```

---
## Configuration

vLLM server settings live in `batch_llm_classifier.py`:

```python
VLLM_CONFIG = {
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
    'batch_size': 4,  # Tested optimal - 100% success rate
    'temperature': 0.1,
    'max_tokens': 500
}
```

**Note**: `batch_size: 4` is the tested optimal setting. The tool uses batch pooling: it sends 4 requests, waits for all of them to complete, then sends the next 4. Higher values cause 503 errors.

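For reference, here is a minimal sketch of that pooling pattern, assuming the OpenAI-compatible `/v1/chat/completions` route that vLLM exposes. The helper names (`ask_one`, `ask_pooled`) are illustrative, not the tool's actual internals:

```python
# Illustrative sketch of batch pooling (hypothetical helpers, not the
# tool's real internals). Assumes vLLM's OpenAI-compatible API.
import asyncio
import httpx

async def ask_one(client: httpx.AsyncClient, cfg: dict, prompt: str) -> str:
    resp = await client.post(
        f"{cfg['base_url']}/chat/completions",
        headers={"Authorization": f"Bearer {cfg['api_key']}"},
        json={
            "model": cfg["model"],
            "messages": [{"role": "user", "content": prompt}],
            "temperature": cfg["temperature"],
            "max_tokens": cfg["max_tokens"],
        },
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

async def ask_pooled(cfg: dict, prompts: list[str]) -> list[str]:
    results: list[str] = []
    async with httpx.AsyncClient() as client:
        # Send batch_size requests, wait for ALL of them to finish, then
        # send the next batch - this keeps the server under its
        # concurrency limit instead of flooding it with requests.
        for i in range(0, len(prompts), cfg["batch_size"]):
            batch = prompts[i : i + cfg["batch_size"]]
            results += await asyncio.gather(
                *(ask_one(client, cfg, p) for p in batch)
            )
    return results
```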
---
## Performance Benchmarks

Tested on rtx3090.bobai.com.au with qwen3-coder-30b:

| Emails | Batch Size  | Time | Throughput | Success Rate |
|--------|-------------|------|------------|--------------|
| 500    | 4 (pooled)  | 108s | 4.65/sec   | 100%         |
| 500    | 8 (pooled)  | 62s  | 8.10/sec   | 60%          |
| 500    | 20 (pooled) | 23s  | 21.8/sec   | 23%          |

**Conclusion**: `batch_size=4` with proper batch pooling is optimal (100% reliability, ~4.7 req/sec).

---
## Architecture Notes

### Prompt Caching Optimization

Prompts are structured with static content first and variable content last:

```
STATIC (cached):
- System instructions
- Question
- Output format guidelines

VARIABLE (not cached):
- Email subject
- Email sender
- Email body
```

Because the static prefix is identical for every request, vLLM can cache it once and reuse it across all emails in the batch.

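As an illustration, a prompt builder following this layout might look like the sketch below (`build_prompt` and the email field names are hypothetical, not the tool's actual code):

```python
# Hypothetical prompt builder showing the static-first layout.
def build_prompt(question: str, email: dict) -> str:
    # STATIC part: identical for every email in the run (the question is
    # fixed per run), so vLLM's prefix cache can reuse its computed state.
    static = (
        "You are an email analyst. Answer the question about the email "
        "below concisely and factually.\n"
        f"Question: {question}\n"
        "Format: a short direct answer, then one line of evidence.\n\n"
    )
    # VARIABLE part: changes per email, so it goes last.
    variable = (
        f"Subject: {email['subject']}\n"
        f"From: {email['sender']}\n"
        f"Body:\n{email['body']}\n"
    )
    return static + variable
```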
### Separation from Main Pipeline

This tool is **completely independent** of the main classification pipeline:

- **Main pipeline** (`src/cli.py run`):
  - Uses a calibrated LightGBM model
  - Fast pure-ML classification
  - Optional LLM fallback for low-confidence cases
  - Processes 10k emails in ~24s (pure ML) or ~5min (with LLM fallback)

- **Batch LLM tool** (`tools/batch_llm_classifier.py`):
  - Uses the vLLM server exclusively
  - Custom questions per run
  - ~4.4 emails/sec throughput
  - For ad-hoc analysis, not production classification

### No Interference Guarantee

The batch LLM tool:

- ✓ Does NOT modify any files in `src/`
- ✓ Does NOT touch trained models in `src/models/`
- ✓ Does NOT affect config files
- ✓ Does NOT interfere with existing workflows
- ✓ Uses a separate vLLM endpoint (not Ollama)

---
## Comparison: Batch LLM vs RAG

| Feature | Batch LLM (this tool) | RAG (rag-search) |
|---------|-----------------------|------------------|
| **Speed** | 4.4 emails/sec | Instant (pre-indexed) |
| **Flexibility** | Custom questions | Semantic search queries |
| **Best for** | 50-500 email batches | 10k+ email corpus |
| **Prerequisite** | vLLM server running | RAG collection indexed |
| **Use case** | "Does this mention X?" | "Find all emails about X" |
| **Reasoning** | Per-email LLM analysis | Similarity + ranking |

**Rule of thumb:**

- Fewer than 500 emails + custom question = Use Batch LLM
- More than 1,000 emails + topic search = Use RAG
- Regular classification = Use main ML pipeline

---
## Prerequisites

1. **vLLM server must be running**
   - Endpoint: https://rtx3090.bobai.com.au/v1
   - Model loaded: qwen3-coder-30b
   - Check with: `python tools/batch_llm_classifier.py check`

2. **Python dependencies**

   ```bash
   pip install httpx click
   ```

3. **Email provider setup**
   - Enron: No setup needed (uses the local maildir)
   - Gmail: Requires a credentials file

---
## Troubleshooting

### "vLLM server not available"

Check server status:

```bash
curl https://rtx3090.bobai.com.au/v1/models \
  -H "Authorization: Bearer rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092"
```

Verify the model is loaded:

```bash
python tools/batch_llm_classifier.py check
```
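If you want to script this check, a minimal Python equivalent might look like the following (assuming the standard OpenAI-compatible `/v1/models` response shape; `server_has_model` is an illustrative helper, not part of the tool):

```python
# Scripted health check (illustrative; assumes the OpenAI-compatible
# /v1/models response: {"data": [{"id": "<model>"}, ...]}).
import httpx

def server_has_model(base_url: str, api_key: str, model: str) -> bool:
    resp = httpx.get(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10.0,
    )
    resp.raise_for_status()
    return any(m.get("id") == model for m in resp.json().get("data", []))
```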
### High error rate (503 errors)

Reduce the batch size in `VLLM_CONFIG`:

```python
'batch_size': 2,  # Lower from 4 if you are getting 503s
```

### Slow processing

- Check that the vLLM server isn't overloaded
- Verify network latency to rtx3090.bobai.com.au
- Consider using the main ML pipeline for large batches

---
## Future Enhancements

Potential additions (not implemented):

- Support for custom prompt templates
- JSON output mode for structured extraction
- Progress bar for large batches
- Retry logic for transient failures
- Multi-server load balancing
- Streaming responses for real-time feedback

---

**Remember**: This tool is supplementary. For production email classification, use the main ML pipeline (`src/cli.py run`).