# Batch LLM Classifier - Quick Start

## Prerequisite Check

```bash
python tools/batch_llm_classifier.py check
```

Expected: `✓ vLLM server is running and ready`

If it is not running, start the vLLM server at rtx3090.bobai.com.au first.

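Under the hood, a passing check amounts to the server answering on its OpenAI-compatible API. Here is a minimal standalone probe, assuming `check` queries the `/v1/models` endpoint (the real subcommand may verify more):

```python
# Standalone health probe against the OpenAI-compatible /v1/models endpoint.
# Assumption: this approximates what `check` does; the real subcommand may do more.
import requests

BASE_URL = "https://rtx3090.bobai.com.au/v1"
API_KEY = "rtx3090_..."  # use the api_key from VLLM_CONFIG (see Configuration)

def server_ready() -> bool:
    try:
        resp = requests.get(
            f"{BASE_URL}/models",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=5,
        )
        return resp.status_code == 200
    except requests.RequestException:
        return False

if __name__ == "__main__":
    print("ready" if server_ready() else "unavailable")
```
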
---

## Basic Usage

```bash
python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 50 \
  --question "YOUR QUESTION HERE" \
  --output results.txt
```

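To drive the same command from a script, a plain `subprocess` wrapper works (illustrative; the flags are exactly those documented above):

```python
# Run the documented CLI from Python (illustrative wrapper, not part of the tool).
import subprocess

subprocess.run(
    [
        "python", "tools/batch_llm_classifier.py", "ask",
        "--source", "enron",
        "--limit", "50",
        "--question", "YOUR QUESTION HERE",
        "--output", "results.txt",
    ],
    check=True,  # raise if the tool exits non-zero
)
```
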
---

## Example Questions

### Find Urgent Emails
```bash
--question "Is this email urgent or time-sensitive? Answer yes/no and explain."
```

### Extract Financial Data
```bash
--question "List any dollar amounts, budgets, or financial numbers in this email."
```

### Meeting Detection
```bash
--question "Does this email mention a meeting? If yes, extract date/time/location."
```

### Sentiment Analysis
```bash
--question "What is the tone? Professional/Casual/Urgent/Frustrated? Explain."
```

### Custom Classification
```bash
--question "Should this email be archived or kept active? Why?"
```

---

## Performance

- **Throughput**: 4.65 requests/sec
- **Batch size**: 4 (proper batch pooling)
- **Reliability**: 100% success rate
- **Example**: 500 requests in 108 seconds

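For intuition, batch pooling at size 4 means firing 4 concurrent requests, waiting for all of them, then moving on to the next 4. A minimal sketch (`ask_llm` and `emails` are hypothetical names, not the tool's actual internals):

```python
# Sketch of batch pooling: 4 concurrent requests per pool, matching batch_size=4.
# `ask_llm(question, email) -> str` is a hypothetical request function.
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 4  # larger pools triggered 503s in testing

def classify_in_pools(emails, question, ask_llm):
    results = []
    with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
        for i in range(0, len(emails), BATCH_SIZE):
            chunk = emails[i:i + BATCH_SIZE]
            futures = [pool.submit(ask_llm, question, email) for email in chunk]
            results.extend(f.result() for f in futures)  # wait for the whole pool
    return results
```

At 4.65 requests/sec, 500 requests take about 500 / 4.65 ≈ 108 seconds, matching the example above.
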
---

## When To Use

✅ **Use Batch LLM for:**
- Custom questions on 50-500 emails
- One-off exploratory analysis
- Flexible classification criteria
- Data extraction tasks

❌ **Use RAG instead for:**
- Searching 10k+ email corpus
- Semantic topic search
- Multi-document reasoning

❌ **Use Main ML Pipeline for:**
- Regular ongoing classification
- High-volume processing (10k+ emails)
- Consistent categories
- Maximum speed

---

## Quick Test

```bash
# Check server
python tools/batch_llm_classifier.py check

# Process 10 emails
python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 10 \
  --question "Summarize this email in one sentence." \
  --output test.txt

# Check results
cat test.txt
```

---

## Files Created

- `tools/batch_llm_classifier.py` - Main tool (executable)
- `tools/README.md` - Full documentation
- `test_llm_concurrent.py` - Performance testing script (repo root)

**No files in `src/` were modified - the existing ML pipeline is untouched.**

---

## Configuration

Edit `VLLM_CONFIG` in `batch_llm_classifier.py`:

```python
VLLM_CONFIG = {
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
    'batch_size': 4,  # Don't increase - causes 503 errors
}
```

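For reference, these fields map directly onto any OpenAI-compatible client. A sketch assuming the standard chat completions API (the tool's actual wiring may differ):

```python
# How VLLM_CONFIG might plug into an OpenAI-compatible client (sketch only).
from openai import OpenAI

VLLM_CONFIG = {  # as defined above
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
}

client = OpenAI(base_url=VLLM_CONFIG['base_url'], api_key=VLLM_CONFIG['api_key'])
response = client.chat.completions.create(
    model=VLLM_CONFIG['model'],
    messages=[{"role": "user", "content": "Summarize this email in one sentence."}],
)
print(response.choices[0].message.content)
```
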
---

## Troubleshooting

**Server not available:**
```bash
curl https://rtx3090.bobai.com.au/v1/models -H "Authorization: Bearer rtx3090_..."
```

**503 errors:**
Lower `batch_size` to 2 in the config (4 is the tested optimum).

**Slow processing:**
Check the vLLM server load - it may be handling other requests.

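If 503s persist even after lowering `batch_size`, wrapping each request in a short exponential backoff is a reasonable stopgap (a sketch; the tool may already retry internally):

```python
# Retry a request on HTTP 503 with exponential backoff (sketch only).
import time
import requests

def post_with_retry(url, headers, payload, attempts=3):
    for attempt in range(attempts):
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.status_code != 503:
            return resp
        time.sleep(2 ** attempt)  # back off 1s, 2s, 4s
    return resp  # still 503 after all attempts
```
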
---
**Done!** Ready to ask custom questions across email batches.