# Batch LLM Classifier - Quick Start

## Prerequisite Check

```bash
python tools/batch_llm_classifier.py check
```

Expected: `✓ vLLM server is running and ready`

If it is not running, start the vLLM server at rtx3090.bobai.com.au first.

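Under the hood, a passing check amounts to the server answering on its OpenAI-compatible API. Here is a minimal standalone probe, assuming `check` queries the `/v1/models` endpoint (the real subcommand may verify more):

```python
# Standalone health probe against the OpenAI-compatible /v1/models endpoint.
# Assumption: this approximates what `check` does; the real subcommand may do more.
import requests

BASE_URL = "https://rtx3090.bobai.com.au/v1"
API_KEY = "rtx3090_..."  # use the api_key from VLLM_CONFIG (see Configuration)

def server_ready() -> bool:
    try:
        resp = requests.get(
            f"{BASE_URL}/models",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=5,
        )
        return resp.status_code == 200
    except requests.RequestException:
        return False

if __name__ == "__main__":
    print("ready" if server_ready() else "unavailable")
```
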
---

## Basic Usage

```bash
python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 50 \
  --question "YOUR QUESTION HERE" \
  --output results.txt
```

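To drive the same command from a script, a plain `subprocess` wrapper works (illustrative; the flags are exactly those documented above):

```python
# Run the documented CLI from Python (illustrative wrapper, not part of the tool).
import subprocess

subprocess.run(
    [
        "python", "tools/batch_llm_classifier.py", "ask",
        "--source", "enron",
        "--limit", "50",
        "--question", "YOUR QUESTION HERE",
        "--output", "results.txt",
    ],
    check=True,  # raise if the tool exits non-zero
)
```
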
---

## Example Questions

### Find Urgent Emails
```bash
--question "Is this email urgent or time-sensitive? Answer yes/no and explain."
```

### Extract Financial Data
```bash
--question "List any dollar amounts, budgets, or financial numbers in this email."
```

### Meeting Detection
```bash
--question "Does this email mention a meeting? If yes, extract date/time/location."
```

### Sentiment Analysis
```bash
--question "What is the tone? Professional/Casual/Urgent/Frustrated? Explain."
```

### Custom Classification
```bash
--question "Should this email be archived or kept active? Why?"
```

---

## Performance

- **Throughput**: 4.65 requests/sec
- **Batch size**: 4 (proper batch pooling)
- **Reliability**: 100% success rate
- **Example**: 500 requests in 108 seconds

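For intuition, batch pooling at size 4 means firing 4 concurrent requests, waiting for all of them, then moving on to the next 4. A minimal sketch (`ask_llm` and `emails` are hypothetical names, not the tool's actual internals):

```python
# Sketch of batch pooling: 4 concurrent requests per pool, matching batch_size=4.
# `ask_llm(question, email) -> str` is a hypothetical request function.
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 4  # larger pools triggered 503s in testing

def classify_in_pools(emails, question, ask_llm):
    results = []
    with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
        for i in range(0, len(emails), BATCH_SIZE):
            chunk = emails[i:i + BATCH_SIZE]
            futures = [pool.submit(ask_llm, question, email) for email in chunk]
            results.extend(f.result() for f in futures)  # wait for the whole pool
    return results
```

At 4.65 requests/sec, 500 requests take about 500 / 4.65 ≈ 108 seconds, matching the example above.
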
---

## When To Use

✅ **Use Batch LLM for:**
- Custom questions on 50-500 emails
- One-off exploratory analysis
- Flexible classification criteria
- Data extraction tasks

❌ **Use RAG instead for:**
- Searching 10k+ email corpus
- Semantic topic search
- Multi-document reasoning

❌ **Use Main ML Pipeline for:**
- Regular ongoing classification
- High-volume processing (10k+ emails)
- Consistent categories
- Maximum speed

---

## Quick Test

```bash
# Check server
python tools/batch_llm_classifier.py check

# Process 10 emails
python tools/batch_llm_classifier.py ask \
  --source enron \
  --limit 10 \
  --question "Summarize this email in one sentence." \
  --output test.txt

# Check results
cat test.txt
```

---

## Files Created

- `tools/batch_llm_classifier.py` - Main tool (executable)
- `tools/README.md` - Full documentation
- `test_llm_concurrent.py` - Performance testing script (repo root)

**No files in `src/` were modified - the existing ML pipeline is untouched.**

---

## Configuration

Edit `VLLM_CONFIG` in `batch_llm_classifier.py`:

```python
VLLM_CONFIG = {
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
    'batch_size': 4,  # Don't increase - causes 503 errors
}
```

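For reference, these fields map directly onto any OpenAI-compatible client. A sketch assuming the standard chat completions API (the tool's actual wiring may differ):

```python
# How VLLM_CONFIG might plug into an OpenAI-compatible client (sketch only).
from openai import OpenAI

VLLM_CONFIG = {  # as defined above
    'base_url': 'https://rtx3090.bobai.com.au/v1',
    'api_key': 'rtx3090_foxadmin_10_8034ecb47841f45ba1d5f3f5d875c092',
    'model': 'qwen3-coder-30b',
}

client = OpenAI(base_url=VLLM_CONFIG['base_url'], api_key=VLLM_CONFIG['api_key'])
response = client.chat.completions.create(
    model=VLLM_CONFIG['model'],
    messages=[{"role": "user", "content": "Summarize this email in one sentence."}],
)
print(response.choices[0].message.content)
```
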
---

## Troubleshooting

**Server not available:**
```bash
curl https://rtx3090.bobai.com.au/v1/models -H "Authorization: Bearer rtx3090_..."
```

**503 errors:**
Lower `batch_size` to 2 in the config (4 is the tested optimum).

**Slow processing:**
Check the vLLM server load - it may be handling other requests.

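If 503s persist even after lowering `batch_size`, wrapping each request in a short exponential backoff is a reasonable stopgap (a sketch; the tool may already retry internally):

```python
# Retry a request on HTTP 503 with exponential backoff (sketch only).
import time
import requests

def post_with_retry(url, headers, payload, attempts=3):
    for attempt in range(attempts):
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.status_code != 503:
            return resp
        time.sleep(2 ** attempt)  # back off 1s, 2s, 4s
    return resp  # still 503 after all attempts
```
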
---
**Done!** Ready to ask custom questions across email batches.