# Email Sorter - Complete Workflow Diagram
## Full End-to-End Pipeline with LLM Calls
```mermaid
graph TB
Start([📧 Start: Enron Maildir<br/>100,000 emails]) --> Parse[EnronParser<br/>Stratified Sampling]
Parse --> CalibCheck{Need<br/>Calibration?}
CalibCheck -->|Yes: No Model| CalibStart[🎯 CALIBRATION PHASE]
CalibCheck -->|No: Model Exists| ClassifyStart[📊 CLASSIFICATION PHASE]

%% CALIBRATION PHASE
CalibStart --> Sample[Sample 100 Emails<br/>Stratified by user/folder]
Sample --> Split[Split: 50 train / 50 validation]
Split --> LLMBatch[📤 LLM CALLS 1-5<br/>Batch Discovery<br/>5 batches × 20 emails]
LLMBatch -->|qwen3:8b-q4_K_M| Discover[Category Discovery<br/>~15 raw categories]
Discover --> Consolidate[📤 LLM CALL 6<br/>Consolidation<br/>Merge similar categories]
Consolidate -->|qwen3:8b-q4_K_M| CacheSnap[Category Cache Snap<br/>Semantic matching<br/>10 final categories]
CacheSnap --> ExtractTrain[Extract Features<br/>50 training emails<br/>Batch embeddings]
ExtractTrain --> Embed1[📤 EMBEDDING CALLS<br/>Ollama all-minilm:l6-v2<br/>384-dim vectors]
Embed1 --> TrainModel[Train LightGBM<br/>200 boosting rounds<br/>22 total categories]
TrainModel --> SaveModel[💾 Save Model<br/>classifier.pkl 1.1MB]
SaveModel --> ClassifyStart

%% CLASSIFICATION PHASE
ClassifyStart --> LoadModel[Load Model<br/>classifier.pkl]
LoadModel --> FetchAll[Fetch All Emails<br/>100,000 emails]
FetchAll --> BatchProcess[Process in Batches<br/>5,000 emails per batch<br/>20 batches total]
BatchProcess --> ExtractFeatures[Extract Features<br/>Batch size: 512<br/>Batched embeddings]
ExtractFeatures --> Embed2[📤 EMBEDDING CALLS<br/>Ollama all-minilm:l6-v2<br/>~200 batched calls]
Embed2 --> MLInference[LightGBM Inference<br/>Predict categories<br/>~2ms per email]
MLInference --> Results[💾 Save Results<br/>results.json 19MB<br/>summary.json 1.5KB<br/>classifications.csv 8.6MB]
Results --> ValidationStart[🔍 VALIDATION PHASE]

%% VALIDATION PHASE
ValidationStart --> SelectSamples[Select Samples<br/>50 low-conf + 25 random]
SelectSamples --> LoadEmails[Load Full Email Content<br/>Subject + Body + Metadata]
LoadEmails --> LLMEval[📤 LLM CALLS 7-81<br/>Individual Evaluation<br/>75 total assessments]
LLMEval -->|"qwen3:8b-q4_K_M<br/>&lt;no_think&gt;"| EvalResults[Collect Verdicts<br/>YES/PARTIAL/NO<br/>+ Reasoning]
EvalResults --> LLMSummary[📤 LLM CALL 82<br/>Final Summary<br/>Aggregate findings]
LLMSummary -->|qwen3:8b-q4_K_M| FinalReport[📊 Final Report<br/>Accuracy metrics<br/>Category quality<br/>Recommendations]
FinalReport --> End([✅ Complete<br/>100k classified<br/>+ validated])

%% OPTIONAL FINE-TUNING LOOP
FinalReport -.->|If corrections needed| FineTune[🔄 FINE-TUNING<br/>Collect LLM corrections<br/>Continue training]
FineTune -.-> ClassifyStart
style Start fill:#e1f5e1
style End fill:#e1f5e1
style LLMBatch fill:#fff4e6
style Consolidate fill:#fff4e6
style Embed1 fill:#e6f3ff
style Embed2 fill:#e6f3ff
style LLMEval fill:#fff4e6
style LLMSummary fill:#fff4e6
style SaveModel fill:#ffe6f0
style Results fill:#ffe6f0
style FinalReport fill:#ffe6f0
```
---
## Pipeline Stages Breakdown
### STAGE 1: CALIBRATION (1 minute)
- **Input:** 100 emails
- **LLM Calls:** 6 (one discovery call is sketched below)
  - 5 batch-discovery calls (20 emails each)
  - 1 consolidation call
- **Embedding Calls:** ~50 (one per training email)
- **Output:**
  - 10 discovered categories
  - Trained LightGBM model (1.1MB)
  - Category cache
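
A minimal sketch of one batch-discovery call, assuming the `ollama` Python client against a local Ollama server; the prompt wording and the `discover_categories` helper are illustrative, not the project's actual code:

```python
import json

import ollama  # assumes a local Ollama server with qwen3:8b-q4_K_M pulled

MODEL = "qwen3:8b-q4_K_M"

def discover_categories(email_batch: list[dict]) -> list[str]:
    """One discovery call: ask the LLM to propose categories for ~20 emails."""
    digest = "\n\n".join(
        f"Subject: {e['subject']}\n{e['body'][:500]}" for e in email_batch
    )
    prompt = (
        "Propose short category names for the following emails. "
        "Reply with a JSON list of strings only.\n\n" + digest
    )
    resp = ollama.chat(model=MODEL, messages=[{"role": "user", "content": prompt}])
    return json.loads(resp["message"]["content"])  # assumes the model returned bare JSON
```

Five such calls over the 100 sampled emails yield the ~15 raw categories that the single consolidation call then merges down to 10.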
### STAGE 2: CLASSIFICATION (3.4 minutes)
- **Input:** 100,000 emails
- **LLM Calls:** 0 (pure ML inference; see the batch loop below)
- **Embedding Calls:** ~200 batched calls (512 emails per batch)
- **Output:**
  - 100,000 classifications
  - Confidence scores
  - Results in JSON/CSV
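
Classification itself never touches the LLM: embed in batches of 512, then run the trees. A sketch, assuming the pickled model is a scikit-style `LGBMClassifier` and that `embed_batch` is the embedding helper sketched under Feature Extraction below:

```python
import pickle

import numpy as np

with open("classifier.pkl", "rb") as f:
    clf = pickle.load(f)  # LightGBM model saved by the calibration phase

BATCH = 512  # embedding batch size from the run above

def classify(texts: list[str]) -> list[tuple[str, float]]:
    """Return a (category, confidence) pair per email, batch by batch."""
    out = []
    for i in range(0, len(texts), BATCH):
        vecs = np.asarray(embed_batch(texts[i : i + BATCH]))  # (n, 384)
        proba = clf.predict_proba(vecs)                       # (n, n_classes)
        best = proba.argmax(axis=1)
        out += [(clf.classes_[j], float(p[j])) for j, p in zip(best, proba)]
    return out
```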
### STAGE 3: VALIDATION (variable, ~5-10 minutes)
- **Input:** 75 sampled emails (50 low-confidence + 25 random; selection sketched below)
- **LLM Calls:** 76
  - 75 individual evaluation calls
  - 1 final summary call
- **Output:**
  - Quality assessment (YES/PARTIAL/NO verdicts)
  - Accuracy metrics
  - Recommendations
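
Sample selection needs no LLM either: take the 50 least-confident classifications plus 25 random picks from the remainder. A sketch, assuming each result row carries its confidence score:

```python
import random

def select_validation_samples(rows: list[dict], n_low=50, n_rand=25, seed=0):
    """rows: dicts with a 'confidence' key, e.g. loaded from classifications.csv."""
    ranked = sorted(rows, key=lambda r: r["confidence"])
    low = ranked[:n_low]                # the 50 least-confident
    rest = ranked[n_low:]
    random.Random(seed).shuffle(rest)   # deterministic random picks
    return low + rest[:n_rand]          # plus 25 random others
```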
---
## LLM Call Summary
| Call # | Purpose | Model | Input | Output | Time |
|--------|---------|-------|-------|--------|------|
| 1-5 | Batch Discovery | qwen3:8b | 20 emails each | Categories | ~5-6s each |
| 6 | Consolidation | qwen3:8b | 15 categories | 10 merged | ~3s |
| 7-81 | Evaluation | qwen3:8b | 1 email + category | Verdict | ~2s each |
| 82 | Summary | qwen3:8b | 75 evaluations | Final report | ~5s |
- **Total LLM Calls:** 82
- **Total LLM Time:** ~3-4 minutes
- **Embedding Calls:** ~250 (batched)
- **Embedding Time:** ~30 seconds
---
## Performance Metrics
### Calibration Phase
- **Time:** 60 seconds
- **Samples:** 100 emails (50 for training)
- **Categories Discovered:** 10
- **Model Size:** 1.1MB
- **Accuracy on training:** 95%+
### Classification Phase
- **Time:** 202 seconds (3.4 minutes)
- **Emails:** 100,000
- **Speed:** 495 emails/second
- **Per Email:** 2ms total processing
- **Batch Size:** 512 (optimal)
- **GPU Utilization:** High (batched embeddings)
### Validation Phase
- **Time:** ~10 minutes (75 evaluation calls + 1 summary call)
- **Samples:** 75 emails
- **Per Sample:** ~8 seconds
- **Findings:** Model already accurate; 0 corrections needed
---
## Data Flow Details
### Email Processing Pipeline
```
Email File → Parse → Features → Embedding → Model → Category
  (text)     (dict)  (struct)   (384-dim)  (22-cat) (label)
```
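
In code, that flow is a chain of single-purpose steps; in this sketch every helper name is hypothetical shorthand for the stages above:

```python
def process(path: str) -> str:
    raw = read_email(path)            # text
    parsed = parse(raw)               # dict: headers + body
    feats = extract_features(parsed)  # structured text + pattern flags
    vec = embed(feats)                # 384-dim vector
    return clf.predict([vec])[0]      # one of the 22 category labels
```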
### Feature Extraction
```
Email Content
├─ Subject (text)
├─ Body (text)
├─ Sender (email address)
├─ Date (timestamp)
├─ Attachments (boolean + count)
└─ Patterns (regex matches)
↓
Structured Text
↓
Ollama Embedding (all-minilm:l6-v2)
↓
384-dimensional vector
```
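
A sketch of the embedding call, again assuming the `ollama` Python client. `ollama.embeddings()` embeds a single prompt per request, so the pipeline's "~200 batched calls" presumably group these loops at a higher level:

```python
import ollama

EMBED_MODEL = "all-minilm:l6-v2"

def embed_batch(texts: list[str]) -> list[list[float]]:
    """One 384-dimensional vector per structured-text input."""
    return [
        ollama.embeddings(model=EMBED_MODEL, prompt=t)["embedding"]
        for t in texts
    ]
```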
### LightGBM Training
```
Features (384-dim) + Labels (10 categories)
↓
Training: 200 boosting rounds
↓
Model: 22 categories total (10 discovered + 12 hardcoded)
↓
Output: classifier.pkl (1.1MB)
```
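
A sketch of that training step using LightGBM's scikit-learn wrapper, assuming `X` is the (n_samples, 384) embedding matrix and `y` the integer category labels:

```python
import pickle

import lightgbm as lgb

clf = lgb.LGBMClassifier(
    objective="multiclass",
    n_estimators=200,  # the 200 boosting rounds above
)
clf.fit(X, y)

with open("classifier.pkl", "wb") as f:
    pickle.dump(clf, f)  # the ~1.1MB artifact the classification phase loads
```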
---
## Category Distribution (100k Results)
```mermaid
pie title Category Distribution
"Work Communication" : 89807
"Financial" : 6534
"Forwarded" : 2457
"Technical Analysis" : 1129
"Other" : 73
```
---
## Confidence Distribution (100k Results)
```mermaid
pie title Confidence Levels
"High (≥0.7)" : 74777
"Medium (0.5-0.7)" : 17381
"Low (<0.5)" : 7842
```
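
The buckets can be recomputed from the saved confidence scores, assuming confidence is the maximum predicted class probability (the project may define it differently):

```python
from collections import Counter

def bucket(conf: float) -> str:
    if conf >= 0.7:
        return "High"
    if conf >= 0.5:
        return "Medium"
    return "Low"

# confidences: one float per classified email, e.g. from classifications.csv
histogram = Counter(bucket(c) for c in confidences)
```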
---
## System Architecture
```mermaid
graph LR
A[Email Source<br/>Gmail/IMAP/Enron] --> B[Email Provider]
B --> C[Feature Extractor]
C --> D[Ollama<br/>Embeddings]
C --> E[Pattern Detector]
D --> F[LightGBM<br/>Classifier]
E --> F
F --> G[Results<br/>JSON/CSV]
F --> H[Sync Engine<br/>Labels/Keywords]
I[LLM<br/>qwen3:8b] -.->|Calibration| J[Category Discovery]
J -.-> F
I -.->|Validation| K[Quality Check]
K -.-> G
style D fill:#e6f3ff
style I fill:#fff4e6
style F fill:#f0e6ff
style G fill:#ffe6f0
```
---
## Next: Integrated End-to-End Script
Building a comprehensive validation script with:
1. 50 low-confidence samples
2. 25 random samples
3. Final LLM summary call
4. Complete pipeline orchestration