# Email Sorter - Complete Workflow Diagram

## Full End-to-End Pipeline with LLM Calls

```mermaid
graph TB
    Start([📧 Start: Enron Maildir<br/>100,000 emails]) --> Parse[EnronParser<br/>Stratified Sampling]
    Parse --> CalibCheck{Need<br/>Calibration?}
    CalibCheck -->|Yes: No Model| CalibStart[🎯 CALIBRATION PHASE]
    CalibCheck -->|No: Model Exists| ClassifyStart[📊 CLASSIFICATION PHASE]

    %% CALIBRATION PHASE
    CalibStart --> Sample[Sample 100 Emails<br/>Stratified by user/folder]
    Sample --> Split[Split: 50 train / 50 validation]
    Split --> LLMBatch[📤 LLM CALLS 1-5<br/>Batch Discovery<br/>5 batches × 20 emails]
    LLMBatch -->|qwen3:8b-q4_K_M| Discover[Category Discovery<br/>~15 raw categories]
    Discover --> Consolidate[📤 LLM CALL 6<br/>Consolidation<br/>Merge similar categories]
    Consolidate -->|qwen3:8b-q4_K_M| CacheSnap[Category Cache Snap<br/>Semantic matching<br/>10 final categories]
    CacheSnap --> ExtractTrain[Extract Features<br/>50 training emails<br/>Batch embeddings]
    ExtractTrain --> Embed1[📤 EMBEDDING CALLS<br/>Ollama all-minilm:l6-v2<br/>384-dim vectors]
    Embed1 --> TrainModel[Train LightGBM<br/>200 boosting rounds<br/>22 total categories]
    TrainModel --> SaveModel[💾 Save Model<br/>classifier.pkl 1.1MB]
    SaveModel --> ClassifyStart

    %% CLASSIFICATION PHASE
    ClassifyStart --> LoadModel[Load Model<br/>classifier.pkl]
    LoadModel --> FetchAll[Fetch All Emails<br/>100,000 emails]
    FetchAll --> BatchProcess[Process in Batches<br/>5,000 emails per batch<br/>20 batches total]
    BatchProcess --> ExtractFeatures[Extract Features<br/>Batch size: 512<br/>Batched embeddings]
    ExtractFeatures --> Embed2[📤 EMBEDDING CALLS<br/>Ollama all-minilm:l6-v2<br/>~200 batched calls]
    Embed2 --> MLInference[LightGBM Inference<br/>Predict categories<br/>~2ms per email]
    MLInference --> Results[💾 Save Results<br/>results.json 19MB<br/>summary.json 1.5KB<br/>classifications.csv 8.6MB]
    Results --> ValidationStart[🔍 VALIDATION PHASE]

    %% VALIDATION PHASE
    ValidationStart --> SelectSamples[Select Samples<br/>50 low-conf + 25 random]
    SelectSamples --> LoadEmails[Load Full Email Content<br/>Subject + Body + Metadata]
    LoadEmails --> LLMEval[📤 LLM CALLS 7-81<br/>Individual Evaluation<br/>75 total assessments]
    LLMEval -->|qwen3:8b-q4_K_M<br/>no_think| EvalResults[Collect Verdicts<br/>YES/PARTIAL/NO<br/>+ Reasoning]
    EvalResults --> LLMSummary[📤 LLM CALL 82<br/>Final Summary<br/>Aggregate findings]
    LLMSummary -->|qwen3:8b-q4_K_M| FinalReport[📊 Final Report<br/>Accuracy metrics<br/>Category quality<br/>Recommendations]
    FinalReport --> End([✅ Complete<br/>100k classified<br/>+ validated])

    %% OPTIONAL FINE-TUNING LOOP
    FinalReport -.->|If corrections needed| FineTune[🔄 FINE-TUNING<br/>Collect LLM corrections<br/>Continue training]
    FineTune -.-> ClassifyStart

    style Start fill:#e1f5e1
    style End fill:#e1f5e1
    style LLMBatch fill:#fff4e6
    style Consolidate fill:#fff4e6
    style Embed1 fill:#e6f3ff
    style Embed2 fill:#e6f3ff
    style LLMEval fill:#fff4e6
    style LLMSummary fill:#fff4e6
    style SaveModel fill:#ffe6f0
    style Results fill:#ffe6f0
    style FinalReport fill:#ffe6f0
```
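The "Sample 100 Emails, stratified by user/folder" step above can be sketched as follows. The round-robin draw and the `user`/`folder` field names are illustrative assumptions, not the actual `EnronParser` implementation:

```python
# Minimal sketch of stratified sampling: pick ~100 emails spread across
# (user, folder) strata instead of 100 from a single mailbox.
import random
from collections import defaultdict

def stratified_sample(emails, n=100, seed=42):
    """emails: list of dicts assumed to carry 'user' and 'folder' keys."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for e in emails:
        strata[(e["user"], e["folder"])].append(e)

    # Round-robin over strata so every user/folder pair contributes one
    # email before any stratum contributes a second.
    buckets = [rng.sample(v, len(v)) for v in strata.values()]
    sample = []
    while buckets and len(sample) < n:
        buckets = [b for b in buckets if b]
        for b in buckets:
            if len(sample) >= n:
                break
            sample.append(b.pop())
    return sample
```

The calibration phase then splits the result 50/50 (`sample[:50]` for training, `sample[50:]` for validation).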
---

## Pipeline Stages Breakdown

### STAGE 1: CALIBRATION (1 minute)

**Input:** 100 emails

**LLM Calls:** 6 calls
- 5 batch discovery calls (20 emails each)
- 1 consolidation call

**Embedding Calls:** ~50 calls (one per training email)

**Output:**
- 10 discovered categories
- Trained LightGBM model (1.1MB)
- Category cache

### STAGE 2: CLASSIFICATION (3.4 minutes)

**Input:** 100,000 emails

**LLM Calls:** 0 (pure ML inference)

**Embedding Calls:** ~200 batched calls (512 emails per batch)

**Output:**
- 100,000 classifications
- Confidence scores
- Results in JSON/CSV

### STAGE 3: VALIDATION (variable, ~5-10 minutes)

**Input:** 75 sample emails (50 low-confidence + 25 random)

**LLM Calls:** 76 calls
- 75 individual evaluation calls
- 1 final summary call

**Output:**
- Quality assessment (YES/PARTIAL/NO)
- Accuracy metrics
- Recommendations

---

## LLM Call Summary

| Call # | Purpose | Model | Input | Output | Time |
|--------|---------|-------|-------|--------|------|
| 1-5 | Batch Discovery | qwen3:8b | 20 emails each | Categories | ~5-6s each |
| 6 | Consolidation | qwen3:8b | 15 categories | 10 merged | ~3s |
| 7-81 | Evaluation | qwen3:8b | 1 email + category | Verdict | ~2s each |
| 82 | Summary | qwen3:8b | 75 evaluations | Final report | ~5s |

**Total LLM Calls:** 82
**Total LLM Time:** ~3-4 minutes
**Embedding Calls:** ~250 (batched)
**Embedding Time:** ~30 seconds (batched)
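For reference, one of the five batch-discovery calls (calls 1-5 above) might look like this with the `ollama` Python client. The prompt wording and the bare-JSON response contract are assumptions; only the model tag and batch size come from this document:

```python
# Hedged sketch of a single batch-discovery call.
import json
import ollama

def discover_categories(batch):
    """batch: list of 20 (subject, body) pairs from the sampled emails."""
    listing = "\n".join(f"- {subj}: {body[:200]}" for subj, body in batch)
    resp = ollama.chat(
        model="qwen3:8b-q4_K_M",
        messages=[{
            "role": "user",
            "content": (
                "Propose 3-5 category names for these emails. "
                "Answer with a JSON list of strings only.\n" + listing
            ),
        }],
    )
    # Assumes the model complies and returns bare JSON; production code
    # would validate and retry on parse failure.
    return json.loads(resp["message"]["content"])
```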
---

## Performance Metrics

### Calibration Phase

- **Time:** 60 seconds
- **Samples:** 100 emails (50 for training)
- **Categories Discovered:** 10
- **Model Size:** 1.1MB
- **Accuracy on training:** 95%+

### Classification Phase

- **Time:** 202 seconds (3.4 minutes)
- **Emails:** 100,000
- **Speed:** 495 emails/second
- **Per Email:** ~2ms total processing
- **Batch Size:** 512 (best observed)
- **GPU Utilization:** High (batched embeddings)

### Validation Phase

- **Time:** ~10 minutes (75 LLM calls)
- **Samples:** 75 emails
- **Per Sample:** ~8 seconds
- **Corrections Found:** 0 (model already accurate)
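The "~200 batched calls" figure follows directly from the batch size: 100,000 texts / 512 per request ≈ 196 calls. A minimal sketch, assuming a recent `ollama` Python client that exposes the batch `embed` endpoint:

```python
# Batched embedding path the classification-phase numbers assume.
import ollama

def embed_batched(texts, batch_size=512, model="all-minilm:l6-v2"):
    """Embed texts in fixed-size batches; returns one 384-dim vector each."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        resp = ollama.embed(model=model, input=texts[i:i + batch_size])
        vectors.extend(resp["embeddings"])
    return vectors
```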
---

## Data Flow Details

### Email Processing Pipeline

```
Email File → Parse → Features → Embedding →  Model  → Category
  (text)    (dict)   (struct)  (384-dim)   (22-cat)   (label)
```

### Feature Extraction

```
Email Content
├─ Subject (text)
├─ Body (text)
├─ Sender (email address)
├─ Date (timestamp)
├─ Attachments (boolean + count)
└─ Patterns (regex matches)
      ↓
Structured Text
      ↓
Ollama Embedding (all-minilm:l6-v2)
      ↓
384-dimensional vector
```
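A minimal sketch of the "Structured Text" step, i.e., flattening the parsed fields into the single string the embedding model sees. The field names and regex patterns here are illustrative assumptions:

```python
# Flatten parsed email fields into one embeddable string.
import re

PATTERNS = {
    "has_money": re.compile(r"\$\d"),
    "has_meeting": re.compile(r"\b(meeting|calendar|invite)\b", re.I),
}

def to_structured_text(email):
    """email: dict with subject/body/sender/date fields (assumed names)."""
    flags = [name for name, rx in PATTERNS.items() if rx.search(email["body"])]
    return (
        f"subject: {email['subject']}\n"
        f"from: {email['sender']}\n"
        f"date: {email['date']}\n"
        f"attachments: {email.get('attachment_count', 0)}\n"
        f"patterns: {' '.join(flags) or 'none'}\n"
        f"body: {email['body'][:1000]}"
    )
```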
### LightGBM Training

```
Features (384-dim) + Labels (10 categories)
      ↓
Training: 200 boosting rounds
      ↓
Model: 22 categories total (10 discovered + 12 hardcoded)
      ↓
Output: classifier.pkl (1.1MB)
```
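The training step maps directly onto the LightGBM API. A sketch matching the figures above (384-dim features, 200 boosting rounds, 22 classes, pickled output); any hyperparameters beyond those named are assumptions:

```python
# Train a multiclass LightGBM model on embedding features and pickle it.
import pickle
import numpy as np
import lightgbm as lgb

def train(X, y, num_classes=22):
    """X: (n, 384) array of embeddings; y: (n,) integer labels in 0..21."""
    dataset = lgb.Dataset(np.asarray(X), label=np.asarray(y))
    params = {
        "objective": "multiclass",
        "num_class": num_classes,
        "verbosity": -1,
    }
    model = lgb.train(params, dataset, num_boost_round=200)
    with open("classifier.pkl", "wb") as f:
        pickle.dump(model, f)
    return model
```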
---

## Category Distribution (100k Results)

```mermaid
pie title Category Distribution
    "Work Communication" : 89807
    "Financial" : 6534
    "Forwarded" : 2457
    "Technical Analysis" : 1129
    "Other" : 73
```

---

## Confidence Distribution (100k Results)

```mermaid
pie title Confidence Levels
    "High (≥0.7)" : 74777
    "Medium (0.5-0.7)" : 17381
    "Low (<0.5)" : 7842
```
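The three buckets can be derived from the classifier's per-class probabilities (confidence = maximum class probability per email); the 0.5/0.7 cut points are the ones shown in the chart:

```python
# Bucket per-email confidences into the High/Medium/Low bands above.
from collections import Counter

def bucket(confidences):
    """confidences: iterable of max class probabilities, e.g.
    probs = model.predict(X); confidences = probs.max(axis=1)."""
    def label(c):
        if c >= 0.7:
            return "High (>=0.7)"
        if c >= 0.5:
            return "Medium (0.5-0.7)"
        return "Low (<0.5)"
    return Counter(label(c) for c in confidences)
```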
---

## System Architecture

```mermaid
graph LR
    A[Email Source<br/>Gmail/IMAP/Enron] --> B[Email Provider]
    B --> C[Feature Extractor]
    C --> D[Ollama<br/>Embeddings]
    C --> E[Pattern Detector]
    D --> F[LightGBM<br/>Classifier]
    E --> F
    F --> G[Results<br/>JSON/CSV]
    F --> H[Sync Engine<br/>Labels/Keywords]
    I[LLM<br/>qwen3:8b] -.->|Calibration| J[Category Discovery]
    J -.-> F
    I -.->|Validation| K[Quality Check]
    K -.-> G

    style D fill:#e6f3ff
    style I fill:#fff4e6
    style F fill:#f0e6ff
    style G fill:#ffe6f0
```

---

## Next: Integrated End-to-End Script

Building a comprehensive validation script with (sample-selection sketch below):

1. 50 low-confidence samples
2. 25 random samples
3. Final LLM summary call
4. Complete pipeline orchestration
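For items 1-2, the sample-selection logic reduces to a sort plus a random draw. A sketch, with record field names assumed:

```python
# Select the 50 lowest-confidence classifications plus 25 random others.
import random

def select_samples(results, n_low=50, n_rand=25, seed=0):
    """results: list of dicts assumed to carry 'email_id' and 'confidence'."""
    by_conf = sorted(results, key=lambda r: r["confidence"])
    low = by_conf[:n_low]

    # Draw the random portion from everything not already selected.
    chosen = {r["email_id"] for r in low}
    rest = [r for r in results if r["email_id"] not in chosen]
    rng = random.Random(seed)
    rand = rng.sample(rest, min(n_rand, len(rest)))
    return low, rand
```

Each selected record then gets one evaluation LLM call, and the 75 verdicts feed the single final summary call.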