Email Sorter System Flow Documentation

1. Main Execution Flow

flowchart TD
    Start([python -m src.cli run]) --> LoadConfig[Load config/default_config.yaml]
    LoadConfig --> InitProviders[Initialize Email Provider<br/>Enron/Gmail/IMAP]
    InitProviders --> FetchEmails[Fetch Emails<br/>--limit N]
    FetchEmails --> CheckSize{Email Count?}
    CheckSize -->|"< 1000"| SetMockMode[Set ml_classifier.is_mock = True<br/>LLM-only mode]
    CheckSize -->|">= 1000"| CheckModel{Model Exists?}
    CheckModel -->|"No model at src/models/pretrained/classifier.pkl"| RunCalibration[CALIBRATION PHASE<br/>LLM category discovery<br/>Train ML model]
    CheckModel -->|Model exists| SkipCalibration[Skip Calibration<br/>Load existing model]
    SetMockMode --> SkipCalibration
    RunCalibration --> ClassifyPhase[CLASSIFICATION PHASE]
    SkipCalibration --> ClassifyPhase
    ClassifyPhase --> Loop{For each email}
    Loop --> RuleCheck{Hard rule match?}
    RuleCheck -->|Yes| RuleClassify[Category by rule<br/>confidence=1.0<br/>method='rule']
    RuleCheck -->|No| MLClassify[ML Classification<br/>Get category + confidence]
    MLClassify --> ConfCheck{Confidence >= threshold?}
    ConfCheck -->|Yes| AcceptML[Accept ML result<br/>method='ml'<br/>needs_review=False]
    ConfCheck -->|No| LowConf[Low confidence detected<br/>needs_review=True]
    LowConf --> FlagCheck{--no-llm-fallback?}
    FlagCheck -->|Yes| AcceptMLAnyway[Accept ML anyway<br/>needs_review=False]
    FlagCheck -->|No| LLMCheck{LLM available?}
    LLMCheck -->|Yes| LLMReview[LLM Classification<br/>~4 seconds<br/>method='llm']
    LLMCheck -->|No| AcceptMLAnyway
    RuleClassify --> NextEmail{More emails?}
    AcceptML --> NextEmail
    AcceptMLAnyway --> NextEmail
    LLMReview --> NextEmail
    NextEmail -->|Yes| Loop
    NextEmail -->|No| SaveResults[Save results.json]
    SaveResults --> End([Complete])

    style RunCalibration fill:#ff6b6b
    style LLMReview fill:#ff6b6b
    style SetMockMode fill:#ffd93d
    style FlagCheck fill:#4ec9b0
    style AcceptMLAnyway fill:#4ec9b0
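The per-email routing above (rules first, then ML, then the optional LLM fallback) can be sketched in Python. This is a minimal illustration of the flow, not the project's actual API; every name here (`route_email`, `rules.match`, `ml_classifier.predict`, `llm.classify`) is a stand-in.

```python
# Sketch of the per-email routing logic from the flowchart above.
# All function and attribute names are illustrative stand-ins, not the real API.

CONFIDENCE_THRESHOLD = 0.55  # matches the threshold shown in section 3

def route_email(email, rules, ml_classifier, llm, no_llm_fallback=False):
    """Return (category, confidence, method, needs_review) for one email."""
    # 1. Hard rules win outright: confidence 1.0, no review needed.
    category = rules.match(email)
    if category is not None:
        return category, 1.0, "rule", False

    # 2. Otherwise ask the ML model; accept its answer above the threshold.
    category, confidence = ml_classifier.predict(email)
    if confidence >= CONFIDENCE_THRESHOLD:
        return category, confidence, "ml", False

    # 3. Low confidence: fall back to the LLM unless disabled or unavailable.
    if no_llm_fallback or not llm.is_available:
        return category, confidence, "ml", False  # accept ML anyway
    return llm.classify(email), confidence, "llm", True
```

Note how `--no-llm-fallback` and an unavailable LLM land on the same branch, mirroring the `AcceptMLAnyway` node in the diagram.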

2. Calibration Phase Detail (When Triggered)

flowchart TD
    Start([Calibration Triggered]) --> Sample[Stratified Sampling<br/>3% of emails<br/>min 250, max 1500]
    Sample --> LLMBatch[LLM Category Discovery<br/>50 emails per batch]
    LLMBatch --> Batch1[Batch 1: 50 emails<br/>~20 seconds]
    Batch1 --> Batch2[Batch 2: 50 emails<br/>~20 seconds]
    Batch2 --> BatchN[... N batches<br/>For 300 samples: 6 batches]
    BatchN --> Consolidate[LLM Consolidation<br/>Merge similar categories<br/>~5 seconds]
    Consolidate --> Categories[Final Categories<br/>~10-12 unique categories]
    Categories --> Label[Label Training Emails<br/>LLM labels each sample<br/>~3 seconds per email]
    Label --> Extract[Feature Extraction<br/>Embeddings + TF-IDF<br/>~0.02 seconds per email]
    Extract --> Train[Train LightGBM Model<br/>~5 seconds total]
    Train --> Validate[Validate on 100 samples<br/>~2 seconds]
    Validate --> Save[Save Model<br/>src/models/calibrated/classifier.pkl]
    Save --> End([Calibration Complete<br/>Total time: 15-25 minutes for 10k emails])

    style LLMBatch fill:#ff6b6b
    style Label fill:#ff6b6b
    style Consolidate fill:#ff6b6b
    style Train fill:#4ec9b0
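The sample-size rule above (3% of the corpus, clamped between 250 and 1500) reduces to one line of arithmetic. This is a sketch of that rule, not the project's actual sampling function:

```python
def calibration_sample_size(total_emails: int) -> int:
    """3% of the corpus, clamped to a minimum of 250 and a maximum of 1500."""
    return min(max(round(total_emails * 0.03), 250), 1500)
```

For 10,000 emails this yields 300 samples, i.e. the 6 batches of 50 shown in the diagram; small corpora hit the 250 floor and very large ones the 1500 ceiling.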

3. Classification Phase Detail

flowchart TD
    Start([Classification Phase]) --> Email[Get Email]
    Email --> Rules{Check Hard Rules<br/>Pattern matching}
    Rules -->|Match| RuleDone[Rule Match<br/>~0.001 seconds<br/>59 of 10000 emails]
    Rules -->|No match| Embed[Generate Embedding<br/>all-minilm:l6-v2<br/>~0.02 seconds]
    Embed --> TFIDF[TF-IDF Features<br/>~0.001 seconds]
    TFIDF --> MLPredict[ML Prediction<br/>LightGBM<br/>~0.003 seconds]
    MLPredict --> Threshold{Confidence >= 0.55?}
    Threshold -->|Yes| MLDone[ML Classification<br/>7842 of 10000 emails<br/>78.4%]
    Threshold -->|No| Flag{--no-llm-fallback?}
    Flag -->|Yes| MLForced[Force ML result<br/>No LLM call]
    Flag -->|No| LLM[LLM Classification<br/>~4 seconds<br/>2099 of 10000 emails<br/>21%]
    RuleDone --> Next([Next Email])
    MLDone --> Next
    MLForced --> Next
    LLM --> Next

    style LLM fill:#ff6b6b
    style MLDone fill:#4ec9b0
    style MLForced fill:#ffd93d
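The counts in the diagram (59 rule matches, 7842 ML-accepted, 2099 LLM fallbacks) can be sanity-checked against the quoted percentages; a quick recomputation:

```python
# Recompute the shares quoted in the diagram (59 rule, 7842 ML, 2099 LLM).
rule_hits, ml_hits, llm_hits = 59, 7842, 2099
total = rule_hits + ml_hits + llm_hits

ml_share = ml_hits / total    # -> 78.4% accepted by the ML model
llm_share = llm_hits / total  # -> ~21% routed to the LLM fallback
print(f"total={total}, ML={ml_share:.1%}, LLM={llm_share:.1%}")
```

The three paths sum to exactly 10,000 emails, and the shares match the 78.4% / 21% figures in the nodes.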

4. Model Loading Logic

flowchart TD
    Start([MLClassifier.__init__]) --> CheckPath{model_path provided?}
    CheckPath -->|Yes| UsePath[Use provided path]
    CheckPath -->|No| Default[Default:<br/>src/models/pretrained/classifier.pkl]
    UsePath --> FileCheck{File exists?}
    Default --> FileCheck
    FileCheck -->|Yes| Load[Load pickle file]
    FileCheck -->|No| CreateMock[Create MOCK model<br/>Random Forest<br/>12 hardcoded categories]
    Load --> ValidCheck{Valid model data?}
    ValidCheck -->|Yes| CheckMock{is_mock flag?}
    ValidCheck -->|No| CreateMock
    CheckMock -->|True| WarnMock[Warn: MOCK model active]
    CheckMock -->|False| RealModel[Real trained model loaded]
    CreateMock --> MockWarnings[Multiple warnings printed<br/>NOT for production]
    WarnMock --> Ready[Model Ready]
    RealModel --> Ready
    MockWarnings --> Ready
    Ready --> End([Classification can start])

    style CreateMock fill:#ff6b6b
    style RealModel fill:#4ec9b0
    style WarnMock fill:#ffd93d
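The loading logic above condenses to a short sketch. This is illustrative only: the real loader lives in src/classification/ml_classifier.py, and both `load_model` and the mock-model helper (and the assumption that the pickle holds a dict with a "model" key) are stand-ins for whatever the actual code does.

```python
import pickle
from pathlib import Path

DEFAULT_MODEL = Path("src/models/pretrained/classifier.pkl")

def make_mock_model():
    # Stand-in for the real mock: the actual code builds a Random Forest
    # over 12 hardcoded categories and prints multiple warnings.
    return {"model": None, "is_mock": True}

def load_model(model_path=None):
    """Load a trained model, falling back to a mock when nothing usable exists."""
    path = Path(model_path) if model_path else DEFAULT_MODEL
    if not path.exists():
        print("WARNING: no model found, creating MOCK model (NOT for production)")
        return make_mock_model()
    with path.open("rb") as f:
        data = pickle.load(f)
    if not isinstance(data, dict) or "model" not in data:  # "valid model data?" check
        return make_mock_model()
    if data.get("is_mock"):
        print("WARNING: MOCK model active")
    return data
```

The key behavior to notice is that a missing file never raises: it silently degrades to the mock model, which is exactly why the warnings matter.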

5. Flag Conditions & Effects

--no-llm-fallback

Location: src/cli.py:46, src/classification/adaptive_classifier.py:152-161

Effect: When ML confidence < threshold, accept ML result anyway instead of calling LLM

Use case: Test pure ML performance, avoid LLM costs

Code path:

if self.disable_llm_fallback:
    # Accept the ML result as-is; skip the LLM fallback entirely
    return ClassificationResult(needs_review=False)

--limit N

Location: src/cli.py:38

Effect: Limits number of emails fetched from source

Calibration trigger: If N < 1000, forces LLM-only mode (no ML training)

Code path:

if total_emails < 1000:
    ml_classifier.is_mock = True  # Too few emails to train; skip ML, use LLM only

Model Path Override

Location: src/classification/ml_classifier.py:43

Default: src/models/pretrained/classifier.pkl

Calibration saves to: src/models/calibrated/classifier.pkl

Problem: Calibration saves the model to a different location than the default load path, so a freshly calibrated model is not picked up on the next run.

Solution: Copy the calibrated model to the pretrained location, OR pass the model_path parameter explicitly.
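The copy workaround is a one-liner with `shutil`; a sketch using the two paths from this section (the `promote_calibrated_model` helper name and its `root` parameter are illustrative, not part of the codebase):

```python
import shutil
from pathlib import Path

def promote_calibrated_model(root="."):
    """Copy the calibration output to the default load location."""
    calibrated = Path(root) / "src/models/calibrated/classifier.pkl"
    pretrained = Path(root) / "src/models/pretrained/classifier.pkl"
    pretrained.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(calibrated, pretrained)  # copy2 preserves file timestamps
    return pretrained

# Alternative: pass model_path when constructing the classifier, e.g.
# MLClassifier(model_path="src/models/calibrated/classifier.pkl")
```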

6. Timing Breakdown (10,000 emails)

| Phase | Operation | Time per Email | Total Time (10k) | LLM Required? |
|---|---|---|---|---|
| Calibration (if model doesn't exist) | Stratified sampling (300 emails) | - | ~1 second | No |
| | LLM category discovery (6 batches) | ~0.4 sec/email | ~2 minutes | YES |
| | LLM consolidation | - | ~5 seconds | YES |
| | LLM labeling (300 samples) | ~3 sec/email | ~15 minutes | YES |
| | Feature extraction (300 samples) | ~0.02 sec/email | ~6 seconds | No (embeddings) |
| | Model training (LightGBM) | - | ~5 seconds | No |
| | CALIBRATION TOTAL | - | ~17-20 minutes | YES |
| Classification (with model) | Hard rule matching | ~0.001 sec | ~10 seconds (all 10k) | No |
| | Embedding generation | ~0.02 sec | ~200 seconds (all 10k) | No (Ollama embed) |
| | ML prediction | ~0.003 sec | ~30 seconds (all 10k) | No |
| | LLM fallback (21% of emails) | ~4 sec/email | ~140 minutes (2100 emails) | YES |
| | Saving results | - | ~1 second | No |
| | CLASSIFICATION TOTAL (with LLM fallback) | - | ~2.5 hours | YES (21%) |
| | CLASSIFICATION TOTAL (--no-llm-fallback) | - | ~4 minutes | No |
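The classification totals follow directly from the per-email timings and counts in the table. A quick arithmetic check (numbers taken from the table; the script itself is just illustration):

```python
# Per-email timings and counts from the timing table above.
emails = 10_000
llm_share = 0.21                    # ~2100 emails hit the LLM fallback

rule_time = emails * 0.001          # ~10 s of hard-rule checks
embed_time = emails * 0.02          # ~200 s of embedding generation
ml_time = emails * 0.003            # ~30 s of LightGBM predictions
llm_time = emails * llm_share * 4   # ~8400 s (~140 min) of LLM fallback calls

with_fallback_hours = (rule_time + embed_time + ml_time + llm_time) / 3600
ml_only_minutes = (rule_time + embed_time + ml_time) / 60

print(f"with LLM fallback: ~{with_fallback_hours:.1f} h")
print(f"--no-llm-fallback: ~{ml_only_minutes:.0f} min")
```

The LLM fallback dominates: ~8400 of the ~8640 seconds in the with-fallback run, which is why disabling it collapses the total from hours to minutes.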

7. Why LLM Still Loads

flowchart TD
    Start([CLI startup]) --> Always1[ALWAYS: Load LLM provider<br/>src/cli.py:98-117]
    Always1 --> Reason1[Reason: Needed for calibration<br/>if model doesn't exist]
    Reason1 --> Check{Model exists?}
    Check -->|No| NeedLLM1[LLM required for calibration<br/>Category discovery<br/>Sample labeling]
    Check -->|Yes| SkipCal[Skip calibration]
    SkipCal --> ClassStart[Start classification]
    NeedLLM1 --> DoCalibration[Run calibration<br/>Uses LLM]
    DoCalibration --> ClassStart
    ClassStart --> Always2[ALWAYS: LLM provider is available<br/>llm.is_available = True]
    Always2 --> EmailLoop[For each email...]
    EmailLoop --> LowConf{Low confidence?}
    LowConf -->|No| NoLLM[No LLM call]
    LowConf -->|Yes| FlagCheck{--no-llm-fallback?}
    FlagCheck -->|Yes| NoLLMCall[No LLM call<br/>Accept ML result]
    FlagCheck -->|No| LLMAvail{llm.is_available?}
    LLMAvail -->|Yes| CallLLM[LLM called<br/>src/cli.py:227-228]
    LLMAvail -->|No| NoLLMCall
    NoLLM --> End([Next email])
    NoLLMCall --> End
    CallLLM --> End

    style Always1 fill:#ffd93d
    style Always2 fill:#ffd93d
    style CallLLM fill:#ff6b6b
    style NoLLMCall fill:#4ec9b0

Why LLM Provider is Always Initialized: the CLI cannot know at startup whether calibration will be needed (no model on disk) or whether any email will fall below the confidence threshold during classification, so the provider is loaded unconditionally. Whether it is actually called is decided later, per email, by the flag and availability checks above.

8. Command Scenarios

| Command | Model Exists? | Calibration Runs? | LLM Used for Classification? | Total Time |
|---|---|---|---|---|
| python -m src.cli run --source enron --limit 10000 | No | YES (~20 min) | YES (~2.5 hours) | ~2 hours 50 min |
| python -m src.cli run --source enron --limit 10000 | Yes | No | YES (~2.5 hours) | ~2.5 hours |
| python -m src.cli run --source enron --limit 10000 --no-llm-fallback | No | YES (~20 min) | NO | ~24 minutes |
| python -m src.cli run --source enron --limit 10000 --no-llm-fallback | Yes | No | NO | ~4 minutes |
| python -m src.cli run --source enron --limit 500 | Any | No (too few emails) | YES (100% LLM-only) | ~35 minutes |

9. Current System State

Model Status

Threshold Configuration

Last Run Results (10k emails)

10. To Run ML-Only Test (No LLM Calls During Classification)

Requirements:

  1. Model must exist at src/models/pretrained/classifier.pkl ✓ (done)
  2. Use --no-llm-fallback flag
  3. Ensure sufficient emails (≥1000) to avoid LLM-only mode

Command:

python -m src.cli run --source enron --limit 10000 --output ml_only_10k/ --no-llm-fallback

Expected Results: