```mermaid
flowchart TD
    Start(["python -m src.cli run"]) --> LoadConfig["Load config/default_config.yaml"]
    LoadConfig --> InitProviders["Initialize Email Provider<br/>Enron/Gmail/IMAP"]
    InitProviders --> FetchEmails["Fetch Emails<br/>--limit N"]
    FetchEmails --> CheckSize{"Email Count?"}
    CheckSize -->|"< 1000"| SetMockMode["Set ml_classifier.is_mock = True<br/>LLM-only mode"]
    CheckSize -->|">= 1000"| CheckModel{"Model Exists?"}
    CheckModel -->|"No model at<br/>src/models/pretrained/classifier.pkl"| RunCalibration["CALIBRATION PHASE<br/>LLM category discovery<br/>Train ML model"]
    CheckModel -->|"Model exists"| SkipCalibration["Skip Calibration<br/>Load existing model"]
    SetMockMode --> SkipCalibration
    RunCalibration --> ClassifyPhase["CLASSIFICATION PHASE"]
    SkipCalibration --> ClassifyPhase
    ClassifyPhase --> Loop{"For each email"}
    Loop --> RuleCheck{"Hard rule match?"}
    RuleCheck -->|"Yes"| RuleClassify["Category by rule<br/>confidence=1.0<br/>method='rule'"]
    RuleCheck -->|"No"| MLClassify["ML Classification<br/>Get category + confidence"]
    MLClassify --> ConfCheck{"Confidence >= threshold?"}
    ConfCheck -->|"Yes"| AcceptML["Accept ML result<br/>method='ml'<br/>needs_review=False"]
    ConfCheck -->|"No"| LowConf["Low confidence detected<br/>needs_review=True"]
    LowConf --> FlagCheck{"--no-llm-fallback?"}
    FlagCheck -->|"Yes"| AcceptMLAnyway["Accept ML anyway<br/>needs_review=False"]
    FlagCheck -->|"No"| LLMCheck{"LLM available?"}
    LLMCheck -->|"Yes"| LLMReview["LLM Classification<br/>~4 seconds<br/>method='llm'"]
    LLMCheck -->|"No"| AcceptMLAnyway
    RuleClassify --> NextEmail{"More emails?"}
    AcceptML --> NextEmail
    AcceptMLAnyway --> NextEmail
    LLMReview --> NextEmail
    NextEmail -->|"Yes"| Loop
    NextEmail -->|"No"| SaveResults["Save results.json"]
    SaveResults --> End([Complete])
    style RunCalibration fill:#ff6b6b
    style LLMReview fill:#ff6b6b
    style SetMockMode fill:#ffd93d
    style FlagCheck fill:#4ec9b0
    style AcceptMLAnyway fill:#4ec9b0
```
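The startup branch of this flow (email count first, then model existence) reduces to a small decision. Below is a minimal sketch under the thresholds shown in the diagram; the function and constant names are illustrative, not the actual src/cli.py code:

```python
from pathlib import Path

PRETRAINED_MODEL = Path("src/models/pretrained/classifier.pkl")

def decide_startup_mode(email_count: int, model_path: Path = PRETRAINED_MODEL) -> str:
    """Mirror the startup branch in the diagram: LLM-only, calibrate, or load existing."""
    if email_count < 1000:
        return "llm_only"        # ml_classifier.is_mock = True, no ML training
    if not model_path.exists():
        return "calibrate"       # LLM category discovery + ML model training
    return "load_existing"       # skip calibration, load the saved model

print(decide_startup_mode(500))      # llm_only
print(decide_startup_mode(10_000))   # calibrate or load_existing, depending on the file
```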
```mermaid
flowchart TD
    Start(["Calibration Triggered"]) --> Sample["Stratified Sampling<br/>3% of emails<br/>min 250, max 1500"]
    Sample --> LLMBatch["LLM Category Discovery<br/>50 emails per batch"]
    LLMBatch --> Batch1["Batch 1: 50 emails<br/>~20 seconds"]
    Batch1 --> Batch2["Batch 2: 50 emails<br/>~20 seconds"]
    Batch2 --> BatchN["... N batches<br/>For 300 samples: 6 batches"]
    BatchN --> Consolidate["LLM Consolidation<br/>Merge similar categories<br/>~5 seconds"]
    Consolidate --> Categories["Final Categories<br/>~10-12 unique categories"]
    Categories --> Label["Label Training Emails<br/>LLM labels each sample<br/>~3 seconds per email"]
    Label --> Extract["Feature Extraction<br/>Embeddings + TF-IDF<br/>~0.02 seconds per email"]
    Extract --> Train["Train LightGBM Model<br/>~5 seconds total"]
    Train --> Validate["Validate on 100 samples<br/>~2 seconds"]
    Validate --> Save["Save Model<br/>src/models/calibrated/classifier.pkl"]
    Save --> End(["Calibration Complete<br/>Total time: 15-25 minutes for 10k emails"])
    style LLMBatch fill:#ff6b6b
    style Label fill:#ff6b6b
    style Consolidate fill:#ff6b6b
    style Train fill:#4ec9b0
```
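To make the sampling and batch arithmetic above concrete, here is a small sketch using the stated parameters (3% of the corpus, clamped to 250-1500 samples, discovery batches of 50); the helper names are illustrative:

```python
import math

def calibration_sample_size(total_emails: int,
                            fraction: float = 0.03,
                            minimum: int = 250,
                            maximum: int = 1500) -> int:
    """3% of the corpus, clamped to the [250, 1500] range described above."""
    return max(minimum, min(maximum, round(total_emails * fraction)))

def discovery_batches(sample_size: int, batch_size: int = 50) -> int:
    """Number of LLM category-discovery batches."""
    return math.ceil(sample_size / batch_size)

n = calibration_sample_size(10_000)
print(n, discovery_batches(n))   # -> 300 6
```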
```mermaid
flowchart TD
    Start(["Classification Phase"]) --> Email["Get Email"]
    Email --> Rules{"Check Hard Rules<br/>Pattern matching"}
    Rules -->|"Match"| RuleDone["Rule Match<br/>~0.001 seconds<br/>59 of 10000 emails"]
    Rules -->|"No match"| Embed["Generate Embedding<br/>all-minilm:l6-v2<br/>~0.02 seconds"]
    Embed --> TFIDF["TF-IDF Features<br/>~0.001 seconds"]
    TFIDF --> MLPredict["ML Prediction<br/>LightGBM<br/>~0.003 seconds"]
    MLPredict --> Threshold{"Confidence >= 0.55?"}
    Threshold -->|"Yes"| MLDone["ML Classification<br/>7842 of 10000 emails<br/>78.4%"]
    Threshold -->|"No"| Flag{"--no-llm-fallback?"}
    Flag -->|"Yes"| MLForced["Force ML result<br/>No LLM call"]
    Flag -->|"No"| LLM["LLM Classification<br/>~4 seconds<br/>2099 of 10000 emails<br/>21%"]
    RuleDone --> Next(["Next Email"])
    MLDone --> Next
    MLForced --> Next
    LLM --> Next
    style LLM fill:#ff6b6b
    style MLDone fill:#4ec9b0
    style MLForced fill:#ffd93d
```
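A minimal sketch of the feature step in the middle of this path: a sentence embedding concatenated with TF-IDF features ahead of the LightGBM prediction. The 384-dimension figure matches all-minilm:l6-v2; the embed() stub stands in for the real Ollama embedding call, the toy corpus is invented, and the trained LightGBM model is assumed to exist elsewhere:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def embed(text: str) -> np.ndarray:
    # Stand-in for the Ollama all-minilm:l6-v2 call (384-dimensional vector)
    return np.zeros(384, dtype=np.float32)

corpus = ["quarterly budget review attached", "lunch on friday?", "server outage postmortem"]
tfidf = TfidfVectorizer().fit(corpus)

def features(text: str) -> np.ndarray:
    # Dense embedding + sparse TF-IDF, concatenated into a single feature vector
    return np.concatenate([embed(text), tfidf.transform([text]).toarray().ravel()])

x = features("budget review for friday")
print(x.shape)
# proba = lgbm_model.predict_proba([x])[0]        # trained LightGBM model assumed
# accept ML result if proba.max() >= 0.55, otherwise fall back to the LLM
```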
```mermaid
flowchart TD
    Start(["MLClassifier.__init__"]) --> CheckPath{"model_path provided?"}
    CheckPath -->|"Yes"| UsePath["Use provided path"]
    CheckPath -->|"No"| Default["Default:<br/>src/models/pretrained/classifier.pkl"]
    UsePath --> FileCheck{"File exists?"}
    Default --> FileCheck
    FileCheck -->|"Yes"| Load["Load pickle file"]
    FileCheck -->|"No"| CreateMock["Create MOCK model<br/>Random Forest<br/>12 hardcoded categories"]
    Load --> ValidCheck{"Valid model data?"}
    ValidCheck -->|"Yes"| CheckMock{"is_mock flag?"}
    ValidCheck -->|"No"| CreateMock
    CheckMock -->|"True"| WarnMock["Warn: MOCK model active"]
    CheckMock -->|"False"| RealModel["Real trained model loaded"]
    CreateMock --> MockWarnings["Multiple warnings printed<br/>NOT for production"]
    WarnMock --> Ready["Model Ready"]
    RealModel --> Ready
    MockWarnings --> Ready
    Ready --> End(["Classification can start"])
    style CreateMock fill:#ff6b6b
    style RealModel fill:#4ec9b0
    style WarnMock fill:#ffd93d
```
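A simplified sketch of the load-or-mock behavior in this diagram. It is not the actual MLClassifier code; the pickle layout (a dict carrying a model and an is_mock flag) is an assumption based on the flow above:

```python
import pickle
import warnings
from pathlib import Path
from sklearn.ensemble import RandomForestClassifier

DEFAULT_PATH = Path("src/models/pretrained/classifier.pkl")

def load_or_mock(model_path: Path = DEFAULT_PATH) -> dict:
    """Load the pickled model dict, or fall back to an untrained mock model."""
    if model_path.exists():
        try:
            with model_path.open("rb") as f:
                data = pickle.load(f)
        except Exception:
            data = None   # unreadable pickle counts as invalid model data
        if isinstance(data, dict) and "model" in data:
            if data.get("is_mock"):
                warnings.warn("MOCK model active - not intended for production")
            return data
    # Missing or invalid file: create a mock Random Forest
    # (the real code hardcodes 12 categories at this point)
    warnings.warn("No valid trained model found; creating MOCK model (NOT for production)")
    return {"model": RandomForestClassifier(), "is_mock": True, "categories": []}
```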
Flag: --no-llm-fallback
Location: src/cli.py:46, src/classification/adaptive_classifier.py:152-161
Effect: When ML confidence < threshold, accept the ML result anyway instead of calling the LLM
Use case: Test pure ML performance, avoid LLM costs
Code path:

```python
if self.disable_llm_fallback:
    # Just return the ML result without an LLM fallback call
    return ClassificationResult(needs_review=False)
```
Flag: --limit N
Location: src/cli.py:38
Effect: Limits the number of emails fetched from the source
Calibration trigger: If N < 1000, forces LLM-only mode (no ML training)
Code path:

```python
if total_emails < 1000:
    ml_classifier.is_mock = True  # Skip ML, use LLM only
```
Parameter: model_path (MLClassifier)
Location: src/classification/ml_classifier.py:43
Default: src/models/pretrained/classifier.pkl
Calibration saves to: src/models/calibrated/classifier.pkl
Problem: Calibration saves the model to a different location than the one it is loaded from by default
Solution: Copy the calibrated model to the pretrained location OR pass the model_path parameter
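A minimal sketch of both workarounds, using the calibrated and pretrained paths listed above. The copy logic is illustrative; the model_path parameter is the one referenced in the MLClassifier diagram, and the commented constructor call assumes the project import is available:

```python
import shutil
from pathlib import Path

CALIBRATED = Path("src/models/calibrated/classifier.pkl")
PRETRAINED = Path("src/models/pretrained/classifier.pkl")

# Option A: copy the calibrated model to the default load location
if CALIBRATED.exists():
    PRETRAINED.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(CALIBRATED, PRETRAINED)

# Option B: point the classifier at the calibrated model directly
# ml_classifier = MLClassifier(model_path=str(CALIBRATED))
```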
| Phase | Operation | Time per Email | Total Time (10k) | LLM Required? |
|---|---|---|---|---|
| Calibration (if model doesn't exist) | Stratified sampling (300 emails) | - | ~1 second | No |
| | LLM category discovery (6 batches) | ~0.4 sec/email | ~2 minutes | YES |
| | LLM consolidation | - | ~5 seconds | YES |
| | LLM labeling (300 samples) | ~3 sec/email | ~15 minutes | YES |
| | Feature extraction (300 samples) | ~0.02 sec/email | ~6 seconds | No (embeddings) |
| | Model training (LightGBM) | - | ~5 seconds | No |
| | CALIBRATION TOTAL | - | ~17-20 minutes | YES |
| Classification (with model) | Hard rule matching | ~0.001 sec | ~10 seconds (all 10k) | No |
| | Embedding generation | ~0.02 sec | ~200 seconds (all 10k) | No (Ollama embed) |
| | ML prediction | ~0.003 sec | ~30 seconds (all 10k) | No |
| | LLM fallback (21% of emails) | ~4 sec/email | ~140 minutes (2100 emails) | YES |
| | Saving results | - | ~1 second | No |
| | CLASSIFICATION TOTAL (with LLM fallback) | - | ~2.5 hours | YES (21%) |
| | CLASSIFICATION TOTAL (--no-llm-fallback) | - | ~4 minutes | No |
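The classification totals can be reproduced with a quick back-of-the-envelope calculation from the per-email figures in the table (all numbers taken from the rows above):

```python
emails = 10_000
llm_fraction = 0.21                      # share of low-confidence emails routed to the LLM

rules_s   = 10                           # hard-rule pass over all emails
embed_s   = emails * 0.02                # embedding generation
predict_s = emails * 0.003               # LightGBM prediction
llm_s     = emails * llm_fraction * 4    # ~4 s per LLM fallback call

with_llm    = (rules_s + embed_s + predict_s + llm_s) / 3600
without_llm = (rules_s + embed_s + predict_s) / 60
print(f"with LLM fallback: ~{with_llm:.1f} h, with --no-llm-fallback: ~{without_llm:.0f} min")
# with LLM fallback: ~2.4 h, with --no-llm-fallback: ~4 min
```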
```mermaid
flowchart TD
    Start(["CLI startup"]) --> Always1["ALWAYS: Load LLM provider<br/>src/cli.py:98-117"]
    Always1 --> Reason1["Reason: Needed for calibration<br/>if model doesn't exist"]
    Reason1 --> Check{"Model exists?"}
    Check -->|"No"| NeedLLM1["LLM required for calibration<br/>Category discovery<br/>Sample labeling"]
    Check -->|"Yes"| SkipCal["Skip calibration"]
    SkipCal --> ClassStart["Start classification"]
    NeedLLM1 --> DoCalibration["Run calibration<br/>Uses LLM"]
    DoCalibration --> ClassStart
    ClassStart --> Always2["ALWAYS: LLM provider is available<br/>llm.is_available = True"]
    Always2 --> EmailLoop["For each email..."]
    EmailLoop --> LowConf{"Low confidence?"}
    LowConf -->|"No"| NoLLM["No LLM call"]
    LowConf -->|"Yes"| FlagCheck{"--no-llm-fallback?"}
    FlagCheck -->|"Yes"| NoLLMCall["No LLM call<br/>Accept ML result"]
    FlagCheck -->|"No"| LLMAvail{"llm.is_available?"}
    LLMAvail -->|"Yes"| CallLLM["LLM called<br/>src/cli.py:227-228"]
    LLMAvail -->|"No"| NoLLMCall
    NoLLM --> End(["Next email"])
    NoLLMCall --> End
    CallLLM --> End
    style Always1 fill:#ffd93d
    style Always2 fill:#ffd93d
    style CallLLM fill:#ff6b6b
    style NoLLMCall fill:#4ec9b0
```
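The per-email gate at the bottom of this diagram reduces to a single boolean condition. A tiny sketch (names illustrative, not the actual src/cli.py code):

```python
def should_call_llm(needs_review: bool, no_llm_fallback: bool, llm_available: bool) -> bool:
    """The LLM is invoked only for low-confidence results, when the flag is off and a provider is up."""
    return needs_review and not no_llm_fallback and llm_available

# The LLM provider is always loaded at startup, but it is only *called* when this is True:
print(should_call_llm(True,  False, True))   # True  -> LLM classification
print(should_call_llm(True,  True,  True))   # False -> --no-llm-fallback accepts the ML result
print(should_call_llm(False, False, True))   # False -> confident ML result, no LLM call
```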
| Command | Model Exists? | Calibration Runs? | LLM Used for Classification? | Total Time (10k) |
|---|---|---|---|---|
| python -m src.cli run --source enron --limit 10000 | No | YES (~20 min) | YES (~2.5 hours) | ~2 hours 50 min |
| python -m src.cli run --source enron --limit 10000 | Yes | No | YES (~2.5 hours) | ~2.5 hours |
| python -m src.cli run --source enron --limit 10000 --no-llm-fallback | No | YES (~20 min) | NO | ~24 minutes |
| python -m src.cli run --source enron --limit 10000 --no-llm-fallback | Yes | No | NO | ~4 minutes |
| python -m src.cli run --source enron --limit 500 | Any | No (too few emails) | YES (100% LLM-only) | ~35 minutes |
To run ML-only classification, two things are needed: the trained model at src/models/pretrained/classifier.pkl ✓ (done), and the --no-llm-fallback flag:

```bash
python -m src.cli run --source enron --limit 10000 --output ml_only_10k/ --no-llm-fallback
```
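After the run completes, the rule/ml/llm method split can be checked from the saved output. This sketch assumes the results file lands at ml_only_10k/results.json and is a JSON list of records with a method field; the exact path and schema are assumptions, not a documented format:

```python
import json
from collections import Counter
from pathlib import Path

results_path = Path("ml_only_10k/results.json")   # --output directory from the command above
records = json.loads(results_path.read_text())

methods = Counter(r.get("method", "unknown") for r in records)   # e.g. rule / ml / llm
print(f"{len(records)} emails classified: {dict(methods)}")
```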