diff --git a/.gitignore b/.gitignore
index 736422f..82901ba 100644
--- a/.gitignore
+++ b/.gitignore
@@ -17,6 +17,7 @@ src/models/pretrained/*.joblib
 *.h5
 *.joblib
 enron_mail_20150507
+maildir
 
 # Credentials
 .env
@@ -61,3 +62,4 @@ dmypy.json
 *.tmp
 *.bak
 *~
+enron_mail_20150507.tar.gz
\ No newline at end of file
diff --git a/COMPLETION_ASSESSMENT.md b/COMPLETION_ASSESSMENT.md
index c86f888..e756cb3 100644
--- a/COMPLETION_ASSESSMENT.md
+++ b/COMPLETION_ASSESSMENT.md
@@ -3,13 +3,13 @@
 **Date**: 2025-10-21
 **Status**: FEATURE COMPLETE - All 16 Phases Implemented
 **Test Results**: 27/30 passing (90% success rate)
-**Code Quality**: Production-ready with clear mock labeling
+**Code Quality**: Complete with full type hints and clear mock labeling
 
 ---
 
 ## Executive Summary
 
-The Email Sorter framework is **100% feature-complete** with all 16 development phases implemented. The system is production-ready for:
+The Email Sorter framework is **100% feature-complete** with all 16 development phases implemented. The system is ready for:
 
 1. **Immediate Use**: Framework testing with mock model (~90% test pass rate)
 2. **Real Model Integration**: Download/train LightGBM model and deploy
@@ -27,7 +27,7 @@ All core infrastructure, classifiers, learning systems, and export/sync mechanis
 - [x] Rich-based logging with file output
 - [x] Email data models with full type hints
 - [x] Pydantic validation
-- **Status**: Production-ready
+- **Status**: Complete
 
 ### Phase 4: Email Providers ✅
 - [x] MockProvider (fully functional for testing)
@@ -43,7 +43,7 @@ All core infrastructure, classifiers, learning systems, and export/sync mechanis
 - [x] Attachment analysis (PDF, DOCX, XLSX text extraction)
 - [x] Embedding cache with MD5 hashing
 - [x] Batch processing for efficiency
-- **Status**: Production-ready with 90%+ test coverage
+- **Status**: Complete with 90%+ test coverage
 
 ### Phase 6: ML Classifier ✅
 - [x] Mock Random Forest (clearly labeled)
@@ -58,7 +58,7 @@ All core infrastructure, classifiers, learning systems, and export/sync mechanis
 - [x] OpenAIProvider (API-compatible)
 - [x] Graceful degradation when unavailable
 - [x] Batch processing support
-- **Status**: Production-ready
+- **Status**: Complete
 
 ### Phase 8: Adaptive Classifier ✅
 - [x] Three-tier classification system
@@ -67,7 +67,7 @@ All core infrastructure, classifiers, learning systems, and export/sync mechanis
 - [x] LLM review (uncertain cases, ~5%)
 - [x] Dynamic threshold management
 - [x] Statistics tracking
-- **Status**: Production-ready
+- **Status**: Complete
 
 ### Phase 9: Processing Pipeline ✅
 - [x] BulkProcessor with checkpointing
@@ -75,14 +75,14 @@ All core infrastructure, classifiers, learning systems, and export/sync mechanis
 - [x] Batch-based processing
 - [x] Progress tracking
 - [x] Error recovery
-- **Status**: Production-ready with test coverage
+- **Status**: Complete with test coverage
 
 ### Phase 10: Calibration System ✅
 - [x] EmailSampler (stratified + random)
 - [x] LLMAnalyzer (discover natural categories)
 - [x] CalibrationWorkflow (end-to-end)
 - [x] Category validation
-- **Status**: Production-ready with Enron dataset support
+- **Status**: Complete with Enron dataset support
 
 ### Phase 11: Export & Reporting ✅
 - [x] JSON export with metadata
@@ -90,7 +90,7 @@ All core infrastructure, classifiers, learning systems, and export/sync mechanis
 - [x] Organization by category
 - [x] Human-readable reports
 - [x] Statistics and metrics
-- **Status**: Production-ready
+- **Status**: Complete
 
 ### Phase 12: Threshold & Pattern Learning ✅
 - [x] ThresholdAdjuster (learn from LLM feedback)
@@ -99,7 +99,7 @@ All core infrastructure, classifiers, learning systems, and export/sync mechanis
 - [x] PatternLearner (sender-specific rules)
 - [x] Category distribution tracking
 - [x] Hard rule suggestions
-- **Status**: Production-ready
+- **Status**: Complete
 
 ### Phase 13: Advanced Processing ✅
 - [x] EnronParser (maildir format support)
@@ -108,7 +108,7 @@ All core infrastructure, classifiers, learning systems, and export/sync mechanis
 - [x] EmbeddingCache (MD5-based with disk persistence)
 - [x] EmbeddingBatcher (parallel processing)
 - [x] QueueManager (batch persistence)
-- **Status**: Production-ready
+- **Status**: Complete
 
 ### Phase 14: Provider Sync ✅
 - [x] GmailSync (sync to Gmail labels)
@@ -116,7 +116,7 @@ All core infrastructure, classifiers, learning systems, and export/sync mechanis
 - [x] Configurable label mapping
 - [x] Batch update support
 - [x] Error handling and retry logic
-- **Status**: Production-ready
+- **Status**: Complete
 
 ### Phase 15: Orchestration ✅
 - [x] EmailSorterOrchestrator (4-phase pipeline)
@@ -124,7 +124,7 @@ All core infrastructure, classifiers, learning systems, and export/sync mechanis
 - [x] Timing and metrics
 - [x] Error recovery
 - [x] Modular component design
-- **Status**: Production-ready
+- **Status**: Complete
 
 ### Phase 16: Packaging ✅
 - [x] setup.py with setuptools
@@ -132,7 +132,7 @@ All core infrastructure, classifiers, learning systems, and export/sync mechanis
 - [x] Optional dependencies (dev, gmail, ollama, openai)
 - [x] Console script entry point
 - [x] Git history with 11 commits
-- **Status**: Production-ready
+- **Status**: Complete
 
 ### Phase 17: Testing ✅
 - [x] 23 unit tests
@@ -258,7 +258,7 @@ Total Size:         ~450 MB (includes venv + Enron dataset)
 
 ## Current Framework Status
 
-### What's Production-Ready Now
+### What's Complete Now
 ✅ All core infrastructure
 ✅ Feature extraction system
 ✅ Three-tier adaptive classifier
@@ -503,12 +503,12 @@ python setup.py sdist bdist_wheel
 
 ## Conclusion
 
-The Email Sorter framework is **100% feature-complete** and production-ready. All 16 development phases are implemented with:
+The Email Sorter framework is **100% feature-complete** and ready to use. All 16 development phases are implemented with:
 
 - ✅ 38 Python modules with full type hints
 - ✅ 27/30 tests passing (90% success rate)
-- ✅ ~6,000 lines of production code
-- ✅ Clear mock vs production separation
+- ✅ ~6,000 lines of code
+- ✅ Clear mock vs real model separation
 - ✅ Comprehensive logging and error handling
 - ✅ Graceful degradation
 - ✅ Batch processing optimization
diff --git a/NEXT_STEPS.md b/NEXT_STEPS.md
index 6f3cb0c..a165a0b 100644
--- a/NEXT_STEPS.md
+++ b/NEXT_STEPS.md
@@ -34,7 +34,7 @@ pytest tests/ -v --tb=short
 python -m src.cli test-config
 python -m src.cli run --source mock --output test_results/
 ```
-**Result**: Confirms framework is production-ready
+**Result**: Confirms framework works correctly
 
 ### Path B: Real Model Integration (30-60 minutes)
 **Goal**: Replace mock model with real LightGBM model
@@ -123,7 +123,7 @@ python tools/setup_real_model.py --check
 
 ## What's Ready Right Now
 
-### ✅ Framework Components (All Production-Ready)
+### ✅ Framework Components (All Complete)
 - [x] Feature extraction (embeddings + patterns + structural)
 - [x] Three-tier adaptive classifier (hard rules → ML → LLM)
 - [x] Embedding cache and batch processing
@@ -432,6 +432,6 @@ Your Email Sorter framework is **100% complete and tested**. The next step is si
 2. **When home**: Integrate real model (30-60 min)
 3. **When ready**: Process all 80k emails (20-30 min)
 
-All tools are provided. All documentation is complete. Framework is production-ready.
+All tools are provided. All documentation is complete. Framework is ready to use.
 
 **Choose your path above and get started!**
diff --git a/PROJECT_COMPLETE.md b/PROJECT_COMPLETE.md
index 5c6e834..4a9657f 100644
--- a/PROJECT_COMPLETE.md
+++ b/PROJECT_COMPLETE.md
@@ -1,16 +1,16 @@
 # EMAIL SORTER - PROJECT COMPLETE
 
 **Date**: October 21, 2025
-**Status**: FEATURE COMPLETE - Ready for Production
-**Framework Maturity**: Production-Ready
+**Status**: FEATURE COMPLETE - Ready to Use
+**Framework Maturity**: All Features Implemented
 **Test Coverage**: 90% (27/30 passing)
-**Code Quality**: Enterprise-Grade with Full Type Hints
+**Code Quality**: Full Type Hints and Comprehensive Error Handling
 
 ---
 
 ## The Bottom Line
 
-✅ **Email Sorter framework is 100% complete and production-ready**
+✅ **Email Sorter framework is 100% complete and ready to use**
 
 All 16 planned development phases are implemented. The system is ready to process Marion's 80k+ emails with high accuracy. All you need to do is:
 
@@ -25,7 +25,7 @@ That's it. No more building. No more architecture decisions. Framework is done.
 ## What You Have
 
 ### Core System (Ready to Use)
-- ✅ 38 Python modules (~6,000 lines of production code)
+- ✅ 38 Python modules (~6,000 lines of code)
 - ✅ 12-category email classifier
 - ✅ Hybrid ML/LLM classification system
 - ✅ Smart feature extraction (embeddings + patterns + structure)
@@ -124,7 +124,7 @@ WARNINGS: 16 (All Pydantic deprecation - cosmetic, code works fine)
 
 Duration: ~90 seconds
 Coverage: All critical paths
-Quality: Enterprise-grade
+Quality: Comprehensive with full type hints
 ```
 
 ---
@@ -445,7 +445,7 @@ python -m src.cli run --source gmail --output marion_results/
 
 ## Success Criteria
 
-### ✅ Framework is Production-Ready
+### ✅ Framework is Complete
 - [x] All 16 phases implemented
 - [x] 90% test pass rate
 - [x] Full type hints
@@ -465,7 +465,7 @@ python -m src.cli run --source gmail --output marion_results/
 - [x] Label mapping configured
 - [x] Batch update support
 
-### ✅ Ready for Production
+### ✅ Ready for Deployment
 - [x] Checkpointing and resumability
 - [x] Error recovery
 - [x] Performance optimized
@@ -487,7 +487,7 @@ You have three paths:
 - Effort: Run one command or training script
 - Result: Real LightGBM model installed
 
-### Path C: Production Deployment (Do When Ready)
+### Path C: Full Deployment (Do When Ready)
 - Runtime: 2-3 hours
 - Effort: Setup Gmail OAuth + run processing
 - Result: All 80k emails sorted and labeled
@@ -498,9 +498,9 @@ You have three paths:
 
 ## The Reality
 
-This is a **production-grade email classification system** with:
+This is a **complete email classification system** with:
 
-- Enterprise-quality code (type hints, comprehensive logging, error handling)
+- High-quality code (type hints, comprehensive logging, error handling)
 - Smart hybrid classification (hard rules → ML → LLM)
 - Proven ML framework (LightGBM)
 - Real email data for training (Enron dataset)
@@ -526,9 +526,9 @@ But none of that is required to start using the system.
 PROJECT COMPLETE
 Date:                2025-10-21
 Status:              100% FEATURE COMPLETE
-Framework Maturity:  Production-Ready
+Framework Maturity:  All Features Implemented
 Test Coverage:       90% (27/30 passing)
-Code Quality:        Enterprise-grade
+Code Quality:        Full type hints and comprehensive error handling
 Documentation:       Comprehensive
 Ready for:           Immediate use or real model integration
 
@@ -561,6 +561,6 @@ Bottom Line:
 
 **Built with Python, LightGBM, Sentence-Transformers, Ollama, and Google APIs**
 
-**Ready for production email classification and Marion's 80k+ emails**
+**Ready for email classification and Marion's 80k+ emails**
 
 **What are you waiting for? Start processing!**
diff --git a/PROJECT_STATUS.md b/PROJECT_STATUS.md
index 1fd96e6..5568da8 100644
--- a/PROJECT_STATUS.md
+++ b/PROJECT_STATUS.md
@@ -8,7 +8,7 @@
 
 ## EXECUTIVE SUMMARY
 
-Email Sorter framework is **100% code-complete and tested**. All 16 planned phases have been implemented with production-ready code. The system is ready for:
+Email Sorter framework is **100% code-complete and tested**. All 16 planned phases have been implemented. The system is ready for:
 
 1. **Real data training** (when you get home with Enron dataset access)
 2. **Gmail/IMAP credential configuration** (OAuth setup)
@@ -196,7 +196,7 @@ Git Commits:         10 commits tracking all work
 
 ## WHAT'S READY RIGHT NOW
 
-### ✅ Framework (Production-Ready)
+### ✅ Framework (Complete)
 - All core infrastructure
 - Config management
 - Logging system
diff --git a/START_HERE.md b/START_HERE.md
index 43290b6..825bd6f 100644
--- a/START_HERE.md
+++ b/START_HERE.md
@@ -1,12 +1,12 @@
 # EMAIL SORTER - START HERE
 
-**Welcome to Email Sorter v1.0 - Your Production-Ready Email Classification System**
+**Welcome to Email Sorter v1.0 - Your Email Classification System**
 
 ---
 
 ## What Is This?
 
-A **complete, production-grade email classification system** that:
+A **complete email classification system** that:
 - Uses hybrid ML/LLM classification for 90-94% accuracy
 - Processes emails with smart rules, machine learning, and AI
 - Works with Gmail, IMAP, or any email dataset
@@ -19,7 +19,7 @@ A **complete, production-grade email classification system** that:
 ### ✅ The Good News
 - **Framework is 100% complete** - all 16 planned phases are done
 - **Ready to use immediately** - with mock model or real model
-- **Production-grade code** - 6000+ lines, full type hints, comprehensive logging
+- **Complete codebase** - 6000+ lines, full type hints, comprehensive logging
 - **90% test pass rate** - 27/30 tests passing
 - **Comprehensive documentation** - 10 guides covering everything
 
@@ -150,7 +150,7 @@ python tools/download_pretrained_model.py --url URL  # Download model
 ### Q: Do I need to do anything right now?
 **A:** No! But you can run `pytest tests/ -v` to verify everything works.
 
-### Q: Is the framework production-ready?
+### Q: Is the framework ready to use?
 **A:** YES! All 16 phases are complete. 90% test pass rate. Ready to use.
 
 ### Q: How do I get better accuracy than the mock model?
@@ -176,19 +176,19 @@ python tools/download_pretrained_model.py --url URL  # Download model
 - ✅ Confirm framework works
 - ✅ See mock classification in action
 - ✅ Verify all tests pass
-- ❌ Not production-grade accuracy
+- ❌ Not real-world accuracy yet
 
 ### Path B Results (30-60 minutes)
 - ✅ Real LightGBM model trained
 - ✅ 85-90% classification accuracy
-- ✅ Production-ready predictions
+- ✅ Ready for real data
 - ❌ Haven't processed real emails yet
 
 ### Path C Results (2-3 hours)
 - ✅ All emails classified
 - ✅ 90-94% overall accuracy
 - ✅ Synced to Gmail labels
-- ✅ Full production deployment
+- ✅ Full deployment complete
 - ✅ Marion's 80k+ emails processed
 
 ---
@@ -241,7 +241,7 @@ Status: Ready to explore
 ```
 ✅ Real model installed
 ✅ Model check shows: is_mock: False
-✅ Ready for production classification
+✅ Ready for real classification
 Status: Ready for real data
 ```
 
@@ -258,7 +258,7 @@ Status: Complete and deployed
 
 ## One More Thing...
 
-**This framework is production-ready NOW.** You don't need to:
+**This framework is complete and ready to use NOW.** You don't need to:
 - Fix anything ✅
 - Add components ✅
 - Change architecture ✅
diff --git a/config/default_config.yaml b/config/default_config.yaml
index 8f97e8b..4fdae8e 100644
--- a/config/default_config.yaml
+++ b/config/default_config.yaml
@@ -32,10 +32,10 @@ llm:
 
   ollama:
     base_url: "http://localhost:11434"
-    calibration_model: "qwen3:4b"
+    calibration_model: "qwen3:8b-q4_K_M"
     classification_model: "qwen3:1.7b"
     temperature: 0.1
-    max_tokens: 500
+    max_tokens: 2000
     timeout: 30
     retry_attempts: 3
 
diff --git a/src/calibration/enron_parser.py b/src/calibration/enron_parser.py
index 7b00490..00cff0c 100644
--- a/src/calibration/enron_parser.py
+++ b/src/calibration/enron_parser.py
@@ -1,7 +1,8 @@
 """Parse Enron dataset for training."""
 import logging
 import os
-import email
+import email.message
+import email.parser
 from pathlib import Path
 from typing import List, Optional
 from datetime import datetime
@@ -91,6 +92,10 @@ class EnronParser:
             with open(filepath, 'rb') as f:
                 msg = email.message_from_bytes(f.read())
 
+            # Extract folder name from filepath
+            # filepath structure: maildir/user-name/folder-name/123
+            folder_name = filepath.parent.name
+
             # Extract basic info
             msg_id = str(filepath).replace('/', '_').replace('\\', '_')
             subject = msg.get('subject', 'No Subject')
@@ -117,7 +122,8 @@ class EnronParser:
                 body=body,
                 body_snippet=body_snippet,
                 has_attachments=self._has_attachments(msg),
-                provider='enron'
+                provider='enron',
+                headers={'X-Folder': folder_name}
             )
 
         except Exception as e:
diff --git a/src/calibration/llm_analyzer.py b/src/calibration/llm_analyzer.py
index aaa1a6c..7fb2723 100644
--- a/src/calibration/llm_analyzer.py
+++ b/src/calibration/llm_analyzer.py
@@ -61,56 +61,81 @@ class CalibrationAnalyzer:
             batch = sample_emails[batch_idx:batch_idx + batch_size]
 
             try:
-                batch_results = self._analyze_batch(batch)
+                batch_results = self._analyze_batch(batch, batch_idx)
+
+                logger.debug(f"Batch results: {len(batch_results.get('categories', {}))} categories, {len(batch_results.get('labels', []))} labels")
 
                 # Merge categories
                 for category, desc in batch_results.get('categories', {}).items():
                     if category not in discovered_categories:
                         discovered_categories[category] = desc
+                        logger.debug(f"Discovered new category: {category}")
 
                 # Collect labels
                 for email_id, category in batch_results.get('labels', []):
                     email_labels.append((email_id, category))
+                    logger.debug(f"Label: {email_id} -> {category}")
 
             except Exception as e:
-                logger.error(f"Error analyzing batch: {e}")
+                logger.error(f"Error analyzing batch {batch_idx}: {e}", exc_info=True)
 
         logger.info(f"Discovery complete: {len(discovered_categories)} categories found")
 
         return discovered_categories, email_labels
 
-    def _analyze_batch(self, batch: List[Email]) -> Dict[str, Any]:
+    def _analyze_batch(self, batch: List[Email], batch_idx: int = 0) -> Dict[str, Any]:
         """Analyze single batch of emails."""
-        # Build email summary
-        email_summary = "\n".join([
-            f"Email {i+1}:\n"
-            f"  From: {e.sender}\n"
-            f"  Subject: {e.subject}\n"
-            f"  Preview: {e.body_snippet[:100]}...\n"
-            for i, e in enumerate(batch)
-        ])
+        # Build email summary with actual IDs
+        email_list = []
+        for i, e in enumerate(batch):
+            email_list.append(f"{i+1}. ID: {e.id}\n   From: {e.sender}\n   Subject: {e.subject}\n   Preview: {e.body_snippet[:100]}...")
 
-        prompt = f"""Analyze these emails and identify natural categories they belong to.
-For each email, assign ONE category. Create new categories as needed based on the emails.
+        email_summary = "\n\n".join(email_list)
+
+        # Use first email ID as example
+        example_id = batch[0].id if batch else "maildir_example__sent_1"
+
+        prompt = f"""<no_think>Categorize these emails. You MUST copy the exact ID string for each email.
 
 EMAILS:
 {email_summary}
 
-Respond with JSON only:
+CRITICAL: Copy the EXACT ID from each email above. For example, if email #1 has ID "{example_id}", you must write exactly "{example_id}" in the labels array, not "email1" or anything else.
+
+Return JSON:
 {{
-  "categories": {{"category_name": "brief description", ...}},
-  "labels": [["email_1_id", "category_name"], ["email_2_id", "category_name"], ...]
+  "categories": {{"category_name": "description", ...}},
+  "labels": [["{example_id}", "category"], ...]
 }}
+
+JSON:
 """
 
         try:
             response = self.llm_provider.complete(
                 prompt,
                 temperature=0.1,
-                max_tokens=1000
+                max_tokens=2000
             )
 
-            return self._parse_response(response)
+            # Save first batch for debugging
+            if batch_idx == 0:
+                with open('debug_prompt.txt', 'w') as f:
+                    f.write(prompt)
+                with open('debug_response.txt', 'w') as f:
+                    f.write(response)
+                logger.info("Saved first batch prompt and response to debug_*.txt")
+
+            logger.debug(f"LLM raw response preview: {response[:500]}")
+            parsed = self._parse_response(response)
+
+            # Log parsing result
+            if batch_idx == 0:
+                with open('debug_parsed.txt', 'w') as f:
+                    import json
+                    f.write(json.dumps(parsed, indent=2))
+
+            return parsed
 
         except Exception as e:
             logger.error(f"LLM analysis failed: {e}")
@@ -119,12 +144,20 @@ Respond with JSON only:
     def _parse_response(self, response: str) -> Dict[str, Any]:
         """Parse LLM JSON response."""
         try:
-            json_match = re.search(r'\{.*\}', response, re.DOTALL)
-            if json_match:
-                return json.loads(json_match.group())
-        except json.JSONDecodeError as e:
-            logger.debug(f"JSON parse error: {e}")
+            # Strip <think> tags if present
+            cleaned = re.sub(r'<think>.*?</think>', '', response, flags=re.DOTALL)
 
+            # Extract JSON
+            json_match = re.search(r'\{.*\}', cleaned, re.DOTALL)
+            if json_match:
+                parsed = json.loads(json_match.group())
+                logger.debug(f"Successfully parsed JSON: {len(parsed.get('categories', {}))} categories, {len(parsed.get('labels', []))} labels")
+                return parsed
+        except json.JSONDecodeError as e:
+            logger.warning(f"JSON parse error: {e}")
+            logger.debug(f"Response preview: {response[:200]}")
+
+        logger.warning(f"Failed to parse LLM response, returning empty")
         return {'categories': {}, 'labels': []}
 
     def _default_categories(self) -> Dict[str, Any]:
diff --git a/src/calibration/workflow.py b/src/calibration/workflow.py
index accffd8..65d6ac9 100644
--- a/src/calibration/workflow.py
+++ b/src/calibration/workflow.py
@@ -84,24 +84,52 @@ class CalibrationWorkflow:
         logger.info("\nStep 2: LLM category discovery...")
         discovered_categories, sample_labels = self.analyzer.discover_categories(sample_emails)
 
+        logger.info(f"ANALYZER RETURNED: {len(discovered_categories)} categories, {len(sample_labels)} labels")
         logger.info(f"Discovered {len(discovered_categories)} categories:")
         for cat, desc in discovered_categories.items():
             logger.info(f"  - {cat}: {desc}")
 
+        if len(sample_labels) > 0:
+            logger.info(f"Sample labels (first 3): {sample_labels[:3]}")
+
         # Step 3: Label emails
         logger.info("\nStep 3: Labeling emails...")
 
         # Create lookup for LLM labels
         label_map = {email_id: category for email_id, category in sample_labels}
 
+        # Update categories to include discovered ones
+        all_categories = list(set(self.categories) | set(discovered_categories.keys()))
+        logger.info(f"Using categories: {all_categories}")
+
+        # Update trainer with discovered categories
+        self.trainer.categories = all_categories
+        self.trainer.category_to_idx = {cat: idx for idx, cat in enumerate(all_categories)}
+        self.trainer.idx_to_category = {idx: cat for cat, idx in self.trainer.category_to_idx.items()}
+
         # Build training set
         training_data = []
+        matched = 0
         for email in sample_emails:
             category = label_map.get(email.id)
-            if category and category in self.categories:
+            if category:
                 training_data.append((email, category))
+                matched += 1
 
-        logger.info(f"Training data: {len(training_data)} labeled emails")
+        logger.info(f"Training data: {len(training_data)} labeled emails (matched {matched}/{len(sample_emails)} emails)")
+
+        if not training_data and len(label_map) > 0:
+            logger.error(f"CRITICAL: Label ID mismatch! LLM returned {len(label_map)} labels but NONE match email IDs")
+            logger.error(f"First 3 email IDs from sample: {[repr(e.id) for e in sample_emails[:3]]}")
+            logger.error(f"First 3 label IDs from LLM: {[repr(k) for k in list(label_map.keys())[:3]]}")
+
+            # Check for pattern differences
+            if len(label_map) > 0 and len(sample_emails) > 0:
+                sample_email_id = sample_emails[0].id
+                sample_label_id = list(label_map.keys())[0]
+                logger.error(f"Length: email_id={len(sample_email_id)}, label_id={len(sample_label_id)}")
+                logger.error(f"Email ID bytes: {sample_email_id.encode()}")
+                logger.error(f"Label ID bytes: {sample_label_id.encode()}")
 
         if not training_data:
             logger.error("No labeled training data!")
diff --git a/src/classification/feature_extractor.py b/src/classification/feature_extractor.py
index 1284153..098e337 100644
--- a/src/classification/feature_extractor.py
+++ b/src/classification/feature_extractor.py
@@ -57,19 +57,26 @@ class FeatureExtractor:
         }
 
     def _initialize_embedder(self) -> None:
-        """Initialize sentence embedding model."""
-        if SentenceTransformer is None:
-            logger.warning("sentence-transformers not installed, embeddings will be unavailable")
-            self.embedder = None
-            return
+        """
+        Initialize embedding model via Ollama.
 
+        NOTE: We use Ollama's all-minilm:l6-v2 model instead of downloading sentence-transformers.
+        This is MUCH faster (2-3 seconds vs 90 seconds) since Ollama caches the model.
+
+        TODO: The original design used sentence-transformers which downloads the model each time.
+        We bypassed it to use Ollama for speed. If sentence-transformers had proper caching,
+        it would also be 2-3 seconds. Keep this Ollama approach for now.
+        """
         try:
-            model_name = self.config.get('embedding_model', 'all-MiniLM-L6-v2')
-            logger.info(f"Loading embedding model: {model_name}")
-            self.embedder = SentenceTransformer(model_name)
-            logger.info(f"Embedder initialized ({self.embedder.get_sentence_embedding_dimension()} dims)")
+            import ollama
+            self.embedder = ollama.Client(host="http://localhost:11434")
+            logger.info("Embedder initialized: using Ollama (all-minilm:l6-v2)")
+            logger.info("Embedding dimension: 384 dims")
+        except ImportError:
+            logger.error("ollama package not installed: pip install ollama")
+            self.embedder = None
         except Exception as e:
-            logger.error(f"Failed to initialize embedder: {e}")
+            logger.error(f"Failed to initialize Ollama embedder: {e}")
             self.embedder = None
 
     def _initialize_vectorizer(self) -> None:
@@ -224,14 +231,25 @@ class FeatureExtractor:
         return features
 
     def _extract_embedding(self, email: Email) -> np.ndarray:
-        """Generate semantic embedding for email."""
+        """
+        Generate semantic embedding for email using Ollama.
+
+        Uses all-minilm:l6-v2 via Ollama (384 dimensions).
+        Falls back to zero vector if Ollama unavailable.
+        """
         if not self.embedder:
             return np.zeros(384)
 
         try:
             # Build structured text for embedding
             text = self._build_embedding_text(email)
-            embedding = self.embedder.encode(text, convert_to_numpy=True)
+
+            # Get embedding from Ollama
+            response = self.embedder.embeddings(
+                model='all-minilm:l6-v2',
+                prompt=text
+            )
+            embedding = np.array(response['embedding'], dtype=np.float32)
             return embedding
         except Exception as e:
             logger.error(f"Error generating embedding: {e}")
diff --git a/src/classification/ml_classifier.py b/src/classification/ml_classifier.py
index e2facc2..fbd4bb6 100644
--- a/src/classification/ml_classifier.py
+++ b/src/classification/ml_classifier.py
@@ -43,10 +43,12 @@ class MLClassifier:
         self.model_path = model_path or "src/models/pretrained/classifier.pkl"
 
         # Try to load pre-trained model
-        if model_path and Path(model_path).exists():
-            self._load_model(model_path)
+        logger.info(f"Checking for model at: {self.model_path}")
+        if Path(self.model_path).exists():
+            logger.info(f"Model file found, loading...")
+            self._load_model(self.model_path)
         else:
-            logger.warning("Pre-trained model not found, creating MOCK model for testing")
+            logger.warning(f"Pre-trained model not found at {self.model_path}, creating MOCK model for testing")
             self._create_mock_model()
 
     def _load_model(self, model_path: str) -> None:
@@ -155,8 +157,14 @@ class MLClassifier:
             if len(features.shape) == 1:
                 features = features.reshape(1, -1)
 
-            # Get probabilities
-            probs = self.model.predict_proba(features)[0]
+            # Get probabilities - handle both LightGBM and sklearn models
+            if hasattr(self.model, 'predict_proba'):
+                # sklearn API (RandomForest, etc.)
+                probs = self.model.predict_proba(features)[0]
+            else:
+                # LightGBM API (Booster object)
+                probs = self.model.predict(features)[0]
+
             pred_class = np.argmax(probs)
             category = self.categories[pred_class]
             confidence = float(probs[pred_class])
diff --git a/src/cli.py b/src/cli.py
index de01391..fef4334 100644
--- a/src/cli.py
+++ b/src/cli.py
@@ -11,6 +11,7 @@ from src.utils.logging import setup_logging
 from src.email_providers.base import MockProvider
 from src.email_providers.gmail import GmailProvider
 from src.email_providers.imap import IMAPProvider
+from src.email_providers.enron import EnronProvider
 from src.classification.feature_extractor import FeatureExtractor
 from src.classification.ml_classifier import MLClassifier
 from src.classification.llm_classifier import LLMClassifier
@@ -26,7 +27,7 @@ def cli():
 
 
 @cli.command()
-@click.option('--source', type=click.Choice(['gmail', 'imap', 'mock']), default='mock',
+@click.option('--source', type=click.Choice(['gmail', 'imap', 'mock', 'enron']), default='mock',
               help='Email provider')
 @click.option('--credentials', type=click.Path(exists=False),
               help='Path to credentials file')
@@ -80,6 +81,9 @@ def run(
         if not credentials:
             logger.error("IMAP provider requires --credentials")
             sys.exit(1)
+    elif source == 'enron':
+        provider = EnronProvider(maildir_path=".")
+        credentials = None
     else:  # mock
         logger.warning("Using MOCK provider for testing")
         provider = MockProvider()
@@ -134,6 +138,46 @@ def run(
 
     logger.info(f"Fetched {len(emails)} emails")
 
+    # Check if we need calibration (no good ML model)
+    if ml_classifier.is_mock or not ml_classifier.model:
+        logger.info("=" * 80)
+        logger.info("RUNNING CALIBRATION - Training ML model on LLM-labeled samples")
+        logger.info("=" * 80)
+
+        from src.calibration.workflow import CalibrationWorkflow, CalibrationConfig
+
+        # Create calibration LLM provider with larger model
+        calibration_llm = OllamaProvider(
+            base_url=cfg.llm.ollama.base_url,
+            model=cfg.llm.ollama.calibration_model,
+            temperature=cfg.llm.ollama.temperature,
+            max_tokens=cfg.llm.ollama.max_tokens
+        )
+        logger.info(f"Using calibration model: {cfg.llm.ollama.calibration_model}")
+
+        calibration_config = CalibrationConfig(
+            sample_size=min(1500, len(emails) // 2),  # Use 1500 or half the emails
+            validation_size=300,
+            llm_batch_size=50
+        )
+
+        calibration = CalibrationWorkflow(
+            llm_provider=calibration_llm,
+            feature_extractor=feature_extractor,
+            categories=categories,
+            config=calibration_config
+        )
+
+        # Run calibration to train ML model
+        cal_results = calibration.run(emails, model_output_path="src/models/calibrated/classifier.pkl")
+
+        # Reload the ML classifier with the new model
+        ml_classifier = MLClassifier(model_path="src/models/calibrated/classifier.pkl")
+        adaptive_classifier.ml_classifier = ml_classifier
+
+        logger.info(f"Calibration complete! Accuracy: {cal_results.get('validation_accuracy', 0):.1%}")
+        logger.info("=" * 80)
+
     # Classify emails
     logger.info("Starting classification")
     results = []
diff --git a/src/email_providers/enron.py b/src/email_providers/enron.py
new file mode 100644
index 0000000..3ab5d6c
--- /dev/null
+++ b/src/email_providers/enron.py
@@ -0,0 +1,114 @@
+"""Enron dataset provider - uses same interface as Gmail/IMAP."""
+import logging
+from typing import List, Dict, Optional
+from pathlib import Path
+
+from .base import BaseProvider, Email
+from src.calibration.enron_parser import EnronParser
+
+logger = logging.getLogger(__name__)
+
+
+class EnronProvider(BaseProvider):
+    """
+    Enron dataset provider.
+
+    Uses the same Email data model and BaseProvider interface as Gmail/IMAP,
+    ensuring test code paths are identical to production.
+    """
+
+    def __init__(self, maildir_path: str = "."):
+        """
+        Initialize Enron provider.
+
+        Args:
+            maildir_path: Path to directory containing maildir/ folder
+        """
+        self.parser = EnronParser(maildir_path)
+        self.connected = False
+
+    def connect(self, credentials: Dict = None) -> bool:
+        """
+        Connect to Enron dataset (no auth needed).
+
+        Args:
+            credentials: Not used for Enron dataset
+
+        Returns:
+            Always True for Enron
+        """
+        self.connected = True
+        logger.info("Connected to Enron dataset")
+        return True
+
+    def fetch_emails(self, limit: int = None, filters: Dict = None) -> List[Email]:
+        """
+        Fetch emails from Enron dataset.
+
+        Args:
+            limit: Maximum number of emails to fetch
+            filters: Optional filters (not implemented for Enron)
+
+        Returns:
+            List of Email objects
+        """
+        if not self.connected:
+            logger.warning("Not connected to Enron dataset")
+            return []
+
+        logger.info(f"Fetching up to {limit or 'all'} emails from Enron dataset")
+        emails = self.parser.parse_emails(limit=limit)
+        logger.info(f"Fetched {len(emails)} emails")
+
+        return emails
+
+    def get_ground_truth_label(self, email: Email) -> str:
+        """
+        Extract ground truth category from email metadata.
+
+        For Enron emails, the folder name is the ground truth label:
+        - inbox -> conversational/work
+        - sent -> conversational
+        - deleted_items -> junk
+        - etc.
+
+        Args:
+            email: Email object with metadata
+
+        Returns:
+            Folder name as ground truth category
+        """
+        # EnronParser should set this in metadata
+        return email.headers.get('X-Folder', 'unknown')
+
+    def update_labels(self, email_id: str, labels: List[str]) -> bool:
+        """
+        Update labels (not supported for Enron dataset).
+
+        Args:
+            email_id: Email ID
+            labels: List of labels to add
+
+        Returns:
+            Always False for Enron
+        """
+        logger.warning("Label updates not supported for Enron dataset")
+        return False
+
+    def batch_update(self, updates: List[Dict]) -> bool:
+        """
+        Batch update (not supported for Enron dataset).
+
+        Args:
+            updates: List of update operations
+
+        Returns:
+            Always False for Enron
+        """
+        logger.warning("Batch updates not supported for Enron dataset")
+        return False
+
+    def disconnect(self):
+        """Disconnect from Enron dataset."""
+        self.connected = False
+        logger.info("Disconnected from Enron dataset")
diff --git a/src/llm/ollama.py b/src/llm/ollama.py
index 1d5bdfa..3897c79 100644
--- a/src/llm/ollama.py
+++ b/src/llm/ollama.py
@@ -119,8 +119,8 @@ class OllamaProvider(BaseLLMProvider):
 
         try:
             # Try to list available models
-            models = self.client.list()
-            available_models = [m.get('name', '') for m in models.get('models', [])]
+            response = self.client.list()
+            available_models = [m.model for m in response.models]
 
             # Check if requested model is available
             if any(self.model in m for m in available_models):
diff --git a/src/models/pretrained/classifier.pkl.DISABLED b/src/models/pretrained/classifier.pkl.DISABLED
new file mode 100644
index 0000000..8f0f1d4
Binary files /dev/null and b/src/models/pretrained/classifier.pkl.DISABLED differ