email-sorter

BobAi/email-sorter

Commit Graph

Author: Brett Fox
SHA1: 02be616c5c
Message:

Phase 9-14: Complete processing pipeline, calibration, export, and orchestration

PHASE 9: Processing Pipeline & Queue Management (bulk_processor.py)
- BulkProcessor class for batch processing with checkpointing
- ProcessingCheckpoint: Save/resume state for resumable processing
- Handles batches with periodic checkpoints every N emails
- Tracks completed, queued_for_llm, and failed emails
- Progress callbacks for UI integration
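
The checkpointing behaviour described for Phase 9 could look roughly like the sketch below. It is not the code in bulk_processor.py: `CheckpointSketch`, `process_with_checkpoints`, the JSON file layout, and the convention that a classifier returning `None` means "defer to the LLM" are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class CheckpointSketch:
    """Hypothetical resumable state; the three lists mirror the commit message."""
    completed: list = field(default_factory=list)
    queued_for_llm: list = field(default_factory=list)
    failed: list = field(default_factory=list)

    def save(self, path):
        Path(path).write_text(json.dumps(asdict(self)))

    @classmethod
    def load(cls, path):
        path = Path(path)
        return cls(**json.loads(path.read_text())) if path.exists() else cls()


def process_with_checkpoints(email_ids, classify, path, every=100, on_progress=None):
    """Process a list of ids, saving state every `every` emails."""
    state = CheckpointSketch.load(path)
    done = set(state.completed) | set(state.queued_for_llm) | set(state.failed)
    since_save = 0
    for email_id in email_ids:
        if email_id in done:
            continue  # already handled in an earlier run
        try:
            label = classify(email_id)
        except Exception:
            state.failed.append(email_id)
        else:
            # Assumption: None means "not confident", so defer to the LLM.
            target = state.queued_for_llm if label is None else state.completed
            target.append(email_id)
        done.add(email_id)
        since_save += 1
        if since_save >= every:
            state.save(path)
            since_save = 0
        if on_progress is not None:
            on_progress(len(done), len(email_ids))
    state.save(path)  # final save so the last partial batch is not lost
    return state
```

Calling the function a second time with the same checkpoint path skips every id already recorded, which is the resume behaviour the commit message refers to.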

PHASE 10: Calibration System (sampler.py, llm_analyzer.py)
- EmailSampler: Stratified and random sampling
- Stratifies by sender domain type for representativeness
- CalibrationAnalyzer: Use LLM to discover natural categories
- Batched analysis to control LLM load
- Maps discovered categories to universal schema
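
As a rough illustration of stratifying by sender domain type, the sketch below groups senders into three strata and draws a proportional share from each. The stratum names, the `FREE_MAIL` set, and the "at least one per stratum" rule are assumptions, not details taken from sampler.py.

```python
import random
from collections import defaultdict

# Assumption: a tiny illustrative list of free-mail providers.
FREE_MAIL = {"gmail.com", "yahoo.com", "hotmail.com", "aol.com"}


def domain_type(sender):
    """Classify a sender address into a hypothetical domain type."""
    local, _, domain = sender.lower().rpartition("@")
    if local.startswith(("noreply", "no-reply")):
        return "automated"
    if domain in FREE_MAIL:
        return "personal"
    return "corporate"


def stratified_sample(senders, k, seed=0):
    """Draw about k senders, proportionally per stratum, at least one each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sender in senders:
        strata[domain_type(sender)].append(sender)
    sample = []
    for _name, members in sorted(strata.items()):
        quota = max(1, round(k * len(members) / len(senders)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample
```

Because each stratum is rounded separately and small strata are topped up to one item, the result can be slightly larger or smaller than `k`.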

PHASE 11: Export & Reporting (exporter.py)
- ResultsExporter: Export to JSON, CSV, organized by category
- ReportGenerator: Generate human-readable text reports
- Category statistics and method breakdown
- Accuracy metrics and processing time tracking
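
A minimal sketch of the export and reporting idea, using only the standard library. The record fields (`email_id`, `category`, `method`) and the function names are assumptions chosen for illustration; exporter.py may structure its results differently.

```python
import csv
import io
import json
from collections import Counter

FIELDS = ["email_id", "category", "method"]  # assumed record layout


def export_json(results):
    """Serialise the result records as pretty-printed JSON."""
    return json.dumps(results, indent=2)


def export_csv(results):
    """Serialise the result records as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


def text_report(results):
    """Build a plain-text summary with category and method counts."""
    by_category = Counter(r["category"] for r in results)
    by_method = Counter(r["method"] for r in results)
    lines = [f"Total emails: {len(results)}", "By category:"]
    lines += [f"  {name}: {count}" for name, count in sorted(by_category.items())]
    lines.append("By method:")
    lines += [f"  {name}: {count}" for name, count in sorted(by_method.items())]
    return "\n".join(lines)
```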

PHASE 13: Enron Dataset Parser (enron_parser.py)
- Parses Enron maildir format into Email objects
- Handles multipart emails and attachments
- Date parsing with fallback for malformed dates
- Ready to train mock model on real data
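
The multipart handling and date fallback mentioned above can be approximated with Python's standard `email` package, as in the sketch below. The function names and the returned dictionary shape are assumptions; the real enron_parser.py builds Email objects whose fields are not shown in this commit message.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def parse_date(value):
    """Return a datetime, or None when the header is missing or malformed."""
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # fallback for malformed dates


def parse_message(raw):
    """Parse one raw message string into a plain dict (assumed shape)."""
    msg = message_from_string(raw)
    body_parts, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_content_disposition() == "attachment":
            attachments.append(part.get_filename() or "unnamed")
        elif part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                body_parts.append(payload.decode(charset, errors="replace"))
            except LookupError:  # unknown charset name
                body_parts.append(payload.decode("utf-8", errors="replace"))
    return {
        "sender": msg["From"],
        "subject": msg["Subject"],
        "date": parse_date(msg["Date"]),
        "body": "\n".join(body_parts).strip(),
        "attachments": attachments,
    }
```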

PHASE 14: Main Orchestration (orchestration.py)
- EmailSorterOrchestrator: Coordinates entire pipeline
- 4-phase workflow: Calibration → Bulk → LLM → Export
- Lazy initialization of components
- Progress tracking and timing
- Full pipeline runner with resume support
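
One way to combine lazy initialization, per-phase timing, and resume support is sketched below. `PipelineSketch`, the phase names, and the `start_at` parameter are hypothetical and are not the interface of EmailSorterOrchestrator; the phase bodies are placeholders that only record that they ran.

```python
import time
from functools import cached_property

# Assumed phase names, following the 4-phase workflow in the commit message.
PHASES = ("calibration", "bulk", "llm_review", "export")


class PipelineSketch:
    def __init__(self, llm_factory):
        self._llm_factory = llm_factory
        self.timings = {}
        self.log = []

    @cached_property
    def llm(self):
        # Built on first access only, then cached on the instance.
        return self._llm_factory()

    def _calibration(self):
        self.log.append(f"calibration:{self.llm}")

    def _bulk(self):
        self.log.append("bulk")  # placeholder: no LLM needed here

    def _llm_review(self):
        self.log.append(f"llm_review:{self.llm}")

    def _export(self):
        self.log.append("export")

    def run(self, start_at="calibration"):
        """Run phases in order from `start_at`, recording elapsed seconds."""
        for name in PHASES[PHASES.index(start_at):]:
            began = time.perf_counter()
            getattr(self, "_" + name)()
            self.timings[name] = time.perf_counter() - began
        return self.timings
```

Resuming with `run(start_at="bulk")` skips calibration, and the LLM client is constructed only when the review phase first touches `self.llm`.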

Components Now Available:
✅ Sampling (stratified and random)
✅ Calibration (LLM-driven category discovery)
✅ Bulk processing (with checkpointing)
✅ LLM review (batched)
✅ Export (JSON, CSV, by category)
✅ Reporting (text summaries)
✅ Enron parsing (ready for training)
✅ Full orchestration (4 phases)

What's Left (Phases 15-16):
- E2E pipeline tests
- Integration test with Enron data
- setup.py and wheel packaging
- Deployment documentation

Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>

Date: 2025-10-21 11:52:09 +11:00

1 Commit