email-sorter

Brett Fox ee6c27693d Add queue management, embedding optimization, and calibration workflow

Queue Manager (queue_manager.py)
- LLMQueue: Manage emails awaiting LLM review
  * Batching with configurable batch size
  * Persistence to disk (JSON format)
  * Retry management (up to 3 retries)
  * Status tracking: queued, processing, completed, failed
  * Statistics tracking
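
A minimal sketch of how a queue with these properties could look. This is not the code in queue_manager.py: the class name LLMQueueSketch, its method names, the "queued" status label, and the JSON layout are all assumptions made for illustration. Only the four behaviours (batching, JSON persistence, a cap of 3 retries, per-status statistics) come from the list above.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from pathlib import Path

MAX_RETRIES = 3  # retry cap stated in the commit message


@dataclass
class QueuedEmail:
    email_id: str
    status: str = "queued"  # queued | processing | completed | failed
    retries: int = 0


class LLMQueueSketch:
    """Hypothetical stand-in for LLMQueue; names and layout are assumed."""

    def __init__(self, path, batch_size=10):
        self.path = Path(path)
        self.batch_size = batch_size
        self.items = {}
        # Reload any state persisted by an earlier run.
        if self.path.exists():
            for row in json.loads(self.path.read_text()):
                self.items[row["email_id"]] = QueuedEmail(**row)

    def add(self, email_id):
        self.items.setdefault(email_id, QueuedEmail(email_id))
        self._save()

    def next_batch(self):
        # Hand out at most batch_size queued emails and mark them in flight.
        batch = [e for e in self.items.values() if e.status == "queued"]
        batch = batch[: self.batch_size]
        for item in batch:
            item.status = "processing"
        self._save()
        return [item.email_id for item in batch]

    def mark_completed(self, email_id):
        self.items[email_id].status = "completed"
        self._save()

    def mark_failed(self, email_id):
        # Requeue until the retry budget is spent, then fail permanently.
        item = self.items[email_id]
        if item.retries < MAX_RETRIES:
            item.retries += 1
            item.status = "queued"
        else:
            item.status = "failed"
        self._save()

    def stats(self):
        return dict(Counter(e.status for e in self.items.values()))

    def _save(self):
        rows = [asdict(e) for e in self.items.values()]
        self.path.write_text(json.dumps(rows))
```

Writing the whole file on every state change keeps the sketch short; a real queue would likely batch writes or use an append-only log.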

Embedding Cache & Batch Processing (embedding_cache.py)
- EmbeddingCache: Cache embeddings by text hash
  * MD5 hashing of text
  * Memory and disk caching
  * Cache hit/miss statistics
  * Persistent storage support
- EmbeddingBatcher: Efficient batch embedding generation
  * Parallel batch processing
  * Cache-aware to avoid recomputation
  * Configurable batch size
  * Error handling with zero-vector fallback
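
The caching and batching behaviour described above can be sketched roughly as follows. The names EmbeddingCacheSketch and embed_batched, the embed_fn callback, the dim parameter, and the JSON on-disk format are assumptions for illustration, not the API of embedding_cache.py. The sketch also runs batches sequentially rather than in parallel, and it assumes the "zero fallback" means returning an all-zero vector when the embedding call raises.

```python
import hashlib
import json
from pathlib import Path


class EmbeddingCacheSketch:
    """Hypothetical cache keyed by the MD5 hex digest of the text."""

    def __init__(self, path=None):
        self.path = Path(path) if path else None
        self.hits = 0
        self.misses = 0
        self.store = {}
        # Warm the in-memory store from disk if a cache file exists.
        if self.path and self.path.exists():
            self.store = json.loads(self.path.read_text())

    @staticmethod
    def key(text):
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    def get(self, text):
        vec = self.store.get(self.key(text))
        if vec is None:
            self.misses += 1
        else:
            self.hits += 1
        return vec

    def put(self, text, vec):
        self.store[self.key(text)] = list(vec)

    def save(self):
        if self.path:
            self.path.write_text(json.dumps(self.store))


def embed_batched(texts, embed_fn, cache, dim, batch_size=32):
    """Embed texts in batches, calling embed_fn only for cache misses.

    embed_fn is an assumed callback: list[str] -> list of vectors.
    """
    results = [None] * len(texts)
    pending = []
    for i, text in enumerate(texts):
        results[i] = cache.get(text)
        if results[i] is None:
            pending.append(i)

    for start in range(0, len(pending), batch_size):
        chunk = pending[start:start + batch_size]
        try:
            vectors = embed_fn([texts[i] for i in chunk])
        except Exception:
            # Zero-vector fallback; deliberately not cached so a later
            # call can retry the real embedding.
            vectors = [[0.0] * dim for _ in chunk]
        else:
            for i, vec in zip(chunk, vectors):
                cache.put(texts[i], vec)
        for i, vec in zip(chunk, vectors):
            results[i] = list(vec)
    return results
```

Keeping fallback vectors out of the cache is a design choice of this sketch: a transient failure then costs one bad vector instead of poisoning every later lookup for that text.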

Calibration Workflow (workflow.py)
- CalibrationWorkflow: Complete end-to-end calibration
  * Step 1: Stratified email sampling
  * Step 2: LLM category discovery
  * Step 3: Label emails from discovery
  * Step 4: Train LightGBM model
  * Step 5: Validate on held-out set
  * Save trained model
- CalibrationConfig: Configurable workflow parameters
  * Sample size (1500)
  * Validation size (300)
  * Model hyperparameters
  * LLM batch size
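
As an illustration of Step 1 and the hold-out split used in Step 5, here is a small standard-library sketch. It is not the code in workflow.py. The sample size (1500) and validation size (300) come from the list above; everything else is assumed: the names CalibrationConfigSketch, stratified_sample, and split_sample, the llm_batch_size and seed defaults, the choice of stratum (supplied by the caller), and the idea that the 300 validation emails are carved out of the 1500 sampled ones.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CalibrationConfigSketch:
    sample_size: int = 1500     # from the commit message
    validation_size: int = 300  # from the commit message
    llm_batch_size: int = 50    # assumed; the source gives no value
    seed: int = 0               # assumed, for reproducible sampling


def stratified_sample(emails, stratum_of, n, rng):
    """Pick n emails so each stratum keeps roughly its share of the corpus."""
    if not emails:
        return []
    groups = defaultdict(list)
    for email in emails:
        groups[stratum_of(email)].append(email)

    total = len(emails)
    n = min(n, total)
    picked, leftovers = [], []
    for members in groups.values():
        rng.shuffle(members)
        share = (len(members) * n) // total  # proportional, rounded down
        picked.extend(members[:share])
        leftovers.extend(members[share:])

    # Rounding down can leave a few slots open; fill them at random.
    rng.shuffle(leftovers)
    picked.extend(leftovers[: n - len(picked)])
    return picked


def split_sample(emails, stratum_of, cfg):
    """Return (training emails, held-out validation emails)."""
    rng = random.Random(cfg.seed)
    sample = stratified_sample(emails, stratum_of, cfg.sample_size, rng)
    rng.shuffle(sample)
    return sample[cfg.validation_size:], sample[: cfg.validation_size]
```

With the defaults this yields 1200 emails for labeling and training and 300 for validation. Steps 2 through 4 (LLM category discovery, labeling, LightGBM training) are omitted because they depend on services and libraries the listing does not detail.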

ALL PREVIOUSLY MISSING COMPONENTS ARE NOW COMPLETE:
✅ Threshold adjustment (learns from LLM)
✅ Pattern learning (sender-specific rules)
✅ Attachment analysis (PDF, DOCX, etc.)
✅ Real model trainer (LightGBM)
✅ Provider sync (Gmail + IMAP)
✅ Queue management (batching + persistence)
✅ Embedding optimization (caching + batching)
✅ Complete calibration workflow

SYSTEM NOW COMPLETE WITH ALL COMPONENTS

Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>

2025-10-21 12:00:26 +11:00

adjustment

CRITICAL: Add missing Phase 12 modules and advanced features

2025-10-21 11:59:25 +11:00

calibration

Add queue management, embedding optimization, and calibration workflow

2025-10-21 12:00:26 +11:00

classification

Add queue management, embedding optimization, and calibration workflow

2025-10-21 12:00:26 +11:00

email_providers

Complete IMAP provider import fixes - all type hints now use Message instead of email.message.Message