Brett Fox
ee6c27693d
Add queue management, embedding optimization, and calibration workflow
Queue Manager (queue_manager.py)
- LLMQueue: Manage emails awaiting LLM review
  * Batching with configurable batch size
  * Persistence to disk (JSON format)
  * Retry management (up to 3 retries)
  * Status tracking: queue, processing, completed, failed
  * Statistics tracking
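
A minimal sketch of how a queue with these properties could look. This is not the code in queue_manager.py: the class name `LLMQueueSketch`, the `QueueItem` fields, the method names, and the exact status strings are all assumptions made for illustration. Only the batching, JSON persistence, 3-retry limit, and status/statistics tracking come from the list above.

```python
import json
from collections import Counter
from dataclasses import dataclass, asdict
from pathlib import Path

MAX_RETRIES = 3  # "up to 3 retries" per the commit message


@dataclass
class QueueItem:
    email_id: str
    status: str = "queued"  # assumed names: queued -> processing -> completed | failed
    retries: int = 0


class LLMQueueSketch:
    """Hypothetical stand-in for LLMQueue; not the actual implementation."""

    def __init__(self, path, batch_size=10):
        self.path = Path(path)
        self.batch_size = batch_size
        self.items = {}
        # Reload any state persisted by an earlier run.
        if self.path.exists():
            for raw in json.loads(self.path.read_text()):
                item = QueueItem(**raw)
                self.items[item.email_id] = item

    def add(self, email_id):
        self.items.setdefault(email_id, QueueItem(email_id))
        self._save()

    def next_batch(self):
        """Move up to batch_size queued items to 'processing' and return their ids."""
        queued = [i for i in self.items.values() if i.status == "queued"]
        batch = queued[: self.batch_size]
        for item in batch:
            item.status = "processing"
        self._save()
        return [i.email_id for i in batch]

    def mark_completed(self, email_id):
        self.items[email_id].status = "completed"
        self._save()

    def mark_failed(self, email_id):
        """Requeue while retries remain; otherwise fail permanently."""
        item = self.items[email_id]
        if item.retries < MAX_RETRIES:
            item.retries += 1
            item.status = "queued"
        else:
            item.status = "failed"
        self._save()

    def stats(self):
        return dict(Counter(i.status for i in self.items.values()))

    def _save(self):
        payload = [asdict(i) for i in self.items.values()]
        self.path.write_text(json.dumps(payload))
```

Writing the whole queue on every change keeps the sketch short; a real queue would probably write less often or use an atomic rename.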
Embedding Cache & Batch Processing (embedding_cache.py)
- EmbeddingCache: Cache embeddings by text hash
  * MD5 hashing of text
  * Memory and disk caching
  * Cache hit/miss statistics
  * Persistent storage support
- EmbeddingBatcher: Efficient batch embedding generation
  * Parallel batch processing
  * Cache-aware to avoid recomputation
  * Configurable batch size
  * Error handling with zero fallback
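
The cache-plus-batcher idea above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of embedding_cache.py: `EmbeddingCacheSketch`, `embed_all`, the `embed_fn` callable, and the `dim` parameter are hypothetical names, and batches run sequentially here rather than in parallel to keep the example short.

```python
import hashlib
import json
from pathlib import Path


class EmbeddingCacheSketch:
    """Hypothetical cache keyed by the MD5 hash of the text."""

    def __init__(self, path=None):
        self.path = Path(path) if path else None
        self.memory = {}
        self.hits = 0
        self.misses = 0
        # Warm the in-memory cache from disk if a cache file exists.
        if self.path and self.path.exists():
            self.memory = json.loads(self.path.read_text())

    @staticmethod
    def key(text):
        # MD5 serves only as a cache key here, not as a security measure.
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    def get(self, text):
        vector = self.memory.get(self.key(text))
        if vector is None:
            self.misses += 1
        else:
            self.hits += 1
        return vector

    def put(self, text, vector):
        self.memory[self.key(text)] = list(vector)

    def save(self):
        if self.path:
            self.path.write_text(json.dumps(self.memory))


def embed_all(texts, embed_fn, cache, batch_size=32, dim=4):
    """Embed texts in batches, calling embed_fn only for cache misses.

    embed_fn is an assumed callable: list[str] -> list[list[float]].
    """
    results = [None] * len(texts)
    pending = []
    for i, text in enumerate(texts):
        cached = cache.get(text)
        if cached is not None:
            results[i] = cached
        else:
            pending.append(i)
    for start in range(0, len(pending), batch_size):
        idxs = pending[start:start + batch_size]
        try:
            vectors = embed_fn([texts[i] for i in idxs])
            for i, vector in zip(idxs, vectors):
                cache.put(texts[i], vector)
                results[i] = list(vector)
        except Exception:
            # Zero fallback: the zeros are not cached, so a later call can retry.
            for i in idxs:
                results[i] = [0.0] * dim
    return results
```

Keeping fallback vectors out of the cache is a design choice of this sketch. It stops a transient embedding failure from being served as a permanent cache hit.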
Calibration Workflow (workflow.py)
- CalibrationWorkflow: Complete end-to-end calibration
  * Step 1: Stratified email sampling
  * Step 2: LLM category discovery
  * Step 3: Label emails from discovery
  * Step 4: Train LightGBM model
  * Step 5: Validate on held-out set
  * Save trained model
- CalibrationConfig: Configurable workflow parameters
  * Sample size (1500)
  * Validation size (300)
  * Model hyperparameters
  * LLM batch size
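
Steps 1 and 5 and the configuration defaults can be sketched as below. Only the sample size (1500) and validation size (300) come from the list above. The class and function names, the `llm_batch_size` and `seed` defaults, and the choice of stratification key are assumptions for illustration. The LLM and LightGBM steps are left out.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CalibrationConfigSketch:
    sample_size: int = 1500      # from the commit message
    validation_size: int = 300   # from the commit message
    llm_batch_size: int = 50     # assumed value, not stated in the source
    seed: int = 0                # assumed, for reproducible sampling


def stratified_sample(emails, stratum_of, n, seed=0):
    """Draw about n emails, taking a proportional share from each stratum.

    stratum_of is a hypothetical callable mapping an email to its stratum key.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for email in emails:
        groups[stratum_of(email)].append(email)
    total = len(emails)
    sample = []
    for members in groups.values():
        # At least one email per stratum, never more than the stratum holds.
        share = max(1, round(n * len(members) / total))
        sample.extend(rng.sample(members, min(len(members), share)))
    rng.shuffle(sample)
    return sample[:n]


def split_train_validation(sample, config):
    """Hold out the last validation_size items for the validation step."""
    cut = max(0, len(sample) - config.validation_size)
    return sample[:cut], sample[cut:]
```

With the defaults, a full sample of 1500 emails leaves 1200 for labeling and training and 300 held out for validation.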
ALL PREVIOUSLY MISSING COMPONENTS ARE NOW COMPLETE:
✅ Threshold adjustment (learns from LLM)
✅ Pattern learning (sender-specific rules)
✅ Attachment analysis (PDF, DOCX, etc.)
✅ Real model trainer (LightGBM)
✅ Provider sync (Gmail + IMAP)
✅ Queue management (batching + persistence)
✅ Embedding optimization (caching + batching)
✅ Complete calibration workflow
SYSTEM NOW COMPLETE WITH ALL COMPONENTS
Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-21 12:00:26 +11:00