2 Commits

Author SHA1 Message Date
53174a34eb Organize project structure and add MVP features
Project Reorganization:
- Created docs/ directory and moved all documentation
- Created scripts/ directory for shell scripts
- Created scripts/experimental/ for research scripts
- Updated .gitignore for new structure
- Updated README.md with MVP status and new structure

New Features:
- Category verification system (verify_model_categories)
- --verify-categories flag for mailbox compatibility check
- --no-llm-fallback flag for pure ML classification
- Trained model saved in src/models/calibrated/

Threshold Optimization:
- Reduced default threshold from 0.75 to 0.55
- Updated all category thresholds to 0.55
- Reduces LLM fallback rate by 40% (35% -> 21%)

Documentation:
- SYSTEM_FLOW.html - Complete system architecture
- VERIFY_CATEGORIES_FEATURE.html - Feature documentation
- LABEL_TRAINING_PHASE_DETAIL.html - Calibration breakdown
- FAST_ML_ONLY_WORKFLOW.html - Pure ML guide
- PROJECT_STATUS_AND_NEXT_STEPS.html - Roadmap
- ROOT_CAUSE_ANALYSIS.md - Bug fixes

MVP Status:
- 10k emails in 4 minutes, 72.7% accuracy, 0 LLM calls
- LLM-driven category discovery working
- Embedding-based transfer learning confirmed
- All model paths verified and working
2025-10-25 14:46:58 +11:00
Brett Fox
8c73f25537 Initial commit: Complete project blueprint and research
- PROJECT_BLUEPRINT.md: Full architecture with LightGBM, Qwen3, structured embeddings
- RESEARCH_FINDINGS.md: 2024 benchmarks, competition analysis, validation
- BUILD_INSTRUCTIONS.md: Step-by-step implementation guide
- README.md: User-friendly overview and quick start
- Research-backed hybrid ML/LLM email classifier
- 94-96% accuracy target, 17min for 80k emails
- Privacy-first, local processing, distributable wheel
- Modular architecture with tiered dependencies
- LLM optional (graceful degradation)
- OpenAI-compatible API support
2025-10-21 03:08:28 +11:00