email-sorter

BobAi/email-sorter

Commit Graph

Author

SHA1

Message

Date

FSSCoding

8f25e30f52

Rewrite CLAUDE.md and clean project structure

- Rewrote CLAUDE.md with comprehensive development guide
- Archived 20 old docs to docs/archive/
- Added PROJECT_ROADMAP_2025.md with research learnings
- Added CLASSIFICATION_METHODS_COMPARISON.md
- Added SESSION_HANDOVER_20251128.md
- Added tools for analysis (brett_gmail/microsoft analyzers)
- Updated .gitignore for archive folders
- Config changes for local vLLM endpoint

2025-11-28 13:07:27 +11:00

FSSCoding

4eee962c09

Add local file provider for .msg and .eml email files

- Created LocalFileParser for parsing Outlook .msg and .eml files
- Created LocalFileProvider implementing BaseProvider interface
- Updated CLI to support --source local --directory path
- Supports recursive directory scanning
- Parses 952 emails in ~3 seconds

Enables classification of local email file archives without needing
email account credentials.
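
For illustration only, recursive scanning of .eml files can be sketched with the Python standard library. This is not the repository's LocalFileParser or LocalFileProvider: the names parse_eml and scan_eml_files and the returned fields are hypothetical. Outlook .msg files need a third-party parser and are left out of this sketch.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def parse_eml(path):
    """Parse one .eml file into a small dict (hypothetical field names)."""
    with open(path, "rb") as fh:
        msg = BytesParser(policy=policy.default).parse(fh)
    # Prefer a plain-text body, fall back to HTML; None if neither exists.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "path": str(path),
        "subject": str(msg["subject"] or ""),
        "sender": str(msg["from"] or ""),
        "body": body,
    }


def scan_eml_files(directory):
    """Recursively yield parsed messages for every .eml file under directory.

    The "*.eml" pattern is case-sensitive on most Linux filesystems, so
    files named "*.EML" would be skipped by this sketch.
    """
    for path in sorted(Path(directory).rglob("*.eml")):
        yield parse_eml(path)
```

Yielding messages one at a time keeps memory use flat on large archives; a caller that needs a count can wrap the generator in list().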

2025-11-14 17:13:10 +11:00

FSSCoding

53174a34eb

Organize project structure and add MVP features

Project Reorganization:
- Created docs/ directory and moved all documentation
- Created scripts/ directory for shell scripts
- Created scripts/experimental/ for research scripts
- Updated .gitignore for new structure
- Updated README.md with MVP status and new structure

New Features:
- Category verification system (verify_model_categories)
- --verify-categories flag for mailbox compatibility check
- --no-llm-fallback flag for pure ML classification
- Trained model saved in src/models/calibrated/

Threshold Optimization:
- Reduced default threshold from 0.75 to 0.55
- Updated all category thresholds to 0.55
- Reduces LLM fallback rate by 40% (35% -> 21%)
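
As a rough sketch of what a confidence threshold with LLM fallback might look like (the function names, the dict-of-probabilities input, and the strict "below threshold" rule are assumptions, not the project's code): an email goes to the LLM only when the ML model's top class probability falls below the threshold, so lowering the threshold from 0.75 to 0.55 sends fewer emails to the LLM. The quoted 40% is a relative reduction: (35 - 21) / 35 = 0.40, i.e. 14 percentage points.

```python
def route(probabilities, threshold=0.55):
    """Return (category, use_llm) for one email.

    probabilities: dict mapping category name -> ML model probability.
    use_llm is True when the top probability is below the threshold
    (assumed rule; the real comparison may differ).
    """
    category = max(probabilities, key=probabilities.get)
    confidence = probabilities[category]
    return category, confidence < threshold


def fallback_rate(batch, threshold):
    """Fraction of emails in batch that would be sent to the LLM."""
    flagged = sum(1 for probs in batch if route(probs, threshold)[1])
    return flagged / len(batch)


def relative_reduction(before, after):
    """Relative drop between two rates, e.g. 0.35 -> 0.21 is about 0.40."""
    return (before - after) / before
```

For example, an email scored {"work": 0.6, "spam": 0.4} would fall back to the LLM at a threshold of 0.75 but be accepted as "work" at 0.55.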

Documentation:
- SYSTEM_FLOW.html - Complete system architecture
- VERIFY_CATEGORIES_FEATURE.html - Feature documentation
- LABEL_TRAINING_PHASE_DETAIL.html - Calibration breakdown
- FAST_ML_ONLY_WORKFLOW.html - Pure ML guide
- PROJECT_STATUS_AND_NEXT_STEPS.html - Roadmap
- ROOT_CAUSE_ANALYSIS.md - Bug fixes

MVP Status:
- 10k emails in 4 minutes, 72.7% accuracy, 0 LLM calls
- LLM-driven category discovery working
- Embedding-based transfer learning confirmed
- All model paths verified and working

2025-10-25 14:46:58 +11:00

3 Commits