- Rewrote CLAUDE.md with comprehensive development guide - Archived 20 old docs to docs/archive/ - Added PROJECT_ROADMAP_2025.md with research learnings - Added CLASSIFICATION_METHODS_COMPARISON.md - Added SESSION_HANDOVER_20251128.md - Added tools for analysis (brett_gmail/microsoft analyzers) - Updated .gitignore for archive folders - Config changes for local vLLM endpoint
3.8 KiB
3.8 KiB
Session Handover Report - Email Sorter
Date: 2025-11-28 Session ID: eb549838-a153-48d1-ae5d-891e0e83108f
What Was Done This Session
1. Classified 801 emails from brett-gmail using three methods:
| Method | Accuracy | Time | Output Location |
|---|---|---|---|
| ML-Only | 54.9% | ~5 sec | /home/bob/Documents/Email Manager/emails/brett-gm-md/ |
| ML+LLM | 93.3% | ~3.5 min | /home/bob/Documents/Email Manager/emails/brett-gm-llm/ |
| Manual Agent | 99.8% | ~25 min | Same as ML-only + analysis files |
2. Created/Modified Files
New Files:
tools/generate_html_report.py- HTML report generatortools/brett_gmail_analyzer.py- Custom dataset analyzerdata/brett_gmail_analysis.json- Analysis outputdocs/REPORT_FORMAT.md- Report system documentationdocs/CLASSIFICATION_METHODS_COMPARISON.md- Method comparisondocs/PROJECT_ROADMAP_2025.md- Full roadmap and learnings/home/bob/Documents/Email Manager/emails/brett-gm-md/BRETT_GMAIL_ANALYSIS_REPORT.md- Analysis report/home/bob/Documents/Email Manager/emails/brett-gm-md/report.html- HTML report (ML-only)/home/bob/Documents/Email Manager/emails/brett-gm-llm/report.html- HTML report (ML+LLM)
Modified Files:
src/cli.py- Added--force-mlflag, enriched results.json with email metadatasrc/llm/openai_compat.py- Removed API key requirement for local vLLMconfig/default_config.yaml- Changed LLM to openai provider on localhost:11433
3. Key Configuration Changes
# config/default_config.yaml - LLM now uses vLLM endpoint
llm:
provider: "openai"
openai:
base_url: "http://localhost:11433/v1"
api_key: "not-needed"
classification_model: "qwen3-coder-30b"
Key Findings
- ML pipeline overkill for <5000 emails - Agent analysis gives better accuracy in similar time
- Sender domain is strongest signal - Top 5 senders = 47.5% of emails
- Categories should serve downstream routing - Not human labels, but processing decisions
- Risk-based accuracy - Personal emails need high accuracy, junk can tolerate errors
- This tool = triage - Sorts into buckets for other specialized tools
Project Scope (Agreed with User)
Email Sorter IS:
- Bulk classification/triage tool
- Router to downstream specialized tools
- Part of larger email processing ecosystem
Email Sorter IS NOT:
- Complete email management solution
- Spam filter (trust Gmail/Outlook)
- Final destination for emails
Recommended Dataset Size Routing
| Size | Method |
|---|---|
| <500 | Agent-only |
| 500-5000 | Agent pre-scan + ML |
| >5000 | ML pipeline |
Background Processes
There are stale background bash processes (f8678e, 0a3549, 0d150e) from classification runs. These completed successfully and can be ignored.
What Needs Doing Next
- Review docs/ - All learnings are in PROJECT_ROADMAP_2025.md
- Phase 1 development - Dataset size routing, sender-first classification
- Agent pre-scan module - 10-15 min discovery phase before ML
User Preferences (from CLAUDE.md)
- NO emojis in commits
- NO "Generated with Claude" attribution
- Use tools (Read/Edit/Grep) not bash commands for file ops
- Virtual environment required for Python
- TTS available via
fss-speak(single line messages only, no newlines)
Quick Start for Next Agent
cd /MASTERFOLDER/Tools/email-sorter
source venv/bin/activate
# Read the roadmap
cat docs/PROJECT_ROADMAP_2025.md
# Run classification
python -m src.cli run --source local \
--directory "/path/to/emails" \
--output "/path/to/output" \
--force-ml --llm-provider openai
# Generate HTML report
python tools/generate_html_report.py --input /path/to/results.json
Session ended: 2025-11-28 ~03:30 AEDT