FSSCoding 8f25e30f52 Rewrite CLAUDE.md and clean project structure

- Rewrote CLAUDE.md with comprehensive development guide
- Archived 20 old docs to docs/archive/
- Added PROJECT_ROADMAP_2025.md with research learnings
- Added CLASSIFICATION_METHODS_COMPARISON.md
- Added SESSION_HANDOVER_20251128.md
- Added tools for analysis (brett_gmail/microsoft analyzers)
- Updated .gitignore for archive folders
- Config changes for local vLLM endpoint

2025-11-28 13:07:27 +11:00

3.8 KiB

Raw Permalink Blame History

Session Handover Report - Email Sorter

Date: 2025-11-28 Session ID: eb549838-a153-48d1-ae5d-891e0e83108f

What Was Done This Session

1. Classified 801 emails from brett-gmail using three methods:

Method	Accuracy	Time	Output Location
ML-Only	54.9%	~5 sec	`/home/bob/Documents/Email Manager/emails/brett-gm-md/`
ML+LLM	93.3%	~3.5 min	`/home/bob/Documents/Email Manager/emails/brett-gm-llm/`
Manual Agent	99.8%	~25 min	Same as ML-only + analysis files

2. Created/Modified Files

New Files:

tools/generate_html_report.py - HTML report generator
tools/brett_gmail_analyzer.py - Custom dataset analyzer
data/brett_gmail_analysis.json - Analysis output
docs/REPORT_FORMAT.md - Report system documentation
docs/CLASSIFICATION_METHODS_COMPARISON.md - Method comparison
docs/PROJECT_ROADMAP_2025.md - Full roadmap and learnings
/home/bob/Documents/Email Manager/emails/brett-gm-md/BRETT_GMAIL_ANALYSIS_REPORT.md - Analysis report
/home/bob/Documents/Email Manager/emails/brett-gm-md/report.html - HTML report (ML-only)
/home/bob/Documents/Email Manager/emails/brett-gm-llm/report.html - HTML report (ML+LLM)

Modified Files:

src/cli.py - Added --force-ml flag, enriched results.json with email metadata
src/llm/openai_compat.py - Removed API key requirement for local vLLM
config/default_config.yaml - Changed LLM to openai provider on localhost:11433

3. Key Configuration Changes

# config/default_config.yaml - LLM now uses vLLM endpoint
llm:
  provider: "openai"
  openai:
    base_url: "http://localhost:11433/v1"
    api_key: "not-needed"
    classification_model: "qwen3-coder-30b"

Key Findings

ML pipeline overkill for <5000 emails - Agent analysis gives better accuracy in similar time
Sender domain is strongest signal - Top 5 senders = 47.5% of emails
Categories should serve downstream routing - Not human labels, but processing decisions
Risk-based accuracy - Personal emails need high accuracy, junk can tolerate errors
This tool = triage - Sorts into buckets for other specialized tools

Project Scope (Agreed with User)

Email Sorter IS:

Bulk classification/triage tool
Router to downstream specialized tools
Part of larger email processing ecosystem

Email Sorter IS NOT:

Complete email management solution
Spam filter (trust Gmail/Outlook)
Final destination for emails

Recommended Dataset Size Routing

Size	Method
<500	Agent-only
500-5000	Agent pre-scan + ML
>5000	ML pipeline

Background Processes

There are stale background bash processes (f8678e, 0a3549, 0d150e) from classification runs. These completed successfully and can be ignored.

What Needs Doing Next

Review docs/ - All learnings are in PROJECT_ROADMAP_2025.md
Phase 1 development - Dataset size routing, sender-first classification
Agent pre-scan module - 10-15 min discovery phase before ML

User Preferences (from CLAUDE.md)

NO emojis in commits
NO "Generated with Claude" attribution
Use tools (Read/Edit/Grep) not bash commands for file ops
Virtual environment required for Python
TTS available via fss-speak (single line messages only, no newlines)

Quick Start for Next Agent

cd /MASTERFOLDER/Tools/email-sorter
source venv/bin/activate

# Read the roadmap
cat docs/PROJECT_ROADMAP_2025.md

# Run classification
python -m src.cli run --source local \
  --directory "/path/to/emails" \
  --output "/path/to/output" \
  --force-ml --llm-provider openai

# Generate HTML report
python tools/generate_html_report.py --input /path/to/results.json

Session ended: 2025-11-28 ~03:30 AEDT

3.8 KiB Raw Permalink Blame History