email-sorter/config/categories.yaml
FSSCoding 53174a34eb Organize project structure and add MVP features
Project Reorganization:
- Created docs/ directory and moved all documentation
- Created scripts/ directory for shell scripts
- Created scripts/experimental/ for research scripts
- Updated .gitignore for new structure
- Updated README.md with MVP status and new structure

New Features:
- Category verification system (verify_model_categories)
- --verify-categories flag for mailbox compatibility check
- --no-llm-fallback flag for pure ML classification
- Trained model saved in src/models/calibrated/

Threshold Optimization:
- Reduced default threshold from 0.75 to 0.55
- Updated all category thresholds to 0.55
- Reduces LLM fallback rate by 40% (35% -> 21%)

Documentation:
- SYSTEM_FLOW.html - Complete system architecture
- VERIFY_CATEGORIES_FEATURE.html - Feature documentation
- LABEL_TRAINING_PHASE_DETAIL.html - Calibration breakdown
- FAST_ML_ONLY_WORKFLOW.html - Pure ML guide
- PROJECT_STATUS_AND_NEXT_STEPS.html - Roadmap
- ROOT_CAUSE_ANALYSIS.md - Bug fixes

MVP Status:
- 10k emails in 4 minutes, 72.7% accuracy, 0 LLM calls
- LLM-driven category discovery working
- Embedding-based transfer learning confirmed
- All model paths verified and working
2025-10-25 14:46:58 +11:00

139 lines
2.7 KiB
YAML

categories:
  junk:
    description: "Spam, unwanted marketing, phishing attempts"
    patterns:
      - "unsubscribe"
      - "click here"
      - "limited time"
    threshold: 0.55
    priority: 1

  transactional:
    description: "Receipts, invoices, confirmations, order tracking"
    patterns:
      - "receipt"
      - "invoice"
      - "order"
      - "shipped"
      - "tracking"
      - "confirmation"
    threshold: 0.55
    priority: 2

  auth:
    description: "OTPs, password resets, 2FA codes, security alerts"
    patterns:
      - "verification code"
      - "otp"
      - "reset password"
      - "verify your account"
      - "confirm your identity"
    threshold: 0.55
    priority: 1

  newsletters:
    description: "Subscribed newsletters, marketing emails, digests"
    patterns:
      - "newsletter"
      - "weekly digest"
      - "monthly update"
      - "subscribe"
    threshold: 0.55
    priority: 3

  social:
    description: "Social media notifications, mentions, friend requests"
    patterns:
      - "mentioned you"
      - "friend request"
      - "liked your"
      - "followed you"
    threshold: 0.55
    priority: 3

  automated:
    description: "System notifications, alerts, automated messages"
    patterns:
      - "automated"
      - "system notification"
      - "do not reply"
      - "noreply"
    threshold: 0.55
    priority: 2

  conversational:
    description: "Human-to-human correspondence, replies, discussions"
    patterns:
      - "hi"
      - "hello"
      - "thanks"
      - "regards"
      - "best regards"
    threshold: 0.55
    priority: 3

  work:
    description: "Business correspondence, meetings, projects, deadlines"
    patterns:
      - "meeting"
      - "project"
      - "deadline"
      - "team"
      - "discussion"
    threshold: 0.55
    priority: 2

  personal:
    description: "Friends and family, personal matters"
    patterns:
      - "love"
      - "family"
      - "dinner"
      - "weekend"
      - "friend"
    threshold: 0.55
    priority: 3

  finance:
    description: "Bank statements, credit cards, investments, bills"
    patterns:
      - "statement"
      - "balance"
      - "account"
      - "payment due"
      - "card"
    threshold: 0.55
    priority: 2

  travel:
    description: "Flight bookings, hotels, reservations, itineraries"
    patterns:
      - "flight"
      - "booking"
      - "reservation"
      - "check-in"
      - "hotel"
    threshold: 0.55
    priority: 2

  unknown:
    description: "Doesn't fit any category (requires review)"
    patterns: []
    threshold: 0.50
    priority: 4

# Category order for processing
processing_order:
  - auth
  - finance
  - transactional
  - work
  - travel
  - conversational
  - personal
  - social
  - newsletters
  - automated
  - junk
  - unknown
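
The sketch below illustrates one way a consumer of this config might use `processing_order`, `patterns`, and `threshold` together. It is not taken from the project's source: the function names `first_pattern_match` and `accept_prediction`, the substring-matching rule, and the "confidence must meet the threshold" gating rule are all assumptions made for illustration, and the config is inlined as a small excerpt so the example runs without a YAML parser.

```python
# Hypothetical illustration only; the project's real loader and classifier may differ.
# A hand-copied excerpt of the config, so no YAML library is needed to run this.
CONFIG = {
    "categories": {
        "auth": {
            "patterns": ["verification code", "otp", "reset password"],
            "threshold": 0.55,
        },
        "finance": {
            "patterns": ["statement", "balance", "payment due"],
            "threshold": 0.55,
        },
        "junk": {
            "patterns": ["unsubscribe", "click here", "limited time"],
            "threshold": 0.55,
        },
        "unknown": {"patterns": [], "threshold": 0.50},
    },
    "processing_order": ["auth", "finance", "junk", "unknown"],
}


def first_pattern_match(text, config):
    """Return the first category, in processing_order, with a pattern found in text.

    Assumes case-insensitive substring matching; falls back to "unknown".
    """
    lowered = text.lower()
    for name in config["processing_order"]:
        patterns = config["categories"][name]["patterns"]
        if any(pattern in lowered for pattern in patterns):
            return name
    return "unknown"


def accept_prediction(category, confidence, config):
    """Return True when confidence meets the category's threshold.

    Assumed gating rule: a prediction below the threshold would be handed
    to a fallback step instead of being accepted.
    """
    return confidence >= config["categories"][category]["threshold"]
```

Under these assumptions, a message containing both "verification code" and "unsubscribe" resolves to `auth` because `auth` comes first in `processing_order`, and a prediction for `auth` with confidence 0.50 would not be accepted against the 0.55 threshold.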