Project Reorganization:
- Created docs/ directory and moved all documentation
- Created scripts/ directory for shell scripts
- Created scripts/experimental/ for research scripts
- Updated .gitignore for new structure
- Updated README.md with MVP status and new structure

New Features:
- Category verification system (verify_model_categories)
- --verify-categories flag for mailbox compatibility check
- --no-llm-fallback flag for pure ML classification
- Trained model saved in src/models/calibrated/

Threshold Optimization:
- Reduced default threshold from 0.75 to 0.55
- Updated all category thresholds to 0.55
- Reduces LLM fallback rate by 40% relative (35% -> 21%)

Documentation:
- SYSTEM_FLOW.html - Complete system architecture
- VERIFY_CATEGORIES_FEATURE.html - Feature documentation
- LABEL_TRAINING_PHASE_DETAIL.html - Calibration breakdown
- FAST_ML_ONLY_WORKFLOW.html - Pure ML guide
- PROJECT_STATUS_AND_NEXT_STEPS.html - Roadmap
- ROOT_CAUSE_ANALYSIS.md - Bug fixes

MVP Status:
- 10k emails in 4 minutes, 72.7% accuracy, 0 LLM calls
- LLM-driven category discovery working
- Embedding-based transfer learning confirmed
- All model paths verified and working
139 lines
2.7 KiB
YAML
categories:
  junk:
    description: "Spam, unwanted marketing, phishing attempts"
    patterns:
      - "unsubscribe"
      - "click here"
      - "limited time"
    threshold: 0.55
    priority: 1

  transactional:
    description: "Receipts, invoices, confirmations, order tracking"
    patterns:
      - "receipt"
      - "invoice"
      - "order"
      - "shipped"
      - "tracking"
      - "confirmation"
    threshold: 0.55
    priority: 2

  auth:
    description: "OTPs, password resets, 2FA codes, security alerts"
    patterns:
      - "verification code"
      - "otp"
      - "reset password"
      - "verify your account"
      - "confirm your identity"
    threshold: 0.55
    priority: 1

  newsletters:
    description: "Subscribed newsletters, marketing emails, digests"
    patterns:
      - "newsletter"
      - "weekly digest"
      - "monthly update"
      - "subscribe"
    threshold: 0.55
    priority: 3

  social:
    description: "Social media notifications, mentions, friend requests"
    patterns:
      - "mentioned you"
      - "friend request"
      - "liked your"
      - "followed you"
    threshold: 0.55
    priority: 3

  automated:
    description: "System notifications, alerts, automated messages"
    patterns:
      - "automated"
      - "system notification"
      - "do not reply"
      - "noreply"
    threshold: 0.55
    priority: 2

  conversational:
    description: "Human-to-human correspondence, replies, discussions"
    patterns:
      - "hi"
      - "hello"
      - "thanks"
      - "regards"
      - "best regards"
    threshold: 0.55
    priority: 3

  work:
    description: "Business correspondence, meetings, projects, deadlines"
    patterns:
      - "meeting"
      - "project"
      - "deadline"
      - "team"
      - "discussion"
    threshold: 0.55
    priority: 2

  personal:
    description: "Friends and family, personal matters"
    patterns:
      - "love"
      - "family"
      - "dinner"
      - "weekend"
      - "friend"
    threshold: 0.55
    priority: 3

  finance:
    description: "Bank statements, credit cards, investments, bills"
    patterns:
      - "statement"
      - "balance"
      - "account"
      - "payment due"
      - "card"
    threshold: 0.55
    priority: 2

  travel:
    description: "Flight bookings, hotels, reservations, itineraries"
    patterns:
      - "flight"
      - "booking"
      - "reservation"
      - "check-in"
      - "hotel"
    threshold: 0.55
    priority: 2

  unknown:
    description: "Doesn't fit any category (requires review)"
    patterns: []
    threshold: 0.50
    priority: 4

# Category order for processing
processing_order:
  - auth
  - finance
  - transactional
  - work
  - travel
  - conversational
  - personal
  - social
  - newsletters
  - automated
  - junk
  - unknown
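
The file does not say how `threshold`, `patterns`, and `processing_order` are consumed, so the following is only a minimal sketch of one plausible reading: walk the categories in processing order and accept the first one whose classifier score meets its threshold, falling back to `unknown`. The names `CATEGORIES`, `PROCESSING_ORDER`, `pick_category`, and `pattern_hits` are hypothetical and do not come from the project; the dictionary mirrors only a few entries of the config.

```python
from typing import Dict, List

# Hypothetical in-memory mirror of a few entries from the config
# (assumption: the real project loads these values from the YAML file).
CATEGORIES: Dict[str, dict] = {
    "auth": {"patterns": ["verification code", "otp"], "threshold": 0.55},
    "finance": {"patterns": ["statement", "balance"], "threshold": 0.55},
    "junk": {"patterns": ["unsubscribe", "click here"], "threshold": 0.55},
    "unknown": {"patterns": [], "threshold": 0.50},
}
PROCESSING_ORDER: List[str] = ["auth", "finance", "junk", "unknown"]


def pick_category(scores: Dict[str, float]) -> str:
    """Return the first category, in processing order, whose score
    meets its threshold; otherwise return 'unknown'."""
    for name in PROCESSING_ORDER:
        if name == "unknown":
            continue  # 'unknown' is treated purely as the fallback here
        if scores.get(name, 0.0) >= CATEGORIES[name]["threshold"]:
            return name
    return "unknown"


def pattern_hits(text: str, category: str) -> List[str]:
    """Return the configured patterns that occur in the text
    (case-insensitive substring match, an assumed matching rule)."""
    lowered = text.lower()
    return [p for p in CATEGORIES[category]["patterns"] if p in lowered]


if __name__ == "__main__":
    print(pick_category({"auth": 0.60, "finance": 0.90}))
    print(pattern_hits("Your OTP is 123456", "auth"))
```

Under this reading, an earlier entry in `processing_order` wins over a later one even when the later score is higher, which is one reason `auth` and `finance` might be listed first.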