email-sorter/scripts/experimental/spot_check_results.txt
FSSCoding 53174a34eb Organize project structure and add MVP features
Project Reorganization:
- Created docs/ directory and moved all documentation
- Created scripts/ directory for shell scripts
- Created scripts/experimental/ for research scripts
- Updated .gitignore for new structure
- Updated README.md with MVP status and new structure

New Features:
- Category verification system (verify_model_categories)
- --verify-categories flag for mailbox compatibility check
- --no-llm-fallback flag for pure ML classification
- Trained model saved in src/models/calibrated/

Threshold Optimization:
- Reduced default threshold from 0.75 to 0.55
- Updated all category thresholds to 0.55
- Reduces LLM fallback rate by 40% in relative terms (35% -> 21%, a 14-point drop)
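
The threshold change above can be illustrated with a minimal sketch. It assumes that each ML prediction carries a confidence score and that anything below the threshold is routed to the LLM. The function names and the sample scores are hypothetical and are not taken from the project.

```python
def needs_llm_fallback(confidence: float, threshold: float = 0.55) -> bool:
    """Return True when an ML prediction is too uncertain to accept as-is."""
    return confidence < threshold


def fallback_rate(confidences, threshold):
    """Fraction of predictions that would be sent to the LLM."""
    if not confidences:
        return 0.0
    flagged = sum(needs_llm_fallback(c, threshold) for c in confidences)
    return flagged / len(confidences)


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Relative drop between two rates, e.g. 0.35 -> 0.21."""
    return (old_rate - new_rate) / old_rate


# Hypothetical confidence scores, for illustration only.
scores = [0.95, 0.80, 0.70, 0.60, 0.50]
print(fallback_rate(scores, 0.75))  # share below the old default threshold
print(fallback_rate(scores, 0.55))  # share below the new default threshold
print(round(relative_reduction(0.35, 0.21), 2))  # the 35% -> 21% change
```

Lowering the threshold can only shrink the set of predictions that fall below it, so the fallback rate never increases. The cost is that more borderline ML predictions are accepted without a second opinion.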

Documentation:
- SYSTEM_FLOW.html - Complete system architecture
- VERIFY_CATEGORIES_FEATURE.html - Feature documentation
- LABEL_TRAINING_PHASE_DETAIL.html - Calibration breakdown
- FAST_ML_ONLY_WORKFLOW.html - Pure ML guide
- PROJECT_STATUS_AND_NEXT_STEPS.html - Roadmap
- ROOT_CAUSE_ANALYSIS.md - Bug fixes

MVP Status:
- 10k emails in 4 minutes, 72.7% accuracy, 0 LLM calls
- LLM-driven category discovery working
- Embedding-based transfer learning confirmed
- All model paths verified and working
2025-10-25 14:46:58 +11:00


================================================================================
SMART CLASSIFICATION SPOT-CHECK
================================================================================
Loading results from: results_100k/results.json
Total emails: 100,000
Analyzing classification patterns...
Selected 30 emails for spot-checking
- high_conf_suspicious: 10 samples
- low_conf_obvious: 2 samples
- mid_conf_edge_cases: 0 samples
- category_anomalies: 8 samples
- random_check: 10 samples
Loading email content...
Loaded 100,000 emails
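
The bucket selection reported above could be sketched roughly as follows. This is not the script's actual logic. The record fields (category, confidence, method), the cut-offs (0.9, 0.6, rare_max) and the function name select_spot_checks are all assumptions made for illustration.

```python
import random
from collections import Counter


def select_spot_checks(results, per_bucket=10, rare_max=50, seed=0):
    """Group classified emails into review buckets and sample each one.

    All cut-offs here (0.9, 0.6, rare_max) are illustrative assumptions.
    """
    rng = random.Random(seed)
    sizes = Counter(r["category"] for r in results)
    buckets = {
        "high_conf_suspicious": [r for r in results if r["confidence"] >= 0.9],
        "low_conf_obvious": [r for r in results if r["confidence"] < 0.6],
        "mid_conf_edge_cases": [
            r for r in results if 0.6 <= r["confidence"] < 0.9
        ],
        "category_anomalies": [
            r for r in results
            if r["confidence"] >= 0.9 and sizes[r["category"]] <= rare_max
        ],
        "random_check": list(results),
    }
    # Sample without replacement; a bucket smaller than per_bucket is kept whole.
    return {
        name: rng.sample(items, min(per_bucket, len(items)))
        for name, items in buckets.items()
    }


# Hypothetical records, shaped like the samples in this report.
demo = (
    [{"category": "Administrative", "confidence": 1.0, "method": "ml"}] * 60
    + [{"category": "unknown", "confidence": 0.5, "method": "llm"}] * 2
    + [{"category": "Urgent", "confidence": 1.0, "method": "ml"}]
)
picks = select_spot_checks(demo)
for name, items in picks.items():
    print(f"- {name}: {len(items)} samples")
```

A fixed seed keeps the selection reproducible between runs, which makes it easier to compare spot-check results after a model or threshold change.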
================================================================================
SPOT-CHECK SAMPLES
================================================================================
[1] HIGH CONFIDENCE - Potential Overconfidence
--------------------------------------------------------------------------------
These have very high confidence. Check if they're actually correct.
Sample 1:
Category: Administrative
Confidence: 1.000
Method: ml
From: john.arnold@enron.com
Subject: RE:
Body preview: i'll get the movie and wine. my suggestion is something from central market but i'm easy
-----Original Message-----
From: Ward, Kim S (Houston)
Sent: Monday, July 02, 2001 5:29 PM
To: Arnold, Jo...
Sample 2:
Category: Administrative
Confidence: 1.000
Method: ml
From: eric.bass@enron.com
Subject: Re: New deals
Body preview: Can you spell S-N-O-O-T-Y?
e
From: Ami Chokshi @ ENRON 01/06/2000 05:38 PM
To: Eric Bass/HOU/ECT@ECT
cc:
Subject: Re: New deals
Was E-R-I-C too hard to w...
Sample 3:
Category: Meeting
Confidence: 1.000
Method: ml
From: amy.fitzpatrick@enron.com
Subject: MEETING TONIGHT - 6:00 pm Central Time at The Houstonian
Body preview: Throughout this week, we have a team from UBS in Houston to introduce and discuss the NETCO business and associated HR matters.
In this regard, please make yourself available for a meeting tonight b...
Sample 4:
Category: Meeting
Confidence: 1.000
Method: ml
From: james.steffes@enron.com
Subject:
Body preview: Jeff --
Please add John Neslage to your e-mail list.
Jim...
Sample 5:
Category: Financial
Confidence: 1.000
Method: ml
From: sheri.thomas@enron.com
Subject: Fercinfo2 (The Whole Picture)
Body preview: Sally - just an fyi... Jeff Hodge requested that we send him the information
below. Evidently, the FERC has requested that several US wholesale companies
provide a great deal of information to the...
[2] LOW CONFIDENCE - Might Be Obvious
--------------------------------------------------------------------------------
These have low confidence. Check if they're actually obvious.
Sample 1:
Category: unknown
Confidence: 0.500
Method: llm
From: k..allen@enron.com
Subject: FW:
Body preview: Greg,
After making an election in October to receive a full distribution of my deferral account under Section 6.3 of the plan, a disagreement has arisen regarding the Phantom Stock Account.
Se...
Sample 2:
Category: unknown
Confidence: 0.500
Method: llm
From: mitch.robinson@enron.com
Subject: Running Units
Body preview: Given the sale, etc of the units, don't sell any power off the units, and
don't run the units (any of the six plants) for any reason without first
getting my specific permission.
Thanks,
Mitch...
[3] MIDDLE CONFIDENCE - Edge Cases
--------------------------------------------------------------------------------
These are in the middle. Most likely to be tricky classifications.
[4] CATEGORY ANOMALIES - Rare Categories with High Confidence
--------------------------------------------------------------------------------
These are high confidence but in small categories. Might be mislabeled.
Sample 1:
Category: California Market
Confidence: 1.000
Method: ml
From: dhunter@s-k-w.com
Subject: FW: Direct Access Language
Body preview: -----Original Message-----
From: Mike Florio [mailto:mflorio@turn.org]
Sent: Tuesday, September 11, 2001 3:23 AM
To: Delaney Hunter
Subject: Direct Access Language
Delaney-- DJ asked me to forward ...
Sample 2:
Category: auth
Confidence: 0.990
Method: rule
From: david.roland@enron.com
Subject: FW: Notices and Agenda for Dec 21 ServiceCo Board Meeting
Body preview: Vicki, Dave, Mark and Jimmie,
We're scheduling a pre-meeting to the ServiceCo Board meeting at 11:30 a.m. tomorrow (Friday) in Dave's office.
Thanks,
David
-----Original Message-----
From: Rolan...
Sample 3:
Category: transactional
Confidence: 0.970
Method: rule
From: orders@amazon.com
Subject: Cancellation from Amazon.com Order (#107-0663988-7584503)
Body preview: Greetings from Amazon.com. You have successfully cancelled an item
from your order #107-0663988-7584503
For your reference, here is a summary of your order:
Order #107-0663988-7584503 - placed Dec...
Sample 4:
Category: Forwarded
Confidence: 1.000
Method: ml
From: jefferson.sorenson@enron.com
Subject: UNIFY TO SAP INTERFACES
Body preview: ---------------------- Forwarded by Jefferson D Sorenson/HOU/ECT on
07/05/2000 04:58 PM ---------------------------
Bob Klein
07/05/2000 04:57 PM
To: Jefferson D Sorenson/HOU/ECT@ECT
cc: Rebecca Fo...
Sample 5:
Category: Urgent
Confidence: 1.000
Method: ml
From: l..garcia@enron.com
Subject: RE: LUNCH
Body preview: You Idiot! Why are you sending emails to people who wont get them (Reese, Dustin, Blaine, Greer, Reeves), and who the hell is AC? Mr. Huddle and the Horseman?????????????? Did you fall and hit your he...
[5] RANDOM CHECK - General Quality Check
--------------------------------------------------------------------------------
Random samples from each category for general quality assessment.
Sample 1:
Category: Administrative
Confidence: 1.000
Method: ml
From: cameron@perfect.com
Subject: RE: Directions
Body preview: I will send this out. Yes, we can talk tonight. When will you be at the
house?
Cameron Sellers
Vice President, Business Development
PERFECT
1860 Embarcadero Road - Suite 210
Palo Alto, CA 94303
ca...
Sample 2:
Category: Meeting
Confidence: 1.000
Method: ml
From: perfmgmt@enron.com
Subject: Mid-Year 2001 Performance Feedback
Body preview: DEAN, CLINT E,
?
You have been selected to participate in the Mid Year 2001 Performance
Management process. Your feedback plays an important role in the process,
and your participation is critical ...
Sample 3:
Category: Financial
Confidence: 1.000
Method: ml
From: schwabalerts.marketupdates@schwab.com
Subject: Midday Market View for June 7, 2001
Body preview: Charles Schwab & Co., Inc.
Midday Market View(TM) for Thursday, June 7, 2001
as of 1:00PM EDT
Information provided by Standard & Poor's
==============================================================...
Sample 4:
Category: Work
Confidence: 1.000
Method: ml
From: enron.announcements@enron.com
Subject: SUPPLEMENTAL Weekend Outage Report for 11-10-00
Body preview: ------------------------------------------------------------------------------
------------------------
W E E K E N D S Y S T E M S A V A I L A B I L I T Y
F O R
November 10, 2000 5:00pm through...
Sample 5:
Category: Operational
Confidence: 1.000
Method: ml
From: phillip.allen@enron.com
Subject: Re: Insight Hardware
Body preview: I have not received the aircard 300 yet.
Phillip...
================================================================================
CATEGORY DISTRIBUTION
================================================================================
Category               Total   High Conf   Low Conf   Avg Conf
--------------------------------------------------------------------------------
Administrative        67,195      67,191          0      1.000
Work                  14,223      14,213          0      1.000
Meeting                7,785       7,783          0      1.000
Financial              5,943       5,943          0      1.000
Operational            3,274       3,272          0      1.000
junk                     394         394          0      0.960
work                     368         368          0      0.950
Miscellaneous            238         238          0      1.000
Technical                193         193          0      1.000
External                 137         137          0      1.000
Announcements            113         112          0      0.999
transactional             44          44          0      0.970
auth                      37          37          0      0.990
unknown                   23           0         23      0.500
Forwarded                 16          16          0      0.999
California Market          6           6          0      1.000
Prehearing                 6           6          0      0.974
Change                     3           3          0      1.000
Urgent                     1           1          0      1.000
Monitoring                 1           1          0      1.000
================================================================================
DONE!
================================================================================
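
A category distribution table like the one above could be computed along these lines. This is a minimal sketch, not the script's implementation. It assumes each result is a record with category and confidence fields. The high and low cut-offs (0.9 and 0.6) and both function names are hypothetical.

```python
from collections import defaultdict


def category_distribution(results, high=0.9, low=0.6):
    """Per-category totals, high/low confidence counts, and mean confidence."""
    stats = defaultdict(lambda: {"total": 0, "high": 0, "low": 0, "sum": 0.0})
    for r in results:
        s = stats[r["category"]]
        s["total"] += 1
        s["sum"] += r["confidence"]
        if r["confidence"] >= high:
            s["high"] += 1
        elif r["confidence"] < low:
            s["low"] += 1
    rows = [
        (cat, s["total"], s["high"], s["low"], s["sum"] / s["total"])
        for cat, s in stats.items()
    ]
    # Largest categories first, matching the ordering of the table above.
    return sorted(rows, key=lambda row: row[1], reverse=True)


def format_table(rows):
    """Render rows as fixed-width plain text with thousands separators."""
    lines = [
        f"{'Category':<20}{'Total':>8}{'High Conf':>12}"
        f"{'Low Conf':>11}{'Avg Conf':>11}"
    ]
    for cat, total, hi, lo, avg in rows:
        lines.append(f"{cat:<20}{total:>8,}{hi:>12,}{lo:>11,}{avg:>11.3f}")
    return "\n".join(lines)


# Hypothetical records, for illustration only.
demo = (
    [{"category": "Administrative", "confidence": 1.0}] * 3
    + [{"category": "unknown", "confidence": 0.5}] * 2
)
rows = category_distribution(demo)
print(format_table(rows))
```

Category names are compared exactly here, so labels that differ only in case (such as Work and work in the table above) are counted as separate categories.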