Project Reorganization:
- Created docs/ directory and moved all documentation
- Created scripts/ directory for shell scripts
- Created scripts/experimental/ for research scripts
- Updated .gitignore for new structure
- Updated README.md with MVP status and new structure

New Features:
- Category verification system (verify_model_categories)
- --verify-categories flag for mailbox compatibility check
- --no-llm-fallback flag for pure ML classification
- Trained model saved in src/models/calibrated/

Threshold Optimization:
- Reduced default threshold from 0.75 to 0.55
- Updated all category thresholds to 0.55
- Reduces LLM fallback rate by 40% (35% -> 21%)

Documentation:
- SYSTEM_FLOW.html - Complete system architecture
- VERIFY_CATEGORIES_FEATURE.html - Feature documentation
- LABEL_TRAINING_PHASE_DETAIL.html - Calibration breakdown
- FAST_ML_ONLY_WORKFLOW.html - Pure ML guide
- PROJECT_STATUS_AND_NEXT_STEPS.html - Roadmap
- ROOT_CAUSE_ANALYSIS.md - Bug fixes

MVP Status:
- 10k emails in 4 minutes, 72.7% accuracy, 0 LLM calls
- LLM-driven category discovery working
- Embedding-based transfer learning confirmed
- All model paths verified and working
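The threshold change can be pictured with a minimal sketch, assuming the ML classifier yields a (category, confidence) pair and that predictions below the threshold are handed to the LLM. The names `route_prediction`, `allow_llm_fallback`, and `relative_reduction`, the `>=` comparison, and the behaviour when fallback is disabled are all illustrative assumptions, not the project's actual code. The second helper only checks the arithmetic behind the "40%" figure, which is a relative reduction (35% -> 21%).

```python
DEFAULT_THRESHOLD = 0.55  # value stated in the commit message (previously 0.75)


def route_prediction(category, confidence, threshold=DEFAULT_THRESHOLD,
                     allow_llm_fallback=True):
    """Hypothetical helper: decide which method handles one ML prediction.

    Returns (category, method). When the prediction is deferred to the LLM,
    the category is None because the LLM has not answered yet.
    """
    if confidence >= threshold:
        return category, "ml"
    if allow_llm_fallback:
        return None, "llm"
    # Assumed behaviour for a pure-ML run: keep the low-confidence ML label.
    return category, "ml"


def relative_reduction(before, after):
    """Relative drop from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before


# A 0.60-confidence prediction passes at 0.55 but would have needed the LLM at 0.75.
print(route_prediction("Work", 0.60))
print(route_prediction("Work", 0.60, threshold=0.75))
# 35% -> 21% fallback rate is a 40% relative reduction.
print(round(relative_reduction(35, 21), 2))
```

Lowering the threshold trades fewer LLM calls for more reliance on the ML model's own confidence, so it is worth reading alongside the spot-check log that follows.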
304 lines
9.0 KiB
Plaintext
================================================================================
SMART CLASSIFICATION SPOT-CHECK
================================================================================

Loading results from: results_100k/results.json
Total emails: 100,000

Analyzing classification patterns...
Selected 30 emails for spot-checking

- high_conf_suspicious: 10 samples
- low_conf_obvious: 2 samples
- mid_conf_edge_cases: 0 samples
- category_anomalies: 8 samples
- random_check: 10 samples

Loading email content...
Loaded 100,000 emails

================================================================================
SPOT-CHECK SAMPLES
================================================================================

[1] HIGH CONFIDENCE - Potential Overconfidence
--------------------------------------------------------------------------------
These have very high confidence. Check if they're actually correct.

Sample 1:
Category: Administrative
Confidence: 1.000
Method: ml
From: john.arnold@enron.com
Subject: RE:
Body preview: i'll get the movie and wine. my suggestion is something from central market but i'm easy

-----Original Message-----
From: Ward, Kim S (Houston)
Sent: Monday, July 02, 2001 5:29 PM
To: Arnold, Jo...

Sample 2:
Category: Administrative
Confidence: 1.000
Method: ml
From: eric.bass@enron.com
Subject: Re: New deals
Body preview: Can you spell S-N-O-O-T-Y?

e





From: Ami Chokshi @ ENRON 01/06/2000 05:38 PM


To: Eric Bass/HOU/ECT@ECT
cc:
Subject: Re: New deals

Was E-R-I-C too hard to w...

Sample 3:
Category: Meeting
Confidence: 1.000
Method: ml
From: amy.fitzpatrick@enron.com
Subject: MEETING TONIGHT - 6:00 pm Central Time at The Houstonian
Body preview: Throughout this week, we have a team from UBS in Houston to introduce and discuss the NETCO business and associated HR matters.

In this regard, please make yourself available for a meeting tonight b...

Sample 4:
Category: Meeting
Confidence: 1.000
Method: ml
From: james.steffes@enron.com
Subject:
Body preview: Jeff --

Please add John Neslage to your e-mail list.

Jim...

Sample 5:
Category: Financial
Confidence: 1.000
Method: ml
From: sheri.thomas@enron.com
Subject: Fercinfo2 (The Whole Picture)
Body preview: Sally - just an fyi... Jeff Hodge requested that we send him the information
below. Evidently, the FERC has requested that several US wholesale companies
provide a great deal of information to the...

[2] LOW CONFIDENCE - Might Be Obvious
--------------------------------------------------------------------------------
These have low confidence. Check if they're actually obvious.

Sample 1:
Category: unknown
Confidence: 0.500
Method: llm
From: k..allen@enron.com
Subject: FW:
Body preview: Greg,

After making an election in October to receive a full distribution of my deferral account under Section 6.3 of the plan, a disagreement has arisen regarding the Phantom Stock Account.

Se...

Sample 2:
Category: unknown
Confidence: 0.500
Method: llm
From: mitch.robinson@enron.com
Subject: Running Units
Body preview: Given the sale, etc of the units, don't sell any power off the units, and
don't run the units (any of the six plants) for any reason without first
getting my specific permission.

Thanks,

Mitch...

[3] MIDDLE CONFIDENCE - Edge Cases
--------------------------------------------------------------------------------
These are in the middle. Most likely to be tricky classifications.

[4] CATEGORY ANOMALIES - Rare Categories with High Confidence
--------------------------------------------------------------------------------
These are high confidence but in small categories. Might be mislabeled.

Sample 1:
Category: California Market
Confidence: 1.000
Method: ml
From: dhunter@s-k-w.com
Subject: FW: Direct Access Language
Body preview: -----Original Message-----
From: Mike Florio [mailto:mflorio@turn.org]
Sent: Tuesday, September 11, 2001 3:23 AM
To: Delaney Hunter
Subject: Direct Access Language


Delaney-- DJ asked me to forward ...

Sample 2:
Category: auth
Confidence: 0.990
Method: rule
From: david.roland@enron.com
Subject: FW: Notices and Agenda for Dec 21 ServiceCo Board Meeting
Body preview: Vicki, Dave, Mark and Jimmie,

We're scheduling a pre-meeting to the ServiceCo Board meeting at 11:30 a.m. tomorrow (Friday) in Dave's office.

Thanks,
David


-----Original Message-----
From: Rolan...

Sample 3:
Category: transactional
Confidence: 0.970
Method: rule
From: orders@amazon.com
Subject: Cancellation from Amazon.com Order (#107-0663988-7584503)
Body preview: Greetings from Amazon.com. You have successfully cancelled an item
from your order #107-0663988-7584503

For your reference, here is a summary of your order:


Order #107-0663988-7584503 - placed Dec...

Sample 4:
Category: Forwarded
Confidence: 1.000
Method: ml
From: jefferson.sorenson@enron.com
Subject: UNIFY TO SAP INTERFACES
Body preview: ---------------------- Forwarded by Jefferson D Sorenson/HOU/ECT on
07/05/2000 04:58 PM ---------------------------


Bob Klein
07/05/2000 04:57 PM
To: Jefferson D Sorenson/HOU/ECT@ECT
cc: Rebecca Fo...

Sample 5:
Category: Urgent
Confidence: 1.000
Method: ml
From: l..garcia@enron.com
Subject: RE: LUNCH
Body preview: You Idiot! Why are you sending emails to people who wont get them (Reese, Dustin, Blaine, Greer, Reeves), and who the hell is AC? Mr. Huddle and the Horseman?????????????? Did you fall and hit your he...

[5] RANDOM CHECK - General Quality Check
--------------------------------------------------------------------------------
Random samples from each category for general quality assessment.

Sample 1:
Category: Administrative
Confidence: 1.000
Method: ml
From: cameron@perfect.com
Subject: RE: Directions
Body preview: I will send this out. Yes, we can talk tonight. When will you be at the
house?


Cameron Sellers
Vice President, Business Development
PERFECT
1860 Embarcadero Road - Suite 210
Palo Alto, CA 94303
ca...

Sample 2:
Category: Meeting
Confidence: 1.000
Method: ml
From: perfmgmt@enron.com
Subject: Mid-Year 2001 Performance Feedback
Body preview: DEAN, CLINT E,
?
You have been selected to participate in the Mid Year 2001 Performance
Management process. Your feedback plays an important role in the process,
and your participation is critical ...

Sample 3:
Category: Financial
Confidence: 1.000
Method: ml
From: schwabalerts.marketupdates@schwab.com
Subject: Midday Market View for June 7, 2001
Body preview: Charles Schwab & Co., Inc.

Midday Market View(TM) for Thursday, June 7, 2001
as of 1:00PM EDT
Information provided by Standard & Poor's

==============================================================...

Sample 4:
Category: Work
Confidence: 1.000
Method: ml
From: enron.announcements@enron.com
Subject: SUPPLEMENTAL Weekend Outage Report for 11-10-00
Body preview: ------------------------------------------------------------------------------
------------------------
W E E K E N D S Y S T E M S A V A I L A B I L I T Y

F O R

November 10, 2000 5:00pm through...

Sample 5:
Category: Operational
Confidence: 1.000
Method: ml
From: phillip.allen@enron.com
Subject: Re: Insight Hardware
Body preview: I have not received the aircard 300 yet.

Phillip...

================================================================================
CATEGORY DISTRIBUTION
================================================================================

Category               Total  High Conf  Low Conf  Avg Conf
--------------------------------------------------------------------------------
Administrative        67,195     67,191         0     1.000
Work                  14,223     14,213         0     1.000
Meeting                7,785      7,783         0     1.000
Financial              5,943      5,943         0     1.000
Operational            3,274      3,272         0     1.000
junk                     394        394         0     0.960
work                     368        368         0     0.950
Miscellaneous            238        238         0     1.000
Technical                193        193         0     1.000
External                 137        137         0     1.000
Announcements            113        112         0     0.999
transactional             44         44         0     0.970
auth                      37         37         0     0.990
unknown                   23          0        23     0.500
Forwarded                 16         16         0     0.999
California Market          6          6         0     1.000
Prehearing                 6          6         0     0.974
Change                     3          3         0     1.000
Urgent                     1          1         0     1.000
Monitoring                 1          1         0     1.000
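
A distribution table like the one above could be recomputed from the results with a sketch along the following lines. It assumes each result is a dict with `category` and `confidence` keys; the schema of results_100k/results.json is not shown in this log, so those key names, the function names, and the `high_cutoff` / `low_cutoff` values are hypothetical. Category names are treated as case-sensitive, which is consistent with "Work" and "work" appearing as separate rows.

```python
from collections import defaultdict


def summarize_by_category(results, high_cutoff=0.8, low_cutoff=0.6):
    """Hypothetical helper: per-category totals, high/low counts, mean confidence.

    The cutoffs are assumed values for illustration, not the script's own.
    """
    stats = defaultdict(lambda: {"total": 0, "high": 0, "low": 0, "conf_sum": 0.0})
    for result in results:
        entry = stats[result["category"]]  # case-sensitive key
        confidence = result["confidence"]
        entry["total"] += 1
        entry["conf_sum"] += confidence
        if confidence >= high_cutoff:
            entry["high"] += 1
        elif confidence < low_cutoff:
            entry["low"] += 1
    rows = [
        (cat, s["total"], s["high"], s["low"], s["conf_sum"] / s["total"])
        for cat, s in stats.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)  # largest category first
    return rows


def format_rows(rows):
    """Render rows as fixed-width plaintext columns."""
    lines = [f"{'Category':<20}{'Total':>8}{'High Conf':>11}{'Low Conf':>10}{'Avg Conf':>10}"]
    for cat, total, high, low, avg in rows:
        lines.append(f"{cat:<20}{total:>8,}{high:>11,}{low:>10,}{avg:>10.3f}")
    return "\n".join(lines)


sample = [
    {"category": "Work", "confidence": 1.0},
    {"category": "Work", "confidence": 0.7},
    {"category": "unknown", "confidence": 0.5},
]
print(format_rows(summarize_by_category(sample)))
```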

================================================================================
DONE!
================================================================================
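
The five sample buckets reported near the top of the log (high_conf_suspicious, low_conf_obvious, mid_conf_edge_cases, category_anomalies, random_check) could be drawn with a sketch like the one below. Only the bucket names come from the log; the selection criteria, the cutoff values, the `rare_max` limit, and the `category` / `confidence` field names are assumptions made for illustration, and the actual script evidently applies stricter filters (for example, it picked only 2 low-confidence samples out of 23 "unknown" results).

```python
import random
from collections import Counter


def select_spot_checks(results, per_bucket=10, high=0.95, low=0.6,
                       rare_max=50, seed=0):
    """Hypothetical helper: pick up to `per_bucket` results for each bucket.

    All thresholds are assumed values; a fixed seed keeps the draw repeatable.
    """
    rng = random.Random(seed)
    category_sizes = Counter(r["category"] for r in results)

    def take(candidates):
        candidates = list(candidates)
        return rng.sample(candidates, min(per_bucket, len(candidates)))

    return {
        # Very confident predictions, checked for overconfidence.
        "high_conf_suspicious": take(r for r in results if r["confidence"] >= high),
        # Low-confidence predictions that may in fact be easy.
        "low_conf_obvious": take(r for r in results if r["confidence"] < low),
        # Everything between the two cutoffs.
        "mid_conf_edge_cases": take(
            r for r in results if low <= r["confidence"] < high
        ),
        # Confident predictions that landed in a small category.
        "category_anomalies": take(
            r for r in results
            if category_sizes[r["category"]] <= rare_max and r["confidence"] >= high
        ),
        # Plain random sample for a general quality check.
        "random_check": take(results),
    }


demo = (
    [{"category": "Administrative", "confidence": 1.0}] * 100
    + [{"category": "Urgent", "confidence": 1.0}]
    + [{"category": "unknown", "confidence": 0.5}] * 2
)
for name, picked in select_spot_checks(demo).items():
    print(f"- {name}: {len(picked)} samples")
```

Under these assumed criteria an empty mid-confidence bucket, as in this run, simply means no result fell between the two cutoffs, which matches the near-uniform 1.000 average confidence in the distribution table.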