Email Classification Report Format
This document explains the HTML report generation system, its data sources, and how to customize it.
Overview
The report generator creates a static HTML file from classification results. It requires an enriched `results.json` that includes email metadata (subject, sender, date, etc.), not just classification data.
Files Involved
| File | Purpose |
|---|---|
| `tools/generate_html_report.py` | Main report generator script |
| `src/cli.py` | Classification CLI; outputs enriched `results.json` |
| `src/export/exporter.py` | Legacy exporter (JSON/CSV); not used for HTML |
Data Flow
```
Email Source (.eml/.msg files)
    ↓
src/cli.py (classification)
    ↓
results.json (enriched with metadata)
    ↓
tools/generate_html_report.py
    ↓
report.html (static, self-contained)
```
Usage
Generate Report
```bash
python tools/generate_html_report.py \
  --input /path/to/results.json \
  --output /path/to/report.html
```
If `--output` is omitted, the script writes `report.html` to the same directory as the input file.
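A minimal sketch of that default, assuming the script simply replaces the input filename with `report.html`. `resolve_output_path` is a hypothetical helper name, not a function the script is known to expose:

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(input_path: str, output_path: Optional[str] = None) -> Path:
    """Hypothetical helper: the explicit --output value, else report.html next to the input."""
    if output_path:
        return Path(output_path)
    # Same directory as results.json, filename swapped for report.html
    return Path(input_path).with_name("report.html")


print(resolve_output_path("/path/to/output/results.json"))
```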
Full Workflow
```bash
# 1. Classify emails
python -m src.cli run \
  --source local \
  --directory "/path/to/emails" \
  --output "/path/to/output" \
  --no-llm-fallback

# 2. Generate report
python tools/generate_html_report.py \
  --input "/path/to/output/results.json"
```
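If you prefer to drive both steps from Python, the workflow can be wrapped with `subprocess`. This is a sketch that reuses only the flags shown above; `build_commands` and `run_workflow` are hypothetical names, and it assumes it runs from the repository root so that `src.cli` and `tools/` resolve, and that classification writes `results.json` into the output directory (as the workflow above implies):

```python
import subprocess
import sys
from pathlib import Path


def build_commands(email_dir: str, output_dir: str) -> list[list[str]]:
    """Hypothetical helper: the two workflow steps as argument lists."""
    results = str(Path(output_dir) / "results.json")
    classify = [
        sys.executable, "-m", "src.cli", "run",
        "--source", "local",
        "--directory", email_dir,
        "--output", output_dir,
        "--no-llm-fallback",
    ]
    report = [sys.executable, "tools/generate_html_report.py", "--input", results]
    return [classify, report]


def run_workflow(email_dir: str, output_dir: str) -> None:
    """Run classification, then report generation, stopping on the first failure."""
    for cmd in build_commands(email_dir, output_dir):
        # check=True raises CalledProcessError, so the report step never
        # runs against a stale results.json if classification fails
        subprocess.run(cmd, check=True)
```

Call `run_workflow("/path/to/emails", "/path/to/output")` to run both steps. Passing argument lists instead of a shell string avoids quoting problems with paths that contain spaces.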
results.json Format
The report generator expects this structure:
```json
{
  "metadata": {
    "total_emails": 801,
    "accuracy_estimate": 0.55,
    "classification_stats": {
      "rule_matched": 9,
      "ml_classified": 468,
      "llm_classified": 0,
      "needs_review": 324
    },
    "generated_at": "2025-11-28T02:34:00.680196",
    "source": "local",
    "source_path": "/path/to/emails"
  },
  "classifications": [
    {
      "email_id": "unique_id.eml",
      "subject": "Email subject line",
      "sender": "sender@example.com",
      "sender_name": "Sender Name",
      "date": "2023-04-13T09:43:29+10:00",
      "has_attachments": false,
      "category": "Work",
      "confidence": 0.81,
      "method": "ml"
    }
  ]
}
```
Required Fields
| Field | Type | Description |
|---|---|---|
| `email_id` | string | Unique identifier (usually the filename) |
| `subject` | string | Email subject line |
| `sender` | string | Sender email address |
| `category` | string | Assigned category |
| `confidence` | float | Classification confidence (0-1) |
| `method` | string | Classification method: `ml`, `rule`, or `llm` |
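A pre-flight check can catch missing or mistyped required fields before report generation. This is a minimal sketch that encodes the table above; `find_problems` is a hypothetical helper, not part of the repository:

```python
# Required fields and their expected Python types, per the table above
REQUIRED_FIELDS = {
    "email_id": str,
    "subject": str,
    "sender": str,
    "category": str,
    "confidence": (int, float),
    "method": str,
}
VALID_METHODS = {"ml", "rule", "llm"}


def find_problems(results: dict) -> list[str]:
    """Hypothetical check: list problems in the classifications array (empty if valid)."""
    problems = []
    for i, item in enumerate(results.get("classifications", [])):
        for field, expected in REQUIRED_FIELDS.items():
            if field not in item:
                problems.append(f"classifications[{i}]: missing '{field}'")
            elif not isinstance(item[field], expected):
                problems.append(f"classifications[{i}]: '{field}' has the wrong type")
        conf = item.get("confidence")
        if isinstance(conf, (int, float)) and not 0 <= conf <= 1:
            problems.append(f"classifications[{i}]: confidence {conf} outside 0-1")
        method = item.get("method")
        if isinstance(method, str) and method not in VALID_METHODS:
            problems.append(f"classifications[{i}]: unknown method '{method}'")
    return problems
```

For example, call `find_problems(json.load(f))` on an opened `results.json`. An empty list means every record has the required fields; a missing `subject` reported here corresponds to the `KeyError: 'subject'` case under Troubleshooting.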
Optional Fields
| Field | Type | Description |
|---|---|---|
| `sender_name` | string | Display name of the sender |
| `date` | string | ISO 8601 date string |
| `has_attachments` | boolean | Whether the email has attachments |
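Because these fields may be absent, a renderer needs fallbacks. The sketch below shows one reasonable set of defaults. The "N/A" date matches the behavior described under Troubleshooting, but the helper names, the `date_display` key and the display format are assumptions, and the real script may differ:

```python
from datetime import datetime
from typing import Optional


def format_date(value: Optional[str]) -> str:
    """Format an ISO 8601 date for display, or 'N/A' if missing or unparsable."""
    if not value:
        return "N/A"
    try:
        return datetime.fromisoformat(value).strftime("%Y-%m-%d %H:%M")
    except ValueError:
        return "N/A"


def with_optional_defaults(item: dict) -> dict:
    """Hypothetical helper: copy a record and fill in the optional fields."""
    out = dict(item)
    # Fall back to the email address when there is no display name
    out["sender_name"] = item.get("sender_name") or item["sender"]
    # Hypothetical display-only key; the stored date string is left untouched
    out["date_display"] = format_date(item.get("date"))
    out["has_attachments"] = bool(item.get("has_attachments", False))
    return out
```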
Report Sections
1. Header
- Report title
- Generation timestamp
- Source info
- Total email count
2. Stats Grid
- Total emails
- Number of categories
- High confidence count (>=70%)
- Unique sender domains
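The four stats-grid numbers can all be derived from the `classifications` array. This is a sketch, not the script's actual code. It assumes ">=70%" means `confidence >= 0.70` on the 0-1 scale and that a sender's domain is everything after the last `@`, lower-cased. `stats_grid` is a hypothetical name:

```python
def stats_grid(classifications: list[dict]) -> dict:
    """Hypothetical helper: compute the four headline numbers for the stats grid."""
    # Domain = text after the last "@", lower-cased; senders without "@" are skipped
    domains = {
        item["sender"].rsplit("@", 1)[-1].lower()
        for item in classifications
        if "@" in item.get("sender", "")
    }
    return {
        "total_emails": len(classifications),
        "categories": len({item["category"] for item in classifications}),
        "high_confidence": sum(1 for item in classifications if item["confidence"] >= 0.70),
        "unique_domains": len(domains),
    }
```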
3. Category Distribution
- Horizontal bar chart
- Count and percentage per category
- Sorted by count (descending)
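The count-and-percentage rows behind the bar chart amount to a frequency count. A minimal sketch using `collections.Counter`, whose `most_common()` already returns entries sorted by count in descending order. `distribution` is a hypothetical helper. Passing `field="method"` would produce the same kind of breakdown for the Classification Methods section:

```python
from collections import Counter


def distribution(classifications: list[dict], field: str = "category") -> list[tuple[str, int, float]]:
    """Hypothetical helper: (value, count, percent) rows for one field, most frequent first."""
    counts = Counter(item[field] for item in classifications)
    # Guard against division by zero when there are no classifications
    total = len(classifications) or 1
    return [
        (value, count, round(100 * count / total, 1))
        for value, count in counts.most_common()
    ]
```

The percentage column can then serve directly as each bar's width.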
4. Classification Methods
- Breakdown of ML vs Rule vs LLM
- Shows the percentage of emails handled by each method
5. Confidence Distribution
- High (>=70%): Green
- Medium (50-70%): Yellow
- Low (<50%): Red
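The three bands map naturally onto the CSS variables listed under Customization. In this sketch the band edges follow the list above, with 0.50 counted as Medium and 0.70 as High. Using `--accent` for the red "Low" color is an assumption, since the stylesheet's only red variable is the accent color. The function names are hypothetical:

```python
from collections import Counter


def confidence_band(confidence: float) -> tuple[str, str]:
    """Hypothetical helper: map a 0-1 confidence to a (band label, CSS color) pair."""
    if confidence >= 0.70:
        return "High", "var(--success)"   # green
    if confidence >= 0.50:
        return "Medium", "var(--warning)"  # yellow
    # Assumption: low confidence reuses the red --accent variable
    return "Low", "var(--accent)"


def band_counts(classifications: list[dict]) -> Counter:
    """Count how many emails fall into each confidence band."""
    return Counter(confidence_band(item["confidence"])[0] for item in classifications)
```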
6. Top Senders
- Top 20 senders by email count
- Grid layout
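Top senders is another frequency count, capped at 20 entries. This is a sketch with a hypothetical `top_senders` helper. Lower-casing addresses, so that `Bob@x.com` and `bob@x.com` count as one sender, is an assumption: the script may count raw strings instead.

```python
from collections import Counter


def top_senders(classifications: list[dict], limit: int = 20) -> list[tuple[str, int]]:
    """Hypothetical helper: the `limit` most frequent sender addresses with counts."""
    # Assumption: addresses are compared case-insensitively
    counts = Counter(item["sender"].lower() for item in classifications)
    return counts.most_common(limit)
```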
7. Email Tables (Tabbed)
- "All" tab shows all emails
- Category tabs filter by category
- Search box filters by subject/sender
- Columns: Date, Subject, Sender, Category, Confidence, Method
- Sorted by date (newest first)
- Attachment indicator (📎)
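Sorting newest first requires parsing the ISO 8601 `date` strings, and records with a missing or malformed date need a defined place in the order. In this sketch they go last. The helper names are hypothetical, and treating naive (offset-less) dates as local time is an assumption of the sketch:

```python
from datetime import datetime
from typing import Optional


def parse_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string, returning None if missing or malformed."""
    try:
        return datetime.fromisoformat(value) if value else None
    except ValueError:
        return None


def sort_newest_first(classifications: list[dict]) -> list[dict]:
    """Hypothetical helper: sort records by date, newest first, undated records last."""
    def key(item: dict) -> tuple[bool, float]:
        dt = parse_date(item.get("date"))
        # timestamp() makes offset-aware dates comparable across time zones;
        # naive dates are interpreted as local time (assumption)
        return (dt is None, -dt.timestamp() if dt else 0.0)
    return sorted(classifications, key=key)
```

Because `sorted` is stable, undated records keep their original relative order at the end of the table.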
Customization
Changing Colors
Edit the CSS variables in generate_html_report.py:
```css
:root {
  --bg-primary: #1a1a2e;     /* Main background */
  --bg-secondary: #16213e;   /* Card backgrounds */
  --bg-card: #0f3460;        /* Nested elements */
  --text-primary: #eee;      /* Main text */
  --text-secondary: #aaa;    /* Muted text */
  --accent: #e94560;         /* Accent color (red) */
  --accent-hover: #ff6b6b;   /* Accent hover */
  --success: #00d9a5;        /* Green (high confidence) */
  --warning: #ffc107;        /* Yellow (medium confidence) */
  --border: #2a2a4a;         /* Border color */
}
```
Light Theme Example
```css
:root {
  --bg-primary: #f5f5f5;
  --bg-secondary: #ffffff;
  --bg-card: #e8e8e8;
  --text-primary: #333;
  --text-secondary: #666;
  --accent: #2563eb;
  --accent-hover: #3b82f6;
  --success: #10b981;
  --warning: #f59e0b;
  --border: #d1d5db;
}
```
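Since the CSS lives in a Python template string, a theme can also be kept as a dict and rendered into the `:root` block. Something along these lines could underpin the "configurable color themes via CLI" item in the TODO list. This is a hypothetical sketch: `LIGHT_THEME` and `render_root_block` are not part of the script.

```python
# Hypothetical theme dict mirroring the light theme above (names without the leading "--")
LIGHT_THEME = {
    "bg-primary": "#f5f5f5",
    "bg-secondary": "#ffffff",
    "bg-card": "#e8e8e8",
    "text-primary": "#333",
    "text-secondary": "#666",
    "accent": "#2563eb",
    "accent-hover": "#3b82f6",
    "success": "#10b981",
    "warning": "#f59e0b",
    "border": "#d1d5db",
}


def render_root_block(theme: dict[str, str]) -> str:
    """Render a theme dict as a CSS :root block of custom properties."""
    lines = [f"  --{name}: {value};" for name, value in theme.items()]
    return ":root {\n" + "\n".join(lines) + "\n}"
```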
Adding New Sections
- Add data extraction in the `generate_html_report()` function.
- Add an HTML section to the main template string.
- Style it with existing CSS classes or add new ones.
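Applied to a hypothetical "Low Confidence" section, those steps look roughly like the sketch below: extract the records, build an HTML fragment, then style it. The function name, the 0.50 threshold and the `section` class are placeholders, not names taken from the script. User-controlled strings such as subjects are passed through `html.escape` before templating.

```python
import html


def low_confidence_section(classifications: list[dict], threshold: float = 0.50) -> str:
    """Hypothetical extra section listing emails below a confidence threshold."""
    # 1. Data extraction: choose the records to show
    low = [item for item in classifications if item["confidence"] < threshold]
    # 2. HTML: escape subjects so markup in the data cannot break the page
    items = "\n".join(
        f"    <li>{html.escape(item['subject'])} ({item['confidence']:.0%})</li>"
        for item in low
    )
    # 3. Styling: "section" is a placeholder class; reuse whatever the template defines
    return (
        '<div class="section">\n'
        f"  <h2>Low Confidence ({len(low)})</h2>\n"
        f"  <ul>\n{items}\n  </ul>\n"
        "</div>"
    )
```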
Adding New Table Columns
- Modify the `generate_email_row()` function.
- Add a `<th>` to the table header.
- Add a `<td>` to the row template.
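The header and row cells must stay in the same order. The sketch below adds a hypothetical "Domain" column. `email_row` is an illustrative stand-in for `generate_email_row()`, whose real signature may differ, and `COLUMNS`/`header_row` are likewise hypothetical:

```python
import html

# Column order; header cells and row cells must match one-to-one
COLUMNS = ["Date", "Subject", "Sender", "Domain", "Category", "Confidence", "Method"]


def header_row() -> str:
    """Build the <th> header row from COLUMNS."""
    return "<tr>" + "".join(f"<th>{name}</th>" for name in COLUMNS) + "</tr>"


def email_row(item: dict) -> str:
    """Hypothetical stand-in for generate_email_row() with an extra Domain column."""
    cells = [
        item.get("date", "N/A"),
        item["subject"],
        item["sender"],
        item["sender"].rsplit("@", 1)[-1],  # new column: sender domain
        item["category"],
        f"{item['confidence']:.0%}",
        item["method"],
    ]
    return "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in cells) + "</tr>"
```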
Performance Notes
- Report is fully static (no server required)
- JavaScript is minimal (tab switching, search filtering)
- Handles 1000+ emails without performance issues
- For 10k+ emails, consider pagination (not yet implemented)
Future Enhancements (TODO)
- Pagination for large datasets
- Export to PDF option
- Configurable color themes via CLI
- Column sorting (click headers)
- Date range filter
- Sender domain grouping
- Category confidence heatmap
- Email body preview on hover
Troubleshooting
"KeyError: 'subject'"
`results.json` lacks email metadata. Re-run classification with the latest `src/cli.py`.
Empty tables
Check that `results.json` contains a non-empty `classifications` array.
Dates showing "N/A"
Date parsing failed. Check that the `date` values in `results.json` are ISO 8601 strings.
Search not working
Usually caused by a JavaScript error. Check the browser console and make sure the data contains no HTML entities.