# Email Classification Report Format

This document explains the HTML report generation system, its data sources, and how to customize it.

## Overview

The report generator creates a static HTML file from classification results. It requires enriched `results.json` with email metadata (subject, sender, date, etc.) - not just classification data.

## Files Involved

| File | Purpose |
|------|---------|
| `tools/generate_html_report.py` | Main report generator script |
| `src/cli.py` | Classification CLI - outputs enriched `results.json` |
| `src/export/exporter.py` | Legacy exporter (JSON/CSV) - not used for HTML |

## Data Flow

```
Email Source (.eml/.msg files)
        ↓
src/cli.py (classification)
        ↓
results.json (enriched with metadata)
        ↓
tools/generate_html_report.py
        ↓
report.html (static, self-contained)
```

## Usage

### Generate Report

```bash
python tools/generate_html_report.py \
    --input /path/to/results.json \
    --output /path/to/report.html
```

If `--output` is omitted, creates `report.html` in same directory as input.

### Full Workflow

```bash
# 1. Classify emails
python -m src.cli run \
    --source local \
    --directory "/path/to/emails" \
    --output "/path/to/output" \
    --no-llm-fallback

# 2. Generate report
python tools/generate_html_report.py \
    --input "/path/to/output/results.json"
```

## results.json Format

The report generator expects this structure:

```json
{
  "metadata": {
    "total_emails": 801,
    "accuracy_estimate": 0.55,
    "classification_stats": {
      "rule_matched": 9,
      "ml_classified": 468,
      "llm_classified": 0,
      "needs_review": 324
    },
    "generated_at": "2025-11-28T02:34:00.680196",
    "source": "local",
    "source_path": "/path/to/emails"
  },
  "classifications": [
    {
      "email_id": "unique_id.eml",
      "subject": "Email subject line",
      "sender": "sender@example.com",
      "sender_name": "Sender Name",
      "date": "2023-04-13T09:43:29+10:00",
      "has_attachments": false,
      "category": "Work",
      "confidence": 0.81,
      "method": "ml"
    }
  ]
}
```

### Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `email_id` | string | Unique identifier (usually filename) |
| `subject` | string | Email subject line |
| `sender` | string | Sender email address |
| `category` | string | Assigned category |
| `confidence` | float | Classification confidence (0-1) |
| `method` | string | Classification method: `ml`, `rule`, or `llm` |

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `sender_name` | string | Display name of sender |
| `date` | string | ISO 8601 date string |
| `has_attachments` | boolean | Whether email has attachments |

## Report Sections

### 1. Header

- Report title
- Generation timestamp
- Source info
- Total email count

### 2. Stats Grid

- Total emails
- Number of categories
- High confidence count (>=70%)
- Unique sender domains

### 3. Category Distribution

- Horizontal bar chart
- Count and percentage per category
- Sorted by count (descending)

### 4. Classification Methods

- Breakdown of ML vs Rule vs LLM
- Shows which method handled what percentage

### 5. Confidence Distribution

- High (>=70%): Green
- Medium (50-70%): Yellow
- Low (<50%): Red

### 6. Top Senders

- Top 20 senders by email count
- Grid layout

### 7. Email Tables (Tabbed)

- "All" tab shows all emails
- Category tabs filter by category
- Search box filters by subject/sender
- Columns: Date, Subject, Sender, Category, Confidence, Method
- Sorted by date (newest first)
- Attachment indicator (📎)

## Customization

### Changing Colors

Edit the CSS variables in `generate_html_report.py`:

```css
:root {
    --bg-primary: #1a1a2e;      /* Main background */
    --bg-secondary: #16213e;    /* Card backgrounds */
    --bg-card: #0f3460;         /* Nested elements */
    --text-primary: #eee;       /* Main text */
    --text-secondary: #aaa;     /* Muted text */
    --accent: #e94560;          /* Accent color (red) */
    --accent-hover: #ff6b6b;    /* Accent hover */
    --success: #00d9a5;         /* Green (high confidence) */
    --warning: #ffc107;         /* Yellow (medium confidence) */
    --border: #2a2a4a;          /* Border color */
}
```

### Light Theme Example

```css
:root {
    --bg-primary: #f5f5f5;
    --bg-secondary: #ffffff;
    --bg-card: #e8e8e8;
    --text-primary: #333;
    --text-secondary: #666;
    --accent: #2563eb;
    --accent-hover: #3b82f6;
    --success: #10b981;
    --warning: #f59e0b;
    --border: #d1d5db;
}
```

### Adding New Sections

1. Add data extraction in `generate_html_report()` function
2. Add HTML section in the main template string
3. Style with existing CSS classes or add new ones

### Adding New Table Columns

1. Modify `generate_email_row()` function
2. Add `