# Email Classification Report Format

This document explains the HTML report generation system, its data sources, and how to customize it.

## Overview

The report generator creates a static HTML file from classification results. It requires an enriched `results.json` that includes email metadata (subject, sender, date, etc.), not just classification data.

## Files Involved

| File | Purpose |
|------|---------|
| `tools/generate_html_report.py` | Main report generator script |
| `src/cli.py` | Classification CLI - outputs enriched `results.json` |
| `src/export/exporter.py` | Legacy exporter (JSON/CSV) - not used for HTML |

## Data Flow

```
Email Source (.eml/.msg files)
    ↓
src/cli.py (classification)
    ↓
results.json (enriched with metadata)
    ↓
tools/generate_html_report.py
    ↓
report.html (static, self-contained)
```

## Usage

### Generate Report

```bash
python tools/generate_html_report.py \
  --input /path/to/results.json \
  --output /path/to/report.html
```

If `--output` is omitted, the script creates `report.html` in the same directory as the input file.

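The default output location described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name `resolve_output_path` is hypothetical, and the only behavior assumed is what the sentence above states.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(input_path: str, output_path: Optional[str] = None) -> Path:
    """Hypothetical helper: pick the report path the way the docs describe."""
    if output_path:
        # An explicit --output value always wins.
        return Path(output_path)
    # Otherwise place report.html next to the input results.json.
    return Path(input_path).with_name("report.html")
```

For example, `resolve_output_path("/path/to/output/results.json")` yields `/path/to/output/report.html`.
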
### Full Workflow

```bash
# 1. Classify emails
python -m src.cli run \
  --source local \
  --directory "/path/to/emails" \
  --output "/path/to/output" \
  --no-llm-fallback

# 2. Generate report
python tools/generate_html_report.py \
  --input "/path/to/output/results.json"
```

## results.json Format

The report generator expects this structure:

```json
{
  "metadata": {
    "total_emails": 801,
    "accuracy_estimate": 0.55,
    "classification_stats": {
      "rule_matched": 9,
      "ml_classified": 468,
      "llm_classified": 0,
      "needs_review": 324
    },
    "generated_at": "2025-11-28T02:34:00.680196",
    "source": "local",
    "source_path": "/path/to/emails"
  },
  "classifications": [
    {
      "email_id": "unique_id.eml",
      "subject": "Email subject line",
      "sender": "sender@example.com",
      "sender_name": "Sender Name",
      "date": "2023-04-13T09:43:29+10:00",
      "has_attachments": false,
      "category": "Work",
      "confidence": 0.81,
      "method": "ml"
    }
  ]
}
```

### Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `email_id` | string | Unique identifier (usually filename) |
| `subject` | string | Email subject line |
| `sender` | string | Sender email address |
| `category` | string | Assigned category |
| `confidence` | float | Classification confidence (0-1) |
| `method` | string | Classification method: `ml`, `rule`, or `llm` |

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `sender_name` | string | Display name of sender |
| `date` | string | ISO 8601 date string |
| `has_attachments` | boolean | Whether email has attachments |

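To check a `results.json` against the two field tables before generating a report, a small validation pass along these lines may help. It is a sketch only: the function names and the default values chosen for the optional fields are assumptions, not part of `generate_html_report.py`.

```python
from typing import Any, Dict, List

# Field names taken from the Required and Optional tables above.
REQUIRED_FIELDS = ("email_id", "subject", "sender", "category", "confidence", "method")
# Assumed defaults for optional fields; the real generator may choose differently.
OPTIONAL_DEFAULTS: Dict[str, Any] = {
    "sender_name": "",
    "date": None,
    "has_attachments": False,
}


def missing_required(record: Dict[str, Any]) -> List[str]:
    """Return the required field names that are absent from one record."""
    return [field for field in REQUIRED_FIELDS if field not in record]


def with_optional_defaults(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with absent optional fields filled in."""
    return {**OPTIONAL_DEFAULTS, **record}
```

A record for which `missing_required()` returns a non-empty list is the kind that triggers the `KeyError: 'subject'` case described under Troubleshooting.
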
## Report Sections

### 1. Header
- Report title
- Generation timestamp
- Source info
- Total email count

### 2. Stats Grid
- Total emails
- Number of categories
- High confidence count (>=70%)
- Unique sender domains

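The four stats-grid values can be derived from the `classifications` array roughly as shown here. This is an illustrative sketch: `compute_stats` is a hypothetical name, and taking the domain as the lowercased text after the last `@` in `sender` is an assumption about how domains are counted.

```python
from typing import Any, Dict, List


def compute_stats(classifications: List[Dict[str, Any]]) -> Dict[str, int]:
    """Hypothetical helper computing the four stats-grid numbers."""
    categories = {c["category"] for c in classifications}
    # "High confidence" uses the >=70% threshold listed above.
    high_confidence = sum(1 for c in classifications if c["confidence"] >= 0.70)
    # Assumption: the domain is whatever follows the last "@", case-insensitive.
    domains = {
        c["sender"].rsplit("@", 1)[-1].lower()
        for c in classifications
        if "@" in c["sender"]
    }
    return {
        "total_emails": len(classifications),
        "categories": len(categories),
        "high_confidence": high_confidence,
        "sender_domains": len(domains),
    }
```
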
### 3. Category Distribution
- Horizontal bar chart
- Count and percentage per category
- Sorted by count (descending)

### 4. Classification Methods
- Breakdown of ML vs Rule vs LLM
- Shows which method handled what percentage

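Sections 3 and 4 both reduce to counting one field and converting the counts to percentages. A minimal sketch, assuming a hypothetical helper named `distribution` (the generator's own function names are not documented here):

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def distribution(
    classifications: List[Dict[str, Any]], field: str
) -> List[Tuple[str, int, float]]:
    """Return (value, count, percentage) tuples, sorted by count descending."""
    counts = Counter(c[field] for c in classifications)
    total = sum(counts.values())
    # most_common() already orders entries from highest to lowest count.
    return [(value, n, 100.0 * n / total) for value, n in counts.most_common()]
```

Calling it with `field="category"` gives the data behind the bar chart, and `field="method"` gives the ML vs Rule vs LLM breakdown.
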
### 5. Confidence Distribution
- High (>=70%): Green
- Medium (>=50% and <70%): Yellow
- Low (<50%): Red

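The three bands translate into a small bucketing function. This sketch assumes the medium band includes 50% and excludes 70%, which is the reading consistent with the high and low thresholds; `confidence_bucket` is a hypothetical name.

```python
def confidence_bucket(confidence: float) -> str:
    """Map a 0-1 confidence score to the band used for coloring."""
    if confidence >= 0.70:
        return "high"    # green
    if confidence >= 0.50:
        return "medium"  # yellow
    return "low"         # red
```
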
### 6. Top Senders
- Top 20 senders by email count
- Grid layout

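A top-senders list of this kind can be built with `collections.Counter`. The sketch below is illustrative; `top_senders` is a hypothetical name and grouping by the raw `sender` address (rather than by display name) is an assumption.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def top_senders(
    classifications: List[Dict[str, Any]], limit: int = 20
) -> List[Tuple[str, int]]:
    """Return up to `limit` (sender, count) pairs, most frequent first."""
    counts = Counter(c["sender"] for c in classifications)
    return counts.most_common(limit)
```
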
### 7. Email Tables (Tabbed)
- "All" tab shows all emails
- Category tabs filter by category
- Search box filters by subject/sender
- Columns: Date, Subject, Sender, Category, Confidence, Method
- Sorted by date (newest first)
- Attachment indicator (📎)

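Because `date` is optional, sorting newest-first has to cope with missing or unparseable values. One way to do it is sketched here; the helper names are hypothetical, and two choices are assumptions rather than documented behavior: dates without a UTC offset are treated as UTC, and records without a usable date are placed last.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def parse_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string; return None if it is missing or invalid."""
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Assumption: treat offset-less dates as UTC so all values compare.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def sort_newest_first(classifications: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return records ordered newest first, with undated records at the end."""

    def key(record: Dict[str, Any]):
        parsed = parse_date(record.get("date"))
        # Dated records sort above undated ones, then by timestamp.
        return (parsed is not None, parsed.timestamp() if parsed else 0.0)

    return sorted(classifications, key=key, reverse=True)
```

Note that `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward; on older versions such dates would fall into the undated group in this sketch.
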
## Customization

### Changing Colors

Edit the CSS variables in `generate_html_report.py`:

```css
:root {
    --bg-primary: #1a1a2e;      /* Main background */
    --bg-secondary: #16213e;    /* Card backgrounds */
    --bg-card: #0f3460;         /* Nested elements */
    --text-primary: #eee;       /* Main text */
    --text-secondary: #aaa;     /* Muted text */
    --accent: #e94560;          /* Accent color (red) */
    --accent-hover: #ff6b6b;    /* Accent hover */
    --success: #00d9a5;         /* Green (high confidence) */
    --warning: #ffc107;         /* Yellow (medium confidence) */
    --border: #2a2a4a;          /* Border color */
}
```

### Light Theme Example

```css
:root {
    --bg-primary: #f5f5f5;
    --bg-secondary: #ffffff;
    --bg-card: #e8e8e8;
    --text-primary: #333;
    --text-secondary: #666;
    --accent: #2563eb;
    --accent-hover: #3b82f6;
    --success: #10b981;
    --warning: #f59e0b;
    --border: #d1d5db;
}
```

### Adding New Sections

1. Add the data extraction to the `generate_html_report()` function
2. Add the HTML section to the main template string
3. Style it with existing CSS classes or add new ones

### Adding New Table Columns

1. Modify the `generate_email_row()` function
2. Add a `<th>` to the table header
3. Add a `<td>` to the row template

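The shape of such a change is sketched below with a hypothetical `render_row()` function. It is not the real `generate_email_row()`, whose signature and markup are not shown in this document; the extra "Attachments" column and the use of `html.escape` on every cell are assumptions made for illustration.

```python
import html
from typing import Any, Dict


def render_row(record: Dict[str, Any]) -> str:
    """Hypothetical row template; the real generate_email_row() may differ."""
    cells = [
        record.get("date") or "N/A",
        record["subject"],
        record["sender"],
        record["category"],
        f"{record['confidence']:.0%}",  # 0.81 is rendered as "81%"
        record["method"],
        "yes" if record.get("has_attachments") else "no",  # the new column
    ]
    # Escape every value so characters like "<" or "&" cannot break the table.
    return "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in cells) + "</tr>"
```

A matching `<th>Attachments</th>` would then go in the table header so that the column counts stay aligned.
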
## Performance Notes

- Report is fully static (no server required)
- JavaScript is minimal (tab switching, search filtering)
- Handles 1000+ emails without performance issues
- For 10k+ emails, consider pagination (not yet implemented)

## Future Enhancements (TODO)

- [ ] Pagination for large datasets
- [ ] Export to PDF option
- [ ] Configurable color themes via CLI
- [ ] Column sorting (click headers)
- [ ] Date range filter
- [ ] Sender domain grouping
- [ ] Category confidence heatmap
- [ ] Email body preview on hover

## Troubleshooting

### "KeyError: 'subject'"
`results.json` lacks email metadata. Re-run classification with the latest `src/cli.py`.

### Empty tables
Check that `results.json` has a `classifications` array that contains data.

### Dates showing "N/A"
Date parsing failed. Check that the dates in `results.json` are in ISO 8601 format.

### Search not working
This usually indicates a JavaScript error. Check the browser console, and ensure there are no HTML entities in the data.