From 183b12c9b4332a15a4c31211916eb471516928bb Mon Sep 17 00:00:00 2001
From: FSSCoding <brett@foxsoftwaresolutions.com.au>
Date: Thu, 23 Oct 2025 14:15:17 +1100
Subject: [PATCH] Improve LLM prompts with proper context and purpose

Both discovery and consolidation prompts now explain:
- What the system does (train ML classifier for auto-sorting)
- What makes good categories (broad, timeless, learnable)
- Why this matters (user needs, ML training requirements)
- How to think about the task (user-focused, functional)

Discovery prompt changes:
- Explains goal of identifying natural categories for ML training
- Lists guidelines for good categories (broad, user-focused, learnable)
- Provides concrete examples of functional categories
- Emphasizes PURPOSE over topic

Consolidation prompt changes:
- Explains full system context (LightGBM, auto-labeling, user search)
- Defines what makes categories effective for ML and users
- Provides user-centric thinking framework
- Emphasizes reusability and timelessness

The prompts now give the 8b model the context it needs to make
deliberate category decisions instead of falling back on generic labels.
---
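The discovery prompt asks the model to copy exact email IDs and to assign
each email to exactly one category. A minimal sketch of how a caller might
check a response against those two rules is shown here. The helper name
`validate_discovery_response` and its arguments are hypothetical and are not
part of llm_analyzer.py; only the JSON shape ("categories" and "labels")
comes from the prompt in this patch.

```python
import json


def validate_discovery_response(raw: str, batch_ids: list[str]) -> list[str]:
    """Return a list of problems in a discovery response (empty means OK).

    Hypothetical helper: assumes the response follows the JSON shape
    requested by the discovery prompt.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems: list[str] = []
    categories = data.get("categories", {})
    if not isinstance(categories, dict):
        problems.append("'categories' is not an object")
        categories = {}
    labels = data.get("labels", [])
    if not isinstance(labels, list):
        problems.append("'labels' is not an array")
        labels = []

    known = set(batch_ids)
    seen: set[str] = set()
    for entry in labels:
        # Each label must be a two-element [id, category] pair of strings.
        if not (
            isinstance(entry, list)
            and len(entry) == 2
            and all(isinstance(part, str) for part in entry)
        ):
            problems.append(f"malformed label entry: {entry!r}")
            continue
        email_id, category = entry
        if email_id not in known:
            # Catches paraphrased IDs such as "email1".
            problems.append(f"unknown email ID: {email_id!r}")
        elif email_id in seen:
            problems.append(f"email labeled more than once: {email_id!r}")
        seen.add(email_id)
        if category not in categories:
            problems.append(f"undeclared category: {category!r}")

    # Every email in the batch must receive a label.
    for missing in sorted(known - seen):
        problems.append(f"email not labeled: {missing!r}")
    return problems
```

A caller could retry the batch or drop the offending labels when the
returned list is non-empty; which of the two is appropriate depends on how
the calibration loop handles failures, which this patch does not show.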
 src/calibration/llm_analyzer.py | 65 +++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 11 deletions(-)
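
The consolidation prompt states three checkable requirements: at most
{target_categories} final categories, every old category mapped to exactly
one final category, and final names of 1-3 words. The sketch below shows one
way those could be verified after the model replies. `check_consolidation`
is a hypothetical name, and counting words by splitting on whitespace is an
assumption about what "1-3 words" means.

```python
import json


def check_consolidation(
    raw: str, old_categories: list[str], target_categories: int
) -> list[str]:
    """Return a list of violations of the consolidation requirements.

    Hypothetical helper: assumes the response uses the "consolidated" and
    "mappings" keys from the prompt's OUTPUT FORMAT section.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    consolidated = data.get("consolidated", {})
    mappings = data.get("mappings", {})
    if not isinstance(consolidated, dict) or not isinstance(mappings, dict):
        return ["'consolidated' and 'mappings' must both be objects"]

    problems: list[str] = []

    # Requirement: strict upper limit on the number of final categories.
    if len(consolidated) > target_categories:
        problems.append(
            f"{len(consolidated)} final categories exceed limit {target_categories}"
        )

    # Requirement: every old category maps to one declared final category.
    for old in old_categories:
        final = mappings.get(old)
        if final is None:
            problems.append(f"old category not mapped: {old!r}")
        elif final not in consolidated:
            problems.append(f"{old!r} maps to undeclared category {final!r}")

    # Requirement: short names (assumed here to mean 1-3 whitespace-separated words).
    for name in consolidated:
        if not 1 <= len(name.split()) <= 3:
            problems.append(f"final category name is not 1-3 words: {name!r}")

    return problems
```

The mapping check also covers the rule from the previous prompt version that
final category names appear in both dictionaries, which the new prompt text
no longer spells out.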

diff --git a/src/calibration/llm_analyzer.py b/src/calibration/llm_analyzer.py
index e76fc2f..685bc48 100644
--- a/src/calibration/llm_analyzer.py
+++ b/src/calibration/llm_analyzer.py
@@ -105,16 +105,36 @@ class CalibrationAnalyzer:
         # Use first email ID as example
         example_id = batch[0].id if batch else "maildir_example__sent_1"
 
-        prompt = f"""<no_think>Categorize these emails. You MUST copy the exact ID string for each email.
+        prompt = f"""<no_think>You are analyzing emails to discover natural categories for an automatic classification system.
 
-EMAILS:
+GOAL: Identify broad, reusable categories that will help train a machine learning model to sort thousands of emails automatically.
+
+GUIDELINES FOR GOOD CATEGORIES:
+- BROAD & TIMELESS: "Financial" not "Q3 Budget Review"
+- USER-FOCUSED: Think "what would help someone find this email later?"
+- LEARNABLE: ML model needs consistent patterns (sender domains, keywords, structure)
+- FUNCTIONAL: Each category serves a distinct purpose
+- 3-10 categories ideal: Too many = noise, too few = useless
+
+EMAILS TO ANALYZE:
 {email_summary}
 
-CRITICAL: Copy the EXACT ID from each email above. For example, if email #1 has ID "{example_id}", you must write exactly "{example_id}" in the labels array, not "email1" or anything else.
+TASK:
+1. Identify natural groupings based on PURPOSE, not just topic
+2. Create SHORT (1-3 word) category names
+3. Assign each email to exactly one category
+4. CRITICAL: Copy EXACT email IDs - if email #1 shows ID "{example_id}", use exactly "{example_id}" in labels
+
+EXAMPLES OF GOOD CATEGORIES:
+- "Work Communication" (daily business emails)
+- "Financial" (invoices, budgets, reports)
+- "Urgent" (time-sensitive requests)
+- "Technical" (system alerts, dev discussions)
+- "Administrative" (HR, policies, announcements)
 
 Return JSON:
 {{
-  "categories": {{"category_name": "description", ...}},
+  "categories": {{"category_name": "what user need this serves", ...}},
   "labels": [["{example_id}", "category"], ...]
 }}
 
@@ -257,28 +277,51 @@ JSON:
         rules_text = "\n".join(rules)
 
         # Build prompt
-        prompt = f"""<no_think>Consolidate email categories by merging duplicates and overlaps.
+        prompt = f"""<no_think>You are helping build an email classification system that will automatically sort thousands of emails.
+
+TASK: Consolidate the discovered categories below into a lean, effective set for training a machine learning classifier.
+
+WHY THIS MATTERS:
+These categories will be used to:
+1. Train a LightGBM classifier on email features (embeddings, patterns, structure)
+2. Automatically label thousands of emails without human intervention
+3. Help users quickly find emails by category (like Gmail labels)
+
+WHAT MAKES GOOD CATEGORIES:
+- BROAD & REUSABLE: "Meetings" not "Q3 Planning Meeting" - applies to many emails
+- FUNCTIONALLY DISTINCT: Each category serves a different user need
+- BALANCED: Avoid 1 huge category + many tiny ones
+- LEARNABLE: ML model needs clear patterns to distinguish categories
+- TIMELESS: "Financial Reports" not "2023 Budget Review"
+- ACTION-ORIENTED: Users ask "show me all X" - what is X?
 
 DISCOVERED CATEGORIES (sorted by email count):
 {category_list}
 
-{context_section}CONSOLIDATION RULES:
+{context_section}CONSOLIDATION STRATEGY:
 {rules_text}
 
+THINK LIKE A USER: If you had to sort 10,000 emails, what categories would help you find things fast?
+- "Work Communication" catches daily business emails
+- "Urgent" flags time-sensitive items
+- "Financial" groups all money-related emails
+- "Technical" vs "Administrative" serve different workflows
+
 OUTPUT FORMAT - Return JSON with consolidated categories and mapping:
 {{
   "consolidated": {{
-    "FinalCategoryName": "Clear, generic description of what emails fit here"
+    "FinalCategoryName": "Clear description of what user need this serves"
   }},
   "mappings": {{
     "OldCategoryName": "FinalCategoryName"
   }}
 }}
 
-IMPORTANT:
-- consolidated dict should have {target_categories} or fewer entries
-- mappings dict must map EVERY old category name to a final category
-- Final category names should be present in both consolidated and mappings
+CRITICAL REQUIREMENTS:
+- Maximum {target_categories} final categories (strict limit)
+- Map EVERY old category to exactly one final category
+- Final category names must be SHORT (1-3 words), GENERIC, and REUSABLE
+- Think: "Would this category still make sense in 5 years?"
 
 JSON:
 """