Improve LLM prompts with proper context and purpose

Both discovery and consolidation prompts now explain:
- What the system does (train ML classifier for auto-sorting)
- What makes good categories (broad, timeless, learnable)
- Why this matters (user needs, ML training requirements)
- How to think about the task (user-focused, functional)

Discovery prompt changes:
- Explains goal of identifying natural categories for ML training
- Lists guidelines for good categories (broad, user-focused, learnable)
- Provides concrete examples of functional categories
- Emphasizes PURPOSE over topic

Consolidation prompt changes:
- Explains full system context (LightGBM, auto-labeling, user search)
- Defines what makes categories effective for ML and users
- Provides user-centric thinking framework
- Emphasizes reusability and timelessness

Prompts now give the brilliant 8b model proper context to deliver excellent category decisions instead of lazy generic categorization.
2025-10-23 14:15:17 +11:00
commit 183b12c9b4
parent 88ef570fed
1 changed file with 54 additions and 11 deletions
--- a/src/calibration/llm_analyzer.py
+++ b/src/calibration/llm_analyzer.py
@@ -105,16 +105,36 @@ class CalibrationAnalyzer:
        # Use first email ID as example
        example_id = batch[0].id if batch else "maildir_example__sent_1"
-        prompt = f"""<no_think>Categorize these emails. You MUST copy the exact ID string for each email.
+        prompt = f"""<no_think>You are analyzing emails to discover natural categories for an automatic classification system.
-EMAILS:
+GOAL: Identify broad, reusable categories that will help train a machine learning model to sort thousands of emails automatically.
+GUIDELINES FOR GOOD CATEGORIES:
+- BROAD & TIMELESS: "Financial" not "Q3 Budget Review"
+- USER-FOCUSED: Think "what would help someone find this email later?"
+- LEARNABLE: ML model needs consistent patterns (sender domains, keywords, structure)
+- FUNCTIONAL: Each category serves a distinct purpose
+- 3-10 categories ideal: Too many = noise, too few = useless
+EMAILS TO ANALYZE:
 {email_summary}
-CRITICAL: Copy the EXACT ID from each email above. For example, if email #1 has ID "{example_id}", you must write exactly "{example_id}" in the labels array, not "email1" or anything else.
+TASK:
+1. Identify natural groupings based on PURPOSE, not just topic
+2. Create SHORT (1-3 word) category names
+3. Assign each email to exactly one category
+4. CRITICAL: Copy EXACT email IDs - if email #1 shows ID "{example_id}", use exactly "{example_id}" in labels
+EXAMPLES OF GOOD CATEGORIES:
+- "Work Communication" (daily business emails)
+- "Financial" (invoices, budgets, reports)
+- "Urgent" (time-sensitive requests)
+- "Technical" (system alerts, dev discussions)
+- "Administrative" (HR, policies, announcements)
 Return JSON:
 {{
-  "categories": {{"category_name": "description", ...}},
+  "categories": {{"category_name": "what user need this serves", ...}},
  "labels": [["{example_id}", "category"], ...]
 }}
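Reviewer note: the TASK rules in this prompt (exact IDs, exactly one category per email, categories declared up front) can be checked mechanically once the model replies. The following is a minimal sketch of such a check, not code from this commit. The function name `check_discovery_reply` and its return shape are hypothetical; only the `categories` and `labels` keys come from the prompt's JSON template.

```python
import json


def check_discovery_reply(raw: str, batch_ids: list[str]) -> tuple[dict[str, str], list[str]]:
    """Parse a discovery reply and list violations of the prompt's TASK rules.

    Hypothetical helper: returns (valid id -> category assignments, problems).
    """
    data = json.loads(raw)
    categories = data.get("categories", {})
    known = set(batch_ids)
    seen: set[str] = set()
    assigned: dict[str, str] = {}
    problems: list[str] = []

    for email_id, category in data.get("labels", []):
        if email_id not in known:
            # The model invented an ID such as "email2" instead of copying it.
            problems.append(f"unknown id: {email_id}")
            continue
        if email_id in seen:
            # Rule 3: each email belongs to exactly one category.
            problems.append(f"duplicate id: {email_id}")
            continue
        seen.add(email_id)
        if category not in categories:
            problems.append(f"undeclared category: {category}")
            continue
        assigned[email_id] = category

    # Any email in the batch that never appeared in "labels" is unlabeled.
    for email_id in batch_ids:
        if email_id not in seen:
            problems.append(f"unlabeled id: {email_id}")
    return assigned, problems
```

A caller could retry the batch or drop the offending rows whenever `problems` is non-empty; which of the two is appropriate depends on how the analyzer handles partial results, which this diff does not show.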
@@ -257,28 +277,51 @@ JSON:
        rules_text = "\n".join(rules)
        # Build prompt
-        prompt = f"""<no_think>Consolidate email categories by merging duplicates and overlaps.
+        prompt = f"""<no_think>You are helping build an email classification system that will automatically sort thousands of emails.
+TASK: Consolidate the discovered categories below into a lean, effective set for training a machine learning classifier.
+WHY THIS MATTERS:
+These categories will be used to:
+1. Train a LightGBM classifier on email features (embeddings, patterns, structure)
+2. Automatically label thousands of emails without human intervention
+3. Help users quickly find emails by category (like Gmail labels)
+WHAT MAKES GOOD CATEGORIES:
+- BROAD & REUSABLE: "Meetings" not "Q3 Planning Meeting" - applies to many emails
+- FUNCTIONALLY DISTINCT: Each category serves a different user need
+- BALANCED: Avoid 1 huge category + many tiny ones
+- LEARNABLE: ML model needs clear patterns to distinguish categories
+- TIMELESS: "Financial Reports" not "2023 Budget Review"
+- ACTION-ORIENTED: Users ask "show me all X" - what is X?
 DISCOVERED CATEGORIES (sorted by email count):
 {category_list}
-{context_section}CONSOLIDATION RULES:
+{context_section}CONSOLIDATION STRATEGY:
 {rules_text}
+THINK LIKE A USER: If you had to sort 10,000 emails, what categories would help you find things fast?
+- "Work Communication" catches daily business emails
+- "Urgent" flags time-sensitive items
+- "Financial" groups all money-related emails
+- "Technical" vs "Administrative" serves different workflows
 OUTPUT FORMAT - Return JSON with consolidated categories and mapping:
 {{
  "consolidated": {{
-    "FinalCategoryName": "Clear, generic description of what emails fit here"
+    "FinalCategoryName": "Clear description of what user need this serves"
  }},
  "mappings": {{
    "OldCategoryName": "FinalCategoryName"
  }}
 }}
-IMPORTANT:
+CRITICAL REQUIREMENTS:
-- consolidated dict should have {target_categories} or fewer entries
+- Maximum {target_categories} final categories (strict limit)
-- mappings dict must map EVERY old category name to a final category
+- Map EVERY old category to exactly one final category
-- Final category names should be present in both consolidated and mappings
+- Final category names must be SHORT (1-3 words), GENERIC, and REUSABLE
 - Think: "Would this category still make sense in 5 years?"
 JSON:
 """