Improve LLM prompts with proper context and purpose

Both discovery and consolidation prompts now explain:
- What the system does (train ML classifier for auto-sorting)
- What makes good categories (broad, timeless, learnable)
- Why this matters (user needs, ML training requirements)
- How to think about the task (user-focused, functional)

Discovery prompt changes:
- Explains goal of identifying natural categories for ML training
- Lists guidelines for good categories (broad, user-focused, learnable)
- Provides concrete examples of functional categories
- Emphasizes PURPOSE over topic

Consolidation prompt changes:
- Explains full system context (LightGBM, auto-labeling, user search)
- Defines what makes categories effective for ML and users
- Provides user-centric thinking framework
- Emphasizes reusability and timelessness

Prompts now give the brilliant 8b model proper context to deliver excellent
category decisions instead of lazy generic categorization.
2025-10-23 14:15:17 +11:00 · 183b12c9b4
commit 183b12c9b4
parent 88ef570fed
1 changed file with 54 additions and 11 deletions
--- a/src/calibration/llm_analyzer.py
+++ b/src/calibration/llm_analyzer.py
@@ -105,16 +105,36 @@ class CalibrationAnalyzer:
        # Use first email ID as example
        example_id = batch[0].id if batch else "maildir_example__sent_1"

-        prompt = f"""<no_think>Categorize these emails. You MUST copy the exact ID string for each email.
+        prompt = f"""<no_think>You are analyzing emails to discover natural categories for an automatic classification system.

-EMAILS:
+GOAL: Identify broad, reusable categories that will help train a machine learning model to sort thousands of emails automatically.
+
+GUIDELINES FOR GOOD CATEGORIES:
+- BROAD & TIMELESS: "Financial" not "Q3 Budget Review"
+- USER-FOCUSED: Think "what would help someone find this email later?"
+- LEARNABLE: ML model needs consistent patterns (sender domains, keywords, structure)
+- FUNCTIONAL: Each category serves a distinct purpose
+- 3-10 categories ideal: Too many = noise, too few = useless
+
+EMAILS TO ANALYZE:
 {email_summary}

-CRITICAL: Copy the EXACT ID from each email above. For example, if email #1 has ID "{example_id}", you must write exactly "{example_id}" in the labels array, not "email1" or anything else.
+TASK:
+1. Identify natural groupings based on PURPOSE, not just topic
+2. Create SHORT (1-3 word) category names
+3. Assign each email to exactly one category
+4. CRITICAL: Copy EXACT email IDs - if email #1 shows ID "{example_id}", use exactly "{example_id}" in labels
+
+EXAMPLES OF GOOD CATEGORIES:
+- "Work Communication" (daily business emails)
+- "Financial" (invoices, budgets, reports)
+- "Urgent" (time-sensitive requests)
+- "Technical" (system alerts, dev discussions)
+- "Administrative" (HR, policies, announcements)

 Return JSON:
 {{
-  "categories": {{"category_name": "description", ...}},
+  "categories": {{"category_name": "what user need this serves", ...}},
  "labels": [["{example_id}", "category"], ...]
 }}

@@ -257,28 +277,51 @@ JSON:
        rules_text = "\n".join(rules)

        # Build prompt
-        prompt = f"""<no_think>Consolidate email categories by merging duplicates and overlaps.
+        prompt = f"""<no_think>You are helping build an email classification system that will automatically sort thousands of emails.
+
+TASK: Consolidate the discovered categories below into a lean, effective set for training a machine learning classifier.
+
+WHY THIS MATTERS:
+These categories will be used to:
+1. Train a LightGBM classifier on email features (embeddings, patterns, structure)
+2. Automatically label thousands of emails without human intervention
+3. Help users quickly find emails by category (like Gmail labels)
+
+WHAT MAKES GOOD CATEGORIES:
+- BROAD & REUSABLE: "Meetings" not "Q3 Planning Meeting" - applies to many emails
+- FUNCTIONALLY DISTINCT: Each category serves a different user need
+- BALANCED: Avoid 1 huge category + many tiny ones
+- LEARNABLE: ML model needs clear patterns to distinguish categories
+- TIMELESS: "Financial Reports" not "2023 Budget Review"
+- ACTION-ORIENTED: Users ask "show me all X" - what is X?

 DISCOVERED CATEGORIES (sorted by email count):
 {category_list}

-{context_section}CONSOLIDATION RULES:
+{context_section}CONSOLIDATION STRATEGY:
 {rules_text}

+THINK LIKE A USER: If you had to sort 10,000 emails, what categories would help you find things fast?
+- "Work Communication" catches daily business emails
+- "Urgent" flags time-sensitive items
+- "Financial" groups all money-related emails
+- "Technical" vs "Administrative" serves different workflows
+
 OUTPUT FORMAT - Return JSON with consolidated categories and mapping:
 {{
  "consolidated": {{
-    "FinalCategoryName": "Clear, generic description of what emails fit here"
+    "FinalCategoryName": "Clear description of what user need this serves"
  }},
  "mappings": {{
    "OldCategoryName": "FinalCategoryName"
  }}
 }}

-IMPORTANT:
- consolidated dict should have {target_categories} or fewer entries
- mappings dict must map EVERY old category name to a final category
- Final category names should be present in both consolidated and mappings
+CRITICAL REQUIREMENTS:
+- Maximum {target_categories} final categories (strict limit)
+- Map EVERY old category to exactly one final category
+- Final category names must be SHORT (1-3 words), GENERIC, and REUSABLE
+- Think: "Would this category still make sense in 5 years?"

 JSON:
 """
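
Note (a sketch, not part of this commit): both prompts ask the model to follow structural rules: exact email IDs in `labels`, a strict `{target_categories}` cap, every old category mapped, and 1-3 word names. A model may still ignore them, so the caller could verify the returned JSON before using it. The helper names `check_discovery` and `check_consolidation` below are hypothetical, and the response shapes are assumed to match the "Return JSON" and "OUTPUT FORMAT" sections of the prompts above. How `llm_analyzer.py` actually parses responses is not shown in this diff.

```python
import json


def check_discovery(raw: str, batch_ids: list[str]) -> list[str]:
    """Return problems found in a discovery response (empty list if none).

    Assumes the shape requested by the discovery prompt:
    {"categories": {name: description}, "labels": [[email_id, name], ...]}
    """
    data = json.loads(raw)
    categories = data.get("categories", {})
    known = set(batch_ids)
    seen = set()
    problems = []
    for email_id, category in data.get("labels", []):
        if email_id not in known:
            # The prompt requires EXACT IDs, not "email1" or similar.
            problems.append(f"unknown email ID: {email_id}")
        elif email_id in seen:
            problems.append(f"email labelled twice: {email_id}")
        seen.add(email_id)
        if category not in categories:
            problems.append(f"undeclared category: {category}")
    for missing in sorted(known - seen):
        problems.append(f"email not labelled: {missing}")
    return problems


def check_consolidation(raw: str, old_names: list[str], target_categories: int) -> list[str]:
    """Return problems found in a consolidation response (empty list if none).

    Assumes the shape requested by the consolidation prompt:
    {"consolidated": {final: description}, "mappings": {old: final}}
    """
    data = json.loads(raw)
    consolidated = data.get("consolidated", {})
    mappings = data.get("mappings", {})
    problems = []
    if len(consolidated) > target_categories:
        problems.append(
            f"{len(consolidated)} final categories exceed the limit of {target_categories}"
        )
    for name in consolidated:
        if len(name.split()) > 3:
            problems.append(f"category name longer than 3 words: {name}")
    for old in old_names:
        final = mappings.get(old)
        if final is None:
            problems.append(f"old category not mapped: {old}")
        elif final not in consolidated:
            problems.append(f"mapped to undeclared category: {old} -> {final}")
    return problems
```

Both helpers return a list of messages rather than raising, so a caller could log the problems, retry the prompt, or fall back to a default mapping. Which of those fits this codebase is a design decision outside the scope of this sketch.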