Improve LLM prompts with proper context and purpose
Both discovery and consolidation prompts now explain:
- What the system does (train ML classifier for auto-sorting)
- What makes good categories (broad, timeless, learnable)
- Why this matters (user needs, ML training requirements)
- How to think about the task (user-focused, functional)

Discovery prompt changes:
- Explains goal of identifying natural categories for ML training
- Lists guidelines for good categories (broad, user-focused, learnable)
- Provides concrete examples of functional categories
- Emphasizes PURPOSE over topic

Consolidation prompt changes:
- Explains full system context (LightGBM, auto-labeling, user search)
- Defines what makes categories effective for ML and users
- Provides user-centric thinking framework
- Emphasizes reusability and timelessness

The prompts now give the 8B model the context it needs to make well-reasoned category decisions instead of falling back on generic categorization.
parent 88ef570fed
commit 183b12c9b4
@@ -105,16 +105,36 @@ class CalibrationAnalyzer:
 # Use first email ID as example
 example_id = batch[0].id if batch else "maildir_example__sent_1"
 
-prompt = f"""<no_think>Categorize these emails. You MUST copy the exact ID string for each email.
+prompt = f"""<no_think>You are analyzing emails to discover natural categories for an automatic classification system.
 
-EMAILS:
+GOAL: Identify broad, reusable categories that will help train a machine learning model to sort thousands of emails automatically.
+
+GUIDELINES FOR GOOD CATEGORIES:
+- BROAD & TIMELESS: "Financial" not "Q3 Budget Review"
+- USER-FOCUSED: Think "what would help someone find this email later?"
+- LEARNABLE: ML model needs consistent patterns (sender domains, keywords, structure)
+- FUNCTIONAL: Each category serves a distinct purpose
+- 3-10 categories ideal: Too many = noise, too few = useless
+
+EMAILS TO ANALYZE:
 {email_summary}
 
-CRITICAL: Copy the EXACT ID from each email above. For example, if email #1 has ID "{example_id}", you must write exactly "{example_id}" in the labels array, not "email1" or anything else.
+TASK:
+1. Identify natural groupings based on PURPOSE, not just topic
+2. Create SHORT (1-3 word) category names
+3. Assign each email to exactly one category
+4. CRITICAL: Copy EXACT email IDs - if email #1 shows ID "{example_id}", use exactly "{example_id}" in labels
+
+EXAMPLES OF GOOD CATEGORIES:
+- "Work Communication" (daily business emails)
+- "Financial" (invoices, budgets, reports)
+- "Urgent" (time-sensitive requests)
+- "Technical" (system alerts, dev discussions)
+- "Administrative" (HR, policies, announcements)
 
 Return JSON:
 {{
-"categories": {{"category_name": "description", ...}},
+"categories": {{"category_name": "what user need this serves", ...}},
 "labels": [["{example_id}", "category"], ...]
 }}
 
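The discovery prompt asks the model to copy each email ID exactly and to assign every email to one of the categories it declares. A minimal sketch of how a caller might check that on the way back in is shown here. The function name `filter_discovery_labels` and its return shape are hypothetical and not part of this commit; only the `categories` and `labels` keys come from the prompt's JSON format.

```python
import json


def filter_discovery_labels(raw_response, batch_ids):
    """Split the model's labels into accepted and rejected entries.

    Hypothetical helper: an entry is accepted only if it is an [id, category]
    pair of strings, the ID is an exact copy of one from the batch, and the
    category is declared in the response's "categories" dict.
    """
    data = json.loads(raw_response)
    categories = data.get("categories", {})
    known_ids = set(batch_ids)

    accepted, rejected = [], []
    for entry in data.get("labels", []):
        is_pair = (
            isinstance(entry, list)
            and len(entry) == 2
            and all(isinstance(item, str) for item in entry)
        )
        if is_pair and entry[0] in known_ids and entry[1] in categories:
            accepted.append((entry[0], entry[1]))
        else:
            # Invented IDs such as "email1" or undeclared categories land here.
            rejected.append(entry)
    return categories, accepted, rejected
```

Rejected entries could be logged or retried; what to do with them is a design choice this sketch leaves open.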
@@ -257,28 +277,51 @@ JSON:
 rules_text = "\n".join(rules)
 
 # Build prompt
-prompt = f"""<no_think>Consolidate email categories by merging duplicates and overlaps.
+prompt = f"""<no_think>You are helping build an email classification system that will automatically sort thousands of emails.
 
+TASK: Consolidate the discovered categories below into a lean, effective set for training a machine learning classifier.
+
+WHY THIS MATTERS:
+These categories will be used to:
+1. Train a LightGBM classifier on email features (embeddings, patterns, structure)
+2. Automatically label thousands of emails without human intervention
+3. Help users quickly find emails by category (like Gmail labels)
+
+WHAT MAKES GOOD CATEGORIES:
+- BROAD & REUSABLE: "Meetings" not "Q3 Planning Meeting" - applies to many emails
+- FUNCTIONALLY DISTINCT: Each category serves a different user need
+- BALANCED: Avoid 1 huge category + many tiny ones
+- LEARNABLE: ML model needs clear patterns to distinguish categories
+- TIMELESS: "Financial Reports" not "2023 Budget Review"
+- ACTION-ORIENTED: Users ask "show me all X" - what is X?
+
 DISCOVERED CATEGORIES (sorted by email count):
 {category_list}
 
-{context_section}CONSOLIDATION RULES:
+{context_section}CONSOLIDATION STRATEGY:
 {rules_text}
 
+THINK LIKE A USER: If you had to sort 10,000 emails, what categories would help you find things fast?
+- "Work Communication" catches daily business emails
+- "Urgent" flags time-sensitive items
+- "Financial" groups all money-related emails
+- "Technical" vs "Administrative" serves different workflows
+
 OUTPUT FORMAT - Return JSON with consolidated categories and mapping:
 {{
 "consolidated": {{
-"FinalCategoryName": "Clear, generic description of what emails fit here"
+"FinalCategoryName": "Clear description of what user need this serves"
 }},
 "mappings": {{
 "OldCategoryName": "FinalCategoryName"
 }}
 }}
 
-IMPORTANT:
-- consolidated dict should have {target_categories} or fewer entries
-- mappings dict must map EVERY old category name to a final category
-- Final category names should be present in both consolidated and mappings
+CRITICAL REQUIREMENTS:
+- Maximum {target_categories} final categories (strict limit)
+- Map EVERY old category to exactly one final category
+- Final category names must be SHORT (1-3 words), GENERIC, and REUSABLE
+- Think: "Would this category still make sense in 5 years?"
 
 JSON:
 """
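The consolidation prompt's CRITICAL REQUIREMENTS are mechanical enough to verify after the model responds. The sketch below is one way to do that, assuming the response is the JSON object described under OUTPUT FORMAT. `check_consolidation` and its problem messages are hypothetical and do not appear in this commit.

```python
import json


def check_consolidation(raw_response, old_categories, target_categories):
    """Return a list of human-readable problems; an empty list means the response passes.

    Hypothetical helper mirroring the prompt's requirements: category limit,
    full mapping coverage, and short (1-3 word) final names.
    """
    data = json.loads(raw_response)
    consolidated = data.get("consolidated", {})
    mappings = data.get("mappings", {})

    problems = []
    if len(consolidated) > target_categories:
        problems.append(
            f"{len(consolidated)} final categories exceeds limit of {target_categories}"
        )

    # Every old category must map to a final category that actually exists.
    for old in old_categories:
        target = mappings.get(old)
        if target is None:
            problems.append(f"unmapped old category: {old}")
        elif target not in consolidated:
            problems.append(f"{old} maps to unknown final category: {target}")

    # Word count is measured by whitespace splitting, which is an assumption.
    for name in consolidated:
        if len(name.split()) > 3:
            problems.append(f"final category name longer than 3 words: {name}")
    return problems
```

A caller might retry the prompt or fall back to the unconsolidated categories when the list is non-empty.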