email-sorter/.gitignore
FSSCoding 50ddaa4b39 Fix calibration workflow - LLM now generates categories/labels correctly
Root cause: The pre-trained model was loading successfully, causing the CLI to skip
calibration entirely. The system went straight to classification with the 35% model.

Changes:
- config: Set calibration_model to qwen3:8b-q4_K_M (larger model for better instruction following)
- cli: Create a separate calibration_llm provider using the 8b model
- llm_analyzer: Improve prompt to force exact email ID copying
- workflow: Merge discovered categories with predefined ones
- workflow: Add detailed error logging for label mismatches
- ml_classifier: Fix model path checking (was checking a None parameter)
- ml_classifier: Add dual API support (sklearn predict_proba vs LightGBM predict)
- ollama: Fix model list parsing (use m.model, not m.get('name'))
- feature_extractor: Switch to Ollama embeddings (instant vs 90s load time)

Result: Calibration now runs and generates 16 categories + 50 labels correctly.
Next: Investigate calibration sampling to reduce overfitting on small samples.
2025-10-23 13:51:09 +11:00
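
The "dual API support" item in the commit message above could look roughly like the sketch below. This is not the repository's code: the helper name `predict_probabilities` and both stub classes are hypothetical, and the only assumption taken from the commit message is that scikit-learn style models expose `predict_proba` while a native LightGBM booster returns probabilities from `predict`.

```python
from typing import Any, List, Sequence


def predict_probabilities(model: Any, features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return per-class probabilities from either style of model (hypothetical helper)."""
    if hasattr(model, "predict_proba"):
        # scikit-learn style estimators expose predict_proba().
        raw = model.predict_proba(features)
    else:
        # Assumption: a native booster returns probabilities from predict().
        raw = model.predict(features)

    rows: List[List[float]] = []
    for row in raw:
        try:
            # Multiclass output: one probability per class.
            rows.append([float(p) for p in row])
        except TypeError:
            # Binary output: a single positive-class probability per sample.
            positive = float(row)
            rows.append([1.0 - positive, positive])
    return rows


class SklearnStyleStub:
    """Hypothetical stand-in for a model with predict_proba()."""

    def predict_proba(self, features):
        return [[0.2, 0.8] for _ in features]


class BoosterStyleStub:
    """Hypothetical stand-in for a model that only has predict()."""

    def predict(self, features):
        return [0.25 for _ in features]


if __name__ == "__main__":
    sample = [[1.0, 2.0]]
    print(predict_probabilities(SklearnStyleStub(), sample))
    print(predict_probabilities(BoosterStyleStub(), sample))
```

Checking for the method rather than the class keeps the caller independent of which library produced the loaded model.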


# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
*.egg-info/
dist/
build/
# Data and Models
data/training/
src/models/pretrained/*.pkl
src/models/pretrained/*.joblib
*.h5
*.joblib
enron_mail_20150507
maildir
# Credentials
.env
credentials/
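# Ignore JSON everywhere except files directly under config/
*.json
!config/*.json
# No rule above ignores YAML, so this negation is currently a no-op
!config/*.yaml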
# Logs
logs/*.log
*.log
# IDE
.vscode/
.idea/
*.swp
*.swo
# OS
.DS_Store
Thumbs.db
# Checkpoints
checkpoints/
*.checkpoint
# Results
results/
output/
# Pytest
.pytest_cache/
.coverage
htmlcov/
# MyPy
.mypy_cache/
.dmypy.json
dmypy.json
# Temporary files
*.tmp
*.bak
*~
# Dataset archive
enron_mail_20150507.tar.gz