Root cause: The pre-trained model loaded successfully, so the CLI skipped
calibration entirely and went straight to classification with the pre-trained
model (35% accuracy).
Changes:
- config: Set calibration_model to qwen3:8b-q4_K_M (larger model for better instruction following)
- cli: Create separate calibration_llm provider with 8b model
- llm_analyzer: Improve prompt to force exact copying of email IDs
- workflow: Merge discovered categories with predefined ones
- workflow: Add detailed error logging for label mismatches
- ml_classifier: Fix model path check (was checking a parameter that was None)
- ml_classifier: Add dual API support (sklearn predict_proba vs LightGBM predict)
- ollama: Fix model list parsing (use m.model, not m.get('name'))
- feature_extractor: Switch to Ollama embeddings (instant vs 90s load time)
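The llm_analyzer and workflow changes both target labels that do not line up with the sampled emails. The sketch below shows one way to log such mismatches. It assumes three things:

- The LLM output has already been parsed into a dict mapping email ID to category.
- `categories` is the merged predefined-plus-discovered set.
- The names `validate_labels` and `calibration` are illustrative, not taken from the codebase.

```python
import logging
from typing import Dict, Iterable, List, Tuple

# Hypothetical logger name.
logger = logging.getLogger("calibration")


def validate_labels(
    labels: Dict[str, str],
    sample_ids: Iterable[str],
    categories: Iterable[str],
) -> Tuple[Dict[str, str], List[str]]:
    """Keep only labels whose email ID and category are both known.

    labels maps email_id -> category as parsed from the LLM reply
    (assumed shape). Returns the accepted labels and the sampled IDs
    that ended up without a valid label.
    """
    known_ids = set(sample_ids)
    known_categories = set(categories)
    accepted: Dict[str, str] = {}
    for email_id, category in labels.items():
        if email_id not in known_ids:
            # Typical symptom when the model paraphrases or truncates an ID.
            logger.error("Label for unknown email ID %r (category %r)", email_id, category)
        elif category not in known_categories:
            logger.error("Email %r labeled with unknown category %r", email_id, category)
        else:
            accepted[email_id] = category
    missing = sorted(known_ids - set(accepted))
    if missing:
        logger.warning("%d sampled emails received no valid label", len(missing))
    return accepted, missing
```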
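The ml_classifier path fix follows a common pattern: resolve the optional argument to a concrete path first, then test the resolved path rather than the raw parameter. This is a minimal sketch under stated assumptions. The helper `resolve_model_path` and the file name `classifier.pkl` are hypothetical; only the `src/models/pretrained/` directory comes from .gitignore.

```python
from pathlib import Path
from typing import Optional

# Hypothetical default; only the directory is taken from .gitignore.
DEFAULT_MODEL_PATH = Path("src/models/pretrained/classifier.pkl")


def resolve_model_path(model_path: Optional[str] = None) -> Optional[Path]:
    """Return the model file to load, or None if it does not exist.

    The bug pattern avoided here: testing the argument (often None)
    instead of the path that will actually be loaded.
    """
    # Fall back to the default when the caller passed nothing.
    candidate = Path(model_path) if model_path is not None else DEFAULT_MODEL_PATH
    # Check the resolved path, not the raw parameter.
    return candidate if candidate.is_file() else None
```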
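The dual API support in ml_classifier exists because scikit-learn classifiers return class probabilities from `predict_proba`, while a native `lightgbm.Booster` has no such method. Its `predict` already returns probabilities for binary and multiclass objectives. A minimal sketch, assuming the native model is a Booster trained with one of those objectives; `predict_probabilities` is a hypothetical name:

```python
import numpy as np


def predict_probabilities(model, X) -> np.ndarray:
    """Return an (n_samples, n_classes) probability matrix for either API.

    - scikit-learn classifiers (including lightgbm.LGBMClassifier) expose
      predict_proba().
    - A native lightgbm.Booster has no predict_proba(); its predict()
      returns probabilities for binary/multiclass objectives.
    """
    if hasattr(model, "predict_proba"):
        probs = np.asarray(model.predict_proba(X))
    else:
        probs = np.asarray(model.predict(X))
    # A binary Booster returns a 1-D array of positive-class probabilities;
    # expand it to two columns so callers always get the same shape.
    if probs.ndim == 1:
        probs = np.column_stack([1.0 - probs, probs])
    return probs
```

Checking for `predict_proba` with `hasattr` (duck typing) avoids importing either library just to test the model's type.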
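The ollama parsing fix reflects a change in what the Python client returns. The sketch assumes newer `ollama` package versions return typed model objects exposing a `.model` attribute, while older ones returned plain dicts that also carried a `'name'` key. Under that assumption, `m.get('name')` on a typed object yields None. The helper `model_names` is hypothetical and accepts either shape:

```python
from typing import Any, Iterable, List


def model_names(models: Iterable[Any]) -> List[str]:
    """Extract model names from an Ollama model list.

    Assumption: typed objects expose .model; legacy dicts use
    'model' and/or 'name'. Entries with neither are skipped.
    """
    names: List[str] = []
    for m in models:
        if isinstance(m, dict):
            name = m.get("model") or m.get("name")
        else:
            name = getattr(m, "model", None)
        if name:
            names.append(name)
    return names
```

With a recent client this would likely be called as `model_names(ollama.list().models)`; verify that call shape against the installed client version.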
Result: Calibration now runs and generates 16 categories and 50 labels correctly.
Next: Investigate calibration sampling to reduce overfitting on small samples.
65 lines
600 B
Plaintext
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
*.egg-info/
dist/
build/

# Data and Models
data/training/
src/models/pretrained/*.pkl
src/models/pretrained/*.joblib
*.h5
*.joblib
enron_mail_20150507
maildir

# Credentials
.env
credentials/
*.json
!config/*.json
!config/*.yaml

# Logs
logs/*.log
*.log

# IDE
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

# Checkpoints
checkpoints/
*.checkpoint

# Results
results/
output/

# Pytest
.pytest_cache/
.coverage
htmlcov/

# MyPy
.mypy_cache/
.dmypy.json
dmypy.json

# Temporary files
*.tmp
*.bak
*~
enron_mail_20150507.tar.gz