Project Reorganization:
- Created docs/ directory and moved all documentation
- Created scripts/ directory for shell scripts
- Created scripts/experimental/ for research scripts
- Updated .gitignore for new structure
- Updated README.md with MVP status and new structure

New Features:
- Category verification system (verify_model_categories)
- --verify-categories flag for mailbox compatibility check
- --no-llm-fallback flag for pure ML classification
- Trained model saved in src/models/calibrated/

Threshold Optimization:
- Reduced default threshold from 0.75 to 0.55
- Updated all category thresholds to 0.55
- Reduces LLM fallback rate by 40% (35% -> 21%)

Documentation:
- SYSTEM_FLOW.html - Complete system architecture
- VERIFY_CATEGORIES_FEATURE.html - Feature documentation
- LABEL_TRAINING_PHASE_DETAIL.html - Calibration breakdown
- FAST_ML_ONLY_WORKFLOW.html - Pure ML guide
- PROJECT_STATUS_AND_NEXT_STEPS.html - Roadmap
- ROOT_CAUSE_ANALYSIS.md - Bug fixes

MVP Status:
- 10k emails in 4 minutes, 72.7% accuracy, 0 LLM calls
- LLM-driven category discovery working
- Embedding-based transfer learning confirmed
- All model paths verified and working
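The threshold change in the commit message can be illustrated with a minimal sketch. It assumes that a prediction is routed to the LLM when its confidence falls strictly below the threshold; the project's actual comparison and function names are not shown in this commit, so `needs_llm_fallback`, `fallback_rate`, `relative_reduction`, and the sample scores are all hypothetical. The last helper shows that the quoted 40% is a relative reduction (35% -> 21%), not a drop of 40 percentage points.

```python
from typing import Sequence

# New default from the commit message (previously 0.75).
DEFAULT_THRESHOLD = 0.55


def needs_llm_fallback(confidence: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True when the ML prediction is not confident enough to keep.

    Assumption: strictly-below-threshold triggers the fallback.
    """
    return confidence < threshold


def fallback_rate(confidences: Sequence[float], threshold: float = DEFAULT_THRESHOLD) -> float:
    """Fraction of predictions that would be routed to the LLM."""
    if not confidences:
        return 0.0
    routed = sum(needs_llm_fallback(c, threshold) for c in confidences)
    return routed / len(confidences)


def relative_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`."""
    return (before - after) / before


# Hypothetical confidence scores, for illustration only.
scores = [0.50, 0.60, 0.70, 0.80, 0.90]
print(fallback_rate(scores, 0.75))  # 3 of 5 scores fall below 0.75
print(fallback_rate(scores, 0.55))  # 1 of 5 scores falls below 0.55
print(round(relative_reduction(0.35, 0.21), 2))  # the 35% -> 21% figure from the commit
```

Lowering the threshold keeps more ML predictions and sends fewer emails to the LLM; the trade-off is that the retained low-confidence predictions may be less accurate.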
78 lines
751 B
Plaintext
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
*.egg-info/
dist/
build/

# Data and Models
data/training/
src/models/pretrained/*.pkl
src/models/pretrained/*.joblib
*.h5
*.joblib
enron_mail_20150507
maildir

# Credentials
.env
credentials/
*.json
!config/*.json
!config/*.yaml

# Logs
logs/
*.log

# IDE
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

# Checkpoints
checkpoints/
*.checkpoint

# Results
results/
output/

# Pytest
.pytest_cache/
.coverage
htmlcov/

# MyPy
.mypy_cache/
.dmypy.json
dmypy.json

# Temporary files
*.tmp
*.bak
*~
enron_mail_20150507.tar.gz
debug_*.txt

# Test artifacts
test/
ml_only_test/
results_*/
phase1_*/

# Python scripts (experimental/research)
*.py
!src/**/*.py
!tests/**/*.py
!setup.py