Add intelligent context window management for optimal RAG performance:
## Core Features
- Dynamic context sizing based on model capabilities
- User-friendly configuration menu with Development/Production/Advanced presets
- Automatic validation against model limits (qwen3:0.6b/1.7b = 32K, qwen3:4b = 131K)
- Educational content explaining context window importance for RAG
## Technical Implementation
- Enhanced LLMConfig with context_window and auto_context parameters
- Intelligent _get_optimal_context_size() method with model-specific limits
- Consistent context application across synthesizer and explorer
- YAML configuration output with helpful context explanations
## User Experience Improvements
- Clear context window display in configuration status
- Guided selection: Development (8K), Production (16K), Advanced (32K)
- Memory usage estimates and performance guidance
- Validation prevents invalid context/model combinations
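The generated YAML with its "helpful context explanations" might look like the following. This is purely illustrative; the key names (`context_window`, `auto_context`) come from the `LLMConfig` bullet above, but the exact schema and comment text are assumptions.

```yaml
llm:
  model: qwen3:1.7b
  # Context window in tokens. Ollama defaults to 2048, which is far too
  # small for RAG: retrieved chunks alone can exceed it.
  #   Development: 8192   (low memory, short sessions)
  #   Production:  16384  (balanced)
  #   Advanced:    32768  (long conversations, large chunks)
  context_window: 16384
  # When true, the context size is validated against the model's limit.
  auto_context: true
```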
## Educational Value
- Explains why the default 2048-token context fails for RAG
- Shows relationship between context size and conversation length
- Guides users toward optimal settings for their use case
- Highlights advanced capabilities (15+ results, 4000+ character chunks)
This addresses the critical issue where Ollama's default context window
severely limits RAG performance, giving users both proper configuration
tools and an understanding of this crucial parameter.
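Applying the configured size "consistently across synthesizer and explorer" comes down to passing it on every model call; with the `ollama` Python client that is the `num_ctx` option. A sketch, where the wrapper name is hypothetical:

```python
# Sketch: one place that turns the configured context window into the
# options dict passed to every Ollama call, so the synthesizer and
# explorer cannot drift apart. Names are illustrative, not the
# project's actual API.


def build_ollama_options(context_window: int, temperature: float = 0.7) -> dict:
    """Options dict for ollama.chat() / ollama.generate().

    Without num_ctx, Ollama silently falls back to its small default
    context and truncates long RAG prompts.
    """
    return {
        "num_ctx": context_window,
        "temperature": temperature,
    }


# Usage (requires a running Ollama server, shown for context only):
# import ollama
# ollama.chat(
#     model="qwen3:1.7b",
#     messages=[{"role": "user", "content": question}],
#     options=build_ollama_options(16_384),
# )
```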
This comprehensive update enhances user experience with several key improvements:
## Enhanced Streaming & Thinking Display
- Implement real-time streaming with gray thinking tokens that collapse after completion
- Fix thinking token redisplay bug with proper content filtering
- Add clear "AI Response:" headers to separate thinking from responses
- Enable streaming by default for better user engagement
- Keep thinking visible for exploration, collapse only for suggested questions
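One way to implement the thinking/response separation described above is to split on the `<think>` tags that qwen3-family models emit. A sketch under that assumption; the real streaming code is necessarily more incremental:

```python
import re

# Matches a model's chain-of-thought block, including across newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(raw: str) -> tuple[str, str]:
    """Separate a model reply into (thinking, response) parts.

    The thinking text can be shown in gray while streaming and then
    collapsed; the remainder is printed under an "AI Response:" header.
    Filtering the response this way also prevents thinking tokens from
    being redisplayed after streaming completes.
    """
    thinking = "\n".join(m.strip() for m in THINK_RE.findall(raw))
    response = THINK_RE.sub("", raw).strip()
    return thinking, response
```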
## Natural Conversation Responses
- Convert clunky JSON exploration responses to natural, conversational format
- Improve exploration prompts for friendly, colleague-style interactions
- Update summary generation with better context handling
- Eliminate double response display issues
## Model Reference Updates
- Remove all llama3.2 references in favor of qwen3 models
- Fix non-existent qwen3:3b references, replacing them with valid model names
- Update model rankings to prioritize working qwen models across all components
- Ensure consistent model recommendations in docs and examples
## Cross-Platform Icon Integration
- Add desktop icon setup to Linux installer with .desktop entry
- Add Windows shortcuts for desktop and Start Menu integration
- Improve installer user experience with visual branding
## Configuration & Navigation Fixes
- Fix "0" option in configuration menu to properly go back
- Improve configuration menu user-friendliness
- Update troubleshooting guides with correct model suggestions
These changes significantly improve the beginner experience while maintaining
technical accuracy and system reliability.
Major fixes:
- Fix model selection to prioritize qwen3:1.7b instead of qwen3:4b for testing
- Correct context length from 80,000 to 32,000 tokens (proper Qwen3 limit)
- Implement content-preserving safeguards instead of dropping responses
- Fix all test imports to use the mini_rag module name instead of claude_rag
- Add virtual environment warnings to all test entry points
- Fix TUI EOF crash handling with proper error handling
- Remove warmup delays that were causing startup lag and unwanted model calls
- Fix command mappings between bash wrapper and Python script
- Update documentation to reflect qwen3:1.7b as primary recommendation
- Improve TUI box alignment and formatting
- Make wording generic so the tool applies to any documents, not just codebases
- Show actual folder names in user feedback instead of generic terms
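The "content-preserving safeguards" fix above replaces drop-the-whole-response behavior with truncation. A minimal sketch; the function name and character budget are assumptions:

```python
def enforce_length_limit(text: str, max_chars: int = 4000) -> str:
    """Trim an over-long response instead of discarding it.

    The earlier behavior dropped any response exceeding the limit; this
    keeps as much content as possible, cutting at a sentence or line
    boundary when one falls late enough in the budget.
    """
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Prefer ending at the last full sentence or line inside the budget.
    cut = max(clipped.rfind(". "), clipped.rfind("\n"))
    if cut > max_chars // 2:
        clipped = clipped[: cut + 1]
    return clipped.rstrip() + " [truncated]"
```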
Technical improvements:
- Unified model rankings across all components
- Better error handling for missing dependencies
- Comprehensive testing and validation of all fixes
- All tests now pass and system is deployment-ready
All major crashes and deployment issues resolved.
Model recommendation updates:
- Changed primary model recommendation from qwen3:1.7b to qwen3:4b
- Added Q8 quantization info in technical docs for production users
- Fixed method name error: get_embedding_info() -> get_status()
- Updated all error messages and test files with new recommendations
- Maintained beginner-friendly options (qwen3:1.7b still very good, qwen3:0.6b surprisingly capable)
- Added explanation of why small models work well with RAG context
- Comprehensive testing completed - system ready for clean release
Complete rebrand to eliminate any Claude/Anthropic references:
Directory Changes:
- claude_rag/ → mini_rag/ (preserving git history)
Content Changes:
- Replaced 930+ Claude references across 40+ files
- Updated all imports: from claude_rag → from mini_rag
- Updated all file paths: .claude-rag → .mini-rag
- Updated documentation and comments
- Updated configuration files and examples
Testing Changes:
- All tests updated to use mini_rag imports
- Integration tests verify new module structure
This ensures complete independence from Claude/Anthropic
branding while maintaining all functionality and git history.