Add intelligent context window management for optimal RAG performance:
## Core Features
- Dynamic context sizing based on model capabilities
- User-friendly configuration menu with Development/Production/Advanced presets
- Automatic validation against model limits (qwen3:0.6b/1.7b = 32K, qwen3:4b = 131K)
- Educational content explaining context window importance for RAG
## Technical Implementation
- Enhanced LLMConfig with context_window and auto_context parameters
- Intelligent _get_optimal_context_size() method with model-specific limits (sketched after this list)
- Consistent context application across synthesizer and explorer
- YAML configuration output with helpful context explanations
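A minimal sketch of what the model-aware sizing can look like; the limits match the list above, but the dictionary name and method signature here are illustrative, not the exact implementation:

```python
# Illustrative sketch of model-aware context sizing (names are hypothetical).
MODEL_CONTEXT_LIMITS = {
    "qwen3:0.6b": 32_768,   # 32K
    "qwen3:1.7b": 32_768,   # 32K
    "qwen3:4b": 131_072,    # 131K
}
DEFAULT_LIMIT = 32_768

def _get_optimal_context_size(model: str, requested: int | None = None) -> int:
    """Clamp the requested context window to the model's known limit."""
    limit = MODEL_CONTEXT_LIMITS.get(model, DEFAULT_LIMIT)
    if requested is None:
        return min(16_384, limit)  # "Production" preset as a sane default
    if requested > limit:
        raise ValueError(
            f"{model} supports at most {limit} context tokens "
            f"(requested {requested})"
        )
    return requested
```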
## User Experience Improvements
- Clear context window display in configuration status
- Guided selection: Development (8K), Production (16K), Advanced (32K)
- Memory usage estimates and performance guidance
- Validation prevents invalid context/model combinations
## Educational Value
- Explains why the default 2,048-token context fails for RAG (see the worked example below)
- Shows relationship between context size and conversation length
- Guides users toward optimal settings for their use case
- Highlights advanced capabilities (15+ results, 4000+ character chunks)
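To make the 2,048-token problem concrete: at the common rule of thumb of roughly 4 characters per token, 15 results of 4,000-character chunks is about 60,000 characters, or around 15,000 tokens of retrieved context alone, over seven times Ollama's default window before the question or the answer is even counted.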
This addresses the critical issue where Ollama's default context severely
limits RAG performance, giving users proper configuration tools and an
understanding of this crucial parameter.
This comprehensive update enhances user experience with several key improvements:
## Enhanced Streaming & Thinking Display
- Implement real-time streaming with gray thinking tokens that collapse after completion
- Fix thinking-token redisplay bug with proper content filtering (see the sketch after this list)
- Add clear "AI Response:" headers to separate thinking from responses
- Enable streaming by default for better user engagement
- Keep thinking visible for exploration, collapse only for suggested questions
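The filtering fix relies on separating thinking text from response text as tokens stream in. A minimal sketch, assuming qwen3-style <think>...</think> markers; the function name and display policy are illustrative:

```python
def route_stream(chunks):
    """Yield ("thinking" | "response", text) pairs from a token stream.

    Simplification: assumes a tag never arrives split across two chunks;
    a production version would hold back partial-tag suffixes.
    """
    in_think = False
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while buffer:
            tag = "</think>" if in_think else "<think>"
            pos = buffer.find(tag)
            if pos == -1:
                yield ("thinking" if in_think else "response", buffer)
                buffer = ""
            else:
                if pos:
                    yield ("thinking" if in_think else "response", buffer[:pos])
                buffer = buffer[pos + len(tag):]
                in_think = not in_think
```

The TUI can then render "thinking" text in gray and collapse it once the first "response" text arrives, which also prevents the redisplay bug: thinking content is never mixed back into the response buffer.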
## Natural Conversation Responses
- Convert clunky JSON exploration responses into a natural, conversational format
- Improve exploration prompts for friendly, colleague-style interactions
- Update summary generation with better context handling
- Eliminate double response display issues
## Model Reference Updates
- Remove all llama3.2 references in favor of qwen3 models
- Fix non-existent qwen3:3b references, replacing them with valid model names
- Update model rankings to prioritize working qwen models across all components
- Ensure consistent model recommendations in docs and examples
## Cross-Platform Icon Integration
- Add desktop icon setup to the Linux installer with a .desktop entry (example below)
- Add Windows shortcuts for desktop and Start Menu integration
- Improve installer user experience with visual branding
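For reference, the generated Linux desktop entry looks roughly like this; the paths and names are illustrative:

```ini
[Desktop Entry]
Type=Application
Name=FSS-Mini-RAG
Comment=Search and explore your documents locally
Exec=/opt/fss-mini-rag/rag-tui
Icon=/opt/fss-mini-rag/icon.png
Terminal=true
Categories=Utility;
```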
## Configuration & Navigation Fixes
- Fix "0" option in configuration menu to properly go back
- Improve configuration menu user-friendliness
- Update troubleshooting guides with correct model suggestions
These changes significantly improve the beginner experience while maintaining
technical accuracy and system reliability.
Major fixes:
- Fix model selection to prioritize qwen3:1.7b instead of qwen3:4b for testing
- Correct context length from 80,000 to 32,000 tokens (proper Qwen3 limit)
- Implement content-preserving safeguards instead of dropping responses
- Fix all test imports from claude_rag to mini_rag module naming
- Add virtual environment warnings to all test entry points
- Fix TUI crash on EOF with proper error handling (see the sketch after this list)
- Remove warmup delays that were causing startup lag and unwanted model calls
- Fix command mappings between bash wrapper and Python script
- Update documentation to reflect qwen3:1.7b as primary recommendation
- Improve TUI box alignment and formatting
- Make wording generic for any document collection, not just codebases
- Show actual folder names in user feedback instead of generic terms
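The EOF fix follows the standard pattern of treating end-of-input as a clean exit rather than an unhandled exception; a minimal sketch (the prompt text is illustrative):

```python
def read_command(prompt: str = "> ") -> str | None:
    """Return the user's input, or None when the session should end."""
    try:
        return input(prompt)
    except (EOFError, KeyboardInterrupt):
        # Ctrl-D, piped/closed stdin, or Ctrl-C: exit cleanly, don't crash.
        return None
```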
Technical improvements:
- Unified model rankings across all components
- Better error handling for missing dependencies
- Comprehensive testing and validation of all fixes
- All tests now pass; the system is deployment-ready
All major crashes and deployment issues resolved.
Based on feedback in the PR comments, implemented:
Installer improvements:
- Added choice between code/docs sample testing
- Created FSS-Mini-RAG specific sample files (chunker.py, ollama_integration.py, etc.)
- Timing-based estimation for full-project indexing (sketched after this list)
- Better sample content that actually relates to this project
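A sketch of the timing-based estimate, assuming a callable that indexes a list of files; all the names here are hypothetical:

```python
import time

def estimate_full_index_time(index_files, sample_files, total_file_count):
    """Index the sample, then extrapolate the full-project duration."""
    start = time.monotonic()
    index_files(sample_files)
    elapsed = time.monotonic() - start
    per_file = elapsed / max(len(sample_files), 1)
    estimate = per_file * total_file_count
    print(f"Sample indexed in {elapsed:.1f}s; the full project "
          f"(~{total_file_count} files) should take about {estimate:.0f}s.")
    return estimate
```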
TUI enhancements:
- Replaced generic searches with FSS-Mini-RAG relevant questions:
* "chunking strategy"
* "ollama integration"
* "indexing performance"
* "why does indexing take long"
- Added search count tracking and sample limitation reminder
- Intelligent transition to the full project after 2 sample searches (see the sketch after this list)
- FSS-Mini-RAG specific follow-up question patterns
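The sample-limitation logic amounts to a small counter; a hypothetical sketch:

```python
class SampleSession:
    """Track sample searches and nudge users toward the full project."""

    LIMIT = 2  # sample searches before suggesting the transition

    def __init__(self):
        self.searches = 0

    def after_search(self) -> str:
        self.searches += 1
        if self.searches < self.LIMIT:
            return "Reminder: you're searching the small sample, not your project."
        return "You've seen how search works. Ready to index your full project?"
```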
Key fixes:
- No more dead search results (removed auth/API queries that match nothing in the samples)
- Sample questions now match the actual content that will be found
- Users get a timing estimate for full indexing based on sample performance
- Clear transition path from sample to full project exploration
This prevents the "installed malware" feeling when searches return no results.
- Replace slow full-project test with fast 3-file sample
- Add beginner guidance and welcome messages
- Add sample questions to combat prompt paralysis
- Add intelligent follow-up question suggestions
- Improve TUI with contextual next steps
Installer improvements:
- Create minimal sample project (3 files) for testing
- Add helpful tips and guidance for new users
- Better error messaging and progress indicators
TUI enhancements:
- Welcome message for first-time users
- Sample search questions (authentication, error handling, etc.)
- Pattern-based follow-up question generation (sketched after this list)
- Contextual suggestions based on search results
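A sketch of what the pattern-based generation can look like; the keyword table and wording are illustrative:

```python
# Illustrative keyword -> follow-up question table.
FOLLOW_UPS = {
    "error": "How are these errors surfaced to the user?",
    "auth": "Where are credentials validated?",
    "index": "What affects indexing speed on larger projects?",
}

def suggest_follow_up(query: str, results: list) -> str | None:
    """Return a contextual follow-up based on keywords in the query."""
    if not results:
        return "Try a broader term, or index your own project for richer results."
    for keyword, question in FOLLOW_UPS.items():
        if keyword in query.lower():
            return question
    return None
```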
These changes address user feedback about installation taking too long
and beginners not knowing what to search for.
- Changed primary model recommendation from qwen3:1.7b to qwen3:4b
- Added Q8 quantization info in technical docs for production users
- Fixed method name error: get_embedding_info() -> get_status()
- Updated all error messages and test files with new recommendations
- Maintained beginner-friendly options (1.7b still very good, 0.6b surprisingly good)
- Added explanation of why small models work well with RAG context
- Comprehensive testing completed; system ready for a clean release
Complete rebrand to eliminate any Claude/Anthropic references:
Directory Changes:
- claude_rag/ → mini_rag/ (preserving git history)
Content Changes:
- Replaced 930+ Claude references across 40+ files
- Updated all imports: from claude_rag → from mini_rag (bulk-rewrite sketch after this list)
- Updated all file paths: .claude-rag → .mini-rag
- Updated documentation and comments
- Updated configuration files and examples
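The content rewrite itself is mechanical; a hedged sketch of the kind of bulk replacement involved (the directory rename that preserves git history happened separately, per the note above):

```python
from pathlib import Path

# Illustrative bulk rewrite of imports and paths across the repository.
REPLACEMENTS = {
    "from claude_rag": "from mini_rag",
    "import claude_rag": "import mini_rag",
    ".claude-rag": ".mini-rag",
}

for path in Path(".").rglob("*.py"):
    text = path.read_text(encoding="utf-8")
    updated = text
    for old, new in REPLACEMENTS.items():
        updated = updated.replace(old, new)
    if updated != text:
        path.write_text(updated, encoding="utf-8")
```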
Testing Changes:
- All tests updated to use mini_rag imports
- Integration tests verify new module structure
This ensures complete independence from Claude/Anthropic
branding while maintaining all functionality and git history.
- Update README with prominent two-mode explanation (synthesis vs exploration)
- Add exploration mode to TUI with full interactive interface
- Create comprehensive mode separation tests (test_mode_separation.py)
- Update Ollama integration tests to cover both synthesis and exploration modes
- Add CLI reference updates showing both modes
- Implement complete test coverage for lazy loading and mode-contamination prevention (see the sketch after this list)
- Add session management tests for exploration mode
- Update all examples and help text to reflect clean two-mode architecture
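A minimal sketch of the lazy-loading behavior the tests cover: neither mode pays for the other's resources until first use. The class and attribute names are hypothetical:

```python
class ExplorationSession:
    """The LLM client is created on first use only, so plain search
    never triggers a model load (hypothetical sketch)."""

    def __init__(self, model: str):
        self.model = model
        self._client = None

    @property
    def client(self):
        if self._client is None:
            import ollama  # deferred: imported only when the LLM is needed
            self._client = ollama.Client()
        return self._client
```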
📚 DOCUMENTATION
- docs/QUERY_EXPANSION.md: Complete beginner guide with examples and troubleshooting
- Updated config.yaml with proper LLM settings and comments (example after this list)
- Clear explanations of when features are enabled/disabled
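The documented config.yaml settings look roughly like this; the exact keys may differ, so treat it as a sketch:

```yaml
llm:
  model: qwen3:1.7b
  # Query expansion is off by default for CLI speed; the TUI turns it on.
  query_expansion: false
  # Ollama's 2048-token default is far too small for RAG workloads.
  context_window: 16384
```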
🧪 NEW TESTING INFRASTRUCTURE
- test_ollama_integration.py: 6 comprehensive tests with helpful error messages
- test_smart_ranking.py: 6 tests verifying ranking quality improvements
- troubleshoot.py: Interactive tool for diagnosing setup issues
- Enhanced system validation with new features coverage
⚙️ SMART DEFAULTS
- Query expansion disabled by default (CLI speed)
- TUI enables expansion automatically (exploration mode)
- Clear user feedback about which features are active
- Graceful degradation when Ollama unavailable
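Graceful degradation means the search path never hard-fails when Ollama is missing; a sketch assuming the ollama Python client:

```python
import ollama

def expand_query(query: str, model: str = "qwen3:1.7b") -> str:
    """Return an expanded query, or the original if Ollama is unreachable."""
    try:
        reply = ollama.generate(
            model=model,
            prompt=f"Rewrite this search query with related terms: {query}",
        )
        expanded = reply["response"].strip()
        return expanded or query
    except Exception:
        # Ollama not installed or not running: silently fall back.
        return query
```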
🎯 BEGINNER-FRIENDLY APPROACH
- Tests explain what they're checking and why
- Clear solutions provided for common problems
- Educational output showing system status
- Offline testing with gentle mocking
Run 'python3 tests/troubleshoot.py' to verify your setup!
🔧 Integration Updates
- Added --synthesize flag to the main rag-mini CLI (sketched after this list)
- Updated README with synthesis examples and 10 result default
- Enhanced demo script with 8 complete results (previously cut off at 5)
- Updated rag-tui default from 5 to 10 results
- Updated rag-mini-enhanced script defaults
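A sketch of the flag wiring with argparse; the flag name and 10-result default come from above, while the helper functions are placeholders:

```python
import argparse

def search(query: str, top_k: int) -> list[str]:
    """Placeholder for the real vector search."""
    return [f"result {i} for {query!r}" for i in range(top_k)]

def synthesize(query: str, results: list[str]) -> str:
    """Placeholder for the LLM analysis step."""
    return f"Summary of {len(results)} results for {query!r}"

parser = argparse.ArgumentParser(prog="rag-mini")
parser.add_argument("query", help="what to search for")
parser.add_argument("--synthesize", action="store_true",
                    help="add an LLM analysis over the top results")
parser.add_argument("--top-k", type=int, default=10,
                    help="number of results to return (default: 10)")
args = parser.parse_args()

results = search(args.query, args.top_k)
for line in results:
    print(line)
if args.synthesize:
    print(synthesize(args.query, results))
```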
📈 User Experience Improvements
- All components now consistently default to 10 results
- Demo shows complete 8-result workflow with multi-line previews
- Documentation reflects new AI analysis capabilities
- Seamless integration preserves existing workflows
Users get more comprehensive results by default and can optionally
add intelligent AI analysis with a simple --synthesize flag!