# Add Context Window Configuration for Optimal RAG Performance ## Problem Statement Currently, FSS-Mini-RAG uses Ollama's default context window settings, which severely limits performance: - **Default 2048 tokens** is inadequate for RAG applications - Users can't configure context window for their hardware/use case - No guidance on optimal context sizes for different models - Inconsistent context handling across the codebase - New users don't understand context window importance ## Impact on User Experience **With 2048 token context window:** - Only 1-2 responses possible before context truncation - Thinking tokens consume significant context space - Poor performance with larger document chunks - Frustrated users who don't understand why responses degrade **With proper context configuration:** - 5-15+ responses in exploration mode - Support for advanced use cases (15+ results, 4000+ character chunks) - Better coding assistance and analysis - Professional-grade RAG experience ## Proposed Solution ### 1. Enhanced Model Configuration Menu Add context window selection alongside model selection with: - **Development**: 8K tokens (fast, good for most cases) - **Production**: 16K tokens (balanced performance) - **Advanced**: 32K+ tokens (heavy development work) ### 2. Educational Content Help users understand: - Why context window size matters for RAG - Hardware implications of larger contexts - Optimal settings for their use case - Model-specific context capabilities ### 3. Consistent Implementation - Update all Ollama API calls to use consistent context settings - Ensure configuration applies across synthesis, expansion, and exploration - Validate context sizes against model capabilities - Provide clear error messages for invalid configurations ## Technical Implementation Based on research findings: ### Model Context Capabilities - **qwen3:0.6b/1.7b**: 32K token maximum - **qwen3:4b**: 131K token maximum (YaRN extended) ### Recommended Context Sizes ```yaml # Conservative (fast, low memory) num_ctx: 8192 # ~6MB memory, excellent for exploration # Balanced (recommended for most users) num_ctx: 16384 # ~12MB memory, handles complex analysis # Advanced (heavy development work) num_ctx: 32768 # ~24MB memory, supports large codebases ``` ### Configuration Integration - Add context window selection to TUI configuration menu - Update config.yaml schema with context parameters - Implement validation for model-specific limits - Provide migration for existing configurations ## Benefits 1. **Improved User Experience** - Longer conversation sessions - Better analysis quality - Clear performance expectations 2. **Professional RAG Capability** - Support for enterprise-scale projects - Handles large codebases effectively - Enables advanced use cases 3. **Educational Value** - Users learn about context windows - Better understanding of RAG performance - Informed decision making ## Implementation Plan 1. **Phase 1**: Research Ollama context handling (✅ Complete) 2. **Phase 2**: Update configuration system (✅ Complete) 3. **Phase 3**: Enhance TUI with context selection (✅ Complete) 4. **Phase 4**: Update all API calls consistently (✅ Complete) 5. **Phase 5**: Add documentation and validation (✅ Complete) ## Implementation Details ### Configuration System - Added `context_window` and `auto_context` to LLMConfig - Default 16K context (vs problematic 2K default) - Model-specific validation and limits - YAML output includes helpful context explanations ### TUI Enhancement - New "Configure context window" menu option - Educational content about context importance - Three presets: Development (8K), Production (16K), Advanced (32K) - Custom size entry with validation - Memory usage estimates for each option ### API Consistency - Dynamic context sizing via `_get_optimal_context_size()` - Model capability awareness (qwen3:4b = 131K, others = 32K) - Applied consistently to synthesizer and explorer - Automatic capping at model limits ### User Education - Clear explanations of why context matters for RAG - Memory usage implications (8K = 6MB, 16K = 12MB, 32K = 24MB) - Advanced use case guidance (15+ results, 4000+ chunks) - Performance vs quality tradeoffs ## Answers to Review Questions 1. ✅ **Auto-detection**: Implemented via `auto_context` flag that respects model limits 2. ✅ **Model changes**: Dynamic validation against current model capabilities 3. ✅ **Scope**: Global configuration with per-model validation 4. ✅ **Validation**: Comprehensive validation with clear error messages and guidance --- **This PR will significantly improve FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**