Add intelligent context window management for optimal RAG performance:

## Core Features
- Dynamic context sizing based on model capabilities
- User-friendly configuration menu with Development/Production/Advanced presets
- Automatic validation against model limits (qwen3:0.6b/1.7b = 32K, qwen3:4b = 131K)
- Educational content explaining context window importance for RAG

## Technical Implementation
- Enhanced LLMConfig with context_window and auto_context parameters
- Intelligent _get_optimal_context_size() method with model-specific limits
- Consistent context application across synthesizer and explorer
- YAML configuration output with helpful context explanations

## User Experience Improvements
- Clear context window display in configuration status
- Guided selection: Development (8K), Production (16K), Advanced (32K)
- Memory usage estimates and performance guidance
- Validation prevents invalid context/model combinations

## Educational Value
- Explains why the default 2048 tokens fails for RAG
- Shows the relationship between context size and conversation length
- Guides users toward optimal settings for their use case
- Highlights advanced capabilities (15+ results, 4000+ character chunks)

This addresses the critical issue where Ollama's default context severely limits RAG performance, providing users with proper configuration tools and an understanding of this crucial parameter.
# Add Context Window Configuration for Optimal RAG Performance
## Problem Statement

FSS-Mini-RAG currently uses Ollama's default context window settings, which severely limit performance:

- The default of 2048 tokens is inadequate for RAG applications
- Users can't configure the context window for their hardware or use case
- There is no guidance on optimal context sizes for different models
- Context handling is inconsistent across the codebase
- New users don't understand why the context window matters
## Impact on User Experience

With a 2048-token context window:

- Only 1-2 responses are possible before context truncation
- Thinking tokens consume significant context space
- Performance is poor with larger document chunks
- Users grow frustrated without understanding why responses degrade

With proper context configuration:

- 5-15+ responses in exploration mode
- Support for advanced use cases (15+ results, 4000+ character chunks)
- Better coding assistance and analysis
- A professional-grade RAG experience
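To make the 2048-token failure mode concrete, here is a rough token-budget estimate for a single RAG turn (a sketch only: it assumes the common ~4-characters-per-token heuristic, and the chunk count and sizes are illustrative, not measured):

```python
# Rough token budget for one RAG turn (illustrative numbers).
CHARS_PER_TOKEN = 4  # common heuristic for English text

system_prompt = 300                        # tokens for instructions
chunks = 5 * (2000 // CHARS_PER_TOKEN)     # 5 retrieved chunks of ~2000 chars
question = 50                              # the user's query
reserved_answer = 512                      # room the model needs to respond

used = system_prompt + chunks + question + reserved_answer
print(f"Tokens needed for one turn: ~{used}")  # ~3362
print(f"Fits in 2048? {used <= 2048}")         # False
print(f"Fits in 8192? {used <= 8192}")         # True
```

Even a modest single turn overflows the default window before any conversation history accumulates.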
## Proposed Solution

### 1. Enhanced Model Configuration Menu

Add context window selection alongside model selection, with three presets (a sketch of the preset table follows this list):

- Development: 8K tokens (fast, good for most cases)
- Production: 16K tokens (balanced performance)
- Advanced: 32K+ tokens (heavy development work)
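One possible shape for these presets (the token counts mirror the list above; the `ContextPreset` structure itself is a hypothetical sketch, not existing code):

```python
from dataclasses import dataclass

@dataclass
class ContextPreset:
    name: str
    num_ctx: int
    description: str

# Hypothetical preset table for the configuration menu;
# token counts come from the proposal above.
CONTEXT_PRESETS = [
    ContextPreset("Development", 8192, "Fast, good for most cases"),
    ContextPreset("Production", 16384, "Balanced performance"),
    ContextPreset("Advanced", 32768, "Heavy development work"),
]
```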
### 2. Educational Content

Help users understand:

- Why context window size matters for RAG
- The hardware implications of larger contexts
- Optimal settings for their use case
- Model-specific context capabilities
### 3. Consistent Implementation

- Update all Ollama API calls to use consistent context settings (see the sketch after this list)
- Ensure the configuration applies across synthesis, expansion, and exploration
- Validate context sizes against model capabilities
- Provide clear error messages for invalid configurations
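Consistency could be enforced by routing every call through one helper that always sets `num_ctx` (a minimal sketch against Ollama's REST API; the helper name and where the value comes from are assumptions):

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(model: str, prompt: str, num_ctx: int) -> str:
    """Single entry point for Ollama calls so num_ctx is never forgotten."""
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            # num_ctx is the Ollama option controlling the context window;
            # without it, Ollama falls back to its 2048-token default.
            "options": {"num_ctx": num_ctx},
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```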
## Technical Implementation

Based on research findings:

### Model Context Capabilities

- qwen3:0.6b/1.7b: 32K token maximum
- qwen3:4b: 131K token maximum (YaRN extended)
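A sketch of how these limits could back the `_get_optimal_context_size()` validation mentioned in the summary (the limits table mirrors the findings above; the fallback value and error behavior are assumptions):

```python
# Maximum context window per model, per the research findings above.
MODEL_CONTEXT_LIMITS = {
    "qwen3:0.6b": 32_768,
    "qwen3:1.7b": 32_768,
    "qwen3:4b": 131_072,  # YaRN extended
}

DEFAULT_LIMIT = 32_768  # conservative fallback for unknown models (assumption)

def get_optimal_context_size(model: str, requested: int) -> int:
    """Clamp a requested context window to what the model supports."""
    limit = MODEL_CONTEXT_LIMITS.get(model, DEFAULT_LIMIT)
    if requested > limit:
        raise ValueError(
            f"{model} supports at most {limit} context tokens "
            f"(requested {requested})"
        )
    return requested
```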
### Recommended Context Sizes

```yaml
# Conservative (fast, low memory)
num_ctx: 8192    # ~6MB memory, excellent for exploration

# Balanced (recommended for most users)
num_ctx: 16384   # ~12MB memory, handles complex analysis

# Advanced (heavy development work)
num_ctx: 32768   # ~24MB memory, supports large codebases
```
### Configuration Integration

- Add context window selection to the TUI configuration menu
- Update the config.yaml schema with context parameters (an example follows this list)
- Implement validation for model-specific limits
- Provide a migration path for existing configurations
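The schema addition might look like this (the `context_window` and `auto_context` parameter names come from the commit summary above; the surrounding keys are illustrative):

```yaml
llm:
  model: qwen3:4b          # illustrative; any installed Ollama model
  context_window: 16384    # tokens, passed to Ollama as num_ctx
  auto_context: true       # pick the best size for the model automatically
```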
## Benefits

- **Improved User Experience**
  - Longer conversation sessions
  - Better analysis quality
  - Clear performance expectations

- **Professional RAG Capability**
  - Support for enterprise-scale projects
  - Handles large codebases effectively
  - Enables advanced use cases

- **Educational Value**
  - Users learn about context windows
  - Better understanding of RAG performance
  - Informed decision making
## Implementation Plan
- Phase 1: Research Ollama context handling (✅ Complete)
- Phase 2: Update configuration system
- Phase 3: Enhance TUI with context selection
- Phase 4: Update all API calls consistently
- Phase 5: Add documentation and validation
## Questions for Review

- Should we auto-detect the optimal context based on available memory? (One possible approach is sketched below.)
- How should we handle model changes that affect context capabilities?
- Should context be a per-model or a global configuration?
- What validation should we provide for context/model combinations?
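As a starting point for the first question, auto-detection could look something like this (a rough sketch only: the memory tiers are assumptions, not benchmarks, and `psutil` would be a new dependency):

```python
import psutil  # new dependency if we go this route

# Hypothetical tiers: (minimum available memory in GB, context size to offer).
AUTO_CONTEXT_TIERS = [
    (16, 32_768),
    (8, 16_384),
    (0, 8_192),
]

def auto_detect_context() -> int:
    """Suggest a context window based on available system memory."""
    free_gb = psutil.virtual_memory().available / (1024 ** 3)
    for min_gb, num_ctx in AUTO_CONTEXT_TIERS:
        if free_gb >= min_gb:
            return num_ctx
    return 8_192  # conservative floor
```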
This PR will significantly improve FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.