Problem Statement
Currently, FSS-Mini-RAG uses Ollama's default context window settings, which severely limit performance:
- The default 2048-token window is inadequate for RAG applications
- Users can't configure the context window for their hardware or use case
- There is no guidance on optimal context sizes for different models
- Context handling is inconsistent across the codebase
- New users don't understand why the context window matters
Impact on User Experience
With the default 2048-token context window:
- Only 1-2 responses are possible before context truncation
- Thinking tokens consume significant context space
- Performance is poor with larger document chunks
- Users get frustrated without understanding why responses degrade
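As a rough, illustrative budget (the chunk sizes here are assumptions, not measured values): ten retrieved chunks of ~150 tokens each already consume ~1,500 tokens, leaving only ~500 tokens of a 2,048-token window for the system prompt, the question, any thinking tokens, and the answer, so truncation sets in after a turn or two.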
With proper context configuration:
- 5-15+ responses in exploration mode
- Support for advanced use cases (15+ results, 4000+ character chunks)
- Better coding assistance and analysis
- Professional-grade RAG experience
Solution Implemented
1. Enhanced Model Configuration Menu
Added context window selection alongside model selection, with three presets:
- Development: 8K tokens (fast, good for most cases)
- Production: 16K tokens (balanced performance)
- Advanced: 32K+ tokens (heavy development work)
2. Educational Content
Added explanatory content that helps users understand:
- Why context window size matters for RAG
- Hardware implications of larger contexts
- Optimal settings for their use case
- Model-specific context capabilities
3. Consistent Implementation
- Updated all Ollama API calls to pass a consistent `num_ctx` setting (see the sketch after this list)
- Ensured configuration applies across synthesis, expansion, and exploration
- Added validation for context sizes against model capabilities
- Provided clear error messages for invalid configurations
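For illustration, a minimal sketch of the pattern; the helper name, config defaults, and URL are illustrative rather than the project's actual code, while `options.num_ctx` is Ollama's real API field for overriding the default context size:

```python
# Minimal sketch: pass an explicit num_ctx on every Ollama call.
# Helper name and defaults are illustrative, not FSS-Mini-RAG's actual code.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(prompt: str, model: str = "qwen3:4b", num_ctx: int = 16384) -> str:
    """Call Ollama with an explicit context window instead of the 2048 default."""
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            # Without this option, Ollama falls back to its default context size.
            "options": {"num_ctx": num_ctx},
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```

Routing every synthesis, expansion, and exploration call through one helper like this is what keeps the context setting consistent across the codebase.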
Technical Implementation
The implementation draws on the following research findings:
Model Context Capabilities
- qwen3:0.6b/1.7b: 32K token maximum
- qwen3:4b: 131K token maximum (YaRN extended)
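These limits can be enforced with a small lookup-and-clamp helper. The dictionary below mirrors the numbers above; the function name and error handling are assumptions for illustration, not the project's actual code:

```python
# Maximum context sizes per model, taken from the research numbers above.
MODEL_CONTEXT_LIMITS = {
    "qwen3:0.6b": 32_768,
    "qwen3:1.7b": 32_768,
    "qwen3:4b": 131_072,  # YaRN-extended maximum
}

def clamp_context(model: str, requested: int, default_limit: int = 32_768) -> int:
    """Reject a requested num_ctx that exceeds the model's known maximum."""
    limit = MODEL_CONTEXT_LIMITS.get(model, default_limit)
    if requested > limit:
        raise ValueError(
            f"{model} supports at most {limit} context tokens; got {requested}"
        )
    return requested
```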
Recommended Context Sizes
```yaml
# Conservative (fast, low memory)
num_ctx: 8192   # ~6MB memory, excellent for exploration

# Balanced (recommended for most users)
num_ctx: 16384  # ~12MB memory, handles complex analysis

# Advanced (heavy development work)
num_ctx: 32768  # ~24MB memory, supports large codebases
```
Configuration Integration
- Added context window selection to TUI configuration menu
- Updated config.yaml schema with context parameters
- Implemented validation for model-specific limits
- Provided migration for existing configurations
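A minimal sketch of what that migration could look like, assuming a top-level `llm:` section in config.yaml; the key names and helper are illustrative, not the project's confirmed schema:

```python
# Illustrative migration: give pre-existing configs a sensible num_ctx default.
import yaml  # pip install pyyaml

def migrate_config(path: str, default_num_ctx: int = 8192) -> dict:
    """Add a context-window setting to configs written before this change."""
    with open(path) as f:
        config = yaml.safe_load(f) or {}
    llm = config.setdefault("llm", {})
    llm.setdefault("num_ctx", default_num_ctx)  # keep existing values untouched
    with open(path, "w") as f:
        yaml.safe_dump(config, f)
    return config
```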
Benefits
Improved User Experience
- Longer conversation sessions
- Better analysis quality
- Clear performance expectations

Professional RAG Capability
- Support for enterprise-scale projects
- Handles large codebases effectively
- Enables advanced use cases

Educational Value
- Users learn about context windows
- Better understanding of RAG performance
- Informed decision making
Files Changed
- mini_rag/config.py: Added context window configuration parameters
- mini_rag/llm_synthesizer.py: Dynamic context sizing with model awareness
- mini_rag/explorer.py: Consistent context application
- rag-tui.py: Enhanced configuration menu with context selection
- PR_DRAFT.md: Documentation of the implementation approach
Testing Recommendations
- Test context configuration menu with different models
- Verify context limits are enforced correctly (see the test sketch after this list)
- Test conversation length with different context sizes
- Validate memory usage estimates
- Test advanced use cases (15+ results, large chunks)
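As a starting point for the limit-enforcement tests, a couple of pytest cases, assuming the hypothetical `clamp_context()` helper sketched earlier (the import path is illustrative):

```python
# Illustrative pytest cases for context-limit validation,
# using the hypothetical clamp_context() helper sketched above.
import pytest

from mini_rag.config import clamp_context  # hypothetical import path

def test_within_limit_is_accepted():
    assert clamp_context("qwen3:0.6b", 16_384) == 16_384

def test_over_limit_is_rejected():
    with pytest.raises(ValueError):
        clamp_context("qwen3:1.7b", 65_536)  # exceeds the 32K maximum
```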
This PR significantly improves FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.
Ready for review and testing! 🚀