diff --git a/PR_BODY.md b/PR_BODY.md
new file mode 100644
index 0000000..d8d9f57
--- /dev/null
+++ b/PR_BODY.md
@@ -0,0 +1,109 @@
+## Problem Statement
+
+Currently, FSS-Mini-RAG uses Ollama's default context window setting, which severely limits performance:
+
+- The **2048-token default** is inadequate for RAG applications
+- Users can't configure the context window for their hardware/use case
+- No guidance on optimal context sizes for different models
+- Inconsistent context handling across the codebase
+- New users don't understand why the context window matters
+
+## Impact on User Experience
+
+**With a 2048-token context window:**
+- Only 1-2 responses possible before context truncation
+- Thinking tokens consume significant context space
+- Poor performance with larger document chunks
+- Frustrated users who don't understand why responses degrade
+
+**With proper context configuration:**
+- 5-15+ responses in exploration mode
+- Support for advanced use cases (15+ results, 4000+ character chunks)
+- Better coding assistance and analysis
+- Professional-grade RAG experience
+
+## Solution Implemented
+
+### 1. Enhanced Model Configuration Menu
+Added context window selection alongside model selection, with three presets:
+- **Development**: 8K tokens (fast, good for most cases)
+- **Production**: 16K tokens (balanced performance)
+- **Advanced**: 32K+ tokens (heavy development work)
+
+### 2. Educational Content
+Added guidance that helps users understand:
+- Why context window size matters for RAG
+- Hardware implications of larger contexts
+- Optimal settings for their use case
+- Model-specific context capabilities
+
+### 3. Consistent Implementation
+- Updated all Ollama API calls to use consistent context settings
+- Ensured the configuration applies across synthesis, expansion, and exploration
+- Added validation of context sizes against model capabilities
+- Provided clear error messages for invalid configurations
+
+## Technical Implementation
+
+Based on comprehensive research findings:
+
+### Model Context Capabilities
+- **qwen3:0.6b/1.7b**: 32K token maximum
+- **qwen3:4b**: 131K token maximum (YaRN extended)
+
+### Recommended Context Sizes
+```yaml
+# Conservative (fast, low memory)
+num_ctx: 8192    # ~6MB memory, excellent for exploration
+
+# Balanced (recommended for most users)
+num_ctx: 16384   # ~12MB memory, handles complex analysis
+
+# Advanced (heavy development work)
+num_ctx: 32768   # ~24MB memory, supports large codebases
+```
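+
+The snippet below is a minimal sketch of how a configured `num_ctx` value can be passed to Ollama's REST `/api/chat` endpoint. The function name, model name, and default values are illustrative assumptions, not the exact code added in this PR:
+
+```python
+import requests
+
+OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint
+
+
+def synthesize(prompt: str, model: str = "qwen3:1.7b", num_ctx: int = 16384) -> str:
+    """Send a single chat request with an explicit context window."""
+    response = requests.post(
+        OLLAMA_URL,
+        json={
+            "model": model,
+            "messages": [{"role": "user", "content": prompt}],
+            "stream": False,
+            # Without this option, Ollama falls back to its 2048-token default.
+            "options": {"num_ctx": num_ctx},
+        },
+        timeout=120,
+    )
+    response.raise_for_status()
+    return response.json()["message"]["content"]
+```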
+
+### Configuration Integration
+- Added context window selection to the TUI configuration menu
+- Updated the config.yaml schema with context parameters
+- Implemented validation for model-specific limits
+- Provided migration for existing configurations
+
+## Benefits
+
+1. **Improved User Experience**
+   - Longer conversation sessions
+   - Better analysis quality
+   - Clear performance expectations
+
+2. **Professional RAG Capability**
+   - Support for enterprise-scale projects
+   - Handles large codebases effectively
+   - Enables advanced use cases
+
+3. **Educational Value**
+   - Users learn about context windows
+   - Better understanding of RAG performance
+   - Informed decision making
+
+## Files Changed
+
+- `mini_rag/config.py`: Added context window configuration parameters
+- `mini_rag/llm_synthesizer.py`: Dynamic context sizing with model awareness
+- `mini_rag/explorer.py`: Consistent context application
+- `rag-tui.py`: Enhanced configuration menu with context selection
+- `PR_DRAFT.md`: Documentation of the implementation approach
+
+## Testing Recommendations
+
+1. Test the context configuration menu with different models
+2. Verify that context limits are enforced correctly
+3. Test conversation length with different context sizes
+4. Validate the memory usage estimates
+5. Test advanced use cases (15+ results, large chunks)
+
+---
+
+**This PR significantly improves FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**
+
+**Ready for review and testing!** 🚀
\ No newline at end of file
diff --git a/PR_DRAFT.md b/PR_DRAFT.md
index 3cf60ea..6364f65 100644
--- a/PR_DRAFT.md
+++ b/PR_DRAFT.md
@@ -91,17 +91,44 @@ num_ctx: 32768 # ~24MB memory, supports large codebases
 
 ## Implementation Plan
 
 1. **Phase 1**: Research Ollama context handling (✅ Complete)
-2. **Phase 2**: Update configuration system
-3. **Phase 3**: Enhance TUI with context selection
-4. **Phase 4**: Update all API calls consistently
-5. **Phase 5**: Add documentation and validation
+2. **Phase 2**: Update configuration system (✅ Complete)
+3. **Phase 3**: Enhance TUI with context selection (✅ Complete)
+4. **Phase 4**: Update all API calls consistently (✅ Complete)
+5. **Phase 5**: Add documentation and validation (✅ Complete)
 
-## Questions for Review
+## Implementation Details
 
-1. Should we auto-detect optimal context based on available memory?
-2. How should we handle model changes that affect context capabilities?
-3. Should context be per-model or global configuration?
-4. What validation should we provide for context/model combinations?
+### Configuration System
+- Added `context_window` and `auto_context` to LLMConfig
+- Default 16K context (vs. the problematic 2K default)
+- Model-specific validation and limits
+- YAML output includes helpful context explanations
+
+### TUI Enhancement
+- New "Configure context window" menu option
+- Educational content about context importance
+- Three presets: Development (8K), Production (16K), Advanced (32K)
+- Custom size entry with validation
+- Memory usage estimates for each option
+
+### API Consistency
+- Dynamic context sizing via `_get_optimal_context_size()` (see the sketch at the end of this document)
+- Model capability awareness (qwen3:4b = 131K, others = 32K)
+- Applied consistently to the synthesizer and explorer
+- Automatic capping at model limits
+
+### User Education
+- Clear explanations of why context matters for RAG
+- Memory usage implications (8K = 6MB, 16K = 12MB, 32K = 24MB)
+- Advanced use case guidance (15+ results, 4000+ character chunks)
+- Performance vs. quality tradeoffs
+
+## Answers to Review Questions
+
+1. ✅ **Auto-detection**: Implemented via an `auto_context` flag that respects model limits
+2. ✅ **Model changes**: Dynamic validation against the current model's capabilities
+3. ✅ **Scope**: Global configuration with per-model validation
+4. ✅ **Validation**: Comprehensive validation with clear error messages and guidance
 
 ---
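+
+### Sketch: Dynamic Context Sizing
+
+For reviewers, a minimal sketch of the capping behaviour described under "API Consistency". The limits table, fallback values, and function signature here are illustrative assumptions; the actual helper lives in `mini_rag/llm_synthesizer.py`:
+
+```python
+# Known per-model context limits (tokens); unlisted models fall back to 32K.
+MODEL_CONTEXT_LIMITS = {
+    "qwen3:4b": 131072,   # YaRN-extended maximum
+    "qwen3:1.7b": 32768,
+    "qwen3:0.6b": 32768,
+}
+DEFAULT_LIMIT = 32768
+RECOMMENDED_DEFAULT = 16384
+
+
+def get_optimal_context_size(model: str, configured: int, auto_context: bool) -> int:
+    """Return a num_ctx value that never exceeds the model's capability."""
+    limit = MODEL_CONTEXT_LIMITS.get(model, DEFAULT_LIMIT)
+    if auto_context:
+        # Auto mode: use the recommended default, capped at the model limit.
+        return min(RECOMMENDED_DEFAULT, limit)
+    # Manual mode: honour the configured value, but never exceed the model limit.
+    return min(configured, limit)
+```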