From 5c9fb45dd14450af486db845b68a8091fa19d2b1 Mon Sep 17 00:00:00 2001
From: BobAi
Date: Fri, 15 Aug 2025 14:04:52 +1000
Subject: [PATCH] Clean up PR documentation files after Gitea workflow example

---
 PR_BODY.md  | 109 ------------------------------------------
 PR_DRAFT.md | 135 ----------------------------------------------------
 2 files changed, 244 deletions(-)
 delete mode 100644 PR_BODY.md
 delete mode 100644 PR_DRAFT.md

diff --git a/PR_BODY.md b/PR_BODY.md
deleted file mode 100644
index d8d9f57..0000000
--- a/PR_BODY.md
+++ /dev/null
@@ -1,109 +0,0 @@
## Problem Statement

Currently, FSS-Mini-RAG uses Ollama's default context window, which severely limits performance:

- The **2048-token default** is inadequate for RAG applications
- Users can't configure the context window for their hardware or use case
- No guidance on optimal context sizes for different models
- Inconsistent context handling across the codebase
- New users don't understand why the context window matters

## Impact on User Experience

**With a 2048-token context window:**
- Only 1-2 responses are possible before context truncation
- Thinking tokens consume significant context space
- Poor performance with larger document chunks
- Frustrated users who don't understand why responses degrade

**With proper context configuration:**
- 5-15+ responses in exploration mode
- Support for advanced use cases (15+ results, 4000+ character chunks)
- Better coding assistance and analysis
- A professional-grade RAG experience

## Solution Implemented

### 1. Enhanced Model Configuration Menu
Added context window selection alongside model selection with:
- **Development**: 8K tokens (fast, good for most cases)
- **Production**: 16K tokens (balanced performance)
- **Advanced**: 32K+ tokens (heavy development work)

### 2. Educational Content
Added guidance that helps users understand:
- Why context window size matters for RAG
- Hardware implications of larger contexts
- Optimal settings for their use case
- Model-specific context capabilities

### 3. Consistent Implementation
- Updated all Ollama API calls to use consistent context settings (see the sketch below under Technical Implementation)
- Ensured configuration applies across synthesis, expansion, and exploration
- Added validation of context sizes against model capabilities
- Provided clear error messages for invalid configurations

## Technical Implementation

Based on comprehensive research findings:

### Model Context Capabilities
- **qwen3:0.6b/1.7b**: 32K token maximum
- **qwen3:4b**: 131K token maximum (YaRN extended)

### Recommended Context Sizes
```yaml
# Conservative (fast, low memory)
num_ctx: 8192    # ~6MB memory, excellent for exploration

# Balanced (recommended for most users)
num_ctx: 16384   # ~12MB memory, handles complex analysis

# Advanced (heavy development work)
num_ctx: 32768   # ~24MB memory, supports large codebases
```

### Configuration Integration
- Added context window selection to the TUI configuration menu
- Updated the config.yaml schema with context parameters
- Implemented validation for model-specific limits
- Provided migration for existing configurations
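To make the "consistent context settings" point concrete, here is a minimal sketch of an Ollama call that sets the context window explicitly. The `options.num_ctx` field is Ollama's documented generate parameter; the helper name, endpoint constant, and default value are illustrative, not the literal code in this PR:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def generate(model: str, prompt: str, num_ctx: int = 16384) -> str:
    """Call Ollama with an explicit context window instead of the 2048 default."""
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            # Without num_ctx, Ollama silently falls back to its small default
            # and long RAG prompts get truncated.
            "options": {"num_ctx": num_ctx},
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```

Routing the synthesizer, expander, and explorer through one helper like this is what keeps context behavior consistent across the codebase.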
## Benefits

1. **Improved User Experience**
   - Longer conversation sessions
   - Better analysis quality
   - Clear performance expectations

2. **Professional RAG Capability**
   - Support for enterprise-scale projects
   - Handles large codebases effectively
   - Enables advanced use cases

3. **Educational Value**
   - Users learn about context windows
   - Better understanding of RAG performance
   - Informed decision making

## Files Changed

- `mini_rag/config.py`: Added context window configuration parameters
- `mini_rag/llm_synthesizer.py`: Dynamic context sizing with model awareness
- `mini_rag/explorer.py`: Consistent context application
- `rag-tui.py`: Enhanced configuration menu with context selection
- `PR_DRAFT.md`: Documentation of the implementation approach

## Testing Recommendations

1. Test the context configuration menu with different models
2. Verify context limits are enforced correctly
3. Test conversation length with different context sizes
4. Validate memory usage estimates
5. Test advanced use cases (15+ results, large chunks)

---

**This PR significantly improves FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**

**Ready for review and testing!** 🚀
\ No newline at end of file
diff --git a/PR_DRAFT.md b/PR_DRAFT.md
deleted file mode 100644
index 6364f65..0000000
--- a/PR_DRAFT.md
+++ /dev/null
@@ -1,135 +0,0 @@
# Add Context Window Configuration for Optimal RAG Performance

## Problem Statement

Currently, FSS-Mini-RAG uses Ollama's default context window, which severely limits performance:

- The **2048-token default** is inadequate for RAG applications
- Users can't configure the context window for their hardware or use case
- No guidance on optimal context sizes for different models
- Inconsistent context handling across the codebase
- New users don't understand why the context window matters

## Impact on User Experience

**With a 2048-token context window:**
- Only 1-2 responses are possible before context truncation
- Thinking tokens consume significant context space
- Poor performance with larger document chunks
- Frustrated users who don't understand why responses degrade

**With proper context configuration:**
- 5-15+ responses in exploration mode
- Support for advanced use cases (15+ results, 4000+ character chunks)
- Better coding assistance and analysis
- A professional-grade RAG experience

## Proposed Solution

### 1. Enhanced Model Configuration Menu
Add context window selection alongside model selection with:
- **Development**: 8K tokens (fast, good for most cases)
- **Production**: 16K tokens (balanced performance)
- **Advanced**: 32K+ tokens (heavy development work)

### 2. Educational Content
Help users understand:
- Why context window size matters for RAG
- Hardware implications of larger contexts
- Optimal settings for their use case
- Model-specific context capabilities

### 3. Consistent Implementation
- Update all Ollama API calls to use consistent context settings
- Ensure configuration applies across synthesis, expansion, and exploration
- Validate context sizes against model capabilities
- Provide clear error messages for invalid configurations

## Technical Implementation

Based on research findings:

### Model Context Capabilities
- **qwen3:0.6b/1.7b**: 32K token maximum
- **qwen3:4b**: 131K token maximum (YaRN extended)

### Recommended Context Sizes
```yaml
# Conservative (fast, low memory)
num_ctx: 8192    # ~6MB memory, excellent for exploration

# Balanced (recommended for most users)
num_ctx: 16384   # ~12MB memory, handles complex analysis

# Advanced (heavy development work)
num_ctx: 32768   # ~24MB memory, supports large codebases
```

### Configuration Integration
- Add context window selection to the TUI configuration menu
- Update the config.yaml schema with context parameters
- Implement validation for model-specific limits
- Provide migration for existing configurations

## Benefits

1. **Improved User Experience**
   - Longer conversation sessions
   - Better analysis quality
   - Clear performance expectations

2. **Professional RAG Capability**
   - Support for enterprise-scale projects
   - Handles large codebases effectively
   - Enables advanced use cases

3. **Educational Value**
   - Users learn about context windows
   - Better understanding of RAG performance
   - Informed decision making

## Implementation Plan

1. **Phase 1**: Research Ollama context handling (✅ Complete)
2. **Phase 2**: Update configuration system (✅ Complete)
3. **Phase 3**: Enhance TUI with context selection (✅ Complete)
4. **Phase 4**: Update all API calls consistently (✅ Complete)
5. **Phase 5**: Add documentation and validation (✅ Complete)

## Implementation Details

### Configuration System
- Added `context_window` and `auto_context` to LLMConfig
- Default 16K context (vs. the problematic 2K default)
- Model-specific validation and limits
- YAML output includes helpful context explanations

### TUI Enhancement
- New "Configure context window" menu option
- Educational content about context importance
- Three presets: Development (8K), Production (16K), Advanced (32K)
- Custom size entry with validation
- Memory usage estimates for each option

### API Consistency
- Dynamic context sizing via `_get_optimal_context_size()` (see the sketch after this list)
- Model capability awareness (qwen3:4b = 131K, others = 32K)
- Applied consistently to the synthesizer and explorer
- Automatic capping at model limits
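As a rough sketch of how these pieces can fit together: the `LLMConfig` field names and `_get_optimal_context_size()` come from this PR's description, while the defaults, the limits table, and the exact signature and error handling below are illustrative assumptions rather than the actual implementation:

```python
from dataclasses import dataclass

# Maximums taken from the research notes above; unknown models assumed 32K.
MODEL_CONTEXT_LIMITS = {"qwen3:4b": 131072}
DEFAULT_MODEL_LIMIT = 32768

@dataclass
class LLMConfig:
    model: str = "qwen3:1.7b"      # illustrative default
    context_window: int = 16384    # new 16K default, vs Ollama's 2048
    auto_context: bool = True      # auto-pick a size that respects model limits

def _get_optimal_context_size(config: LLMConfig) -> int:
    """Return the context size to request, never exceeding the model's maximum."""
    limit = MODEL_CONTEXT_LIMITS.get(config.model, DEFAULT_MODEL_LIMIT)
    if config.auto_context:
        # Balanced 16K preset, capped automatically for smaller models.
        return min(16384, limit)
    if config.context_window > limit:
        raise ValueError(
            f"context_window={config.context_window} exceeds the "
            f"{config.model} maximum of {limit} tokens; pick a smaller value "
            f"or enable auto_context."
        )
    return config.context_window
```

With `auto_context` enabled, the helper simply caps the balanced preset at whatever the current model supports, so switching models never produces an invalid request.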
### User Education
- Clear explanations of why context matters for RAG
- Memory usage implications (8K = 6MB, 16K = 12MB, 32K = 24MB)
- Advanced use case guidance (15+ results, 4000+ character chunks)
- Performance vs. quality tradeoffs

## Answers to Review Questions

1. ✅ **Auto-detection**: Implemented via the `auto_context` flag, which respects model limits
2. ✅ **Model changes**: Dynamic validation against the current model's capabilities
3. ✅ **Scope**: Global configuration with per-model validation
4. ✅ **Validation**: Comprehensive validation with clear error messages and guidance

---

**This PR will significantly improve FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**
\ No newline at end of file