Clean up PR documentation files after Gitea workflow example

This commit is contained in:
parent 03d177c8e0
commit 5c9fb45dd1

PR_BODY.md (109 lines removed)
@@ -1,109 +0,0 @@
## Problem Statement

Currently, FSS-Mini-RAG uses Ollama's default context window settings, which severely limits performance:

- **Default 2048 tokens** is inadequate for RAG applications
- Users can't configure the context window for their hardware/use case
- No guidance on optimal context sizes for different models
- Inconsistent context handling across the codebase
- New users don't understand the importance of the context window

## Impact on User Experience

**With a 2048-token context window:**
- Only 1-2 responses possible before context truncation
- Thinking tokens consume significant context space
- Poor performance with larger document chunks
- Frustrated users who don't understand why responses degrade

**With proper context configuration:**
- 5-15+ responses in exploration mode
- Support for advanced use cases (15+ results, 4000+ character chunks)
- Better coding assistance and analysis
- Professional-grade RAG experience
## Solution Implemented

### 1. Enhanced Model Configuration Menu
Added context window selection alongside model selection with:
- **Development**: 8K tokens (fast, good for most cases)
- **Production**: 16K tokens (balanced performance)
- **Advanced**: 32K+ tokens (heavy development work)

### 2. Educational Content
Helps users understand:
- Why context window size matters for RAG
- Hardware implications of larger contexts
- Optimal settings for their use case
- Model-specific context capabilities

### 3. Consistent Implementation
- Updated all Ollama API calls to use consistent context settings
- Ensured configuration applies across synthesis, expansion, and exploration
- Added validation of context sizes against model capabilities
- Provided clear error messages for invalid configurations
## Technical Implementation

Based on comprehensive research findings:

### Model Context Capabilities
- **qwen3:0.6b/1.7b**: 32K token maximum
- **qwen3:4b**: 131K token maximum (YaRN extended)

### Recommended Context Sizes
```yaml
# Conservative (fast, low memory)
num_ctx: 8192   # ~6MB memory, excellent for exploration

# Balanced (recommended for most users)
num_ctx: 16384  # ~12MB memory, handles complex analysis

# Advanced (heavy development work)
num_ctx: 32768  # ~24MB memory, supports large codebases
```
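Note that `num_ctx` must travel with each request in Ollama's `options` field; it is not a server-wide setting. A minimal sketch of building such a request body (the `/api/generate` endpoint and the `options.num_ctx` key follow Ollama's documented REST API; the model name and prompt are placeholders, and the helper name is illustrative, not FSS-Mini-RAG's actual code):

```python
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_payload(prompt: str, num_ctx: int = 16384,
                           model: str = "qwen3:4b") -> dict:
    """Build a /api/generate request body that pins the context window."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Without this option, Ollama silently falls back to its
        # 2048-token default, causing the truncation described above.
        "options": {"num_ctx": num_ctx},
    }

# The dict would then be POSTed as JSON to OLLAMA_URL.
```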
### Configuration Integration
- Added context window selection to the TUI configuration menu
- Updated the config.yaml schema with context parameters
- Implemented validation for model-specific limits
- Provided migration for existing configurations
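The PR body does not show the updated schema itself; purely as an illustration, the new keys might look like the following (the key names `context_window` and `auto_context` come from PR_DRAFT.md; the nesting under `llm:` is an assumption):

```yaml
llm:
  model: qwen3:4b
  context_window: 16384   # tokens; validated against the model's maximum
  auto_context: true      # cap the value automatically at the model limit
```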
## Benefits

1. **Improved User Experience**
   - Longer conversation sessions
   - Better analysis quality
   - Clear performance expectations

2. **Professional RAG Capability**
   - Support for enterprise-scale projects
   - Handles large codebases effectively
   - Enables advanced use cases

3. **Educational Value**
   - Users learn about context windows
   - Better understanding of RAG performance
   - Informed decision making

## Files Changed

- `mini_rag/config.py`: Added context window configuration parameters
- `mini_rag/llm_synthesizer.py`: Dynamic context sizing with model awareness
- `mini_rag/explorer.py`: Consistent context application
- `rag-tui.py`: Enhanced configuration menu with context selection
- `PR_DRAFT.md`: Documentation of the implementation approach

## Testing Recommendations

1. Test the context configuration menu with different models
2. Verify that context limits are enforced correctly
3. Test conversation length with different context sizes
4. Validate memory usage estimates
5. Test advanced use cases (15+ results, large chunks)

---

**This PR significantly improves FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**

**Ready for review and testing!** 🚀
PR_DRAFT.md (135 lines removed)
@@ -1,135 +0,0 @@
# Add Context Window Configuration for Optimal RAG Performance

## Problem Statement

Currently, FSS-Mini-RAG uses Ollama's default context window settings, which severely limits performance:

- **Default 2048 tokens** is inadequate for RAG applications
- Users can't configure the context window for their hardware/use case
- No guidance on optimal context sizes for different models
- Inconsistent context handling across the codebase
- New users don't understand the importance of the context window

## Impact on User Experience

**With a 2048-token context window:**
- Only 1-2 responses possible before context truncation
- Thinking tokens consume significant context space
- Poor performance with larger document chunks
- Frustrated users who don't understand why responses degrade

**With proper context configuration:**
- 5-15+ responses in exploration mode
- Support for advanced use cases (15+ results, 4000+ character chunks)
- Better coding assistance and analysis
- Professional-grade RAG experience
## Proposed Solution

### 1. Enhanced Model Configuration Menu
Add context window selection alongside model selection with:
- **Development**: 8K tokens (fast, good for most cases)
- **Production**: 16K tokens (balanced performance)
- **Advanced**: 32K+ tokens (heavy development work)

### 2. Educational Content
Help users understand:
- Why context window size matters for RAG
- Hardware implications of larger contexts
- Optimal settings for their use case
- Model-specific context capabilities

### 3. Consistent Implementation
- Update all Ollama API calls to use consistent context settings
- Ensure configuration applies across synthesis, expansion, and exploration
- Validate context sizes against model capabilities
- Provide clear error messages for invalid configurations
## Technical Implementation

Based on research findings:

### Model Context Capabilities
- **qwen3:0.6b/1.7b**: 32K token maximum
- **qwen3:4b**: 131K token maximum (YaRN extended)

### Recommended Context Sizes
```yaml
# Conservative (fast, low memory)
num_ctx: 8192   # ~6MB memory, excellent for exploration

# Balanced (recommended for most users)
num_ctx: 16384  # ~12MB memory, handles complex analysis

# Advanced (heavy development work)
num_ctx: 32768  # ~24MB memory, supports large codebases
```
### Configuration Integration
- Add context window selection to the TUI configuration menu
- Update the config.yaml schema with context parameters
- Implement validation for model-specific limits
- Provide migration for existing configurations
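The model-specific validation described above could look like the following sketch (the limits table mirrors the Model Context Capabilities section; the function name and error wording are illustrative, not the actual FSS-Mini-RAG code):

```python
# Maximums from the "Model Context Capabilities" section.
MODEL_CONTEXT_LIMITS = {
    "qwen3:0.6b": 32_768,
    "qwen3:1.7b": 32_768,
    "qwen3:4b": 131_072,  # YaRN extended
}
DEFAULT_LIMIT = 32_768  # assumed conservative fallback for unknown models

def validate_context_window(model: str, num_ctx: int) -> int:
    """Reject nonsense values and enforce the model's maximum context."""
    if num_ctx < 2048:
        raise ValueError(
            f"context_window={num_ctx} is too small; Ollama's own default is 2048"
        )
    limit = MODEL_CONTEXT_LIMITS.get(model, DEFAULT_LIMIT)
    if num_ctx > limit:
        raise ValueError(
            f"{model} supports at most {limit} tokens (requested {num_ctx})"
        )
    return num_ctx
```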
## Benefits

1. **Improved User Experience**
   - Longer conversation sessions
   - Better analysis quality
   - Clear performance expectations

2. **Professional RAG Capability**
   - Support for enterprise-scale projects
   - Handles large codebases effectively
   - Enables advanced use cases

3. **Educational Value**
   - Users learn about context windows
   - Better understanding of RAG performance
   - Informed decision making

## Implementation Plan

1. **Phase 1**: Research Ollama context handling (✅ Complete)
2. **Phase 2**: Update the configuration system (✅ Complete)
3. **Phase 3**: Enhance the TUI with context selection (✅ Complete)
4. **Phase 4**: Update all API calls consistently (✅ Complete)
5. **Phase 5**: Add documentation and validation (✅ Complete)
## Implementation Details

### Configuration System
- Added `context_window` and `auto_context` to LLMConfig
- Default 16K context (vs. the problematic 2K default)
- Model-specific validation and limits
- YAML output includes helpful context explanations
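A minimal sketch of what the LLMConfig additions might look like (the field names come from the bullets above; the dataclass shape, the `model` field, and the helper method are assumptions about `mini_rag/config.py`, not its actual contents):

```python
from dataclasses import dataclass

@dataclass
class LLMConfig:
    model: str = "qwen3:4b"
    # New fields described in this PR:
    context_window: int = 16384  # default 16K, not Ollama's 2K
    auto_context: bool = True    # cap the value at the model's limit

    def effective_context(self, model_limit: int) -> int:
        """Return the context size actually sent to Ollama."""
        if self.auto_context:
            return min(self.context_window, model_limit)
        return self.context_window
```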
### TUI Enhancement
- New "Configure context window" menu option
- Educational content about context importance
- Three presets: Development (8K), Production (16K), Advanced (32K)
- Custom size entry with validation
- Memory usage estimates for each option

### API Consistency
- Dynamic context sizing via `_get_optimal_context_size()`
- Model capability awareness (qwen3:4b = 131K, others = 32K)
- Applied consistently to synthesizer and explorer
- Automatic capping at model limits
### User Education
- Clear explanations of why context matters for RAG
- Memory usage implications (8K = 6MB, 16K = 12MB, 32K = 24MB)
- Advanced use case guidance (15+ results, 4000+ character chunks)
- Performance vs. quality tradeoffs
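For what it's worth, the three memory figures quoted above are consistent with a flat rate of ~0.75 KB per token; that rate is inferred from the numbers, since the PR does not say how the estimates were derived:

```python
KB_PER_TOKEN = 0.75  # inferred from the 8K -> 6MB figure above; an assumption

def estimated_context_mb(num_ctx: int) -> float:
    """Rough memory estimate for a given context window, in MB."""
    return num_ctx * KB_PER_TOKEN / 1024
```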
## Answers to Review Questions

1. ✅ **Auto-detection**: Implemented via the `auto_context` flag, which respects model limits
2. ✅ **Model changes**: Dynamic validation against the current model's capabilities
3. ✅ **Scope**: Global configuration with per-model validation
4. ✅ **Validation**: Comprehensive validation with clear error messages and guidance

---

**This PR will significantly improve FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**