
Add Context Window Configuration for Optimal RAG Performance

Problem Statement

FSS-Mini-RAG currently uses Ollama's default context window, which severely limits RAG performance:

  • The 2048-token default is inadequate for RAG applications
  • Users can't configure the context window for their hardware or use case
  • No guidance on optimal context sizes for different models
  • Inconsistent context handling across the codebase
  • New users don't understand why the context window matters

Impact on User Experience

With the default 2048-token context window:

  • Only 1-2 responses fit before context truncation
  • Thinking tokens consume a significant share of the available context
  • Poor performance with larger document chunks: a single 4000-character chunk is roughly 1000 tokens, so a handful of retrieved chunks can overflow 2048 tokens before the model writes anything
  • Frustrated users who don't understand why responses degrade

With proper context configuration (see the rough arithmetic after this list):

  • 5-15+ responses in exploration mode
  • Support for advanced use cases (15+ results, 4000+ character chunks)
  • Better coding assistance and analysis
  • Professional-grade RAG experience
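
As a back-of-envelope check on the numbers above, here is a sketch of the turn budget (every figure is an illustrative assumption, using the common rule of thumb of roughly 4 characters per token):

```python
# Rough turn-budget estimate; all figures are illustrative assumptions.
CHARS_PER_TOKEN = 4                                # common rule of thumb
retrieved_tokens = 3 * (1600 // CHARS_PER_TOKEN)   # three 1600-char chunks ~= 1200 tokens
tokens_per_turn = 400                              # question + answer, estimated

for num_ctx in (2048, 8192, 16384, 32768):
    turns = max((num_ctx - retrieved_tokens) // tokens_per_turn, 0)
    print(f"num_ctx={num_ctx:>5}: room for ~{turns} turns after retrieval")
```

At 2048 tokens, the retrieved chunks alone leave room for only a couple of exchanges; the larger settings leave comfortable headroom.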

Proposed Solution

1. Enhanced Model Configuration Menu

Add context window selection alongside model selection, with three presets (sketched after this list):

  • Development: 8K tokens (fast, good for most cases)
  • Production: 16K tokens (balanced performance)
  • Advanced: 32K+ tokens (heavy development work)
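
As a sketch, the menu presets might map to token counts like this (the table name and structure are hypothetical; the numbers match the options above):

```python
# Hypothetical preset table for the configuration menu.
CONTEXT_PRESETS = {
    "development": 8_192,    # fast, good for most cases
    "production": 16_384,    # balanced performance
    "advanced": 32_768,      # heavy development work
}
```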

2. Educational Content

Help users understand:

  • Why context window size matters for RAG
  • Hardware implications of larger contexts
  • Optimal settings for their use case
  • Model-specific context capabilities

3. Consistent Implementation

  • Update all Ollama API calls to use consistent context settings (see the sketch after this list)
  • Ensure configuration applies across synthesis, expansion, and exploration
  • Validate context sizes against model capabilities
  • Provide clear error messages for invalid configurations
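
As a sketch of what consistent application could look like, the snippet below sends an explicit num_ctx through Ollama's standard options field on the /api/generate endpoint (the wrapper function and its defaults are illustrative, not code from this PR):

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def generate(prompt: str, model: str = "qwen3:1.7b", num_ctx: int = 16384) -> str:
    """Call Ollama with an explicit context window instead of the 2048 default."""
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"num_ctx": num_ctx},  # the key change: set num_ctx explicitly
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```

Routing every synthesis, expansion, and exploration call through one helper like this is one way to guarantee the configured value is applied everywhere.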

Technical Implementation

Based on research findings:

Model Context Capabilities

  • qwen3:0.6b/1.7b: 32K token maximum
  • qwen3:4b: 131K token maximum (YaRN extended)
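
Given these limits, the model-specific validation this PR describes could look roughly like the standalone sketch below; the fallback for unlisted models is an assumption:

```python
# Known maximums from the research above, in tokens.
MODEL_CONTEXT_LIMITS = {
    "qwen3:0.6b": 32_768,
    "qwen3:1.7b": 32_768,
    "qwen3:4b": 131_072,  # YaRN-extended
}
FALLBACK_LIMIT = 32_768   # assumed default for models not listed here

def get_optimal_context_size(model: str, requested: int) -> int:
    """Validate a requested context window against the model's known maximum."""
    limit = MODEL_CONTEXT_LIMITS.get(model, FALLBACK_LIMIT)
    if requested > limit:
        raise ValueError(
            f"{model} supports at most {limit:,} context tokens "
            f"(requested {requested:,})"
        )
    return requested
```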
Recommended num_ctx Settings

```yaml
# Conservative (fast, low memory)
num_ctx: 8192    # ~6MB memory, excellent for exploration

# Balanced (recommended for most users)
num_ctx: 16384   # ~12MB memory, handles complex analysis

# Advanced (heavy development work)
num_ctx: 32768   # ~24MB memory, supports large codebases
```

Configuration Integration

  • Add context window selection to the TUI configuration menu
  • Update the config.yaml schema with context parameters (context_window, auto_context; illustrated below)
  • Implement validation for model-specific limits
  • Provide migration for existing configurations
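
An illustrative config.yaml fragment using the context_window and auto_context parameters named above (the llm: section name and exact layout are assumptions):

```yaml
# Illustrative only; the exact schema layout is an assumption.
llm:
  model: qwen3:1.7b
  context_window: 16384   # passed to Ollama as num_ctx
  auto_context: true      # when true, pick a safe size for the selected model
```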

Benefits

  1. Improved User Experience
    • Longer conversation sessions
    • Better analysis quality
    • Clear performance expectations
  2. Professional RAG Capability
    • Support for enterprise-scale projects
    • Handles large codebases effectively
    • Enables advanced use cases
  3. Educational Value
    • Users learn about context windows
    • Better understanding of RAG performance
    • Informed decision making

Implementation Plan

  1. Phase 1: Research Ollama context handling (complete)
  2. Phase 2: Update configuration system
  3. Phase 3: Enhance TUI with context selection
  4. Phase 4: Update all API calls consistently
  5. Phase 5: Add documentation and validation

Questions for Review

  1. Should we auto-detect optimal context based on available memory?
  2. How should we handle model changes that affect context capabilities?
  3. Should context be per-model or global configuration?
  4. What validation should we provide for context/model combinations?

This PR will significantly improve FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.