
Add Context Window Configuration for Optimal RAG Performance

Problem Statement

Currently, FSS-Mini-RAG uses Ollama's default context window settings, which severely limits performance:

  • Default 2048 tokens is inadequate for RAG applications
  • Users can't configure context window for their hardware/use case
  • No guidance on optimal context sizes for different models
  • Inconsistent context handling across the codebase
  • New users don't understand context window importance

Impact on User Experience

With 2048 token context window:

  • Only 1-2 responses possible before context truncation
  • Thinking tokens consume significant context space
  • Poor performance with larger document chunks
  • Frustrated users who don't understand why responses degrade

With proper context configuration:

  • 5-15+ responses in exploration mode
  • Support for advanced use cases (15+ results, 4000+ character chunks)
  • Better coding assistance and analysis
  • Professional-grade RAG experience

Proposed Solution

1. Enhanced Model Configuration Menu

Add context window selection alongside model selection, with three presets:

  • Development: 8K tokens (fast, good for most cases)
  • Production: 16K tokens (balanced performance)
  • Advanced: 32K+ tokens (heavy development work)

2. Educational Content

Help users understand:

  • Why context window size matters for RAG
  • Hardware implications of larger contexts
  • Optimal settings for their use case
  • Model-specific context capabilities

3. Consistent Implementation

  • Update all Ollama API calls to use consistent context settings (see the sketch after this list)
  • Ensure configuration applies across synthesis, expansion, and exploration
  • Validate context sizes against model capabilities
  • Provide clear error messages for invalid configurations
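
One workable pattern for the first two bullets is a single helper that every Ollama call goes through, so `num_ctx` is always set explicitly. A minimal sketch, assuming the HTTP API on the default port; the helper name and timeout are illustrative, while `options.num_ctx` is Ollama's standard context-window option:

```python
# Sketch: one choke point for Ollama calls so context settings never drift.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def generate(model: str, prompt: str, num_ctx: int) -> str:
    """Send a generation request with an explicit context window."""
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            # Without num_ctx, Ollama falls back to the 2048-token default
            # described in the problem statement above.
            "options": {"num_ctx": num_ctx},
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```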

Technical Implementation

Based on research findings:

Model Context Capabilities

  • qwen3:0.6b/1.7b: 32K token maximum
  • qwen3:4b: 131K token maximum (YaRN extended)

Recommended Context Settings

```yaml
# Conservative (fast, low memory)
num_ctx: 8192    # ~6MB memory, excellent for exploration

# Balanced (recommended for most users)
num_ctx: 16384   # ~12MB memory, handles complex analysis

# Advanced (heavy development work)
num_ctx: 32768   # ~24MB memory, supports large codebases
```

Configuration Integration

  • Add context window selection to TUI configuration menu
  • Update config.yaml schema with context parameters
  • Implement validation for model-specific limits
  • Provide migration for existing configurations
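
A minimal migration sketch for the last bullet, assuming the context settings live under an `llm:` section of config.yaml (the section name and defaults here are illustrative, not the project's actual schema):

```python
# Sketch: backfill context settings into pre-existing config files
# without disturbing any keys the user has already customized.
import yaml

def migrate_config(path: str = "config.yaml") -> None:
    with open(path) as f:
        config = yaml.safe_load(f) or {}
    llm = config.setdefault("llm", {})       # assumed section name
    llm.setdefault("context_window", 16384)  # balanced default from above
    llm.setdefault("auto_context", True)     # respect model limits
    with open(path, "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)
```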

Benefits

  1. Improved User Experience

    • Longer conversation sessions
    • Better analysis quality
    • Clear performance expectations
  2. Professional RAG Capability

    • Support for enterprise-scale projects
    • Handles large codebases effectively
    • Enables advanced use cases
  3. Educational Value

    • Users learn about context windows
    • Better understanding of RAG performance
    • Informed decision making

Implementation Plan

  1. Phase 1: Research Ollama context handling (Complete)
  2. Phase 2: Update configuration system (Complete)
  3. Phase 3: Enhance TUI with context selection (Complete)
  4. Phase 4: Update all API calls consistently (Complete)
  5. Phase 5: Add documentation and validation (Complete)

Implementation Details

Configuration System

  • Added context_window and auto_context to LLMConfig (sketched below)
  • Default of 16K context (vs. Ollama's problematic 2K default)
  • Model-specific validation and limits
  • YAML output includes helpful context explanations
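
For reference, a sketch of the two new fields; `context_window` and `auto_context` come from this PR, while the surrounding class shape and the default model name are assumptions:

```python
from dataclasses import dataclass

@dataclass
class LLMConfig:
    model: str = "qwen3:1.7b"     # assumed default model name
    context_window: int = 16384   # new: replaces the problematic 2K default
    auto_context: bool = True     # new: size the context to the model's limit
```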

TUI Enhancement

  • New "Configure context window" menu option
  • Educational content about context importance
  • Three presets: Development (8K), Production (16K), Advanced (32K); see the sketch below
  • Custom size entry with validation
  • Memory usage estimates for each option
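
The presets and memory estimates could be captured in a small table like this hypothetical one (sizes and estimates are the ones quoted in this PR):

```python
# Sketch: preset table backing the TUI menu.
CONTEXT_PRESETS = {
    "Development": {"num_ctx": 8192,  "memory": "~6MB"},   # fast, low memory
    "Production":  {"num_ctx": 16384, "memory": "~12MB"},  # balanced default
    "Advanced":    {"num_ctx": 32768, "memory": "~24MB"},  # heavy dev work
}
```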

API Consistency

  • Dynamic context sizing via _get_optimal_context_size() (sketched below)
  • Model capability awareness (qwen3:4b = 131K, others = 32K)
  • Applied consistently to synthesizer and explorer
  • Automatic capping at model limits
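
A sketch of the capping logic, using the limits from the research notes above; the real `_get_optimal_context_size()` may resolve model names differently:

```python
# Sketch: cap the configured context at what the model actually supports.
MODEL_LIMITS = {"qwen3:4b": 131072}  # YaRN-extended; other qwen3 sizes assume 32K
DEFAULT_LIMIT = 32768

def _get_optimal_context_size(model: str, configured: int) -> int:
    limit = MODEL_LIMITS.get(model, DEFAULT_LIMIT)
    return min(configured, limit)  # automatic capping at the model limit
```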

User Education

  • Clear explanations of why context matters for RAG
  • Memory usage implications (8K = 6MB, 16K = 12MB, 32K = 24MB)
  • Advanced use case guidance (15+ results, 4000+ character chunks)
  • Performance vs quality tradeoffs

Answers to Review Questions

  1. Auto-detection: Implemented via auto_context flag that respects model limits
  2. Model changes: Dynamic validation against current model capabilities
  3. Scope: Global configuration with per-model validation
  4. Validation: Comprehensive validation with clear error messages and guidance

This PR will significantly improve FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.