From 5c9fb45dd14450af486db845b68a8091fa19d2b1 Mon Sep 17 00:00:00 2001
From: BobAi
Date: Fri, 15 Aug 2025 14:04:52 +1000
Subject: [PATCH] Clean up PR documentation files after Gitea workflow example

---
 PR_BODY.md  | 109 ------------------------------------------
 PR_DRAFT.md | 135 ----------------------------------------------------
 2 files changed, 244 deletions(-)
 delete mode 100644 PR_BODY.md
 delete mode 100644 PR_DRAFT.md

diff --git a/PR_BODY.md b/PR_BODY.md
deleted file mode 100644
index d8d9f57..0000000
--- a/PR_BODY.md
+++ /dev/null
@@ -1,109 +0,0 @@
## Problem Statement

Currently, FSS-Mini-RAG uses Ollama's default context window, which severely limits performance:

- The **2048-token default** is inadequate for RAG applications
- Users can't configure the context window for their hardware or use case
- No guidance on optimal context sizes for different models
- Inconsistent context handling across the codebase
- New users don't understand why the context window matters

## Impact on User Experience

**With a 2048-token context window:**
- Only 1-2 responses are possible before context truncation
- Thinking tokens consume significant context space
- Poor performance with larger document chunks
- Frustrated users who don't understand why responses degrade

**With proper context configuration:**
- 5-15+ responses in exploration mode
- Support for advanced use cases (15+ results, 4000+ character chunks)
- Better coding assistance and analysis
- A professional-grade RAG experience

## Solution Implemented

### 1. Enhanced Model Configuration Menu
Added context window selection alongside model selection with:
- **Development**: 8K tokens (fast, good for most cases)
- **Production**: 16K tokens (balanced performance)
- **Advanced**: 32K+ tokens (heavy development work)

### 2. Educational Content
Added guidance that helps users understand:
- Why context window size matters for RAG
- Hardware implications of larger contexts
- Optimal settings for their use case
- Model-specific context capabilities

### 3. Consistent Implementation
- Updated all Ollama API calls to use consistent context settings (see the sketch below under Technical Implementation)
- Ensured configuration applies across synthesis, expansion, and exploration
- Added validation of context sizes against model capabilities
- Provided clear error messages for invalid configurations

## Technical Implementation

Based on comprehensive research findings:

### Model Context Capabilities
- **qwen3:0.6b/1.7b**: 32K token maximum
- **qwen3:4b**: 131K token maximum (YaRN extended)

### Recommended Context Sizes
```yaml
# Conservative (fast, low memory)
num_ctx: 8192    # ~6MB memory, excellent for exploration

# Balanced (recommended for most users)
num_ctx: 16384   # ~12MB memory, handles complex analysis

# Advanced (heavy development work)
num_ctx: 32768   # ~24MB memory, supports large codebases
```

### Configuration Integration
- Added context window selection to the TUI configuration menu
- Updated the config.yaml schema with context parameters
- Implemented validation for model-specific limits
- Provided migration for existing configurations
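To make the "consistent context settings" point concrete, here is a minimal sketch of an Ollama call that sets the context window explicitly. The `options.num_ctx` field is Ollama's documented generate parameter; the helper name, endpoint constant, and default value are illustrative, not the literal code in this PR:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def generate(model: str, prompt: str, num_ctx: int = 16384) -> str:
    """Call Ollama with an explicit context window instead of the 2048 default."""
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            # Without num_ctx, Ollama silently falls back to its small default
            # and long RAG prompts get truncated.
            "options": {"num_ctx": num_ctx},
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```

Routing the synthesizer, expander, and explorer through one helper like this is what keeps context behavior consistent across the codebase.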
## Benefits

1. **Improved User Experience**
   - Longer conversation sessions
   - Better analysis quality
   - Clear performance expectations

2. **Professional RAG Capability**
   - Support for enterprise-scale projects
   - Handles large codebases effectively
   - Enables advanced use cases

3. **Educational Value**
   - Users learn about context windows
   - Better understanding of RAG performance
   - Informed decision making

## Files Changed

- `mini_rag/config.py`: Added context window configuration parameters
- `mini_rag/llm_synthesizer.py`: Dynamic context sizing with model awareness
- `mini_rag/explorer.py`: Consistent context application
- `rag-tui.py`: Enhanced configuration menu with context selection
- `PR_DRAFT.md`: Documentation of the implementation approach

## Testing Recommendations

1. Test the context configuration menu with different models
2. Verify context limits are enforced correctly
3. Test conversation length with different context sizes
4. Validate memory usage estimates
5. Test advanced use cases (15+ results, large chunks)

---

**This PR significantly improves FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**

**Ready for review and testing!** 🚀
\ No newline at end of file
diff --git a/PR_DRAFT.md b/PR_DRAFT.md
deleted file mode 100644
index 6364f65..0000000
--- a/PR_DRAFT.md
+++ /dev/null
@@ -1,135 +0,0 @@
# Add Context Window Configuration for Optimal RAG Performance

## Problem Statement

Currently, FSS-Mini-RAG uses Ollama's default context window, which severely limits performance:

- The **2048-token default** is inadequate for RAG applications
- Users can't configure the context window for their hardware or use case
- No guidance on optimal context sizes for different models
- Inconsistent context handling across the codebase
- New users don't understand why the context window matters

## Impact on User Experience

**With a 2048-token context window:**
- Only 1-2 responses are possible before context truncation
- Thinking tokens consume significant context space
- Poor performance with larger document chunks
- Frustrated users who don't understand why responses degrade

**With proper context configuration:**
- 5-15+ responses in exploration mode
- Support for advanced use cases (15+ results, 4000+ character chunks)
- Better coding assistance and analysis
- A professional-grade RAG experience

## Proposed Solution

### 1. Enhanced Model Configuration Menu
Add context window selection alongside model selection with:
- **Development**: 8K tokens (fast, good for most cases)
- **Production**: 16K tokens (balanced performance)
- **Advanced**: 32K+ tokens (heavy development work)

### 2. Educational Content
Help users understand:
- Why context window size matters for RAG
- Hardware implications of larger contexts
- Optimal settings for their use case
- Model-specific context capabilities

### 3. Consistent Implementation
- Update all Ollama API calls to use consistent context settings
- Ensure configuration applies across synthesis, expansion, and exploration
- Validate context sizes against model capabilities
- Provide clear error messages for invalid configurations

## Technical Implementation

Based on research findings:

### Model Context Capabilities
- **qwen3:0.6b/1.7b**: 32K token maximum
- **qwen3:4b**: 131K token maximum (YaRN extended)

### Recommended Context Sizes
```yaml
# Conservative (fast, low memory)
num_ctx: 8192    # ~6MB memory, excellent for exploration

# Balanced (recommended for most users)
num_ctx: 16384   # ~12MB memory, handles complex analysis

# Advanced (heavy development work)
num_ctx: 32768   # ~24MB memory, supports large codebases
```

### Configuration Integration
- Add context window selection to the TUI configuration menu
- Update the config.yaml schema with context parameters
- Implement validation for model-specific limits
- Provide migration for existing configurations

## Benefits

1. **Improved User Experience**
   - Longer conversation sessions
   - Better analysis quality
   - Clear performance expectations

2. **Professional RAG Capability**
   - Support for enterprise-scale projects
   - Handles large codebases effectively
   - Enables advanced use cases

3. **Educational Value**
   - Users learn about context windows
   - Better understanding of RAG performance
   - Informed decision making

## Implementation Plan

1. **Phase 1**: Research Ollama context handling (✅ Complete)
2. **Phase 2**: Update configuration system (✅ Complete)
3. **Phase 3**: Enhance TUI with context selection (✅ Complete)
4. **Phase 4**: Update all API calls consistently (✅ Complete)
5. **Phase 5**: Add documentation and validation (✅ Complete)

## Implementation Details

### Configuration System
- Added `context_window` and `auto_context` to LLMConfig
- Default 16K context (vs. the problematic 2K default)
- Model-specific validation and limits
- YAML output includes helpful context explanations

### TUI Enhancement
- New "Configure context window" menu option
- Educational content about context importance
- Three presets: Development (8K), Production (16K), Advanced (32K)
- Custom size entry with validation
- Memory usage estimates for each option

### API Consistency
- Dynamic context sizing via `_get_optimal_context_size()` (see the sketch after this list)
- Model capability awareness (qwen3:4b = 131K, others = 32K)
- Applied consistently to the synthesizer and explorer
- Automatic capping at model limits
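As a rough sketch of how these pieces can fit together: the `LLMConfig` field names and `_get_optimal_context_size()` come from this PR's description, while the defaults, the limits table, and the exact signature and error handling below are illustrative assumptions rather than the actual implementation:

```python
from dataclasses import dataclass

# Maximums taken from the research notes above; unknown models assumed 32K.
MODEL_CONTEXT_LIMITS = {"qwen3:4b": 131072}
DEFAULT_MODEL_LIMIT = 32768

@dataclass
class LLMConfig:
    model: str = "qwen3:1.7b"      # illustrative default
    context_window: int = 16384    # new 16K default, vs Ollama's 2048
    auto_context: bool = True      # auto-pick a size that respects model limits

def _get_optimal_context_size(config: LLMConfig) -> int:
    """Return the context size to request, never exceeding the model's maximum."""
    limit = MODEL_CONTEXT_LIMITS.get(config.model, DEFAULT_MODEL_LIMIT)
    if config.auto_context:
        # Balanced 16K preset, capped automatically for smaller models.
        return min(16384, limit)
    if config.context_window > limit:
        raise ValueError(
            f"context_window={config.context_window} exceeds the "
            f"{config.model} maximum of {limit} tokens; pick a smaller value "
            f"or enable auto_context."
        )
    return config.context_window
```

With `auto_context` enabled, the helper simply caps the balanced preset at whatever the current model supports, so switching models never produces an invalid request.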
### User Education
- Clear explanations of why context matters for RAG
- Memory usage implications (8K = 6MB, 16K = 12MB, 32K = 24MB)
- Advanced use case guidance (15+ results, 4000+ character chunks)
- Performance vs. quality tradeoffs

## Answers to Review Questions

1. ✅ **Auto-detection**: Implemented via the `auto_context` flag, which respects model limits
2. ✅ **Model changes**: Dynamic validation against the current model's capabilities
3. ✅ **Scope**: Global configuration with per-model validation
4. ✅ **Validation**: Comprehensive validation with clear error messages and guidance

---

**This PR will significantly improve FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**
\ No newline at end of file