25 changed files with 257 additions and 2585 deletions
--- a/.gitignore
+++ b/.gitignore
@ -41,14 +41,10 @@ Thumbs.db
 # RAG system specific
 .claude-rag/
 .mini-rag/
 *.lance/
 *.db
 manifest.json
 # Claude Code specific
 .claude/
 # Logs and temporary files
 *.log
 *.tmp
--- a/PR_BODY.md
+++ b/PR_BODY.md
@ -1,109 +0,0 @@
 ## Problem Statement
 Currently, FSS-Mini-RAG uses Ollama's default context window settings, which severely limits performance:
 - **Default 2048 tokens** is inadequate for RAG applications
 - Users can't configure context window for their hardware/use case
 - No guidance on optimal context sizes for different models
 - Inconsistent context handling across the codebase
 - New users don't understand context window importance
 ## Impact on User Experience
 **With 2048 token context window:**
 - Only 1-2 responses possible before context truncation
 - Thinking tokens consume significant context space
 - Poor performance with larger document chunks
 - Frustrated users who don't understand why responses degrade
 **With proper context configuration:**
 - 5-15+ responses in exploration mode
 - Support for advanced use cases (15+ results, 4000+ character chunks)
 - Better coding assistance and analysis
 - Professional-grade RAG experience
 ## Solution Implemented
 ### 1. Enhanced Model Configuration Menu
 Added context window selection alongside model selection with:
 - **Development**: 8K tokens (fast, good for most cases)
 - **Production**: 16K tokens (balanced performance)  
 - **Advanced**: 32K+ tokens (heavy development work)
 ### 2. Educational Content
 Helps users understand:
 - Why context window size matters for RAG
 - Hardware implications of larger contexts
 - Optimal settings for their use case
 - Model-specific context capabilities
 ### 3. Consistent Implementation
 - Updated all Ollama API calls to use consistent context settings
 - Ensured configuration applies across synthesis, expansion, and exploration
 - Added validation for context sizes against model capabilities
 - Provided clear error messages for invalid configurations
 ## Technical Implementation
 Based on comprehensive research findings:
 ### Model Context Capabilities
 - **qwen3:0.6b/1.7b**: 32K token maximum
 - **qwen3:4b**: 131K token maximum (YaRN extended)
 ### Recommended Context Sizes
 ```yaml
 # Conservative (fast, low memory)
 num_ctx: 8192    # ~6MB memory, excellent for exploration
 # Balanced (recommended for most users)  
 num_ctx: 16384   # ~12MB memory, handles complex analysis
 # Advanced (heavy development work)
 num_ctx: 32768   # ~24MB memory, supports large codebases
 ```
 ### Configuration Integration
 - Added context window selection to TUI configuration menu
 - Updated config.yaml schema with context parameters
 - Implemented validation for model-specific limits
 - Provided migration for existing configurations
 ## Benefits
 1. **Improved User Experience**
   - Longer conversation sessions
   - Better analysis quality
   - Clear performance expectations
 2. **Professional RAG Capability**
   - Support for enterprise-scale projects
   - Handles large codebases effectively
   - Enables advanced use cases
 3. **Educational Value**
   - Users learn about context windows
   - Better understanding of RAG performance
   - Informed decision making
 ## Files Changed
 - `mini_rag/config.py`: Added context window configuration parameters
 - `mini_rag/llm_synthesizer.py`: Dynamic context sizing with model awareness
 - `mini_rag/explorer.py`: Consistent context application
 - `rag-tui.py`: Enhanced configuration menu with context selection
 - `PR_DRAFT.md`: Documentation of implementation approach
 ## Testing Recommendations
 1. Test context configuration menu with different models
 2. Verify context limits are enforced correctly
 3. Test conversation length with different context sizes
 4. Validate memory usage estimates
 5. Test advanced use cases (15+ results, large chunks)
 ---
 **This PR significantly improves FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**
 **Ready for review and testing!** 🚀
--- a/PR_DRAFT.md
+++ b/PR_DRAFT.md
@ -1,135 +0,0 @@
 # Add Context Window Configuration for Optimal RAG Performance
 ## Problem Statement
 Currently, FSS-Mini-RAG uses Ollama's default context window settings, which severely limits performance:
 - **Default 2048 tokens** is inadequate for RAG applications
 - Users can't configure context window for their hardware/use case
 - No guidance on optimal context sizes for different models
 - Inconsistent context handling across the codebase
 - New users don't understand context window importance
 ## Impact on User Experience
 **With 2048 token context window:**
 - Only 1-2 responses possible before context truncation
 - Thinking tokens consume significant context space
 - Poor performance with larger document chunks
 - Frustrated users who don't understand why responses degrade
 **With proper context configuration:**
 - 5-15+ responses in exploration mode
 - Support for advanced use cases (15+ results, 4000+ character chunks)
 - Better coding assistance and analysis
 - Professional-grade RAG experience
 ## Proposed Solution
 ### 1. Enhanced Model Configuration Menu
 Add context window selection alongside model selection with:
 - **Development**: 8K tokens (fast, good for most cases)
 - **Production**: 16K tokens (balanced performance)  
 - **Advanced**: 32K+ tokens (heavy development work)
 ### 2. Educational Content
 Help users understand:
 - Why context window size matters for RAG
 - Hardware implications of larger contexts
 - Optimal settings for their use case
 - Model-specific context capabilities
 ### 3. Consistent Implementation
 - Update all Ollama API calls to use consistent context settings
 - Ensure configuration applies across synthesis, expansion, and exploration
 - Validate context sizes against model capabilities
 - Provide clear error messages for invalid configurations
 ## Technical Implementation
 Based on research findings:
 ### Model Context Capabilities
 - **qwen3:0.6b/1.7b**: 32K token maximum
 - **qwen3:4b**: 131K token maximum (YaRN extended)
 ### Recommended Context Sizes
 ```yaml
 # Conservative (fast, low memory)
 num_ctx: 8192    # ~6MB memory, excellent for exploration
 # Balanced (recommended for most users)  
 num_ctx: 16384   # ~12MB memory, handles complex analysis
 # Advanced (heavy development work)
 num_ctx: 32768   # ~24MB memory, supports large codebases
 ```
 ### Configuration Integration
 - Add context window selection to TUI configuration menu
 - Update config.yaml schema with context parameters
 - Implement validation for model-specific limits
 - Provide migration for existing configurations
 ## Benefits
 1. **Improved User Experience**
   - Longer conversation sessions
   - Better analysis quality
   - Clear performance expectations
 2. **Professional RAG Capability**
   - Support for enterprise-scale projects
   - Handles large codebases effectively
   - Enables advanced use cases
 3. **Educational Value**
   - Users learn about context windows
   - Better understanding of RAG performance
   - Informed decision making
 ## Implementation Plan
 1. **Phase 1**: Research Ollama context handling (✅ Complete)
 2. **Phase 2**: Update configuration system (✅ Complete)
 3. **Phase 3**: Enhance TUI with context selection (✅ Complete)
 4. **Phase 4**: Update all API calls consistently (✅ Complete)
 5. **Phase 5**: Add documentation and validation (✅ Complete)
 ## Implementation Details
 ### Configuration System
 - Added `context_window` and `auto_context` to LLMConfig
 - Default 16K context (vs problematic 2K default)
 - Model-specific validation and limits
 - YAML output includes helpful context explanations
 ### TUI Enhancement
 - New "Configure context window" menu option
 - Educational content about context importance
 - Three presets: Development (8K), Production (16K), Advanced (32K)
 - Custom size entry with validation
 - Memory usage estimates for each option
 ### API Consistency
 - Dynamic context sizing via `_get_optimal_context_size()`
 - Model capability awareness (qwen3:4b = 131K, others = 32K)
 - Applied consistently to synthesizer and explorer
 - Automatic capping at model limits
 ### User Education
 - Clear explanations of why context matters for RAG
 - Memory usage implications (8K = 6MB, 16K = 12MB, 32K = 24MB)
 - Advanced use case guidance (15+ results, 4000+ chunks)
 - Performance vs quality tradeoffs
 ## Answers to Review Questions
 1. ✅ **Auto-detection**: Implemented via `auto_context` flag that respects model limits
 2. ✅ **Model changes**: Dynamic validation against current model capabilities  
 3. ✅ **Scope**: Global configuration with per-model validation
 4. ✅ **Validation**: Comprehensive validation with clear error messages and guidance
 ---
 **This PR will significantly improve FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**
--- a/README.md
+++ b/README.md
@ -12,40 +12,19 @@
 ## How It Works
 ```mermaid
-flowchart TD
+graph LR
-    Start([🚀 Start FSS-Mini-RAG]) --> Interface{Choose Interface}
+    Files[📁 Your Code/Documents] --> Index[🔍 Index]
    Index --> Chunks[✂️ Smart Chunks]
    Chunks --> Embeddings[🧠 Semantic Vectors]
    Embeddings --> Database[(💾 Vector DB)]
-    Interface -->|Beginners| TUI[🖥️ Interactive TUI<br/>./rag-tui]
+    Query[❓ user auth] --> Search[🎯 Hybrid Search]
-    Interface -->|Power Users| CLI[⚡ Advanced CLI<br/>./rag-mini <command>]
+    Database --> Search
    Search --> Results[📋 Ranked Results]
-    TUI --> SelectFolder[📁 Select Folder to Index]
+    style Files fill:#e3f2fd
-    CLI --> SelectFolder
+    style Results fill:#e8f5e8
-    
+    style Database fill:#fff3e0
    SelectFolder --> Index[🔍 Index Documents<br/>Creates searchable database]
    Index --> Ready{📚 Ready to Search}
    Ready -->|Quick Answers| Search[🔍 Search Mode<br/>Fast semantic search]
    Ready -->|Deep Analysis| Explore[🧠 Explore Mode<br/>AI-powered analysis]
    Search --> SearchResults[📋 Instant Results<br/>Ranked by relevance]
    Explore --> ExploreResults[💬 AI Conversation<br/>Context + reasoning]
    SearchResults --> More{Want More?}
    ExploreResults --> More
    More -->|Different Query| Ready
    More -->|Advanced Features| CLI
    More -->|Done| End([✅ Success!])
    CLI -.->|Full Power| AdvancedFeatures[⚡ Advanced Features:<br/>• Batch processing<br/>• Custom parameters<br/>• Automation scripts<br/>• Background server]
    style Start fill:#e8f5e8,stroke:#4caf50,stroke-width:2px
    style CLI fill:#fff9c4,stroke:#f57c00,stroke-width:3px
    style AdvancedFeatures fill:#fff9c4,stroke:#f57c00,stroke-width:2px
    style Search fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
    style Explore fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
    style End fill:#e8f5e8,stroke:#4caf50,stroke-width:2px
 ```
 ## What This Is
@ -79,7 +58,6 @@ FSS-Mini-RAG offers **two distinct experiences** optimized for different use cas
 ## Quick Start (2 Minutes)
 **Linux/macOS:**
 ```bash
 # 1. Install everything
 ./install_mini_rag.sh
@ -92,19 +70,6 @@ FSS-Mini-RAG offers **two distinct experiences** optimized for different use cas
 ./rag-mini explore ~/my-project   # Interactive exploration
 ```
 **Windows:**
 ```cmd
 # 1. Install everything
 install_windows.bat
 # 2. Choose your interface
 rag.bat                           # Interactive interface
 # OR choose your mode:
 rag.bat index C:\my-project       # Index your project first
 rag.bat search C:\my-project "query"  # Fast search
 rag.bat explore C:\my-project     # Interactive exploration
 ```
 That's it. No external dependencies, no configuration required, no PhD in computer science needed.
 ## What Makes This Different
@ -154,22 +119,12 @@ That's it. No external dependencies, no configuration required, no PhD in comput
 ## Installation Options
 ### Recommended: Full Installation
 **Linux/macOS:**
 ```bash
 ./install_mini_rag.sh
 # Handles Python setup, dependencies, optional AI models
 ```
 **Windows:**
 ```cmd
 install_windows.bat
 # Handles Python setup, dependencies, works reliably
 ```
 ### Experimental: Copy & Run (May Not Work)
 **Linux/macOS:**
 ```bash
 # Copy folder anywhere and try to run directly
 ./rag-mini index ~/my-project
@ -177,30 +132,13 @@ install_windows.bat
 # Falls back with clear instructions if it fails
 ```
 **Windows:**
 ```cmd
 # Copy folder anywhere and try to run directly
 rag.bat index C:\my-project
 # Auto-setup will attempt to create environment
 # Falls back with clear instructions if it fails
 ```
 ### Manual Setup
 **Linux/macOS:**
 ```bash
 python3 -m venv .venv
 source .venv/bin/activate
 pip install -r requirements.txt
 ```
 **Windows:**
 ```cmd
 python -m venv .venv
 .venv\Scripts\activate.bat
 pip install -r requirements.txt
 ```
 **Note**: The experimental copy & run feature is provided for convenience but may fail on some systems. If you encounter issues, use the full installer for reliable setup.
 ## System Requirements
@ -228,7 +166,7 @@ This implementation prioritizes:
 ## Next Steps
- **New users**: Run `./rag-mini` (Linux/macOS) or `rag.bat` (Windows) for guided experience
+- **New users**: Run `./rag-mini` for guided experience
 - **Developers**: Read [`TECHNICAL_GUIDE.md`](docs/TECHNICAL_GUIDE.md) for implementation details
 - **Contributors**: See [`CONTRIBUTING.md`](CONTRIBUTING.md) for development setup
--- a/commit_message.txt
+++ b/commit_message.txt
@ -1,36 +0,0 @@
 feat: Add comprehensive Windows compatibility and enhanced LLM model setup
 🚀 Major cross-platform enhancement making FSS-Mini-RAG fully Windows and Linux compatible
 ## Windows Compatibility
 - **New Windows installer**: `install_windows.bat` - rock-solid, no-hang installation
 - **Simple Windows launcher**: `rag.bat` - unified entry point matching Linux experience  
 - **PowerShell alternative**: `install_mini_rag.ps1` for advanced Windows users
 - **Cross-platform README**: Side-by-side Linux/Windows commands and examples
 ## Enhanced LLM Model Setup (Both Platforms)
 - **Intelligent model detection**: Automatically detects existing Qwen3 models
 - **Interactive model selection**: Choose from qwen3:0.6b, 1.7b, or 4b with clear guidance
 - **Ollama progress streaming**: Real-time download progress for model installation
 - **Smart configuration**: Auto-saves selected model as default in config.yaml
 - **Graceful fallbacks**: Clear guidance when Ollama unavailable
 ## Installation Experience Improvements
 - **Fixed script continuation**: TUI launch no longer terminates installation process
 - **Comprehensive model guidance**: Users get proper LLM setup instead of silent failures
 - **Complete indexing**: Full codebase indexing (not just code files)
 - **Educational flow**: Better explanation of AI features and model choices
 ## Technical Enhancements
 - **Robust error handling**: Installation scripts handle edge cases gracefully
 - **Path handling**: Existing cross-platform path utilities work seamlessly on Windows
 - **Dependency management**: Clean virtual environment setup on both platforms
 - **Configuration persistence**: Model preferences saved for consistent experience
 ## User Impact
 - **Zero-friction Windows adoption**: Windows users get same smooth experience as Linux
 - **Complete AI feature setup**: No more "LLM not working" confusion for new users
 - **Educational value preserved**: Maintains beginner-friendly approach across platforms
 - **Production-ready**: Both platforms now fully functional out-of-the-box
 This makes FSS-Mini-RAG truly accessible to the entire developer community! 🎉
--- a/docs/BEGINNER_GLOSSARY.md
+++ b/docs/BEGINNER_GLOSSARY.md
@ -117,7 +117,7 @@ def login_user(email, password):
 **Models you might see:**
 - **qwen3:0.6b** - Ultra-fast, good for most questions
- **qwen3:4b** - Slower but more detailed
+- **llama3.2** - Slower but more detailed
 - **auto** - Picks the best available model
 ---
--- a/docs/CPU_DEPLOYMENT.md
+++ b/docs/CPU_DEPLOYMENT.md
@ -49,7 +49,7 @@ ollama run qwen3:0.6b "Hello, can you expand this query: authentication"
 |-------|------|-----------|---------|
 | qwen3:0.6b | 522MB | Fast ⚡ | Excellent ✅ |
 | qwen3:1.7b | 1.4GB | Medium | Excellent ✅ |
-| qwen3:4b | 2.5GB | Slow | Excellent ✅ |
+| qwen3:3b | 2.0GB | Slow | Excellent ✅ |
 ## CPU-Optimized Configuration
--- a/docs/LLM_PROVIDERS.md
+++ b/docs/LLM_PROVIDERS.md
@ -22,8 +22,8 @@ This guide shows how to configure FSS-Mini-RAG with different LLM providers for
 llm:
  provider: ollama
  ollama_host: localhost:11434
-  synthesis_model: qwen3:1.7b
+  synthesis_model: llama3.2
-  expansion_model: qwen3:1.7b
+  expansion_model: llama3.2
  enable_synthesis: false
  synthesis_temperature: 0.3
  cpu_optimized: true
@ -33,13 +33,13 @@ llm:
 **Setup:**
 1. Install Ollama: `curl -fsSL https://ollama.ai/install.sh | sh`
 2. Start service: `ollama serve`
-3. Download model: `ollama pull qwen3:1.7b`
+3. Download model: `ollama pull llama3.2`
 4. Test: `./rag-mini search /path/to/project "test" --synthesize`
 **Recommended Models:**
 - `qwen3:0.6b` - Ultra-fast, good for CPU-only systems
- `qwen3:1.7b` - Balanced quality and speed (recommended)
+- `llama3.2` - Balanced quality and speed  
- `qwen3:4b` - Higher quality, excellent for most use cases
+- `llama3.1:8b` - Higher quality, needs more RAM
 ### LM Studio
--- a/docs/QUERY_EXPANSION.md
+++ b/docs/QUERY_EXPANSION.md
@ -34,24 +34,7 @@ graph LR
 ## Configuration
-### Easy Configuration (TUI)
+Edit `config.yaml`:
 Use the interactive Configuration Manager in the TUI:
 1. **Start TUI**: `./rag-tui` or `rag.bat` (Windows)
 2. **Select Option 6**: Configuration Manager
 3. **Choose Option 2**: Toggle query expansion
 4. **Follow prompts**: Get explanation and easy on/off toggle
 The TUI will:
 - Explain benefits and requirements clearly
 - Check if Ollama is available
 - Show current status (enabled/disabled)
 - Save changes automatically
 ### Manual Configuration (Advanced)
 Edit `config.yaml` directly:
 ```yaml
 # Search behavior settings
--- a/docs/TROUBLESHOOTING.md
+++ b/docs/TROUBLESHOOTING.md
@ -143,8 +143,8 @@ python3 -c "import mini_rag; print('✅ Installation successful')"
 2. **Install a model:**
   ```bash
-   ollama pull qwen2.5:3b    # Good balance of speed and quality
+   ollama pull qwen3:0.6b    # Fast, small model
-   # Or: ollama pull qwen3:4b   # Larger but better quality
+   # Or: ollama pull llama3.2  # Larger but better
   ```
 3. **Test connection:**
--- a/docs/TUI_GUIDE.md
+++ b/docs/TUI_GUIDE.md
@ -23,9 +23,8 @@ That's it! The TUI will guide you through everything.
 ### User Flow
 1. **Select Project** → Choose directory to search
 2. **Index Project** → Process files for search
-3. **Search Content** → Find what you need quickly
+3. **Search Content** → Find what you need
-4. **Explore Project** → Interactive AI-powered discovery (NEW!)
+4. **Explore Results** → See full context and files
 5. **Configure System** → Customize search behavior
 ## Main Menu Options
@ -111,63 +110,7 @@ That's it! The TUI will guide you through everything.
 ./rag-mini-enhanced context /path/to/project "login()"
 ```
-### 4. Explore Project (NEW!)
+### 4. View Status
 **Purpose**: Interactive AI-powered discovery with conversation memory
 **What Makes Explore Different**:
 - **Conversational**: Ask follow-up questions that build on previous answers
 - **AI Reasoning**: Uses thinking mode for deeper analysis and explanations
 - **Educational**: Perfect for understanding unfamiliar codebases
 - **Context Aware**: Remembers what you've already discussed
 **Interactive Process**:
 1. **First Question Guidance**: Clear prompts with example questions
 2. **Starter Suggestions**: Random helpful questions to get you going
 3. **Natural Follow-ups**: Ask "why?", "how?", "show me more" naturally
 4. **Session Memory**: AI remembers your conversation context
 **Explore Mode Features**:
 **Quick Start Options**:
 - **Option 1 - Help**: Show example questions and explore mode capabilities
 - **Option 2 - Status**: Project information and current exploration session
 - **Option 3 - Suggest**: Get a random starter question picked from 7 curated examples
 **Starter Questions** (randomly suggested):
 - "What are the main components of this project?"
 - "How is error handling implemented?"
 - "Show me the authentication and security logic"
 - "What are the key functions I should understand first?"
 - "How does data flow through this system?"
 - "What configuration options are available?"
 - "Show me the most important files to understand"
 **Advanced Usage**:
 - **Deep Questions**: "Why is this function slow?" "How does the security work?"
 - **Code Analysis**: "Explain this algorithm" "What could go wrong here?"
 - **Architecture**: "How do these components interact?" "What's the design pattern?"
 - **Best Practices**: "Is this code following best practices?" "How would you improve this?"
 **What You Learn**:
 - **Conversational AI**: How to have productive technical conversations with AI
 - **Code Understanding**: Deep analysis capabilities beyond simple search
 - **Context Building**: How conversation memory improves over time
 - **Question Techniques**: Effective ways to explore unfamiliar code
 **CLI Commands Shown**:
 ```bash
 ./rag-mini explore /path/to/project    # Start interactive exploration
 ```
 **Perfect For**:
 - Understanding new codebases
 - Code review and analysis
 - Learning from existing projects
 - Documenting complex systems
 - Onboarding new team members
 ### 5. View Status
 **Purpose**: Check system health and project information
@ -196,61 +139,32 @@ That's it! The TUI will guide you through everything.
 ./rag-mini status /path/to/project
 ```
-### 6. Configuration Manager (ENHANCED!)
+### 5. Configuration
-**Purpose**: Interactive configuration with user-friendly options
+**Purpose**: View and understand system settings
-**New Interactive Features**:
+**Configuration Display**:
- **Live Configuration Dashboard** - See current settings with clear status
+- **Current settings** - Chunk size, strategy, file patterns
- **Quick Configuration Options** - Change common settings without YAML editing
+- **File location** - Where config is stored
- **Guided Setup** - Explanations and presets for each option
+- **Setting explanations** - What each option does
- **Validation** - Input checking and helpful error messages
+- **Quick actions** - View or edit config directly
-**Main Configuration Options**:
+**Key Settings Explained**:
 - **chunking.max_size** - How large each searchable piece is
 - **chunking.strategy** - Smart (semantic) vs simple (fixed size)
 - **files.exclude_patterns** - Skip certain files/directories
 - **embedding.preferred_method** - AI model preference
 - **search.default_top_k** - How many results to show
-**1. Adjust Chunk Size**:
+**Interactive Options**:
- **Presets**: Small (1000), Medium (2000), Large (3000), or custom
+- **[V]iew config** - See full configuration file
- **Guidance**: Performance vs accuracy explanations
+- **[E]dit path** - Get command to edit configuration
 - **Smart Validation**: Range checking and recommendations
 **2. Toggle Query Expansion**:
 - **Educational Info**: Clear explanation of benefits and requirements  
 - **Easy Toggle**: Simple on/off with confirmation
 - **System Check**: Verifies Ollama availability for AI features
 **3. Configure Search Behavior**:
 - **Result Count**: Adjust default number of search results (1-100)
 - **BM25 Toggle**: Enable/disable keyword matching boost
 - **Similarity Threshold**: Fine-tune match sensitivity (0.0-1.0)
 **4. View/Edit Configuration File**:
 - **Full File Viewer**: Display complete config with syntax highlighting
 - **Editor Instructions**: Commands for nano, vim, VS Code
 - **YAML Help**: Format explanation and editing tips
 **5. Reset to Defaults**:
 - **Safe Reset**: Confirmation before resetting all settings
 - **Clear Explanations**: Shows what defaults will be restored
 - **Backup Reminder**: Suggests saving current config first
 **6. Advanced Settings**:
 - **File Filtering**: Min file size, exclude patterns (view only)
 - **Performance Settings**: Batch sizes, streaming thresholds
 - **LLM Preferences**: Model rankings and selection priorities
 **Key Settings Dashboard**:
 - 📁 **Chunk size**: 2000 characters (with emoji indicators)
 - 🧠 **Chunking strategy**: semantic
 - 🔍 **Search results**: 10 results
 - 📊 **Embedding method**: ollama
 - 🚀 **Query expansion**: enabled/disabled
 - ⚡ **LLM synthesis**: enabled/disabled
 **What You Learn**:
- **Configuration Impact**: How settings affect search quality and speed
+- How configuration affects search quality
- **Interactive YAML**: Easier than manual editing for beginners
+- YAML configuration format
- **Best Practices**: Recommended settings for different project types
+- Which settings to adjust for different projects
- **System Understanding**: How all components work together
+- Where to find advanced options
 **CLI Commands Shown**:
 ```bash
@ -258,13 +172,7 @@ cat /path/to/project/.mini-rag/config.yaml   # View config
 nano /path/to/project/.mini-rag/config.yaml  # Edit config
 ```
-**Perfect For**:
+### 6. CLI Command Reference
 - Beginners who find YAML intimidating
 - Quick adjustments without memorizing syntax
 - Understanding what each setting actually does
 - Safe experimentation with guided validation
 ### 7. CLI Command Reference
 **Purpose**: Complete command reference for transitioning to CLI
--- a/examples/config-llm-providers.yaml
+++ b/examples/config-llm-providers.yaml
@ -68,9 +68,9 @@ search:
 llm:
  provider: ollama                    # Use local Ollama
  ollama_host: localhost:11434        # Default Ollama location
-  synthesis_model: qwen3:1.7b         # Good all-around model
+  synthesis_model: llama3.2           # Good all-around model
-  # alternatives: qwen3:0.6b (faster), qwen2.5:3b (balanced), qwen3:4b (quality)
+  # alternatives: qwen3:0.6b (faster), llama3.2:3b (balanced), llama3.1:8b (quality)
-  expansion_model: qwen3:1.7b
+  expansion_model: llama3.2
  enable_synthesis: false
  synthesis_temperature: 0.3
  cpu_optimized: true
--- a/examples/config-quality.yaml
+++ b/examples/config-quality.yaml
@ -102,7 +102,7 @@ llm:
 # For even better results, try these model combinations:
 # • ollama pull nomic-embed-text:latest  (best embeddings)
 # • ollama pull qwen3:1.7b              (good general model)
-# • ollama pull qwen3:4b                (excellent for analysis)
+# • ollama pull llama3.2                (excellent for analysis)
 # 
 # Or adjust these settings for your specific needs:
 # • similarity_threshold: 0.3   (more selective results)
--- a/examples/config.yaml
+++ b/examples/config.yaml
@ -112,7 +112,7 @@ llm:
  synthesis_model: auto           # Which AI model to use for explanations
                                  # 'auto': Picks best available model - RECOMMENDED
                                  # 'qwen3:0.6b': Ultra-fast, good for CPU-only computers
-                                  # 'qwen3:4b': Slower but more detailed explanations
+                                  # 'llama3.2': Slower but more detailed explanations
  expansion_model: auto           # Model for query expansion (usually same as synthesis)
--- a/install_mini_rag.ps1
+++ b/install_mini_rag.ps1
@ -1,458 +0,0 @@
 # FSS-Mini-RAG PowerShell Installation Script
 # Interactive installer that sets up Python environment and dependencies
 # Enable advanced features
 $ErrorActionPreference = "Stop"
 # Color functions for better output
 function Write-ColorOutput($message, $color = "White") {
    Write-Host $message -ForegroundColor $color
 }
 function Write-Header($message) {
    Write-Host "`n" -NoNewline
    Write-ColorOutput "=== $message ===" "Cyan"
 }
 function Write-Success($message) {
    Write-ColorOutput "✅ $message" "Green"
 }
 function Write-Warning($message) {
    Write-ColorOutput "⚠️  $message" "Yellow"
 }
 function Write-Error($message) {
    Write-ColorOutput "❌ $message" "Red"
 }
 function Write-Info($message) {
    Write-ColorOutput "ℹ️  $message" "Blue"
 }
 # Get script directory
 $ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
 # Main installation function
 function Main {
    Write-Host ""
    Write-ColorOutput "╔══════════════════════════════════════╗" "Cyan"
    Write-ColorOutput "║        FSS-Mini-RAG Installer        ║" "Cyan"
    Write-ColorOutput "║   Fast Semantic Search for Code      ║" "Cyan" 
    Write-ColorOutput "╚══════════════════════════════════════╝" "Cyan"
    Write-Host ""
    Write-Info "PowerShell installation process:"
    Write-Host "  • Python environment setup"
    Write-Host "  • Smart configuration based on your system"
    Write-Host "  • Optional AI model downloads (with consent)"
    Write-Host "  • Testing and verification"
    Write-Host ""
    Write-ColorOutput "Note: You'll be asked before downloading any models" "Cyan"
    Write-Host ""
    $continue = Read-Host "Begin installation? [Y/n]"
    if ($continue -eq "n" -or $continue -eq "N") {
        Write-Host "Installation cancelled."
        exit 0
    }
    # Run installation steps
    Check-Python
    Create-VirtualEnvironment
    # Check Ollama availability
    $ollamaAvailable = Check-Ollama
    # Get installation preferences
    Get-InstallationPreferences $ollamaAvailable
    # Install dependencies
    Install-Dependencies
    # Setup models if available
    if ($ollamaAvailable) {
        Setup-OllamaModel
    }
    # Test installation
    if (Test-Installation) {
        Show-Completion
    } else {
        Write-Error "Installation test failed"
        Write-Host "Please check error messages and try again."
        exit 1
    }
 }
 function Check-Python {
    Write-Header "Checking Python Installation"
    # Try different Python commands
    $pythonCmd = $null
    $pythonVersion = $null
    foreach ($cmd in @("python", "python3", "py")) {
        try {
            $version = & $cmd --version 2>&1
            if ($LASTEXITCODE -eq 0) {
                $pythonCmd = $cmd
                $pythonVersion = ($version -split " ")[1]
                break
            }
        } catch {
            continue
        }
    }
    if (-not $pythonCmd) {
        Write-Error "Python not found!"
        Write-Host ""
        Write-ColorOutput "Please install Python 3.8+ from:" "Yellow"
        Write-Host "  • https://python.org/downloads"
        Write-Host "  • Make sure to check 'Add Python to PATH' during installation"
        Write-Host ""
        Write-ColorOutput "After installing Python, run this script again." "Cyan"
        exit 1
    }
    # Check version
    $versionParts = $pythonVersion -split "\."
    $major = [int]$versionParts[0]
    $minor = [int]$versionParts[1]
    if ($major -lt 3 -or ($major -eq 3 -and $minor -lt 8)) {
        Write-Error "Python $pythonVersion found, but 3.8+ required"
        Write-Host "Please upgrade Python to 3.8 or higher."
        exit 1
    }
    Write-Success "Found Python $pythonVersion ($pythonCmd)"
    $script:PythonCmd = $pythonCmd
 }
 function Create-VirtualEnvironment {
    Write-Header "Creating Python Virtual Environment"
    $venvPath = Join-Path $ScriptDir ".venv"
    if (Test-Path $venvPath) {
        Write-Info "Virtual environment already exists at $venvPath"
        $recreate = Read-Host "Recreate it? (y/N)"
        if ($recreate -eq "y" -or $recreate -eq "Y") {
            Write-Info "Removing existing virtual environment..."
            Remove-Item -Recurse -Force $venvPath
        } else {
            Write-Success "Using existing virtual environment"
            return
        }
    }
    Write-Info "Creating virtual environment at $venvPath"
    try {
        & $script:PythonCmd -m venv $venvPath
        if ($LASTEXITCODE -ne 0) {
            throw "Virtual environment creation failed"
        }
        Write-Success "Virtual environment created"
    } catch {
        Write-Error "Failed to create virtual environment"
        Write-Host "This might be because python venv module is not available."
        Write-Host "Try installing Python from python.org with full installation."
        exit 1
    }
    # Activate virtual environment and upgrade pip
    $activateScript = Join-Path $venvPath "Scripts\Activate.ps1"
    if (Test-Path $activateScript) {
        & $activateScript
        Write-Success "Virtual environment activated"
        Write-Info "Upgrading pip..."
        try {
            & python -m pip install --upgrade pip --quiet
        } catch {
            Write-Warning "Could not upgrade pip, continuing anyway..."
        }
    }
 }
 function Check-Ollama {
    Write-Header "Checking Ollama (AI Model Server)"
    try {
        $response = Invoke-WebRequest -Uri "http://localhost:11434/api/version" -TimeoutSec 5 -ErrorAction SilentlyContinue
        if ($response.StatusCode -eq 200) {
            Write-Success "Ollama server is running"
            return $true
        }
    } catch {
        # Ollama not running, check if installed
    }
    try {
        & ollama version 2>$null
        if ($LASTEXITCODE -eq 0) {
            Write-Warning "Ollama is installed but not running"
            $startOllama = Read-Host "Start Ollama now? (Y/n)"
            if ($startOllama -ne "n" -and $startOllama -ne "N") {
                Write-Info "Starting Ollama server..."
                Start-Process -FilePath "ollama" -ArgumentList "serve" -WindowStyle Hidden
                Start-Sleep -Seconds 3
                try {
                    $response = Invoke-WebRequest -Uri "http://localhost:11434/api/version" -TimeoutSec 5 -ErrorAction SilentlyContinue
                    if ($response.StatusCode -eq 200) {
                        Write-Success "Ollama server started"
                        return $true
                    }
                } catch {
                    Write-Warning "Failed to start Ollama automatically"
                    Write-Host "Please start Ollama manually: ollama serve"
                    return $false
                }
            }
            return $false
        }
    } catch {
        # Ollama not installed
    }
    Write-Warning "Ollama not found"
    Write-Host ""
    Write-ColorOutput "Ollama provides the best embedding quality and performance." "Cyan"
    Write-Host ""
    Write-ColorOutput "Options:" "White"
    Write-ColorOutput "1) Install Ollama automatically" "Green" -NoNewline
    Write-Host " (recommended)"
    Write-ColorOutput "2) Manual installation" "Yellow" -NoNewline
    Write-Host " - Visit https://ollama.com/download"
    Write-ColorOutput "3) Continue without Ollama" "Blue" -NoNewline
    Write-Host " (uses ML fallback)"
    Write-Host ""
    $choice = Read-Host "Choose [1/2/3]"
    switch ($choice) {
        "1" {
            Write-Info "Opening Ollama download page..."
            Start-Process "https://ollama.com/download"
            Write-Host ""
            Write-ColorOutput "Please:" "Yellow"
            Write-Host "  1. Download and install Ollama from the opened page"
            Write-Host "  2. Run 'ollama serve' in a new terminal"
            Write-Host "  3. Re-run this installer"
            Write-Host ""
            Read-Host "Press Enter to exit"
            exit 0
        }
        "2" {
            Write-Host ""
            Write-ColorOutput "Manual Ollama installation:" "Yellow"
            Write-Host "  1. Visit: https://ollama.com/download"
            Write-Host "  2. Download and install for Windows"
            Write-Host "  3. Run: ollama serve"
            Write-Host "  4. Re-run this installer"
            Read-Host "Press Enter to exit"
            exit 0
        }
        "3" {
            Write-Info "Continuing without Ollama (will use ML fallback)"
            return $false
        }
        default {
            Write-Warning "Invalid choice, continuing without Ollama"
            return $false
        }
    }
 }
 function Get-InstallationPreferences($ollamaAvailable) {
    Write-Header "Installation Configuration"
    Write-ColorOutput "FSS-Mini-RAG can run with different embedding backends:" "Cyan"
    Write-Host ""
    Write-ColorOutput "• Ollama" "Green" -NoNewline
    Write-Host " (recommended) - Best quality, local AI server"
    Write-ColorOutput "• ML Fallback" "Yellow" -NoNewline
    Write-Host " - Offline transformers, larger but always works"
    Write-ColorOutput "• Hash-based" "Blue" -NoNewline
    Write-Host " - Lightweight fallback, basic similarity"
    Write-Host ""
    if ($ollamaAvailable) {
        $recommended = "light (Ollama detected)"
        Write-ColorOutput "✓ Ollama detected - light installation recommended" "Green"
    } else {
        $recommended = "full (no Ollama)"
        Write-ColorOutput "⚠ No Ollama - full installation recommended for better quality" "Yellow"
    }
    Write-Host ""
    Write-ColorOutput "Installation options:" "White"
    Write-ColorOutput "L) Light" "Green" -NoNewline
    Write-Host " - Ollama + basic deps (~50MB) " -NoNewline
    Write-ColorOutput "← Best performance + AI chat" "Cyan"
    Write-ColorOutput "F) Full" "Yellow" -NoNewline
    Write-Host "  - Light + ML fallback (~2-3GB) " -NoNewline
    Write-ColorOutput "← Works without Ollama" "Cyan"
    Write-Host ""
    $choice = Read-Host "Choose [L/F] or Enter for recommended ($recommended)"
    if ($choice -eq "") {
        if ($ollamaAvailable) {
            $choice = "L"
        } else {
            $choice = "F"
        }
    }
    switch ($choice.ToUpper()) {
        "L" {
            $script:InstallType = "light"
            Write-ColorOutput "Selected: Light installation" "Green"
        }
        "F" {
            $script:InstallType = "full"
            Write-ColorOutput "Selected: Full installation" "Yellow"
        }
        default {
            Write-Warning "Invalid choice, using light installation"
            $script:InstallType = "light"
        }
    }
 }
 function Install-Dependencies {
    Write-Header "Installing Python Dependencies"
    if ($script:InstallType -eq "light") {
        Write-Info "Installing core dependencies (~50MB)..."
        Write-ColorOutput "  Installing: lancedb, pandas, numpy, PyYAML, etc." "Blue"
        try {
            & pip install -r (Join-Path $ScriptDir "requirements.txt") --quiet
            if ($LASTEXITCODE -ne 0) {
                throw "Dependency installation failed"
            }
            Write-Success "Dependencies installed"
        } catch {
            Write-Error "Failed to install dependencies"
            Write-Host "Try: pip install -r requirements.txt"
            exit 1
        }
    } else {
        Write-Info "Installing full dependencies (~2-3GB)..."
        Write-ColorOutput "This includes PyTorch and transformers - will take several minutes" "Yellow"
        try {
            & pip install -r (Join-Path $ScriptDir "requirements-full.txt")
            if ($LASTEXITCODE -ne 0) {
                throw "Dependency installation failed"
            }
            Write-Success "All dependencies installed"
        } catch {
            Write-Error "Failed to install dependencies"
            Write-Host "Try: pip install -r requirements-full.txt"
            exit 1
        }
    }
    Write-Info "Verifying installation..."
    try {
        & python -c "import lancedb, pandas, numpy" 2>$null
        if ($LASTEXITCODE -ne 0) {
            throw "Package verification failed"
        }
        Write-Success "Core packages verified"
    } catch {
        Write-Error "Package verification failed"
        exit 1
    }
 }
 function Setup-OllamaModel {
    # Implementation similar to bash version but adapted for PowerShell
    Write-Header "Ollama Model Setup"
    # For brevity, implementing basic version
    Write-Info "Ollama model setup available - see bash version for full implementation"
 }
 function Test-Installation {
    Write-Header "Testing Installation"
    Write-Info "Testing basic functionality..."
    try {
        & python -c "from mini_rag import CodeEmbedder, ProjectIndexer, CodeSearcher; print('✅ Import successful')" 2>$null
        if ($LASTEXITCODE -ne 0) {
            throw "Import test failed"
        }
        Write-Success "Python imports working"
        return $true
    } catch {
        Write-Error "Import test failed"
        return $false
    }
 }
 function Show-Completion {
    Write-Header "Installation Complete!"
    Write-ColorOutput "FSS-Mini-RAG is now installed!" "Green"
    Write-Host ""
    Write-ColorOutput "Quick Start Options:" "Cyan"
    Write-Host ""
    Write-ColorOutput "🎯 TUI (Beginner-Friendly):" "Green"
    Write-Host "     rag-tui.bat"
    Write-Host "     # Interactive interface with guided setup"
    Write-Host ""
    Write-ColorOutput "💻 CLI (Advanced):" "Blue"
    Write-Host "     rag-mini.bat index C:\path\to\project"
    Write-Host "     rag-mini.bat search C:\path\to\project `"query`""
    Write-Host "     rag-mini.bat status C:\path\to\project"
    Write-Host ""
    Write-ColorOutput "Documentation:" "Cyan"
    Write-Host "  • README.md - Complete technical documentation"
    Write-Host "  • docs\GETTING_STARTED.md - Step-by-step guide"
    Write-Host "  • examples\ - Usage examples and sample configs"
    Write-Host ""
    $runTest = Read-Host "Run quick test now? [Y/n]"
    if ($runTest -ne "n" -and $runTest -ne "N") {
        Run-QuickTest
    }
    Write-Host ""
    Write-ColorOutput "🎉 Setup complete! FSS-Mini-RAG is ready to use." "Green"
 }
 function Run-QuickTest {
    Write-Header "Quick Test"
    Write-Info "Testing with FSS-Mini-RAG codebase..."
    $ragDir = Join-Path $ScriptDir ".mini-rag"
    if (Test-Path $ragDir) {
        Write-Success "Project already indexed, running search..."
    } else {
        Write-Info "Indexing FSS-Mini-RAG system for demo..."
        & python (Join-Path $ScriptDir "rag-mini.py") index $ScriptDir
        if ($LASTEXITCODE -ne 0) {
            Write-Error "Test indexing failed"
            return
        }
    }
    Write-Host ""
    Write-Success "Running demo search: 'embedding system'"
    & python (Join-Path $ScriptDir "rag-mini.py") search $ScriptDir "embedding system" --top-k 3
    Write-Host ""
    Write-Success "Test completed successfully!"
    Write-ColorOutput "FSS-Mini-RAG is working perfectly on Windows!" "Cyan"
 }
 # Run main function
 Main
--- a/install_mini_rag.sh
+++ b/install_mini_rag.sh
@ -462,73 +462,6 @@ install_dependencies() {
    fi
 }
 # Setup application icon for desktop integration
 setup_desktop_icon() {
    print_header "Setting Up Desktop Integration"
    # Check if we're in a GUI environment
    if [ -z "$DISPLAY" ] && [ -z "$WAYLAND_DISPLAY" ]; then
        print_info "No GUI environment detected - skipping desktop integration"
        return 0
    fi
    local icon_source="$SCRIPT_DIR/assets/Fss_Mini_Rag.png"
    local desktop_dir="$HOME/.local/share/applications"
    local icon_dir="$HOME/.local/share/icons"
    # Check if icon file exists
    if [ ! -f "$icon_source" ]; then
        print_warning "Icon file not found at $icon_source"
        return 1
    fi
    # Create directories if needed
    mkdir -p "$desktop_dir" "$icon_dir" 2>/dev/null
    # Copy icon to standard location
    local icon_dest="$icon_dir/fss-mini-rag.png"
    if cp "$icon_source" "$icon_dest" 2>/dev/null; then
        print_success "Icon installed to $icon_dest"
    else
        print_warning "Could not install icon (permissions?)"
        return 1
    fi
    # Create desktop entry
    local desktop_file="$desktop_dir/fss-mini-rag.desktop"
    cat > "$desktop_file" << EOF
 [Desktop Entry]
 Name=FSS-Mini-RAG
 Comment=Fast Semantic Search for Code and Documents
 Exec=$SCRIPT_DIR/rag-tui
 Icon=fss-mini-rag
 Terminal=true
 Type=Application
 Categories=Development;Utility;TextEditor;
 Keywords=search;code;rag;semantic;ai;
 StartupNotify=true
 EOF
    if [ -f "$desktop_file" ]; then
        chmod +x "$desktop_file"
        print_success "Desktop entry created"
        # Update desktop database if available
        if command_exists update-desktop-database; then
            update-desktop-database "$desktop_dir" 2>/dev/null
            print_info "Desktop database updated"
        fi
        print_info "✨ FSS-Mini-RAG should now appear in your application menu!"
        print_info "   Look for it in Development or Utility categories"
    else
        print_warning "Could not create desktop entry"
        return 1
    fi
    return 0
 }
 # Setup ML models based on configuration  
 setup_ml_models() {
    if [ "$INSTALL_TYPE" != "full" ]; then
@ -772,7 +705,7 @@ run_quick_test() {
        read -r
        # Launch the TUI which has the existing interactive tutorial system
-        ./rag-tui.py "$target_dir" || true
+        ./rag-tui.py "$target_dir"
        echo ""
        print_success "🎉 Tutorial completed!"
@ -861,9 +794,6 @@ main() {
    fi
    setup_ml_models
    # Setup desktop integration with icon
    setup_desktop_icon
    if test_installation; then
        show_completion
    else
--- a/install_windows.bat
+++ b/install_windows.bat
@ -1,343 +0,0 @@
@echo off
 REM FSS-Mini-RAG Windows Installer - Beautiful & Comprehensive
 setlocal enabledelayedexpansion
 REM Enable colors and unicode for modern Windows
 chcp 65001 >nul 2>&1
 echo.
 echo ╔══════════════════════════════════════════════════╗
 echo ║            FSS-Mini-RAG Windows Installer       ║
 echo ║         Fast Semantic Search for Code           ║
 echo ╚══════════════════════════════════════════════════╝
 echo.
 echo 🚀 Comprehensive installation process:
 echo   • Python environment setup and validation
 echo   • Smart dependency management 
 echo   • Optional AI model downloads (with your consent)
 echo   • System testing and verification
 echo   • Interactive tutorial (optional)
 echo.
 echo 💡 Note: You'll be asked before downloading any models
 echo.
 set /p "continue=Begin installation? [Y/n]: "
 if /i "!continue!"=="n" (
    echo Installation cancelled.
    pause
    exit /b 0
 )
 REM Get script directory
 set "SCRIPT_DIR=%~dp0"
 set "SCRIPT_DIR=%SCRIPT_DIR:~0,-1%"
 echo.
 echo ══════════════════════════════════════════════════
 echo [1/5] Checking Python Environment...
 python --version >nul 2>&1
 if errorlevel 1 (
    echo ❌ ERROR: Python not found!
    echo.
    echo 📦 Please install Python from: https://python.org/downloads
    echo 🔧 Installation requirements:
    echo    • Python 3.8 or higher
    echo    • Make sure to check "Add Python to PATH" during installation
    echo    • Restart your command prompt after installation
    echo.
    echo 💡 Quick install options:
    echo    • Download from python.org (recommended)
    echo    • Or use: winget install Python.Python.3.11
    echo    • Or use: choco install python311
    echo.
    pause
    exit /b 1
 )
 for /f "tokens=2" %%i in ('python --version 2^>^&1') do set "PYTHON_VERSION=%%i"
 echo ✅ Found Python !PYTHON_VERSION!
 REM Check Python version (basic check for 3.x)
 for /f "tokens=1 delims=." %%a in ("!PYTHON_VERSION!") do set "MAJOR_VERSION=%%a"
 if !MAJOR_VERSION! LSS 3 (
    echo ❌ ERROR: Python !PYTHON_VERSION! found, but Python 3.8+ required
    echo 📦 Please upgrade Python to 3.8 or higher
    pause
    exit /b 1
 )
 echo.
 echo ══════════════════════════════════════════════════
 echo [2/5] Creating Python Virtual Environment...
 if exist "%SCRIPT_DIR%\.venv" (
    echo 🔄 Removing old virtual environment...
    rmdir /s /q "%SCRIPT_DIR%\.venv" 2>nul
    if exist "%SCRIPT_DIR%\.venv" (
        echo ⚠️ Could not remove old environment, creating anyway...
    )
 )
 echo 📁 Creating fresh virtual environment...
 python -m venv "%SCRIPT_DIR%\.venv"
 if errorlevel 1 (
    echo ❌ ERROR: Failed to create virtual environment
    echo.
    echo 🔧 This might be because:
    echo    • Python venv module is not installed
    echo    • Insufficient permissions
    echo    • Path contains special characters
    echo.
    echo 💡 Try: python -m pip install --user virtualenv
    pause
    exit /b 1
 )
 echo ✅ Virtual environment created successfully
 echo.
 echo ══════════════════════════════════════════════════
 echo [3/5] Installing Python Dependencies...
 echo 📦 This may take 2-3 minutes depending on your internet speed...
 echo.
 call "%SCRIPT_DIR%\.venv\Scripts\activate.bat"
 if errorlevel 1 (
    echo ❌ ERROR: Could not activate virtual environment
    pause
    exit /b 1
 )
 echo 🔧 Upgrading pip...
 "%SCRIPT_DIR%\.venv\Scripts\python.exe" -m pip install --upgrade pip --quiet
 if errorlevel 1 (
    echo ⚠️ Warning: Could not upgrade pip, continuing anyway...
 )
 echo 📚 Installing core dependencies (lancedb, pandas, numpy, etc.)...
 echo    This provides semantic search capabilities
 "%SCRIPT_DIR%\.venv\Scripts\pip.exe" install -r "%SCRIPT_DIR%\requirements.txt"
 if errorlevel 1 (
    echo ❌ ERROR: Failed to install dependencies
    echo.
    echo 🔧 Possible solutions:
    echo    • Check internet connection
    echo    • Try running as administrator
    echo    • Check if antivirus is blocking pip
    echo    • Manually run: pip install -r requirements.txt
    echo.
    pause
    exit /b 1
 )
 echo ✅ Dependencies installed successfully
 echo.
 echo ══════════════════════════════════════════════════
 echo [4/5] Testing Installation...
 echo 🧪 Verifying Python imports...
 "%SCRIPT_DIR%\.venv\Scripts\python.exe" -c "from mini_rag import CodeEmbedder, ProjectIndexer, CodeSearcher; print('✅ Core imports successful')" 2>nul
 if errorlevel 1 (
    echo ❌ ERROR: Installation test failed
    echo.
    echo 🔧 This usually means:
    echo    • Dependencies didn't install correctly
    echo    • Virtual environment is corrupted
    echo    • Python path issues
    echo.
    echo 💡 Try running: pip install -r requirements.txt
    pause
    exit /b 1
 )
 echo 🔍 Testing embedding system...
 "%SCRIPT_DIR%\.venv\Scripts\python.exe" -c "from mini_rag import CodeEmbedder; embedder = CodeEmbedder(); info = embedder.get_embedding_info(); print(f'✅ Embedding method: {info[\"method\"]}')" 2>nul
 if errorlevel 1 (
    echo ⚠️ Warning: Embedding test inconclusive, but core system is ready
 )
 echo.
 echo ══════════════════════════════════════════════════
 echo [5/6] Setting Up Desktop Integration...
 call :setup_windows_icon
 echo.
 echo ══════════════════════════════════════════════════
 echo [6/6] Checking AI Features (Optional)...
 call :check_ollama_enhanced
 echo.
 echo ╔══════════════════════════════════════════════════╗
 echo ║             INSTALLATION SUCCESSFUL!            ║
 echo ╚══════════════════════════════════════════════════╝
 echo.
 echo 🎯 Quick Start Options:
 echo.
 echo 🎨 For Beginners (Recommended):
 echo    rag.bat                 - Interactive interface with guided setup
 echo.
 echo 💻 For Developers:
 echo    rag.bat index C:\myproject      - Index a project
 echo    rag.bat search C:\myproject "authentication"  - Search project  
 echo    rag.bat help            - Show all commands
 echo.
 REM Offer interactive tutorial
 echo 🧪 Quick Test Available:
 echo    Test FSS-Mini-RAG with a small sample project (takes ~30 seconds)
 echo.
 set /p "run_test=Run interactive tutorial now? [Y/n]: "
 if /i "!run_test!" NEQ "n" (
    call :run_tutorial
 ) else (
    echo 📚 You can run the tutorial anytime with: rag.bat
 )
 echo.
 echo 🎉 Setup complete! FSS-Mini-RAG is ready to use.
 echo 💡 Pro tip: Try indexing any folder with text files - code, docs, notes!
 echo.
 pause
 exit /b 0
 :check_ollama_enhanced
 echo 🤖 Checking for AI capabilities...
 echo.
 REM Check if Ollama is installed
 where ollama >nul 2>&1
 if errorlevel 1 (
    echo ⚠️ Ollama not installed - using basic search mode
    echo.
    echo 🎯 For Enhanced AI Features:
    echo    • 📥 Install Ollama: https://ollama.com/download
    echo    • 🔄 Run: ollama serve  
    echo    • 🧠 Download model: ollama pull qwen3:1.7b
    echo.
    echo 💡 Benefits of AI features:
    echo    • Smart query expansion for better search results
    echo    • Interactive exploration mode with conversation memory
    echo    • AI-powered synthesis of search results  
    echo    • Natural language understanding of your questions
    echo.
    goto :eof
 )
 REM Check if Ollama server is running
 curl -s http://localhost:11434/api/version >nul 2>&1
 if errorlevel 1 (
    echo 🟡 Ollama installed but not running
    echo.
    set /p "start_ollama=Start Ollama server now? [Y/n]: "
    if /i "!start_ollama!" NEQ "n" (
        echo 🚀 Starting Ollama server...
        start /b ollama serve
        timeout /t 3 /nobreak >nul
        curl -s http://localhost:11434/api/version >nul 2>&1
        if errorlevel 1 (
            echo ⚠️ Could not start Ollama automatically
            echo 💡 Please run: ollama serve
        ) else (
            echo ✅ Ollama server started successfully!
        )
    )
 ) else (
    echo ✅ Ollama server is running!
 )
 REM Check for available models
 echo 🔍 Checking for AI models...
 ollama list 2>nul | findstr /v "NAME" | findstr /v "^$" >nul
 if errorlevel 1 (
    echo 📦 No AI models found
    echo.
    echo 🧠 Recommended Models (choose one):
    echo    • qwen3:1.7b    - Excellent for RAG (1.4GB, recommended)
    echo    • qwen3:0.6b    - Lightweight and fast (~500MB)  
    echo    • qwen3:4b      - Higher quality but slower (~2.5GB)
    echo.
    set /p "install_model=Download qwen3:1.7b model now? [Y/n]: "
    if /i "!install_model!" NEQ "n" (
        echo 📥 Downloading qwen3:1.7b model...
        echo    This may take 5-10 minutes depending on your internet speed
        ollama pull qwen3:1.7b
        if errorlevel 1 (
            echo ⚠️ Download failed - you can try again later with: ollama pull qwen3:1.7b
        ) else (
            echo ✅ Model downloaded successfully! AI features are now available.
        )
    )
 ) else (
    echo ✅ AI models found - full AI features available!
    echo 🎉 Your system supports query expansion, exploration mode, and synthesis!
 )
 goto :eof
 :run_tutorial
 echo.
 echo ═══════════════════════════════════════════════════
 echo 🧪 Running Interactive Tutorial
 echo ═══════════════════════════════════════════════════
 echo.
 echo 📚 This tutorial will:
 echo    • Index the FSS-Mini-RAG documentation
 echo    • Show you how to search effectively
 echo    • Demonstrate AI features (if available)
 echo.
 call "%SCRIPT_DIR%\.venv\Scripts\activate.bat"
 echo 📁 Indexing project for demonstration...
 "%SCRIPT_DIR%\.venv\Scripts\python.exe" rag-mini.py index "%SCRIPT_DIR%" >nul 2>&1
 if errorlevel 1 (
    echo ❌ Indexing failed - please check the installation
    goto :eof
 )
 echo ✅ Indexing complete! 
 echo.
 echo 🔍 Example search: "embedding"
 "%SCRIPT_DIR%\.venv\Scripts\python.exe" rag-mini.py search "%SCRIPT_DIR%" "embedding" --top-k 3
 echo.
 echo 🎯 Try the interactive interface:
 echo    rag.bat
 echo.
 echo 💡 You can now search any project by indexing it first!
 goto :eof
 :setup_windows_icon
 echo 🎨 Setting up application icon and shortcuts...
 REM Check if icon exists
 if not exist "%SCRIPT_DIR%\assets\Fss_Mini_Rag.png" (
    echo ⚠️ Icon file not found - skipping desktop integration
    goto :eof
 )
 REM Create desktop shortcut
 echo 📱 Creating desktop shortcut...
 set "desktop=%USERPROFILE%\Desktop"
 set "shortcut=%desktop%\FSS-Mini-RAG.lnk"
 REM Use PowerShell to create shortcut with icon
 powershell -Command "& {$WshShell = New-Object -comObject WScript.Shell; $Shortcut = $WshShell.CreateShortcut('%shortcut%'); $Shortcut.TargetPath = '%SCRIPT_DIR%\rag.bat'; $Shortcut.WorkingDirectory = '%SCRIPT_DIR%'; $Shortcut.Description = 'FSS-Mini-RAG - Fast Semantic Search'; $Shortcut.Save()}" >nul 2>&1
 if exist "%shortcut%" (
    echo ✅ Desktop shortcut created
 ) else (
    echo ⚠️ Could not create desktop shortcut
 )
 REM Create Start Menu shortcut
 echo 📂 Creating Start Menu entry...
 set "startmenu=%APPDATA%\Microsoft\Windows\Start Menu\Programs"
 set "startshortcut=%startmenu%\FSS-Mini-RAG.lnk"
 powershell -Command "& {$WshShell = New-Object -comObject WScript.Shell; $Shortcut = $WshShell.CreateShortcut('%startshortcut%'); $Shortcut.TargetPath = '%SCRIPT_DIR%\rag.bat'; $Shortcut.WorkingDirectory = '%SCRIPT_DIR%'; $Shortcut.Description = 'FSS-Mini-RAG - Fast Semantic Search'; $Shortcut.Save()}" >nul 2>&1
 if exist "%startshortcut%" (
    echo ✅ Start Menu entry created
 ) else (
    echo ⚠️ Could not create Start Menu entry
 )
 echo 💡 FSS-Mini-RAG shortcuts have been created on your Desktop and Start Menu
 echo    You can now launch the application from either location
 goto :eof
--- a/mini_rag/config.py
+++ b/mini_rag/config.py
@ -81,10 +81,6 @@ class LLMConfig:
    enable_thinking: bool = True  # Enable thinking mode for Qwen3 models
    cpu_optimized: bool = True     # Prefer lightweight models
    # Context window configuration (critical for RAG performance)
    context_window: int = 16384    # Context window size in tokens (16K recommended)
    auto_context: bool = True      # Auto-adjust context based on model capabilities
    # Model preference rankings (configurable)
    model_rankings: list = None    # Will be set in __post_init__
@ -108,9 +104,9 @@ class LLMConfig:
                # Recommended model (excellent quality but larger)
                "qwen3:4b",
-                # Common fallbacks (prioritize Qwen models)  
+                # Common fallbacks (only include models we know exist)
                "llama3.2:1b",
                "qwen2.5:1.5b",
                "qwen2.5:3b",
            ]
@ -259,11 +255,6 @@ class ConfigManager:
            f"  max_expansion_terms: {config_dict['llm']['max_expansion_terms']}        # Maximum terms to add to queries",
            f"  enable_synthesis: {str(config_dict['llm']['enable_synthesis']).lower()}       # Enable synthesis by default",
            f"  synthesis_temperature: {config_dict['llm']['synthesis_temperature']}      # LLM temperature for analysis",
            "",
            "  # Context window configuration (critical for RAG performance)",
            f"  context_window: {config_dict['llm']['context_window']}           # Context size in tokens (8K=fast, 16K=balanced, 32K=advanced)",
            f"  auto_context: {str(config_dict['llm']['auto_context']).lower()}            # Auto-adjust context based on model capabilities",
            "",
            "  model_rankings:          # Preferred model order (edit to change priority)",
        ])
--- a/mini_rag/explorer.py
+++ b/mini_rag/explorer.py
@ -115,13 +115,12 @@ class CodeExplorer:
        # Add to conversation history
        self.current_session.add_exchange(question, results, synthesis)
-        # Streaming already displayed the response
+        # Format response with exploration context
-        # Just return minimal status for caller
+        response = self._format_exploration_response(
-        session_duration = time.time() - self.current_session.started_at
+            question, synthesis, len(results), search_time, synthesis_time
-        exchange_count = len(self.current_session.conversation_history)
+        )
-        status = f"\n📊 Session: {session_duration/60:.1f}m | Question #{exchange_count} | Results: {len(results)} | Time: {search_time+synthesis_time:.1f}s"
+        return response
        return status
    def _build_contextual_prompt(self, question: str, results: List[Any]) -> str:
        """Build a prompt that includes conversation context."""
@ -186,22 +185,33 @@ CURRENT QUESTION: "{question}"
 RELEVANT INFORMATION FOUND:
 {results_text}
-Please provide a helpful, natural explanation that answers their question. Write as if you're having a friendly conversation with a colleague who's exploring this project.
+Please provide a helpful analysis in JSON format:
-Structure your response to include:
+{{
-1. A clear explanation of what you found and how it answers their question
+    "summary": "Clear explanation of what you found and how it answers their question",
-2. The most important insights from the information you discovered  
+    "key_points": [
-3. Relevant examples or code patterns when helpful
+        "Most important insight from the information",
-4. Practical next steps they could take
+        "Secondary important point or relationship", 
        "Third key point or practical consideration"
    ],
    "code_examples": [
        "Relevant example or pattern from the information",
        "Another useful example or demonstration"
    ],
    "suggested_actions": [
        "Specific next step they could take",
        "Additional exploration or investigation suggestion",
        "Practical way to apply this information"
    ],
    "confidence": 0.85
 }}
 Guidelines:
- Write in a conversational, friendly tone
+- Be educational and break things down clearly
 - Be educational but not condescending
 - Reference specific files and information when helpful
 - Give practical, actionable suggestions
- Connect everything back to their original question
+- Keep explanations beginner-friendly but not condescending
- Use natural language, not structured formats
+- Connect information to their question directly
 - Break complex topics into understandable pieces
 """
        return prompt
@ -209,12 +219,16 @@ Guidelines:
    def _synthesize_with_context(self, prompt: str, results: List[Any]) -> SynthesisResult:
        """Synthesize results with full context and thinking."""
        try:
-            # Use streaming with thinking visible (don't collapse)
+            # TEMPORARILY: Use simple non-streaming call to avoid flow issues
-            response = self.synthesizer._call_ollama(prompt, temperature=0.2, disable_thinking=False, use_streaming=True, collapse_thinking=False)
+            # TODO: Re-enable streaming once flow is stable
            response = self.synthesizer._call_ollama(prompt, temperature=0.2, disable_thinking=False)
            thinking_stream = ""
-            # Streaming already shows thinking and response
+            # Display simple thinking indicator
-            # No need for additional indicators
+            if response and len(response) > 200:
                print("\n💭 Analysis in progress...")
            # Don't display thinking stream again - keeping it simple for now
            if not response:
                return SynthesisResult(
@ -225,14 +239,40 @@ Guidelines:
                    confidence=0.0
                )
-            # Use natural language response directly
+            # Parse the structured response
-            return SynthesisResult(
+            try:
-                summary=response.strip(),
+                # Extract JSON from response
-                key_points=[],  # Not used with natural language responses
+                start_idx = response.find('{')
-                code_examples=[],  # Not used with natural language responses
+                end_idx = response.rfind('}') + 1
-                suggested_actions=[],  # Not used with natural language responses
+                if start_idx >= 0 and end_idx > start_idx:
-                confidence=0.85  # High confidence for natural responses
+                    json_str = response[start_idx:end_idx]
-            )
+                    data = json.loads(json_str)
                    return SynthesisResult(
                        summary=data.get('summary', 'Analysis completed'),
                        key_points=data.get('key_points', []),
                        code_examples=data.get('code_examples', []),
                        suggested_actions=data.get('suggested_actions', []),
                        confidence=float(data.get('confidence', 0.7))
                    )
                else:
                    # Fallback: use raw response as summary
                    return SynthesisResult(
                        summary=response[:400] + '...' if len(response) > 400 else response,
                        key_points=[],
                        code_examples=[],
                        suggested_actions=[],
                        confidence=0.5
                    )
            except json.JSONDecodeError:
                return SynthesisResult(
                    summary="Analysis completed but format parsing failed",
                    key_points=[],
                    code_examples=[],
                    suggested_actions=["Try rephrasing your question"],
                    confidence=0.3
                )
        except Exception as e:
            logger.error(f"Context synthesis failed: {e}")
@ -260,12 +300,29 @@ Guidelines:
        output.append("=" * 60)
        output.append("")
-        # Response was already displayed via streaming
+        # Main analysis
-        # Just show completion status
+        output.append(f"📝 Analysis:")
-        output.append("✅ Analysis complete")
+        output.append(f"   {synthesis.summary}")
        output.append("")
        output.append("")
        if synthesis.key_points:
            output.append("🔍 Key Insights:")
            for point in synthesis.key_points:
                output.append(f"   • {point}")
            output.append("")
        if synthesis.code_examples:
            output.append("💡 Code Examples:")
            for example in synthesis.code_examples:
                output.append(f"   {example}")
            output.append("")
        if synthesis.suggested_actions:
            output.append("🎯 Next Steps:")
            for action in synthesis.suggested_actions:
                output.append(f"   • {action}")
            output.append("")
        # Confidence and context indicator
        confidence_emoji = "🟢" if synthesis.confidence > 0.7 else "🟡" if synthesis.confidence > 0.4 else "🔴"
        context_indicator = f" | Context: {exchange_count-1} previous questions" if exchange_count > 1 else ""
@ -408,7 +465,7 @@ Guidelines:
                    "temperature": temperature,
                    "top_p": optimal_params.get("top_p", 0.9),
                    "top_k": optimal_params.get("top_k", 40),
-                    "num_ctx": self.synthesizer._get_optimal_context_size(model_to_use),
+                    "num_ctx": optimal_params.get("num_ctx", 32768),
                    "num_predict": optimal_params.get("num_predict", 2000),
                    "repeat_penalty": optimal_params.get("repeat_penalty", 1.1),
                    "presence_penalty": optimal_params.get("presence_penalty", 1.0)
--- a/mini_rag/llm_safeguards.py
+++ b/mini_rag/llm_safeguards.py
@ -195,7 +195,7 @@ class ModelRunawayDetector:
 • Try a more specific question
 • Break complex questions into smaller parts
 • Use exploration mode which handles context better: `rag-mini explore`
-• Consider: A larger model (qwen3:1.7b or qwen3:4b) would help"""
+• Consider: A larger model (qwen3:1.7b or qwen3:3b) would help"""
    def _explain_thinking_loop(self) -> str:
        return """🧠 The AI got caught in a "thinking loop" - overthinking the response.
@ -266,7 +266,7 @@ class ModelRunawayDetector:
        # Universal suggestions
        suggestions.extend([
-            "Consider using a larger model if available (qwen3:1.7b or qwen3:4b)",
+            "Consider using a larger model if available (qwen3:1.7b or qwen3:3b)",
            "Check model status: `ollama list`"
        ])
--- a/mini_rag/llm_synthesizer.py
+++ b/mini_rag/llm_synthesizer.py
@ -72,8 +72,8 @@ class LLMSynthesizer:
        else:
            # Fallback rankings if no config
            model_rankings = [
-                "qwen3:1.7b", "qwen3:0.6b", "qwen3:4b", "qwen2.5:3b", 
+                "qwen3:1.7b", "qwen3:0.6b", "qwen3:4b", "llama3.2:1b", 
-                "qwen2.5:1.5b", "qwen2.5-coder:1.5b"
+                "qwen2.5:1.5b", "qwen3:3b", "qwen2.5-coder:1.5b"
            ]
        # Find first available model from our ranked list (exact matches first)
@ -114,57 +114,12 @@ class LLMSynthesizer:
        self._initialized = True
    def _get_optimal_context_size(self, model_name: str) -> int:
        """Get optimal context size based on model capabilities and configuration."""
        # Get configured context window
        if self.config and hasattr(self.config, 'llm'):
            configured_context = self.config.llm.context_window
            auto_context = getattr(self.config.llm, 'auto_context', True)
        else:
            configured_context = 16384  # Default to 16K
            auto_context = True
        # Model-specific maximum context windows (based on research)
        model_limits = {
            # Qwen3 models with native context support
            'qwen3:0.6b': 32768,    # 32K native
            'qwen3:1.7b': 32768,    # 32K native  
            'qwen3:4b': 131072,     # 131K with YaRN extension
            # Qwen2.5 models
            'qwen2.5:1.5b': 32768,  # 32K native
            'qwen2.5:3b': 32768,    # 32K native
            'qwen2.5-coder:1.5b': 32768,  # 32K native
            # Fallback for unknown models
            'default': 8192
        }
        # Find model limit (check for partial matches)
        model_limit = model_limits.get('default', 8192)
        for model_pattern, limit in model_limits.items():
            if model_pattern != 'default' and model_pattern.lower() in model_name.lower():
                model_limit = limit
                break
        # If auto_context is enabled, respect model limits
        if auto_context:
            optimal_context = min(configured_context, model_limit)
        else:
            optimal_context = configured_context
        # Ensure minimum usable context for RAG
        optimal_context = max(optimal_context, 4096)  # Minimum 4K for basic RAG
        logger.debug(f"Context for {model_name}: {optimal_context} tokens (configured: {configured_context}, limit: {model_limit})")
        return optimal_context
    def is_available(self) -> bool:
        """Check if Ollama is available and has models."""
        self._ensure_initialized()
        return len(self.available_models) > 0
-    def _call_ollama(self, prompt: str, temperature: float = 0.3, disable_thinking: bool = False, use_streaming: bool = True, collapse_thinking: bool = True) -> Optional[str]:
+    def _call_ollama(self, prompt: str, temperature: float = 0.3, disable_thinking: bool = False, use_streaming: bool = False) -> Optional[str]:
        """Make a call to Ollama API with safeguards."""
        start_time = time.time()
@ -219,16 +174,16 @@ class LLMSynthesizer:
                    "temperature": qwen3_temp,
                    "top_p": qwen3_top_p,
                    "top_k": qwen3_top_k,
-                    "num_ctx": self._get_optimal_context_size(model_to_use),  # Dynamic context based on model and config
+                    "num_ctx": 32000,  # Critical: Qwen3 context length (32K token limit)
                    "num_predict": optimal_params.get("num_predict", 2000),
                    "repeat_penalty": optimal_params.get("repeat_penalty", 1.1),
                    "presence_penalty": qwen3_presence
                }
            }
-            # Handle streaming with thinking display
+            # Handle streaming with early stopping
            if use_streaming:
-                return self._handle_streaming_with_thinking_display(payload, model_to_use, use_thinking, start_time, collapse_thinking)
+                return self._handle_streaming_with_early_stop(payload, model_to_use, use_thinking, start_time)
            response = requests.post(
                f"{self.ollama_url}/api/generate",
@ -329,130 +284,6 @@ This is normal with smaller AI models and helps ensure you get quality responses
 This is normal with smaller AI models and helps ensure you get quality responses."""
    def _handle_streaming_with_thinking_display(self, payload: dict, model_name: str, use_thinking: bool, start_time: float, collapse_thinking: bool = True) -> Optional[str]:
        """Handle streaming response with real-time thinking token display."""
        import json
        import sys
        try:
            response = requests.post(
                f"{self.ollama_url}/api/generate",
                json=payload,
                stream=True,
                timeout=65
            )
            if response.status_code != 200:
                logger.error(f"Ollama API error: {response.status_code}")
                return None
            full_response = ""
            thinking_content = ""
            is_in_thinking = False
            is_thinking_complete = False
            thinking_lines_printed = 0
            # ANSI escape codes for colors and cursor control
            GRAY = '\033[90m'      # Dark gray for thinking
            LIGHT_GRAY = '\033[37m'  # Light gray alternative
            RESET = '\033[0m'      # Reset color
            CLEAR_LINE = '\033[2K' # Clear entire line
            CURSOR_UP = '\033[A'   # Move cursor up one line
            print(f"\n💭 {GRAY}Thinking...{RESET}", flush=True)
            for line in response.iter_lines():
                if line:
                    try:
                        chunk_data = json.loads(line.decode('utf-8'))
                        chunk_text = chunk_data.get('response', '')
                        if chunk_text:
                            full_response += chunk_text
                            # Handle thinking tokens
                            if use_thinking and '<think>' in chunk_text:
                                is_in_thinking = True
                                chunk_text = chunk_text.replace('<think>', '')
                            if is_in_thinking and '</think>' in chunk_text:
                                is_in_thinking = False
                                is_thinking_complete = True
                                chunk_text = chunk_text.replace('</think>', '')
                                if collapse_thinking:
                                    # Clear thinking content and show completion
                                    # Move cursor up to clear thinking lines
                                    for _ in range(thinking_lines_printed + 1):
                                        print(f"{CURSOR_UP}{CLEAR_LINE}", end='', flush=True)
                                    print(f"💭 {GRAY}Thinking complete ✓{RESET}", flush=True)
                                    thinking_lines_printed = 0
                                else:
                                    # Keep thinking visible, just show completion
                                    print(f"\n💭 {GRAY}Thinking complete ✓{RESET}", flush=True)
                                print("🤖 AI Response:", flush=True)
                                continue
                            # Display thinking content in gray with better formatting
                            if is_in_thinking and chunk_text.strip():
                                thinking_content += chunk_text
                                # Handle line breaks and word wrapping properly
                                if ' ' in chunk_text or '\n' in chunk_text or len(thinking_content) > 100:
                                    # Split by sentences for better readability
                                    sentences = thinking_content.replace('\n', ' ').split('. ')
                                    for sentence in sentences[:-1]:  # Process complete sentences
                                        sentence = sentence.strip()
                                        if sentence:
                                            # Word wrap long sentences
                                            words = sentence.split()
                                            line = ""
                                            for word in words:
                                                if len(line + " " + word) > 70:
                                                    if line:
                                                        print(f"{GRAY}   {line.strip()}{RESET}", flush=True)
                                                        thinking_lines_printed += 1
                                                    line = word
                                                else:
                                                    line += " " + word if line else word
                                            if line.strip():
                                                print(f"{GRAY}   {line.strip()}.{RESET}", flush=True)
                                                thinking_lines_printed += 1
                                    # Keep the last incomplete sentence for next iteration
                                    thinking_content = sentences[-1] if sentences else ""
                            # Display regular response content (skip any leftover thinking)
                            elif not is_in_thinking and is_thinking_complete and chunk_text.strip():
                                # Filter out any remaining thinking tags that might leak through
                                clean_text = chunk_text
                                if '<think>' in clean_text or '</think>' in clean_text:
                                    clean_text = clean_text.replace('<think>', '').replace('</think>', '')
                                if clean_text.strip():
                                    print(clean_text, end='', flush=True)
                        # Check if response is done
                        if chunk_data.get('done', False):
                            print()  # Final newline
                            break
                    except json.JSONDecodeError:
                        continue
                    except Exception as e:
                        logger.error(f"Error processing stream chunk: {e}")
                        continue
            return full_response
        except Exception as e:
            logger.error(f"Streaming failed: {e}")
            return None
    def _handle_streaming_with_early_stop(self, payload: dict, model_name: str, use_thinking: bool, start_time: float) -> Optional[str]:
        """Handle streaming response with intelligent early stopping."""
        import json
--- a/mini_rag/query_expander.py
+++ b/mini_rag/query_expander.py
@ -170,8 +170,8 @@ Expanded query:"""
                # Use same model rankings as main synthesizer for consistency
                expansion_preferences = [
-                    "qwen3:1.7b", "qwen3:0.6b", "qwen3:4b", "qwen2.5:3b", 
+                    "qwen3:1.7b", "qwen3:0.6b", "qwen3:4b", "llama3.2:1b", 
-                    "qwen2.5:1.5b", "qwen2.5-coder:1.5b"
+                    "qwen2.5:1.5b", "qwen3:3b", "qwen2.5-coder:1.5b"
                ]
                for preferred in expansion_preferences:
--- a/rag-mini.py
+++ b/rag-mini.py
@ -142,8 +142,8 @@ def search_project(project_path: Path, query: str, top_k: int = 10, synthesize:
            print("   • Search for file types: \"python class\" or \"javascript function\"")
            print()
            print("⚙️ Configuration adjustments:")
-            print(f"   • Lower threshold: ./rag-mini search \"{project_path}\" \"{query}\" --threshold 0.05")
+            print(f"   • Lower threshold: ./rag-mini search {project_path} \"{query}\" --threshold 0.05")
-            print(f"   • More results: ./rag-mini search \"{project_path}\" \"{query}\" --top-k 20")
+            print("   • More results: add --top-k 20")
            print()
            print("📚 Need help? See: docs/TROUBLESHOOTING.md")
            return
@ -201,7 +201,7 @@ def search_project(project_path: Path, query: str, top_k: int = 10, synthesize:
            else:
                print("❌ LLM synthesis unavailable")
                print("   • Ensure Ollama is running: ollama serve")
-                print("   • Install a model: ollama pull qwen3:1.7b")
+                print("   • Install a model: ollama pull llama3.2")
                print("   • Check connection to http://localhost:11434")
        # Save last search for potential enhancements
@ -317,27 +317,12 @@ def explore_interactive(project_path: Path):
        if not explorer.start_exploration_session():
            sys.exit(1)
        # Show enhanced first-time guidance
        print(f"\n🤔 Ask your first question about {project_path.name}:")
        print()
        print("💡 Enter your search query or question below:")
        print('   Examples: "How does authentication work?" or "Show me error handling"')
        print()
        print("🔧 Quick options:")
        print("   1. Help - Show example questions")
        print("   2. Status - Project information")  
        print("   3. Suggest - Get a random starter question")
        print()
        is_first_question = True
        while True:
            try:
-                # Get user input with clearer prompt
+                # Get user input
-                if is_first_question:
+                question = input("\n> ").strip()
                    question = input("📝 Enter question or option (1-3): ").strip()
                else:
                    question = input("\n> ").strip()
                # Handle exit commands
                if question.lower() in ['quit', 'exit', 'q']:
@ -346,17 +331,14 @@ def explore_interactive(project_path: Path):
                # Handle empty input
                if not question:
-                    if is_first_question:
+                    print("Please enter a question or 'quit' to exit.")
                        print("Please enter a question or try option 3 for a suggestion.")
                    else:
                        print("Please enter a question or 'quit' to exit.")
                    continue
-                # Handle numbered options and special commands
+                # Special commands
-                if question in ['1'] or question.lower() in ['help', 'h']:
+                if question.lower() in ['help', 'h']:
                    print("""
 🧠 EXPLORATION MODE HELP:
-  • Ask any question about your documents or code
+  • Ask any question about the codebase
  • I remember our conversation for follow-up questions
  • Use 'why', 'how', 'explain' for detailed reasoning
  • Type 'summary' to see session overview
@ -364,54 +346,12 @@ def explore_interactive(project_path: Path):
 💡 Example questions:
  • "How does authentication work?"
  • "What are the main components?"
  • "Show me error handling patterns"
  • "Why is this function slow?"
-  • "What security measures are in place?"
+  • "Explain the database connection logic"
-  • "How does data flow through this system?"
+  • "What are the security concerns here?"
 """)
                    continue
                elif question in ['2'] or question.lower() == 'status':
                    print(f"""
 📊 PROJECT STATUS: {project_path.name}
  • Location: {project_path}
  • Exploration session active
  • AI model ready for questions
  • Conversation memory enabled
 """)
                    continue
                elif question in ['3'] or question.lower() == 'suggest':
                    # Random starter questions for first-time users
                    if is_first_question:
                        import random
                        starters = [
                            "What are the main components of this project?",
                            "How is error handling implemented?", 
                            "Show me the authentication and security logic",
                            "What are the key functions I should understand first?",
                            "How does data flow through this system?",
                            "What configuration options are available?",
                            "Show me the most important files to understand"
                        ]
                        suggested = random.choice(starters)
                        print(f"\n💡 Suggested question: {suggested}")
                        print("   Press Enter to use this, or type your own question:")
                        next_input = input("📝 > ").strip()
                        if not next_input:  # User pressed Enter to use suggestion
                            question = suggested
                        else:
                            question = next_input
                    else:
                        # For subsequent questions, could add AI-powered suggestions here
                        print("\n💡 Based on our conversation, you might want to ask:")
                        print('   "Can you explain that in more detail?"')
                        print('   "What are the security implications?"')
                        print('   "Show me related code examples"')
                        continue
                if question.lower() == 'summary':
                    print("\n" + explorer.get_session_summary())
                    continue
@ -421,9 +361,6 @@ def explore_interactive(project_path: Path):
                print("🧠 Thinking with AI model...")
                response = explorer.explore_question(question)
                # Mark as no longer first question after processing
                is_first_question = False
                if response:
                    print(f"\n{response}")
                else:
--- a/rag-tui.py
+++ b/rag-tui.py
--- a/rag.bat
+++ b/rag.bat
@ -1,51 +0,0 @@
@echo off
 REM FSS-Mini-RAG Windows Launcher - Simple and Reliable
 setlocal
 set "SCRIPT_DIR=%~dp0"
 set "SCRIPT_DIR=%SCRIPT_DIR:~0,-1%"
 set "VENV_PYTHON=%SCRIPT_DIR%\.venv\Scripts\python.exe"
 REM Check if virtual environment exists
 if not exist "%VENV_PYTHON%" (
    echo Virtual environment not found!
    echo.
    echo Run this first: install_windows.bat
    echo.
    pause
    exit /b 1
 )
 REM Route commands
 if "%1"=="" goto :interactive
 if "%1"=="help" goto :help
 if "%1"=="--help" goto :help
 if "%1"=="-h" goto :help
 REM Pass all arguments to Python script
 "%VENV_PYTHON%" "%SCRIPT_DIR%\rag-mini.py" %*
 goto :end
 :interactive
 echo Starting interactive interface...
 "%VENV_PYTHON%" "%SCRIPT_DIR%\rag-tui.py"
 goto :end
 :help
 echo FSS-Mini-RAG - Semantic Code Search
 echo.
 echo Usage:
 echo   rag.bat                           - Interactive interface
 echo   rag.bat index ^<folder^>             - Index a project
 echo   rag.bat search ^<folder^> ^<query^>     - Search project
 echo   rag.bat status ^<folder^>            - Check status
 echo.
 echo Examples:
 echo   rag.bat index C:\myproject
 echo   rag.bat search C:\myproject "authentication"
 echo   rag.bat search . "error handling"
 echo.
 pause
 :end
 endlocal