25 changed files with 257 additions and 2585 deletions
--- a/.gitignore
+++ b/.gitignore
@ -41,14 +41,10 @@ Thumbs.db

 # RAG system specific
 .claude-rag/
-.mini-rag/
 *.lance/
 *.db
 manifest.json

-# Claude Code specific
-.claude/
-
 # Logs and temporary files
 *.log
 *.tmp
--- a/PR_BODY.md
+++ b/PR_BODY.md
@ -1,109 +0,0 @@
-## Problem Statement
-
-Currently, FSS-Mini-RAG uses Ollama's default context window settings, which severely limits performance:
-
- **Default 2048 tokens** is inadequate for RAG applications
- Users can't configure context window for their hardware/use case
- No guidance on optimal context sizes for different models
- Inconsistent context handling across the codebase
- New users don't understand context window importance
-
-## Impact on User Experience
-
-**With 2048 token context window:**
- Only 1-2 responses possible before context truncation
- Thinking tokens consume significant context space
- Poor performance with larger document chunks
- Frustrated users who don't understand why responses degrade
-
-**With proper context configuration:**
- 5-15+ responses in exploration mode
- Support for advanced use cases (15+ results, 4000+ character chunks)
- Better coding assistance and analysis
- Professional-grade RAG experience
-
-## Solution Implemented
-
-### 1. Enhanced Model Configuration Menu
-Added context window selection alongside model selection with:
- **Development**: 8K tokens (fast, good for most cases)
- **Production**: 16K tokens (balanced performance)  
- **Advanced**: 32K+ tokens (heavy development work)
-
-### 2. Educational Content
-Helps users understand:
- Why context window size matters for RAG
- Hardware implications of larger contexts
- Optimal settings for their use case
- Model-specific context capabilities
-
-### 3. Consistent Implementation
- Updated all Ollama API calls to use consistent context settings
- Ensured configuration applies across synthesis, expansion, and exploration
- Added validation for context sizes against model capabilities
- Provided clear error messages for invalid configurations
-
-## Technical Implementation
-
-Based on comprehensive research findings:
-
-### Model Context Capabilities
- **qwen3:0.6b/1.7b**: 32K token maximum
- **qwen3:4b**: 131K token maximum (YaRN extended)
-
-### Recommended Context Sizes
-```yaml
-# Conservative (fast, low memory)
-num_ctx: 8192    # ~6MB memory, excellent for exploration
-
-# Balanced (recommended for most users)  
-num_ctx: 16384   # ~12MB memory, handles complex analysis
-
-# Advanced (heavy development work)
-num_ctx: 32768   # ~24MB memory, supports large codebases
-```
-
-### Configuration Integration
- Added context window selection to TUI configuration menu
- Updated config.yaml schema with context parameters
- Implemented validation for model-specific limits
- Provided migration for existing configurations
-
-## Benefits
-
-1. **Improved User Experience**
-   - Longer conversation sessions
-   - Better analysis quality
-   - Clear performance expectations
-
-2. **Professional RAG Capability**
-   - Support for enterprise-scale projects
-   - Handles large codebases effectively
-   - Enables advanced use cases
-
-3. **Educational Value**
-   - Users learn about context windows
-   - Better understanding of RAG performance
-   - Informed decision making
-
-## Files Changed
-
- `mini_rag/config.py`: Added context window configuration parameters
- `mini_rag/llm_synthesizer.py`: Dynamic context sizing with model awareness
- `mini_rag/explorer.py`: Consistent context application
- `rag-tui.py`: Enhanced configuration menu with context selection
- `PR_DRAFT.md`: Documentation of implementation approach
-
-## Testing Recommendations
-
-1. Test context configuration menu with different models
-2. Verify context limits are enforced correctly
-3. Test conversation length with different context sizes
-4. Validate memory usage estimates
-5. Test advanced use cases (15+ results, large chunks)
-
---
-
-**This PR significantly improves FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**
-
-**Ready for review and testing!** 🚀
--- a/PR_DRAFT.md
+++ b/PR_DRAFT.md
@ -1,135 +0,0 @@
-# Add Context Window Configuration for Optimal RAG Performance
-
-## Problem Statement
-
-Currently, FSS-Mini-RAG uses Ollama's default context window settings, which severely limits performance:
-
- **Default 2048 tokens** is inadequate for RAG applications
- Users can't configure context window for their hardware/use case
- No guidance on optimal context sizes for different models
- Inconsistent context handling across the codebase
- New users don't understand context window importance
-
-## Impact on User Experience
-
-**With 2048 token context window:**
- Only 1-2 responses possible before context truncation
- Thinking tokens consume significant context space
- Poor performance with larger document chunks
- Frustrated users who don't understand why responses degrade
-
-**With proper context configuration:**
- 5-15+ responses in exploration mode
- Support for advanced use cases (15+ results, 4000+ character chunks)
- Better coding assistance and analysis
- Professional-grade RAG experience
-
-## Proposed Solution
-
-### 1. Enhanced Model Configuration Menu
-Add context window selection alongside model selection with:
- **Development**: 8K tokens (fast, good for most cases)
- **Production**: 16K tokens (balanced performance)  
- **Advanced**: 32K+ tokens (heavy development work)
-
-### 2. Educational Content
-Help users understand:
- Why context window size matters for RAG
- Hardware implications of larger contexts
- Optimal settings for their use case
- Model-specific context capabilities
-
-### 3. Consistent Implementation
- Update all Ollama API calls to use consistent context settings
- Ensure configuration applies across synthesis, expansion, and exploration
- Validate context sizes against model capabilities
- Provide clear error messages for invalid configurations
-
-## Technical Implementation
-
-Based on research findings:
-
-### Model Context Capabilities
- **qwen3:0.6b/1.7b**: 32K token maximum
- **qwen3:4b**: 131K token maximum (YaRN extended)
-
-### Recommended Context Sizes
-```yaml
-# Conservative (fast, low memory)
-num_ctx: 8192    # ~6MB memory, excellent for exploration
-
-# Balanced (recommended for most users)  
-num_ctx: 16384   # ~12MB memory, handles complex analysis
-
-# Advanced (heavy development work)
-num_ctx: 32768   # ~24MB memory, supports large codebases
-```
-
-### Configuration Integration
- Add context window selection to TUI configuration menu
- Update config.yaml schema with context parameters
- Implement validation for model-specific limits
- Provide migration for existing configurations
-
-## Benefits
-
-1. **Improved User Experience**
-   - Longer conversation sessions
-   - Better analysis quality
-   - Clear performance expectations
-
-2. **Professional RAG Capability**
-   - Support for enterprise-scale projects
-   - Handles large codebases effectively
-   - Enables advanced use cases
-
-3. **Educational Value**
-   - Users learn about context windows
-   - Better understanding of RAG performance
-   - Informed decision making
-
-## Implementation Plan
-
-1. **Phase 1**: Research Ollama context handling (✅ Complete)
-2. **Phase 2**: Update configuration system (✅ Complete)
-3. **Phase 3**: Enhance TUI with context selection (✅ Complete)
-4. **Phase 4**: Update all API calls consistently (✅ Complete)
-5. **Phase 5**: Add documentation and validation (✅ Complete)
-
-## Implementation Details
-
-### Configuration System
- Added `context_window` and `auto_context` to LLMConfig
- Default 16K context (vs problematic 2K default)
- Model-specific validation and limits
- YAML output includes helpful context explanations
-
-### TUI Enhancement
- New "Configure context window" menu option
- Educational content about context importance
- Three presets: Development (8K), Production (16K), Advanced (32K)
- Custom size entry with validation
- Memory usage estimates for each option
-
-### API Consistency
- Dynamic context sizing via `_get_optimal_context_size()`
- Model capability awareness (qwen3:4b = 131K, others = 32K)
- Applied consistently to synthesizer and explorer
- Automatic capping at model limits
-
-### User Education
- Clear explanations of why context matters for RAG
- Memory usage implications (8K = 6MB, 16K = 12MB, 32K = 24MB)
- Advanced use case guidance (15+ results, 4000+ chunks)
- Performance vs quality tradeoffs
-
-## Answers to Review Questions
-
-1. ✅ **Auto-detection**: Implemented via `auto_context` flag that respects model limits
-2. ✅ **Model changes**: Dynamic validation against current model capabilities  
-3. ✅ **Scope**: Global configuration with per-model validation
-4. ✅ **Validation**: Comprehensive validation with clear error messages and guidance
-
---
-
-**This PR will significantly improve FSS-Mini-RAG's performance and user experience by properly configuring one of the most critical parameters for RAG systems.**
--- a/README.md
+++ b/README.md
@ -12,40 +12,19 @@
 ## How It Works

 ```mermaid
-flowchart TD
-    Start([🚀 Start FSS-Mini-RAG]) --> Interface{Choose Interface}
+graph LR
+    Files[📁 Your Code/Documents] --> Index[🔍 Index]
+    Index --> Chunks[✂️ Smart Chunks]
+    Chunks --> Embeddings[🧠 Semantic Vectors]
+    Embeddings --> Database[(💾 Vector DB)]
    
-    Interface -->|Beginners| TUI[🖥️ Interactive TUI<br/>./rag-tui]
-    Interface -->|Power Users| CLI[⚡ Advanced CLI<br/>./rag-mini <command>]
+    Query[❓ user auth] --> Search[🎯 Hybrid Search]
+    Database --> Search
+    Search --> Results[📋 Ranked Results]
    
-    TUI --> SelectFolder[📁 Select Folder to Index]
-    CLI --> SelectFolder
-    
-    SelectFolder --> Index[🔍 Index Documents<br/>Creates searchable database]
-    
-    Index --> Ready{📚 Ready to Search}
-    
-    Ready -->|Quick Answers| Search[🔍 Search Mode<br/>Fast semantic search]
-    Ready -->|Deep Analysis| Explore[🧠 Explore Mode<br/>AI-powered analysis]
-    
-    Search --> SearchResults[📋 Instant Results<br/>Ranked by relevance]
-    Explore --> ExploreResults[💬 AI Conversation<br/>Context + reasoning]
-    
-    SearchResults --> More{Want More?}
-    ExploreResults --> More
-    
-    More -->|Different Query| Ready
-    More -->|Advanced Features| CLI
-    More -->|Done| End([✅ Success!])
-    
-    CLI -.->|Full Power| AdvancedFeatures[⚡ Advanced Features:<br/>• Batch processing<br/>• Custom parameters<br/>• Automation scripts<br/>• Background server]
-    
-    style Start fill:#e8f5e8,stroke:#4caf50,stroke-width:2px
-    style CLI fill:#fff9c4,stroke:#f57c00,stroke-width:3px
-    style AdvancedFeatures fill:#fff9c4,stroke:#f57c00,stroke-width:2px
-    style Search fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
-    style Explore fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
-    style End fill:#e8f5e8,stroke:#4caf50,stroke-width:2px
+    style Files fill:#e3f2fd
+    style Results fill:#e8f5e8
+    style Database fill:#fff3e0
 ```

 ## What This Is
@ -79,7 +58,6 @@ FSS-Mini-RAG offers **two distinct experiences** optimized for different use cas

 ## Quick Start (2 Minutes)

-**Linux/macOS:**
 ```bash
 # 1. Install everything
 ./install_mini_rag.sh
@ -92,19 +70,6 @@ FSS-Mini-RAG offers **two distinct experiences** optimized for different use cas
 ./rag-mini explore ~/my-project   # Interactive exploration
 ```

-**Windows:**
-```cmd
-# 1. Install everything
-install_windows.bat
-
-# 2. Choose your interface
-rag.bat                           # Interactive interface
-# OR choose your mode:
-rag.bat index C:\my-project       # Index your project first
-rag.bat search C:\my-project "query"  # Fast search
-rag.bat explore C:\my-project     # Interactive exploration
-```
-
 That's it. No external dependencies, no configuration required, no PhD in computer science needed.

 ## What Makes This Different
@ -154,22 +119,12 @@ That's it. No external dependencies, no configuration required, no PhD in comput
 ## Installation Options

 ### Recommended: Full Installation
-
-**Linux/macOS:**
 ```bash
 ./install_mini_rag.sh
 # Handles Python setup, dependencies, optional AI models
 ```

-**Windows:**
-```cmd
-install_windows.bat
-# Handles Python setup, dependencies, works reliably
-```
-
 ### Experimental: Copy & Run (May Not Work)
-
-**Linux/macOS:**
 ```bash
 # Copy folder anywhere and try to run directly
 ./rag-mini index ~/my-project
@ -177,30 +132,13 @@ install_windows.bat
 # Falls back with clear instructions if it fails
 ```

-**Windows:**
-```cmd
-# Copy folder anywhere and try to run directly
-rag.bat index C:\my-project
-# Auto-setup will attempt to create environment
-# Falls back with clear instructions if it fails
-```
-
 ### Manual Setup
-
-**Linux/macOS:**
 ```bash
 python3 -m venv .venv
 source .venv/bin/activate
 pip install -r requirements.txt
 ```

-**Windows:**
-```cmd
-python -m venv .venv
-.venv\Scripts\activate.bat
-pip install -r requirements.txt
-```
-
 **Note**: The experimental copy & run feature is provided for convenience but may fail on some systems. If you encounter issues, use the full installer for reliable setup.

 ## System Requirements
@ -228,7 +166,7 @@ This implementation prioritizes:

 ## Next Steps

- **New users**: Run `./rag-mini` (Linux/macOS) or `rag.bat` (Windows) for guided experience
+- **New users**: Run `./rag-mini` for guided experience
 - **Developers**: Read [`TECHNICAL_GUIDE.md`](docs/TECHNICAL_GUIDE.md) for implementation details
 - **Contributors**: See [`CONTRIBUTING.md`](CONTRIBUTING.md) for development setup

--- a/commit_message.txt
+++ b/commit_message.txt
@ -1,36 +0,0 @@
-feat: Add comprehensive Windows compatibility and enhanced LLM model setup
-
-🚀 Major cross-platform enhancement making FSS-Mini-RAG fully Windows and Linux compatible
-
-## Windows Compatibility
- **New Windows installer**: `install_windows.bat` - rock-solid, no-hang installation
- **Simple Windows launcher**: `rag.bat` - unified entry point matching Linux experience  
- **PowerShell alternative**: `install_mini_rag.ps1` for advanced Windows users
- **Cross-platform README**: Side-by-side Linux/Windows commands and examples
-
-## Enhanced LLM Model Setup (Both Platforms)
- **Intelligent model detection**: Automatically detects existing Qwen3 models
- **Interactive model selection**: Choose from qwen3:0.6b, 1.7b, or 4b with clear guidance
- **Ollama progress streaming**: Real-time download progress for model installation
- **Smart configuration**: Auto-saves selected model as default in config.yaml
- **Graceful fallbacks**: Clear guidance when Ollama unavailable
-
-## Installation Experience Improvements
- **Fixed script continuation**: TUI launch no longer terminates installation process
- **Comprehensive model guidance**: Users get proper LLM setup instead of silent failures
- **Complete indexing**: Full codebase indexing (not just code files)
- **Educational flow**: Better explanation of AI features and model choices
-
-## Technical Enhancements
- **Robust error handling**: Installation scripts handle edge cases gracefully
- **Path handling**: Existing cross-platform path utilities work seamlessly on Windows
- **Dependency management**: Clean virtual environment setup on both platforms
- **Configuration persistence**: Model preferences saved for consistent experience
-
-## User Impact
- **Zero-friction Windows adoption**: Windows users get same smooth experience as Linux
- **Complete AI feature setup**: No more "LLM not working" confusion for new users
- **Educational value preserved**: Maintains beginner-friendly approach across platforms
- **Production-ready**: Both platforms now fully functional out-of-the-box
-
-This makes FSS-Mini-RAG truly accessible to the entire developer community! 🎉
--- a/docs/BEGINNER_GLOSSARY.md
+++ b/docs/BEGINNER_GLOSSARY.md
@ -117,7 +117,7 @@ def login_user(email, password):

 **Models you might see:**
 - **qwen3:0.6b** - Ultra-fast, good for most questions
- **qwen3:4b** - Slower but more detailed
+- **llama3.2** - Slower but more detailed
 - **auto** - Picks the best available model

 ---
--- a/docs/CPU_DEPLOYMENT.md
+++ b/docs/CPU_DEPLOYMENT.md
@ -49,7 +49,7 @@ ollama run qwen3:0.6b "Hello, can you expand this query: authentication"
 |-------|------|-----------|---------|
 | qwen3:0.6b | 522MB | Fast ⚡ | Excellent ✅ |
 | qwen3:1.7b | 1.4GB | Medium | Excellent ✅ |
-| qwen3:4b | 2.5GB | Slow | Excellent ✅ |
+| qwen3:3b | 2.0GB | Slow | Excellent ✅ |

 ## CPU-Optimized Configuration

--- a/docs/LLM_PROVIDERS.md
+++ b/docs/LLM_PROVIDERS.md
@ -22,8 +22,8 @@ This guide shows how to configure FSS-Mini-RAG with different LLM providers for
 llm:
  provider: ollama
  ollama_host: localhost:11434
-  synthesis_model: qwen3:1.7b
-  expansion_model: qwen3:1.7b
+  synthesis_model: llama3.2
+  expansion_model: llama3.2
  enable_synthesis: false
  synthesis_temperature: 0.3
  cpu_optimized: true
@ -33,13 +33,13 @@ llm:
 **Setup:**
 1. Install Ollama: `curl -fsSL https://ollama.ai/install.sh | sh`
 2. Start service: `ollama serve`
-3. Download model: `ollama pull qwen3:1.7b`
+3. Download model: `ollama pull llama3.2`
 4. Test: `./rag-mini search /path/to/project "test" --synthesize`

 **Recommended Models:**
 - `qwen3:0.6b` - Ultra-fast, good for CPU-only systems
- `qwen3:1.7b` - Balanced quality and speed (recommended)
- `qwen3:4b` - Higher quality, excellent for most use cases
+- `llama3.2` - Balanced quality and speed  
+- `llama3.1:8b` - Higher quality, needs more RAM

 ### LM Studio

--- a/docs/QUERY_EXPANSION.md
+++ b/docs/QUERY_EXPANSION.md
@ -34,24 +34,7 @@ graph LR

 ## Configuration

-### Easy Configuration (TUI)
-
-Use the interactive Configuration Manager in the TUI:
-
-1. **Start TUI**: `./rag-tui` or `rag.bat` (Windows)
-2. **Select Option 6**: Configuration Manager
-3. **Choose Option 2**: Toggle query expansion
-4. **Follow prompts**: Get explanation and easy on/off toggle
-
-The TUI will:
- Explain benefits and requirements clearly
- Check if Ollama is available
- Show current status (enabled/disabled)
- Save changes automatically
-
-### Manual Configuration (Advanced)
-
-Edit `config.yaml` directly:
+Edit `config.yaml`:

 ```yaml
 # Search behavior settings
--- a/docs/TROUBLESHOOTING.md
+++ b/docs/TROUBLESHOOTING.md
@ -143,8 +143,8 @@ python3 -c "import mini_rag; print('✅ Installation successful')"

 2. **Install a model:**
   ```bash
-   ollama pull qwen2.5:3b    # Good balance of speed and quality
-   # Or: ollama pull qwen3:4b   # Larger but better quality
+   ollama pull qwen3:0.6b    # Fast, small model
+   # Or: ollama pull llama3.2  # Larger but better
   ```

 3. **Test connection:**
--- a/docs/TUI_GUIDE.md
+++ b/docs/TUI_GUIDE.md
@ -23,9 +23,8 @@ That's it! The TUI will guide you through everything.
 ### User Flow
 1. **Select Project** → Choose directory to search
 2. **Index Project** → Process files for search
-3. **Search Content** → Find what you need quickly
-4. **Explore Project** → Interactive AI-powered discovery (NEW!)
-5. **Configure System** → Customize search behavior
+3. **Search Content** → Find what you need
+4. **Explore Results** → See full context and files

 ## Main Menu Options

@ -111,63 +110,7 @@ That's it! The TUI will guide you through everything.
 ./rag-mini-enhanced context /path/to/project "login()"
 ```

-### 4. Explore Project (NEW!)
-
-**Purpose**: Interactive AI-powered discovery with conversation memory
-
-**What Makes Explore Different**:
- **Conversational**: Ask follow-up questions that build on previous answers
- **AI Reasoning**: Uses thinking mode for deeper analysis and explanations
- **Educational**: Perfect for understanding unfamiliar codebases
- **Context Aware**: Remembers what you've already discussed
-
-**Interactive Process**:
-1. **First Question Guidance**: Clear prompts with example questions
-2. **Starter Suggestions**: Random helpful questions to get you going
-3. **Natural Follow-ups**: Ask "why?", "how?", "show me more" naturally
-4. **Session Memory**: AI remembers your conversation context
-
-**Explore Mode Features**:
-
-**Quick Start Options**:
- **Option 1 - Help**: Show example questions and explore mode capabilities
- **Option 2 - Status**: Project information and current exploration session
- **Option 3 - Suggest**: Get a random starter question picked from 7 curated examples
-
-**Starter Questions** (randomly suggested):
- "What are the main components of this project?"
- "How is error handling implemented?"
- "Show me the authentication and security logic"
- "What are the key functions I should understand first?"
- "How does data flow through this system?"
- "What configuration options are available?"
- "Show me the most important files to understand"
-
-**Advanced Usage**:
- **Deep Questions**: "Why is this function slow?" "How does the security work?"
- **Code Analysis**: "Explain this algorithm" "What could go wrong here?"
- **Architecture**: "How do these components interact?" "What's the design pattern?"
- **Best Practices**: "Is this code following best practices?" "How would you improve this?"
-
-**What You Learn**:
- **Conversational AI**: How to have productive technical conversations with AI
- **Code Understanding**: Deep analysis capabilities beyond simple search
- **Context Building**: How conversation memory improves over time
- **Question Techniques**: Effective ways to explore unfamiliar code
-
-**CLI Commands Shown**:
-```bash
-./rag-mini explore /path/to/project    # Start interactive exploration
-```
-
-**Perfect For**:
- Understanding new codebases
- Code review and analysis
- Learning from existing projects
- Documenting complex systems
- Onboarding new team members
-
-### 5. View Status
+### 4. View Status

 **Purpose**: Check system health and project information

@ -196,61 +139,32 @@ That's it! The TUI will guide you through everything.
 ./rag-mini status /path/to/project
 ```

-### 6. Configuration Manager (ENHANCED!)
+### 5. Configuration

-**Purpose**: Interactive configuration with user-friendly options
+**Purpose**: View and understand system settings

-**New Interactive Features**:
- **Live Configuration Dashboard** - See current settings with clear status
- **Quick Configuration Options** - Change common settings without YAML editing
- **Guided Setup** - Explanations and presets for each option
- **Validation** - Input checking and helpful error messages
+**Configuration Display**:
+- **Current settings** - Chunk size, strategy, file patterns
+- **File location** - Where config is stored
+- **Setting explanations** - What each option does
+- **Quick actions** - View or edit config directly

-**Main Configuration Options**:
+**Key Settings Explained**:
+- **chunking.max_size** - How large each searchable piece is
+- **chunking.strategy** - Smart (semantic) vs simple (fixed size)
+- **files.exclude_patterns** - Skip certain files/directories
+- **embedding.preferred_method** - AI model preference
+- **search.default_top_k** - How many results to show

-**1. Adjust Chunk Size**:
- **Presets**: Small (1000), Medium (2000), Large (3000), or custom
- **Guidance**: Performance vs accuracy explanations
- **Smart Validation**: Range checking and recommendations
-
-**2. Toggle Query Expansion**:
- **Educational Info**: Clear explanation of benefits and requirements  
- **Easy Toggle**: Simple on/off with confirmation
- **System Check**: Verifies Ollama availability for AI features
-
-**3. Configure Search Behavior**:
- **Result Count**: Adjust default number of search results (1-100)
- **BM25 Toggle**: Enable/disable keyword matching boost
- **Similarity Threshold**: Fine-tune match sensitivity (0.0-1.0)
-
-**4. View/Edit Configuration File**:
- **Full File Viewer**: Display complete config with syntax highlighting
- **Editor Instructions**: Commands for nano, vim, VS Code
- **YAML Help**: Format explanation and editing tips
-
-**5. Reset to Defaults**:
- **Safe Reset**: Confirmation before resetting all settings
- **Clear Explanations**: Shows what defaults will be restored
- **Backup Reminder**: Suggests saving current config first
-
-**6. Advanced Settings**:
- **File Filtering**: Min file size, exclude patterns (view only)
- **Performance Settings**: Batch sizes, streaming thresholds
- **LLM Preferences**: Model rankings and selection priorities
-
-**Key Settings Dashboard**:
- 📁 **Chunk size**: 2000 characters (with emoji indicators)
- 🧠 **Chunking strategy**: semantic
- 🔍 **Search results**: 10 results
- 📊 **Embedding method**: ollama
- 🚀 **Query expansion**: enabled/disabled
- ⚡ **LLM synthesis**: enabled/disabled
+**Interactive Options**:
+- **[V]iew config** - See full configuration file
+- **[E]dit path** - Get command to edit configuration

 **What You Learn**:
- **Configuration Impact**: How settings affect search quality and speed
- **Interactive YAML**: Easier than manual editing for beginners
- **Best Practices**: Recommended settings for different project types
- **System Understanding**: How all components work together
+- How configuration affects search quality
+- YAML configuration format
+- Which settings to adjust for different projects
+- Where to find advanced options

 **CLI Commands Shown**:
 ```bash
@ -258,13 +172,7 @@ cat /path/to/project/.mini-rag/config.yaml   # View config
 nano /path/to/project/.mini-rag/config.yaml  # Edit config
 ```

-**Perfect For**:
- Beginners who find YAML intimidating
- Quick adjustments without memorizing syntax
- Understanding what each setting actually does
- Safe experimentation with guided validation
-
-### 7. CLI Command Reference
+### 6. CLI Command Reference

 **Purpose**: Complete command reference for transitioning to CLI

--- a/examples/config-llm-providers.yaml
+++ b/examples/config-llm-providers.yaml
@ -68,9 +68,9 @@ search:
 llm:
  provider: ollama                    # Use local Ollama
  ollama_host: localhost:11434        # Default Ollama location
-  synthesis_model: qwen3:1.7b         # Good all-around model
-  # alternatives: qwen3:0.6b (faster), qwen2.5:3b (balanced), qwen3:4b (quality)
-  expansion_model: qwen3:1.7b
+  synthesis_model: llama3.2           # Good all-around model
+  # alternatives: qwen3:0.6b (faster), llama3.2:3b (balanced), llama3.1:8b (quality)
+  expansion_model: llama3.2
  enable_synthesis: false
  synthesis_temperature: 0.3
  cpu_optimized: true
--- a/examples/config-quality.yaml
+++ b/examples/config-quality.yaml
@ -102,7 +102,7 @@ llm:
 # For even better results, try these model combinations:
 # • ollama pull nomic-embed-text:latest  (best embeddings)
 # • ollama pull qwen3:1.7b              (good general model)
-# • ollama pull qwen3:4b                (excellent for analysis)
+# • ollama pull llama3.2                (excellent for analysis)
 # 
 # Or adjust these settings for your specific needs:
 # • similarity_threshold: 0.3   (more selective results)
--- a/examples/config.yaml
+++ b/examples/config.yaml
@ -112,7 +112,7 @@ llm:
  synthesis_model: auto           # Which AI model to use for explanations
                                  # 'auto': Picks best available model - RECOMMENDED
                                  # 'qwen3:0.6b': Ultra-fast, good for CPU-only computers
-                                  # 'qwen3:4b': Slower but more detailed explanations
+                                  # 'llama3.2': Slower but more detailed explanations
  
  expansion_model: auto           # Model for query expansion (usually same as synthesis)
  
--- a/install_mini_rag.ps1
+++ b/install_mini_rag.ps1
@ -1,458 +0,0 @@
-# FSS-Mini-RAG PowerShell Installation Script
-# Interactive installer that sets up Python environment and dependencies
-
-# Enable advanced features
-$ErrorActionPreference = "Stop"
-
-# Color functions for better output
-function Write-ColorOutput($message, $color = "White") {
-    Write-Host $message -ForegroundColor $color
-}
-
-function Write-Header($message) {
-    Write-Host "`n" -NoNewline
-    Write-ColorOutput "=== $message ===" "Cyan"
-}
-
-function Write-Success($message) {
-    Write-ColorOutput "✅ $message" "Green"
-}
-
-function Write-Warning($message) {
-    Write-ColorOutput "⚠️  $message" "Yellow"
-}
-
-function Write-Error($message) {
-    Write-ColorOutput "❌ $message" "Red"
-}
-
-function Write-Info($message) {
-    Write-ColorOutput "ℹ️  $message" "Blue"
-}
-
-# Get script directory
-$ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
-
-# Main installation function
-function Main {
-    Write-Host ""
-    Write-ColorOutput "╔══════════════════════════════════════╗" "Cyan"
-    Write-ColorOutput "║        FSS-Mini-RAG Installer        ║" "Cyan"
-    Write-ColorOutput "║   Fast Semantic Search for Code      ║" "Cyan" 
-    Write-ColorOutput "╚══════════════════════════════════════╝" "Cyan"
-    Write-Host ""
-    
-    Write-Info "PowerShell installation process:"
-    Write-Host "  • Python environment setup"
-    Write-Host "  • Smart configuration based on your system"
-    Write-Host "  • Optional AI model downloads (with consent)"
-    Write-Host "  • Testing and verification"
-    Write-Host ""
-    Write-ColorOutput "Note: You'll be asked before downloading any models" "Cyan"
-    Write-Host ""
-    
-    $continue = Read-Host "Begin installation? [Y/n]"
-    if ($continue -eq "n" -or $continue -eq "N") {
-        Write-Host "Installation cancelled."
-        exit 0
-    }
-    
-    # Run installation steps
-    Check-Python
-    Create-VirtualEnvironment
-    
-    # Check Ollama availability
-    $ollamaAvailable = Check-Ollama
-    
-    # Get installation preferences
-    Get-InstallationPreferences $ollamaAvailable
-    
-    # Install dependencies
-    Install-Dependencies
-    
-    # Setup models if available
-    if ($ollamaAvailable) {
-        Setup-OllamaModel
-    }
-    
-    # Test installation
-    if (Test-Installation) {
-        Show-Completion
-    } else {
-        Write-Error "Installation test failed"
-        Write-Host "Please check error messages and try again."
-        exit 1
-    }
-}
-
-function Check-Python {
-    Write-Header "Checking Python Installation"
-    
-    # Try different Python commands
-    $pythonCmd = $null
-    $pythonVersion = $null
-    
-    foreach ($cmd in @("python", "python3", "py")) {
-        try {
-            $version = & $cmd --version 2>&1
-            if ($LASTEXITCODE -eq 0) {
-                $pythonCmd = $cmd
-                $pythonVersion = ($version -split " ")[1]
-                break
-            }
-        } catch {
-            continue
-        }
-    }
-    
-    if (-not $pythonCmd) {
-        Write-Error "Python not found!"
-        Write-Host ""
-        Write-ColorOutput "Please install Python 3.8+ from:" "Yellow"
-        Write-Host "  • https://python.org/downloads"
-        Write-Host "  • Make sure to check 'Add Python to PATH' during installation"
-        Write-Host ""
-        Write-ColorOutput "After installing Python, run this script again." "Cyan"
-        exit 1
-    }
-    
-    # Check version
-    $versionParts = $pythonVersion -split "\."
-    $major = [int]$versionParts[0]
-    $minor = [int]$versionParts[1]
-    
-    if ($major -lt 3 -or ($major -eq 3 -and $minor -lt 8)) {
-        Write-Error "Python $pythonVersion found, but 3.8+ required"
-        Write-Host "Please upgrade Python to 3.8 or higher."
-        exit 1
-    }
-    
-    Write-Success "Found Python $pythonVersion ($pythonCmd)"
-    $script:PythonCmd = $pythonCmd
-}
-
-function Create-VirtualEnvironment {
-    Write-Header "Creating Python Virtual Environment"
-    
-    $venvPath = Join-Path $ScriptDir ".venv"
-    
-    if (Test-Path $venvPath) {
-        Write-Info "Virtual environment already exists at $venvPath"
-        $recreate = Read-Host "Recreate it? (y/N)"
-        if ($recreate -eq "y" -or $recreate -eq "Y") {
-            Write-Info "Removing existing virtual environment..."
-            Remove-Item -Recurse -Force $venvPath
-        } else {
-            Write-Success "Using existing virtual environment"
-            return
-        }
-    }
-    
-    Write-Info "Creating virtual environment at $venvPath"
-    try {
-        & $script:PythonCmd -m venv $venvPath
-        if ($LASTEXITCODE -ne 0) {
-            throw "Virtual environment creation failed"
-        }
-        Write-Success "Virtual environment created"
-    } catch {
-        Write-Error "Failed to create virtual environment"
-        Write-Host "This might be because python venv module is not available."
-        Write-Host "Try installing Python from python.org with full installation."
-        exit 1
-    }
-    
-    # Activate virtual environment and upgrade pip
-    $activateScript = Join-Path $venvPath "Scripts\Activate.ps1"
-    if (Test-Path $activateScript) {
-        & $activateScript
-        Write-Success "Virtual environment activated"
-        
-        Write-Info "Upgrading pip..."
-        try {
-            & python -m pip install --upgrade pip --quiet
-        } catch {
-            Write-Warning "Could not upgrade pip, continuing anyway..."
-        }
-    }
-}
-
-function Check-Ollama {
-    Write-Header "Checking Ollama (AI Model Server)"
-    
-    try {
-        $response = Invoke-WebRequest -Uri "http://localhost:11434/api/version" -TimeoutSec 5 -ErrorAction SilentlyContinue
-        if ($response.StatusCode -eq 200) {
-            Write-Success "Ollama server is running"
-            return $true
-        }
-    } catch {
-        # Ollama not running, check if installed
-    }
-    
-    try {
-        & ollama version 2>$null
-        if ($LASTEXITCODE -eq 0) {
-            Write-Warning "Ollama is installed but not running"
-            $startOllama = Read-Host "Start Ollama now? (Y/n)"
-            if ($startOllama -ne "n" -and $startOllama -ne "N") {
-                Write-Info "Starting Ollama server..."
-                Start-Process -FilePath "ollama" -ArgumentList "serve" -WindowStyle Hidden
-                Start-Sleep -Seconds 3
-                
-                try {
-                    $response = Invoke-WebRequest -Uri "http://localhost:11434/api/version" -TimeoutSec 5 -ErrorAction SilentlyContinue
-                    if ($response.StatusCode -eq 200) {
-                        Write-Success "Ollama server started"
-                        return $true
-                    }
-                } catch {
-                    Write-Warning "Failed to start Ollama automatically"
-                    Write-Host "Please start Ollama manually: ollama serve"
-                    return $false
-                }
-            }
-            return $false
-        }
-    } catch {
-        # Ollama not installed
-    }
-    
-    Write-Warning "Ollama not found"
-    Write-Host ""
-    Write-ColorOutput "Ollama provides the best embedding quality and performance." "Cyan"
-    Write-Host ""
-    Write-ColorOutput "Options:" "White"
-    Write-ColorOutput "1) Install Ollama automatically" "Green" -NoNewline
-    Write-Host " (recommended)"
-    Write-ColorOutput "2) Manual installation" "Yellow" -NoNewline
-    Write-Host " - Visit https://ollama.com/download"
-    Write-ColorOutput "3) Continue without Ollama" "Blue" -NoNewline
-    Write-Host " (uses ML fallback)"
-    Write-Host ""
-    
-    $choice = Read-Host "Choose [1/2/3]"
-    
-    switch ($choice) {
-        "1" {
-            Write-Info "Opening Ollama download page..."
-            Start-Process "https://ollama.com/download"
-            Write-Host ""
-            Write-ColorOutput "Please:" "Yellow"
-            Write-Host "  1. Download and install Ollama from the opened page"
-            Write-Host "  2. Run 'ollama serve' in a new terminal"
-            Write-Host "  3. Re-run this installer"
-            Write-Host ""
-            Read-Host "Press Enter to exit"
-            exit 0
-        }
-        "2" {
-            Write-Host ""
-            Write-ColorOutput "Manual Ollama installation:" "Yellow"
-            Write-Host "  1. Visit: https://ollama.com/download"
-            Write-Host "  2. Download and install for Windows"
-            Write-Host "  3. Run: ollama serve"
-            Write-Host "  4. Re-run this installer"
-            Read-Host "Press Enter to exit"
-            exit 0
-        }
-        "3" {
-            Write-Info "Continuing without Ollama (will use ML fallback)"
-            return $false
-        }
-        default {
-            Write-Warning "Invalid choice, continuing without Ollama"
-            return $false
-        }
-    }
-}
-
-function Get-InstallationPreferences($ollamaAvailable) {
-    Write-Header "Installation Configuration"
-    
-    Write-ColorOutput "FSS-Mini-RAG can run with different embedding backends:" "Cyan"
-    Write-Host ""
-    Write-ColorOutput "• Ollama" "Green" -NoNewline
-    Write-Host " (recommended) - Best quality, local AI server"
-    Write-ColorOutput "• ML Fallback" "Yellow" -NoNewline
-    Write-Host " - Offline transformers, larger but always works"
-    Write-ColorOutput "• Hash-based" "Blue" -NoNewline
-    Write-Host " - Lightweight fallback, basic similarity"
-    Write-Host ""
-    
-    if ($ollamaAvailable) {
-        $recommended = "light (Ollama detected)"
-        Write-ColorOutput "✓ Ollama detected - light installation recommended" "Green"
-    } else {
-        $recommended = "full (no Ollama)"
-        Write-ColorOutput "⚠ No Ollama - full installation recommended for better quality" "Yellow"
-    }
-    
-    Write-Host ""
-    Write-ColorOutput "Installation options:" "White"
-    Write-ColorOutput "L) Light" "Green" -NoNewline
-    Write-Host " - Ollama + basic deps (~50MB) " -NoNewline
-    Write-ColorOutput "← Best performance + AI chat" "Cyan"
-    Write-ColorOutput "F) Full" "Yellow" -NoNewline
-    Write-Host "  - Light + ML fallback (~2-3GB) " -NoNewline
-    Write-ColorOutput "← Works without Ollama" "Cyan"
-    Write-Host ""
-    
-    $choice = Read-Host "Choose [L/F] or Enter for recommended ($recommended)"
-    
-    if ($choice -eq "") {
-        if ($ollamaAvailable) {
-            $choice = "L"
-        } else {
-            $choice = "F"
-        }
-    }
-    
-    switch ($choice.ToUpper()) {
-        "L" {
-            $script:InstallType = "light"
-            Write-ColorOutput "Selected: Light installation" "Green"
-        }
-        "F" {
-            $script:InstallType = "full"
-            Write-ColorOutput "Selected: Full installation" "Yellow"
-        }
-        default {
-            Write-Warning "Invalid choice, using light installation"
-            $script:InstallType = "light"
-        }
-    }
-}
-
-function Install-Dependencies {
-    Write-Header "Installing Python Dependencies"
-    
-    if ($script:InstallType -eq "light") {
-        Write-Info "Installing core dependencies (~50MB)..."
-        Write-ColorOutput "  Installing: lancedb, pandas, numpy, PyYAML, etc." "Blue"
-        
-        try {
-            & pip install -r (Join-Path $ScriptDir "requirements.txt") --quiet
-            if ($LASTEXITCODE -ne 0) {
-                throw "Dependency installation failed"
-            }
-            Write-Success "Dependencies installed"
-        } catch {
-            Write-Error "Failed to install dependencies"
-            Write-Host "Try: pip install -r requirements.txt"
-            exit 1
-        }
-    } else {
-        Write-Info "Installing full dependencies (~2-3GB)..."
-        Write-ColorOutput "This includes PyTorch and transformers - will take several minutes" "Yellow"
-        
-        try {
-            & pip install -r (Join-Path $ScriptDir "requirements-full.txt")
-            if ($LASTEXITCODE -ne 0) {
-                throw "Dependency installation failed"
-            }
-            Write-Success "All dependencies installed"
-        } catch {
-            Write-Error "Failed to install dependencies"
-            Write-Host "Try: pip install -r requirements-full.txt"
-            exit 1
-        }
-    }
-    
-    Write-Info "Verifying installation..."
-    try {
-        & python -c "import lancedb, pandas, numpy" 2>$null
-        if ($LASTEXITCODE -ne 0) {
-            throw "Package verification failed"
-        }
-        Write-Success "Core packages verified"
-    } catch {
-        Write-Error "Package verification failed"
-        exit 1
-    }
-}
-
-function Setup-OllamaModel {
-    # Implementation similar to bash version but adapted for PowerShell
-    Write-Header "Ollama Model Setup"
-    # For brevity, implementing basic version
-    Write-Info "Ollama model setup available - see bash version for full implementation"
-}
-
-function Test-Installation {
-    Write-Header "Testing Installation"
-    
-    Write-Info "Testing basic functionality..."
-    
-    try {
-        & python -c "from mini_rag import CodeEmbedder, ProjectIndexer, CodeSearcher; print('✅ Import successful')" 2>$null
-        if ($LASTEXITCODE -ne 0) {
-            throw "Import test failed"
-        }
-        Write-Success "Python imports working"
-        return $true
-    } catch {
-        Write-Error "Import test failed"
-        return $false
-    }
-}
-
-function Show-Completion {
-    Write-Header "Installation Complete!"
-    
-    Write-ColorOutput "FSS-Mini-RAG is now installed!" "Green"
-    Write-Host ""
-    Write-ColorOutput "Quick Start Options:" "Cyan"
-    Write-Host ""
-    Write-ColorOutput "🎯 TUI (Beginner-Friendly):" "Green"
-    Write-Host "     rag-tui.bat"
-    Write-Host "     # Interactive interface with guided setup"
-    Write-Host ""
-    Write-ColorOutput "💻 CLI (Advanced):" "Blue"
-    Write-Host "     rag-mini.bat index C:\path\to\project"
-    Write-Host "     rag-mini.bat search C:\path\to\project `"query`""
-    Write-Host "     rag-mini.bat status C:\path\to\project"
-    Write-Host ""
-    Write-ColorOutput "Documentation:" "Cyan"
-    Write-Host "  • README.md - Complete technical documentation"
-    Write-Host "  • docs\GETTING_STARTED.md - Step-by-step guide"
-    Write-Host "  • examples\ - Usage examples and sample configs"
-    Write-Host ""
-    
-    $runTest = Read-Host "Run quick test now? [Y/n]"
-    if ($runTest -ne "n" -and $runTest -ne "N") {
-        Run-QuickTest
-    }
-    
-    Write-Host ""
-    Write-ColorOutput "🎉 Setup complete! FSS-Mini-RAG is ready to use." "Green"
-}
-
-function Run-QuickTest {
-    Write-Header "Quick Test"
-    
-    Write-Info "Testing with FSS-Mini-RAG codebase..."
-    
-    $ragDir = Join-Path $ScriptDir ".mini-rag"
-    if (Test-Path $ragDir) {
-        Write-Success "Project already indexed, running search..."
-    } else {
-        Write-Info "Indexing FSS-Mini-RAG system for demo..."
-        & python (Join-Path $ScriptDir "rag-mini.py") index $ScriptDir
-        if ($LASTEXITCODE -ne 0) {
-            Write-Error "Test indexing failed"
-            return
-        }
-    }
-    
-    Write-Host ""
-    Write-Success "Running demo search: 'embedding system'"
-    & python (Join-Path $ScriptDir "rag-mini.py") search $ScriptDir "embedding system" --top-k 3
-    
-    Write-Host ""
-    Write-Success "Test completed successfully!"
-    Write-ColorOutput "FSS-Mini-RAG is working perfectly on Windows!" "Cyan"
-}
-
-# Run main function
-Main
--- a/install_mini_rag.sh
+++ b/install_mini_rag.sh
@ -462,73 +462,6 @@ install_dependencies() {
    fi
 }

-# Setup application icon for desktop integration
-setup_desktop_icon() {
-    print_header "Setting Up Desktop Integration"
-    
-    # Check if we're in a GUI environment
-    if [ -z "$DISPLAY" ] && [ -z "$WAYLAND_DISPLAY" ]; then
-        print_info "No GUI environment detected - skipping desktop integration"
-        return 0
-    fi
-    
-    local icon_source="$SCRIPT_DIR/assets/Fss_Mini_Rag.png"
-    local desktop_dir="$HOME/.local/share/applications"
-    local icon_dir="$HOME/.local/share/icons"
-    
-    # Check if icon file exists
-    if [ ! -f "$icon_source" ]; then
-        print_warning "Icon file not found at $icon_source"
-        return 1
-    fi
-    
-    # Create directories if needed
-    mkdir -p "$desktop_dir" "$icon_dir" 2>/dev/null
-    
-    # Copy icon to standard location
-    local icon_dest="$icon_dir/fss-mini-rag.png"
-    if cp "$icon_source" "$icon_dest" 2>/dev/null; then
-        print_success "Icon installed to $icon_dest"
-    else
-        print_warning "Could not install icon (permissions?)"
-        return 1
-    fi
-    
-    # Create desktop entry
-    local desktop_file="$desktop_dir/fss-mini-rag.desktop"
-    cat > "$desktop_file" << EOF
-[Desktop Entry]
-Name=FSS-Mini-RAG
-Comment=Fast Semantic Search for Code and Documents
-Exec=$SCRIPT_DIR/rag-tui
-Icon=fss-mini-rag
-Terminal=true
-Type=Application
-Categories=Development;Utility;TextEditor;
-Keywords=search;code;rag;semantic;ai;
-StartupNotify=true
-EOF
-    
-    if [ -f "$desktop_file" ]; then
-        chmod +x "$desktop_file"
-        print_success "Desktop entry created"
-        
-        # Update desktop database if available
-        if command_exists update-desktop-database; then
-            update-desktop-database "$desktop_dir" 2>/dev/null
-            print_info "Desktop database updated"
-        fi
-        
-        print_info "✨ FSS-Mini-RAG should now appear in your application menu!"
-        print_info "   Look for it in Development or Utility categories"
-    else
-        print_warning "Could not create desktop entry"
-        return 1
-    fi
-    
-    return 0
-}
-
 # Setup ML models based on configuration  
 setup_ml_models() {
    if [ "$INSTALL_TYPE" != "full" ]; then
@ -772,7 +705,7 @@ run_quick_test() {
        read -r
        
        # Launch the TUI which has the existing interactive tutorial system
-        ./rag-tui.py "$target_dir" || true
+        ./rag-tui.py "$target_dir"
        
        echo ""
        print_success "🎉 Tutorial completed!"
@ -861,9 +794,6 @@ main() {
    fi
    setup_ml_models
    
-    # Setup desktop integration with icon
-    setup_desktop_icon
-    
    if test_installation; then
        show_completion
    else
--- a/install_windows.bat
+++ b/install_windows.bat
@ -1,343 +0,0 @@
-@echo off
-REM FSS-Mini-RAG Windows Installer - Beautiful & Comprehensive
-setlocal enabledelayedexpansion
-
-REM Enable colors and unicode for modern Windows
-chcp 65001 >nul 2>&1
-
-echo.
-echo ╔══════════════════════════════════════════════════╗
-echo ║            FSS-Mini-RAG Windows Installer       ║
-echo ║         Fast Semantic Search for Code           ║
-echo ╚══════════════════════════════════════════════════╝
-echo.
-echo 🚀 Comprehensive installation process:
-echo   • Python environment setup and validation
-echo   • Smart dependency management 
-echo   • Optional AI model downloads (with your consent)
-echo   • System testing and verification
-echo   • Interactive tutorial (optional)
-echo.
-echo 💡 Note: You'll be asked before downloading any models
-echo.
-
-set /p "continue=Begin installation? [Y/n]: "
-if /i "!continue!"=="n" (
-    echo Installation cancelled.
-    pause
-    exit /b 0
-)
-
-REM Get script directory
-set "SCRIPT_DIR=%~dp0"
-set "SCRIPT_DIR=%SCRIPT_DIR:~0,-1%"
-
-echo.
-echo ══════════════════════════════════════════════════
-echo [1/5] Checking Python Environment...
-python --version >nul 2>&1
-if errorlevel 1 (
-    echo ❌ ERROR: Python not found!
-    echo.
-    echo 📦 Please install Python from: https://python.org/downloads
-    echo 🔧 Installation requirements:
-    echo    • Python 3.8 or higher
-    echo    • Make sure to check "Add Python to PATH" during installation
-    echo    • Restart your command prompt after installation
-    echo.
-    echo 💡 Quick install options:
-    echo    • Download from python.org (recommended)
-    echo    • Or use: winget install Python.Python.3.11
-    echo    • Or use: choco install python311
-    echo.
-    pause
-    exit /b 1
-)
-
-for /f "tokens=2" %%i in ('python --version 2^>^&1') do set "PYTHON_VERSION=%%i"
-echo ✅ Found Python !PYTHON_VERSION!
-
-REM Check Python version (basic check for 3.x)
-for /f "tokens=1 delims=." %%a in ("!PYTHON_VERSION!") do set "MAJOR_VERSION=%%a"
-if !MAJOR_VERSION! LSS 3 (
-    echo ❌ ERROR: Python !PYTHON_VERSION! found, but Python 3.8+ required
-    echo 📦 Please upgrade Python to 3.8 or higher
-    pause
-    exit /b 1
-)
-
-echo.
-echo ══════════════════════════════════════════════════
-echo [2/5] Creating Python Virtual Environment...
-if exist "%SCRIPT_DIR%\.venv" (
-    echo 🔄 Removing old virtual environment...
-    rmdir /s /q "%SCRIPT_DIR%\.venv" 2>nul
-    if exist "%SCRIPT_DIR%\.venv" (
-        echo ⚠️ Could not remove old environment, creating anyway...
-    )
-)
-
-echo 📁 Creating fresh virtual environment...
-python -m venv "%SCRIPT_DIR%\.venv"
-if errorlevel 1 (
-    echo ❌ ERROR: Failed to create virtual environment
-    echo.
-    echo 🔧 This might be because:
-    echo    • Python venv module is not installed
-    echo    • Insufficient permissions
-    echo    • Path contains special characters
-    echo.
-    echo 💡 Try: python -m pip install --user virtualenv
-    pause
-    exit /b 1
-)
-echo ✅ Virtual environment created successfully
-
-echo.
-echo ══════════════════════════════════════════════════
-echo [3/5] Installing Python Dependencies...
-echo 📦 This may take 2-3 minutes depending on your internet speed...
-echo.
-
-call "%SCRIPT_DIR%\.venv\Scripts\activate.bat"
-if errorlevel 1 (
-    echo ❌ ERROR: Could not activate virtual environment
-    pause
-    exit /b 1
-)
-
-echo 🔧 Upgrading pip...
-"%SCRIPT_DIR%\.venv\Scripts\python.exe" -m pip install --upgrade pip --quiet
-if errorlevel 1 (
-    echo ⚠️ Warning: Could not upgrade pip, continuing anyway...
-)
-
-echo 📚 Installing core dependencies (lancedb, pandas, numpy, etc.)...
-echo    This provides semantic search capabilities
-"%SCRIPT_DIR%\.venv\Scripts\pip.exe" install -r "%SCRIPT_DIR%\requirements.txt"
-if errorlevel 1 (
-    echo ❌ ERROR: Failed to install dependencies
-    echo.
-    echo 🔧 Possible solutions:
-    echo    • Check internet connection
-    echo    • Try running as administrator
-    echo    • Check if antivirus is blocking pip
-    echo    • Manually run: pip install -r requirements.txt
-    echo.
-    pause
-    exit /b 1
-)
-echo ✅ Dependencies installed successfully
-
-echo.
-echo ══════════════════════════════════════════════════
-echo [4/5] Testing Installation...
-echo 🧪 Verifying Python imports...
-"%SCRIPT_DIR%\.venv\Scripts\python.exe" -c "from mini_rag import CodeEmbedder, ProjectIndexer, CodeSearcher; print('✅ Core imports successful')" 2>nul
-if errorlevel 1 (
-    echo ❌ ERROR: Installation test failed
-    echo.
-    echo 🔧 This usually means:
-    echo    • Dependencies didn't install correctly
-    echo    • Virtual environment is corrupted
-    echo    • Python path issues
-    echo.
-    echo 💡 Try running: pip install -r requirements.txt
-    pause
-    exit /b 1
-)
-
-echo 🔍 Testing embedding system...
-"%SCRIPT_DIR%\.venv\Scripts\python.exe" -c "from mini_rag import CodeEmbedder; embedder = CodeEmbedder(); info = embedder.get_embedding_info(); print(f'✅ Embedding method: {info[\"method\"]}')" 2>nul
-if errorlevel 1 (
-    echo ⚠️ Warning: Embedding test inconclusive, but core system is ready
-)
-
-echo.
-echo ══════════════════════════════════════════════════
-echo [5/6] Setting Up Desktop Integration...
-call :setup_windows_icon
-
-echo.
-echo ══════════════════════════════════════════════════
-echo [6/6] Checking AI Features (Optional)...
-call :check_ollama_enhanced
-
-echo.
-echo ╔══════════════════════════════════════════════════╗
-echo ║             INSTALLATION SUCCESSFUL!            ║
-echo ╚══════════════════════════════════════════════════╝
-echo.
-echo 🎯 Quick Start Options:
-echo.
-echo 🎨 For Beginners (Recommended):
-echo    rag.bat                 - Interactive interface with guided setup
-echo.
-echo 💻 For Developers:
-echo    rag.bat index C:\myproject      - Index a project
-echo    rag.bat search C:\myproject "authentication"  - Search project  
-echo    rag.bat help            - Show all commands
-echo.
-
-REM Offer interactive tutorial
-echo 🧪 Quick Test Available:
-echo    Test FSS-Mini-RAG with a small sample project (takes ~30 seconds)
-echo.
-set /p "run_test=Run interactive tutorial now? [Y/n]: "
-if /i "!run_test!" NEQ "n" (
-    call :run_tutorial
-) else (
-    echo 📚 You can run the tutorial anytime with: rag.bat
-)
-
-echo.
-echo 🎉 Setup complete! FSS-Mini-RAG is ready to use.
-echo 💡 Pro tip: Try indexing any folder with text files - code, docs, notes!
-echo.
-pause
-exit /b 0
-
-:check_ollama_enhanced
-echo 🤖 Checking for AI capabilities...
-echo.
-
-REM Check if Ollama is installed
-where ollama >nul 2>&1
-if errorlevel 1 (
-    echo ⚠️ Ollama not installed - using basic search mode
-    echo.
-    echo 🎯 For Enhanced AI Features:
-    echo    • 📥 Install Ollama: https://ollama.com/download
-    echo    • 🔄 Run: ollama serve  
-    echo    • 🧠 Download model: ollama pull qwen3:1.7b
-    echo.
-    echo 💡 Benefits of AI features:
-    echo    • Smart query expansion for better search results
-    echo    • Interactive exploration mode with conversation memory
-    echo    • AI-powered synthesis of search results  
-    echo    • Natural language understanding of your questions
-    echo.
-    goto :eof
-)
-
-REM Check if Ollama server is running
-curl -s http://localhost:11434/api/version >nul 2>&1
-if errorlevel 1 (
-    echo 🟡 Ollama installed but not running
-    echo.
-    set /p "start_ollama=Start Ollama server now? [Y/n]: "
-    if /i "!start_ollama!" NEQ "n" (
-        echo 🚀 Starting Ollama server...
-        start /b ollama serve
-        timeout /t 3 /nobreak >nul
-        curl -s http://localhost:11434/api/version >nul 2>&1
-        if errorlevel 1 (
-            echo ⚠️ Could not start Ollama automatically
-            echo 💡 Please run: ollama serve
-        ) else (
-            echo ✅ Ollama server started successfully!
-        )
-    )
-) else (
-    echo ✅ Ollama server is running!
-)
-
-REM Check for available models
-echo 🔍 Checking for AI models...
-ollama list 2>nul | findstr /v "NAME" | findstr /v "^$" >nul
-if errorlevel 1 (
-    echo 📦 No AI models found
-    echo.
-    echo 🧠 Recommended Models (choose one):
-    echo    • qwen3:1.7b    - Excellent for RAG (1.4GB, recommended)
-    echo    • qwen3:0.6b    - Lightweight and fast (~500MB)  
-    echo    • qwen3:4b      - Higher quality but slower (~2.5GB)
-    echo.
-    set /p "install_model=Download qwen3:1.7b model now? [Y/n]: "
-    if /i "!install_model!" NEQ "n" (
-        echo 📥 Downloading qwen3:1.7b model...
-        echo    This may take 5-10 minutes depending on your internet speed
-        ollama pull qwen3:1.7b
-        if errorlevel 1 (
-            echo ⚠️ Download failed - you can try again later with: ollama pull qwen3:1.7b
-        ) else (
-            echo ✅ Model downloaded successfully! AI features are now available.
-        )
-    )
-) else (
-    echo ✅ AI models found - full AI features available!
-    echo 🎉 Your system supports query expansion, exploration mode, and synthesis!
-)
-goto :eof
-
-:run_tutorial
-echo.
-echo ═══════════════════════════════════════════════════
-echo 🧪 Running Interactive Tutorial
-echo ═══════════════════════════════════════════════════
-echo.
-echo 📚 This tutorial will:
-echo    • Index the FSS-Mini-RAG documentation
-echo    • Show you how to search effectively
-echo    • Demonstrate AI features (if available)
-echo.
-
-call "%SCRIPT_DIR%\.venv\Scripts\activate.bat"
-
-echo 📁 Indexing project for demonstration...
-"%SCRIPT_DIR%\.venv\Scripts\python.exe" rag-mini.py index "%SCRIPT_DIR%" >nul 2>&1
-if errorlevel 1 (
-    echo ❌ Indexing failed - please check the installation
-    goto :eof
-)
-
-echo ✅ Indexing complete! 
-echo.
-echo 🔍 Example search: "embedding"
-"%SCRIPT_DIR%\.venv\Scripts\python.exe" rag-mini.py search "%SCRIPT_DIR%" "embedding" --top-k 3
-echo.
-echo 🎯 Try the interactive interface:
-echo    rag.bat
-echo.
-echo 💡 You can now search any project by indexing it first!
-goto :eof
-
-:setup_windows_icon
-echo 🎨 Setting up application icon and shortcuts...
-
-REM Check if icon exists
-if not exist "%SCRIPT_DIR%\assets\Fss_Mini_Rag.png" (
-    echo ⚠️ Icon file not found - skipping desktop integration
-    goto :eof
-)
-
-REM Create desktop shortcut
-echo 📱 Creating desktop shortcut...
-set "desktop=%USERPROFILE%\Desktop"
-set "shortcut=%desktop%\FSS-Mini-RAG.lnk"
-
-REM Use PowerShell to create shortcut with icon
-powershell -Command "& {$WshShell = New-Object -comObject WScript.Shell; $Shortcut = $WshShell.CreateShortcut('%shortcut%'); $Shortcut.TargetPath = '%SCRIPT_DIR%\rag.bat'; $Shortcut.WorkingDirectory = '%SCRIPT_DIR%'; $Shortcut.Description = 'FSS-Mini-RAG - Fast Semantic Search'; $Shortcut.Save()}" >nul 2>&1
-
-if exist "%shortcut%" (
-    echo ✅ Desktop shortcut created
-) else (
-    echo ⚠️ Could not create desktop shortcut
-)
-
-REM Create Start Menu shortcut
-echo 📂 Creating Start Menu entry...
-set "startmenu=%APPDATA%\Microsoft\Windows\Start Menu\Programs"
-set "startshortcut=%startmenu%\FSS-Mini-RAG.lnk"
-
-powershell -Command "& {$WshShell = New-Object -comObject WScript.Shell; $Shortcut = $WshShell.CreateShortcut('%startshortcut%'); $Shortcut.TargetPath = '%SCRIPT_DIR%\rag.bat'; $Shortcut.WorkingDirectory = '%SCRIPT_DIR%'; $Shortcut.Description = 'FSS-Mini-RAG - Fast Semantic Search'; $Shortcut.Save()}" >nul 2>&1
-
-if exist "%startshortcut%" (
-    echo ✅ Start Menu entry created
-) else (
-    echo ⚠️ Could not create Start Menu entry
-)
-
-echo 💡 FSS-Mini-RAG shortcuts have been created on your Desktop and Start Menu
-echo    You can now launch the application from either location
-goto :eof
--- a/mini_rag/config.py
+++ b/mini_rag/config.py
@ -81,10 +81,6 @@ class LLMConfig:
    enable_thinking: bool = True  # Enable thinking mode for Qwen3 models
    cpu_optimized: bool = True     # Prefer lightweight models
    
-    # Context window configuration (critical for RAG performance)
-    context_window: int = 16384    # Context window size in tokens (16K recommended)
-    auto_context: bool = True      # Auto-adjust context based on model capabilities
-    
    # Model preference rankings (configurable)
    model_rankings: list = None    # Will be set in __post_init__
    
@ -108,9 +104,9 @@ class LLMConfig:
                # Recommended model (excellent quality but larger)
                "qwen3:4b",
                
-                # Common fallbacks (prioritize Qwen models)  
+                # Common fallbacks (only include models we know exist)
+                "llama3.2:1b",
                "qwen2.5:1.5b",
-                "qwen2.5:3b",
            ]


@ -259,11 +255,6 @@ class ConfigManager:
            f"  max_expansion_terms: {config_dict['llm']['max_expansion_terms']}        # Maximum terms to add to queries",
            f"  enable_synthesis: {str(config_dict['llm']['enable_synthesis']).lower()}       # Enable synthesis by default",
            f"  synthesis_temperature: {config_dict['llm']['synthesis_temperature']}      # LLM temperature for analysis",
-            "",
-            "  # Context window configuration (critical for RAG performance)",
-            f"  context_window: {config_dict['llm']['context_window']}           # Context size in tokens (8K=fast, 16K=balanced, 32K=advanced)",
-            f"  auto_context: {str(config_dict['llm']['auto_context']).lower()}            # Auto-adjust context based on model capabilities",
-            "",
            "  model_rankings:          # Preferred model order (edit to change priority)",
        ])
        
--- a/mini_rag/explorer.py
+++ b/mini_rag/explorer.py
@ -115,13 +115,12 @@ class CodeExplorer:
        # Add to conversation history
        self.current_session.add_exchange(question, results, synthesis)
        
-        # Streaming already displayed the response
-        # Just return minimal status for caller
-        session_duration = time.time() - self.current_session.started_at
-        exchange_count = len(self.current_session.conversation_history)
+        # Format response with exploration context
+        response = self._format_exploration_response(
+            question, synthesis, len(results), search_time, synthesis_time
+        )
        
-        status = f"\n📊 Session: {session_duration/60:.1f}m | Question #{exchange_count} | Results: {len(results)} | Time: {search_time+synthesis_time:.1f}s"
-        return status
+        return response
    
    def _build_contextual_prompt(self, question: str, results: List[Any]) -> str:
        """Build a prompt that includes conversation context."""
@ -186,22 +185,33 @@ CURRENT QUESTION: "{question}"
 RELEVANT INFORMATION FOUND:
 {results_text}

-Please provide a helpful, natural explanation that answers their question. Write as if you're having a friendly conversation with a colleague who's exploring this project.
+Please provide a helpful analysis in JSON format:

-Structure your response to include:
-1. A clear explanation of what you found and how it answers their question
-2. The most important insights from the information you discovered  
-3. Relevant examples or code patterns when helpful
-4. Practical next steps they could take
+{{
+    "summary": "Clear explanation of what you found and how it answers their question",
+    "key_points": [
+        "Most important insight from the information",
+        "Secondary important point or relationship", 
+        "Third key point or practical consideration"
+    ],
+    "code_examples": [
+        "Relevant example or pattern from the information",
+        "Another useful example or demonstration"
+    ],
+    "suggested_actions": [
+        "Specific next step they could take",
+        "Additional exploration or investigation suggestion",
+        "Practical way to apply this information"
+    ],
+    "confidence": 0.85
+}}

 Guidelines:
- Write in a conversational, friendly tone
- Be educational but not condescending
+- Be educational and break things down clearly
 - Reference specific files and information when helpful
 - Give practical, actionable suggestions
- Connect everything back to their original question
- Use natural language, not structured formats
- Break complex topics into understandable pieces
+- Keep explanations beginner-friendly but not condescending
+- Connect information to their question directly
 """
        
        return prompt
@ -209,12 +219,16 @@ Guidelines:
    def _synthesize_with_context(self, prompt: str, results: List[Any]) -> SynthesisResult:
        """Synthesize results with full context and thinking."""
        try:
-            # Use streaming with thinking visible (don't collapse)
-            response = self.synthesizer._call_ollama(prompt, temperature=0.2, disable_thinking=False, use_streaming=True, collapse_thinking=False)
+            # TEMPORARILY: Use simple non-streaming call to avoid flow issues
+            # TODO: Re-enable streaming once flow is stable
+            response = self.synthesizer._call_ollama(prompt, temperature=0.2, disable_thinking=False)
            thinking_stream = ""
            
-            # Streaming already shows thinking and response
-            # No need for additional indicators
+            # Display simple thinking indicator
+            if response and len(response) > 200:
+                print("\n💭 Analysis in progress...")
+            
+            # Don't display thinking stream again - keeping it simple for now
            
            if not response:
                return SynthesisResult(
@ -225,14 +239,40 @@ Guidelines:
                    confidence=0.0
                )
            
-            # Use natural language response directly
-            return SynthesisResult(
-                summary=response.strip(),
-                key_points=[],  # Not used with natural language responses
-                code_examples=[],  # Not used with natural language responses
-                suggested_actions=[],  # Not used with natural language responses
-                confidence=0.85  # High confidence for natural responses
-            )
+            # Parse the structured response
+            try:
+                # Extract JSON from response
+                start_idx = response.find('{')
+                end_idx = response.rfind('}') + 1
+                if start_idx >= 0 and end_idx > start_idx:
+                    json_str = response[start_idx:end_idx]
+                    data = json.loads(json_str)
+                    
+                    return SynthesisResult(
+                        summary=data.get('summary', 'Analysis completed'),
+                        key_points=data.get('key_points', []),
+                        code_examples=data.get('code_examples', []),
+                        suggested_actions=data.get('suggested_actions', []),
+                        confidence=float(data.get('confidence', 0.7))
+                    )
+                else:
+                    # Fallback: use raw response as summary
+                    return SynthesisResult(
+                        summary=response[:400] + '...' if len(response) > 400 else response,
+                        key_points=[],
+                        code_examples=[],
+                        suggested_actions=[],
+                        confidence=0.5
+                    )
+                    
+            except json.JSONDecodeError:
+                return SynthesisResult(
+                    summary="Analysis completed but format parsing failed",
+                    key_points=[],
+                    code_examples=[],
+                    suggested_actions=["Try rephrasing your question"],
+                    confidence=0.3
+                )
                
        except Exception as e:
            logger.error(f"Context synthesis failed: {e}")
@ -260,12 +300,29 @@ Guidelines:
        output.append("=" * 60)
        output.append("")
        
-        # Response was already displayed via streaming
-        # Just show completion status
-        output.append("✅ Analysis complete")
-        output.append("")
+        # Main analysis
+        output.append(f"📝 Analysis:")
+        output.append(f"   {synthesis.summary}")
        output.append("")
        
+        if synthesis.key_points:
+            output.append("🔍 Key Insights:")
+            for point in synthesis.key_points:
+                output.append(f"   • {point}")
+            output.append("")
+        
+        if synthesis.code_examples:
+            output.append("💡 Code Examples:")
+            for example in synthesis.code_examples:
+                output.append(f"   {example}")
+            output.append("")
+        
+        if synthesis.suggested_actions:
+            output.append("🎯 Next Steps:")
+            for action in synthesis.suggested_actions:
+                output.append(f"   • {action}")
+            output.append("")
+        
        # Confidence and context indicator
        confidence_emoji = "🟢" if synthesis.confidence > 0.7 else "🟡" if synthesis.confidence > 0.4 else "🔴"
        context_indicator = f" | Context: {exchange_count-1} previous questions" if exchange_count > 1 else ""
@ -408,7 +465,7 @@ Guidelines:
                    "temperature": temperature,
                    "top_p": optimal_params.get("top_p", 0.9),
                    "top_k": optimal_params.get("top_k", 40),
-                    "num_ctx": self.synthesizer._get_optimal_context_size(model_to_use),
+                    "num_ctx": optimal_params.get("num_ctx", 32768),
                    "num_predict": optimal_params.get("num_predict", 2000),
                    "repeat_penalty": optimal_params.get("repeat_penalty", 1.1),
                    "presence_penalty": optimal_params.get("presence_penalty", 1.0)
--- a/mini_rag/llm_safeguards.py
+++ b/mini_rag/llm_safeguards.py
@ -195,7 +195,7 @@ class ModelRunawayDetector:
 • Try a more specific question
 • Break complex questions into smaller parts
 • Use exploration mode which handles context better: `rag-mini explore`
-• Consider: A larger model (qwen3:1.7b or qwen3:4b) would help"""
+• Consider: A larger model (qwen3:1.7b or qwen3:3b) would help"""

    def _explain_thinking_loop(self) -> str:
        return """🧠 The AI got caught in a "thinking loop" - overthinking the response.
@ -266,7 +266,7 @@ class ModelRunawayDetector:
        
        # Universal suggestions
        suggestions.extend([
-            "Consider using a larger model if available (qwen3:1.7b or qwen3:4b)",
+            "Consider using a larger model if available (qwen3:1.7b or qwen3:3b)",
            "Check model status: `ollama list`"
        ])
        
--- a/mini_rag/llm_synthesizer.py
+++ b/mini_rag/llm_synthesizer.py
@ -72,8 +72,8 @@ class LLMSynthesizer:
        else:
            # Fallback rankings if no config
            model_rankings = [
-                "qwen3:1.7b", "qwen3:0.6b", "qwen3:4b", "qwen2.5:3b", 
-                "qwen2.5:1.5b", "qwen2.5-coder:1.5b"
+                "qwen3:1.7b", "qwen3:0.6b", "qwen3:4b", "llama3.2:1b", 
+                "qwen2.5:1.5b", "qwen3:3b", "qwen2.5-coder:1.5b"
            ]
        
        # Find first available model from our ranked list (exact matches first)
@ -114,57 +114,12 @@ class LLMSynthesizer:
                
        self._initialized = True
    
-    def _get_optimal_context_size(self, model_name: str) -> int:
-        """Get optimal context size based on model capabilities and configuration."""
-        # Get configured context window
-        if self.config and hasattr(self.config, 'llm'):
-            configured_context = self.config.llm.context_window
-            auto_context = getattr(self.config.llm, 'auto_context', True)
-        else:
-            configured_context = 16384  # Default to 16K
-            auto_context = True
-        
-        # Model-specific maximum context windows (based on research)
-        model_limits = {
-            # Qwen3 models with native context support
-            'qwen3:0.6b': 32768,    # 32K native
-            'qwen3:1.7b': 32768,    # 32K native  
-            'qwen3:4b': 131072,     # 131K with YaRN extension
-            
-            # Qwen2.5 models
-            'qwen2.5:1.5b': 32768,  # 32K native
-            'qwen2.5:3b': 32768,    # 32K native
-            'qwen2.5-coder:1.5b': 32768,  # 32K native
-            
-            # Fallback for unknown models
-            'default': 8192
-        }
-        
-        # Find model limit (check for partial matches)
-        model_limit = model_limits.get('default', 8192)
-        for model_pattern, limit in model_limits.items():
-            if model_pattern != 'default' and model_pattern.lower() in model_name.lower():
-                model_limit = limit
-                break
-        
-        # If auto_context is enabled, respect model limits
-        if auto_context:
-            optimal_context = min(configured_context, model_limit)
-        else:
-            optimal_context = configured_context
-        
-        # Ensure minimum usable context for RAG
-        optimal_context = max(optimal_context, 4096)  # Minimum 4K for basic RAG
-        
-        logger.debug(f"Context for {model_name}: {optimal_context} tokens (configured: {configured_context}, limit: {model_limit})")
-        return optimal_context
-    
    def is_available(self) -> bool:
        """Check if Ollama is available and has models."""
        self._ensure_initialized()
        return len(self.available_models) > 0
    
-    def _call_ollama(self, prompt: str, temperature: float = 0.3, disable_thinking: bool = False, use_streaming: bool = True, collapse_thinking: bool = True) -> Optional[str]:
+    def _call_ollama(self, prompt: str, temperature: float = 0.3, disable_thinking: bool = False, use_streaming: bool = False) -> Optional[str]:
        """Make a call to Ollama API with safeguards."""
        start_time = time.time()
        
@ -219,16 +174,16 @@ class LLMSynthesizer:
                    "temperature": qwen3_temp,
                    "top_p": qwen3_top_p,
                    "top_k": qwen3_top_k,
-                    "num_ctx": self._get_optimal_context_size(model_to_use),  # Dynamic context based on model and config
+                    "num_ctx": 32000,  # Critical: Qwen3 context length (32K token limit)
                    "num_predict": optimal_params.get("num_predict", 2000),
                    "repeat_penalty": optimal_params.get("repeat_penalty", 1.1),
                    "presence_penalty": qwen3_presence
                }
            }
            
-            # Handle streaming with thinking display
+            # Handle streaming with early stopping
            if use_streaming:
-                return self._handle_streaming_with_thinking_display(payload, model_to_use, use_thinking, start_time, collapse_thinking)
+                return self._handle_streaming_with_early_stop(payload, model_to_use, use_thinking, start_time)
            
            response = requests.post(
                f"{self.ollama_url}/api/generate",
@ -329,130 +284,6 @@ This is normal with smaller AI models and helps ensure you get quality responses

 This is normal with smaller AI models and helps ensure you get quality responses."""

-    def _handle_streaming_with_thinking_display(self, payload: dict, model_name: str, use_thinking: bool, start_time: float, collapse_thinking: bool = True) -> Optional[str]:
-        """Handle streaming response with real-time thinking token display."""
-        import json
-        import sys
-        
-        try:
-            response = requests.post(
-                f"{self.ollama_url}/api/generate",
-                json=payload,
-                stream=True,
-                timeout=65
-            )
-            
-            if response.status_code != 200:
-                logger.error(f"Ollama API error: {response.status_code}")
-                return None
-            
-            full_response = ""
-            thinking_content = ""
-            is_in_thinking = False
-            is_thinking_complete = False
-            thinking_lines_printed = 0
-            
-            # ANSI escape codes for colors and cursor control
-            GRAY = '\033[90m'      # Dark gray for thinking
-            LIGHT_GRAY = '\033[37m'  # Light gray alternative
-            RESET = '\033[0m'      # Reset color
-            CLEAR_LINE = '\033[2K' # Clear entire line
-            CURSOR_UP = '\033[A'   # Move cursor up one line
-            
-            print(f"\n💭 {GRAY}Thinking...{RESET}", flush=True)
-            
-            for line in response.iter_lines():
-                if line:
-                    try:
-                        chunk_data = json.loads(line.decode('utf-8'))
-                        chunk_text = chunk_data.get('response', '')
-                        
-                        if chunk_text:
-                            full_response += chunk_text
-                            
-                            # Handle thinking tokens
-                            if use_thinking and '<think>' in chunk_text:
-                                is_in_thinking = True
-                                chunk_text = chunk_text.replace('<think>', '')
-                            
-                            if is_in_thinking and '</think>' in chunk_text:
-                                is_in_thinking = False
-                                is_thinking_complete = True
-                                chunk_text = chunk_text.replace('</think>', '')
-                                
-                                if collapse_thinking:
-                                    # Clear thinking content and show completion
-                                    # Move cursor up to clear thinking lines
-                                    for _ in range(thinking_lines_printed + 1):
-                                        print(f"{CURSOR_UP}{CLEAR_LINE}", end='', flush=True)
-                                    
-                                    print(f"💭 {GRAY}Thinking complete ✓{RESET}", flush=True)
-                                    thinking_lines_printed = 0
-                                else:
-                                    # Keep thinking visible, just show completion
-                                    print(f"\n💭 {GRAY}Thinking complete ✓{RESET}", flush=True)
-                                
-                                print("🤖 AI Response:", flush=True)
-                                continue
-                            
-                            # Display thinking content in gray with better formatting
-                            if is_in_thinking and chunk_text.strip():
-                                thinking_content += chunk_text
-                                
-                                # Handle line breaks and word wrapping properly
-                                if ' ' in chunk_text or '\n' in chunk_text or len(thinking_content) > 100:
-                                    # Split by sentences for better readability
-                                    sentences = thinking_content.replace('\n', ' ').split('. ')
-                                    
-                                    for sentence in sentences[:-1]:  # Process complete sentences
-                                        sentence = sentence.strip()
-                                        if sentence:
-                                            # Word wrap long sentences
-                                            words = sentence.split()
-                                            line = ""
-                                            for word in words:
-                                                if len(line + " " + word) > 70:
-                                                    if line:
-                                                        print(f"{GRAY}   {line.strip()}{RESET}", flush=True)
-                                                        thinking_lines_printed += 1
-                                                    line = word
-                                                else:
-                                                    line += " " + word if line else word
-                                            
-                                            if line.strip():
-                                                print(f"{GRAY}   {line.strip()}.{RESET}", flush=True)
-                                                thinking_lines_printed += 1
-                                    
-                                    # Keep the last incomplete sentence for next iteration
-                                    thinking_content = sentences[-1] if sentences else ""
-                            
-                            # Display regular response content (skip any leftover thinking)
-                            elif not is_in_thinking and is_thinking_complete and chunk_text.strip():
-                                # Filter out any remaining thinking tags that might leak through
-                                clean_text = chunk_text
-                                if '<think>' in clean_text or '</think>' in clean_text:
-                                    clean_text = clean_text.replace('<think>', '').replace('</think>', '')
-                                
-                                if clean_text.strip():
-                                    print(clean_text, end='', flush=True)
-                        
-                        # Check if response is done
-                        if chunk_data.get('done', False):
-                            print()  # Final newline
-                            break
-                            
-                    except json.JSONDecodeError:
-                        continue
-                    except Exception as e:
-                        logger.error(f"Error processing stream chunk: {e}")
-                        continue
-            
-            return full_response
-            
-        except Exception as e:
-            logger.error(f"Streaming failed: {e}")
-            return None
-
    def _handle_streaming_with_early_stop(self, payload: dict, model_name: str, use_thinking: bool, start_time: float) -> Optional[str]:
        """Handle streaming response with intelligent early stopping."""
        import json
--- a/mini_rag/query_expander.py
+++ b/mini_rag/query_expander.py
@ -170,8 +170,8 @@ Expanded query:"""
                
                # Use same model rankings as main synthesizer for consistency
                expansion_preferences = [
-                    "qwen3:1.7b", "qwen3:0.6b", "qwen3:4b", "qwen2.5:3b", 
-                    "qwen2.5:1.5b", "qwen2.5-coder:1.5b"
+                    "qwen3:1.7b", "qwen3:0.6b", "qwen3:4b", "llama3.2:1b", 
+                    "qwen2.5:1.5b", "qwen3:3b", "qwen2.5-coder:1.5b"
                ]
                
                for preferred in expansion_preferences:
--- a/rag-mini.py
+++ b/rag-mini.py
@ -142,8 +142,8 @@ def search_project(project_path: Path, query: str, top_k: int = 10, synthesize:
            print("   • Search for file types: \"python class\" or \"javascript function\"")
            print()
            print("⚙️ Configuration adjustments:")
-            print(f"   • Lower threshold: ./rag-mini search \"{project_path}\" \"{query}\" --threshold 0.05")
-            print(f"   • More results: ./rag-mini search \"{project_path}\" \"{query}\" --top-k 20")
+            print(f"   • Lower threshold: ./rag-mini search {project_path} \"{query}\" --threshold 0.05")
+            print("   • More results: add --top-k 20")
            print()
            print("📚 Need help? See: docs/TROUBLESHOOTING.md")
            return
@ -201,7 +201,7 @@ def search_project(project_path: Path, query: str, top_k: int = 10, synthesize:
            else:
                print("❌ LLM synthesis unavailable")
                print("   • Ensure Ollama is running: ollama serve")
-                print("   • Install a model: ollama pull qwen3:1.7b")
+                print("   • Install a model: ollama pull llama3.2")
                print("   • Check connection to http://localhost:11434")
        
        # Save last search for potential enhancements
@ -317,27 +317,12 @@ def explore_interactive(project_path: Path):
        if not explorer.start_exploration_session():
            sys.exit(1)
        
-        # Show enhanced first-time guidance
        print(f"\n🤔 Ask your first question about {project_path.name}:")
-        print()
-        print("💡 Enter your search query or question below:")
-        print('   Examples: "How does authentication work?" or "Show me error handling"')
-        print()
-        print("🔧 Quick options:")
-        print("   1. Help - Show example questions")
-        print("   2. Status - Project information")  
-        print("   3. Suggest - Get a random starter question")
-        print()
-        
-        is_first_question = True
        
        while True:
            try:
-                # Get user input with clearer prompt
-                if is_first_question:
-                    question = input("📝 Enter question or option (1-3): ").strip()
-                else:
-                    question = input("\n> ").strip()
+                # Get user input
+                question = input("\n> ").strip()
                
                # Handle exit commands
                if question.lower() in ['quit', 'exit', 'q']:
@ -346,17 +331,14 @@ def explore_interactive(project_path: Path):
                
                # Handle empty input
                if not question:
-                    if is_first_question:
-                        print("Please enter a question or try option 3 for a suggestion.")
-                    else:
-                        print("Please enter a question or 'quit' to exit.")
+                    print("Please enter a question or 'quit' to exit.")
                    continue
                
-                # Handle numbered options and special commands
-                if question in ['1'] or question.lower() in ['help', 'h']:
+                # Special commands
+                if question.lower() in ['help', 'h']:
                    print("""
 🧠 EXPLORATION MODE HELP:
-  • Ask any question about your documents or code
+  • Ask any question about the codebase
  • I remember our conversation for follow-up questions
  • Use 'why', 'how', 'explain' for detailed reasoning
  • Type 'summary' to see session overview
@ -364,53 +346,11 @@ def explore_interactive(project_path: Path):
  
 💡 Example questions:
  • "How does authentication work?"
-  • "What are the main components?"
-  • "Show me error handling patterns"
  • "Why is this function slow?"
-  • "What security measures are in place?"
-  • "How does data flow through this system?"
+  • "Explain the database connection logic"
+  • "What are the security concerns here?"
 """)
                    continue
-                    
-                elif question in ['2'] or question.lower() == 'status':
-                    print(f"""
-📊 PROJECT STATUS: {project_path.name}
-  • Location: {project_path}
-  • Exploration session active
-  • AI model ready for questions
-  • Conversation memory enabled
-""")
-                    continue
-                    
-                elif question in ['3'] or question.lower() == 'suggest':
-                    # Random starter questions for first-time users
-                    if is_first_question:
-                        import random
-                        starters = [
-                            "What are the main components of this project?",
-                            "How is error handling implemented?", 
-                            "Show me the authentication and security logic",
-                            "What are the key functions I should understand first?",
-                            "How does data flow through this system?",
-                            "What configuration options are available?",
-                            "Show me the most important files to understand"
-                        ]
-                        suggested = random.choice(starters)
-                        print(f"\n💡 Suggested question: {suggested}")
-                        print("   Press Enter to use this, or type your own question:")
-                        
-                        next_input = input("📝 > ").strip()
-                        if not next_input:  # User pressed Enter to use suggestion
-                            question = suggested
-                        else:
-                            question = next_input
-                    else:
-                        # For subsequent questions, could add AI-powered suggestions here
-                        print("\n💡 Based on our conversation, you might want to ask:")
-                        print('   "Can you explain that in more detail?"')
-                        print('   "What are the security implications?"')
-                        print('   "Show me related code examples"')
-                        continue
                
                if question.lower() == 'summary':
                    print("\n" + explorer.get_session_summary())
@ -421,9 +361,6 @@ def explore_interactive(project_path: Path):
                print("🧠 Thinking with AI model...")
                response = explorer.explore_question(question)
                
-                # Mark as no longer first question after processing
-                is_first_question = False
-                
                if response:
                    print(f"\n{response}")
                else:
--- a/rag-tui.py
+++ b/rag-tui.py
--- a/rag.bat
+++ b/rag.bat
@ -1,51 +0,0 @@
-@echo off
-REM FSS-Mini-RAG Windows Launcher - Simple and Reliable
-
-setlocal
-set "SCRIPT_DIR=%~dp0"
-set "SCRIPT_DIR=%SCRIPT_DIR:~0,-1%"
-set "VENV_PYTHON=%SCRIPT_DIR%\.venv\Scripts\python.exe"
-
-REM Check if virtual environment exists
-if not exist "%VENV_PYTHON%" (
-    echo Virtual environment not found!
-    echo.
-    echo Run this first: install_windows.bat
-    echo.
-    pause
-    exit /b 1
-)
-
-REM Route commands
-if "%1"=="" goto :interactive
-if "%1"=="help" goto :help
-if "%1"=="--help" goto :help
-if "%1"=="-h" goto :help
-
-REM Pass all arguments to Python script
-"%VENV_PYTHON%" "%SCRIPT_DIR%\rag-mini.py" %*
-goto :end
-
-:interactive
-echo Starting interactive interface...
-"%VENV_PYTHON%" "%SCRIPT_DIR%\rag-tui.py"
-goto :end
-
-:help
-echo FSS-Mini-RAG - Semantic Code Search
-echo.
-echo Usage:
-echo   rag.bat                           - Interactive interface
-echo   rag.bat index ^<folder^>             - Index a project
-echo   rag.bat search ^<folder^> ^<query^>     - Search project
-echo   rag.bat status ^<folder^>            - Check status
-echo.
-echo Examples:
-echo   rag.bat index C:\myproject
-echo   rag.bat search C:\myproject "authentication"
-echo   rag.bat search . "error handling"
-echo.
-pause
-
-:end
-endlocal