Add comprehensive LLM provider support and educational error handling
✨ Features:
- Multi-provider LLM support (OpenAI, Claude, OpenRouter, LM Studio)
- Educational config examples with setup guides
- Comprehensive documentation in docs/LLM_PROVIDERS.md
- Config validation testing system

🎯 Beginner Experience:
- Friendly error messages for common mistakes
- Educational explanations for technical concepts
- Step-by-step troubleshooting guidance
- Clear next steps for every error condition

🛠 Technical:
- Extended LLMConfig dataclass for cloud providers
- Automated config validation script
- Enhanced error handling in core components
- Backward-compatible configuration system

📚 Documentation:
- Provider comparison tables with costs/quality
- Setup instructions for each LLM provider
- Troubleshooting guides and testing procedures
- Environment variable configuration options

All configs pass validation tests. Ready for production use.
parent 3fe26ef138
commit 2f2dd6880b

docs/LLM_PROVIDERS.md (new file, 264 lines)
@@ -0,0 +1,264 @@
# 🤖 LLM Provider Setup Guide

This guide shows how to configure FSS-Mini-RAG with different LLM providers for synthesis and query expansion features.

## 🎯 Quick Provider Comparison

| Provider | Cost | Setup Difficulty | Quality | Privacy | Internet Required |
|----------|------|------------------|---------|---------|-------------------|
| **Ollama** | Free | Easy | Good | Excellent | No |
| **LM Studio** | Free | Easy | Good | Excellent | No |
| **OpenRouter** | Low ($0.10-0.50/M) | Medium | Excellent | Fair | Yes |
| **OpenAI** | Medium ($0.15-2.50/M) | Medium | Excellent | Fair | Yes |
| **Anthropic** | Medium-High | Medium | Excellent | Fair | Yes |

## 🏠 Local Providers (Recommended for Beginners)

### Ollama (Default)

**Best for:** Privacy, learning, no ongoing costs

```yaml
llm:
  provider: ollama
  ollama_host: localhost:11434
  synthesis_model: llama3.2
  expansion_model: llama3.2
  enable_synthesis: false
  synthesis_temperature: 0.3
  cpu_optimized: true
  enable_thinking: true
```

**Setup:**
1. Install Ollama: `curl -fsSL https://ollama.ai/install.sh | sh`
2. Start service: `ollama serve`
3. Download model: `ollama pull llama3.2`
4. Test: `./rag-mini search /path/to/project "test" --synthesize`
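If step 4 fails, it is worth confirming the Ollama API is reachable before digging further. This quick check is not part of the official steps above, just a sanity test against Ollama's standard tags endpoint on the default port:

```bash
# Should return JSON listing your installed models
curl http://localhost:11434/api/tags
```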
**Recommended Models:**
- `qwen3:0.6b` - Ultra-fast, good for CPU-only systems
- `llama3.2` - Balanced quality and speed
- `llama3.1:8b` - Higher quality, needs more RAM

### LM Studio

**Best for:** GUI users, model experimentation

```yaml
llm:
  provider: openai
  api_base: http://localhost:1234/v1
  api_key: "not-needed"
  synthesis_model: "any"
  expansion_model: "any"
  enable_synthesis: false
  synthesis_temperature: 0.3
```

**Setup:**
1. Download [LM Studio](https://lmstudio.ai)
2. Install any model from the catalog
3. Start local server (default port 1234)
4. Use config above
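As with Ollama, you can confirm the LM Studio server is up before pointing FSS-Mini-RAG at it. LM Studio speaks the OpenAI-compatible API (which is why the config above sets `provider: openai`), so listing models makes a reasonable smoke test; the port is taken from the default setup above:

```bash
# Should list the model(s) you loaded in LM Studio
curl http://localhost:1234/v1/models
```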
## ☁️ Cloud Providers (For Advanced Users)

### OpenRouter (Best Value)

**Best for:** Access to many models, reasonable pricing

```yaml
llm:
  provider: openai
  api_base: https://openrouter.ai/api/v1
  api_key: "your-api-key-here"
  synthesis_model: "meta-llama/llama-3.1-8b-instruct:free"
  expansion_model: "meta-llama/llama-3.1-8b-instruct:free"
  enable_synthesis: false
  synthesis_temperature: 0.3
  timeout: 30
```

**Setup:**
1. Sign up at [openrouter.ai](https://openrouter.ai)
2. Create API key in dashboard
3. Add $5-10 credits (goes far with efficient models)
4. Replace `your-api-key-here` with actual key

**Budget Models:**
- `meta-llama/llama-3.1-8b-instruct:free` - Free tier
- `openai/gpt-4o-mini` - $0.15 per million tokens
- `anthropic/claude-3-haiku` - $0.25 per million tokens

### OpenAI (Premium Quality)

**Best for:** Reliability, advanced features

```yaml
llm:
  provider: openai
  api_key: "your-openai-api-key"
  synthesis_model: "gpt-4o-mini"
  expansion_model: "gpt-4o-mini"
  enable_synthesis: false
  synthesis_temperature: 0.3
  timeout: 30
```

**Setup:**
1. Sign up at [platform.openai.com](https://platform.openai.com)
2. Add payment method
3. Create API key
4. Start with `gpt-4o-mini` for cost efficiency

### Anthropic Claude (Code Expert)

**Best for:** Code analysis, thoughtful responses

```yaml
llm:
  provider: anthropic
  api_key: "your-anthropic-api-key"
  synthesis_model: "claude-3-haiku-20240307"
  expansion_model: "claude-3-haiku-20240307"
  enable_synthesis: false
  synthesis_temperature: 0.3
  timeout: 30
```

**Setup:**
1. Sign up at [console.anthropic.com](https://console.anthropic.com)
2. Add credits to account
3. Create API key
4. Start with Claude Haiku for a budget-friendly option

## 🧪 Testing Your Setup

### 1. Basic Functionality Test
```bash
# Test without LLM (should always work)
./rag-mini search /path/to/project "authentication"
```

### 2. Synthesis Test
```bash
# Test LLM integration
./rag-mini search /path/to/project "authentication" --synthesize
```

### 3. Interactive Test
```bash
# Test exploration mode
./rag-mini explore /path/to/project
# Then ask: "How does authentication work in this codebase?"
```

### 4. Query Expansion Test
Enable `expand_queries: true` in config, then:
```bash
./rag-mini search /path/to/project "auth"
# Should automatically expand to "auth authentication login user session"
```
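Under the hood, expansion simply asks the configured model for related terms and appends them to the query before searching. The snippet below is a minimal sketch of that idea against a local Ollama server, not FSS-Mini-RAG's actual `expansion_model` code; the prompt wording and the `llama3.2` model name are assumptions:

```python
import requests

def expand_query(query: str, model: str = "llama3.2") -> str:
    """Ask a local Ollama model for a few related search terms (illustrative only)."""
    prompt = (
        f"List up to 4 search keywords related to '{query}', "
        "separated by spaces, with no explanations."
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=30,
    )
    resp.raise_for_status()
    extra_terms = resp.json().get("response", "").strip()
    return f"{query} {extra_terms}"

print(expand_query("auth"))  # e.g. "auth authentication login user session"
```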
## 🛠️ Configuration Tips

### For Budget-Conscious Users
```yaml
llm:
  synthesis_model: "gpt-4o-mini"   # or claude-haiku
  enable_synthesis: false          # Manual control
  synthesis_temperature: 0.1       # Factual responses
  max_expansion_terms: 4           # Shorter expansions
```

### For Quality-Focused Users
```yaml
llm:
  synthesis_model: "gpt-4o"        # or claude-sonnet
  enable_synthesis: true           # Always on
  synthesis_temperature: 0.3       # Balanced creativity
  enable_thinking: true            # Show reasoning
  max_expansion_terms: 8           # Comprehensive expansion
```

### For Privacy-Focused Users
```yaml
# Use only local providers
embedding:
  preferred_method: ollama         # Local embeddings
llm:
  provider: ollama                 # Local LLM
  # Never use cloud providers
```

## 🔧 Troubleshooting

### Connection Issues
- **Local:** Ensure Ollama/LM Studio is running: `ps aux | grep ollama`
- **Cloud:** Check API key and internet: `curl -H "Authorization: Bearer $API_KEY" https://api.openai.com/v1/models`

### Model Not Found
- **Ollama:** `ollama pull model-name`
- **Cloud:** Check provider's model list documentation

### High Costs
- Use mini/haiku models instead of full versions
- Set `enable_synthesis: false` and use `--synthesize` selectively
- Reduce `max_expansion_terms` to 4-6

### Poor Quality
- Try higher-tier models (gpt-4o, claude-sonnet)
- Adjust `synthesis_temperature` (0.1 = factual, 0.5 = creative)
- Enable `expand_queries` for better search coverage

### Slow Responses
- **Local:** Try smaller models (qwen3:0.6b)
- **Cloud:** Increase `timeout` or switch providers
- **General:** Reduce `max_size` in chunking config

## 📋 Environment Variables (Alternative Setup)

Instead of putting API keys in config files, use environment variables:

```bash
# In your shell profile (.bashrc, .zshrc, etc.)
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export OPENROUTER_API_KEY="your-openrouter-key"
```

Then in config:
```yaml
llm:
  api_key: "${OPENAI_API_KEY}"  # Reads from environment
```
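For the `${OPENAI_API_KEY}` form to work, whatever loads the YAML has to substitute environment variables. FSS-Mini-RAG's loader is not shown in this commit, so the snippet below is only a sketch of how that substitution is commonly done in Python (via `os.path.expandvars`); the file path and key names are placeholders, not guaranteed project behavior:

```python
import os
import yaml

def load_config(path: str) -> dict:
    """Load a YAML config, expanding ${VARS} from the environment (illustrative sketch)."""
    with open(path, "r") as f:
        raw = f.read()
    expanded = os.path.expandvars(raw)  # "${OPENAI_API_KEY}" -> actual key, if exported
    return yaml.safe_load(expanded)

config = load_config(".mini-rag/config.yaml")
print(bool(config["llm"]["api_key"]))  # True once the variable is set in your shell
```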
## 🚀 Advanced: Multi-Provider Setup

You can create different configs for different use cases:

```bash
# Fast local analysis
cp examples/config-beginner.yaml .mini-rag/config-local.yaml

# High-quality cloud analysis
cp examples/config-llm-providers.yaml .mini-rag/config-cloud.yaml
# Edit to use OpenAI/Claude

# Switch configs as needed
ln -sf config-local.yaml .mini-rag/config.yaml   # Use local
ln -sf config-cloud.yaml .mini-rag/config.yaml   # Use cloud
```

## 📚 Further Reading

- [Ollama Model Library](https://ollama.ai/library)
- [OpenRouter Pricing](https://openrouter.ai/docs#models)
- [OpenAI API Documentation](https://platform.openai.com/docs)
- [Anthropic Claude Documentation](https://docs.anthropic.com/claude)
- [LM Studio Getting Started](https://lmstudio.ai/docs)

---

💡 **Pro Tip:** Start with local Ollama for learning, then upgrade to cloud providers when you need production-quality analysis or are working with large codebases.

@@ -47,6 +47,7 @@ search:
   expand_queries: false            # Keep it simple for now
 
 # 🤖 AI explanations (optional but helpful)
+# 💡 WANT DIFFERENT LLM? See examples/config-llm-providers.yaml for OpenAI, Claude, etc.
 llm:
   synthesis_model: auto            # Pick best available model
   enable_synthesis: false          # Turn on manually with --synthesize
examples/config-llm-providers.yaml (new file, 233 lines)
@@ -0,0 +1,233 @@
# 🌐 LLM PROVIDER ALTERNATIVES - OpenRouter, LM Studio, OpenAI & More
# Educational guide showing how to configure different LLM providers
# Copy sections you need to your main config.yaml

#═════════════════════════════════════════════════════════════════════════════════
# 🎯 QUICK PROVIDER SELECTION GUIDE:
#
# 🏠 LOCAL (Best Privacy, No Internet Needed):
#    - Ollama: Great quality, easy setup, free
#    - LM Studio: User-friendly GUI, works with many models
#
# ☁️ CLOUD (Powerful Models, Requires API Keys):
#    - OpenRouter: Access to many models with one API
#    - OpenAI: High quality, reliable, but more expensive
#    - Anthropic: Excellent for code analysis
#
# 💰 BUDGET FRIENDLY:
#    - OpenRouter (Qwen, Llama models): $0.10-0.50 per million tokens
#    - Local Ollama/LM Studio: Completely free
#
# 🚀 PERFORMANCE:
#    - Local: Limited by your hardware
#    - Cloud: Fast and powerful, costs per use
#═════════════════════════════════════════════════════════════════════════════════

# Standard FSS-Mini-RAG settings (copy these to any config)
chunking:
  max_size: 2000
  min_size: 150
  strategy: semantic

streaming:
  enabled: true
  threshold_bytes: 1048576

files:
  min_file_size: 50
  exclude_patterns:
    - "node_modules/**"
    - ".git/**"
    - "__pycache__/**"
    - "*.pyc"
    - ".venv/**"
    - "build/**"
    - "dist/**"
  include_patterns:
    - "**/*"

embedding:
  preferred_method: ollama          # Use Ollama for embeddings (works with all providers below)
  ollama_model: nomic-embed-text
  ollama_host: localhost:11434
  batch_size: 32

search:
  default_limit: 10
  enable_bm25: true
  similarity_threshold: 0.1
  expand_queries: false

#═════════════════════════════════════════════════════════════════════════════════
# 🤖 LLM PROVIDER CONFIGURATIONS
#═════════════════════════════════════════════════════════════════════════════════

# 🏠 OPTION 1: OLLAMA (LOCAL) - Default and Recommended
# ✅ Pros: Free, private, no API keys, good quality
# ❌ Cons: Uses your computer's resources, limited by hardware
llm:
  provider: ollama                  # Use local Ollama
  ollama_host: localhost:11434      # Default Ollama location
  synthesis_model: llama3.2         # Good all-around model
  # alternatives: qwen3:0.6b (faster), llama3.2:3b (balanced), llama3.1:8b (quality)
  expansion_model: llama3.2
  enable_synthesis: false
  synthesis_temperature: 0.3
  cpu_optimized: true
  enable_thinking: true
  max_expansion_terms: 8

# 🖥️ OPTION 2: LM STUDIO (LOCAL) - User-Friendly Alternative
# ✅ Pros: Easy GUI, drag-drop model installation, compatible with Ollama
# ❌ Cons: Another app to manage, similar hardware limitations
#
# SETUP STEPS:
# 1. Download LM Studio from lmstudio.ai
# 2. Install a model (try "microsoft/DialoGPT-medium" or "TheBloke/Llama-2-7B-Chat-GGML")
# 3. Start local server in LM Studio (usually port 1234)
# 4. Use this config:
#
# llm:
#   provider: openai                # LM Studio uses OpenAI-compatible API
#   api_base: http://localhost:1234/v1   # LM Studio default port
#   api_key: "not-needed"           # LM Studio doesn't require real API key
#   synthesis_model: "any"          # Use whatever model you loaded in LM Studio
#   expansion_model: "any"
#   enable_synthesis: false
#   synthesis_temperature: 0.3
#   cpu_optimized: true
#   enable_thinking: true
#   max_expansion_terms: 8

# ☁️ OPTION 3: OPENROUTER (CLOUD) - Many Models, One API
# ✅ Pros: Access to many models, good prices, no local setup
# ❌ Cons: Requires internet, costs money, less private
#
# SETUP STEPS:
# 1. Sign up at openrouter.ai
# 2. Get API key from dashboard
# 3. Add credits to account ($5-10 goes a long way)
# 4. Use this config:
#
# llm:
#   provider: openai                # OpenRouter uses OpenAI-compatible API
#   api_base: https://openrouter.ai/api/v1
#   api_key: "your-openrouter-api-key-here"   # Replace with your actual key
#   synthesis_model: "meta-llama/llama-3.1-8b-instruct:free"   # Free tier model
#   # alternatives: "openai/gpt-4o-mini" ($0.15/M), "anthropic/claude-3-haiku" ($0.25/M)
#   expansion_model: "meta-llama/llama-3.1-8b-instruct:free"
#   enable_synthesis: false
#   synthesis_temperature: 0.3
#   cpu_optimized: false            # Cloud models don't need CPU optimization
#   enable_thinking: true
#   max_expansion_terms: 8
#   timeout: 30                     # Longer timeout for internet requests

# 🏢 OPTION 4: OPENAI (CLOUD) - Premium Quality
# ✅ Pros: Excellent quality, very reliable, fast
# ❌ Cons: More expensive, requires OpenAI account
#
# SETUP STEPS:
# 1. Sign up at platform.openai.com
# 2. Add payment method (pay-per-use)
# 3. Create API key in dashboard
# 4. Use this config:
#
# llm:
#   provider: openai
#   api_key: "your-openai-api-key-here"   # Replace with your actual key
#   synthesis_model: "gpt-4o-mini"        # Affordable option (~$0.15/M tokens)
#   # alternatives: "gpt-4o" (premium, ~$2.50/M), "gpt-3.5-turbo" (budget, ~$0.50/M)
#   expansion_model: "gpt-4o-mini"
#   enable_synthesis: false
#   synthesis_temperature: 0.3
#   cpu_optimized: false
#   enable_thinking: true
#   max_expansion_terms: 8
#   timeout: 30

# 🧠 OPTION 5: ANTHROPIC CLAUDE (CLOUD) - Excellent for Code
# ✅ Pros: Great at code analysis, very thoughtful responses
# ❌ Cons: Premium pricing, separate API account needed
#
# SETUP STEPS:
# 1. Sign up at console.anthropic.com
# 2. Get API key and add credits
# 3. Use this config:
#
# llm:
#   provider: anthropic
#   api_key: "your-anthropic-api-key-here"   # Replace with your actual key
#   synthesis_model: "claude-3-haiku-20240307"   # Most affordable option
#   # alternatives: "claude-3-sonnet-20240229" (balanced), "claude-3-opus-20240229" (premium)
#   expansion_model: "claude-3-haiku-20240307"
#   enable_synthesis: false
#   synthesis_temperature: 0.3
#   cpu_optimized: false
#   enable_thinking: true
#   max_expansion_terms: 8
#   timeout: 30

#═════════════════════════════════════════════════════════════════════════════════
# 🧪 TESTING YOUR CONFIGURATION
#═════════════════════════════════════════════════════════════════════════════════
#
# After setting up any provider, test with these commands:
#
# 1. Test basic search (no LLM needed):
#    ./rag-mini search /path/to/project "test query"
#
# 2. Test LLM synthesis:
#    ./rag-mini search /path/to/project "test query" --synthesize
#
# 3. Test query expansion:
#    Enable expand_queries: true in the search section and try:
#    ./rag-mini search /path/to/project "auth"
#
# 4. Test thinking mode:
#    ./rag-mini explore /path/to/project
#    Then ask: "explain the authentication system"
#
#═════════════════════════════════════════════════════════════════════════════════
# 💡 TROUBLESHOOTING
#═════════════════════════════════════════════════════════════════════════════════
#
# ❌ "Connection refused" or "API error":
#    - Local: Make sure Ollama/LM Studio is running
#    - Cloud: Check API key and internet connection
#
# ❌ "Model not found":
#    - Local: Install model with `ollama pull model-name`
#    - Cloud: Check model name matches provider's API docs
#
# ❌ "Token limit exceeded" or expensive bills:
#    - Use cheaper models like gpt-4o-mini or claude-haiku
#    - Enable shorter contexts with max_size: 1500
#
# ❌ Slow responses:
#    - Local: Try smaller models (qwen3:0.6b)
#    - Cloud: Increase timeout or try different provider
#
# ❌ Poor quality results:
#    - Try higher-quality models
#    - Adjust synthesis_temperature (0.1 for factual, 0.5 for creative)
#    - Enable expand_queries for better search coverage
#
#═════════════════════════════════════════════════════════════════════════════════
# 📚 LEARN MORE
#═════════════════════════════════════════════════════════════════════════════════
#
# Provider Documentation:
#   - Ollama: https://ollama.ai/library (model catalog)
#   - LM Studio: https://lmstudio.ai/docs (getting started)
#   - OpenRouter: https://openrouter.ai/docs (API reference)
#   - OpenAI: https://platform.openai.com/docs (API docs)
#   - Anthropic: https://docs.anthropic.com/claude/reference (Claude API)
#
# Model Recommendations:
#   - Code Analysis: claude-3-sonnet, gpt-4o, llama3.1:8b
#   - Fast Responses: gpt-4o-mini, claude-haiku, qwen3:0.6b
#   - Budget Friendly: OpenRouter free tier, local Ollama
#   - Best Privacy: Local Ollama or LM Studio only
#
#═════════════════════════════════════════════════════════════════════════════════
@@ -72,13 +72,21 @@ class SearchConfig:
 @dataclass
 class LLMConfig:
     """Configuration for LLM synthesis and query expansion."""
-    ollama_host: str = "localhost:11434"
+    # Core settings
     synthesis_model: str = "auto"  # "auto", "qwen3:1.7b", "qwen2.5:1.5b", etc.
     expansion_model: str = "auto"  # Usually same as synthesis_model
     max_expansion_terms: int = 8  # Maximum additional terms to add
     enable_synthesis: bool = False  # Enable by default when --synthesize used
     synthesis_temperature: float = 0.3
-    enable_thinking: bool = True  # Enable thinking mode for Qwen3 models (production: True, testing: toggle)
+    enable_thinking: bool = True  # Enable thinking mode for Qwen3 models
     cpu_optimized: bool = True  # Prefer lightweight models
+
+    # Provider-specific settings (for different LLM providers)
+    provider: str = "ollama"  # "ollama", "openai", "anthropic"
+    ollama_host: str = "localhost:11434"  # Ollama connection
+    api_key: Optional[str] = None  # API key for cloud providers
+    api_base: Optional[str] = None  # Base URL for API (e.g., OpenRouter)
+    timeout: int = 20  # Request timeout in seconds
 
 
 @dataclass
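The new fields make LLMConfig provider-agnostic: `provider` selects the backend, while `api_base` and `api_key` cover OpenAI-compatible endpoints such as OpenRouter or LM Studio. The synthesizer changes are not part of this diff, so the following is only an illustrative sketch of how a caller might route on those fields; the function name and endpoint choices are assumptions, not the project's code:

```python
import requests

def chat_once(cfg, prompt: str) -> str:
    """Illustrative routing on the extended LLMConfig fields (not the project's implementation)."""
    if cfg.provider == "ollama":
        r = requests.post(
            f"http://{cfg.ollama_host}/api/generate",
            json={"model": cfg.synthesis_model, "prompt": prompt, "stream": False},
            timeout=cfg.timeout,
        )
        r.raise_for_status()
        return r.json().get("response", "")
    elif cfg.provider == "openai":
        # Also covers OpenRouter and LM Studio via api_base
        base = cfg.api_base or "https://api.openai.com/v1"
        r = requests.post(
            f"{base}/chat/completions",
            headers={"Authorization": f"Bearer {cfg.api_key}"},
            json={
                "model": cfg.synthesis_model,
                "messages": [{"role": "user", "content": prompt}],
                "temperature": cfg.synthesis_temperature,
            },
            timeout=cfg.timeout,
        )
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]
    raise ValueError(f"Unsupported provider: {cfg.provider}")  # anthropic omitted for brevity
```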
@@ -81,16 +81,36 @@ class OllamaEmbedder:
 
     def _verify_ollama_connection(self):
         """Verify Ollama server is running and model is available."""
-        # Check server status
-        response = requests.get(f"{self.base_url}/api/tags", timeout=5)
-        response.raise_for_status()
+        try:
+            # Check server status
+            response = requests.get(f"{self.base_url}/api/tags", timeout=5)
+            response.raise_for_status()
+        except requests.exceptions.ConnectionError:
+            print("🔌 Ollama Service Unavailable")
+            print("   Ollama provides AI embeddings that make semantic search possible")
+            print("   Start Ollama: ollama serve")
+            print("   Install models: ollama pull nomic-embed-text")
+            print()
+            raise ConnectionError("Ollama service not running. Start with: ollama serve")
+        except requests.exceptions.Timeout:
+            print("⏱️ Ollama Service Timeout")
+            print("   Ollama is taking too long to respond")
+            print("   Check if Ollama is overloaded: ollama ps")
+            print("   Restart if needed: killall ollama && ollama serve")
+            print()
+            raise ConnectionError("Ollama service timeout")
 
         # Check if our model is available
         models = response.json().get('models', [])
         model_names = [model['name'] for model in models]
 
         if self.model_name not in model_names:
-            logger.warning(f"Model {self.model_name} not found. Available: {model_names}")
+            print(f"📦 Model '{self.model_name}' Not Found")
+            print("   Embedding models convert text into searchable vectors")
+            print(f"   Download model: ollama pull {self.model_name}")
+            if model_names:
+                print(f"   Available models: {', '.join(model_names[:3])}")
+            print()
             # Try to pull the model
             self._pull_model()
@@ -117,11 +117,21 @@ class CodeSearcher:
         """Connect to the LanceDB database."""
         try:
             if not self.rag_dir.exists():
+                print("🗃️ No Search Index Found")
+                print("   An index is a database that makes your files searchable")
+                print(f"   Create index: ./rag-mini index {self.project_path}")
+                print("   (This analyzes your files and creates semantic search vectors)")
+                print()
                 raise FileNotFoundError(f"No RAG index found at {self.rag_dir}")
 
             self.db = lancedb.connect(self.rag_dir)
 
             if "code_vectors" not in self.db.table_names():
+                print("🔧 Index Database Corrupted")
+                print("   The search index exists but is missing data tables")
+                print(f"   Rebuild index: rm -rf {self.rag_dir} && ./rag-mini index {self.project_path}")
+                print("   (This will recreate the search database)")
+                print()
                 raise ValueError("No code_vectors table found. Run indexing first.")
 
             self.table = self.db.open_table("code_vectors")
rag-mini.py (46 lines changed)
@@ -15,11 +15,29 @@ import logging
 # Add the RAG system to the path
 sys.path.insert(0, str(Path(__file__).parent))
 
-from mini_rag.indexer import ProjectIndexer
-from mini_rag.search import CodeSearcher
-from mini_rag.ollama_embeddings import OllamaEmbedder
-from mini_rag.llm_synthesizer import LLMSynthesizer
-from mini_rag.explorer import CodeExplorer
+try:
+    from mini_rag.indexer import ProjectIndexer
+    from mini_rag.search import CodeSearcher
+    from mini_rag.ollama_embeddings import OllamaEmbedder
+    from mini_rag.llm_synthesizer import LLMSynthesizer
+    from mini_rag.explorer import CodeExplorer
+except ImportError as e:
+    print("❌ Error: Missing dependencies!")
+    print()
+    print("It looks like you haven't installed the required packages yet.")
+    print("This is a common mistake - here's how to fix it:")
+    print()
+    print("1. Make sure you're in the FSS-Mini-RAG directory")
+    print("2. Run the installer script:")
+    print("   ./install_mini_rag.sh")
+    print()
+    print("Or if you want to install manually:")
+    print("   python3 -m venv .venv")
+    print("   source .venv/bin/activate")
+    print("   pip install -r requirements.txt")
+    print()
+    print(f"Missing module: {e.name}")
+    sys.exit(1)
 
 # Configure logging for user-friendly output
 logging.basicConfig(

@@ -68,7 +86,25 @@ def index_project(project_path: Path, force: bool = False):
         if not (project_path / '.mini-rag' / 'last_search').exists():
             print(f"\n💡 Try: rag-mini search {project_path} \"your search here\"")
 
+    except FileNotFoundError:
+        print(f"📁 Directory Not Found: {project_path}")
+        print("   Make sure the path exists and you're in the right location")
+        print(f"   Current directory: {Path.cwd()}")
+        print("   Check path: ls -la /path/to/your/project")
+        print()
+        sys.exit(1)
+    except PermissionError:
+        print("🔒 Permission Denied")
+        print("   FSS-Mini-RAG needs to read files and create index database")
+        print(f"   Check permissions: ls -la {project_path}")
+        print("   Try a different location with write access")
+        print()
+        sys.exit(1)
     except Exception as e:
+        # Connection errors are handled in the embedding module
+        if "ollama" in str(e).lower() or "connection" in str(e).lower():
+            sys.exit(1)  # Error already displayed
+
         print(f"❌ Indexing failed: {e}")
         print()
         print("🔧 Common solutions:")
scripts/test-configs.py (new executable file, 124 lines)
@@ -0,0 +1,124 @@
#!/usr/bin/env python3
"""
Test script to validate all config examples are syntactically correct
and contain required fields for FSS-Mini-RAG.
"""

import yaml
import sys
from pathlib import Path
from typing import Dict, Any, List

def validate_config_structure(config: Dict[str, Any], config_name: str) -> List[str]:
    """Validate that config has required structure."""
    errors = []

    # Required sections
    required_sections = ['chunking', 'streaming', 'files', 'embedding', 'search']
    for section in required_sections:
        if section not in config:
            errors.append(f"{config_name}: Missing required section '{section}'")

    # Validate chunking section
    if 'chunking' in config:
        chunking = config['chunking']
        required_chunking = ['max_size', 'min_size', 'strategy']
        for field in required_chunking:
            if field not in chunking:
                errors.append(f"{config_name}: Missing chunking.{field}")

        # Validate types and ranges
        if 'max_size' in chunking and not isinstance(chunking['max_size'], int):
            errors.append(f"{config_name}: chunking.max_size must be integer")
        if 'min_size' in chunking and not isinstance(chunking['min_size'], int):
            errors.append(f"{config_name}: chunking.min_size must be integer")
        if 'strategy' in chunking and chunking['strategy'] not in ['semantic', 'fixed']:
            errors.append(f"{config_name}: chunking.strategy must be 'semantic' or 'fixed'")

    # Validate embedding section
    if 'embedding' in config:
        embedding = config['embedding']
        if 'preferred_method' in embedding:
            valid_methods = ['ollama', 'ml', 'hash', 'auto']
            if embedding['preferred_method'] not in valid_methods:
                errors.append(f"{config_name}: embedding.preferred_method must be one of {valid_methods}")

    # Validate LLM section (if present)
    if 'llm' in config:
        llm = config['llm']
        if 'synthesis_temperature' in llm:
            temp = llm['synthesis_temperature']
            if not isinstance(temp, (int, float)) or temp < 0 or temp > 1:
                errors.append(f"{config_name}: llm.synthesis_temperature must be number between 0-1")

    return errors

def test_config_file(config_path: Path) -> bool:
    """Test a single config file."""
    print(f"Testing {config_path.name}...")

    try:
        # Test YAML parsing
        with open(config_path, 'r') as f:
            config = yaml.safe_load(f)

        if not config:
            print(f"  ❌ {config_path.name}: Empty or invalid YAML")
            return False

        # Test structure
        errors = validate_config_structure(config, config_path.name)

        if errors:
            print(f"  ❌ {config_path.name}: Structure errors:")
            for error in errors:
                print(f"    • {error}")
            return False

        print(f"  ✅ {config_path.name}: Valid")
        return True

    except yaml.YAMLError as e:
        print(f"  ❌ {config_path.name}: YAML parsing error: {e}")
        return False
    except Exception as e:
        print(f"  ❌ {config_path.name}: Unexpected error: {e}")
        return False

def main():
    """Test all config examples."""
    script_dir = Path(__file__).parent
    project_root = script_dir.parent
    examples_dir = project_root / 'examples'

    if not examples_dir.exists():
        print(f"❌ Examples directory not found: {examples_dir}")
        sys.exit(1)

    # Find all config files
    config_files = list(examples_dir.glob('config*.yaml'))

    if not config_files:
        print(f"❌ No config files found in {examples_dir}")
        sys.exit(1)

    print(f"🧪 Testing {len(config_files)} config files...\n")

    all_passed = True
    for config_file in sorted(config_files):
        passed = test_config_file(config_file)
        if not passed:
            all_passed = False

    print(f"\n{'='*50}")
    if all_passed:
        print("✅ All config files are valid!")
        print("\n💡 To use any config:")
        print("   cp examples/config-NAME.yaml /path/to/project/.mini-rag/config.yaml")
        sys.exit(0)
    else:
        print("❌ Some config files have issues - please fix before release")
        sys.exit(1)

if __name__ == '__main__':
    main()
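The script locates the examples directory relative to its own path, so it can be run from anywhere in the checkout. The invocations below are an assumption based on the shebang and the executable flag; PyYAML must be available in the active environment:

```bash
# From the repository root (or any directory)
python3 scripts/test-configs.py
# or, since the file is marked executable:
./scripts/test-configs.py
```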