# 🤖 LLM Provider Setup Guide
This guide shows how to configure FSS-Mini-RAG with different LLM providers for synthesis and query expansion features.
## 🎯 Quick Provider Comparison
| Provider | Cost | Setup Difficulty | Quality | Privacy | Internet Required |
|---|---|---|---|---|---|
| Ollama | Free | Easy | Good | Excellent | No |
| LM Studio | Free | Easy | Good | Excellent | No |
| OpenRouter | Low ($0.10-0.50/M) | Medium | Excellent | Fair | Yes |
| OpenAI | Medium ($0.15-2.50/M) | Medium | Excellent | Fair | Yes |
| Anthropic | Medium-High | Medium | Excellent | Fair | Yes |
## 🏠 Local Providers (Recommended for Beginners)
### Ollama (Default)

**Best for:** Privacy, learning, no ongoing costs
```yaml
llm:
  provider: ollama
  ollama_host: localhost:11434
  synthesis_model: qwen3:1.7b
  expansion_model: qwen3:1.7b
  enable_synthesis: false
  synthesis_temperature: 0.3
  cpu_optimized: true
  enable_thinking: true
```
**Setup:**

- Install Ollama: `curl -fsSL https://ollama.ai/install.sh | sh`
- Start the service: `ollama serve`
- Download a model: `ollama pull qwen3:1.7b`
- Test: `./rag-mini search /path/to/project "test" --synthesize`
**Recommended Models:**

- `qwen3:0.6b` - Ultra-fast, good for CPU-only systems
- `qwen3:1.7b` - Balanced quality and speed (recommended)
- `qwen3:4b` - Higher quality, excellent for most use cases
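If synthesis fails, the usual causes are that the Ollama service is not running or the model has not been pulled yet. A minimal check, assuming the default port 11434 from the config above:

```bash
# Confirm the Ollama service is reachable and the model is installed
ollama list                               # should include qwen3:1.7b
curl -s http://localhost:11434/api/tags   # JSON list of locally available models
```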
### LM Studio

**Best for:** GUI users, model experimentation
```yaml
llm:
  provider: openai
  api_base: http://localhost:1234/v1
  api_key: "not-needed"
  synthesis_model: "any"
  expansion_model: "any"
  enable_synthesis: false
  synthesis_temperature: 0.3
```
**Setup:**

- Download LM Studio
- Install any model from the catalog
- Start the local server (default port 1234)
- Use the config above
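Because LM Studio serves an OpenAI-compatible API, a quick request confirms the local server is up before FSS-Mini-RAG tries to use it. A minimal sketch, assuming the default port 1234:

```bash
# List the models the LM Studio server is currently exposing
curl -s http://localhost:1234/v1/models
```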
## ☁️ Cloud Providers (For Advanced Users)
### OpenRouter (Best Value)

**Best for:** Access to many models, reasonable pricing
```yaml
llm:
  provider: openai
  api_base: https://openrouter.ai/api/v1
  api_key: "your-api-key-here"
  synthesis_model: "meta-llama/llama-3.1-8b-instruct:free"
  expansion_model: "meta-llama/llama-3.1-8b-instruct:free"
  enable_synthesis: false
  synthesis_temperature: 0.3
  timeout: 30
```
**Setup:**

- Sign up at openrouter.ai
- Create an API key in the dashboard
- Add $5-10 in credits (goes far with efficient models)
- Replace `your-api-key-here` with your actual key
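Since OpenRouter speaks the OpenAI chat-completions format, a tiny request against the free model is an easy way to confirm the key works before wiring it into the config. A minimal sketch, assuming `OPENROUTER_API_KEY` holds your key:

```bash
# One-shot test request; an error here usually means a bad or missing key
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"meta-llama/llama-3.1-8b-instruct:free","messages":[{"role":"user","content":"ping"}],"max_tokens":8}'
```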
**Budget Models:**

- `meta-llama/llama-3.1-8b-instruct:free` - Free tier
- `openai/gpt-4o-mini` - $0.15 per million tokens
- `anthropic/claude-3-haiku` - $0.25 per million tokens
### OpenAI (Premium Quality)

**Best for:** Reliability, advanced features
```yaml
llm:
  provider: openai
  api_key: "your-openai-api-key"
  synthesis_model: "gpt-4o-mini"
  expansion_model: "gpt-4o-mini"
  enable_synthesis: false
  synthesis_temperature: 0.3
  timeout: 30
```
**Setup:**

- Sign up at platform.openai.com
- Add a payment method
- Create an API key
- Start with `gpt-4o-mini` for cost efficiency
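The same kind of check works for OpenAI: listing models with your key confirms both the key and connectivity. A minimal sketch, assuming `OPENAI_API_KEY` holds your key:

```bash
# A 401 response here means the key is missing or invalid
curl -s -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models | head -c 300
```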
### Anthropic Claude (Code Expert)

**Best for:** Code analysis, thoughtful responses
```yaml
llm:
  provider: anthropic
  api_key: "your-anthropic-api-key"
  synthesis_model: "claude-3-haiku-20240307"
  expansion_model: "claude-3-haiku-20240307"
  enable_synthesis: false
  synthesis_temperature: 0.3
  timeout: 30
```
**Setup:**

- Sign up at console.anthropic.com
- Add credits to your account
- Create an API key
- Start with Claude Haiku as the budget-friendly option
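To confirm an Anthropic key before pointing the config at it, a tiny request to the Messages API is enough. A minimal sketch, assuming `ANTHROPIC_API_KEY` holds your key and using the same Haiku model as the config above:

```bash
# Send a short test message; an authentication error means the key is wrong
curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-3-haiku-20240307","max_tokens":8,"messages":[{"role":"user","content":"ping"}]}'
```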
## 🧪 Testing Your Setup
### 1. Basic Functionality Test

```bash
# Test without LLM (should always work)
./rag-mini search /path/to/project "authentication"
```
### 2. Synthesis Test

```bash
# Test LLM integration
./rag-mini search /path/to/project "authentication" --synthesize
```
### 3. Interactive Test

```bash
# Test exploration mode
./rag-mini explore /path/to/project
# Then ask: "How does authentication work in this codebase?"
```
### 4. Query Expansion Test

Enable `expand_queries: true` in your config, then:

```bash
./rag-mini search /path/to/project "auth"
# Should automatically expand to "auth authentication login user session"
```
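To see what expansion actually changes, it can help to capture results before and after enabling it. A hypothetical workflow using only the commands above:

```bash
# Run the same query with expansion off, then on, and compare the hits
./rag-mini search /path/to/project "auth" > /tmp/results-plain.txt
# ...set expand_queries: true in .mini-rag/config.yaml, then re-run...
./rag-mini search /path/to/project "auth" > /tmp/results-expanded.txt
diff /tmp/results-plain.txt /tmp/results-expanded.txt
```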
## 🛠️ Configuration Tips
### For Budget-Conscious Users

```yaml
llm:
  synthesis_model: "gpt-4o-mini"   # or claude-haiku
  enable_synthesis: false          # Manual control
  synthesis_temperature: 0.1       # Factual responses
  max_expansion_terms: 4           # Shorter expansions
```
### For Quality-Focused Users

```yaml
llm:
  synthesis_model: "gpt-4o"        # or claude-sonnet
  enable_synthesis: true           # Always on
  synthesis_temperature: 0.3       # Balanced creativity
  enable_thinking: true            # Show reasoning
  max_expansion_terms: 8           # Comprehensive expansion
```
### For Privacy-Focused Users

```yaml
# Use only local providers
embedding:
  preferred_method: ollama   # Local embeddings
llm:
  provider: ollama           # Local LLM
# Never use cloud providers
```
## 🔧 Troubleshooting
### Connection Issues

- **Local:** Ensure Ollama/LM Studio is running: `ps aux | grep ollama`
- **Cloud:** Check your API key and internet connection: `curl -H "Authorization: Bearer $API_KEY" https://api.openai.com/v1/models`
### Model Not Found

- **Ollama:** `ollama pull model-name`
- **Cloud:** Check the provider's model list documentation
### High Costs

- Use mini/haiku models instead of full versions
- Set `enable_synthesis: false` and use `--synthesize` selectively
- Reduce `max_expansion_terms` to 4-6
### Poor Quality

- Try higher-tier models (gpt-4o, claude-sonnet)
- Adjust `synthesis_temperature` (0.1 = factual, 0.5 = creative)
- Enable `expand_queries` for better search coverage
### Slow Responses

- **Local:** Try smaller models (`qwen3:0.6b`)
- **Cloud:** Increase `timeout` or switch providers
- **General:** Reduce `max_size` in the chunking config
## 📋 Environment Variables (Alternative Setup)
Instead of putting API keys in config files, use environment variables:
```bash
# In your shell profile (.bashrc, .zshrc, etc.)
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export OPENROUTER_API_KEY="your-openrouter-key"
```
Then in config:
```yaml
llm:
  api_key: "${OPENAI_API_KEY}"   # Reads from environment
```
## 🚀 Advanced: Multi-Provider Setup
You can create different configs for different use cases:
```bash
# Fast local analysis
cp examples/config-beginner.yaml .mini-rag/config-local.yaml

# High-quality cloud analysis
cp examples/config-llm-providers.yaml .mini-rag/config-cloud.yaml
# Edit to use OpenAI/Claude

# Switch configs as needed
ln -sf config-local.yaml .mini-rag/config.yaml   # Use local
ln -sf config-cloud.yaml .mini-rag/config.yaml   # Use cloud
```
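If you switch often, a small shell helper keeps the symlink step from getting tedious. This `rag-config` function is a hypothetical convenience, not part of FSS-Mini-RAG, and assumes you run it from the project root:

```bash
# Hypothetical helper: rag-config local | rag-config cloud
rag-config() {
  ln -sf "config-$1.yaml" .mini-rag/config.yaml
  echo "Active config: $(readlink .mini-rag/config.yaml)"
}
```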
## 📚 Further Reading
- Ollama Model Library
- OpenRouter Pricing
- OpenAI API Documentation
- Anthropic Claude Documentation
- LM Studio Getting Started
**💡 Pro Tip:** Start with local Ollama for learning, then upgrade to cloud providers when you need production-quality analysis or are working with large codebases.