# 🤖 LLM Provider Setup Guide
This guide shows how to configure FSS-Mini-RAG with different LLM providers for synthesis and query expansion features.
## 🎯 Quick Provider Comparison
| Provider | Cost | Setup Difficulty | Quality | Privacy | Internet Required |
|---|---|---|---|---|---|
| Ollama | Free | Easy | Good | Excellent | No |
| LM Studio | Free | Easy | Good | Excellent | No |
| OpenRouter | Low ($0.10-0.50/M) | Medium | Excellent | Fair | Yes |
| OpenAI | Medium ($0.15-2.50/M) | Medium | Excellent | Fair | Yes |
| Anthropic | Medium-High | Medium | Excellent | Fair | Yes |
## 🏠 Local Providers (Recommended for Beginners)
### Ollama (Default)

**Best for:** Privacy, learning, no ongoing costs
```yaml
llm:
  provider: ollama
  ollama_host: localhost:11434
  synthesis_model: qwen3:1.7b
  expansion_model: qwen3:1.7b
  enable_synthesis: false
  synthesis_temperature: 0.3
  cpu_optimized: true
  enable_thinking: true
```
**Setup:**

- Install Ollama: `curl -fsSL https://ollama.ai/install.sh | sh`
- Start the service: `ollama serve`
- Download a model: `ollama pull qwen3:1.7b`
- Test: `./rag-mini search /path/to/project "test" --synthesize`
**Recommended Models:**

- `qwen3:0.6b` - Ultra-fast, good for CPU-only systems
- `qwen3:1.7b` - Balanced quality and speed (recommended)
- `qwen3:4b` - Higher quality, excellent for most use cases
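If synthesis fails, the usual causes are that the Ollama service is not running or the model has not been pulled yet. A minimal check, assuming the default port 11434 from the config above:

```bash
# Confirm the Ollama service is reachable and the model is installed
ollama list                               # should include qwen3:1.7b
curl -s http://localhost:11434/api/tags   # JSON list of locally available models
```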
### LM Studio

**Best for:** GUI users, model experimentation
```yaml
llm:
  provider: openai
  api_base: http://localhost:1234/v1
  api_key: "not-needed"
  synthesis_model: "any"
  expansion_model: "any"
  enable_synthesis: false
  synthesis_temperature: 0.3
```
**Setup:**

- Download LM Studio
- Install any model from the catalog
- Start the local server (default port 1234)
- Use the config above
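Because LM Studio serves an OpenAI-compatible API, a quick request confirms the local server is up before FSS-Mini-RAG tries to use it. A minimal sketch, assuming the default port 1234:

```bash
# List the models the LM Studio server is currently exposing
curl -s http://localhost:1234/v1/models
```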
## ☁️ Cloud Providers (For Advanced Users)
### OpenRouter (Best Value)

**Best for:** Access to many models, reasonable pricing
```yaml
llm:
  provider: openai
  api_base: https://openrouter.ai/api/v1
  api_key: "your-api-key-here"
  synthesis_model: "meta-llama/llama-3.1-8b-instruct:free"
  expansion_model: "meta-llama/llama-3.1-8b-instruct:free"
  enable_synthesis: false
  synthesis_temperature: 0.3
  timeout: 30
```
**Setup:**

- Sign up at openrouter.ai
- Create an API key in the dashboard
- Add $5-10 in credits (goes far with efficient models)
- Replace `your-api-key-here` with your actual key
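Since OpenRouter speaks the OpenAI chat-completions format, a tiny request against the free model is an easy way to confirm the key works before wiring it into the config. A minimal sketch, assuming `OPENROUTER_API_KEY` holds your key:

```bash
# One-shot test request; an error here usually means a bad or missing key
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"meta-llama/llama-3.1-8b-instruct:free","messages":[{"role":"user","content":"ping"}],"max_tokens":8}'
```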
**Budget Models:**

- `meta-llama/llama-3.1-8b-instruct:free` - Free tier
- `openai/gpt-4o-mini` - $0.15 per million tokens
- `anthropic/claude-3-haiku` - $0.25 per million tokens
### OpenAI (Premium Quality)

**Best for:** Reliability, advanced features
```yaml
llm:
  provider: openai
  api_key: "your-openai-api-key"
  synthesis_model: "gpt-4o-mini"
  expansion_model: "gpt-4o-mini"
  enable_synthesis: false
  synthesis_temperature: 0.3
  timeout: 30
```
**Setup:**

- Sign up at platform.openai.com
- Add a payment method
- Create an API key
- Start with `gpt-4o-mini` for cost efficiency
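The same kind of check works for OpenAI: listing models with your key confirms both the key and connectivity. A minimal sketch, assuming `OPENAI_API_KEY` holds your key:

```bash
# A 401 response here means the key is missing or invalid
curl -s -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models | head -c 300
```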
### Anthropic Claude (Code Expert)

**Best for:** Code analysis, thoughtful responses
```yaml
llm:
  provider: anthropic
  api_key: "your-anthropic-api-key"
  synthesis_model: "claude-3-haiku-20240307"
  expansion_model: "claude-3-haiku-20240307"
  enable_synthesis: false
  synthesis_temperature: 0.3
  timeout: 30
```
**Setup:**

- Sign up at console.anthropic.com
- Add credits to your account
- Create an API key
- Start with Claude Haiku as the budget-friendly option
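To confirm an Anthropic key before pointing the config at it, a tiny request to the Messages API is enough. A minimal sketch, assuming `ANTHROPIC_API_KEY` holds your key and using the same Haiku model as the config above:

```bash
# Send a short test message; an authentication error means the key is wrong
curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-3-haiku-20240307","max_tokens":8,"messages":[{"role":"user","content":"ping"}]}'
```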
## 🧪 Testing Your Setup
### 1. Basic Functionality Test

```bash
# Test without LLM (should always work)
./rag-mini search /path/to/project "authentication"
```
### 2. Synthesis Test

```bash
# Test LLM integration
./rag-mini search /path/to/project "authentication" --synthesize
```
### 3. Interactive Test

```bash
# Test exploration mode
./rag-mini explore /path/to/project
# Then ask: "How does authentication work in this codebase?"
```
### 4. Query Expansion Test

Enable `expand_queries: true` in your config, then:

```bash
./rag-mini search /path/to/project "auth"
# Should automatically expand to "auth authentication login user session"
```
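To see what expansion actually changes, it can help to capture results before and after enabling it. A hypothetical workflow using only the commands above:

```bash
# Run the same query with expansion off, then on, and compare the hits
./rag-mini search /path/to/project "auth" > /tmp/results-plain.txt
# ...set expand_queries: true in .mini-rag/config.yaml, then re-run...
./rag-mini search /path/to/project "auth" > /tmp/results-expanded.txt
diff /tmp/results-plain.txt /tmp/results-expanded.txt
```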
## 🛠️ Configuration Tips
### For Budget-Conscious Users

```yaml
llm:
  synthesis_model: "gpt-4o-mini"   # or claude-haiku
  enable_synthesis: false          # Manual control
  synthesis_temperature: 0.1       # Factual responses
  max_expansion_terms: 4           # Shorter expansions
```
### For Quality-Focused Users

```yaml
llm:
  synthesis_model: "gpt-4o"        # or claude-sonnet
  enable_synthesis: true           # Always on
  synthesis_temperature: 0.3       # Balanced creativity
  enable_thinking: true            # Show reasoning
  max_expansion_terms: 8           # Comprehensive expansion
```
### For Privacy-Focused Users

```yaml
# Use only local providers
embedding:
  preferred_method: ollama   # Local embeddings
llm:
  provider: ollama           # Local LLM
# Never use cloud providers
```
## 🔧 Troubleshooting
### Connection Issues

- **Local:** Ensure Ollama/LM Studio is running: `ps aux | grep ollama`
- **Cloud:** Check your API key and internet connection: `curl -H "Authorization: Bearer $API_KEY" https://api.openai.com/v1/models`
### Model Not Found

- **Ollama:** `ollama pull model-name`
- **Cloud:** Check the provider's model list documentation
### High Costs

- Use mini/haiku models instead of full versions
- Set `enable_synthesis: false` and use `--synthesize` selectively
- Reduce `max_expansion_terms` to 4-6
### Poor Quality

- Try higher-tier models (gpt-4o, claude-sonnet)
- Adjust `synthesis_temperature` (0.1 = factual, 0.5 = creative)
- Enable `expand_queries` for better search coverage
### Slow Responses

- **Local:** Try smaller models (`qwen3:0.6b`)
- **Cloud:** Increase `timeout` or switch providers
- **General:** Reduce `max_size` in the chunking config
## 📋 Environment Variables (Alternative Setup)
Instead of putting API keys in config files, use environment variables:
```bash
# In your shell profile (.bashrc, .zshrc, etc.)
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export OPENROUTER_API_KEY="your-openrouter-key"
```
Then in config:
```yaml
llm:
  api_key: "${OPENAI_API_KEY}"   # Reads from environment
```
## 🚀 Advanced: Multi-Provider Setup
You can create different configs for different use cases:
```bash
# Fast local analysis
cp examples/config-beginner.yaml .mini-rag/config-local.yaml

# High-quality cloud analysis
cp examples/config-llm-providers.yaml .mini-rag/config-cloud.yaml
# Edit to use OpenAI/Claude

# Switch configs as needed
ln -sf config-local.yaml .mini-rag/config.yaml   # Use local
ln -sf config-cloud.yaml .mini-rag/config.yaml   # Use cloud
```
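If you switch often, a small shell helper keeps the symlink step from getting tedious. This `rag-config` function is a hypothetical convenience, not part of FSS-Mini-RAG, and assumes you run it from the project root:

```bash
# Hypothetical helper: rag-config local | rag-config cloud
rag-config() {
  ln -sf "config-$1.yaml" .mini-rag/config.yaml
  echo "Active config: $(readlink .mini-rag/config.yaml)"
}
```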
## 📚 Further Reading
- Ollama Model Library
- OpenRouter Pricing
- OpenAI API Documentation
- Anthropic Claude Documentation
- LM Studio Getting Started
**💡 Pro Tip:** Start with local Ollama for learning, then upgrade to cloud providers when you need production-quality analysis or are working with large codebases.