# Query Expansion Guide
## What Is Query Expansion?
Query expansion automatically adds related terms to your search to find more relevant results.
**Example:**
- You search: `"authentication"`
- System expands to: `"authentication login user verification credentials security"`
- Result: 2-3x more relevant matches!
## How It Works
```mermaid
graph LR
    A[User Query] --> B[LLM Expands]
    B --> C[Enhanced Search]
    C --> D[Better Results]
    style A fill:#e1f5fe
    style D fill:#e8f5e8
```
1. Your query goes to a small, fast LLM (like qwen3:1.7b)
2. The LLM adds related terms that people might use when writing about the topic
3. Both semantic and keyword search use the expanded query
4. You get much better results without changing anything
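Under the hood, this is a single round trip to the local Ollama API. Here is a minimal sketch of that call (assuming the standard Ollama `/api/generate` endpoint; the `expand_query` helper and prompt wording are illustrative, not the tool's actual code):

```python
import requests

def expand_query(query: str, model: str = "qwen3:1.7b") -> str:
    """Ask a small local LLM for related search terms (illustrative sketch)."""
    prompt = (
        f"List search terms related to '{query}'. "
        "Reply with one space-separated line of terms, nothing else."
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": 0.1},  # low temperature keeps output consistent
        },
        timeout=30,
    )
    resp.raise_for_status()
    extra_terms = resp.json()["response"].strip()
    return f"{query} {extra_terms}"  # both semantic and keyword search receive this

# e.g. "authentication" -> "authentication login user verification credentials ..."
print(expand_query("authentication"))
```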
## When Is It Enabled?
- ❌ **CLI commands**: Disabled by default (for speed)
- ✅ **TUI interface**: Auto-enabled (when you have time to explore)
- ⚙️ **Configurable**: Can be enabled/disabled in `config.yaml`
## Configuration
### Easy Configuration (TUI)
Use the interactive Configuration Manager in the TUI:
1. Start the TUI: `./rag-tui` (or `rag.bat` on Windows)
2. Select Option 6: Configuration Manager
3. Choose Option 2: Toggle query expansion
4. Follow the prompts: you get an explanation and an easy on/off toggle
The TUI will:
- Explain benefits and requirements clearly
- Check if Ollama is available
- Show current status (enabled/disabled)
- Save changes automatically
### Manual Configuration (Advanced)
Edit `config.yaml` directly:
```yaml
# Search behavior settings
search:
  expand_queries: false  # Set to true to enable automatic query expansion

# LLM expansion settings
llm:
  max_expansion_terms: 8        # How many terms to add
  expansion_model: auto         # Which model to use (auto = fastest available)
  ollama_host: localhost:11434  # Ollama server
```
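If you script against the same file, the settings read back as ordinary nested keys. A quick sanity check (assuming PyYAML is installed and the `config.yaml` layout above):

```python
import yaml

# Load the tool's config and inspect the expansion settings
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["search"]["expand_queries"])    # False until you enable it
print(cfg["llm"]["max_expansion_terms"])  # 8
print(cfg["llm"]["expansion_model"])      # "auto"
```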
## Performance
- **Speed**: ~100 ms per expansion (depends on your hardware)
- **Caching**: Repeated queries are instant
- **Model selection**: Automatically uses the fastest available model
## Examples
**Code Search:**
`"error handling"` → `"error handling exception try catch fault tolerance recovery"`

**Documentation Search:**
`"installation"` → `"installation setup install deploy configuration getting started"`

**Any Content:**
`"budget planning"` → `"budget planning financial forecast cost analysis spending plan"`
## Troubleshooting
**Query expansion not working?**
1. Check that Ollama is running: `curl http://localhost:11434/api/tags`
2. Verify you have a model installed: `ollama list`
3. Check the logs with the `--verbose` flag
**Too slow?**
- Disable it in `config.yaml`: `expand_queries: false`
- Or use a faster model: `expansion_model: "qwen3:0.6b"`
**Poor expansions?**
- Try a different model: `expansion_model: "qwen3:1.7b"`
- Reduce the number of terms: `max_expansion_terms: 5`
## Technical Details
The `QueryExpander` class:
- Uses temperature 0.1 for consistent results
- Limits expansions to prevent very long queries
- Handles model selection automatically
- Includes smart caching to avoid repeated LLM calls
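A rough sketch of how those pieces could fit together (hypothetical code, assuming the Ollama HTTP API; the real class's interface may differ):

```python
import requests

class QueryExpander:
    """Illustrative sketch: low temperature, capped terms, cached results."""

    def __init__(self, model="auto", max_terms=8, host="localhost:11434"):
        self.model = model
        self.max_terms = max_terms
        self.host = host
        self._cache: dict[str, str] = {}  # repeated queries return instantly

    def _pick_model(self) -> str:
        # "auto" picks an installed model from Ollama (assumed fallback logic)
        if self.model != "auto":
            return self.model
        tags = requests.get(f"http://{self.host}/api/tags", timeout=5).json()
        names = [m["name"] for m in tags.get("models", [])]
        return names[0] if names else "qwen3:1.7b"

    def expand(self, query: str) -> str:
        key = query.strip().lower()
        if key in self._cache:  # smart caching: skip the LLM entirely
            return self._cache[key]
        resp = requests.post(
            f"http://{self.host}/api/generate",
            json={
                "model": self._pick_model(),
                "prompt": f"Give related search terms for: {query}",
                "stream": False,
                "options": {"temperature": 0.1},  # consistent results
            },
            timeout=30,
        )
        resp.raise_for_status()
        terms = resp.json()["response"].split()[: self.max_terms]  # cap query length
        expanded = " ".join([query, *terms])
        self._cache[key] = expanded
        return expanded
```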
Perfect for beginners because it "just works": enable it when you want better results, disable it when you want maximum speed.