🛡️ SMART MODEL SAFEGUARDS:
- Implement runaway prevention with pattern detection for repetition, thinking loops, and rambling (see the sketch after this list)
- Add context length management with optimal parameters per model size
- Validate response quality so problematic output is caught before it reaches users
- Show helpful explanations and recovery suggestions when issues occur
- Optimize parameters per model (qwen3:0.6b vs 1.7b vs 3b+)
- Add timeout protection and graceful degradation
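As an illustration, the repetition check can be as small as flagging an n-gram that keeps recurring in the streamed output. This is a hypothetical sketch; the function name and thresholds are not the shipped code:

```python
# Hypothetical sketch of runaway detection: flag a response whose tail keeps
# repeating the same n-gram. Names and thresholds are illustrative only.
from collections import Counter

def looks_like_runaway(text: str, ngram: int = 6, max_repeats: int = 4) -> bool:
    """Return True if any n-gram of words repeats suspiciously often."""
    words = text.split()
    if len(words) < ngram * max_repeats:
        return False  # too short to judge
    grams = [" ".join(words[i:i + ngram]) for i in range(len(words) - ngram + 1)]
    most_common_count = Counter(grams).most_common(1)[0][1]
    return most_common_count >= max_repeats
```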
⚡ OPTIMAL PERFORMANCE SETTINGS:
- Context window: 32k tokens as a balanced default
- Repeat penalty: 1.15 for 0.6b, 1.1 for 1.7b, 1.05 for larger models
- Presence penalty: 1.5 for quantized models to prevent repetition
- Smart output limits: 1500 tokens for 0.6b, 2000+ for larger models
- Top-p/top-k tuned per published best practices (the settings above are collected in the sketch below)
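Collected into one place, those settings might look like the following. The option names (num_ctx, repeat_penalty, presence_penalty, num_predict) are Ollama's; the grouping, the "default" row, and the lookup helper are illustrative assumptions:

```python
# Sketch of the per-model tuning table described above.
MODEL_PARAMS = {
    "qwen3:0.6b": {"num_ctx": 32768, "repeat_penalty": 1.15,
                   "presence_penalty": 1.5, "num_predict": 1500},
    "qwen3:1.7b": {"num_ctx": 32768, "repeat_penalty": 1.10,
                   "presence_penalty": 1.5, "num_predict": 2000},
    # 3b+ models: lighter penalties, room for longer answers.
    "default":    {"num_ctx": 32768, "repeat_penalty": 1.05,
                   "presence_penalty": 0.0, "num_predict": 2048},
}

def params_for(model: str) -> dict:
    """Look up tuned options for a model, falling back to the defaults."""
    return MODEL_PARAMS.get(model, MODEL_PARAMS["default"])
```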
🎬 DUAL-MODE DEMO SCRIPTS:
- create_synthesis_demo.py: Shows fast search with AI synthesis workflow
- create_exploration_demo.py: Interactive thinking mode with conversation memory
- Realistic typing simulation and response timing for quality GIFs (a minimal sketch follows below)
- Clear demonstration of when to use each mode
Perfect for creating compelling demo videos showing both RAG experiences!
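For flavor, the typing effect boils down to printing characters with jittered delays. This is a generic sketch with made-up timings, not the demo scripts' actual code:

```python
# Generic sketch of the "realistic typing" effect used when recording GIFs;
# the delay values are illustrative, not the demo scripts' exact timings.
import random
import sys
import time

def type_out(text: str, base_delay: float = 0.04) -> None:
    """Print text character by character with human-like timing."""
    for ch in text:
        sys.stdout.write(ch)
        sys.stdout.flush()
        # Pause a little longer after punctuation, like a real typist.
        pause = base_delay * (4 if ch in ".,!?" else 1)
        time.sleep(pause + random.uniform(0, base_delay))
```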
🔀 SYNTHESIS & EXPLORATION MODES:
- Default synthesis mode to no-thinking for consistently fast responses
- Create a separate explore mode with thinking enabled for debugging and learning
- Keep the separation clean: synthesis mode never uses thinking, exploration always does (see the sketch after this list)
- Enable a Qwen3 thinking-mode toggle for experimentation
- Add lazy loading with LLM warmup using 'testing, just say "hi" <no_think>'
- Implement context-aware conversation memory across questions, supporting multi-turn debugging workflows
- Add an interactive CLI with help, summary, and session management
- Ask for user confirmation before stopping models when switching modes, and fall back gracefully when the model stop fails or the user declines
- Add intelligent restart detection based on response-quality heuristics, with clear explanations of why a restart improves thinking quality
- Include guidance messages suggesting exploration mode for deep analysis
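A minimal sketch of the two-mode design, assuming the `ollama` Python client; the class, its method names, and the history handling are hypothetical, and the `<no_think>` marker follows the warmup prompt quoted above:

```python
# Sketch: synthesis appends <no_think> so Qwen3 skips its thinking phase,
# while exploration keeps thinking on and carries conversation memory.
import ollama

class RagChat:
    def __init__(self, model: str = "qwen3:1.7b"):
        self.model = model
        self.history: list[dict] = []  # multi-turn conversation memory
        # Warm the model when a session starts so the first real
        # question is not slowed down by model loading.
        ollama.chat(model=self.model,
                    messages=[{"role": "user",
                               "content": 'testing, just say "hi" <no_think>'}])

    def ask(self, question: str, thinking: bool = False) -> str:
        """Exploration mode passes thinking=True; synthesis never does."""
        prompt = question if thinking else f"{question} <no_think>"
        self.history.append({"role": "user", "content": prompt})
        reply = ollama.chat(model=self.model, messages=self.history)
        answer = reply["message"]["content"]
        self.history.append({"role": "assistant", "content": answer})
        return answer
```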
💻 CPU-FIRST DEPLOYMENT:
- Update model rankings to prioritize ultra-efficient CPU models (qwen3:0.6b first)
- Add comprehensive CPU deployment documentation with performance benchmarks
- Configure CPU-optimized settings in the default config (see the sketch after this list)
- Keep the total model footprint at 796MB for standard systems
- Support Raspberry Pi, older laptops, and CPU-only environments
- Maintain excellent quality with the 522MB qwen3:0.6b model
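Rendered as Python for illustration, the CPU-first defaults might look like this; the key names and structure are assumptions, not the shipped config file:

```python
# Hypothetical rendering of the CPU-first defaults described above; the
# actual config file format and key names in FSS-Mini-RAG may differ.
CPU_DEFAULTS = {
    # Ranked smallest-first so CPU-only machines get usable speed.
    "model_ranking": ["qwen3:0.6b", "qwen3:1.7b", "qwen2.5:3b"],
    "total_model_budget_mb": 796,  # combined on-disk footprint target
    "llm_model_size_mb": 522,      # qwen3:0.6b on disk
    "threads": "auto",             # let Ollama size the CPU thread pool
}
```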
🧠 NEW: LLM Synthesis Feature
- Intelligent analysis of RAG search results using Ollama LLMs
- Smart model selection: Qwen3 → Qwen2.5 → Mistral → Llama3.2 (a selection sketch follows this list)
- Prioritizes efficient models (1.5B-3B parameters) for best performance
- Structured output: summary, key findings, code patterns, suggested actions
- Confidence scoring for result reliability
- Graceful fallback with setup instructions if Ollama is unavailable
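A sketch of how the zero-download selection could work against the client's model listing; the helper name and the priority encoding are illustrative, and only the `ollama` calls are real:

```python
# Sketch of the zero-configuration model pick: scan what Ollama already has
# installed and take the first match in the priority chain.
import ollama

PRIORITY = ["qwen3", "qwen2.5", "mistral", "llama3.2"]

def pick_model() -> str | None:
    """Return the best installed model, or None if Ollama is unreachable."""
    try:
        installed = [m["model"] for m in ollama.list()["models"]]
    except Exception:
        return None  # caller shows Ollama setup instructions instead
    for family in PRIORITY:
        for name in installed:
            if name.startswith(family):
                return name
    return None
```

The never-download guarantee falls out naturally: the function only ever returns names that the local listing reported.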
📊 Enhanced Search Experience
- Increased default search results from 5 to 10 across all components
- Updated demo script to show all 8 results with richer previews
- Better user experience with more comprehensive result sets
🎯 New CLI Options
- Added --synthesize/-s flag: rag-mini search project "query" --synthesize
- Zero-configuration setup - automatically detects best available model
- Never downloads models - only uses what's already installed
🧪 Tested with qwen3:1.7b
- Confirmed excellent performance with 1.7B parameter model
- Professional-grade analysis including security recommendations
- Fast response times with quality RAG context
Perfect for users who already have Ollama - transforms FSS-Mini-RAG from a search tool into an AI-powered code assistant!