Update model recommendations to Qwen3 4B and fix status command
- Changed primary model recommendation from qwen3:1.7b to qwen3:4b
- Added Q8 quantization info in technical docs for production users
- Fixed method name error: get_embedding_info() -> get_status()
- Updated all error messages and test files with new recommendations
- Maintained beginner-friendly options (1.7b still very good, 0.6b surprisingly good)
- Added explanation of why small models work well with RAG context
- Comprehensive testing completed - system ready for clean release
parent a96ddba3c9
commit a1f84e2bd5

.mini-rag/config.yaml (new file, 53 lines)
@@ -0,0 +1,53 @@
# FSS-Mini-RAG Configuration
# Edit this file to customize indexing and search behavior
# See docs/GETTING_STARTED.md for detailed explanations

# Text chunking settings
chunking:
  max_size: 2000            # Maximum characters per chunk
  min_size: 150             # Minimum characters per chunk
  strategy: semantic        # 'semantic' (language-aware) or 'fixed'

# Large file streaming settings
streaming:
  enabled: true
  threshold_bytes: 1048576  # Files larger than this use streaming (1MB)

# File processing settings
files:
  min_file_size: 50         # Skip files smaller than this
  exclude_patterns:
    - "node_modules/**"
    - ".git/**"
    - "__pycache__/**"
    - "*.pyc"
    - ".venv/**"
    - "venv/**"
    - "build/**"
    - "dist/**"
  include_patterns:
    - "**/*"                # Include all files by default

# Embedding generation settings
embedding:
  preferred_method: ollama  # 'ollama', 'ml', 'hash', or 'auto'
  ollama_model: nomic-embed-text
  ollama_host: localhost:11434
  ml_model: sentence-transformers/all-MiniLM-L6-v2
  batch_size: 32            # Embeddings processed per batch

# Search behavior settings
search:
  default_limit: 10         # Default number of results
  enable_bm25: true         # Enable keyword matching boost
  similarity_threshold: 0.1 # Minimum similarity score
  expand_queries: false     # Enable automatic query expansion

# LLM synthesis and query expansion settings
llm:
  ollama_host: localhost:11434
  synthesis_model: auto     # 'auto', 'qwen3:1.7b', etc.
  expansion_model: auto     # Usually same as synthesis_model
  max_expansion_terms: 8    # Maximum terms to add to queries
  enable_synthesis: false   # Enable synthesis by default
  synthesis_temperature: 0.3  # LLM temperature for analysis
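A minimal sketch of how these settings might be read at runtime, assuming PyYAML is available; the helper name and fallback behaviour are illustrative and not part of this commit:

```python
# Illustrative sketch only: read .mini-rag/config.yaml with PyYAML (assumed dependency).
from pathlib import Path

import yaml


def load_config(project_path: Path) -> dict:
    """Return the project's .mini-rag config, or {} if the file is absent."""
    config_file = project_path / ".mini-rag" / "config.yaml"
    if not config_file.exists():
        return {}  # callers fall back to built-in defaults
    with config_file.open() as f:
        return yaml.safe_load(f) or {}


config = load_config(Path("."))
print(config.get("llm", {}).get("synthesis_model", "auto"))
```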
.mini-rag/last_search (new file, 1 line)
@@ -0,0 +1 @@
chunking
@@ -787,4 +787,36 @@ def repair_index(self, project_path: Path) -> bool:
        return False
```

## LLM Model Selection & Performance

### Model Recommendations by Use Case

FSS-Mini-RAG works well with LLMs of various sizes because our rich context and guided prompts help small models perform well:

**Recommended (Best Balance):**
- **qwen3:4b** - Excellent quality, good performance
- **qwen3:4b-q8_0** - High-precision quantized version for production

**Still Excellent (Faster/CPU-friendly):**
- **qwen3:1.7b** - Very good results, faster responses
- **qwen3:0.6b** - Surprisingly good for its size (522MB)

### Why Small Models Work Well Here

Small models can produce excellent results in RAG systems because:

1. **Rich Context**: Our chunking provides substantial context around each match
2. **Guided Prompts**: Well-structured prompts give models a clear "runway" to continue
3. **Specific Domain**: Code analysis is more predictable than general conversation

Without good context, small models tend to get lost and produce erratic output. But with RAG's rich context and focused prompts, even the 0.6B model can provide meaningful analysis.
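To make the "guided prompt" idea concrete, here is a rough sketch of how retrieved chunks can be packed into a structured prompt for a small local model via Ollama's /api/generate endpoint; the prompt wording, function name, and model choice are illustrative assumptions, not the project's actual template:

```python
# Illustrative sketch only: give a small model rich, structured context to work from.
import requests


def synthesize(question: str, chunks: list, model: str = "qwen3:4b") -> str:
    context = "\n\n".join(f"[chunk {i + 1}]\n{c}" for i, c in enumerate(chunks))
    prompt = (
        "You are analysing a codebase. Using ONLY the context below, "
        "answer the question concisely.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": 0.3},  # mirrors the config's synthesis_temperature
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```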

### Quantization Benefits

For production deployments, consider quantized models like `qwen3:4b-q8_0`:
- **Q8_0**: 8-bit quantization with minimal quality loss
- **Smaller memory footprint**: ~50% reduction vs full precision
- **Better CPU performance**: Faster inference on CPU-only systems
- **Production ready**: Maintains analysis quality while improving efficiency

This technical guide provides the deep implementation details that developers need to understand, modify, and extend the system, while keeping the main README focused on getting users started quickly.
@@ -165,7 +165,9 @@ python3 -c "import mini_rag; print('✅ Installation successful')"

2. **Try different model:**
```bash
-ollama pull qwen3:1.7b  # Good balance of speed/quality
+ollama pull qwen3:4b    # Recommended: excellent quality
+ollama pull qwen3:1.7b  # Still very good, faster
+ollama pull qwen3:0.6b  # Surprisingly good for CPU-only
```

3. **Use synthesis mode instead of exploration:**
@@ -68,11 +68,14 @@ class LLMSynthesizer:
        # Modern model preference ranking (CPU-friendly first)
        # Prioritize: Ultra-efficient > Standard efficient > Larger models
        model_rankings = [
+           # Recommended model (excellent quality)
+           "qwen3:4b",
+
            # Ultra-efficient models (perfect for CPU-only systems)
            "qwen3:0.6b", "qwen3:1.7b", "llama3.2:1b",

            # Standard efficient models
-           "qwen2.5:1.5b", "qwen3:3b", "qwen3:4b",
+           "qwen2.5:1.5b", "qwen3:3b",

            # Qwen2.5 models (excellent performance/size ratio)
            "qwen2.5-coder:1.5b", "qwen2.5:1.5b", "qwen2.5:3b", "qwen2.5-coder:3b",
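A rough sketch of how a preference ranking like this could be resolved against the models actually installed on the local Ollama server; the function and variable names are illustrative, not the project's actual API (though the test suite does reference a `synthesizer.available_models` list):

```python
# Illustrative sketch: choose the highest-ranked model that is locally available.
# `available_models` would typically come from Ollama's /api/tags listing.
from typing import Optional


def pick_model(model_rankings: list, available_models: list) -> Optional[str]:
    available = set(available_models)
    for candidate in model_rankings:
        if candidate in available:
            return candidate
    return None  # caller can then fall back or report that no model is installed


print(pick_model(["qwen3:4b", "qwen3:1.7b"], ["qwen3:1.7b", "nomic-embed-text"]))
# -> qwen3:1.7b
```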
@@ -117,7 +117,12 @@ def search_project(project_path: Path, query: str, limit: int = 10, synthesize:

    for i, result in enumerate(results, 1):
        # Clean up file path display
-       rel_path = result.file_path.relative_to(project_path) if result.file_path.is_absolute() else result.file_path
+       file_path = Path(result.file_path)
+       try:
+           rel_path = file_path.relative_to(project_path)
+       except ValueError:
+           # If relative_to fails, just show the basename
+           rel_path = file_path.name

        print(f"{i}. {rel_path}")
        print(f" Score: {result.score:.3f}")
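The change above matters because `Path.relative_to` raises `ValueError` whenever the result path is not under the given base, not only when it is absolute, so the old one-liner could still crash on files outside the project root. A quick illustration of the behaviour the try/except now handles (paths are made up for the example):

```python
from pathlib import Path

project = Path("/home/user/project")
outside = Path("/tmp/scratch/notes.md")  # a file that is not under the project root

try:
    print(outside.relative_to(project))
except ValueError:
    # Same fallback as the code above: show just the basename
    print(outside.name)  # -> notes.md
```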
@@ -236,7 +241,7 @@ def status_check(project_path: Path):
    print("🧠 Embedding System:")
    try:
        embedder = OllamaEmbedder()
-       emb_info = embedder.get_embedding_info()
+       emb_info = embedder.get_status()
        method = emb_info.get('method', 'unknown')

        if method == 'ollama':
@@ -514,7 +514,7 @@ class SimpleTUI:
        from mini_rag.ollama_embeddings import OllamaEmbedder

        embedder = OllamaEmbedder()
-       info = embedder.get_embedding_info()
+       info = embedder.get_status()

        print("🧠 Embedding System:")
        method = info.get('method', 'unknown')
@@ -68,7 +68,7 @@ class TestOllamaIntegration(unittest.TestCase):
                if len(models) > 5:
                    print(f" ... and {len(models)-5} more")
            else:
-               print(" ⚠️ No models found. Install with: ollama pull qwen3:1.7b")
+               print(" ⚠️ No models found. Install with: ollama pull qwen3:4b")

            self.assertTrue(True)
        else:
@@ -146,7 +146,7 @@ class TestOllamaIntegration(unittest.TestCase):
        if not synthesizer.is_available():
            self.fail(
                "❌ No LLM models available.\n"
-               " 💡 Install a model like: ollama pull qwen3:1.7b"
+               " 💡 Install a model like: ollama pull qwen3:4b"
            )

        print(f" ✅ Found {len(synthesizer.available_models)} LLM models")
@@ -426,7 +426,7 @@ def run_troubleshooting():
    print("💡 Common Solutions:")
    print(" • Install Ollama: https://ollama.ai/download")
    print(" • Start server: ollama serve")
-   print(" • Install models: ollama pull qwen3:1.7b")
+   print(" • Install models: ollama pull qwen3:4b")
    print(" • Install embedding model: ollama pull nomic-embed-text")
    print()
    print("📚 For more help, see docs/QUERY_EXPANSION.md")
@@ -50,7 +50,7 @@ def main():
    print(" • Check docs/QUERY_EXPANSION.md for setup help")
    print(" • Ensure Ollama is installed: https://ollama.ai/download")
    print(" • Start Ollama server: ollama serve")
-   print(" • Install models: ollama pull qwen3:1.7b")
+   print(" • Install models: ollama pull qwen3:4b")

def run_test(test_file):
    """Run a specific test file."""