Complete rebrand for v1.0-simple-search branch: Directory Changes: - claude_rag/ → mini_rag/ (preserving git history) Content Changes: - Updated all imports: from claude_rag → from mini_rag - Updated all file paths: .claude-rag → .mini-rag - Updated documentation and comments - Updated configuration files and examples - Updated all tests to use mini_rag imports This ensures complete independence from Claude/Anthropic branding while maintaining all functionality and git history. Simple branch contains the basic RAG system without LLM features.
4.9 KiB
4.9 KiB
Getting Started with FSS-Mini-RAG
Step 1: Installation
Choose your installation based on what you want:
Option A: Ollama Only (Recommended)
# Install Ollama first
curl -fsSL https://ollama.ai/install.sh | sh
# Pull the embedding model
ollama pull nomic-embed-text
# Install Python dependencies
pip install -r requirements.txt
Option B: Full ML Stack
# Install everything including PyTorch
pip install -r requirements-full.txt
Step 2: Test Installation
# Index this RAG system itself
./rag-mini index ~/my-project
# Search for something
./rag-mini search ~/my-project "chunker function"
# Check what got indexed
./rag-mini status ~/my-project
Step 3: Index Your First Project
# Index any project directory
./rag-mini index /path/to/your/project
# The system creates .mini-rag/ directory with:
# - config.json (settings)
# - manifest.json (file tracking)
# - database.lance/ (vector database)
Step 4: Search Your Code
# Basic semantic search
./rag-mini search /path/to/project "user login logic"
# Enhanced search with smart features
./rag-mini-enhanced search /path/to/project "authentication"
# Find similar patterns
./rag-mini-enhanced similar /path/to/project "def validate_input"
Step 5: Customize Configuration
Edit project/.mini-rag/config.json:
{
"chunking": {
"max_size": 3000,
"strategy": "semantic"
},
"files": {
"min_file_size": 100
}
}
Then re-index to apply changes:
./rag-mini index /path/to/project --force
Common Use Cases
Find Functions by Name
./rag-mini search /project "function named connect_to_database"
Find Code Patterns
./rag-mini search /project "error handling try catch"
./rag-mini search /project "database query with parameters"
Find Configuration
./rag-mini search /project "database connection settings"
./rag-mini search /project "environment variables"
Find Documentation
./rag-mini search /project "how to deploy"
./rag-mini search /project "API documentation"
Python API Usage
from mini_rag import ProjectIndexer, CodeSearcher, CodeEmbedder
from pathlib import Path
# Initialize
project_path = Path("/path/to/your/project")
embedder = CodeEmbedder()
indexer = ProjectIndexer(project_path, embedder)
searcher = CodeSearcher(project_path, embedder)
# Index the project
print("Indexing project...")
result = indexer.index_project()
print(f"Indexed {result['files_processed']} files, {result['chunks_created']} chunks")
# Search
print("\nSearching for authentication code...")
results = searcher.search("user authentication logic", limit=5)
for i, result in enumerate(results, 1):
print(f"\n{i}. {result.file_path}")
print(f" Score: {result.score:.3f}")
print(f" Type: {result.chunk_type}")
print(f" Content: {result.content[:100]}...")
Advanced Features
Auto-optimization
# Get optimization suggestions
./rag-mini-enhanced analyze /path/to/project
# This analyzes your codebase and suggests:
# - Better chunk sizes for your language mix
# - Streaming settings for large files
# - File filtering optimizations
File Watching
from mini_rag import FileWatcher
# Watch for file changes and auto-update index
watcher = FileWatcher(project_path, indexer)
watcher.start_watching()
# Now any file changes automatically update the index
Custom Chunking
from mini_rag import CodeChunker
chunker = CodeChunker()
# Chunk a Python file
with open("example.py") as f:
content = f.read()
chunks = chunker.chunk_text(content, "python", "example.py")
for chunk in chunks:
print(f"Type: {chunk.chunk_type}")
print(f"Content: {chunk.content}")
Tips and Best Practices
For Better Search Results
- Use descriptive phrases: "function that validates email addresses"
- Try different phrasings if first search doesn't work
- Search for concepts, not just exact variable names
For Better Indexing
- Exclude build directories:
node_modules/,build/,dist/ - Include documentation files - they often contain valuable context
- Use semantic chunking strategy for most projects
For Configuration
- Start with default settings
- Use
analyzecommand to get optimization suggestions - Increase chunk size for larger functions/classes
- Decrease chunk size for more granular search
For Troubleshooting
- Check
./rag-mini statusto see what was indexed - Look at
.mini-rag/manifest.jsonfor file details - Run with
--forceto completely rebuild index - Check logs in
.mini-rag/directory for errors
What's Next?
-
Try the test suite to understand how components work:
python -m pytest tests/ -v -
Look at the examples in
examples/directory -
Read the main README.md for complete technical details
-
Customize the system for your specific project needs