Fss-Rag-Mini/tests/test_min_chunk_size.py
BobAi 4166d0a362 Initial release: FSS-Mini-RAG - Lightweight semantic code search system
🎯 Complete transformation from 5.9GB bloated system to 70MB optimized solution

 Key Features:
- Hybrid embedding system (Ollama + ML fallback + hash backup)
- Intelligent chunking with language-aware parsing
- Semantic + BM25 hybrid search with rich context
- Zero-config portable design with graceful degradation
- Beautiful TUI for beginners + powerful CLI for experts
- Comprehensive documentation with 8+ Mermaid diagrams
- Professional animated demo (183KB optimized GIF)

🏗️ Architecture Highlights:
- LanceDB vector storage with streaming indexing
- Smart file tracking (size/mtime) to avoid expensive rehashing
- Progressive chunking: Markdown headers → Python functions → fixed-size
- Quality filtering: 200+ chars, 20+ words, 30% alphanumeric content
- Concurrent batch processing with error recovery

📦 Package Contents:
- Core engine: claude_rag/ (11 modules, 2,847 lines)
- Entry points: rag-mini (unified), rag-tui (beginner interface)
- Documentation: README + 6 guides with visual diagrams
- Assets: 3D icon, optimized demo GIF, recording tools
- Tests: 8 comprehensive integration and validation tests
- Examples: Usage patterns, config templates, dependency analysis

🎥 Demo System:
- Scripted demonstration showing 12 files → 58 chunks indexing
- Semantic search with multi-line result previews
- Complete workflow from TUI startup to CLI mastery
- Professional recording pipeline with asciinema + GIF conversion

🛡️ Security & Quality:
- Complete .gitignore with personal data protection
- Dependency optimization (removed python-dotenv)
- Code quality validation and educational test suite
- Agent-reviewed architecture and documentation

Ready for production use - copy folder, run ./rag-mini, start searching\!
2025-08-12 16:38:28 +10:00

27 lines
689 B
Python

"""Test with smaller min_chunk_size."""
from claude_rag.chunker import CodeChunker
from pathlib import Path
test_code = '''"""Test module."""
import os
class MyClass:
def method(self):
return 42
def my_function():
return "hello"
'''
# Create chunker with smaller min_chunk_size
chunker = CodeChunker(min_chunk_size=1) # Allow tiny chunks
chunks = chunker.chunk_file(Path("test.py"), test_code)
print(f"Created {len(chunks)} chunks:")
for i, chunk in enumerate(chunks):
print(f"\nChunk {i}: {chunk.chunk_type} '{chunk.name}'")
print(f"Lines {chunk.start_line}-{chunk.end_line}")
print(f"Size: {len(chunk.content.splitlines())} lines")
print("-" * 40)