FSSCoding 10862583ad Add batch LLM classifier tool with prompt caching optimization
- Created standalone batch_llm_classifier.py for custom email queries
- Optimized all LLM prompts for caching (static instructions first, variables last)
- Configured rtx3090 vLLM endpoint (qwen3-coder-30b)
- Benchmarked batch_size=4 as optimal (100% success rate, 4.65 req/sec)
- Added comprehensive documentation (tools/README.md, BATCH_LLM_QUICKSTART.md)
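The caching-friendly prompt ordering above can be sketched as follows. This is a minimal illustration, not the actual `batch_llm_classifier.py` API: the function name, labels, and message shapes are hypothetical. The idea is that the static instructions form a stable prefix that vLLM's prefix caching can reuse across requests, while the per-email variables come last.

```python
# Hypothetical sketch: static instructions first (cacheable KV prefix),
# variable content last. Names and labels are illustrative only.

CLASSIFIER_INSTRUCTIONS = (
    "You are an email classifier. Return exactly one label: "
    "WORK, PERSONAL, or SPAM."
)

def build_messages(email_body: str, query: str) -> list[dict]:
    """Build chat messages with the static prefix first, variables last."""
    return [
        # Static system prompt: identical across all requests, so the
        # server can reuse its cached prefix.
        {"role": "system", "content": CLASSIFIER_INSTRUCTIONS},
        # Variable part: changes per email, so it goes at the end.
        {"role": "user", "content": f"Query: {query}\n\nEmail:\n{email_body}"},
    ]
```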

The tool is completely separate from the main ML pipeline - no interference.
Prerequisite: a vLLM server must be running at rtx3090.bobai.com.au.
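Since the tool depends on the server being reachable, a pre-flight check like the following could be used. This is a sketch under assumptions: the port (8000) is assumed, and vLLM's OpenAI-compatible server exposes a `/health` endpoint.

```python
import urllib.request

# Assumed endpoint; the actual port is not stated in the commit.
VLLM_BASE = "http://rtx3090.bobai.com.au:8000"

def health_url(base_url: str) -> str:
    """Build the vLLM health-check URL from the server base URL."""
    return base_url.rstrip("/") + "/health"

def server_is_up(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if the vLLM server answers its /health endpoint."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, DNS failure, and timeouts.
        return False
```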
2025-11-14 16:01:57 +11:00