High-performance LLM query cache with semantic search. Reduce API costs by 80% and latency from 8.5 s to 1 ms using Redis + Qdrant vector DB. Multi-provider support (OpenAI, Anthropic).
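The project's own code isn't reproduced here, but the semantic-caching pattern it describes can be sketched as follows. This is a minimal illustration under assumptions, not the repository's API: the `llm_cache` collection name, the 0.90 similarity threshold, and the model names are invented for the example, with responses stored in Redis and prompt vectors in Qdrant.

```python
import hashlib
import uuid

import redis
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

# Assumed names and thresholds, for illustration only.
COLLECTION = "llm_cache"
SIMILARITY_THRESHOLD = 0.90            # cosine score above which prompts are treated as equivalent
EMBED_MODEL = "text-embedding-3-small"  # 1536-dimensional embeddings
CHAT_MODEL = "gpt-4o-mini"

oai = OpenAI()
kv = redis.Redis(decode_responses=True)
vdb = QdrantClient(":memory:")          # point at a real Qdrant server in production
vdb.create_collection(COLLECTION, vectors_config=VectorParams(size=1536, distance=Distance.COSINE))


def _embed(text: str) -> list[float]:
    return oai.embeddings.create(model=EMBED_MODEL, input=text).data[0].embedding


def cached_completion(prompt: str) -> str:
    vector = _embed(prompt)

    # Semantic lookup: nearest previously seen prompt in Qdrant.
    hits = vdb.query_points(COLLECTION, query=vector, limit=1).points
    if hits and hits[0].score >= SIMILARITY_THRESHOLD:
        cached = kv.get(hits[0].payload["redis_key"])
        if cached is not None:
            return cached               # cache hit: no LLM call

    # Cache miss: call the model once, store the response in Redis and the
    # prompt vector in Qdrant so future similar prompts can reuse it.
    answer = oai.chat.completions.create(
        model=CHAT_MODEL, messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content

    key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()
    kv.set(key, answer)
    vdb.upsert(COLLECTION, points=[
        PointStruct(id=str(uuid.uuid4()), vector=vector, payload={"redis_key": key})
    ])
    return answer
```

Exact repeats can additionally be short-circuited by checking the Redis hash key before embedding, which skips even the embedding call.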
Tools, libraries, papers, and patterns for reducing the cost of running large language models in production.
Cut LLM agent token costs by 93%. An execution cache for LangChain, CrewAI, and AutoGen: repeat runs finish in 2.66 ms instead of 20 seconds and consume zero tokens.
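The repository's actual LangChain/CrewAI/AutoGen integration isn't shown here; as a rough sketch of the general execution-caching idea, the decorator below memoizes an agent step on a fingerprint of its inputs, so a repeat run returns the stored result without calling the model. The function names and the in-memory dict are assumptions for the example.

```python
import functools
import hashlib
import json

_cache: dict[str, str] = {}  # in-memory store; a Redis or on-disk cache works the same way


def execution_cache(fn):
    """Memoize an agent step on a fingerprint of its arguments.

    A repeat run with identical inputs returns the stored result immediately
    and spends zero tokens, because the wrapped LLM call never fires.
    """

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        fingerprint = hashlib.sha256(
            json.dumps({"fn": fn.__name__, "args": args, "kwargs": kwargs},
                       sort_keys=True, default=str).encode()
        ).hexdigest()
        if fingerprint in _cache:
            return _cache[fingerprint]       # cache hit: no LLM call
        result = fn(*args, **kwargs)         # cache miss: run the step once
        _cache[fingerprint] = result
        return result

    return wrapper


@execution_cache
def research_step(question: str) -> str:
    # Placeholder for an agent step that would normally call an LLM.
    return f"(expensive LLM answer to: {question})"
```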
Agent skill for caching structured document briefings — summarize once, reuse everywhere. Reduces redundant LLM calls with fingerprint-based caching.
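Fingerprint-based briefing caching can be sketched roughly as below; the directory name, file format, and `summarize` callable are assumptions, not the skill's actual interface. The point is that each briefing is keyed on a content hash of the document, so the summarization call happens once per distinct document and every later request reuses the stored briefing.

```python
import hashlib
import json
from pathlib import Path

BRIEFING_DIR = Path("briefing_cache")  # assumed location for stored briefings
BRIEFING_DIR.mkdir(exist_ok=True)


def document_fingerprint(text: str) -> str:
    """Stable identity for a document: same bytes, same fingerprint."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def get_briefing(document: str, summarize) -> str:
    """Return a cached briefing for this exact document if one exists,
    otherwise summarize once (one LLM call) and persist the result."""
    path = BRIEFING_DIR / f"{document_fingerprint(document)}.json"
    if path.exists():
        return json.loads(path.read_text())["briefing"]  # reuse: no LLM call
    briefing = summarize(document)                       # single summarization call
    path.write_text(json.dumps({"briefing": briefing}))
    return briefing
```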