Cache Poisoning
Summary
The article examines semantic caching and its risks in agentic systems, in particular cache poisoning, and contrasts it with prompt caching. Semantic caching reuses conclusions based on query similarity, so a stale or malicious plan can be executed for a request it was never produced for. The author demonstrates this with a fake DevOps planner agent: a poisoned cache entry for "rotate staging database credentials in the legacy cluster" was returned for the query "Rotate staging database credentials for the new cluster" because the similarity score of 0.913 exceeded the 0.87 threshold, so the LLM never reasoned about the new request at all. The author stresses that "similarity is not permission" and notes that current approaches that use an LLM to validate cache hits fail because of context bias. Seven defenses are proposed: mandatory metadata on cache entries, query-time filtering, risk-based behavior, TTL plus source-version invalidation, explicit trust tracking in agent graphs, thorough logging of cache hits, and control over writes from untrusted sources. The article also points to the broader problem of memory poisoning in agents.
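Two of the seven defenses, TTL plus source-version invalidation and control over cache writes, fit in a few lines. This is a minimal sketch, not the author's code: CachedPlan, is_fresh, admit_write, the TRUSTED_WRITERS allow-list and the idea of tagging each entry with the version string of the data it was derived from are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical allow-list of producers that may write to the cache.
TRUSTED_WRITERS = {"planner-llm", "ops-reviewed"}


@dataclass
class CachedPlan:
    query: str
    answer: str
    source_version: str  # hypothetical: version of the data the plan was derived from
    created_at: float    # Unix timestamp in seconds
    writer: str          # which component produced the entry


def is_fresh(entry: CachedPlan, current_source_version: str,
             ttl_seconds: float, now: float) -> bool:
    """Reusable only if the entry is within its TTL AND built from the current source version."""
    within_ttl = (now - entry.created_at) <= ttl_seconds
    same_version = entry.source_version == current_source_version
    return within_ttl and same_version


def admit_write(cache: List[CachedPlan], entry: CachedPlan) -> bool:
    """Reject writes from producers that are not explicitly trusted."""
    if entry.writer not in TRUSTED_WRITERS:
        return False
    cache.append(entry)
    return True


# Demo with a fixed clock so the result is deterministic.
now = 1_000_000.0
cache: List[CachedPlan] = []
accepted = admit_write(cache, CachedPlan(
    "rotate staging database credentials in the legacy cluster",
    "plan A", "inv-v2", now - 60, "planner-llm"))
rejected = admit_write(cache, CachedPlan(
    "rotate staging database credentials in the legacy cluster",
    "plan B", "inv-v2", now, "user-upload"))
```

Checking both conditions is the point of the pairing: a TTL alone would keep serving a plan built from outdated data until it expires, while the version check retires it as soon as the source changes.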
Key takeaway
For MLOps engineers deploying agentic systems, relying solely on semantic-cache similarity to decide whether a past decision can be reused introduces significant cache poisoning risk. Put safeguards in place such as mandatory metadata, query-time filtering based on context and permissions, and explicit tracking of cache hits within agent workflows. For high-risk operations, do not return cached answers directly; instead, pass them as candidate context to a fresh round of LLM reasoning.
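A minimal sketch of that risk-based behavior, assuming a hypothetical keyword-based risk_level classifier and a caller-supplied call_llm function (neither comes from the article): low-risk queries may take the cached answer directly, while high-risk queries only ever see it as candidate context inside a fresh prompt.

```python
from typing import Callable, Optional, Tuple

# Hypothetical: treat anything touching credentials or destructive actions as high risk.
HIGH_RISK_KEYWORDS = ("rotate", "credential", "delete", "drop", "revoke")


def risk_level(query: str) -> str:
    q = query.lower()
    return "high" if any(k in q for k in HIGH_RISK_KEYWORDS) else "low"


def answer(query: str, cached: Optional[str],
           call_llm: Callable[[str], str]) -> Tuple[str, str]:
    """Return (answer, path); path records how the answer was produced."""
    if cached is None:
        return call_llm(query), "llm"
    if risk_level(query) == "low":
        return cached, "cache"
    # High risk: never return the cached plan as-is; hand it to the model as context only.
    prompt = (
        f"Task: {query}\n"
        "A cached plan for a similar task is shown below. It may be stale or scoped "
        "to a different system; re-derive the plan for this task.\n"
        f"Candidate plan: {cached}"
    )
    return call_llm(prompt), "llm_with_cached_context"


def echo_llm(prompt: str) -> str:
    # Stand-in for a real model call: returns the prompt so the demo is inspectable.
    return prompt


low = answer("summarize yesterday's build logs", "cached summary", echo_llm)
high = answer("Rotate staging database credentials for the new cluster",
              "legacy-cluster plan", echo_llm)
```

Keyword matching is only a placeholder here; any classifier that maps a request to a risk tier can be substituted without changing the routing logic.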
Key insights
Semantic caching, while cost-saving, risks "cache poisoning" in agentic systems by reusing conclusions based on similarity, not permission.
Principles
- Similarity does not equate to permission for reuse.
- Authorization is binary; similarity is continuous.
- Invisible infrastructure leads to fragile systems.
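The contrast between a continuous score and a binary permission can be made concrete with the numbers from the article's experiment (score 0.913, threshold 0.87). The metadata fields environment, cluster and tenant, and the tenant value, are hypothetical; the sketch only shows that the threshold test passes while an exact-match scope check fails.

```python
from typing import Dict

SIMILARITY_THRESHOLD = 0.87  # threshold used in the article's experiment
SCOPE_FIELDS = ("environment", "cluster", "tenant")  # hypothetical scoping fields


def similarity_allows(score: float) -> bool:
    # Continuous: any score at or above the threshold counts as "close enough".
    return score >= SIMILARITY_THRESHOLD


def scope_allows(entry_meta: Dict[str, str], request_meta: Dict[str, str]) -> bool:
    # Binary: every scoping field must match exactly; there is no partial credit.
    return all(entry_meta.get(f) == request_meta.get(f) for f in SCOPE_FIELDS)


entry_meta = {"environment": "staging", "cluster": "legacy", "tenant": "acme"}
request_meta = {"environment": "staging", "cluster": "new", "tenant": "acme"}
score = 0.913  # similarity reported in the article for the two queries

reuse = similarity_allows(score) and scope_allows(entry_meta, request_meta)
```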
Method
The basic semantic caching flow is: embed the query, search a vector database for similar past entries, and return the cached answer if the similarity clears a threshold; otherwise, call the LLM and store the fresh answer for future reuse.
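The flow can be sketched end to end with only the standard library, under loud assumptions: embed is a toy hashed bag-of-words function standing in for a real embedding model, a linear scan over a Python list stands in for the vector database, and fake_llm stands in for a model call. The two demo queries are hypothetical variants of the article's example; seven of their eight words overlap, giving a toy cosine score of roughly 0.875 (not the article's 0.913), which still clears a 0.87 threshold and returns the legacy-cluster answer for the new-cluster request without calling the model.

```python
import math
import zlib
from typing import Callable, List, Optional, Tuple

DIM = 256  # size of the toy embedding space


def embed(text: str) -> List[float]:
    """Toy embedding: hashed bag of words, L2-normalised. Not a real embedding model."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: List[float], b: List[float]) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class SemanticCache:
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[List[float], str]] = []  # (embedding, answer)

    def get_or_compute(self, query: str,
                       call_llm: Callable[[str], str]) -> Tuple[str, bool]:
        """Return (answer, was_cache_hit)."""
        q = embed(query)
        best_score = -1.0
        best_answer: Optional[str] = None
        for vec, ans in self.entries:  # linear scan stands in for a vector DB search
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, ans
        if best_answer is not None and best_score >= self.threshold:
            return best_answer, True  # cache hit: the LLM is not called at all
        fresh = call_llm(query)
        self.entries.append((q, fresh))  # store the fresh answer for future reuse
        return fresh, False


calls: List[str] = []


def fake_llm(query: str) -> str:
    # Stand-in for a model call; records each invocation.
    calls.append(query)
    return f"plan for: {query}"


cache = SemanticCache(threshold=0.87)
first = cache.get_or_compute("rotate staging database credentials in the legacy cluster", fake_llm)
second = cache.get_or_compute("rotate staging database credentials in the new cluster", fake_llm)
```

Nothing in this flow asks whether the cached answer is allowed to serve the new request; the only gate is the score, which is exactly the gap the article describes.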
In practice
- Implement mandatory metadata for cache entries.
- Filter cache searches at query time, not after.
- Track cache involvement explicitly through agent graphs.
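These three practices can be combined in one small sketch, under stated assumptions: the required metadata keys, the Entry, AgentState and lookup names, and the 2-D unit vectors in the demo are all hypothetical. Entries without full metadata are rejected at construction, the scope filter runs before any similarity ranking, and every hit is both logged and recorded on the agent state so downstream nodes can see that a cached answer was used.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, List, Optional

logger = logging.getLogger("semantic_cache")

# Hypothetical metadata every entry must carry before it can enter the cache.
REQUIRED_METADATA = ("tenant", "environment", "cluster", "source_version", "writer")


@dataclass
class Entry:
    entry_id: str
    embedding: List[float]
    answer: str
    metadata: Dict[str, str]

    def __post_init__(self) -> None:
        missing = [k for k in REQUIRED_METADATA if not self.metadata.get(k)]
        if missing:
            raise ValueError(f"entry {self.entry_id} is missing metadata: {missing}")


@dataclass
class AgentState:
    # Explicit record of cache involvement, passed along the agent graph.
    cache_hits: List[Dict[str, object]] = field(default_factory=list)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def lookup(entries: List[Entry], query_vec: List[float], scope: Dict[str, str],
           threshold: float, state: AgentState) -> Optional[str]:
    # Filter first: out-of-scope entries are never ranked, however similar they are.
    candidates = [e for e in entries
                  if all(e.metadata.get(k) == v for k, v in scope.items())]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: dot(query_vec, e.embedding))
    score = dot(query_vec, best.embedding)
    if score < threshold:
        return None
    hit = {"entry_id": best.entry_id, "score": score, "writer": best.metadata["writer"]}
    state.cache_hits.append(hit)
    logger.info("semantic cache hit: %s", hit)
    return best.answer


legacy = Entry("e1", [1.0, 0.0], "legacy rotation plan",
               {"tenant": "acme", "environment": "staging", "cluster": "legacy",
                "source_version": "inv-v1", "writer": "planner-llm"})
state = AgentState()
out_of_scope = lookup([legacy], [1.0, 0.0],
                      {"tenant": "acme", "environment": "staging", "cluster": "new"},
                      0.87, state)
in_scope = lookup([legacy], [1.0, 0.0],
                  {"tenant": "acme", "environment": "staging", "cluster": "legacy"},
                  0.87, state)

try:
    Entry("e2", [0.0, 1.0], "unscoped plan", {"tenant": "acme"})
    rejected_unscoped = False
except ValueError:
    rejected_unscoped = True
```

With a real vector database, the same idea means passing the scope as a filter on the search request itself, so that the top results are drawn only from in-scope entries rather than filtered after retrieval.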
Topics
- Semantic Caching
- Cache Poisoning
- Agentic Systems
- LLM Security
- Vector Databases
- MLOps
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.