Cache Poisoning

Source: Towards AI - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Software Development & Engineering · Depth: Advanced, long

Summary

The article discusses semantic caching and its main risk in agentic systems, cache poisoning, and contrasts it with prompt caching. Semantic caching reuses earlier conclusions based on query similarity, which can lead an agent to execute stale or malicious plans.

An experiment with a fake DevOps planner agent demonstrated this. A poisoned cache entry for "rotate staging database credentials in the legacy cluster" was returned for the query "Rotate staging database credentials for the new cluster" because their similarity score of 0.913 exceeded the 0.87 threshold, so the LLM never reasoned about the new request. The author emphasizes that "similarity is not permission" and notes that current approaches that ask an LLM to validate a cached answer fail because of context bias.

Seven measures are proposed to secure semantic caching: mandatory metadata, query-time filtering, risk-based behavior, TTL plus source-version invalidation, explicit trust tracking in agent graphs, thorough logging of cache hits, and controlled writes from untrusted sources. The article also highlights the broader problem of memory poisoning in agents.
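To make the experiment concrete, the sketch below contrasts a plain similarity check with a guarded check that also applies several of the proposed measures (mandatory metadata, filtering on context and permissions, TTL plus source-version invalidation, rejecting entries from untrusted writers, and logging every decision). It is a minimal illustration under stated assumptions, not the article's implementation: the field names (`environment`, `cluster`, `permission_scope`, `source_version`, `trusted_writer`), the `guarded_hit` helper, and the one-hour TTL are all hypothetical. The similarity score is passed in rather than computed, so the article's 0.913 score and 0.87 threshold can be used directly.

```python
import logging
import time
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("semantic_cache")


@dataclass(frozen=True)
class CacheEntry:
    # Hypothetical mandatory metadata stored alongside every cached answer.
    answer: str
    environment: str       # e.g. "staging"
    cluster: str           # e.g. "legacy" or "new"
    permission_scope: str  # scope the original answer was produced under
    source_version: str    # version of the docs/config the answer relied on
    created_at: float      # Unix timestamp of the cache write
    trusted_writer: bool   # False if the write came from an untrusted source


@dataclass(frozen=True)
class QueryContext:
    environment: str
    cluster: str
    permission_scope: str
    current_source_version: str


def naive_hit(similarity: float, threshold: float = 0.87) -> bool:
    """Plain semantic cache: similarity alone decides reuse."""
    return similarity >= threshold


def guarded_hit(
    entry: CacheEntry,
    ctx: QueryContext,
    similarity: float,
    threshold: float = 0.87,
    ttl_seconds: float = 3600.0,
    now: Optional[float] = None,
) -> tuple[bool, str]:
    """Reuse only if similarity AND trust, context, permissions, TTL and version agree."""
    now = time.time() if now is None else now
    if similarity < threshold:
        reason = "below similarity threshold"
    elif not entry.trusted_writer:
        reason = "entry written by untrusted source"
    elif (entry.environment, entry.cluster) != (ctx.environment, ctx.cluster):
        reason = "environment/cluster mismatch"
    elif entry.permission_scope != ctx.permission_scope:
        reason = "permission scope mismatch"
    elif now - entry.created_at > ttl_seconds:
        reason = "expired (TTL)"
    elif entry.source_version != ctx.current_source_version:
        reason = "stale source version"
    else:
        reason = "ok"
    # Log every decision, hits and rejections alike, so cache reuse is auditable.
    log.info("cache decision=%s similarity=%.3f", reason, similarity)
    return reason == "ok", reason


# The article's scenario: a legacy-cluster plan scored 0.913 against a new-cluster query.
legacy_entry = CacheEntry("plan for legacy cluster", "staging", "legacy", "db-admin", "v1",
                          created_at=0.0, trusted_writer=True)
new_ctx = QueryContext("staging", "new", "db-admin", "v1")
print(naive_hit(0.913))                                     # True: the poisoned plan is reused
print(guarded_hit(legacy_entry, new_ctx, 0.913, now=60.0))  # (False, 'environment/cluster mismatch')
```

The metadata checks run after the similarity check here only to keep the sketch short; in a real vector store the same conditions would more likely be applied as query-time filters, so mismatched entries are never candidates at all.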

Key takeaway

For MLOps engineers deploying agentic systems, relying solely on semantic-cache similarity to decide when to reuse a decision introduces significant cache poisoning risk. You must implement robust security measures such as mandatory metadata, query-time filtering based on context and permissions, and explicit tracking of cache hits within agent workflows. Avoid returning cached answers directly for high-risk operations; instead, use them as candidate context for fresh LLM reasoning.
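The risk-based rule in the takeaway can be sketched as a small routing function: low-risk requests may reuse a cached plan directly, while high-risk requests always go back to the LLM, with the cached plan included only as candidate context. This is a hedged illustration, not the article's code: the two-level `Risk` enum, the `plan_with_cache` name, and the prompt wording are hypothetical, and `llm` stands in for whatever model client is actually used.

```python
from enum import Enum
from typing import Callable, Optional


class Risk(Enum):
    LOW = "low"    # e.g. read-only lookups
    HIGH = "high"  # e.g. credential rotation, deletes, deploys


def plan_with_cache(
    query: str,
    cached_plan: Optional[str],
    risk: Risk,
    llm: Callable[[str], str],
) -> str:
    """Never let a cached plan skip fresh reasoning for a high-risk operation."""
    if cached_plan is None:
        return llm(query)   # cache miss: reason from scratch
    if risk is Risk.LOW:
        return cached_plan  # low risk: direct reuse is acceptable
    # High risk: the cached plan becomes candidate context, not the answer.
    prompt = (
        "Current request:\n"
        f"{query}\n\n"
        "A cached plan for a similar (not necessarily identical) request follows. "
        "Treat it only as a candidate; it may be stale or target different resources. "
        "Produce a plan for the current request.\n\n"
        f"Candidate plan:\n{cached_plan}"
    )
    return llm(prompt)
```

Keeping the current request first in the prompt and labelling the cached plan as a candidate is one way to limit the context bias the article warns about, though it does not remove it; the metadata and permission checks still have to happen outside the LLM.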

Key insights

Semantic caching, while cost-saving, risks "cache poisoning" in agentic systems by reusing conclusions based on similarity, not permission.

Method

The basic semantic caching flow is: embed the incoming query, search a vector database for similar earlier entries, and return the cached answer if the similarity score is high enough; otherwise, call the LLM and store the fresh answer.
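The flow above can be sketched in a few lines. To stay self-contained, this sketch assumes a toy bag-of-words "embedding" with cosine similarity and an in-memory list in place of a real embedding model and vector database; the `SemanticCache` class and its method names are hypothetical, and the 0.87 default threshold is borrowed from the article's experiment.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    def __init__(self, llm: Callable[[str], str], threshold: float = 0.87):
        self.llm = llm
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []  # stands in for a vector DB

    def answer(self, query: str) -> tuple[str, bool]:
        """Return (answer, was_cache_hit)."""
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vec, cached in self.entries:  # nearest-neighbour search
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, cached
        if best_answer is not None and best_score >= self.threshold:
            return best_answer, True  # cache hit: the LLM is never called
        fresh = self.llm(query)       # cache miss: call the LLM...
        self.entries.append((q, fresh))  # ...and store the fresh answer
        return fresh, False
```

With this toy embedding, "rotate staging database credentials in the legacy cluster" and "Rotate staging database credentials for the legacy cluster" share 7 of 8 words and score 0.875, a hit at 0.87. The legacy/new pair from the article scores only 0.75 here, whereas the article's embedding scored it 0.913; the exact numbers depend entirely on the embedding model, which is why a fixed threshold alone is a weak safeguard.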

Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.