Semantic Caching Creates Collision Attacks
Why caching by embedding similarity introduces a new class of vulnerabilities
The Conventional Framing
Semantic caching improves latency and reduces costs by caching responses to similar queries. Instead of exact string matching, queries are embedded and cached responses are returned for semantically similar inputs.
The framing is operational efficiency: cache hits save compute, users get faster responses, costs go down.
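The lookup described above can be sketched in a few lines. This is a toy: the `embed` function is a bag-of-words stand-in for a real embedding model, and the linear scan stands in for a vector index. All names are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: bag of lowercase words.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))

    def get(self, query: str):
        q = embed(query)
        for emb, response in self.entries:
            if cosine(q, emb) >= self.threshold:
                return response  # "close enough" wins, not exact match
        return None

cache = SemanticCache()
cache.put("what is the capital of france", "Paris")
print(cache.get("What is the capital of France?"))  # near-duplicate: hit
print(cache.get("how do I bake bread"))             # unrelated: miss
```

The key difference from an exact-match cache sits in `get`: any query inside the threshold returns the stored response, whoever cached it.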
Why This Creates New Vulnerabilities
Exact-match caching has a security property: you get the cached response only if your query is identical. Semantic caching breaks this property. You get a cached response if your query is "close enough" in embedding space.
"Close enough" is not a security boundary. It's an attack surface.
The collision problem:
Embedding models map infinite possible strings into finite-dimensional space. Different strings can map to nearby points. An attacker who understands your embedding model can craft queries that collide with cached responses.
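With the encoder known, the collision search is mechanical. A hedged sketch using the same toy bag-of-words encoder (a real attack would probe a learned model, but the principle is identical): the attacker tries variants until one lands inside the threshold of a sensitive cached query.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a known embedding model: bag of lowercase words.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

cached_query = "show me the salary report for the finance team"
threshold = 0.8

# Attacker-crafted probes against the known encoder.
candidates = [
    "show me the salary report for the finance team please",
    "salary report finance",
    "show the salary report for the finance team",
]
sims = [cosine(embed(p), embed(cached_query)) for p in candidates]
for probe, sim in zip(candidates, sims):
    marker = "COLLIDES" if sim >= threshold else "miss"
    print(f"{sim:.2f}  {marker:8s}  {probe}")
```

Two of the three probes cross the threshold and would receive the cached response, without ever matching the original query string.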
The poisoning problem:
If an attacker can populate your cache, they control what future "similar" queries return. The cache becomes an injection persistence mechanism.
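A minimal demonstration of that persistence, using a toy bag-of-words encoder and an in-memory list as the shared cache (all names hypothetical): one attacker write is enough to answer every future near-duplicate query.

```python
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

THRESHOLD = 0.8
cache = []  # shared cache: (embedding, response) pairs

def cache_put(query, response):
    cache.append((embed(query), response))

def cache_get(query):
    q = embed(query)
    for emb, response in cache:
        if cosine(q, emb) >= THRESHOLD:
            return response
    return None

# Attacker seeds the shared cache with a malicious response.
cache_put("how do I reset my password",
          "Go to evil.example and enter your credentials")

# A later user's near-duplicate query silently receives the poisoned entry.
victim_response = cache_get("How do I reset my password?")
print(victim_response)
```

Nothing in the read path distinguishes a legitimately cached response from a planted one; the injection persists until the entry is evicted.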
Architecture
Components:
- Query embedding — encodes query for similarity lookup
- Vector cache — stores query embeddings with responses
- Similarity threshold — defines "close enough" for cache hits
- Cache management — TTL, eviction, invalidation
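One plausible shape for a cache entry that ties these components together, sketched as a dataclass with an expiry check. This is an illustration, not a specific product's schema; the field names are hypothetical.

```python
import time
from dataclasses import dataclass, field

@dataclass
class CacheEntry:
    embedding: list        # query embedding used for similarity lookup
    response: str          # cached response payload
    created_at: float = field(default_factory=time.time)
    ttl_seconds: float = 300.0  # cache-management knob: entry lifetime

    def expired(self, now: float = None) -> bool:
        # Expiry check used by eviction/invalidation.
        now = time.time() if now is None else now
        return now - self.created_at > self.ttl_seconds

entry = CacheEntry(embedding=[0.1, 0.2], response="cached answer",
                   ttl_seconds=60)
print(entry.expired())                            # fresh entry
print(entry.expired(now=entry.created_at + 120))  # past its TTL
```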
Trust Boundaries
- Query → Similarity lookup — adversarial queries find collisions
- Cache → Response — cached content from different context
- Cache write → Future reads — poisoning persists
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Collision attack | Craft query that embeds near sensitive cached query | Access responses from other users/contexts |
| Cache poisoning | Populate cache with malicious responses | Injection delivered to future similar queries |
| Cross-user leakage | Similarity doesn't respect user boundaries | Data exposure across authorization contexts |
| Embedding inversion | Analyze cache hits to infer cached queries | Privacy violation, query reconstruction |
| Cache timing attacks | Measure response latency to detect cache hits | Information leakage about other users' queries |
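The timing row is easy to reproduce in miniature. In this sketch a cache hit skips a simulated inference delay, and the latency gap is what an attacker measures. The delays and the lookup are simulated; real numbers depend on the serving stack.

```python
import time

CACHE = {"quarterly revenue forecast"}  # pretend another user cached this

def serve(query: str) -> float:
    # Returns observed latency in seconds, as an external caller would see it.
    start = time.perf_counter()
    if query in CACHE:       # stand-in for a similarity lookup
        pass                 # hit: stored response returned immediately
    else:
        time.sleep(0.05)     # miss: simulate model inference cost
    return time.perf_counter() - start

hit = serve("quarterly revenue forecast")
miss = serve("office lunch menu")
print(f"hit={hit * 1000:.1f}ms  miss={miss * 1000:.1f}ms")
```

An attacker who only sees latency can probe candidate queries and learn which ones someone else has already asked.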
The ZIVIS Position
- Similarity is not authorization. Just because two queries are semantically similar doesn't mean they should share a response. Cache partitioning must respect authorization boundaries.
- Per-user or per-session cache isolation. The simplest fix: don't share cached responses across users. You lose some efficiency, you gain actual security boundaries.
- If sharing, responses must be authorization-neutral. Only cache responses safe to return to any user. This dramatically limits what's cacheable.
- Embedding model security. If your embedding model is known, collision attacks are easier. Consider model diversity or perturbation.
- Short TTLs limit poisoning. A short TTL bounds how long a poisoned entry can keep serving responses. Balance efficiency against the poisoning window.
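Two of these mitigations, per-user isolation and short TTLs, can be combined in one small interface. A hedged sketch (exact-match lookup for brevity; a real cache would run the similarity search inside the caller's partition):

```python
import time

class PartitionedCache:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self.store = {}  # user_id -> {query: (response, created_at)}

    def put(self, user_id: str, query: str, response: str) -> None:
        self.store.setdefault(user_id, {})[query] = (response, time.time())

    def get(self, user_id: str, query: str):
        # Lookups only ever see the caller's own partition: similarity
        # can never cross an authorization boundary.
        entry = self.store.get(user_id, {}).get(query)
        if entry is None:
            return None
        response, created = entry
        if time.time() - created > self.ttl:
            del self.store[user_id][query]  # expired: drop and miss
            return None
        return response

cache = PartitionedCache(ttl_seconds=60)
cache.put("alice", "my account balance", "Balance: $1,234")
print(cache.get("alice", "my account balance"))  # Alice sees her entry
print(cache.get("bob", "my account balance"))    # Bob does not
```

A poisoned entry in this design can only hurt the user who wrote it, and only until the TTL elapses.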
What We Tell Clients
Semantic caching trades a security property (exact match) for an operational benefit (similarity match). That trade has consequences.
If you're caching across users or authorization contexts, you've created a cross-user data leakage vulnerability. If users can influence what gets cached, you've created an injection persistence mechanism.
Use semantic caching within a single user's session. Use it for public, authorization-neutral content. Don't use it as a shared cache for sensitive or personalized responses without understanding exactly what you're exposing.
Related Patterns
- Naive RAG — semantic caching has similar injection concerns
- Canary Tokens — could detect some cache leakage
- Audit Logging — log cache hits for forensics