Semantic Caching Creates Collision Attacks

Why caching by embedding similarity introduces a new class of vulnerabilities

The Conventional Framing

Semantic caching improves latency and reduces costs by caching responses to similar queries. Instead of exact string matching, queries are embedded and cached responses are returned for semantically similar inputs.

The framing is operational efficiency: cache hits save compute, users get faster responses, costs go down.

Why This Creates New Vulnerabilities

Exact-match caching has a security property: you get the cached response only if your query is identical. Semantic caching breaks this property. You get a cached response if your query is "close enough" in embedding space.

"Close enough" is not a security boundary. It's an attack surface.
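The difference between the two lookup rules can be sketched in a few lines. This is a minimal illustration, not a real deployment: `embed` is a toy bag-of-words stand-in for an embedding model, and the `0.8` threshold, the query strings, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

THRESHOLD = 0.8  # hypothetical "close enough" cutoff

CACHED_QUERY = "what is the balance on my account"
CACHED_RESPONSE = "response generated for the original asker"

exact_cache = {CACHED_QUERY: CACHED_RESPONSE}
semantic_cache = [(embed(CACHED_QUERY), CACHED_RESPONSE)]

def exact_lookup(query: str):
    # Hit only when the query string is identical.
    return exact_cache.get(query)

def semantic_lookup(query: str):
    # Hit whenever the nearest cached embedding clears the threshold.
    q = embed(query)
    best = max(semantic_cache, key=lambda entry: cosine(q, entry[0]))
    return best[1] if cosine(q, best[0]) >= THRESHOLD else None

probe = "what is my account balance"  # a different string from a different asker
exact_result = exact_lookup(probe)        # miss
semantic_result = semantic_lookup(probe)  # hit: the original asker's response
```

The probe never matches the exact-match cache, but it clears the similarity threshold and receives a response that was generated for someone else.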

The collision problem:

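Embedding models map an unbounded set of possible strings into a fixed-dimensional vector space, and the cache treats every point within the similarity threshold as the same query. Different strings, including strings with different meanings, can map to nearby points. An attacker who understands your embedding model can craft queries that collide with cached entries and retrieve their responses.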
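A toy example makes the collision concrete. The sketch below assumes the attacker can run the same embedding function offline and score candidate queries against a target; the bag-of-words `embed`, the `0.9` threshold, and the helper `find_collisions` are hypothetical. Real embedding models are sensitive to word order, so the specific collision shown here is an artifact of the toy model, but the search procedure is the same.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding that ignores word order (an assumption of this sketch).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def find_collisions(target: str, candidates: list, threshold: float) -> list:
    # Offline search: keep candidates that differ from the target string
    # but would still be served the target's cached response.
    t = embed(target)
    return [c for c in candidates
            if c != target and cosine(embed(c), t) >= threshold]

target = "transfer funds from alice to bob"
candidates = [
    "transfer funds from bob to alice",  # opposite meaning, same word counts
    "funds transfer alice to bob",
    "what is the weather today",
]
collisions = find_collisions(target, candidates, threshold=0.9)
```

Under these assumptions the first two candidates collide with the target, and the first one does so with a similarity of 1.0 despite asking for the opposite transfer.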

The poisoning problem:

If an attacker can populate your cache, they control what future "similar" queries return. The cache becomes an injection persistence mechanism.
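The persistence effect can be sketched with a shared cache and a read-through helper. Everything here is illustrative: `SharedSemanticCache`, `answer`, and `honest_model` are hypothetical names, the embedding is a toy bag-of-words model, and the sketch simply assumes the attacker has already found some way to get a response written to the cache.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SharedSemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))

    def get(self, query: str):
        q = embed(query)
        for emb, response in self.entries:
            if cosine(q, emb) >= self.threshold:
                return response
        return None

def answer(cache: SharedSemanticCache, query: str, generate) -> str:
    # Read-through: serve from cache if possible, otherwise generate and store.
    cached = cache.get(query)
    if cached is not None:
        return cached
    response = generate(query)
    cache.put(query, response)
    return response

calls = []
def honest_model(query: str) -> str:
    calls.append(query)
    return "Use the reset link in your account settings."

cache = SharedSemanticCache()
POISONED = "Reset it at http://attacker.example/reset"
# Assumption: the attacker has managed to get this entry cached.
cache.put("how do i reset my password", POISONED)

# A later, unrelated user asks a similar question.
victim_response = answer(cache, "how do i reset my password please", honest_model)
```

The victim receives the attacker's text and the model is never consulted, so nothing downstream of generation gets a chance to catch it.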

Architecture

Components:

  • Query embedding: encodes the query for similarity lookup
  • Vector cache: stores query embeddings with responses
  • Similarity threshold: defines "close enough" for cache hits
  • Cache management: TTL, eviction, invalidation
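The four components above fit together roughly as follows. This is a sketch under stated assumptions rather than a reference design: the bag-of-words `embed` replaces a real embedding model, a linear scan replaces a vector index, and the class name, default threshold, TTL, and oldest-first eviction policy are all hypothetical choices.

```python
import math
import time
from collections import Counter
from dataclasses import dataclass

def embed(text: str) -> Counter:
    # Query embedding (toy bag-of-words stand-in).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

@dataclass
class Entry:
    embedding: Counter
    response: str
    expires_at: float

class SemanticCache:
    def __init__(self, threshold=0.8, ttl_seconds=300.0, max_entries=1000,
                 clock=time.monotonic):
        self.threshold = threshold      # similarity threshold
        self.ttl_seconds = ttl_seconds  # cache management: TTL
        self.max_entries = max_entries  # cache management: eviction bound
        self.clock = clock              # injectable so expiry can be tested
        self._entries = []              # vector cache (linear scan)

    def put(self, query: str, response: str) -> None:
        self._evict_expired()
        if len(self._entries) >= self.max_entries:
            self._entries.pop(0)  # evict the oldest entry
        expires_at = self.clock() + self.ttl_seconds
        self._entries.append(Entry(embed(query), response, expires_at))

    def get(self, query: str):
        self._evict_expired()
        q = embed(query)
        best = max(self._entries, key=lambda e: cosine(q, e.embedding), default=None)
        if best is not None and cosine(q, best.embedding) >= self.threshold:
            return best.response
        return None

    def invalidate(self) -> None:
        self._entries.clear()

    def _evict_expired(self) -> None:
        now = self.clock()
        self._entries = [e for e in self._entries if e.expires_at > now]

now = [0.0]  # fake clock for the demonstration
cache = SemanticCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("what are your opening hours", "response A")
hit = cache.get("what are your opening hours today")      # similar query: hit
now[0] = 61.0
expired = cache.get("what are your opening hours today")  # past TTL: miss
```

Note that nothing in this structure records who asked the original query or under what authorization, which is the gap the trust-boundary discussion is about.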

Trust Boundaries

┌─────────────────────────────────────────────────────────┐
│                   CURRENT USER QUERY                    │
│                       (untrusted)                       │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                    SIMILARITY LOOKUP                    │
│                                                         │
│  Does "similar" mean "same authorization context"?      │
│  Does "similar" mean "safe to share response"?          │
│  Almost certainly not.                                  │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                     CACHED RESPONSE                     │
│         (generated for different user/context)          │
│                                                         │
│              Now returned to current user               │
└─────────────────────────────────────────────────────────┘

  1. Query → Similarity lookup: adversarial queries find collisions
  2. Cache → Response: cached content from a different context
  3. Cache write → Future reads: poisoning persists

Threat Surface

Threat | Vector | Impact
Collision attack | Craft query that embeds near sensitive cached query | Access responses from other users/contexts
Cache poisoning | Populate cache with malicious responses | Injection delivered to future similar queries
Cross-user leakage | Similarity doesn't respect user boundaries | Data exposure across authorization contexts
Embedding inversion | Analyze cache hits to infer cached queries | Privacy violation, query reconstruction
Cache timing attacks | Measure response latency to detect cache hits | Information leakage about other users' queries

The ZIVIS Position

  • Similarity is not authorization. Just because two queries are semantically similar doesn't mean they should share a response. Cache partitioning must respect authorization boundaries.
  • Per-user or per-session cache isolation. The simplest fix: don't share cached responses across users. You lose some efficiency, you gain actual security boundaries.
  • If sharing, responses must be authorization-neutral. Only cache responses safe to return to any user. This dramatically limits what's cacheable.
  • Embedding model security. If your embedding model is known, collision attacks are easier. Consider model diversity or perturbation.
  • Short TTLs limit poisoning. Poisoned entries expire faster with short TTLs. Balance efficiency against poisoning window.
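Partitioning by authorization scope can be sketched as a cache whose similarity search only ever runs inside one partition. The class name, the scope strings such as `"user:alice"` and `"public"`, and the toy embedding are assumptions for illustration; in practice the scope would need to come from the authenticated session, never from the query text.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class PartitionedSemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self._partitions = {}  # scope -> list of (embedding, response)

    def put(self, scope: str, query: str, response: str) -> None:
        self._partitions.setdefault(scope, []).append((embed(query), response))

    def get(self, scope: str, query: str):
        # Similarity is only evaluated inside the caller's own partition.
        q = embed(query)
        for emb, response in self._partitions.get(scope, []):
            if cosine(q, emb) >= self.threshold:
                return response
        return None

cache = PartitionedSemanticCache()
cache.put("user:alice", "what is the balance on my account", "alice-only response")
cache.put("public", "what are your opening hours", "authorization-neutral response")

alice_hit = cache.get("user:alice", "what is my account balance")   # same scope: hit
bob_miss = cache.get("user:bob", "what is the balance on my account")  # other scope: miss
public_hit = cache.get("public", "what are your opening hours today")
```

Even an identical query string from a different scope misses, which restores the boundary that similarity matching removed. A shared `"public"` partition remains available for content that is safe to return to anyone.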

What We Tell Clients

Semantic caching trades a security property (exact match) for an operational benefit (similarity match). That trade has consequences.

If you're caching across users or authorization contexts, you've created a cross-user data leakage vulnerability. If users can influence what gets cached, you've created an injection persistence mechanism.

Use semantic caching within a single user's session. Use it for public, authorization-neutral content. Don't use it as a shared cache for sensitive or personalized responses without understanding exactly what you're exposing.

Related Patterns