Hypothetical Answers Can Be Poisoned
Why generating hypothetical documents for retrieval creates an injection bypass
The Conventional Framing
HyDE (Hypothetical Document Embeddings) generates a hypothetical answer to the query, then uses that answer's embedding for retrieval. The intuition is that similar documents will be closer in embedding space to the hypothetical answer than to the original query.
The pattern addresses the vocabulary mismatch problem—queries use different words than documents.
Why This Is Dangerous
The hypothetical generation is performed by an LLM operating on user input. If that input contains an injection, the hypothetical document can encode whatever the attacker wants—and that content will be used for retrieval.
This is injection that directly controls what gets retrieved, not just influences it.
Architecture
Components:
- Query— original user question
- Hypothetical generator— LLM creates imagined answer
- Hypothetical document— generated content used for search
- Retrieval— finds real docs similar to hypothetical
Trust Boundaries
- Query → Generator — injection enters hypothetical generation
- Generator → Retrieval — poisoned hypothetical controls search
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Hypothetical poisoning | Injection controls what hypothetical document says | Retrieval searches for attacker-specified content |
| Content steering | Craft hypotheticals that retrieve specific documents | Targeted extraction of sensitive documents |
| Schema leakage | Hypothetical generation reveals document structure | Information about corpus exposed |
The ZIVIS Position
- •HyDE adds control, not safety.The hypothetical document gives attackers another control point. They can influence both query and retrieval target.
- •Validate hypothetical content.Before using hypothetical for retrieval, check it's plausibly related to the original query. Reject hypotheticals that diverge significantly.
- •Consider the trade-off.HyDE improves retrieval quality but significantly expands attack surface. Is the quality gain worth the security cost?
What We Tell Clients
HyDE gives attackers a way to directly specify what they want retrieved. The hypothetical document is generated from their input and used as the search target.
If you use HyDE, validate that hypotheticals are semantically related to queries and don't contain obvious injection attempts. Consider whether the retrieval quality improvement justifies the security cost.
Related Patterns
- Query Rewriting— lighter-weight query transformation
- Self-RAG— self-evaluation with similar issues