RAG Is Not a Security Architecture
Why retrieval-augmented generation is an injection surface, not just a data pipeline
The Conventional Framing
RAG reduces hallucination by grounding generation in retrieved facts. Security concerns focus on access control over the document store—making sure users can only retrieve documents they're authorized to see.
The architecture seems straightforward: embed the query, search the vector store, inject relevant chunks into the prompt, generate. Security is a data access problem.
Why This Is Insufficient
Most RAG implementations treat retrieval as a data problem—get the right documents to the model. But retrieval is an injection surface. Every chunk you retrieve is untrusted input that will be interpreted by a reasoning system.
The industry is repeating the SQL injection mistake. We spent a decade treating database queries as a data flow problem before understanding them as an interpreter boundary. RAG has the same structure: untrusted content crosses into an interpreter (the LLM) that can't distinguish instruction from data.
The uncomfortable realities:
- Your documents are attack surface. Every PDF, Confluence page, and Slack export you index is a potential payload. You're not just exposing data—you're exposing influence.
- Access control is necessary but not sufficient. Even if users only retrieve documents they're authorized to see, those documents can contain injected instructions. Your own employees' notes can compromise your system.
- "Sanitization" is largely theater. There's no reliable way to strip adversarial content from natural language while preserving semantic value. You can't regex your way out of this.
Architecture
Components:
- Embedding model— encodes query for similarity search
- Vector store— indexes document embeddings
- Retrieval layer— top-k selection, optional reranking
- Context assembly— constructs prompt with retrieved content
- Inference model— generates response
Trust Boundaries
- Query → Embedding model — user input influences retrieval
- Vector store → Context assembly — retrieved content is untrusted
- Assembled prompt → LLM — the critical boundary most ignore
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Indirect prompt injection | Malicious instructions embedded in indexed documents | Arbitrary behavior change, data exfiltration |
| Data poisoning | Adversarial documents planted in knowledge base | Persistent compromise of all retrievals |
| Information disclosure | Retrieval returns documents outside user authorization | Data leakage across permission boundaries |
| Context overflow | Queries designed to retrieve excessive content | Denial of service, budget exhaustion |
| Embedding inversion | Query embeddings analyzed to infer document content | Privacy violation, content reconstruction |
| Retrieval manipulation | Crafted queries that surface specific malicious content | Targeted injection delivery |
The ZIVIS Position
- •Assume breach of the document store.Design as if adversarial content will be retrieved. Your threat model starts with "attacker has write access to indexed content."
- •Structural isolation.Retrieved content goes into a constrained context region. System instructions should be architecturally separated, not just positionally. Positional authority is not a security property.
- •Output validation over input sanitization.You can't reliably clean natural language inputs. You can detect anomalous outputs—tool calls that shouldn't follow from the query, format shifts, instruction leakage.
- •Retrieval provenance as first-class telemetry.Every generation traces to retrieved document IDs. Anomaly detection runs on the retrieval-to-output relationship, not just output content.
- •Bound context injection.Hard limits on retrieved token count. Prefer summarization over truncation—truncation is predictable and gameable.
What We Tell Clients
If you're building RAG, budget 30% of your development time for the security layer. If that sounds excessive, you're underestimating the attack surface.
If you've already shipped RAG without this, you have an open injection vector in production. The question is whether an attacker has found it yet.
Related Patterns
- Query Rewriting— adds attack surface via rewrite manipulation
- Self-RAG— self-grading doesn't solve the trust problem
- Input Sanitization— defense in depth, but unreliable for NL
- Output Validation— the more defensible checkpoint