RAG Is Not a Security Architecture

Why retrieval-augmented generation is an injection surface, not just a data pipeline

The Conventional Framing

RAG reduces hallucination by grounding generation in retrieved facts. Security concerns focus on access control over the document store—making sure users can only retrieve documents they're authorized to see.

The architecture seems straightforward: embed the query, search the vector store, inject relevant chunks into the prompt, generate. Security is a data access problem.

Why This Is Insufficient

Most RAG implementations treat retrieval as a data problem—get the right documents to the model. But retrieval is an injection surface. Every chunk you retrieve is untrusted input that will be interpreted by a reasoning system.

The industry is repeating the SQL injection mistake. We spent a decade treating database queries as a data flow problem before understanding them as an interpreter boundary. RAG has the same structure: untrusted content crosses into an interpreter (the LLM) that can't distinguish instruction from data.

The uncomfortable realities:

Your documents are attack surface. Every PDF, Confluence page, and Slack export you index is a potential payload. You're not just exposing data—you're exposing influence.
Access control is necessary but not sufficient. Even if users only retrieve documents they're authorized to see, those documents can contain injected instructions. Your own employees' notes can compromise your system.
"Sanitization" is largely theater. There's no reliable way to strip adversarial content from natural language while preserving semantic value. You can't regex your way out of this.

Architecture

Components:

Embedding model— encodes query for similarity search
Vector store— indexes document embeddings
Retrieval layer— top-k selection, optional reranking
Context assembly— constructs prompt with retrieved content
Inference model— generates response

Trust Boundaries

┌─────────────────────────────────────────────────────────┐ │ UNTRUSTED │ │ ┌──────────────┐ ┌──────────────────────────┐ │ │ │ User Query │ │ Document Store │ │ │ └──────┬───────┘ │ (indexed content) │ │ │ │ └───────────┬──────────────┘ │ └─────────┼─────────────────────────────┼─────────────────┘ │ │ ▼ ▼ ┌─────────────────────────────────────────────────────────┐ │ TRUST BOUNDARY CROSSING │ │ Context Assembly / Prompt Construction │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ TRUSTED (supposedly) │ │ LLM Inference │ └─────────────────────────────────────────────────────────┘

Query → Embedding model — user input influences retrieval
Vector store → Context assembly — retrieved content is untrusted
Assembled prompt → LLM — the critical boundary most ignore

Threat Surface

Threat	Vector	Impact
Indirect prompt injection	Malicious instructions embedded in indexed documents	Arbitrary behavior change, data exfiltration
Data poisoning	Adversarial documents planted in knowledge base	Persistent compromise of all retrievals
Information disclosure	Retrieval returns documents outside user authorization	Data leakage across permission boundaries
Context overflow	Queries designed to retrieve excessive content	Denial of service, budget exhaustion
Embedding inversion	Query embeddings analyzed to infer document content	Privacy violation, content reconstruction
Retrieval manipulation	Crafted queries that surface specific malicious content	Targeted injection delivery

The ZIVIS Position

•
Assume breach of the document store.Design as if adversarial content will be retrieved. Your threat model starts with "attacker has write access to indexed content."
•
Structural isolation.Retrieved content goes into a constrained context region. System instructions should be architecturally separated, not just positionally. Positional authority is not a security property.
•
Output validation over input sanitization.You can't reliably clean natural language inputs. You can detect anomalous outputs—tool calls that shouldn't follow from the query, format shifts, instruction leakage.
•
Retrieval provenance as first-class telemetry.Every generation traces to retrieved document IDs. Anomaly detection runs on the retrieval-to-output relationship, not just output content.
•
Bound context injection.Hard limits on retrieved token count. Prefer summarization over truncation—truncation is predictable and gameable.

What We Tell Clients

If you're building RAG, budget 30% of your development time for the security layer. If that sounds excessive, you're underestimating the attack surface.

If you've already shipped RAG without this, you have an open injection vector in production. The question is whether an attacker has found it yet.

Related Patterns

Query Rewriting— adds attack surface via rewrite manipulation
Self-RAG— self-grading doesn't solve the trust problem
Input Sanitization— defense in depth, but unreliable for NL
Output Validation— the more defensible checkpoint