RAG Is Not a Security Architecture

Why retrieval-augmented generation is an injection surface, not just a data pipeline

The Conventional Framing

RAG reduces hallucination by grounding generation in retrieved facts. Security concerns focus on access control over the document store—making sure users can only retrieve documents they're authorized to see.

The architecture seems straightforward: embed the query, search the vector store, inject relevant chunks into the prompt, generate. Security is a data access problem.

Why This Is Insufficient

Most RAG implementations treat retrieval as a data problem—get the right documents to the model. But retrieval is an injection surface. Every chunk you retrieve is untrusted input that will be interpreted by a reasoning system.

The industry is repeating the SQL injection mistake. We spent a decade treating database queries as a data flow problem before understanding them as an interpreter boundary. RAG has the same structure: untrusted content crosses into an interpreter (the LLM) that can't distinguish instruction from data.

The uncomfortable realities:

  1. Your documents are attack surface. Every PDF, Confluence page, and Slack export you index is a potential payload. You're not just exposing data—you're exposing influence.
  2. Access control is necessary but not sufficient. Even if users only retrieve documents they're authorized to see, those documents can contain injected instructions. Your own employees' notes can compromise your system.
  3. "Sanitization" is largely theater. There's no reliable way to strip adversarial content from natural language while preserving semantic value. You can't regex your way out of this.

Architecture

Components:

  • Embedding modelencodes query for similarity search
  • Vector storeindexes document embeddings
  • Retrieval layertop-k selection, optional reranking
  • Context assemblyconstructs prompt with retrieved content
  • Inference modelgenerates response

Trust Boundaries

┌─────────────────────────────────────────────────────────┐ │ UNTRUSTED │ │ ┌──────────────┐ ┌──────────────────────────┐ │ │ │ User Query │ │ Document Store │ │ │ └──────┬───────┘ │ (indexed content) │ │ │ │ └───────────┬──────────────┘ │ └─────────┼─────────────────────────────┼─────────────────┘ │ │ ▼ ▼ ┌─────────────────────────────────────────────────────────┐ │ TRUST BOUNDARY CROSSING │ │ Context Assembly / Prompt Construction │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ TRUSTED (supposedly) │ │ LLM Inference │ └─────────────────────────────────────────────────────────┘
  1. Query → Embedding modeluser input influences retrieval
  2. Vector store → Context assemblyretrieved content is untrusted
  3. Assembled prompt → LLMthe critical boundary most ignore

Threat Surface

ThreatVectorImpact
Indirect prompt injectionMalicious instructions embedded in indexed documentsArbitrary behavior change, data exfiltration
Data poisoningAdversarial documents planted in knowledge basePersistent compromise of all retrievals
Information disclosureRetrieval returns documents outside user authorizationData leakage across permission boundaries
Context overflowQueries designed to retrieve excessive contentDenial of service, budget exhaustion
Embedding inversionQuery embeddings analyzed to infer document contentPrivacy violation, content reconstruction
Retrieval manipulationCrafted queries that surface specific malicious contentTargeted injection delivery

The ZIVIS Position

  • Assume breach of the document store.Design as if adversarial content will be retrieved. Your threat model starts with "attacker has write access to indexed content."
  • Structural isolation.Retrieved content goes into a constrained context region. System instructions should be architecturally separated, not just positionally. Positional authority is not a security property.
  • Output validation over input sanitization.You can't reliably clean natural language inputs. You can detect anomalous outputs—tool calls that shouldn't follow from the query, format shifts, instruction leakage.
  • Retrieval provenance as first-class telemetry.Every generation traces to retrieved document IDs. Anomaly detection runs on the retrieval-to-output relationship, not just output content.
  • Bound context injection.Hard limits on retrieved token count. Prefer summarization over truncation—truncation is predictable and gameable.

What We Tell Clients

If you're building RAG, budget 30% of your development time for the security layer. If that sounds excessive, you're underestimating the attack surface.

If you've already shipped RAG without this, you have an open injection vector in production. The question is whether an attacker has found it yet.

Related Patterns