Trust Separation That Actually Works

Using privileged and sandboxed models for real security boundaries

The Conventional Framing

Dual-LLM patterns use two separate models with different trust levels. A privileged model has access to sensitive operations and system context. A sandboxed model handles untrusted input and user interaction. The privileged model supervises and validates outputs from the sandboxed one.

This is one of the few patterns that attempts genuine trust separation rather than hoping the same model will behave differently in different contexts.

Why This Is Hard to Get Right

Dual-LLM is the right architecture, but the boundary enforcement is where implementations fail. The separation is only as good as the interface between models—and that interface is often just text passed from one to the other.

If the sandboxed model can craft outputs that manipulate the privileged model's decisions, you've just moved the injection one layer up.

Implementation challenges:

  • Interface design. How does the sandboxed model communicate with the privileged one? If it's free-form text, injection vectors exist. If it's structured, the schema must be carefully designed.
  • Privilege scope. What can the privileged model actually do? If it has broad authority, a single manipulation is catastrophic.
  • Latency and cost. Two models means twice the inference cost and latency. Teams cut corners under pressure.
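The interface-design point above can be sketched in code. This is a minimal, hypothetical example of a structured interface with an explicit allow-list; the action names and fields are illustrative, not part of any real API:

```python
import json

# Allow-list: each action the sandboxed model may request, with the
# exact set of fields it is permitted to carry. (Illustrative names.)
ALLOWED_ACTIONS = {
    "search_docs": {"query"},
    "summarize": {"doc_id"},
}

def parse_sandboxed_output(raw: str) -> dict:
    """Parse the sandboxed model's output as a structured request.

    Anything that is not valid JSON, names an unknown action, or
    carries unexpected fields is rejected outright. Free-form text
    never reaches the privileged model.
    """
    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError("not structured output")
    if not isinstance(request, dict):
        raise ValueError("not a request object")
    action = request.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not on allow-list: {action!r}")
    fields = set(request) - {"action"}
    if fields != ALLOWED_ACTIONS[action]:
        raise ValueError(f"unexpected fields: {sorted(fields)}")
    return request

# A well-formed request passes; free text or extra fields do not.
ok = parse_sandboxed_output('{"action": "search_docs", "query": "pricing"}')
```

The design choice here is to fail closed: an injection that makes the sandboxed model emit prose instead of a valid request degrades into a rejected message, not a privileged action.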

Architecture

Components:

  • Sandboxed model: handles untrusted input, limited capabilities
  • Privileged model: holds authority and a protected context
  • Interface layer: structured communication between models
  • Tool access: available only to the privileged model

Trust Boundaries

┌─────────────────────────────────────────────────────────┐
│ UNTRUSTED ZONE                                          │
│                                                         │
│ User Input ──► Sandboxed Model ──► Structured Output    │
│                                                         │
│ [Injection attempts contained here]                     │
└─────────────────────────────────────────────────────────┘
                            │
                  [Interface Boundary]
                  [Schema validation]
                  [Output filtering]
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│ TRUSTED ZONE                                            │
│                                                         │
│ Privileged Model ──► Tools ──► Sensitive Actions        │
│                                                         │
│ [Protected system prompt]                               │
│ [No direct user input]                                  │
│ [Validated requests only]                               │
└─────────────────────────────────────────────────────────┘
  1. User → Sandboxed model: untrusted input enters here, contained
  2. Sandboxed → Interface: the critical boundary, must validate
  3. Interface → Privileged: only structured, validated requests
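The three steps above can be wired together in a short sketch. The model calls are stubbed and the component names are ours, not taken from any framework; the point is that the privileged side never sees raw user text, only a validated request:

```python
import json
from typing import Callable

def sandboxed_model(user_input: str) -> str:
    # Untrusted zone: sees raw user input, can only emit text.
    # (Stub standing in for a real model call.)
    return '{"action": "lookup", "key": "' + user_input.replace('"', '') + '"}'

def interface_layer(raw: str) -> dict:
    # The security surface: only structured, validated requests pass.
    request = json.loads(raw)
    if request.get("action") != "lookup" or set(request) != {"action", "key"}:
        raise ValueError("rejected at boundary")
    return request

# Tools are wired to the privileged side only.
TOOLS: dict[str, Callable[[str], str]] = {"lookup": lambda key: f"value-for-{key}"}

def privileged_model(request: dict) -> str:
    # Trusted zone: acts only on validated requests.
    return TOOLS[request["action"]](request["key"])

result = privileged_model(interface_layer(sandboxed_model("alpha")))
```

Note that `privileged_model` has no code path that accepts a string from the user or from the sandboxed model directly; the only route in is through `interface_layer`.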

Threat Surface

Threat                Vector                                             Impact
Interface bypass      Sandboxed output manipulates privileged model      Injection crosses trust boundary
Schema exploitation   Malformed structured output abuses interface       Unexpected privileged model behavior
Privilege creep       Sandboxed model gains capabilities over time       Trust boundary erosion
Side channel          Information leakage from privileged to sandboxed   Sensitive data exposure
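The side-channel row deserves its own control: outputs leaving the trusted zone can be screened before they reach the sandboxed model or the user. This is a minimal sketch under an assumed approach (literal-fragment matching); the marker strings are stand-ins, and real deployments would need stronger detection than substring checks:

```python
# Fragments of the privileged context that must never appear in output.
# Illustrative values only.
PROTECTED_FRAGMENTS = [
    "SYSTEM PROMPT",
    "internal-api-key",
]

def filter_privileged_output(text: str) -> str:
    """Block output that appears to echo protected context."""
    lowered = text.lower()
    for fragment in PROTECTED_FRAGMENTS:
        if fragment.lower() in lowered:
            raise RuntimeError("possible context leakage blocked")
    return text
```

Failing loudly here is deliberate: a blocked response is recoverable, a leaked system prompt is not.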

The ZIVIS Position

  • This is the right pattern. Dual-LLM is one of the few architectures that provides real security properties. The challenge is implementation, not concept.
  • The interface is everything. The boundary between models must be rigorously designed: structured data, schema validation, explicit allow-lists for what the sandboxed model can request.
  • Minimize privileged model exposure. The privileged model should expose a small, well-defined API. It validates structured requests; it never processes arbitrary text from the sandboxed model.
  • No context sharing. The privileged model's system prompt and context should never be visible to the sandboxed model or derivable from its outputs.
  • Audit the boundary. Log all cross-boundary communication and run anomaly detection on request patterns. The interface is your security surface.
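The "audit the boundary" point can be made concrete with a small sketch: log every cross-boundary request and raise a crude anomaly signal when the rejection rate spikes, which is a common symptom of an injection probe. The class, window size, and threshold are all illustrative assumptions:

```python
import json
import logging
from collections import deque

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("boundary")

class BoundaryAuditor:
    """Log cross-boundary requests; flag bursts of rejections."""

    def __init__(self, window: int = 20, max_reject_ratio: float = 0.5):
        self.outcomes = deque(maxlen=window)  # True = accepted
        self.max_reject_ratio = max_reject_ratio

    def record(self, request: dict, accepted: bool) -> bool:
        """Record one request; return True if recent traffic looks anomalous."""
        log.info("boundary request accepted=%s payload=%s",
                 accepted, json.dumps(request, sort_keys=True))
        self.outcomes.append(accepted)
        rejects = self.outcomes.count(False)
        return rejects / len(self.outcomes) > self.max_reject_ratio

auditor = BoundaryAuditor()
anomalous = auditor.record({"action": "search_docs"}, accepted=True)
```

In practice the log line would feed whatever monitoring pipeline the team already runs; the essential property is that no request crosses the boundary unrecorded.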

What We Tell Clients

Dual-LLM is what you should build when security matters. It's also hard to get right. The interface between models is where implementations fail.

Invest heavily in the boundary design. Structured requests, validated schemas, minimal privileged model authority. If you get the interface right, you have real security properties. If you get it wrong, you have two models instead of one and the same vulnerabilities.

Related Patterns