Trust Separation That Actually Works

Using privileged and sandboxed models for real security boundaries

The Conventional Framing

Dual-LLM patterns use two separate models with different trust levels. A privileged model has access to sensitive operations and system context. A sandboxed model handles untrusted input and user interaction. The privileged model supervises and validates outputs from the sandboxed one.

This is one of the few patterns that attempts genuine trust separation rather than hoping the same model will behave differently in different contexts.

Why This Is Hard to Get Right

Dual-LLM is the right architecture, but the boundary enforcement is where implementations fail. The separation is only as good as the interface between models—and that interface is often just text passed from one to the other.

If the sandboxed model can craft outputs that manipulate the privileged model's decisions, you've just moved the injection one layer up.

Implementation challenges:

  • Interface design. How does the sandboxed model communicate with the privileged one? If it's free-form text, injection vectors exist. If it's structured, the schema must be carefully designed.
  • Privilege scope. What can the privileged model actually do? If it has broad authority, a single manipulation is catastrophic.
  • Latency and cost. Two models means twice the inference cost and latency. Teams cut corners under pressure.
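The interface-design point above can be sketched in code. This is a minimal, hypothetical example of a structured interface with an explicit allow-list; the action names and fields are illustrative, not part of any real API:

```python
import json

# Allow-list: each action the sandboxed model may request, with the
# exact set of fields it is permitted to carry. (Illustrative names.)
ALLOWED_ACTIONS = {
    "search_docs": {"query"},
    "summarize": {"doc_id"},
}

def parse_sandboxed_output(raw: str) -> dict:
    """Parse the sandboxed model's output as a structured request.

    Anything that is not valid JSON, names an unknown action, or
    carries unexpected fields is rejected outright. Free-form text
    never reaches the privileged model.
    """
    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError("not structured output")
    if not isinstance(request, dict):
        raise ValueError("not a request object")
    action = request.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not on allow-list: {action!r}")
    fields = set(request) - {"action"}
    if fields != ALLOWED_ACTIONS[action]:
        raise ValueError(f"unexpected fields: {sorted(fields)}")
    return request

# A well-formed request passes; free text or extra fields do not.
ok = parse_sandboxed_output('{"action": "search_docs", "query": "pricing"}')
```

The design choice here is to fail closed: an injection that makes the sandboxed model emit prose instead of a valid request degrades into a rejected message, not a privileged action.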

Architecture

Components:

  • Sandboxed model: handles untrusted input, limited capabilities
  • Privileged model: holds authority and a protected context
  • Interface layer: structured communication between models
  • Tool access: available only to the privileged model

Trust Boundaries

┌─────────────────────────────────────────────────────────┐
│ UNTRUSTED ZONE                                          │
│                                                         │
│ User Input ──► Sandboxed Model ──► Structured Output    │
│                                                         │
│ [Injection attempts contained here]                     │
└─────────────────────────────────────────────────────────┘
                            │
                  [Interface Boundary]
                  [Schema validation]
                  [Output filtering]
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│ TRUSTED ZONE                                            │
│                                                         │
│ Privileged Model ──► Tools ──► Sensitive Actions        │
│                                                         │
│ [Protected system prompt]                               │
│ [No direct user input]                                  │
│ [Validated requests only]                               │
└─────────────────────────────────────────────────────────┘
  1. User → Sandboxed model: untrusted input enters here, contained
  2. Sandboxed → Interface: the critical boundary, must validate
  3. Interface → Privileged: only structured, validated requests
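The three steps above can be wired together in a short sketch. The model calls are stubbed and the component names are ours, not taken from any framework; the point is that the privileged side never sees raw user text, only a validated request:

```python
import json
from typing import Callable

def sandboxed_model(user_input: str) -> str:
    # Untrusted zone: sees raw user input, can only emit text.
    # (Stub standing in for a real model call.)
    return '{"action": "lookup", "key": "' + user_input.replace('"', '') + '"}'

def interface_layer(raw: str) -> dict:
    # The security surface: only structured, validated requests pass.
    request = json.loads(raw)
    if request.get("action") != "lookup" or set(request) != {"action", "key"}:
        raise ValueError("rejected at boundary")
    return request

# Tools are wired to the privileged side only.
TOOLS: dict[str, Callable[[str], str]] = {"lookup": lambda key: f"value-for-{key}"}

def privileged_model(request: dict) -> str:
    # Trusted zone: acts only on validated requests.
    return TOOLS[request["action"]](request["key"])

result = privileged_model(interface_layer(sandboxed_model("alpha")))
```

Note that `privileged_model` has no code path that accepts a string from the user or from the sandboxed model directly; the only route in is through `interface_layer`.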

Threat Surface

Threat                Vector                                             Impact
Interface bypass      Sandboxed output manipulates privileged model      Injection crosses trust boundary
Schema exploitation   Malformed structured output abuses interface       Unexpected privileged model behavior
Privilege creep       Sandboxed model gains capabilities over time       Trust boundary erosion
Side channel          Information leakage from privileged to sandboxed   Sensitive data exposure
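The side-channel row deserves its own control: outputs leaving the trusted zone can be screened before they reach the sandboxed model or the user. This is a minimal sketch under an assumed approach (literal-fragment matching); the marker strings are stand-ins, and real deployments would need stronger detection than substring checks:

```python
# Fragments of the privileged context that must never appear in output.
# Illustrative values only.
PROTECTED_FRAGMENTS = [
    "SYSTEM PROMPT",
    "internal-api-key",
]

def filter_privileged_output(text: str) -> str:
    """Block output that appears to echo protected context."""
    lowered = text.lower()
    for fragment in PROTECTED_FRAGMENTS:
        if fragment.lower() in lowered:
            raise RuntimeError("possible context leakage blocked")
    return text
```

Failing loudly here is deliberate: a blocked response is recoverable, a leaked system prompt is not.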

The ZIVIS Position

  • This is the right pattern. Dual-LLM is one of the few architectures that provides real security properties. The challenge is implementation, not concept.
  • The interface is everything. The boundary between models must be rigorously designed: structured data, schema validation, explicit allow-lists for what the sandboxed model can request.
  • Minimize privileged model exposure. The privileged model should expose a small, well-defined API. It validates structured requests; it never processes arbitrary text from the sandboxed model.
  • No context sharing. The privileged model's system prompt and context should never be visible to the sandboxed model or derivable from its outputs.
  • Audit the boundary. Log all cross-boundary communication and run anomaly detection on request patterns. The interface is your security surface.
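The "audit the boundary" point can be made concrete with a small sketch: log every cross-boundary request and raise a crude anomaly signal when the rejection rate spikes, which is a common symptom of an injection probe. The class, window size, and threshold are all illustrative assumptions:

```python
import json
import logging
from collections import deque

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("boundary")

class BoundaryAuditor:
    """Log cross-boundary requests; flag bursts of rejections."""

    def __init__(self, window: int = 20, max_reject_ratio: float = 0.5):
        self.outcomes = deque(maxlen=window)  # True = accepted
        self.max_reject_ratio = max_reject_ratio

    def record(self, request: dict, accepted: bool) -> bool:
        """Record one request; return True if recent traffic looks anomalous."""
        log.info("boundary request accepted=%s payload=%s",
                 accepted, json.dumps(request, sort_keys=True))
        self.outcomes.append(accepted)
        rejects = self.outcomes.count(False)
        return rejects / len(self.outcomes) > self.max_reject_ratio

auditor = BoundaryAuditor()
anomalous = auditor.record({"action": "search_docs"}, accepted=True)
```

In practice the log line would feed whatever monitoring pipeline the team already runs; the essential property is that no request crosses the boundary unrecorded.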

What We Tell Clients

Dual-LLM is what you should build when security matters. It's also hard to get right. The interface between models is where implementations fail.

Invest heavily in the boundary design. Structured requests, validated schemas, minimal privileged model authority. If you get the interface right, you have real security properties. If you get it wrong, you have two models instead of one and the same vulnerabilities.

Related Patterns