Trust Separation That Actually Works
Using privileged and sandboxed models for real security boundaries
The Conventional Framing
Dual-LLM patterns use two separate models with different trust levels. A privileged model has access to sensitive operations and system context. A sandboxed model handles untrusted input and user interaction. The privileged model supervises and validates outputs from the sandboxed one.
This is one of the few patterns that attempts genuine trust separation rather than hoping the same model will behave differently in different contexts.
Why This Is Hard to Get Right
Dual-LLM is the right architecture, but the boundary enforcement is where implementations fail. The separation is only as good as the interface between models—and that interface is often just text passed from one to the other.
If the sandboxed model can craft outputs that manipulate the privileged model's decisions, you've just moved the injection one layer up.
Implementation challenges:
- Interface design. How does the sandboxed model communicate with the privileged one? If it's free-form text, injection vectors exist. If it's structured, the schema must be carefully designed.
- Privilege scope. What can the privileged model actually do? If it has broad authority, a single manipulation is catastrophic.
- Latency and cost. Two model calls per request roughly double inference cost and add latency. Under that pressure, teams cut corners.
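The interface-design and privilege-scope points above can be made concrete with a strict parser at the boundary. The following is a minimal sketch, not a reference implementation: the action names, the `ALLOWED_ACTIONS` table, the `MAX_ARG_LENGTH` cap, and the `parse_request` helper are all hypothetical, chosen only to show the sandboxed model's output being treated as data that either matches a narrow schema or is rejected.

```python
import json
from dataclasses import dataclass

# Hypothetical allow-list: action name -> exact set of permitted argument names.
ALLOWED_ACTIONS = {
    "lookup_order": {"order_id"},
    "send_receipt": {"order_id", "email"},
}

MAX_ARG_LENGTH = 128  # assumed cap on any single argument value


class RejectedRequest(ValueError):
    """Raised when sandboxed output does not match the interface schema."""


@dataclass(frozen=True)
class PrivilegedRequest:
    action: str
    args: dict


def parse_request(raw: str) -> PrivilegedRequest:
    """Turn raw sandboxed-model output into a typed request, or raise."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RejectedRequest("not valid JSON") from exc

    # Exactly two top-level keys; anything extra is rejected, not ignored.
    if not isinstance(data, dict) or set(data) != {"action", "args"}:
        raise RejectedRequest("unexpected top-level shape")

    action, args = data["action"], data["args"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise RejectedRequest("action not on allow-list")
    if not isinstance(args, dict) or set(args) != ALLOWED_ACTIONS[action]:
        raise RejectedRequest("argument names do not match schema")

    # Only short strings are accepted as values in this sketch.
    for value in args.values():
        if not isinstance(value, str) or len(value) > MAX_ARG_LENGTH:
            raise RejectedRequest("argument value has wrong type or length")

    return PrivilegedRequest(action=action, args=dict(args))
```

Free-form text such as `ignore previous instructions` fails at the JSON step, and a well-formed request for an unlisted action fails at the allow-list step. Keeping the allow-list small is what limits the damage a single accepted request can do.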
Architecture
Components:
- Sandboxed model: handles untrusted input, limited capabilities
- Privileged model: has authority, protected context
- Interface layer: structured communication between models
- Tool access: only available to privileged model
Trust Boundaries
- User → Sandboxed model — untrusted input enters here, contained
- Sandboxed → Interface — critical boundary, must validate
- Interface → Privileged — only structured, validated requests
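The three hops above can be sketched as three functions with one-way data flow. This is an illustrative wiring only: `sandboxed_model` is a stub standing in for a real LLM call, `PrivilegedSide` stands in for the privileged model together with its tools, and the `Action` enum, `Request` type, and `run` helper are hypothetical names. The point it tries to show is structural: the sandboxed function never receives a reference to any tool, and the privileged side only ever receives a typed `Request`.

```python
from enum import Enum
from typing import Callable, NamedTuple


class Action(Enum):
    GET_BALANCE = "get_balance"


class Request(NamedTuple):
    action: Action
    account_id: str


def sandboxed_model(user_text: str) -> dict:
    """Stand-in for the sandboxed LLM: sees untrusted text, holds no tools."""
    # A real model would produce this structure; the stub fakes the extraction.
    return {"action": "get_balance", "account_id": user_text.strip()[:16]}


def interface(candidate: dict) -> Request:
    """Boundary hop: convert untrusted output into a typed request, or raise."""
    action = Action(candidate.get("action"))  # ValueError if not a known action
    account_id = candidate.get("account_id")
    if not (isinstance(account_id, str) and account_id.isalnum()):
        raise ValueError("account_id must be a non-empty alphanumeric string")
    return Request(action, account_id)


class PrivilegedSide:
    """Only this object holds tool callables."""

    def __init__(self, tools: dict[Action, Callable[[str], str]]):
        self._tools = tools

    def handle(self, request: Request) -> str:
        return self._tools[request.action](request.account_id)


def run(user_text: str, privileged: PrivilegedSide) -> str:
    # User -> Sandboxed -> Interface -> Privileged, in that order only.
    return privileged.handle(interface(sandboxed_model(user_text)))
```

A call such as `run("ACCT42", PrivilegedSide({Action.GET_BALANCE: some_tool}))` reaches the tool, while input like `x; drop tables` is stopped at `interface` before anything privileged runs.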
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Interface bypass | Sandboxed output manipulates privileged model | Injection crosses trust boundary |
| Schema exploitation | Malformed structured output abuses interface | Unexpected privileged model behavior |
| Privilege creep | Sandboxed model gains capabilities over time | Trust boundary erosion |
| Side channel | Information leakage from privileged to sandboxed | Sensitive data exposure |
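The side-channel row concerns the return path, which is easy to overlook when all the attention goes to requests. One possible mitigation, sketched below under assumptions, is to filter what the privileged side hands back so that only allow-listed scalar fields reach the sandboxed model. `RETURNABLE_FIELDS`, `SCALAR_TYPES`, and `to_sandbox_view` are hypothetical names, and the sketch does not address leakage through the content of an allowed string field or through timing.

```python
# Hypothetical per-action allow-list of result fields the sandboxed side may see.
RETURNABLE_FIELDS = {
    "lookup_order": ("status", "eta_days"),
}

# Only flat scalar values cross back; nested structures are dropped.
SCALAR_TYPES = (str, int, float, bool)


def to_sandbox_view(action: str, privileged_result: dict) -> dict:
    """Return the subset of a privileged result that may cross the boundary."""
    allowed = RETURNABLE_FIELDS.get(action, ())  # unknown action -> nothing
    return {
        key: privileged_result[key]
        for key in allowed
        if isinstance(privileged_result.get(key), SCALAR_TYPES)
    }
```

Defaulting to an empty result for unlisted actions means a newly added privileged capability exposes nothing until someone deliberately lists its fields, which also works against privilege creep.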
The ZIVIS Position
- This is the right pattern. Dual-LLM is one of the few architectures that provides real security properties. The challenge is implementation, not concept.
- Interface is everything. The boundary between models must be rigorously designed. Structured data, schema validation, explicit allow-lists for what the sandboxed model can request.
- Minimize privileged model exposure. The privileged model should have a small, well-defined API. It validates requests; it does not process arbitrary text from the sandboxed model.
- No context sharing. The privileged model's system prompt and context should never be visible to the sandboxed model or derivable from its outputs.
- Audit the boundary. Log all cross-boundary communication. Anomaly detection on request patterns. The interface is your security surface.
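For the audit point, a minimal sketch of boundary logging with a crude rate-based anomaly check might look like the following. The `BoundaryAuditor` class, its thresholds, and the logger name are assumptions for illustration; a sliding-window count per action is only one simple signal and is not a substitute for real anomaly detection.

```python
import json
import logging
from collections import defaultdict, deque

logger = logging.getLogger("boundary_audit")  # hypothetical logger name


class BoundaryAuditor:
    """Logs every cross-boundary request and flags unusually high rates."""

    def __init__(self, max_per_window: int = 5, window_seconds: float = 60.0):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # action -> timestamps inside the window

    def record(self, action: str, accepted: bool, now: float) -> bool:
        """Log one boundary crossing; return True if the rate looks anomalous."""
        recent = self._seen[action]
        recent.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        anomalous = len(recent) > self.max_per_window
        # One JSON line per crossing, so the log can be parsed later.
        logger.info(json.dumps({
            "ts": now,
            "action": action,
            "accepted": accepted,
            "anomalous": anomalous,
        }))
        return anomalous
```

The caller supplies `now` (for example from `time.time()`), which keeps the class deterministic under test. Rejected requests are worth recording too, ideally under a fixed label rather than the attacker-supplied action string, since a burst of rejections is often the earliest sign that someone is probing the interface.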
What We Tell Clients
Dual-LLM is what you should build when security matters. It's also hard to get right. The interface between models is where implementations fail.
Invest heavily in the boundary design. Structured requests, validated schemas, minimal privileged model authority. If you get the interface right, you have real security properties. If you get it wrong, you have two models instead of one and the same vulnerabilities.
Related Patterns
- Privilege Separation: the principle this pattern implements
- Guardrails: can strengthen the interface boundary
- Multi-Agent Orchestration: more agents, more boundaries, more complexity