Isolating Execution, Not Instructions
Why sandboxing code execution doesn't protect against instruction-following attacks
The Conventional Framing
Sandboxing runs model-generated code in isolated environments such as containers, VMs, or restricted interpreters. This limits what malicious code can do to the host system.
The pattern is essential for safe code execution in LLM applications.
Why Sandboxing Addresses One Attack Vector
Sandboxing protects against malicious code execution—the model generates code that would harm the system, and the sandbox contains it. This is real protection for a real threat.
But prompt injection often doesn't need code execution. The model can leak data, send emails, or call APIs, all within "normal" operation and without ever breaching the sandbox.
The capability gap:
The sandbox limits what code can do to the system. It doesn't limit what the model can do with its legitimate capabilities. Exfiltrating data through an allowed API call isn't a sandbox escape—it's using permissions.
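A minimal sketch of this gap, using entirely hypothetical names (`Sandbox`, `Agent`, `send_email`) rather than any real framework: the agent has one tool that goes through the sandbox and one that is an ordinary API capability. An injected instruction that triggers the second tool never touches the sandbox at all.

```python
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """Stand-in for an isolated runtime: it only records the code it is asked to run."""
    executed: list[str] = field(default_factory=list)

    def run(self, code: str) -> None:
        self.executed.append(code)


@dataclass
class Agent:
    """Hypothetical agent with one sandboxed tool and one ordinary API tool."""
    sandbox: Sandbox
    outbox: list[tuple[str, str]] = field(default_factory=list)

    def run_code(self, code: str) -> None:
        # Generated code crosses the sandbox boundary.
        self.sandbox.run(code)

    def send_email(self, to: str, body: str) -> None:
        # A legitimate capability: no code runs, so the sandbox is never involved.
        self.outbox.append((to, body))


agent = Agent(sandbox=Sandbox())

# An injected instruction asks the model to forward private data.
# The model complies by calling a tool it is already allowed to use.
agent.send_email("attacker@example.com", "contents of customer_notes.txt")

print(len(agent.sandbox.executed))  # 0: nothing ever reached the sandbox
print(len(agent.outbox))            # 1: the data left through a permitted channel
```

Hardening `Sandbox.run` would not change the outcome here, because the leak happens on a path that does not pass through it.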
Architecture
Components:
- Isolated environment: container, VM, restricted runtime
- Resource limits: CPU, memory, network constraints
- Filesystem isolation: no access to host files
- Network restrictions: limited or no network access
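As one possible illustration, the four components above map onto standard `docker run` flags. The sketch below only builds the command line; `SandboxLimits`, `docker_command`, the default values, and the image name are assumptions chosen for the example, not recommended settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxLimits:
    """Illustrative defaults only; tune to the workload."""
    cpus: float = 0.5
    memory: str = "256m"
    pids: int = 64
    network: str = "none"


def docker_command(image: str, script: str,
                   limits: SandboxLimits = SandboxLimits()) -> list[str]:
    """Build a `docker run` invocation for one piece of generated Python code."""
    return [
        "docker", "run", "--rm",
        "--network", limits.network,       # network restrictions
        "--cpus", str(limits.cpus),        # resource limits: CPU
        "--memory", limits.memory,         # resource limits: memory
        "--pids-limit", str(limits.pids),  # resource limits: process count
        "--read-only",                     # container root filesystem is read-only
        "--cap-drop", "ALL",               # drop all Linux capabilities
        # No volume mounts are added, so host files are not visible inside.
        image, "python", "-c", script,
    ]


cmd = docker_command("python:3.12-slim", "print(2 + 2)")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, timeout=...)` on a host with Docker installed. None of these flags constrain what the model does through its own API or tool access, which sits outside the container.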
Trust Boundaries
- Model → Code — generated code enters sandbox
- Sandbox → Host — execution isolated from host
- Model → APIs — API access outside sandbox
Threat Surface
| Threat | Vector | Why sandboxing falls short |
|---|---|---|
| Capability abuse | Use legitimate capabilities maliciously | Sandboxing doesn't limit model's intended features |
| Data exfiltration | Leak data through allowed channels | Sandbox permits normal API communication |
| Resource exhaustion | Consume all allowed resources | Limits cap usage but don't stop a workload from saturating its full allowance |
| Sandbox escape | Exploit sandbox vulnerabilities | All sandboxes have escape risk; more capability = more surface |
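The data exfiltration row is the one a sandbox does least about, since the traffic uses an allowed channel. One partial mitigation is to check outbound destinations at the tool layer. The sketch below is an assumption-laden example: `ALLOWED_HOSTS`, `egress_allowed`, and the host names are hypothetical, and a host allowlist narrows the channel without closing it (an allowed API can still carry leaked data).

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this deployment's tools may contact.
ALLOWED_HOSTS = frozenset({"api.internal.example"})


def egress_allowed(url: str, allowed_hosts: frozenset[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is explicitly allowlisted."""
    parts = urlsplit(url)
    # `hostname` excludes userinfo and port, so "allowed@evil" tricks do not match.
    return parts.scheme == "https" and parts.hostname in allowed_hosts


print(egress_allowed("https://api.internal.example/v1/orders"))          # True
print(egress_allowed("https://attacker.example/?d=secret"))              # False
print(egress_allowed("https://api.internal.example@attacker.example/"))  # False
```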
The ZIVIS Position
- Sandboxing is necessary but not sufficient. Sandbox code execution: yes. But also control what the model can do with its non-code capabilities.
- Principle of least privilege. Don't just sandbox code; limit all capabilities to what's actually needed. Fewer permissions = smaller attack surface.
- Monitor sandbox activity. Even in a sandbox, unusual behavior is a signal. Log what the sandboxed code does.
- Defense in depth. Sandboxing is one layer. Combine with input validation, output filtering, capability limits, and monitoring.
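A rough sketch of how least privilege and monitoring might look for non-code capabilities: every tool call goes through a gate that checks a per-task allowlist and records the attempt. `CapabilityGate`, `CapabilityDenied`, and the two tools are hypothetical names invented for this example.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("capability_gate")


class CapabilityDenied(Exception):
    """Raised when the model requests a tool outside its allowlist."""


class CapabilityGate:
    def __init__(self, tools: dict[str, Callable[..., object]], allowed: set[str]):
        self._tools = tools
        self._allowed = frozenset(allowed)
        self.audit: list[tuple[str, bool]] = []  # (tool name, permitted?)

    def call(self, name: str, **kwargs: object) -> object:
        permitted = name in self._allowed and name in self._tools
        # Record every attempt, including denied ones: denials are a signal.
        self.audit.append((name, permitted))
        log.info("tool=%s permitted=%s arg_names=%s", name, permitted, sorted(kwargs))
        if not permitted:
            raise CapabilityDenied(name)
        return self._tools[name](**kwargs)


def search_docs(query: str) -> str:
    return f"results for {query!r}"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


# This task only needs search, so email is registered but not granted.
gate = CapabilityGate(
    tools={"search_docs": search_docs, "send_email": send_email},
    allowed={"search_docs"},
)

print(gate.call("search_docs", query="refund policy"))  # results for 'refund policy'
try:
    gate.call("send_email", to="attacker@example.com", body="secrets")
except CapabilityDenied as exc:
    print(f"denied: {exc}")  # denied: send_email
```

The gate is one layer among several. It limits which capabilities are reachable, but it does not judge whether an allowed call is being used for a legitimate purpose.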
What We Tell Clients
Sandboxing is essential for code execution but doesn't protect against most prompt injection attacks. Injection typically works through legitimate capabilities—API calls, tool use, response generation—not code execution.
Sandbox your code execution AND apply privilege separation to all model capabilities. The sandbox contains code; capability limits contain the model.
Related Patterns
- Privilege Separation: limiting all capabilities, not just code
- Tool Use Router: controlling tool access