Isolating Execution, Not Instructions

Why sandboxing code execution doesn't protect against instruction-following attacks

The Conventional Framing

Sandboxing runs model-generated code in isolated environments—containers, VMs, restricted interpreters. This prevents malicious code from affecting the host system.

The pattern is essential for safe code execution in LLM applications.

Why Sandboxing Addresses One Attack Vector

Sandboxing protects against malicious code execution—the model generates code that would harm the system, and the sandbox contains it. This is real protection for a real threat.

But prompt injection often doesn't need code execution. The model can leak data, send emails, or call APIs, all within "normal" operation and without ever escaping the sandbox.

The capability gap:

The sandbox limits what code can do to the system. It doesn't limit what the model can do with its legitimate capabilities. Exfiltrating data through an allowed API call isn't a sandbox escape—it's using permissions.

Architecture

Components:

Isolated environment: container, VM, or restricted runtime
Resource limits: CPU, memory, and network constraints
Filesystem isolation: no access to host files
Network restrictions: limited or no network access
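
As a rough illustration of how these four components might map onto a container runtime, the sketch below builds a `docker run` command line for a single untrusted script. The image name, mount path, and limit values are assumptions chosen for the example, not recommendations, and the `SandboxLimits` and `docker_run_args` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxLimits:
    """Hypothetical limits for one execution; tune per workload."""
    memory: str = "256m"   # hard memory cap passed to --memory
    cpus: str = "0.5"      # fraction of one CPU passed to --cpus
    pids: int = 64         # process-count cap (guards against fork bombs)
    timeout_s: int = 10    # wall-clock limit, enforced by the caller


def docker_run_args(image: str, script_path: str, limits: SandboxLimits) -> list[str]:
    """Build a `docker run` command line for one model-generated script."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # network restrictions
        "--memory", limits.memory,              # resource limits
        "--cpus", limits.cpus,
        "--pids-limit", str(limits.pids),
        "--read-only",                          # root filesystem is read-only
        "--cap-drop", "ALL",                    # drop Linux capabilities
        "--security-opt", "no-new-privileges",
        # Only the generated script is mounted, and read-only (assumed path).
        "--volume", f"{script_path}:/work/main.py:ro",
        image, "python", "/work/main.py",
    ]
```

A caller would typically pass the result to `subprocess.run(args, timeout=limits.timeout_s)` so the wall-clock limit is enforced from outside the container. Note that nothing in this configuration constrains what the model does through its tools or APIs.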

Trust Boundaries

Sandboxed code execution:
✓ Can't access host filesystem
✓ Can't install malware
✓ Can't pivot to other systems

Model's other capabilities (not sandboxed):
✗ Can call allowed APIs with any parameters
✗ Can include any content in responses
✗ Can request any tool with any arguments

Injection uses capabilities, not sandbox escapes.

Model → Code — generated code enters sandbox
Sandbox → Host — execution isolated from host
Model → APIs — API access outside sandbox

Threat Surface

Threat	Vector	Impact
Capability abuse	Use legitimate capabilities maliciously	Sandboxing doesn't limit model's intended features
Data exfiltration	Leak data through allowed channels	Sandbox permits normal API communication
Resource exhaustion	Consume all allowed resources	Sandbox limits don't prevent DoS within limits
Sandbox escape	Exploit sandbox vulnerabilities	All sandboxes have escape risk; more capability = more surface
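
The data exfiltration row is the one a sandbox cannot see, because the request travels over a channel the application has already approved. One possible mitigation is to check the destination of each outbound call, not just whether the tool is permitted. The sketch below assumes a hypothetical allow-list (`ALLOWED_HOSTS`) and helper name (`destination_allowed`); a real deployment would need its own list and would enforce it at the network layer as well.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: the only hosts this application ever needs to call.
ALLOWED_HOSTS = frozenset({"api.internal.example"})


def destination_allowed(url: str, allowed_hosts: frozenset[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is exactly on the allow-list."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (for example, a broken IPv6 literal) are rejected.
        return False
    if parts.scheme != "https":
        return False
    # .hostname strips userinfo and port, so "allowed@evil.test" resolves to evil.test.
    host = (parts.hostname or "").lower()
    return host in allowed_hosts
```

Matching the exact hostname, instead of a prefix or substring, is deliberate: look-alike hosts that merely start with an allowed name are rejected.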

The ZIVIS Position

• Sandboxing is necessary but not sufficient. Sandbox code execution: yes. But also control what the model can do with its non-code capabilities.
• Principle of least privilege. Don't just sandbox code; limit all capabilities to what's actually needed. Fewer permissions = smaller attack surface.
• Monitor sandbox activity. Even in a sandbox, unusual behavior is a signal. Log what the sandboxed code does.
• Defense in depth. Sandboxing is one layer. Combine with input validation, output filtering, capability limits, and monitoring.
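
To make the least-privilege and monitoring points concrete, here is a minimal sketch of a gate that sits between the model and its tools. It grants each task only the tools it needs and records every request, allowed or not. The `ToolGate` class, the tool names, and the audit record shape are all hypothetical, chosen for illustration only.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("tool_gate")


@dataclass
class ToolGate:
    """Per-task tool grants plus an audit trail of every request."""
    granted: frozenset[str]
    audit: list[tuple[str, str, bool]] = field(default_factory=list)

    def authorize(self, tool: str, args_summary: str) -> bool:
        """Allow the call only if the tool was granted for this task."""
        allowed = tool in self.granted
        # Record denied requests too: a denied request is itself a signal.
        self.audit.append((tool, args_summary, allowed))
        log.info("tool=%s args=%s allowed=%s", tool, args_summary, allowed)
        return allowed


# A summarization task that only needs read access to documents.
gate = ToolGate(granted=frozenset({"search_docs"}))
gate.authorize("search_docs", "q=pricing")        # granted for this task
gate.authorize("send_email", "to=someone@else")   # not granted, still logged
```

A gate like this is one layer among several: it limits which tools are reachable, while argument checks, output filtering, and the code sandbox cover the other layers.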

What We Tell Clients

Sandboxing is essential for code execution but doesn't protect against most prompt injection attacks. Injection typically works through legitimate capabilities—API calls, tool use, response generation—not code execution.

Sandbox your code execution AND apply privilege separation to all model capabilities. The sandbox contains code; capability limits contain the model.

Related Patterns

Privilege Separation: limiting all capabilities, not just code
Tool Use Router: controlling tool access