Execution Follows Instruction, Including Injected Ones
Why LLMs that execute code amplify injection into arbitrary computation
The Conventional Framing
Code interpreter patterns allow LLMs to write and execute code to solve problems—running Python, manipulating data, creating visualizations. This dramatically expands model capabilities.
The pattern enables complex computation and data analysis through natural language interfaces.
Why Code Execution Amplifies Injection
When injection can influence what code gets written and executed, the attack surface expands from "manipulate text output" to "execute arbitrary computation." The model becomes a code-writing proxy for the attacker.
Sandboxing helps contain what the code can do to the system, but doesn't prevent data exfiltration, resource abuse, or attacks that operate within the sandbox's allowed capabilities.
The computation bridge:
Prompt injection → model writes malicious code → code executes. The injection is amplified through code generation into actual computation the attacker controls.
Architecture
Components:
- Code generation— model writes executable code
- Execution environment— sandbox for running code
- Result processing— handling execution output
- Iteration loop— refining code based on results
Trust Boundaries
- Input → Model — injection enters with request
- Model → Code — injection influences generated code
- Code → Execution — malicious code runs
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Code injection | Influence model to generate malicious code | Arbitrary computation executed |
| Data exfiltration via code | Generated code sends data to attacker | Sensitive data leaked through code execution |
| Resource abuse | Generated code consumes excessive resources | Denial of service, cost escalation |
| Sandbox escape | Generated code exploits sandbox vulnerabilities | Full system compromise |
The ZIVIS Position
- •Code generation from untrusted input is dangerous.If the model generates code based on adversarial input, you're giving attackers indirect code execution.
- •Sandbox, but don't rely solely on sandbox.Sandboxing is essential but not sufficient. Many attacks work within sandbox constraints.
- •Validate generated code.Before execution, analyze generated code for dangerous patterns. This is hard but worth attempting.
- •Limit code capabilities.Restrict what libraries/functions are available. No network access unless essential. Minimal filesystem.
What We Tell Clients
Code interpreter capabilities amplify injection from text manipulation to arbitrary computation. Sandboxing helps but doesn't prevent attacks that operate within allowed capabilities.
Minimize execution capabilities, validate generated code before running, and treat code interpreter as a high-risk feature requiring careful controls.
Related Patterns
- Sandboxing— containing code execution
- Tool Use Router— controlling what tools/functions available