Reasoning Steps Are Manipulable Context
Why showing your work creates intervention points for adversarial inputs
The Conventional Framing
Chain-of-thought prompting asks the model to show its reasoning step by step. Breaking complex problems into smaller steps improves accuracy on math, logic, and multi-step reasoning tasks.
The pattern is presented as a cognitive enhancement—better reasoning through explicit intermediate steps.
Why Reasoning Steps Are Attack Surface
Each reasoning step becomes part of the context for subsequent steps. Adversarial input doesn't need to corrupt the final answer directly—it can corrupt an intermediate step, and the corruption propagates.
The model's "thinking" is visible and influenceable. An attacker who can inject into the reasoning chain can guide the entire thought process.
The amplification problem:
Small manipulations early in the chain compound. A subtle bias introduced in step 1 shapes step 2, which shapes step 3. By the conclusion, the reasoning looks sound—because each step followed from the previous one.
Architecture
Components:
- Initial prompt— sets up the reasoning task
- Step generation— model produces intermediate reasoning
- Step chaining— each step feeds into next
- Conclusion synthesis— final answer from reasoning chain
Trust Boundaries
- Input → First step — injection enters reasoning
- Step → Step — corruption propagates through chain
- Final step → Output — corrupted conclusion emerges
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Early chain injection | Inject instructions that alter initial reasoning steps | Entire chain builds on corrupted foundation |
| Step manipulation | Target specific reasoning steps with tailored injections | Guide reasoning toward attacker-desired conclusion |
| Premise poisoning | Inject false premises that the chain then reasons from | Logically valid but factually wrong conclusions |
| Reasoning exposure | Chain reveals model's decision process | Attacker learns how to construct more effective attacks |
The ZIVIS Position
- •Reasoning transparency is dual-use.Visible reasoning helps humans verify model work. It also shows attackers exactly how to intervene in the thought process.
- •Validate conclusions independently.Don't trust the conclusion just because the reasoning chain looks coherent. Validate outputs against known constraints regardless of how they were derived.
- •Consider hiding intermediate steps.For security-sensitive applications, you might generate reasoning internally but only expose validated conclusions. Trade-off: less interpretability.
What We Tell Clients
Chain-of-thought is powerful for complex reasoning but creates a larger attack surface. Each step is a place where adversarial input can intervene.
Don't trust conclusions solely because the reasoning chain looks coherent— that coherence may be the result of a corrupted early step cascading through. Validate outputs against external constraints.
Related Patterns
- Tree of Thoughts— branching reasoning with same issues
- Self-Consistency— multiple chains as potential mitigation