Let's Think Step By Step... Through Poisoned Context
Why prompting for reasoning without examples still inherits context vulnerabilities
The Conventional Framing
Zero-shot chain-of-thought uses a simple prompt like "Let's think step by step" to elicit reasoning without providing examples. The model generates its own reasoning chain from scratch.
The approach is appealing because it requires no example engineering while still improving performance on reasoning tasks.
Why Unprompted Reasoning Is Still Contextual
"Let's think step by step" doesn't create reasoning in a vacuum. The model reasons about whatever is in its context—including any injections. You've asked it to reason carefully about potentially poisoned input.
Zero-shot CoT makes the model more thorough, not more secure. Thorough reasoning about adversarial content means thorough execution of adversarial instructions.
Architecture
Components:
- Trigger phrase— initiates reasoning mode
- Context inheritance— all context feeds into reasoning
- Step generation— model produces reasoning steps
- Conclusion— answer derived from reasoning
Trust Boundaries
- Input → Reasoning context — adversarial input reasoned about
- Reasoning → Steps — careful consideration of injections
- Steps → Output — well-reasoned wrong answer
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Reasoning amplification | Injection benefits from careful model attention | Model thoroughly executes injected instructions |
| Step-by-step attack guidance | Inject multi-step attack that model reasons through | Model plans and executes complex attack sequences |
| False legitimacy | Reasoning chain makes bad outputs look justified | Humans trust output because reasoning looks sound |
The ZIVIS Position
- •Reasoning doesn't filter.Asking the model to think carefully doesn't mean it thinks critically about injection. It means it processes everything thoroughly.
- •CoT is orthogonal to security.Chain-of-thought is a capability enhancement. It makes the model better at tasks—including tasks an attacker wants it to perform.
- •Visible reasoning aids attackers.When reasoning steps are visible, attackers can see exactly how their injections are being processed and refine their attacks.
What We Tell Clients
Zero-shot CoT improves reasoning quality, not reasoning safety. The model will carefully reason through adversarial inputs just as thoroughly as legitimate ones.
Don't rely on "thinking carefully" as a defense. The model doesn't distinguish between reasoning about your task and reasoning about an attacker's injection—it reasons about all context equally.
Related Patterns
- Chain-of-Thought— same issues with examples
- Self-Consistency— multiple reasoning paths