Reasoning Traces Are Attack Surface

Why interleaved reasoning and action loops create manipulable decision chains

The Conventional Framing

ReAct (Reasoning + Acting) interleaves chain-of-thought reasoning with action execution. The model thinks through a problem step by step, takes actions, observes results, and continues reasoning. This produces more reliable, interpretable agent behavior.

The pattern is praised for transparency—you can see the model's reasoning process. This supposedly makes it easier to debug and trust.

Why This Is Dangerous

Every reasoning step is an opportunity for injection to influence the next action. The model's "thoughts" aren't protected—they're just more text in the context window, subject to the same manipulation as any other input.

Worse, the transparency that makes ReAct appealing also makes it exploitable. An attacker who can see (or infer) the reasoning format can craft injections that hijack the reasoning chain itself.

The compounding problem:

  • Each action result re-enters the reasoning loop. Tool outputs, API responses, retrieved documents—all become part of the context that influences the next "thought."
  • Reasoning traces leak information. The model's step-by-step thinking can reveal system prompts, internal logic, and security boundaries.
  • Actions are authority decisions. Each action the model takes is an implicit authorization. The reasoning trace doesn't verify intent—it just rationalizes the decision.

Architecture

Components:

  • Thoughtreasoning step visible in context
  • Actiontool call or external interaction
  • Observationresult of action, re-enters context
  • Loop controllerdecides when to stop iterating

Trust Boundaries

┌─────────────────────────────────────────────────────────┐ │ CONTEXT WINDOW │ │ │ │ [System Prompt] ← Can be extracted via reasoning │ │ [User Query] ← May contain injection │ │ [Thought 1] ← Manipulable │ │ [Action 1] ← Authority decision │ │ [Observation 1] ← UNTRUSTED external input │ │ [Thought 2] ← Now reasoning on poisoned context │ │ ... │ │ │ │ Every observation is a new injection opportunity │ └─────────────────────────────────────────────────────────┘
  1. User query → First thoughtinitial injection opportunity
  2. Observation → Next thoughteach observation is untrusted input
  3. Thought → Actionreasoning doesn't verify authorization

Threat Surface

ThreatVectorImpact
Reasoning hijackInjection that mimics reasoning formatAttacker controls subsequent thoughts and actions
Observation poisoningMalicious content in tool/API responsesCorrupted reasoning from step N onward
System prompt extractionQueries that cause reasoning to reveal instructionsSecurity boundary disclosure
Action chain manipulationInjection that causes specific action sequenceUnauthorized multi-step operations
Loop exploitationObservations that prevent terminationResource exhaustion, infinite loops

The ZIVIS Position

  • Observations are untrusted input.Every tool result, API response, or retrieved document should be treated as potentially adversarial. Don't let observations directly influence high-stakes actions.
  • Bound the reasoning chain.Hard limits on iteration count. Escalating scrutiny as chains get longer. Long chains are more likely to be manipulated.
  • Separate reasoning from authorization.The model reasoning about an action is not the same as authorizing that action. Build explicit checkpoints for sensitive operations.
  • Don't expose raw reasoning traces.If users can see the thinking, attackers can learn the format. Summarize or filter reasoning before exposure.

What We Tell Clients

ReAct's transparency is a double-edged sword. The same visibility that helps you debug helps attackers craft targeted injections.

Use ReAct for low-stakes reasoning. For anything involving real authority—data access, external actions, sensitive operations—add explicit verification gates that don't rely on the model's own reasoning to authorize.

Related Patterns