Jump to pattern

Reasoning Traces Are Attack Surface

Why interleaved reasoning and action loops create manipulable decision chains

The Conventional Framing

ReAct (Reasoning + Acting) interleaves chain-of-thought reasoning with action execution. The model thinks through a problem step by step, takes actions, observes results, and continues reasoning. This produces more reliable, interpretable agent behavior.

The pattern is praised for transparency—you can see the model's reasoning process. This supposedly makes it easier to debug and trust.

Why This Is Dangerous

Every reasoning step is an opportunity for injection to influence the next action. The model's "thoughts" aren't protected—they're just more text in the context window, subject to the same manipulation as any other input.

Worse, the transparency that makes ReAct appealing also makes it exploitable. An attacker who can see (or infer) the reasoning format can craft injections that hijack the reasoning chain itself.

The compounding problem:

Each action result re-enters the reasoning loop. Tool outputs, API responses, retrieved documents—all become part of the context that influences the next "thought."
Reasoning traces leak information. The model's step-by-step thinking can reveal system prompts, internal logic, and security boundaries.
Actions are authority decisions. Each action the model takes is an implicit authorization. The reasoning trace doesn't verify intent—it just rationalizes the decision.

Architecture

Components:

Thought— reasoning step visible in context
Action— tool call or external interaction
Observation— result of action, re-enters context
Loop controller— decides when to stop iterating

Trust Boundaries

┌─────────────────────────────────────────────────────────┐ │ CONTEXT WINDOW │ │ │ │ [System Prompt] ← Can be extracted via reasoning │ │ [User Query] ← May contain injection │ │ [Thought 1] ← Manipulable │ │ [Action 1] ← Authority decision │ │ [Observation 1] ← UNTRUSTED external input │ │ [Thought 2] ← Now reasoning on poisoned context │ │ ... │ │ │ │ Every observation is a new injection opportunity │ └─────────────────────────────────────────────────────────┘

User query → First thought — initial injection opportunity
Observation → Next thought — each observation is untrusted input
Thought → Action — reasoning doesn't verify authorization

Threat Surface

Threat	Vector	Impact
Reasoning hijack	Injection that mimics reasoning format	Attacker controls subsequent thoughts and actions
Observation poisoning	Malicious content in tool/API responses	Corrupted reasoning from step N onward
System prompt extraction	Queries that cause reasoning to reveal instructions	Security boundary disclosure
Action chain manipulation	Injection that causes specific action sequence	Unauthorized multi-step operations
Loop exploitation	Observations that prevent termination	Resource exhaustion, infinite loops

The ZIVIS Position

•
Observations are untrusted input.Every tool result, API response, or retrieved document should be treated as potentially adversarial. Don't let observations directly influence high-stakes actions.
•
Bound the reasoning chain.Hard limits on iteration count. Escalating scrutiny as chains get longer. Long chains are more likely to be manipulated.
•
Separate reasoning from authorization.The model reasoning about an action is not the same as authorizing that action. Build explicit checkpoints for sensitive operations.
•
Don't expose raw reasoning traces.If users can see the thinking, attackers can learn the format. Summarize or filter reasoning before exposure.

What We Tell Clients

ReAct's transparency is a double-edged sword. The same visibility that helps you debug helps attackers craft targeted injections.

Use ReAct for low-stakes reasoning. For anything involving real authority—data access, external actions, sensitive operations—add explicit verification gates that don't rely on the model's own reasoning to authorize.

Related Patterns

Plan-and-Execute— separates planning from execution, but plans can be poisoned
Reflection— self-critique uses the same compromised context
Chain-of-Thought— the prompting pattern ReAct builds on

Authoring the Agent Trust Protocol — the open standard for agentic trust attestation, currently under IETF review
Jim Goldman: Salesforce’s first VP of Global Security GRC, FBI Cybercrime Task Force, Purdue cyber forensics founder
Jake Miller: Co-Founder & CEO. 25 years engineering complex enterprise systems, now applied to AI offensive security
Proprietary ZIVIS platform: 120+ adversarial AI attack scenarios, continuous coverage across OWASP Web, API, LLM, and Agentic AI
Mesh Mesh: approved Salesforce sub-processor. Every review stage cleared.