Humans Approve What They Don't Understand
Why escalation thresholds and approval workflows don't provide security guarantees
The Conventional Framing
Human-in-the-Loop (HITL) patterns require human approval for sensitive operations. The agent escalates to a human when actions exceed defined thresholds—high-value transactions, destructive operations, or uncertain classifications. Humans provide the judgment that models lack.
The pattern is positioned as the ultimate safety net: when automation fails, humans catch the error.
Why This Fails in Practice
Humans rubber-stamp approvals. This isn't a character flaw—it's an inevitable consequence of workflow design. Approval requests are interruptive, context is incomplete, and the human has other work to do.
HITL systems train users to approve. Every legitimate request that gets approved reinforces the behavior. By the time an attack comes through, clicking "approve" is muscle memory.
Why human judgment fails here:
- Approval fatigue. Hundreds of legitimate approvals create habituation. The malicious one looks like all the others.
- Context stripping. Approval dialogs show summaries, not full context. The summary can obscure what's actually happening.
- Staging attacks. Request benign approvals first, building trust. Then slip in the malicious one.
- Timing attacks. Request approval when the user is busy, distracted, or at end of day.
Architecture
Components:
- Threshold detector — identifies when escalation is needed
- Approval queue — holds pending requests for human review
- Approval interface — what humans see when deciding
- Execution gate — blocks execution until approval arrives
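The four components can be sketched together in a few lines. This is a minimal illustrative sketch, not a production design; the class and field names (`ExecutionGate`, `ApprovalRequest`) and the threshold value are assumptions for the example.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"

@dataclass
class ApprovalRequest:
    action: str
    amount: float
    decision: Decision = Decision.PENDING

ESCALATION_THRESHOLD = 1_000.0  # assumed policy value for illustration

def needs_escalation(req: ApprovalRequest) -> bool:
    """Threshold detector: flag anything at or above the limit."""
    return req.amount >= ESCALATION_THRESHOLD

class ExecutionGate:
    """Execution gate: holds flagged actions until a human decides."""
    def __init__(self):
        self.pending: list[ApprovalRequest] = []  # the approval queue

    def submit(self, req: ApprovalRequest) -> bool:
        if not needs_escalation(req):
            return True  # below threshold: executes without review
        self.pending.append(req)
        return False  # blocked, awaiting human review

    def decide(self, req: ApprovalRequest, approve: bool) -> bool:
        req.decision = Decision.APPROVED if approve else Decision.DENIED
        self.pending.remove(req)
        return req.decision is Decision.APPROVED
```

Note that everything the human sees at `decide` time comes from the request object the agent constructed, which is exactly the trust problem the next section describes.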
Trust Boundaries
- Agent → Approval request — agent controls what human sees
- Human → Approval decision — decision made with incomplete info
- Approval → Execution — the action that executes may differ from the one the human understood they were approving
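One way to narrow the approval→execution boundary is to bind the approval to the exact action payload rather than to a summary: hash the canonical action at approval time and refuse to execute anything whose hash differs. A minimal sketch, assuming JSON-serializable actions (the function names are illustrative):

```python
import hashlib
import json

def action_digest(action: dict) -> str:
    """Canonicalize and hash the action so approval covers exact content."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def approve(action: dict) -> str:
    # The approval token is the digest of what the human reviewed,
    # not a free-text summary the agent composed.
    return action_digest(action)

def execute(action: dict, approved_digest: str) -> None:
    if action_digest(action) != approved_digest:
        raise PermissionError("action changed after approval")
    ...  # perform the action
```

This doesn't fix context stripping (the human may still review a misleading rendering), but it does prevent the executed action from silently drifting from the approved one.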
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Approval fatigue | Volume of legitimate requests normalizes approval | Malicious request approved without scrutiny |
| Context manipulation | Approval summary obscures true action | Human approves what they don't understand |
| Staging attacks | Benign requests build trust before malicious one | Pattern of approval exploited |
| Timing exploitation | Request approval during low-attention periods | Reduced scrutiny for malicious request |
| Threshold gaming | Split action to stay below escalation threshold | Bypass human review entirely |
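Threshold gaming in particular has a straightforward partial mitigation: escalate on a rolling per-actor sum rather than per-request amounts, so splitting one large action into many small ones still trips review. A sketch under assumed window and limit values (names and parameters are illustrative):

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600     # assumed rolling window
THRESHOLD = 1_000.0       # assumed per-window escalation limit

class CumulativeThreshold:
    """Escalate on the rolling sum per actor, not per request."""
    def __init__(self):
        # actor -> deque of (timestamp, amount)
        self.history = defaultdict(deque)

    def needs_review(self, actor: str, amount: float, now: float) -> bool:
        window = self.history[actor]
        # Drop entries that have aged out of the window.
        while window and now - window[0][0] > WINDOW_SECONDS:
            window.popleft()
        window.append((now, amount))
        total = sum(amt for _, amt in window)
        return total >= THRESHOLD
```

The attacker's counter is to spread the split across actors or across windows, so this raises cost rather than eliminating the bypass.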
The ZIVIS Position
- HITL is a compliance mechanism, not a security control. Human approval provides legal cover ('a human approved this'). It doesn't reliably prevent attacks because humans don't reliably catch them.
- Design for attention, not approval. If you must use HITL, design the interface to draw attention to anomalies, not to facilitate quick approval. Friction is a feature.
- Approval context must be complete and verified. The human should see what will actually happen, not what the agent claims will happen. Verify the action independently.
- Limit approval volume. If a human is approving more than a few requests per hour, habituation is inevitable. Reduce what requires approval.
- Out-of-band verification for high-stakes actions. For truly sensitive operations, require approval through a different channel with different context. Don't let the compromised system control the approval interface.
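The out-of-band point can be made concrete with a challenge code: a separate service, holding a key the agent-facing system never sees, sends the human a short code bound to the action; execution requires that code back. A minimal sketch (the service split, key handling, and function names are assumptions for illustration):

```python
import hashlib
import hmac
import secrets

# Held only by the out-of-band service, never by the agent-facing system.
SECOND_CHANNEL_KEY = secrets.token_bytes(32)

def issue_challenge(action_id: str) -> str:
    """Run by the OOB service; the code goes to the human directly
    (e.g. a separate admin console), bypassing the agent's interface."""
    mac = hmac.new(SECOND_CHANNEL_KEY, action_id.encode(), hashlib.sha256)
    return mac.hexdigest()[:8]

def verify_approval(action_id: str, code: str) -> bool:
    """Run by the OOB service at execution time."""
    expected = hmac.new(SECOND_CHANNEL_KEY, action_id.encode(),
                        hashlib.sha256).hexdigest()[:8]
    return hmac.compare_digest(expected, code)
```

The security property comes from the channel separation, not the code itself: a compromised agent can still craft a misleading request, but it cannot forge the human's second-channel response.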
What We Tell Clients
Human-in-the-loop is liability management, not security. It gives you someone to point to when things go wrong: "A human approved it."
If you're relying on human approval for security, you're relying on humans to maintain vigilance under conditions designed to erode it. Build technical controls instead. Use HITL as a backstop, not a primary defense.
Related Patterns
- Confirmation Loops — same problems in a different framing
- Plan-and-Execute — plan review has similar approval problems
- Audit Logging — at least know what was approved