Humans Approve What They Don't Understand

Why escalation thresholds and approval workflows don't provide security guarantees

The Conventional Framing

Human-in-the-Loop (HITL) patterns require human approval for sensitive operations. The agent escalates to a human when actions exceed defined thresholds—high-value transactions, destructive operations, or uncertain classifications. Humans provide the judgment that models lack.

The pattern is positioned as the ultimate safety net: when automation fails, humans catch the error.

Why This Fails in Practice

Humans rubber-stamp approvals. This isn't a character flaw—it's an inevitable consequence of workflow design. Approval requests are interruptive, context is incomplete, and the human has other work to do.

HITL systems train users to approve. Every legitimate request that gets approved reinforces the behavior. By the time an attack comes through, clicking "approve" is muscle memory.

Why human judgment fails here:

  • Approval fatigue. Hundreds of legitimate approvals create habituation. The malicious one looks like all the others.
  • Context stripping. Approval dialogs show summaries, not full context. The summary can obscure what's actually happening.
  • Staging attacks. An attacker submits benign requests first to build trust, then slips in the malicious one.
  • Timing attacks. An attacker requests approval when the user is busy, distracted, or at the end of the day.

Architecture

Components:

  • Threshold detector: identifies when escalation is needed
  • Approval queue: holds pending requests for human review
  • Approval interface: what humans see when deciding
  • Execution gate: waits for approval before proceeding
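
To make the moving parts concrete, here is a minimal sketch of how these components might fit together. All class and field names (`Action`, `ThresholdDetector`, `ApprovalQueue`, `ExecutionGate`, `amount`) are hypothetical and chosen for illustration. The approval interface itself is left out, since it is the human-facing piece.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Action:
    kind: str      # hypothetical label, e.g. "delete_files"
    amount: float  # hypothetical magnitude: file count, dollars, ...


@dataclass
class ApprovalRequest:
    request_id: int
    action: Action
    approved: Optional[bool] = None  # None means still pending


class ThresholdDetector:
    def __init__(self, limits: Dict[str, float]) -> None:
        self.limits = limits

    def needs_escalation(self, action: Action) -> bool:
        limit = self.limits.get(action.kind)
        # Action kinds with no configured limit always escalate.
        return limit is None or action.amount > limit


class ApprovalQueue:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.pending: Dict[int, ApprovalRequest] = {}

    def submit(self, action: Action) -> ApprovalRequest:
        request = ApprovalRequest(next(self._ids), action)
        self.pending[request.request_id] = request
        return request

    def decide(self, request_id: int, approved: bool) -> None:
        self.pending[request_id].approved = approved


class ExecutionGate:
    def __init__(self, detector: ThresholdDetector, queue: ApprovalQueue,
                 execute: Callable[[Action], None]) -> None:
        self.detector, self.queue, self.execute = detector, queue, execute

    def run(self, action: Action) -> str:
        if not self.detector.needs_escalation(action):
            self.execute(action)
            return "executed"
        return f"pending:{self.queue.submit(action).request_id}"

    def resume(self, request_id: int) -> str:
        request = self.queue.pending[request_id]
        if request.approved is None:
            return "pending"
        del self.queue.pending[request_id]
        if request.approved:
            self.execute(request.action)
            return "executed"
        return "rejected"
```

Nothing in this sketch checks whether the human understood the request. The gate only knows that someone set `approved` to `True`, which is the weakness the rest of this piece describes.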

Trust Boundaries

┌─────────────────────────────────────────────────────────┐
│                   APPROVAL INTERFACE                    │
│                                                         │
│  "Agent wants to: Delete 47 files"                      │
│  [Approve] [Reject]                                     │
│                                                         │
│  What the human sees: A summary                         │
│  What's actually happening: ???                         │
│  Full context: Not shown                                │
│  Attack indicators: Not visible                         │
│                                                         │
│  Human cognitive state:                                 │
│  - Busy with other work                                 │
│  - Approved 50 similar requests today                   │
│  - Just wants to clear the queue                        │
└─────────────────────────────────────────────────────────┘

  1. Agent → Approval request: agent controls what human sees
  2. Human → Approval decision: decision made with incomplete info
  3. Approval → Execution: approved action may differ from understood
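
The third boundary (the approved action may differ from what actually runs) is the one most open to a purely technical check. The sketch below is one possible approach, not a prescribed design: it binds the approval to a SHA-256 digest of the exact structured action shown to the human and refuses to execute anything whose digest differs. The function names and the dictionary shape of an action are assumptions made for illustration.

```python
import hashlib
import hmac
import json


def action_digest(action: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the same action
    # always hashes to the same value regardless of key order.
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_approval(action_shown_to_human: dict) -> str:
    # Store this digest alongside the human's decision.
    return action_digest(action_shown_to_human)


def execute_if_approved(action_to_run: dict, approved_digest: str) -> bool:
    # Refuse to run anything that is not byte-for-byte the approved action.
    if not hmac.compare_digest(action_digest(action_to_run), approved_digest):
        return False
    # ... perform the action here (omitted in this sketch) ...
    return True
```

This only closes the gap between approval and execution. It does nothing for the first two boundaries: if the agent controls how the action is summarized, the human can still approve a correctly hashed action they did not understand.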

Threat Surface

Threat               | Vector                                            | Impact
---------------------|---------------------------------------------------|-------------------------------------------
Approval fatigue     | Volume of legitimate requests normalizes approval | Malicious request approved without scrutiny
Context manipulation | Approval summary obscures true action             | Human approves what they don't understand
Staging attacks      | Benign requests build trust before malicious one  | Pattern of approval exploited
Timing exploitation  | Request approval during low-attention periods     | Reduced scrutiny for malicious request
Threshold gaming     | Split action to stay below escalation threshold   | Bypass human review entirely
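
Threshold gaming works because a per-request check has no memory. One possible mitigation, sketched below under assumed names and numbers, is to escalate on the cumulative total within a rolling time window rather than on each request alone. The `CumulativeThreshold` class, the limit, and the window length are all hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class CumulativeThreshold:
    """Escalates when the rolling total of requested amounts exceeds a limit."""

    def __init__(self, limit: float, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, amount)
        self._total = 0.0

    def needs_escalation(self, amount: float, now: float) -> bool:
        # Forget requests that have aged out of the window.
        while self._events and now - self._events[0][0] > self.window:
            _, old_amount = self._events.popleft()
            self._total -= old_amount
        # Count every requested amount, whether or not it ends up escalated.
        self._events.append((now, amount))
        self._total += amount
        return self._total > self.limit


# Four 300-unit requests each stay under a 1000-unit per-request limit,
# but the fourth pushes the hourly total to 1200 and triggers escalation.
detector = CumulativeThreshold(limit=1000, window_seconds=3600)
results = [detector.needs_escalation(300, now=t) for t in (0, 60, 120, 180)]
```

An attacker can still spread requests across windows or across identities, so this narrows the gap rather than closing it.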

The ZIVIS Position

  • HITL is a compliance mechanism, not a security control. Human approval provides legal cover ("a human approved this"). It doesn't reliably prevent attacks because humans don't reliably catch them.
  • Design for attention, not approval. If you must use HITL, design the interface to draw attention to anomalies, not to facilitate quick approval. Friction is a feature.
  • Approval context must be complete and verified. The human should see what will actually happen, not what the agent claims will happen. Verify the action independently.
  • Limit approval volume. If a human is approving more than a few requests per hour, habituation is inevitable. Reduce what requires approval.
  • Out-of-band verification for high-stakes actions. For truly sensitive operations, require approval through a different channel with different context. Don't let the compromised system control the approval interface.
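
The "limit approval volume" point can be monitored mechanically. The following is a rough sketch, assuming a hypothetical `ApprovalBudget` class and an illustrative budget of five decisions per reviewer per hour (the text says only "a few", so that number is a placeholder). It flags a reviewer whose recent decision count suggests habituation is likely.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque


class ApprovalBudget:
    """Tracks how many decisions each reviewer made in the last hour."""

    def __init__(self, max_per_hour: int = 5) -> None:
        # 5 is an illustrative stand-in for "a few requests per hour".
        self.max_per_hour = max_per_hour
        self._decisions: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def record(self, reviewer: str, now: float) -> bool:
        """Record one decision at time `now` (seconds); True means over budget."""
        times = self._decisions[reviewer]
        # Drop decisions made an hour or more ago.
        while times and now - times[0] >= 3600:
            times.popleft()
        times.append(now)
        return len(times) > self.max_per_hour
```

When `record` returns `True`, a reasonable response is to stop routing requests to that reviewer or to revisit which actions require approval at all, rather than to ask the reviewer to try harder.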

What We Tell Clients

Human-in-the-loop is liability management, not security. It gives you someone to point to when things go wrong: "A human approved it."

If you're relying on human approval for security, you're relying on humans to maintain vigilance under conditions designed to erode it. Build technical controls instead. Use HITL as a backstop, not a primary defense.

Related Patterns