Humans Approve What They Don't Understand
Why escalation thresholds and approval workflows don't provide security guarantees
The Conventional Framing
Human-in-the-Loop (HITL) patterns require human approval for sensitive operations. The agent escalates to a human when actions exceed defined thresholds—high-value transactions, destructive operations, or uncertain classifications. Humans provide the judgment that models lack.
The pattern is positioned as the ultimate safety net: when automation fails, humans catch the error.
Why This Fails in Practice
Humans rubber-stamp approvals. This isn't a character flaw—it's an inevitable consequence of workflow design. Approval requests are interruptive, context is incomplete, and the human has other work to do.
HITL systems train users to approve. Every legitimate request that gets approved reinforces the behavior. By the time an attack comes through, clicking "approve" is muscle memory.
Why human judgment fails here:
- Approval fatigue. Hundreds of legitimate approvals create habituation. The malicious one looks like all the others.
- Context stripping. Approval dialogs show summaries, not full context. The summary can obscure what's actually happening.
- Staging attacks. Request benign approvals first, building trust. Then slip in the malicious one.
- Timing attacks. Request approval when the user is busy, distracted, or at end of day.
Architecture
Components:
- Threshold detector — identifies when escalation is needed
- Approval queue — holds pending requests for human review
- Approval interface — what humans see when deciding
- Execution gate — blocks execution until approval arrives
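The four components can be sketched together in a few lines. This is a minimal illustrative sketch, not a production design; the class and field names (`ExecutionGate`, `ApprovalRequest`) and the threshold value are assumptions for the example.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"

@dataclass
class ApprovalRequest:
    action: str
    amount: float
    decision: Decision = Decision.PENDING

ESCALATION_THRESHOLD = 1_000.0  # assumed policy value for illustration

def needs_escalation(req: ApprovalRequest) -> bool:
    """Threshold detector: flag anything at or above the limit."""
    return req.amount >= ESCALATION_THRESHOLD

class ExecutionGate:
    """Execution gate: holds flagged actions until a human decides."""
    def __init__(self):
        self.pending: list[ApprovalRequest] = []  # the approval queue

    def submit(self, req: ApprovalRequest) -> bool:
        if not needs_escalation(req):
            return True  # below threshold: executes without review
        self.pending.append(req)
        return False  # blocked, awaiting human review

    def decide(self, req: ApprovalRequest, approve: bool) -> bool:
        req.decision = Decision.APPROVED if approve else Decision.DENIED
        self.pending.remove(req)
        return req.decision is Decision.APPROVED
```

Note that everything the human sees at `decide` time comes from the request object the agent constructed, which is exactly the trust problem the next section describes.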
Trust Boundaries
- Agent → Approval request — agent controls what human sees
- Human → Approval decision — decision made with incomplete info
- Approval → Execution — the action that executes may differ from the one the human understood they were approving
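One way to narrow the approval→execution boundary is to bind the approval to the exact action payload rather than to a summary: hash the canonical action at approval time and refuse to execute anything whose hash differs. A minimal sketch, assuming JSON-serializable actions (the function names are illustrative):

```python
import hashlib
import json

def action_digest(action: dict) -> str:
    """Canonicalize and hash the action so approval covers exact content."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def approve(action: dict) -> str:
    # The approval token is the digest of what the human reviewed,
    # not a free-text summary the agent composed.
    return action_digest(action)

def execute(action: dict, approved_digest: str) -> None:
    if action_digest(action) != approved_digest:
        raise PermissionError("action changed after approval")
    ...  # perform the action
```

This doesn't fix context stripping (the human may still review a misleading rendering), but it does prevent the executed action from silently drifting from the approved one.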
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Approval fatigue | Volume of legitimate requests normalizes approval | Malicious request approved without scrutiny |
| Context manipulation | Approval summary obscures true action | Human approves what they don't understand |
| Staging attacks | Benign requests build trust before malicious one | Pattern of approval exploited |
| Timing exploitation | Request approval during low-attention periods | Reduced scrutiny for malicious request |
| Threshold gaming | Split action to stay below escalation threshold | Bypass human review entirely |
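Threshold gaming in particular has a straightforward partial mitigation: escalate on a rolling per-actor sum rather than per-request amounts, so splitting one large action into many small ones still trips review. A sketch under assumed window and limit values (names and parameters are illustrative):

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600     # assumed rolling window
THRESHOLD = 1_000.0       # assumed per-window escalation limit

class CumulativeThreshold:
    """Escalate on the rolling sum per actor, not per request."""
    def __init__(self):
        # actor -> deque of (timestamp, amount)
        self.history = defaultdict(deque)

    def needs_review(self, actor: str, amount: float, now: float) -> bool:
        window = self.history[actor]
        # Drop entries that have aged out of the window.
        while window and now - window[0][0] > WINDOW_SECONDS:
            window.popleft()
        window.append((now, amount))
        total = sum(amt for _, amt in window)
        return total >= THRESHOLD
```

The attacker's counter is to spread the split across actors or across windows, so this raises cost rather than eliminating the bypass.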
The ZIVIS Position
- HITL is a compliance mechanism, not a security control. Human approval provides legal cover ('a human approved this'). It doesn't reliably prevent attacks because humans don't reliably catch them.
- Design for attention, not approval. If you must use HITL, design the interface to draw attention to anomalies, not to facilitate quick approval. Friction is a feature.
- Approval context must be complete and verified. The human should see what will actually happen, not what the agent claims will happen. Verify the action independently.
- Limit approval volume. If a human is approving more than a few requests per hour, habituation is inevitable. Reduce what requires approval.
- Out-of-band verification for high-stakes actions. For truly sensitive operations, require approval through a different channel with different context. Don't let the compromised system control the approval interface.
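The out-of-band point can be made concrete with a challenge code: a separate service, holding a key the agent-facing system never sees, sends the human a short code bound to the action; execution requires that code back. A minimal sketch (the service split, key handling, and function names are assumptions for illustration):

```python
import hashlib
import hmac
import secrets

# Held only by the out-of-band service, never by the agent-facing system.
SECOND_CHANNEL_KEY = secrets.token_bytes(32)

def issue_challenge(action_id: str) -> str:
    """Run by the OOB service; the code goes to the human directly
    (e.g. a separate admin console), bypassing the agent's interface."""
    mac = hmac.new(SECOND_CHANNEL_KEY, action_id.encode(), hashlib.sha256)
    return mac.hexdigest()[:8]

def verify_approval(action_id: str, code: str) -> bool:
    """Run by the OOB service at execution time."""
    expected = hmac.new(SECOND_CHANNEL_KEY, action_id.encode(),
                        hashlib.sha256).hexdigest()[:8]
    return hmac.compare_digest(expected, code)
```

The security property comes from the channel separation, not the code itself: a compromised agent can still craft a misleading request, but it cannot forge the human's second-channel response.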
What We Tell Clients
Human-in-the-loop is liability management, not security. It gives you someone to point to when things go wrong: "A human approved it."
If you're relying on human approval for security, you're relying on humans to maintain vigilance under conditions designed to erode it. Build technical controls instead. Use HITL as a backstop, not a primary defense.
Related Patterns
- Confirmation Loops — same problems in a different framing
- Plan-and-Execute — plan review has similar approval problems
- Audit Logging — at least know what was approved