Quis Custodiet Ipsos Custodes?
Why meta-agents that monitor workers have the same vulnerabilities as the workers
The Conventional Framing
Supervisor patterns use a meta-agent to monitor and correct worker agents. The supervisor reviews worker outputs, identifies errors, and intervenes when needed. This provides oversight and quality control.
The pattern is positioned as adding a layer of safety—even if a worker makes mistakes, the supervisor catches them.
Why This Doesn't Add Security
The supervisor operates in the same context as the workers. It sees the same compromised inputs, reads the same poisoned content, and is vulnerable to the same injections. A supervisor that's supposed to catch attacks is itself susceptible to those attacks.
"Who watches the watchmen?" In LLM systems, the answer is: nobody, because the watchman has the same vulnerabilities as everyone else.
Why supervisors fail:
- Same context, same vulnerabilities. If workers are compromised by context injection, the supervisor seeing that context is also compromised.
- Authority to override. Supervisors often have more authority than workers. A compromised supervisor can force workers to do anything.
- False confidence. "The supervisor approved it" becomes justification for trusting bad output.
Architecture
Components:
- Supervisor— meta-agent that monitors and corrects
- Workers— agents doing the actual work
- Review loop— supervisor evaluates worker outputs
- Correction mechanism— supervisor's ability to modify or retry
Trust Boundaries
- Input → Supervisor — supervisor sees compromised input
- Input → Workers — workers see same compromised input
- Workers → Supervisor review — supervisor evaluates in same context
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Supervisor compromise | Same injection that affects workers affects supervisor | Oversight fails, malicious output approved |
| Authority abuse | Compromised supervisor forces workers to act | Supervisor's elevated authority exploited |
| Review bypass | Injection includes instructions for supervisor | Supervisor approves what it should reject |
| False assurance | Supervisor approval used as security signal | Bad output trusted because 'supervisor checked' |
The ZIVIS Position
- •Supervisors need independent context.A supervisor operating in the same context as workers provides no security benefit. Real oversight requires different information sources.
- •Don't elevate supervisor authority.If the supervisor has more capabilities than workers, compromising the supervisor is more valuable. Keep supervisor authority minimal.
- •Supervisors don't replace security controls.A supervisor is a quality mechanism, not a security mechanism. Don't use it as your primary defense against injection.
- •Consider non-LLM supervisors.Rule-based validators, anomaly detection, human oversight—these can provide supervision that's not vulnerable to the same attacks as LLM workers.
What We Tell Clients
Adding a supervisor agent doesn't add a security layer. It adds another agent with the same vulnerabilities, operating in the same context.
If you need oversight, use mechanisms that aren't vulnerable to the same attacks: deterministic validators, out-of-context review, or actual humans with full context on a clean interface.
Related Patterns
- Reflection— self-supervision with same problems
- Guardrails— external validation instead of LLM oversight
- Multi-Agent Orchestration— supervisor is one orchestration approach