Quis Custodiet Ipsos Custodes?

Why meta-agents that monitor workers have the same vulnerabilities as the workers

The Conventional Framing

Supervisor patterns use a meta-agent to monitor and correct worker agents. The supervisor reviews worker outputs, identifies errors, and intervenes when needed. This provides oversight and quality control.

The pattern is positioned as adding a layer of safety—even if a worker makes mistakes, the supervisor catches them.

Why This Doesn't Add Security

The supervisor operates in the same context as the workers. It sees the same compromised inputs, reads the same poisoned content, and is vulnerable to the same injections. A supervisor that's supposed to catch attacks is itself susceptible to those attacks.

"Who watches the watchmen?" In LLM systems, the answer is: nobody, because the watchman has the same vulnerabilities as everyone else.

Why supervisors fail:

  • Same context, same vulnerabilities. If workers are compromised by context injection, the supervisor seeing that context is also compromised.
  • Authority to override. Supervisors often have more authority than workers. A compromised supervisor can force workers to do anything.
  • False confidence. "The supervisor approved it" becomes justification for trusting bad output.

Architecture

Components:

  • Supervisormeta-agent that monitors and corrects
  • Workersagents doing the actual work
  • Review loopsupervisor evaluates worker outputs
  • Correction mechanismsupervisor's ability to modify or retry

Trust Boundaries

┌─────────────────────────────────────────────────────────┐ │ SAME CONTEXT │ │ │ │ User Input (may contain injection) │ │ │ │ │ ├──────────────────────────────────────┐ │ │ │ │ │ │ ▼ ▼ │ │ ┌──────────────┐ ┌───────────┐ │ │ │ Supervisor │ ◄──── reviews ─────── │ Workers │ │ │ │ (vulnerable) │ │(vulnerable)│ │ │ └──────────────┘ └───────────┘ │ │ │ │ │ ▼ │ │ If workers are fooled by injection, │ │ supervisor reviewing their output is likely │ │ fooled by the same injection. │ └─────────────────────────────────────────────────────────┘
  1. Input → Supervisorsupervisor sees compromised input
  2. Input → Workersworkers see same compromised input
  3. Workers → Supervisor reviewsupervisor evaluates in same context

Threat Surface

ThreatVectorImpact
Supervisor compromiseSame injection that affects workers affects supervisorOversight fails, malicious output approved
Authority abuseCompromised supervisor forces workers to actSupervisor's elevated authority exploited
Review bypassInjection includes instructions for supervisorSupervisor approves what it should reject
False assuranceSupervisor approval used as security signalBad output trusted because 'supervisor checked'

The ZIVIS Position

  • Supervisors need independent context.A supervisor operating in the same context as workers provides no security benefit. Real oversight requires different information sources.
  • Don't elevate supervisor authority.If the supervisor has more capabilities than workers, compromising the supervisor is more valuable. Keep supervisor authority minimal.
  • Supervisors don't replace security controls.A supervisor is a quality mechanism, not a security mechanism. Don't use it as your primary defense against injection.
  • Consider non-LLM supervisors.Rule-based validators, anomaly detection, human oversight—these can provide supervision that's not vulnerable to the same attacks as LLM workers.

What We Tell Clients

Adding a supervisor agent doesn't add a security layer. It adds another agent with the same vulnerabilities, operating in the same context.

If you need oversight, use mechanisms that aren't vulnerable to the same attacks: deterministic validators, out-of-context review, or actual humans with full context on a clean interface.

Related Patterns