Security Is the Weakest Link

Why multiple LLMs collaborating on responses inherit the vulnerabilities of all of them

The Conventional Framing

Mixture of Agents uses multiple LLMs to collaborate on generating a single response. Different models contribute perspectives, outputs are aggregated or synthesized, and the final response benefits from diverse model capabilities.

The pattern is positioned as getting the best of multiple models—combining strengths while compensating for individual weaknesses.

Why This Compounds Vulnerabilities

When multiple models contribute to a response, the security posture is determined by the weakest model. If any model in the mixture is vulnerable to a particular injection, that injection can affect the final output.

You're not getting the most secure model's protection—you're getting the least secure model's vulnerabilities.

Why mixtures multiply risk:

  • Weakest link security. An attacker only needs to compromise one model to influence the output.
  • Aggregation amplifies. If a poisoned response gets aggregated with clean responses, the poison may still propagate.
  • Diverse vulnerabilities. Different models have different injection techniques that work. More models means more attack vectors.

Architecture

Components:

  • Input distributorsends query to multiple models
  • Participating modelsdifferent LLMs contributing
  • Aggregatorcombines model outputs into final response
  • Weighting logichow much each model influences result

Trust Boundaries

Query ──► Model A (secure) ──┐ │ │ ├──► Model B (vulnerable) ──┼──► Aggregator ──► Output │ │ └──► Model C (secure) ──┘ If Model B is compromised, the aggregated output may contain B's malicious contribution. Security = min(security of all models) Attack surface = union(attack surfaces of all models)
  1. Query → Each modelsame injection reaches all models
  2. Model outputs → Aggregatorpoisoned output enters aggregation
  3. Aggregator → Final outputaggregation may preserve poison

Threat Surface

ThreatVectorImpact
Weakest link exploitationTarget injection at most vulnerable modelCompromise propagates through aggregation
Aggregation manipulationCraft output that dominates aggregationOne model's output overweights others
Model-specific attacksDifferent injection for each model typeAttack multiple models simultaneously
Consistency attacksMake all models agree on malicious outputBypass voting or consensus defenses

The ZIVIS Position

  • Security is minimum, not maximum.The mixture is only as secure as its least secure component. Adding models adds attack surface, not defense.
  • Aggregation doesn't sanitize.Combining outputs doesn't remove injections. Poisoned content from one model can propagate through aggregation.
  • Prefer homogeneous security.If you must mix models, ensure all have similar security properties. A weak model negates the security of strong ones.
  • Validate aggregated output.The final output should go through security validation regardless of how many models contributed to it.

What We Tell Clients

Mixture of Agents optimizes for capability diversity, not security. Every model you add is another potential entry point for attacks.

If security matters, use the most secure single model you have. If you must mix, ensure all participating models have equivalent security properties, and validate the aggregated output as rigorously as any single-model output.

Related Patterns