Corrective Retrieval Uses Compromised Context

Why evaluating retrieval quality and retrying doesn't help when evaluation is vulnerable

The Conventional Framing

CRAG (Corrective RAG) evaluates retrieval quality and takes corrective action. If retrieved content is low quality, it retries with different queries or sources, or falls back to generation without retrieval.

The pattern improves reliability by catching and correcting poor retrieval.

Why Correction Uses the Same Compromised Context

The model evaluating retrieval quality is in the same context as the compromised retrieval. Poisoned content can influence the evaluation, causing good content to be rejected or bad content to be accepted.

Correction also creates more opportunities for manipulation—each retry is another chance for poison to be retrieved.

Architecture

Components:

  • Initial retrievalfirst attempt at getting content
  • Quality evaluatorLLM judges retrieval quality
  • Corrective actionretry, reformulate, or fallback
  • Retry loopcontinues until quality acceptable

Trust Boundaries

Initial retrieval: [Poisoned content] Quality evaluation (in poisoned context): "Is this content relevant and high quality?" → Poison may influence evaluation Corrective actions: ├── Retry → More chances to hit poison ├── Reformulate → New query may be manipulated └── Fallback → Skip retrieval (maybe good?) The "correction" doesn't know what's wrong.
  1. Retrieval → Evaluationevaluating in poisoned context
  2. Evaluation → Correctioncorrection based on compromised judgment
  3. Retry → Retrievalmore attempts, more attack opportunities

Threat Surface

ThreatVectorImpact
Evaluation manipulationPoison influences quality judgmentBad content accepted, good content rejected
Retry exploitationEach retry is another poison opportunityEventually hit malicious content
Correction steeringManipulate what corrective action is takenCorrection leads to attacker-desired behavior

The ZIVIS Position

  • Quality evaluation is not security evaluation.CRAG checks if content is good enough to use. It doesn't check if content is safe to use. Different objectives.
  • Retry limits are security relevant.More retries mean more attack opportunities. Cap retries and consider whether persistent poor retrieval is itself suspicious.
  • Independent evaluation is hard.For the evaluator to catch injections, it needs context independent of the retrieval. That's architecturally difficult.

What We Tell Clients

CRAG improves retrieval reliability, not security. The quality evaluator operates in the same context as potentially poisoned retrieval—it can be manipulated.

Limit retries to bound attack opportunities. Don't rely on quality evaluation to catch injections. If you need security checks, implement them separately with different context.

Related Patterns

  • Self-RAGsimilar self-evaluation issues
  • Reflectionsame pattern of self-critique failing for security