Documentation Is Untrusted Input

Why agents learning new tool usage from docs inherit whatever's in those docs

The Conventional Framing

Tool Learning enables agents to learn how to use new tools from documentation. Instead of hardcoding tool knowledge, the agent reads API docs, examples, and guides to understand how to invoke unfamiliar tools.

The pattern is praised for extensibility—agents can adapt to new tools without retraining or code changes.

Why This Is Dangerous

The documentation the agent learns from is untrusted input. If docs are compromised, the agent learns compromised behaviors. Malicious usage patterns become part of the agent's understanding of how to use the tool.

Worse, learned behaviors persist. The agent doesn't just follow the bad docs once—it incorporates them into its model of how the tool works for all future uses.

Why this is particularly risky:

  • Persistence. Bad learning persists across sessions if the agent maintains tool knowledge.
  • Authority inheritance. The agent uses the tool with whatever authority it has—executing what the docs taught it.
  • Third-party docs. Tool documentation often comes from third parties. You don't control what's in those docs.

Architecture

Components:

  • Documentation sourceAPI docs, examples, guides
  • Learning processagent reads and internalizes docs
  • Tool knowledge storelearned usage patterns
  • Tool invocationagent uses tool based on learned knowledge

Trust Boundaries

Documentation Sources: ├── Official API docs (hopefully trustworthy) ├── Community examples (who wrote these?) ├── Stack Overflow answers (anonymous contributors) └── Third-party tutorials (unknown provenance) All of these feed into agent's understanding. Malicious examples become learned behavior. Agent executes what it learned.
  1. Docs → Learninguntrusted docs become learned patterns
  2. Learning → Knowledgemalicious patterns persist
  3. Knowledge → Invocationlearned behavior executed with authority

Threat Surface

ThreatVectorImpact
Documentation poisoningMalicious examples in tool docsAgent learns to use tool maliciously
Persistent bad behaviorLearned patterns persist across sessionsOne-time poisoning, ongoing exploitation
Supply chain attackCompromise third-party documentation sourceAll agents learning from that source compromised
Capability manipulationDocs describe capabilities tool doesn't haveAgent attempts unauthorized operations

The ZIVIS Position

  • Docs are untrusted input.Treat documentation the same as any other external input. It can contain injections, malicious examples, or misleading information.
  • Prefer curated tool knowledge.Hardcoded, reviewed tool definitions are more secure than dynamic learning. Less flexible, more secure.
  • Validate learned behaviors.If you must do tool learning, validate the learned patterns before allowing execution. What did the agent learn, and is it safe?
  • Control documentation sources.If the agent learns from docs, you control which docs. Don't let it learn from arbitrary internet sources.

What We Tell Clients

Tool learning trades security for flexibility. Every documentation source is a potential injection vector, and learned behaviors persist.

For high-stakes tools, use hardcoded definitions. If you need dynamic learning, control the documentation sources strictly and validate learned behaviors before execution.

Related Patterns