Documentation Is Untrusted Input
Why agents learning new tool usage from docs inherit whatever's in those docs
The Conventional Framing
Tool Learning enables agents to learn how to use new tools from documentation. Instead of hardcoding tool knowledge, the agent reads API docs, examples, and guides to understand how to invoke unfamiliar tools.
The pattern is praised for extensibility—agents can adapt to new tools without retraining or code changes.
Why This Is Dangerous
The documentation the agent learns from is untrusted input. If docs are compromised, the agent learns compromised behaviors. Malicious usage patterns become part of the agent's understanding of how to use the tool.
Worse, learned behaviors persist. The agent doesn't just follow the bad docs once—it incorporates them into its model of how the tool works for all future uses.
Why this is particularly risky:
- Persistence. Bad learning persists across sessions if the agent maintains tool knowledge.
- Authority inheritance. The agent uses the tool with whatever authority it has—executing what the docs taught it.
- Third-party docs. Tool documentation often comes from third parties. You don't control what's in those docs.
Architecture
Components:
- Documentation source— API docs, examples, guides
- Learning process— agent reads and internalizes docs
- Tool knowledge store— learned usage patterns
- Tool invocation— agent uses tool based on learned knowledge
Trust Boundaries
- Docs → Learning — untrusted docs become learned patterns
- Learning → Knowledge — malicious patterns persist
- Knowledge → Invocation — learned behavior executed with authority
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Documentation poisoning | Malicious examples in tool docs | Agent learns to use tool maliciously |
| Persistent bad behavior | Learned patterns persist across sessions | One-time poisoning, ongoing exploitation |
| Supply chain attack | Compromise third-party documentation source | All agents learning from that source compromised |
| Capability manipulation | Docs describe capabilities tool doesn't have | Agent attempts unauthorized operations |
The ZIVIS Position
- •Docs are untrusted input.Treat documentation the same as any other external input. It can contain injections, malicious examples, or misleading information.
- •Prefer curated tool knowledge.Hardcoded, reviewed tool definitions are more secure than dynamic learning. Less flexible, more secure.
- •Validate learned behaviors.If you must do tool learning, validate the learned patterns before allowing execution. What did the agent learn, and is it safe?
- •Control documentation sources.If the agent learns from docs, you control which docs. Don't let it learn from arbitrary internet sources.
What We Tell Clients
Tool learning trades security for flexibility. Every documentation source is a potential injection vector, and learned behaviors persist.
For high-stakes tools, use hardcoded definitions. If you need dynamic learning, control the documentation sources strictly and validate learned behaviors before execution.
Related Patterns
- MCP— structured tool definitions instead of learned
- Continuous Fine-Tuning— same supply chain issues with training data
- Tool Allowlisting— restrict what tools can be learned/used