Jump to pattern

Documentation Is Untrusted Input

Why agents learning new tool usage from docs inherit whatever's in those docs

The Conventional Framing

Tool Learning enables agents to learn how to use new tools from documentation. Instead of hardcoding tool knowledge, the agent reads API docs, examples, and guides to understand how to invoke unfamiliar tools.

The pattern is praised for extensibility—agents can adapt to new tools without retraining or code changes.

Why This Is Dangerous

The documentation the agent learns from is untrusted input. If docs are compromised, the agent learns compromised behaviors. Malicious usage patterns become part of the agent's understanding of how to use the tool.

Worse, learned behaviors persist. The agent doesn't just follow the bad docs once—it incorporates them into its model of how the tool works for all future uses.

Why this is particularly risky:

Persistence. Bad learning persists across sessions if the agent maintains tool knowledge.
Authority inheritance. The agent uses the tool with whatever authority it has—executing what the docs taught it.
Third-party docs. Tool documentation often comes from third parties. You don't control what's in those docs.

Architecture

Components:

Documentation source— API docs, examples, guides
Learning process— agent reads and internalizes docs
Tool knowledge store— learned usage patterns
Tool invocation— agent uses tool based on learned knowledge

Trust Boundaries

Documentation Sources: ├── Official API docs (hopefully trustworthy) ├── Community examples (who wrote these?) ├── Stack Overflow answers (anonymous contributors) └── Third-party tutorials (unknown provenance) All of these feed into agent's understanding. Malicious examples become learned behavior. Agent executes what it learned.

Docs → Learning — untrusted docs become learned patterns
Learning → Knowledge — malicious patterns persist
Knowledge → Invocation — learned behavior executed with authority

Threat Surface

Threat	Vector	Impact
Documentation poisoning	Malicious examples in tool docs	Agent learns to use tool maliciously
Persistent bad behavior	Learned patterns persist across sessions	One-time poisoning, ongoing exploitation
Supply chain attack	Compromise third-party documentation source	All agents learning from that source compromised
Capability manipulation	Docs describe capabilities tool doesn't have	Agent attempts unauthorized operations

The ZIVIS Position

•
Docs are untrusted input.Treat documentation the same as any other external input. It can contain injections, malicious examples, or misleading information.
•
Prefer curated tool knowledge.Hardcoded, reviewed tool definitions are more secure than dynamic learning. Less flexible, more secure.
•
Validate learned behaviors.If you must do tool learning, validate the learned patterns before allowing execution. What did the agent learn, and is it safe?
•
Control documentation sources.If the agent learns from docs, you control which docs. Don't let it learn from arbitrary internet sources.

What We Tell Clients

Tool learning trades security for flexibility. Every documentation source is a potential injection vector, and learned behaviors persist.

For high-stakes tools, use hardcoded definitions. If you need dynamic learning, control the documentation sources strictly and validate learned behaviors before execution.

Related Patterns

MCP— structured tool definitions instead of learned
Continuous Fine-Tuning— same supply chain issues with training data
Tool Allowlisting— restrict what tools can be learned/used

Authoring the Agent Trust Protocol — the open standard for agentic trust attestation, currently under IETF review
Jim Goldman: Salesforce’s first VP of Global Security GRC, FBI Cybercrime Task Force, Purdue cyber forensics founder
Jake Miller: Co-Founder & CEO. 25 years engineering complex enterprise systems, now applied to AI offensive security
Proprietary ZIVIS platform: 120+ adversarial AI attack scenarios, continuous coverage across OWASP Web, API, LLM, and Agentic AI
Mesh Mesh: approved Salesforce sub-processor. Every review stage cleared.