
APIs Leak Model Behavior

Why API access enables stealing model capabilities through systematic querying

The Conventional Framing

Model extraction attacks use API access to reconstruct a model's capabilities: the attacker queries the API systematically and trains a smaller model to mimic the original. The attacker gains comparable capabilities without ever having direct access to the model itself.

Under this framing, extraction is primarily a model theft concern, not a direct security vulnerability.

Why Extraction Enables Further Attacks

An extracted model lets attackers study your model's behaviors offline, discover vulnerabilities, and develop exploits without alerting your monitoring systems.

Once attackers have a local copy (or approximation), they can probe it extensively to find jailbreaks, injection techniques, and weaknesses, then deploy those attacks against your production system.

The research lab:

Extraction creates an offline research environment. Attacks developed there can be deployed against the real system with high confidence of success.

Architecture

Components:

Query strategy: systematic API probing
Response collection: gathering model outputs
Distillation training: training a local model on the collected responses
Exploit development: using the extracted model for research

Trust Boundaries

Extraction process:
1. Send thousands of diverse queries
2. Collect all responses
3. Train local model on input-output pairs
4. Local model approximates API behavior

Now attacker has:
- Offline experimentation environment
- No rate limits
- No monitoring
- Unlimited exploit development time

Production attack: Use pre-tested exploits.

Queries → API: extraction queries look normal
Responses → Attacker: responses reveal behavior
Extracted → Research: offline vulnerability research
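
To make the four-step extraction process concrete, here is a toy sketch for understanding the threat model. Everything in it is an illustrative assumption: `target_api` is a hypothetical stand-in for a remote model (a hidden linear rule, with no network calls), and the surrogate is a simple perceptron rather than a distilled language model. It only shows that input-output pairs alone can be enough to approximate behavior.

```python
import random

def target_api(x):
    """Hypothetical stand-in for a remote model: a hidden rule the attacker cannot inspect."""
    hidden_w, hidden_b = (2.0, -1.0), 0.5
    return 1 if hidden_w[0] * x[0] + hidden_w[1] * x[1] + hidden_b > 0 else 0

def collect_pairs(n, rng):
    """Steps 1-2: send diverse queries and record every response."""
    queries = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(q, target_api(q)) for q in queries]

def train_surrogate(pairs, epochs=50, lr=0.1):
    """Step 3: fit a local perceptron on the collected input-output pairs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x0, x1), y in pairs:
            pred = 1 if w[0] * x0 + w[1] * x1 + b > 0 else 0
            err = y - pred  # 0 when the surrogate already agrees with the API
            w[0] += lr * err * x0
            w[1] += lr * err * x1
            b += lr * err
    return w, b

def agreement(w, b, rng, n=1000):
    """Step 4: fraction of fresh inputs where the local copy matches the API."""
    hits = 0
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        local = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        hits += local == target_api(x)
    return hits / n

rng = random.Random(0)
w, b = train_surrogate(collect_pairs(2000, rng))
score = agreement(w, b, rng)
print(f"surrogate agrees with the API on {score:.1%} of fresh queries")
```

Note that each individual query in `collect_pairs` is unremarkable on its own, which is why extraction traffic is hard to distinguish from normal use at the single-request level.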

Threat Surface

Threat	Vector	Impact
Capability theft	Extract and replicate model capabilities	IP theft, competitive disadvantage
Offline vulnerability research	Study extracted model for weaknesses	High-confidence exploits developed undetected
Defense probing	Test defenses on extracted model first	Attacker knows what bypasses defenses
Fingerprinting	Identify model characteristics for targeted attacks	Model-specific exploits crafted

The ZIVIS Position

• API access is information leakage. Every API response reveals something about model behavior. Sufficient responses enable extraction.
• Rate limiting slows but doesn't prevent extraction. A determined attacker with time can extract through rate limits. It's cost imposition, not prevention.
• Monitor for extraction patterns. Systematic querying patterns (broad topic coverage, unusual query diversity) may indicate extraction attempts.
• Consider watermarking. Detectable patterns in outputs can prove extraction occurred, enabling legal recourse if not prevention.
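
As one possible way to act on the monitoring point above, the sketch below flags clients whose traffic is both high-volume and unusually spread across topics. It assumes a hypothetical log of `(client_id, topic)` pairs, where the topic label comes from some upstream classifier that is not shown. The function names and the thresholds (`min_queries`, `min_entropy`) are illustrative and would need tuning against real traffic.

```python
import math
from collections import Counter, defaultdict

def normalized_entropy(counts, n_topics):
    """Shannon entropy of a client's topic mix, scaled to [0, 1]."""
    total = sum(counts.values())
    if total == 0 or n_topics < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values() if c)
    return h / math.log(n_topics)

def flag_extraction_suspects(log, n_topics, min_queries=100, min_entropy=0.9):
    """Return client IDs with high volume AND near-uniform topic coverage."""
    per_client = defaultdict(Counter)
    for client_id, topic in log:
        per_client[client_id][topic] += 1
    suspects = []
    for client_id, counts in per_client.items():
        volume = sum(counts.values())
        if volume >= min_queries and normalized_entropy(counts, n_topics) >= min_entropy:
            suspects.append(client_id)
    return sorted(suspects)

# Hypothetical log: one focused application and one client sweeping every topic.
topics = ["billing", "code", "legal", "medical", "travel"]
log = [("app-1", "billing")] * 300
log += [("scraper-7", t) for t in topics for _ in range(60)]

suspects = flag_extraction_suspects(log, n_topics=len(topics))
print(suspects)  # ['scraper-7']
```

A signal like this is a prompt for review, not proof: legitimate general-purpose clients can also show broad topic coverage, and an attacker can spread queries across accounts to stay under the volume threshold.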

What We Tell Clients

Model extraction enables offline vulnerability research against an approximation of your model. Attackers can develop exploits without triggering your monitoring.

Rate limiting increases extraction cost. Monitor for unusual query patterns. Accept that some extraction may occur and ensure your defenses work even against attackers with offline copies.
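
A rough back-of-the-envelope calculation shows why rate limiting is cost imposition rather than prevention. The figures below (one million queries, 60 requests per minute per account) are assumptions chosen purely for illustration, not measurements of any real extraction effort.

```python
def days_to_collect(target_queries, requests_per_minute, accounts=1):
    """Days needed to gather target_queries responses under a per-account rate limit."""
    per_day = requests_per_minute * 60 * 24 * accounts
    return target_queries / per_day

single = days_to_collect(1_000_000, requests_per_minute=60)
pooled = days_to_collect(1_000_000, requests_per_minute=60, accounts=10)
print(f"1 account:   {single:.1f} days")
print(f"10 accounts: {pooled:.1f} days")
```

Under these assumptions a single account needs about 11.6 days and ten accounts need just over one day. The limit raises the attacker's cost in time and accounts, but the total stays finite.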

Related Patterns

Rate Limiting: slowing extraction attempts
Audit Logging: detecting extraction patterns