APIs Leak Model Behavior

Why API access enables stealing model capabilities through systematic querying

The Conventional Framing

Model extraction attacks use API access to reconstruct model capabilities: the attacker trains a smaller model to mimic the original through systematic querying. The attacker gains model-like capabilities without access to the model itself.

In this framing, extraction is primarily a model theft concern, not a direct security vulnerability.

Why Extraction Enables Further Attacks

An extracted model lets attackers study your model's behaviors offline, discover vulnerabilities, and develop exploits without alerting your monitoring systems.

Once attackers have a local copy (or approximation), they can probe it extensively to find jailbreaks, injection techniques, and weaknesses, then deploy those attacks against your production system.

The research lab:

Extraction creates an offline research environment. Attacks developed there can be deployed against the real system with high confidence of success.

Architecture

Components:

  • Query strategy: systematic API probing
  • Response collection: gathering model outputs
  • Distillation training: training model on responses
  • Exploit development: using extracted model for research

Trust Boundaries

Extraction process:

  1. Send thousands of diverse queries
  2. Collect all responses
  3. Train local model on input-output pairs
  4. Local model approximates API behavior

Now attacker has:

  • Offline experimentation environment
  • No rate limits
  • No monitoring
  • Unlimited exploit development time

Production attack: Use pre-tested exploits.

  1. Queries → API: extraction queries look normal
  2. Responses → Attacker: responses reveal behavior
  3. Extracted → Research: offline vulnerability research
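
The extraction process above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `target_api` is a hypothetical stand-in for the remote model (a hidden linear rule on two numbers, not a real API), and the "local model" is a simple nearest-neighbour lookup over the collected pairs rather than a distilled neural network. It only demonstrates the core idea that enough input-output pairs let a local surrogate agree with the original on most inputs.

```python
import random

def target_api(x):
    """Hypothetical stand-in for the remote model: a hidden linear rule."""
    return 1 if 2.0 * x[0] - 1.0 * x[1] + 0.3 > 0 else 0

def collect_pairs(n_queries, rng):
    """Steps 1-2: send diverse queries and record every response."""
    pairs = []
    for _ in range(n_queries):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        pairs.append((x, target_api(x)))
    return pairs

def surrogate_predict(pairs, x):
    """Steps 3-4: a toy local model that answers with the label of the
    closest recorded query (1-nearest-neighbour)."""
    nearest = min(
        pairs,
        key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
    )
    return nearest[1]

def fidelity(pairs, n_probes, rng):
    """Fraction of fresh inputs on which the surrogate matches the target."""
    agree = 0
    for _ in range(n_probes):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        agree += surrogate_predict(pairs, x) == target_api(x)
    return agree / n_probes

rng = random.Random(0)  # fixed seed so the sketch is repeatable
pairs = collect_pairs(1000, rng)
score = fidelity(pairs, 500, rng)
print(f"surrogate agrees with target on {score:.0%} of fresh inputs")
```

Once `pairs` has been collected, every later call to `surrogate_predict` runs locally: no further queries reach the API, which is why the offline phase is invisible to rate limits and monitoring.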

Threat Surface

Threat | Vector | Impact
Capability theft | Extract and replicate model capabilities | IP theft, competitive disadvantage
Offline vulnerability research | Study extracted model for weaknesses | High-confidence exploits developed undetected
Defense probing | Test defenses on extracted model first | Attacker knows what bypasses defenses
Fingerprinting | Identify model characteristics for targeted attacks | Model-specific exploits crafted

The ZIVIS Position

  • API access is information leakage. Every API response reveals something about model behavior. Sufficient responses enable extraction.
  • Rate limiting slows but doesn't prevent extraction. A determined attacker with time can extract through rate limits. It's cost imposition, not prevention.
  • Monitor for extraction patterns. Systematic querying patterns (broad topic coverage, unusual query diversity) may indicate extraction attempts.
  • Consider watermarking. Detectable patterns in outputs can prove extraction occurred, enabling legal recourse if not prevention.
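
As one possible reading of "monitor for extraction patterns", the sketch below flags clients whose queries are both high in volume and unusually spread across topics. It is a minimal illustration, not a production detector: the `(client_id, topic)` log format, the assumption that an upstream classifier has already assigned a topic label to each query, and both thresholds are hypothetical choices made for this example.

```python
import math
from collections import Counter

def topic_entropy(topics):
    """Shannon entropy (in bits) of a client's topic distribution.
    Higher values mean queries are spread more evenly over more topics."""
    counts = Counter(topics)
    total = len(topics)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def flag_possible_extraction(query_log, min_queries=100, min_entropy_bits=3.0):
    """Return client IDs with many queries AND broad topic coverage.

    query_log: iterable of (client_id, topic) tuples (hypothetical format).
    Both thresholds are illustrative and would need tuning on real traffic.
    """
    by_client = {}
    for client_id, topic in query_log:
        by_client.setdefault(client_id, []).append(topic)

    flagged = []
    for client_id, topics in by_client.items():
        if len(topics) >= min_queries and topic_entropy(topics) >= min_entropy_bits:
            flagged.append(client_id)
    return sorted(flagged)

# Synthetic log: "a" sweeps 16 topics evenly, "b" stays on 2 topics,
# "c" is diverse but low volume.
log = [("a", f"topic{i % 16}") for i in range(160)]
log += [("b", f"topic{i % 2}") for i in range(200)]
log += [("c", f"topic{i}") for i in range(10)]
print(flag_possible_extraction(log))
```

A flag from a heuristic like this is a prompt for review rather than proof, since extraction queries are designed to look normal and legitimate heavy users can also cover many topics.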

What We Tell Clients

Model extraction enables offline vulnerability research against an approximation of your model. Attackers can develop exploits without triggering your monitoring.

Rate limiting increases extraction cost. Monitor for unusual query patterns. Accept that some extraction may occur and ensure your defenses work even against attackers with offline copies.
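
To make "rate limiting increases extraction cost" concrete, here is a back-of-the-envelope calculation. The query budget, the per-account limit, and the account counts are made-up illustrative numbers, not measurements of any real extraction attempt.

```python
def days_to_collect(total_queries, requests_per_minute, accounts=1):
    """Days needed to issue total_queries under a per-account rate limit,
    assuming every account queries continuously at the limit."""
    per_day = requests_per_minute * 60 * 24 * accounts
    return total_queries / per_day

# Illustrative numbers only: one million queries at 60 requests per minute.
single = days_to_collect(1_000_000, requests_per_minute=60)
spread = days_to_collect(1_000_000, requests_per_minute=60, accounts=10)
print(f"1 account: {single:.1f} days; 10 accounts: {spread:.1f} days")
```

Under these assumptions the limit stretches collection to about 11.6 days for one account, but spreading the same queries over ten accounts brings it back to about 1.2 days. The limit raises the price of extraction in time and accounts without making it impossible.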
