System Prompts Are Not Secret
Why treating system prompts as confidential creates false security assumptions
The Conventional Framing
Prompt leaking refers to extracting system prompts or hidden instructions through clever queries. Organizations often try to protect system prompts as proprietary or security-relevant information.
Preventing prompt leakage is treated as a security measure.
Why System Prompt Secrecy Is the Wrong Goal
If your security depends on the attacker not knowing your system prompt, you have security-through-obscurity. Assume attackers will extract it— they usually can—and design security that works anyway.
The focus on preventing prompt leaking often distracts from more important security measures. Prompt secrecy feels like security but isn't.
The false boundary:
System prompts aren't privileged—they're text that happens to be at the start of context. The model doesn't have a secure boundary between "system" and "user." Treating them as secret creates misplaced confidence.
Architecture
Components:
- System prompt— initial instructions in context
- Extraction queries— attempts to reveal prompt
- Prompt defenses— instructions not to reveal prompt
- Leak detection— identifying when prompt leaked
Trust Boundaries
- Prompt → Context — prompt is just first text in context
- Instruction → Compliance — 'don't reveal' is just another instruction
- Extraction → Leakage — creative queries bypass defenses
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Prompt extraction | Various techniques to get model to output prompt | System prompt revealed to attacker |
| Embedded secret exposure | Secrets in prompts are extractable | API keys, credentials, sensitive info leaked |
| Defense mapping | Extracted prompt reveals defense strategies | Attacker knows what to bypass |
| False security confidence | Believing prompt is secret when it isn't | Over-reliance on obscurity instead of real controls |
The ZIVIS Position
- •Don't put secrets in prompts.API keys, credentials, sensitive data in system prompts will eventually be extracted. Never put secrets there.
- •Assume prompts are public.Design your prompt as if it will be read by attackers. Because it probably will be.
- •Focus on real security measures.Prompt secrecy is not a security boundary. Spend effort on architectural controls, not hiding prompts.
- •Prompts can contain defense guidance.It's fine to have defensive instructions in prompts. Just don't rely on their secrecy—they should work even when known.
What We Tell Clients
System prompts are not secret and should not contain secrets. Assume attackers can and will extract them through various techniques.
Focus security efforts on controls that work even when your prompt is known. Never put API keys, credentials, or sensitive data in system prompts.
Related Patterns
- Prompt Hardening— making prompts resistant, not secret
- Canary Tokens— detecting leakage, not preventing it