Prompts About Prompts Are Still Prompts
Why using models to generate or optimize prompts inherits all prompt vulnerabilities
The Conventional Framing
Meta-prompting uses models to generate, refine, or optimize prompts. The model writes prompts for itself or other models, enabling automated prompt engineering and optimization.
The pattern is powerful for discovering effective prompting strategies without manual iteration.
Why Meta-Level Doesn't Mean Meta-Safe
A model generating prompts is still processing input in context. If that context contains injection, the generated prompt may contain the injection. You've automated prompt construction—including automated injection propagation.
The generated prompt then runs against a model, carrying any injections that were embedded during generation. Two stages of vulnerability.
The instruction generation problem:
If an attacker can influence what prompts get generated, they control what instructions future model calls receive. Meta-prompting is instruction injection one level removed.
Architecture
Components:
- Meta-model— generates or optimizes prompts
- Prompt template— structure for generation
- Generated prompt— output becomes instruction
- Execution model— runs generated prompt
Trust Boundaries
- Input → Meta-model — injection influences generation
- Meta-model → Prompt — injection embedded in prompt
- Prompt → Execution — malicious prompt runs
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Prompt injection by proxy | Inject content that becomes part of generated prompt | Attacker-controlled instructions reach execution model |
| Optimization hijacking | Influence what the meta-model optimizes for | Prompts optimized for attacker goals |
| Template manipulation | Inject into prompt template structure | All generated prompts carry injection |
| Feedback loop exploitation | If meta-prompting uses execution feedback, poison the feedback | Meta-model learns to generate malicious prompts |
The ZIVIS Position
- •Abstraction doesn't provide isolation.Meta-prompting adds a layer but not a security boundary. Injection at any layer can propagate through all layers.
- •Generated prompts are untrusted.Treat model-generated prompts like user input. The model generated them from potentially compromised context.
- •Validate generated instructions.Before executing a generated prompt, validate it doesn't contain injected instructions or unexpected patterns.
- •Limit generation scope.Constrain what the meta-model can include in prompts. Allowlist acceptable instruction patterns.
What We Tell Clients
Meta-prompting automates prompt creation but also automates injection propagation. If adversarial input influences the meta-model, it can embed attacks in every generated prompt.
Don't trust generated prompts more than user input. Validate them before execution. Consider constraining what instructions the meta-model can generate.
Related Patterns
- Prompt Chaining— sequential prompts with similar issues
- Reflection— self-analysis with same context problems