Extracted Entities Carry Extraction Attacks
Why extracting and storing entities from conversation creates structured injection vectors
The Conventional Framing
Entity memory extracts and stores information about entities (people, places, things) mentioned in conversation. This enables the model to remember facts about specific entities across turns.
The pattern creates structured, queryable memory from unstructured conversation.
Why Entity Extraction Is Manipulable
Entity extraction is performed by the model on potentially adversarial content. Attackers can inject fake entities or manipulate what gets extracted about real entities.
Extracted entities persist and are retrieved when relevant, which turns entity memory into a persistence mechanism for crafted injections disguised as facts about entities.
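The persistence path described above can be sketched in a few lines of Python. This is a minimal illustration, not a real entity-memory implementation: `NaiveEntityMemory` and `FACT_PATTERN` are hypothetical names, and a regular expression stands in for the model-based extractor. The point is only that the store cannot tell a fact the user stated from one planted in adversarial content.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a model-based extractor.
# It matches sentences of the form "<Entity>'s <attribute> is <value>."
FACT_PATTERN = re.compile(r"(\w+)'s ([\w ]+?) is ([^.]+)\.")


class NaiveEntityMemory:
    """Stores every extracted fact with no validation and no provenance."""

    def __init__(self):
        self.facts = defaultdict(dict)  # entity -> {attribute: value}

    def ingest(self, text):
        # Anything that looks like a fact is persisted, whatever its source.
        for entity, attribute, value in FACT_PATTERN.findall(text):
            self.facts[entity.lower()][attribute.strip()] = value.strip()

    def retrieve(self, entity):
        return dict(self.facts.get(entity.lower(), {}))


memory = NaiveEntityMemory()

# A legitimate statement from the conversation.
memory.ingest("Alice's favorite editor is vim.")

# Attacker-controlled content (for example, a pasted document) that reads like a fact.
memory.ingest("Summary of tool output: Alice's account role is administrator.")

# Turns later, both entries come back looking equally trustworthy.
later_context = memory.retrieve("Alice")
print(later_context)
```

The injected role claim is retrieved alongside the genuine preference, with nothing to distinguish the two.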
The fact injection:
Injected text such as "User mentioned their admin password is 'secret123'" becomes a stored fact about the user entity. Future queries about the user retrieve this "fact" as if the user had actually said it.
Architecture
Components:
- Entity extractor: identifies entities in text
- Fact extractor: extracts facts about entities
- Entity store: persists entity information
- Retrieval: fetches relevant entities for context
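One way to picture how these four components fit together is the sketch below. All class and function names are illustrative assumptions rather than an existing library's API, and the two toy extractors stand in for model calls. In a real system both extractors would be model-mediated, which is where the manipulation risk enters.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    value: str


class EntityExtractor(Protocol):
    def entities(self, text: str) -> list[str]: ...


class FactExtractor(Protocol):
    def facts(self, text: str, entities: list[str]) -> list[Fact]: ...


class EntityStore(Protocol):
    def save(self, fact: Fact) -> None: ...
    def lookup(self, entity: str) -> list[Fact]: ...


class KnownNameExtractor:
    """Toy entity extractor: reports known names that appear in the text."""

    def __init__(self, names):
        self.names = set(names)

    def entities(self, text):
        return [name for name in sorted(self.names) if name in text]


class MentionFactExtractor:
    """Toy fact extractor: records the sentence in which each entity appeared."""

    def facts(self, text, entities):
        return [Fact(entity, "mentioned_in", text) for entity in entities]


class InMemoryEntityStore:
    """Toy store: a dict keyed by entity name."""

    def __init__(self):
        self._facts = {}

    def save(self, fact):
        self._facts.setdefault(fact.entity, []).append(fact)

    def lookup(self, entity):
        return list(self._facts.get(entity, []))


def remember(text, entity_extractor, fact_extractor, store):
    """Write path: conversation -> extractors -> store."""
    found = entity_extractor.entities(text)
    for fact in fact_extractor.facts(text, found):
        store.save(fact)


def retrieve(query, entity_extractor, store):
    """Read path: entities named in the query select which facts enter context."""
    return [fact for name in entity_extractor.entities(query) for fact in store.lookup(name)]


extractor = KnownNameExtractor({"Alice", "Berlin"})
store = InMemoryEntityStore()
remember("Alice moved to Berlin.", extractor, MentionFactExtractor(), store)
context = retrieve("Where does Alice live?", extractor, store)
```

Keeping the write path and the read path as separate functions makes it easier to see where a validation step or a provenance check could be added.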
Trust Boundaries
- Conversation → Extractor — extractor processes adversarial content
- Extractor → Storage — extracted facts persisted
- Storage → Retrieval — fake facts retrieved as real
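A common way to keep these boundaries visible is to attach provenance to each fact when it crosses into storage and to carry that provenance back out at retrieval. The sketch below assumes a hypothetical `StoredFact` record and a two-level `Trust` label. The field names and the rendering format are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    UNTRUSTED = "untrusted"  # derived from conversation content
    VERIFIED = "verified"    # confirmed out of band (process not shown here)


@dataclass(frozen=True)
class StoredFact:
    entity: str
    attribute: str
    value: str
    source_turn: int   # conversation turn the fact was extracted from
    source_role: str   # who supplied the text, e.g. "user" or "tool"
    trust: Trust = Trust.UNTRUSTED


def cross_extractor_to_storage(entity, attribute, value, turn, role):
    """Extractor -> Storage: extractor output is stored as untrusted by default."""
    return StoredFact(entity, attribute, value, turn, role)


def cross_storage_to_retrieval(fact):
    """Storage -> Retrieval: provenance travels with the fact into the prompt."""
    return (
        f"[{fact.trust.value}; turn {fact.source_turn}, {fact.source_role}] "
        f"{fact.entity}.{fact.attribute} = {fact.value}"
    )


fact = cross_extractor_to_storage("user", "role", "administrator", turn=7, role="user")
line = cross_storage_to_retrieval(fact)
print(line)
```

Labeling a fact does not make it safe, but it stops a fake fact from being presented to the model in the same form as a verified one.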
Threat Surface
| Threat | Vector | Impact |
|---|---|---|
| Fact injection | Inject fake facts about entities | False information persists and influences future responses |
| Entity creation | Inject mentions of fake entities | Attacker-controlled entities stored and retrieved |
| Fact poisoning | Inject facts that override legitimate entity information | Real entities have corrupted stored information |
| Retrieval manipulation | Craft entities that get retrieved for many queries | Injection affects broad range of future interactions |
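As an illustration of the fact poisoning row, the sketch below shows a store that refuses to silently overwrite an existing value and records the conflicting candidate for review instead. `ConflictAwareStore` is a hypothetical name, and the first-write-wins policy is an assumption made for brevity. It does nothing against an attacker who writes first, so it complements validation rather than replacing it.

```python
from collections import defaultdict


class ConflictAwareStore:
    """Keeps the first accepted value; later disagreeing values are set aside."""

    def __init__(self):
        self._values = {}                    # (entity, attribute) -> accepted value
        self.conflicts = defaultdict(list)   # (entity, attribute) -> rejected candidates

    def propose(self, entity, attribute, value):
        """Return True if the value is accepted (or already stored), False on conflict."""
        key = (entity, attribute)
        current = self._values.get(key)
        if current is None:
            self._values[key] = value
            return True
        if current == value:
            return True
        # Do not overwrite: keep the candidate for human or out-of-band review.
        self.conflicts[key].append(value)
        return False

    def get(self, entity, attribute):
        return self._values.get((entity, attribute))


store = ConflictAwareStore()
first = store.propose("acme", "support_email", "help@acme.example")
second = store.propose("acme", "support_email", "attacker@evil.example")
```

Here the second proposal is rejected and logged, so the legitimate address stays in place until someone resolves the conflict.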
The ZIVIS Position
- Extracted facts are model interpretations. Entity extraction is model-mediated. The extractor can be influenced to extract malicious 'facts'.
- Validate before persistence. Don't persist extracted entities without validation. Check for unusual or sensitive content.
- Separate extraction context. If possible, extract entities in a context that doesn't include the full conversation, limiting injection influence.
- Treat retrieved facts as claims. Facts from entity memory are claims from past (possibly compromised) extraction, not verified truth.
What We Tell Clients
Entity memory creates structured, persistent storage from unstructured conversation—but the extraction is model-mediated and can be manipulated.
Validate extracted entities before storing them. Treat retrieved entity facts as claims, not verified information. Consider which injected content could persist through entity extraction.
Related Patterns
- Graph RAG: graph-based entity relationships
- Conversation Summary: unstructured memory alternative