Data poisoning is an attack vector where an adversary introduces malicious data into a dataset used by an AI system. The goal is to corrupt the system's behavior in a targeted way: causing it to misclassify specific inputs, follow injected instructions, or develop blind spots to certain types of content.
Data poisoning attacks target two main surfaces:
Training data poisoning introduces malicious examples into the dataset used to train or fine-tune a model. If an attacker can contribute to a model's training data, they can influence the model's learned behavior. This is particularly relevant for models fine-tuned on user-generated content, open-source datasets, or scraped web data.
Retrieval data poisoning targets the knowledge bases used in RAG pipelines. An attacker who can modify documents in a vector database, wiki, or document store can inject content that the agent will retrieve and act on. This is the more practical attack vector for deployed agents, because modifying retrieval data is often easier than modifying training data.
The effects of data poisoning include:
Backdoor triggers. The model behaves normally on most inputs but produces specific harmful outputs when it encounters a particular trigger pattern planted during poisoning.
Degraded safety. Poisoned training data can weaken a model's safety training, making it more likely to comply with harmful requests.
Instruction injection. Poisoned retrieval data can contain hidden instructions that the agent follows when it retrieves the poisoned document.
Defending against data poisoning requires controls at multiple levels. For training data, provenance tracking and data quality auditing help detect contamination. For retrieval data, content scanning should inspect retrieved documents before they enter the agent's context. Access controls on knowledge bases limit who can modify retrieval data. Behavioral monitoring can detect sudden changes in agent behavior that might indicate successful poisoning.
Authensor's Aegis scanner inspects content flowing through the agent pipeline, providing a defense layer against poisoned retrieval data that contains prompt injection or other malicious patterns.
Explore more guides on AI agent safety, prompt injection, and building secure systems.
View All Guides