Prompt injection is the primary attack vector against AI agents in production. Defending against it requires multiple layers because no single technique catches everything. This guide covers a production-grade defense strategy.
Scan all text before the agent processes it. This includes user messages, tool responses, and retrieved documents.
const guard = createGuard({
policy,
aegis: {
enabled: true,
threshold: 0.7,
detectors: ['prompt_injection'],
scanResponses: true, // Also scan tool responses
}
});
Aegis catches known injection patterns: instruction overrides, role impersonation, delimiter attacks, and encoding tricks. Set the threshold based on your tolerance for false positives.
Even if an injection succeeds and changes the agent's goal, the policy engine blocks unauthorized actions. The agent might want to exfiltrate data, but if the policy blocks outbound API calls, the attack fails.
rules:
# Only allow tools the agent actually needs
- tool: "search.web"
action: allow
- tool: "file.read"
action: allow
when:
args.path:
startsWith: "/data/public/"
# Block everything else
- tool: "*"
action: block
reason: "Tool not in allowlist"
Least-privilege policies limit the blast radius of a successful injection.
Scan the agent's output before it reaches the user or external systems. A successful injection might cause the agent to include sensitive data in its response.
const output = await agent.generate(input);
const scan = aegis.scan(output);
if (scan.threats.some(t => t.type === 'pii' || t.type === 'credentials')) {
return "I cannot provide that information.";
}
Track the agent's behavior pattern. A successful injection often causes a detectable change: different tools being called, higher denial rate, unusual argument patterns.
Sentinel detects these shifts in real time. When an anomaly is detected, the system can tighten policies or terminate the session.
Reduce the attack surface structurally:
Regularly test with known injection techniques:
Update your Aegis patterns as new techniques emerge. Run red team exercises quarterly.
Explore more guides on AI agent safety, prompt injection, and building secure systems.
View All Guides