Data exfiltration happens when an AI agent sends sensitive data to an unauthorized destination. This can occur through prompt injection (an attacker redirects the agent) or through misconfiguration (the agent has access to tools that can send data externally). Prevention requires controlling what data the agent can access and where it can send it.
An agent can exfiltrate data through any tool that can reach an external destination, such as outbound HTTP requests or email sending.
The most effective defense is to not give the agent outbound communication tools it does not need:
```yaml
rules:
  # Block all outbound communication
  - tool: "http.request"
    action: block
    reason: "Outbound HTTP requests are not permitted"
  - tool: "email.send"
    action: block
    reason: "Email sending is not permitted"
```
If the agent does need outbound tools, restrict the destinations:
```yaml
rules:
  # First match wins: allow the internal API, block everything else
  - tool: "http.request"
    action: allow
    when:
      args.url:
        matches: "^https://api\\.yourcompany\\.com/"
  - tool: "http.request"
    action: block
    reason: "Only internal API calls are permitted"
```
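The first-match-wins evaluation behind rules like these can be sketched in plain TypeScript. This is an illustrative model only, not the actual policy engine; the `Rule` type and `evaluate` function are hypothetical names:

```typescript
// Illustrative sketch of first-match-wins URL policy evaluation.
// Rule shapes mirror the YAML above; names are assumptions, not a real API.
type Rule = { action: "allow" | "block"; pattern?: RegExp; reason?: string };

const rules: Rule[] = [
  { action: "allow", pattern: /^https:\/\/api\.yourcompany\.com\// },
  { action: "block", reason: "Only internal API calls are permitted" },
];

function evaluate(url: string): Rule {
  // Rules are checked top to bottom; the first matching rule decides.
  // A rule with no pattern matches every request.
  for (const rule of rules) {
    if (!rule.pattern || rule.pattern.test(url)) return rule;
  }
  return { action: "block", reason: "No rule matched" };
}
```

Ordering matters: the unconditional block rule must come last, so it only catches requests the allowlist did not match.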
Scan outbound data for sensitive content before it leaves the system:
```typescript
const guard = createGuard({
  policy,
  aegis: {
    enabled: true,
    detectors: ['pii', 'credentials'],
    scanOutbound: true,
  },
});
```
If the agent tries to send an API key or personal information through any tool, Aegis detects it and blocks the action.
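Conceptually, outbound scanning inspects a tool call's payload for sensitive patterns before the call executes. The following is a toy sketch of that idea; the regexes are deliberately simple illustrative stand-ins, far cruder than Aegis's real detectors:

```typescript
// Toy sketch of outbound content scanning: flag payloads that look like
// they contain credentials or PII before they leave the system.
// These patterns are illustrative only, not production detectors.
const detectors: Record<string, RegExp> = {
  credentials: /\b(?:sk|AKIA)[A-Za-z0-9_-]{16,}\b/, // API-key-like tokens
  pii: /\b\d{3}-\d{2}-\d{4}\b/,                     // US-SSN-like numbers
};

function scanOutbound(payload: string): string[] {
  // Returns the names of detectors that matched; a non-empty
  // result means the action should be blocked.
  return Object.entries(detectors)
    .filter(([, re]) => re.test(payload))
    .map(([name]) => name);
}
```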
As a defense-in-depth measure, apply network-level egress controls to the agent's runtime.
These controls operate below the application layer, catching exfiltration attempts that bypass application-level checks.
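For example, if the agent runs as a Kubernetes workload (an assumption about the deployment environment; the `app: agent` label and CIDR are placeholders), a deny-by-default egress NetworkPolicy can enforce this below the application layer:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress-lockdown
spec:
  podSelector:
    matchLabels:
      app: agent          # assumed label on the agent's pods
  policyTypes:
    - Egress
  egress:
    # Allow DNS resolution (all destinations, port 53 only)
    - ports:
        - protocol: UDP
          port: 53
    # Allow traffic to the internal API subnet (example CIDR)
    - to:
        - ipBlock:
            cidr: 10.0.0.0/16
```

Because the policy lists `Egress` in `policyTypes`, any traffic not matched by these rules is dropped, even if an application-level check is bypassed.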
Separate the agent's read and write capabilities:
An agent that can read customer records but cannot send emails or make HTTP requests has no exfiltration path, regardless of how it is manipulated.
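Using the same rule format as the policies above, a read-only profile might look like the following sketch (the `db.query` tool name is hypothetical):

```yaml
rules:
  # Read-only data access is permitted
  - tool: "db.query"
    action: allow
  # Every channel that could carry data out is blocked
  - tool: "http.request"
    action: block
    reason: "Outbound HTTP requests are not permitted"
  - tool: "email.send"
    action: block
    reason: "Email sending is not permitted"
```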
Sentinel can detect exfiltration patterns.