Preventing data exfiltration by AI agents

Authensor

Data exfiltration happens when an AI agent sends sensitive data to an unauthorized destination. This can occur through prompt injection (an attacker redirects the agent) or through misconfiguration (the agent has access to tools that can send data externally). Prevention requires controlling what data the agent can access and where it can send it.

Exfiltration vectors

An agent can exfiltrate data through:

  • HTTP requests: Calling an API that sends data to an external server
  • Email: Sending data via an email tool
  • File writes: Writing data to a location accessible to an attacker
  • MCP tool responses: Returning sensitive data through a compromised tool
  • Search queries: Hiding data in search queries that are logged externally
  • DNS queries: Encoding data in DNS lookups (advanced)

Defense: Restrict outbound tools

The most effective defense is to withhold any outbound communication tool the agent does not need:

rules:
  # Block all outbound communication
  - tool: "http.request"
    action: block
    reason: "Outbound HTTP requests are not permitted"
  - tool: "email.send"
    action: block
    reason: "Email sending is not permitted"

If the agent does need outbound tools, restrict the destinations:

rules:
  # Allow requests to the internal API only
  - tool: "http.request"
    action: allow
    when:
      args.url:
        matches: "^https://api\\.yourcompany\\.com/"
  # Block every other HTTP request
  - tool: "http.request"
    action: block
    reason: "Only internal API calls are permitted"
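One way to read the two rules above is as an ordered list in which the first matching rule wins. The sketch below models that reading in TypeScript. It assumes first-match ordering and a default of block when nothing matches; the `Rule`, `ToolCall`, and `evaluate` names are hypothetical and are not the policy engine's actual API.

```typescript
// Hypothetical types for illustration only; not a real policy engine API.
type Action = "allow" | "block";

interface Rule {
  tool: string;
  action: Action;
  urlPattern?: RegExp; // optional condition on args.url
  reason?: string;
}

interface ToolCall {
  tool: string;
  args: { url?: string };
}

const rules: Rule[] = [
  // Allow requests whose URL starts with the internal API origin.
  { tool: "http.request", action: "allow", urlPattern: /^https:\/\/api\.yourcompany\.com\// },
  // Catch-all for every other http.request call.
  { tool: "http.request", action: "block", reason: "Only internal API calls are permitted" },
];

// Assumption: rules are checked in order, the first match decides,
// and a call that matches no rule is blocked.
function evaluate(call: ToolCall, ruleList: Rule[]): { action: Action; reason?: string } {
  for (const rule of ruleList) {
    if (rule.tool !== call.tool) continue;
    if (rule.urlPattern && !rule.urlPattern.test(call.args.url ?? "")) continue;
    return { action: rule.action, reason: rule.reason };
  }
  return { action: "block", reason: "No matching rule" };
}
```

Because the pattern is anchored and ends with a slash, a lookalike host such as `api.yourcompany.com.evil.example` does not match the allow rule and falls through to the block rule.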

Defense: Output scanning

Scan outbound data for sensitive content before it leaves the system:

const guard = createGuard({
  policy,
  aegis: {
    enabled: true,
    detectors: ['pii', 'credentials'],
    scanOutbound: true,
  }
});

If the agent tries to send an API key or personal information through any tool, Aegis detects it and blocks the action.
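To make the idea of outbound scanning concrete, here is a minimal sketch of a pattern-based scanner over tool arguments. It is not how Aegis is implemented; the `detectors` list, the `scanOutbound` function, and the specific patterns are illustrative assumptions.

```typescript
interface Finding {
  detector: string;
  match: string;
}

// Illustrative patterns only; a production scanner would need far more coverage.
const detectors: { name: string; pattern: RegExp }[] = [
  { name: "credentials", pattern: /\bAKIA[0-9A-Z]{16}\b/g }, // shape of an AWS access key ID
  { name: "credentials", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/g }, // PEM private key header
  { name: "pii", pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g }, // email address
  { name: "pii", pattern: /\b\d{3}-\d{2}-\d{4}\b/g }, // shape of a US Social Security number
];

// Serialize the tool arguments and collect every detector match.
function scanOutbound(args: unknown): Finding[] {
  const text = JSON.stringify(args) ?? "";
  const findings: Finding[] = [];
  for (const { name, pattern } of detectors) {
    const matches = text.match(pattern) ?? [];
    for (const match of matches) {
      findings.push({ detector: name, match });
    }
  }
  return findings;
}

// A caller would block the tool call whenever findings is non-empty.
```

Pattern matching of this kind misses data that has been encoded or split across calls, so it complements destination restrictions rather than replacing them.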

Defense: Network-level controls

As a defense-in-depth measure, apply network-level egress controls to the agent's runtime:

  • Firewall rules that only allow connections to approved destinations
  • DNS filtering that blocks unknown domains
  • An inspecting proxy that all outbound traffic must pass through

These controls operate below the application layer, catching exfiltration attempts that bypass application-level checks.

Defense: Read/write separation

Separate the agent's read and write capabilities:

  • One policy for tools that read data (permissive)
  • A stricter policy for tools that write or send data (restrictive)

An agent that can read customer records but has no tool that writes or sends data outside the system has no exfiltration path, regardless of how it is manipulated.
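The separation can be sketched as a capability label on each tool, with reads allowed by default and writes allowed only when explicitly approved. The registry, the `decide` function, and every tool name other than `http.request` and `email.send` are hypothetical and used only for illustration.

```typescript
type Capability = "read" | "write";

// Hypothetical tool registry: each tool is labeled by what it can do.
const toolCapabilities: Record<string, Capability> = {
  "db.query": "read",
  "file.read": "read",
  "http.request": "write",
  "email.send": "write",
  "file.write": "write",
};

interface Decision {
  allowed: boolean;
  reason: string;
}

// Permissive for reads, restrictive for writes, and deny anything unlabeled.
function decide(tool: string, approvedWrites: Set<string>): Decision {
  const capability = toolCapabilities[tool];
  if (capability === undefined) {
    return { allowed: false, reason: "Unknown tool" };
  }
  if (capability === "read") {
    return { allowed: true, reason: "Read tools are permitted" };
  }
  if (approvedWrites.has(tool)) {
    return { allowed: true, reason: "Write tool explicitly approved" };
  }
  return { allowed: false, reason: "Write tools require explicit approval" };
}
```

With an empty approval set, an agent under this sketch can query data but every write or send call is refused.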

Monitoring for exfiltration

Sentinel can detect exfiltration patterns:

  • Unusual increase in outbound tool calls
  • Large argument sizes (bulk data in tool parameters)
  • Tool calls to previously unused destinations
  • Rapid sequences of read operations followed by write or send operations
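As a rough illustration of three of these signals (large arguments, new destinations, and a read quickly followed by a write), the sketch below scans a log of tool calls. It is not Sentinel's implementation; the `ToolEvent` shape, the `detectSignals` function, and both thresholds are assumptions chosen for the example. Detecting an unusual increase in call volume needs a historical baseline and is left out.

```typescript
interface ToolEvent {
  tool: string;
  kind: "read" | "write";
  timestampMs: number;
  argBytes: number; // serialized size of the tool arguments
  destination?: string; // host or address for outbound calls
}

interface Alert {
  signal: string;
  event: ToolEvent;
}

// Illustrative thresholds; real values depend on the workload.
const MAX_ARG_BYTES = 10000;
const READ_THEN_WRITE_WINDOW_MS = 2000;

function detectSignals(events: ToolEvent[], knownDestinations: Set<string>): Alert[] {
  const alerts: Alert[] = [];
  const seen = new Set(knownDestinations); // copy so the caller's set is not mutated
  let lastReadMs: number | undefined;

  for (const event of events) {
    if (event.kind === "read") {
      lastReadMs = event.timestampMs;
      continue;
    }
    // From here on the event is a write or send.
    if (event.argBytes > MAX_ARG_BYTES) {
      alerts.push({ signal: "large-arguments", event });
    }
    if (event.destination !== undefined && !seen.has(event.destination)) {
      alerts.push({ signal: "new-destination", event });
      seen.add(event.destination);
    }
    if (lastReadMs !== undefined && event.timestampMs - lastReadMs <= READ_THEN_WRITE_WINDOW_MS) {
      alerts.push({ signal: "read-then-write", event });
    }
  }
  return alerts;
}
```

Each signal on its own is noisy, so a sketch like this is better suited to raising alerts for review than to blocking calls outright.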
