An AI agent pipeline has multiple stages where safety checks can be placed. Each stage catches different threats. Placing checks at every stage creates defense in depth.
User Input → Input Processing → LLM Reasoning → Tool Call → Tool Execution → Response → User Output
Safety checks can be placed at four points:
**Stage 1: Input scanning.** Where: before the agent processes user input. What it catches: direct prompt injection, PII in user input, malicious payloads.
```typescript
const input = receiveUserInput();
const scan = aegis.scan(input);
if (scan.threats.length > 0) {
  return "I cannot process that input.";
}
agent.process(input);
```
Also scan retrieved documents (RAG) at this stage:
```typescript
const documents = await retrieveDocuments(query);
const safeDocuments = documents.filter(doc => {
  const scan = aegis.scan(doc.content);
  return scan.threats.length === 0;
});
```
**Stage 2: Pre-execution guard.** Where: after the LLM decides to call a tool, before the tool executes. What it catches: unauthorized actions, policy violations, tool misuse, budget overruns.
This is the primary enforcement point:
```typescript
const decision = guard(toolName, args);
if (decision.action !== 'allow') {
  // Do not execute the tool
}
```
Pre-execution is where the policy engine, rate limiting, budget controls, and approval workflows operate.
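As a concrete illustration, a minimal guard combining those concerns might look like the sketch below. The policy shape, rate-limit window, and budget tracking here are illustrative assumptions, not the actual `guard` implementation:

```typescript
// Illustrative pre-execution guard: allowlist + rate limit + budget cap.
type Decision = { action: 'allow' | 'deny' | 'require_approval'; reason?: string };

interface Policy {
  allowedTools: Set<string>;   // which tools the agent may call at all
  maxCallsPerMinute: number;   // rate limit across all tools
  maxSpendUsd: number;         // cumulative budget cap
}

class ToolGuard {
  private callTimes: number[] = [];
  private spentUsd = 0;

  constructor(private policy: Policy) {}

  guard(toolName: string, estimatedCostUsd = 0): Decision {
    // Policy check: is the tool on the allowlist?
    if (!this.policy.allowedTools.has(toolName)) {
      return { action: 'deny', reason: `tool ${toolName} not allowed` };
    }
    // Rate limit: count calls in the trailing 60 seconds.
    const now = Date.now();
    this.callTimes = this.callTimes.filter(t => now - t < 60_000);
    if (this.callTimes.length >= this.policy.maxCallsPerMinute) {
      return { action: 'deny', reason: 'rate limit exceeded' };
    }
    // Budget control: escalate to human approval once spend would exceed the cap.
    if (this.spentUsd + estimatedCostUsd > this.policy.maxSpendUsd) {
      return { action: 'require_approval', reason: 'budget exceeded' };
    }
    this.callTimes.push(now);
    this.spentUsd += estimatedCostUsd;
    return { action: 'allow' };
  }
}
```

Note that budget overruns escalate to approval rather than a hard deny, so a human can still authorize an expensive but legitimate action.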
**Stage 3: Post-execution scanning.** Where: after the tool executes, before the result is returned to the agent. What it catches: indirect prompt injection in tool responses, sensitive data in tool output.
```typescript
const result = await executeTool(toolName, args);
const scan = aegis.scan(JSON.stringify(result));
if (scan.threats.length > 0) {
  // Do not pass the response to the agent
  return { error: "Tool response contained unsafe content" };
}
return result;
```
This is critical for defending against indirect prompt injection. A tool response from a compromised server or a document with embedded instructions is caught here.
**Stage 4: Output filtering.** Where: before the agent's response reaches the user or an external system. What it catches: leaked credentials, PII in responses, harmful content.
```typescript
const response = await agent.generate();
const filtered = filterOutput(response);
sendToUser(filtered);
```
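A `filterOutput` implementation is not shown above; a minimal sketch, assuming simple regex-based redaction, might look like this. The patterns (an AWS-style access key, a US SSN format, inline API keys) are illustrative; a production filter would use a dedicated scanner:

```typescript
// Illustrative output filter: redact credential- and PII-shaped strings.
const REDACTION_PATTERNS: [RegExp, string][] = [
  [/AKIA[0-9A-Z]{16}/g, '[REDACTED_AWS_KEY]'],              // AWS access key ID shape
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[REDACTED_SSN]'],             // US SSN shape
  [/(?:api[_-]?key\s*[:=]\s*)\S+/gi, 'api_key=[REDACTED]'], // inline API keys
];

function filterOutput(response: string): string {
  let filtered = response;
  for (const [pattern, replacement] of REDACTION_PATTERNS) {
    filtered = filtered.replace(pattern, replacement);
  }
  return filtered;
}
```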
For a minimum viable safety deployment, start with the pre-execution guard, the primary enforcement point. For production, run all four stages.
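The four checkpoints can be wired together into a single guarded tool-call flow. In this sketch, `scan`, `guardTool`, `executeTool`, and `filterOutput` are passed in as dependencies standing in for the APIs shown above; their exact signatures here are assumptions:

```typescript
// Illustrative composition of all four safety checkpoints.
type Scan = { threats: string[] };

interface SafetyDeps {
  scan: (text: string) => Scan;
  guardTool: (tool: string, args: unknown) => { action: string };
  executeTool: (tool: string, args: unknown) => Promise<unknown>;
  filterOutput: (text: string) => string;
}

async function guardedToolCall(
  deps: SafetyDeps,
  userInput: string,
  tool: string,
  args: unknown,
): Promise<string> {
  // 1. Input scanning
  if (deps.scan(userInput).threats.length > 0) return 'I cannot process that input.';
  // 2. Pre-execution guard
  if (deps.guardTool(tool, args).action !== 'allow') return 'That action is not permitted.';
  // 3. Post-execution scan of the tool response
  const result = await deps.executeTool(tool, args);
  if (deps.scan(JSON.stringify(result)).threats.length > 0) {
    return 'Tool response contained unsafe content.';
  }
  // 4. Output filtering before anything reaches the user
  return deps.filterOutput(JSON.stringify(result));
}
```

Injecting the checks as dependencies keeps the control flow testable with stubs while each checkpoint evolves independently.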
Each stage adds latency:
| Stage | Typical latency | Impact |
|-------|-----------------|--------|
| Input scanning | <1ms | Negligible |
| Pre-execution | <1ms | Negligible |
| Post-execution | <1ms | Negligible |
| Output filtering | <1ms | Negligible |
Total overhead for all four stages: under 5ms. LLM inference takes 500ms to 5 seconds. The safety checks are invisible in the overall pipeline latency.
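Those latency claims are easy to verify in your own deployment. The helpers below are illustrative, not part of any library named above: one times a single stage, the other computes the fraction of end-to-end latency spent on safety checks (e.g. four 1ms checks against 500ms of inference is under 1%):

```typescript
// Time one safety stage (synchronous) and report its latency in milliseconds.
function timeStage<T>(name: string, fn: () => T): { name: string; result: T; ms: number } {
  const start = performance.now();
  const result = fn();
  return { name, result, ms: performance.now() - start };
}

// Fraction of total pipeline latency spent on safety checks.
function overheadFraction(stageMs: number[], inferenceMs: number): number {
  const totalChecks = stageMs.reduce((a, b) => a + b, 0);
  return totalChecks / (inferenceMs + totalChecks);
}
```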