Every AI agent that calls tools in production needs a safety layer. Without one, a prompt injection or hallucinated action can reach your filesystem, database, or payment API unchecked. This guide walks through adding guardrails to an existing agent.
Effective guardrails combine three layers that run on every tool call: a declarative policy that classifies each call, an enforcement point that applies the policy's decision, and a content scanner that catches threats the static rules miss.
Create a YAML policy file that declares which tools are allowed, which argument patterns are blocked, and which actions require human approval:
```yaml
rules:
  - tool: "shell.execute"
    action: block
    when:
      args.command:
        matches: "rm -rf|mkfs|dd if="
  - tool: "email.send"
    action: escalate
    reason: "Outbound email requires human approval"
  - tool: "database.query"
    action: allow
    when:
      args.query:
        startsWith: "SELECT"
```
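To make the rule semantics concrete, here is a minimal sketch of how an engine might evaluate rules like the ones above. This is an illustration only, not the SDK's actual implementation; the `Rule` shape and the default-deny fallback are assumptions.

```typescript
// Illustrative policy evaluator (assumption: not the @authensor/sdk internals).
type Decision = { action: 'allow' | 'block' | 'escalate'; reason?: string };

interface Rule {
  tool: string;
  action: Decision['action'];
  reason?: string;
  // Hypothetical condition shape mirroring the YAML `when` clause.
  when?: { arg: string; matches?: RegExp; startsWith?: string };
}

const rules: Rule[] = [
  { tool: 'shell.execute', action: 'block',
    when: { arg: 'command', matches: /rm -rf|mkfs|dd if=/ } },
  { tool: 'email.send', action: 'escalate',
    reason: 'Outbound email requires human approval' },
  { tool: 'database.query', action: 'allow',
    when: { arg: 'query', startsWith: 'SELECT' } },
];

function evaluate(tool: string, args: Record<string, string>): Decision {
  for (const rule of rules) {
    if (rule.tool !== tool) continue;
    const value = rule.when ? args[rule.when.arg] ?? '' : '';
    const matched =
      !rule.when ||
      ((rule.when.matches ? rule.when.matches.test(value) : true) &&
        (rule.when.startsWith ? value.startsWith(rule.when.startsWith) : true));
    if (matched) return { action: rule.action, reason: rule.reason };
  }
  // Assumption: default-deny. Anything without an explicit allow is blocked.
  return { action: 'block', reason: 'No matching rule' };
}
```

Note the default-deny fallback: a call that matches no rule is blocked rather than allowed, so forgetting to write a rule fails safe.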
Every tool invocation passes through the guard function. This is the enforcement point:
```typescript
import { guard } from '@authensor/sdk';

async function executeTool(name: string, args: Record<string, unknown>) {
  const decision = guard(name, args);
  if (decision.action === 'block') {
    // Return the reason as the tool result so the agent can recover.
    return { error: decision.reason };
  }
  if (decision.action === 'escalate') {
    // requestApproval is your application's human-in-the-loop hook.
    await requestApproval(decision);
    return { pending: true };
  }
  return await runTool(name, args);
}
```
Aegis, the content scanner, runs alongside policy evaluation. It detects threats that pattern-matching rules might miss:
```typescript
import { createAegis } from '@authensor/aegis';

const aegis = createAegis();
const scan = aegis.scan(userInput);
if (scan.threats.length > 0) {
  // Block or log the detected threats
}
```
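One way to act on scan results is to gate user-derived input before it ever reaches a tool call. The sketch below uses a stand-in scanner with the same shape as the snippet above so it can run end to end; the threat fields (`type`, `severity`) and the severity threshold are assumptions, not the Aegis schema.

```typescript
// Stand-in scanner and gating logic (assumptions: Threat shape, severity policy).
interface Threat { type: string; severity: 'low' | 'medium' | 'high' }
interface ScanResult { threats: Threat[] }

const scanner = {
  // Stand-in: flags text that tries to override prior instructions.
  scan(input: string): ScanResult {
    return /ignore (all )?previous instructions/i.test(input)
      ? { threats: [{ type: 'prompt-injection', severity: 'high' }] }
      : { threats: [] };
  },
};

function admitInput(userInput: string): { ok: boolean; reason?: string } {
  const scan = scanner.scan(userInput);
  const serious = scan.threats.filter((t) => t.severity !== 'low');
  if (serious.length > 0) {
    // Serious threats block the request; low-severity ones would only be logged.
    return { ok: false, reason: serious.map((t) => t.type).join(', ') };
  }
  return { ok: true };
}
```

Running both the scanner and the policy rules means a payload that slips past one layer still has to clear the other.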
After deploying guardrails, verify that receipts are being generated for every decision. Each receipt includes the tool name, arguments, policy decision, timestamp, and a hash linking it to the previous receipt. This chain makes retroactive tampering detectable.
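The hash chain described above can be sketched in a few lines. The field names and hashing scheme below are assumptions chosen to illustrate the idea, not the guardrail service's actual receipt schema; the point is that editing any receipt breaks either its own hash or the next receipt's link.

```typescript
// Minimal hash-chained receipt log (assumption: illustrative schema, SHA-256 over JSON).
import { createHash } from 'node:crypto';

interface Receipt {
  tool: string;
  args: Record<string, unknown>;
  decision: string;
  timestamp: string;
  prevHash: string; // hash of the previous receipt; '' for the first entry
  hash: string;     // hash over this receipt's fields, including prevHash
}

function hashReceipt(r: Omit<Receipt, 'hash'>): string {
  const payload = JSON.stringify([r.tool, r.args, r.decision, r.timestamp, r.prevHash]);
  return createHash('sha256').update(payload).digest('hex');
}

function appendReceipt(
  chain: Receipt[],
  entry: Omit<Receipt, 'hash' | 'prevHash'>,
): Receipt[] {
  const prevHash = chain.length ? chain[chain.length - 1].hash : '';
  const unhashed = { ...entry, prevHash };
  return [...chain, { ...unhashed, hash: hashReceipt(unhashed) }];
}

// Walk the chain: a tampered receipt fails its own hash or breaks the next link.
function verifyChain(chain: Receipt[]): boolean {
  let prevHash = '';
  for (const r of chain) {
    if (r.prevHash !== prevHash || r.hash !== hashReceipt(r)) return false;
    prevHash = r.hash;
  }
  return true;
}
```

A periodic `verifyChain` run over the receipt log is a cheap integrity check to add to your monitoring.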
With guardrails in place, your agent operates within defined boundaries. Blocked actions never execute. Escalated actions wait for human approval. Allowed actions proceed with a full audit record. The agent itself does not know the guardrails exist, so it cannot attempt to bypass them.