guardrails · tutorial · agent-safety · best-practices

How to add safety guardrails to AI agents

Authensor

Every AI agent that calls tools in production needs a safety layer. Without one, a prompt injection or hallucinated action can reach your filesystem, database, or payment API unchecked. This guide walks through adding guardrails to an existing agent.

The three-layer approach

Effective guardrails have three layers that run on every tool call:

  1. Policy evaluation - A deterministic rules engine checks the action against your YAML policy before execution.
  2. Content scanning - The tool arguments are scanned for prompt injection patterns, credential leaks, and PII.
  3. Audit logging - Every decision (allow, block, escalate) is recorded in a hash-chained receipt.
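
Conceptually, the three layers compose into a single checkpoint that every tool call passes through. The sketch below is illustrative only: the Decision shape and the three stubbed helpers are assumptions made for this example, not the SDK's actual types.

// Illustrative sketch of the three-layer checkpoint. The Decision shape
// and helper names are assumptions for this example, not SDK types.
type Decision = { action: 'allow' | 'block' | 'escalate'; reason?: string };

function evaluatePolicy(tool: string, args: Record<string, unknown>): Decision {
  // Layer 1: deterministic rules engine (stubbed here).
  return { action: 'allow' };
}

function scanContent(args: Record<string, unknown>): string[] {
  // Layer 2: content scanner (stubbed); returns detected threat labels.
  return [];
}

function appendReceipt(entry: object): void {
  // Layer 3: audit log (stubbed); real receipts are hash-chained.
  console.log(JSON.stringify(entry));
}

function guardedCall(tool: string, args: Record<string, unknown>): Decision {
  const decision = evaluatePolicy(tool, args);
  const threats = scanContent(args);
  appendReceipt({ tool, args, decision, threats });
  return threats.length > 0
    ? { action: 'block', reason: 'content threat detected' }
    : decision;
}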

Step 1: Define your policy

Create a YAML policy file that declares what tools are allowed, what patterns are blocked, and what requires human approval:

rules:
  - tool: "shell.execute"
    action: block
    when:
      args.command:
        matches: "rm -rf|mkfs|dd if="
  - tool: "email.send"
    action: escalate
    reason: "Outbound email requires human approval"
  - tool: "database.query"
    action: allow
    when:
      args.query:
        startsWith: "SELECT"
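
The when clauses are predicates over the tool arguments. As a rough reading (exact semantics depend on the rules engine), matches behaves like a regular-expression test and startsWith like a literal prefix check:

// Approximate predicate semantics: a sketch, not the engine's source.
const blockedCommand = /rm -rf|mkfs|dd if=/; // the "matches" pattern above

function violatesShellRule(command: string): boolean {
  return blockedCommand.test(command); // regex alternation match
}

function isAllowedQuery(query: string): boolean {
  return query.startsWith('SELECT'); // literal prefix check
}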

Step 2: Wrap tool calls with guard()

Every tool invocation passes through the guard function. This is the enforcement point:

import { guard } from '@authensor/sdk';

async function executeTool(name: string, args: Record<string, unknown>) {
  // Enforcement point: no tool runs without a policy decision.
  const decision = guard(name, args);

  if (decision.action === 'block') {
    // Blocked actions never execute; surface the policy reason instead.
    return { error: decision.reason };
  }

  if (decision.action === 'escalate') {
    // Escalated actions pause until a human approves.
    await requestApproval(decision);
    return { pending: true };
  }

  // Allowed actions proceed, with the decision recorded in the audit trail.
  return await runTool(name, args);
}
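
requestApproval and runTool are application-specific and left undefined above. Hypothetical placeholders, just to make the snippet self-contained, might look like this:

// Hypothetical placeholder implementations; your app supplies the real ones.
async function requestApproval(decision: { reason?: string }): Promise<void> {
  // e.g. open a ticket, post to a review queue, or page an operator
  console.log(`Approval requested: ${decision.reason ?? 'no reason given'}`);
}

async function runTool(name: string, args: Record<string, unknown>): Promise<unknown> {
  // Dispatch to your actual tool implementations.
  throw new Error(`No tool registered for ${name}`);
}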

Step 3: Enable content scanning

Aegis, the content scanner, runs alongside policy evaluation. It detects threats that pattern-matching rules might miss:

import { createAegis } from '@authensor/aegis';

const aegis = createAegis();

// userInput is any untrusted text entering the agent: user messages,
// retrieved documents, or tool arguments.
const scan = aegis.scan(userInput);

if (scan.threats.length > 0) {
  // Block the call, or log the detected threats for review.
}
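
One way to wire the scanner into the Step 2 enforcement point; this sketch assumes scan() accepts a string and returns a threats array, as in the snippet above:

// Sketch: scan tool arguments before the guard() check from Step 2.
// Assumes scan() accepts a string and returns { threats: unknown[] }.
async function executeToolSafely(name: string, args: Record<string, unknown>) {
  const scan = aegis.scan(JSON.stringify(args));
  if (scan.threats.length > 0) {
    return { error: `blocked: ${scan.threats.length} threat(s) detected` };
  }
  return executeTool(name, args); // falls through to the policy layer
}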

Step 4: Verify your audit trail

After deploying guardrails, verify that receipts are being generated for every decision. Each receipt includes the tool name, arguments, policy decision, timestamp, and a hash linking it to the previous receipt. This chain makes retroactive tampering detectable.
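
A chain check is straightforward to sketch. The receipt schema below is an assumption for illustration, not Authensor's actual format, but it shows why tampering is detectable: rewriting any receipt changes its hash, which breaks the prevHash link stored in its successor.

import { createHash } from 'node:crypto';

// Illustrative receipt shape; field names are assumptions, not the
// SDK's actual schema.
interface Receipt {
  tool: string;
  args: Record<string, unknown>;
  decision: 'allow' | 'block' | 'escalate';
  timestamp: string;
  prevHash: string; // hash of the previous receipt (fixed value for the first)
  hash: string;     // hash over this receipt's contents plus prevHash
}

function receiptHash(r: Omit<Receipt, 'hash'>): string {
  const payload = JSON.stringify([r.tool, r.args, r.decision, r.timestamp, r.prevHash]);
  return createHash('sha256').update(payload).digest('hex');
}

// Any edited, inserted, or deleted receipt breaks a link somewhere.
function verifyChain(receipts: Receipt[]): boolean {
  return receipts.every((r, i) => {
    const linked = i === 0 || r.prevHash === receipts[i - 1].hash;
    return linked && r.hash === receiptHash(r);
  });
}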

What to expect

With guardrails in place, your agent operates within defined boundaries. Blocked actions never execute. Escalated actions wait for human approval. Allowed actions proceed with a full audit record. The agent itself does not know the guardrails exist, so it cannot attempt to bypass them.

Keep learning

Explore more guides on AI agent safety, prompt injection, and building secure systems.
