What is shadow policy evaluation?

Authensor

Shadow policy evaluation is a testing technique where you run a new policy alongside your active policy without enforcing the new one. Both policies evaluate every tool call, but only the active policy's decision takes effect. The shadow policy's decisions are logged for comparison.

Why shadow evaluation matters

Changing safety policies in production is risky. A rule that is too permissive could allow harmful actions. A rule that is too restrictive could break agent functionality. Shadow evaluation lets you test changes against real production traffic before they take effect.

How it works

  1. Your active policy evaluates every tool call and enforces the decision
  2. The shadow policy evaluates the same tool call and logs the decision
  3. Both decisions are recorded in the receipt
  4. You compare the two decisions over time

For example:

const guard = createGuard({
  policy: activePolicy,
  shadowPolicy: candidatePolicy,
});

const decision = guard('file.write', { path: '/tmp/output.csv' });
// decision.action is from the active policy (enforced)
// decision.shadow.action is from the candidate policy (logged only)

What to look for

After running shadow evaluation for a period, analyze the differences:

  • New blocks: Actions the shadow policy would block that the active policy allows. These might be security improvements.
  • New allows: Actions the shadow policy would allow that the active policy blocks. These might reduce false positives.
  • Changed escalations: Actions that change between escalate and allow/block.

If the shadow policy produces better results (fewer false positives, no new security gaps), promote it to active.
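The three categories above can be computed mechanically from the logged decisions. The sketch below assumes a simplified diff record of `{ tool, active, shadow }` pulled from receipts; the record shape and `categorizeDiffs` helper are illustrative, not part of any real API.

```typescript
// Sketch: bucketing shadow-diff records into the three categories
// described above. The DiffRecord shape is an assumption for
// illustration, not the exact receipt schema.
type Action = 'allow' | 'block' | 'escalate';

interface DiffRecord {
  tool: string;
  active: Action; // decision that was enforced
  shadow: Action; // decision that was only logged
}

function categorizeDiffs(records: DiffRecord[]) {
  const newBlocks: DiffRecord[] = [];          // shadow blocks what active allows
  const newAllows: DiffRecord[] = [];          // shadow allows what active blocks
  const changedEscalations: DiffRecord[] = []; // escalate involved on one side

  for (const r of records) {
    if (r.active === r.shadow) continue; // identical decisions: nothing to review
    if (r.active === 'allow' && r.shadow === 'block') newBlocks.push(r);
    else if (r.active === 'block' && r.shadow === 'allow') newAllows.push(r);
    else changedEscalations.push(r);
  }
  return { newBlocks, newAllows, changedEscalations };
}

const report = categorizeDiffs([
  { tool: 'file.write', active: 'allow', shadow: 'block' },
  { tool: 'shell.exec', active: 'escalate', shadow: 'allow' },
  { tool: 'http.get', active: 'allow', shadow: 'allow' },
]);
console.log(
  report.newBlocks.length,
  report.newAllows.length,
  report.changedEscalations.length,
); // → 1 0 1
```

Reviewing each bucket separately keeps the analysis focused: new blocks need a security judgment, new allows need a false-positive judgment, and changed escalations need a human-in-the-loop workload estimate.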

Typical workflow

1. Write new policy
2. Deploy as shadow (via the shadowPolicy option)
3. Run for 24-48 hours
4. Analyze differences
5. Adjust rules based on findings
6. Repeat steps 3-5 until satisfied
7. Promote to active
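The promotion in step 7 amounts to a config swap: the candidate becomes the enforced policy and the shadow slot is freed for the next candidate. The `GuardConfig` shape and `promote` helper below are hypothetical, standing in for however your deployment stores policy paths.

```typescript
// Illustrative sketch of step 7. GuardConfig and promote() are
// hypothetical names, not part of any real API.
interface GuardConfig {
  policy: string;        // path to the enforced policy
  shadowPolicy?: string; // path to the candidate under shadow evaluation
}

function promote(config: GuardConfig): GuardConfig {
  if (!config.shadowPolicy) {
    throw new Error('no shadow policy to promote');
  }
  // The candidate becomes the enforced policy; the shadow slot is
  // cleared so the next candidate can take its place.
  return { policy: config.shadowPolicy, shadowPolicy: undefined };
}

const promoted = promote({
  policy: './policies/active.yaml',
  shadowPolicy: './policies/candidate.yaml',
});
console.log(promoted.policy); // the former candidate path
```

Treating promotion as a pure function over the config, rather than editing policy files in place, makes the change easy to review and easy to roll back.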

Configuration

const guard = createGuard({
  policy: loadYaml('./policies/active.yaml'),
  shadowPolicy: loadYaml('./policies/candidate.yaml'),
  onShadowDiff: (active, shadow, envelope) => {
    metrics.increment('shadow.diff', {
      tool: envelope.tool,
      active: active.action,
      shadow: shadow.action,
    });
  }
});

When to use shadow evaluation

  • Before any policy change in production
  • When onboarding a new agent type
  • When adding new tools to an existing agent
  • When tightening policies after a security review
  • When migrating from one policy format to another

Shadow evaluation adds minimal overhead because the policy engine is synchronous and runs in microseconds. Evaluating two policies instead of one doubles the evaluation time, but microseconds doubled is still microseconds.
