
Canary Deployments for AI Safety Updates

Authensor

A canary deployment routes a small percentage of traffic to the new version of a service while the majority continues using the current version. If the canary exhibits problems, it is rolled back before most users are affected. This pattern is particularly valuable for safety infrastructure updates because a broken safety update can fail in either direction: it could block legitimate actions wholesale or let malicious ones through.

What to Canary

Any change to the safety stack is a candidate for canary deployment:

  • Policy engine updates
  • Aegis scanner rule changes
  • Sentinel monitoring threshold adjustments
  • SDK version updates
  • Control plane configuration changes

Setting Up the Canary

Deploy the new version alongside the current version and route a small share of traffic to the canary, typically 1 to 5 percent. The routing should be deterministic: the same agent session should consistently go to either the canary or the stable version, not switch between them mid-session.
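One common way to get deterministic routing is to hash a stable identifier into a fixed number of buckets. The sketch below assumes each request carries a session ID; the function name `routes_to_canary` and the `salt` parameter are illustrative, not part of any Authensor API.

```python
import hashlib


def routes_to_canary(session_id: str, canary_percent: float,
                     salt: str = "canary-v1") -> bool:
    """Return True if this session should be served by the canary.

    The decision depends only on (salt, session_id), so a given session
    always lands on the same side for a given rollout.
    """
    digest = hashlib.sha256(f"{salt}:{session_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto 10,000 buckets (0.01% steps).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < canary_percent * 100


if __name__ == "__main__":
    for sid in ("session-a", "session-b", "session-c"):
        side = "canary" if routes_to_canary(sid, 5) else "stable"
        print(sid, "->", side)
```

Changing the salt for each rollout reshuffles which sessions see the canary, so the same sessions are not always the first to be exposed to new versions.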

Metrics to Compare

Compare the canary against the stable version on key metrics:

  • Error rate: Are more actions failing?
  • Latency: Is policy evaluation slower?
  • Deny rate: Are more legitimate actions being blocked?
  • Allow rate: Are more risky actions being permitted?
  • Scanner performance: Is Aegis scanning taking longer or producing different results?
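To make the comparison concrete, here is a minimal sketch that turns per-action records into the rates listed above and subtracts stable from canary. The `ActionRecord` shape and the decision labels "allow", "deny", and "error" are assumptions for illustration; a real deployment would read these values from its own logs or metrics store.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ActionRecord:
    decision: str      # hypothetical labels: "allow", "deny", or "error"
    latency_ms: float  # policy evaluation time for this action


def summarize(records: List[ActionRecord]) -> Dict[str, float]:
    """Compute error/deny/allow rates and nearest-rank p99 latency."""
    n = len(records)
    if n == 0:
        raise ValueError("cannot summarize an empty sample")
    latencies = sorted(r.latency_ms for r in records)
    # Nearest-rank p99: ceil(0.99 * n), done in integer arithmetic.
    rank = (99 * n + 99) // 100
    return {
        "error_rate": sum(r.decision == "error" for r in records) / n,
        "deny_rate": sum(r.decision == "deny" for r in records) / n,
        "allow_rate": sum(r.decision == "allow" for r in records) / n,
        "latency_p99_ms": latencies[rank - 1],
    }


def deltas(canary: Dict[str, float], stable: Dict[str, float]) -> Dict[str, float]:
    """Canary minus stable for every metric; positive means the canary is higher."""
    return {name: canary[name] - stable[name] for name in stable}
```

Both versions should be summarized over the same time window so that the deltas reflect the change being tested rather than shifts in traffic.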

Automated Analysis

Define acceptance criteria before deployment, not after the results are in. If every canary metric stays within its bound relative to the stable version for the full observation period, promote the canary. If any metric exceeds its bound, roll back automatically.

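canary:
  traffic_percentage: 5            # share of traffic routed to the canary, in percent
  observation_period: "4h"         # how long to compare before deciding
  acceptance_criteria:             # each bound limits the canary-minus-stable difference
    error_rate_delta_max: 0.01
    latency_p99_delta_max: "50ms"
    deny_rate_delta_max: 0.02
  on_failure: "rollback"           # action taken when any bound is exceeded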
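The check that such a configuration implies can be sketched in a few lines. The example below mirrors the criteria above as a Python dict and assumes metrics arrive as plain dicts with the keys `error_rate`, `deny_rate`, and `latency_p99_ms`, which are illustrative names. It also makes one design assumption that the configuration does not state: the deny-rate bound is applied in both directions, since a sharp drop in denials may mean risky actions are getting through.

```python
from typing import Dict, List, Tuple

# Mirrors the acceptance_criteria section of the sample configuration.
CRITERIA = {
    "error_rate_delta_max": 0.01,
    "latency_p99_delta_max": "50ms",
    "deny_rate_delta_max": 0.02,
}


def parse_ms(value: str) -> float:
    """Parse a duration such as "50ms" into milliseconds."""
    if not value.endswith("ms"):
        raise ValueError(f"expected a value ending in 'ms', got {value!r}")
    return float(value[:-2])


def evaluate(canary: Dict[str, float], stable: Dict[str, float],
             criteria: Dict[str, object] = CRITERIA) -> Tuple[str, List[str]]:
    """Return ("promote", []) or ("rollback", [names of violated metrics])."""
    violations = []
    # Errors and latency: only an increase counts against the canary.
    if canary["error_rate"] - stable["error_rate"] > criteria["error_rate_delta_max"]:
        violations.append("error_rate")
    latency_delta = canary["latency_p99_ms"] - stable["latency_p99_ms"]
    if latency_delta > parse_ms(criteria["latency_p99_delta_max"]):
        violations.append("latency_p99")
    # Deny rate: assumption, checked in both directions (see lead-in).
    if abs(canary["deny_rate"] - stable["deny_rate"]) > criteria["deny_rate_delta_max"]:
        violations.append("deny_rate")
    return ("rollback" if violations else "promote", violations)
```

Returning the list of violated metrics alongside the decision makes an automatic rollback easier to diagnose afterwards.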

Blast Radius Management

Keep the canary percentage low enough that failures affect few users but high enough to generate statistically meaningful data. For high-traffic systems, 1% may be sufficient. For lower-traffic systems, 5 to 10% may be necessary.
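A rough way to size the canary is to estimate how many canary requests are needed to detect a given change in a rate, then check how long that takes at a given traffic share. The sketch below uses the textbook normal-approximation formula for comparing two proportions; the function names, the 5% significance level, and the 80% power are assumptions, not recommendations from this guide. It treats both arms as equal in size, which is conservative here because the stable arm is much larger.

```python
import math
from statistics import NormalDist


def required_samples(baseline_rate: float, min_detectable_delta: float,
                     alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate canary sample size to detect a shift in a rate."""
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_delta
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / min_detectable_delta ** 2)


def hours_to_collect(samples_needed: int, requests_per_hour: float,
                     canary_percent: float) -> float:
    """Hours until the canary has served samples_needed requests."""
    canary_per_hour = requests_per_hour * canary_percent / 100
    return samples_needed / canary_per_hour


if __name__ == "__main__":
    # Example: detect an error rate moving from 2% to 3%.
    n = required_samples(baseline_rate=0.02, min_detectable_delta=0.01)
    print("canary requests needed:", n)
    print("hours at 100k req/h, 1%:", hours_to_collect(n, 100_000, 1))
    print("hours at 10k req/h, 5%:", hours_to_collect(n, 10_000, 5))
```

If the estimated collection time exceeds the planned observation period, either the canary percentage or the observation period needs to grow.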

Promotion

When the canary passes all acceptance criteria, promote it by gradually increasing its traffic share: 5% to 25% to 50% to 100%. Continue monitoring at each stage.
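The staged rollout can be sketched as a loop that raises the traffic share one step at a time and stops at the first failed check. Both callbacks are hypothetical stand-ins: `set_traffic_percent` for whatever the load balancer or control plane exposes, and `stage_passes` for an acceptance check that runs after the stage's observation period.

```python
from typing import Callable, Sequence


def promote_in_stages(set_traffic_percent: Callable[[int], None],
                      stage_passes: Callable[[int], bool],
                      stages: Sequence[int] = (5, 25, 50, 100)) -> str:
    """Walk the canary through each traffic stage, rolling back on failure."""
    for percent in stages:
        set_traffic_percent(percent)
        # stage_passes is expected to block for the observation period
        # and then report whether the acceptance criteria held.
        if not stage_passes(percent):
            set_traffic_percent(0)  # send all traffic back to stable
            return f"rolled back at {percent}%"
    return "promoted"


if __name__ == "__main__":
    result = promote_in_stages(
        set_traffic_percent=lambda p: print(f"canary traffic -> {p}%"),
        stage_passes=lambda p: True,  # stand-in for a real metrics check
    )
    print(result)
```

Keeping the rollback inside the loop means a failure at any stage, including late ones, returns all traffic to the stable version without a separate code path.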

Canary deployments turn every safety update from a leap of faith into a controlled experiment.
