
Regression Testing for AI Safety Rules

Authensor

A safety regression occurs when a change to the system weakens an existing protection. A policy update might accidentally remove a deny rule. A code change might alter how the engine handles certain conditions. Regression testing catches these failures before they reach production.

Building a Regression Suite

The regression suite is a collection of test cases that represent important safety behaviors. Each test case documents a specific threat or requirement and verifies that the system handles it correctly.

Sources for regression test cases:

Past incidents: Every safety incident should produce at least one regression test that verifies the fix. If an agent once exfiltrated data through a specific action pattern, add a test that verifies that pattern is now blocked.

Compliance requirements: Each compliance requirement should have at least one test. If the EU AI Act requires human oversight for high-risk decisions, add a test that verifies the approval workflow triggers for those decisions.

Red team findings: Each vulnerability discovered during red team exercises should produce a test. This ensures that known attack vectors remain defended.

Test Format

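Each regression test should include a unique ID, a description of the behavior being protected, the source that motivated the test (an incident, a compliance requirement, or a red team finding), the action envelope to evaluate, and the expected decision: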

regression_tests:
  - id: "REG-001"
    description: "Block data export to external destinations"
    source: "INC-2025-0042"
    envelope:
      action: "data.export"
      resource: "/data/customers/*"
      destination: "external-api.example.com"
    expected_decision: "deny"

  - id: "REG-002"
    description: "Require approval for high-risk financial actions"
    source: "EU-AI-ACT-ART-14"
    envelope:
      action: "payment.send"
      amount: 5000
    expected_decision: "require_approval"
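
A minimal sketch of how a runner might consume tests in this format. The `evaluate` function is a hypothetical stand-in for the policy engine, not a real Authensor API, and its rules (including the 1000 approval threshold and the `.internal` suffix check) are assumptions made for illustration. The test cases are written as Python objects mirroring the YAML above so the example needs no YAML parser.

```python
from dataclasses import dataclass


@dataclass
class RegressionTest:
    id: str
    description: str
    source: str
    envelope: dict
    expected_decision: str


def evaluate(envelope: dict) -> str:
    """Toy stand-in for a policy engine (assumed rules, illustration only)."""
    action = envelope.get("action")
    if (
        action == "data.export"
        and envelope.get("resource", "").startswith("/data/customers/")
        and not envelope.get("destination", "").endswith(".internal")
    ):
        return "deny"
    # The 1000 threshold is an assumption, not taken from any real policy.
    if action == "payment.send" and envelope.get("amount", 0) >= 1000:
        return "require_approval"
    return "allow"


def run_suite(tests, evaluate_fn):
    """Return (test id, expected, actual) for every test that fails."""
    failures = []
    for test in tests:
        actual = evaluate_fn(test.envelope)
        if actual != test.expected_decision:
            failures.append((test.id, test.expected_decision, actual))
    return failures


TESTS = [
    RegressionTest(
        id="REG-001",
        description="Block data export to external destinations",
        source="INC-2025-0042",
        envelope={
            "action": "data.export",
            "resource": "/data/customers/*",
            "destination": "external-api.example.com",
        },
        expected_decision="deny",
    ),
    RegressionTest(
        id="REG-002",
        description="Require approval for high-risk financial actions",
        source="EU-AI-ACT-ART-14",
        envelope={"action": "payment.send", "amount": 5000},
        expected_decision="require_approval",
    ),
]

print(run_suite(TESTS, evaluate))  # prints []
```

Passing the evaluator in as a parameter keeps the runner independent of any particular engine, so the same suite can be pointed at a candidate policy build before it ships.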

Running Regression Tests

Run the full regression suite in CI on every change to policies, the policy engine, safety scanners, or the control plane. A failing regression test should block deployment.
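
As a rough sketch, the CI gate can be reduced to two small decisions: does this change touch a safety-relevant path, and did any regression test fail. The directory patterns below are a hypothetical repository layout, and the failure tuples are assumed to have the shape `(test id, expected, actual)`; neither comes from a real tool.

```python
from fnmatch import fnmatch

# Hypothetical repository layout; adjust the patterns to the real paths.
SAFETY_PATHS = ["policies/*", "engine/*", "scanners/*", "control-plane/*"]


def touches_safety_surface(changed_files):
    """True if any changed file falls under a safety-relevant path."""
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in SAFETY_PATHS
    )


def ci_exit_code(failures):
    """Report each failure and return a process exit code for CI."""
    for test_id, expected, actual in failures:
        print(f"FAIL {test_id}: expected {expected}, got {actual}")
    return 1 if failures else 0


# In a CI entry point, the result would typically be passed to
# raise SystemExit(...), so that a non-zero code blocks the deployment.
```

Most CI systems treat a non-zero exit code as a failed step, which is what turns a failing regression test into a blocked deployment.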

Maintaining the Suite

The regression suite grows over time. Review it periodically to remove tests for retired features and update tests for changed requirements. A large, well-maintained regression suite is a valuable asset. A large, unmaintained suite is a burden.
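
One way to support the periodic review, sketched under the assumption that the system can list its currently supported actions: flag tests whose envelope refers to an action that no longer exists. The `active_actions` set and the `legacy.sync` action are hypothetical.

```python
def stale_tests(tests, active_actions):
    """Return ids of tests whose envelope action is no longer supported."""
    return [
        test["id"]
        for test in tests
        if test["envelope"]["action"] not in active_actions
    ]


tests = [
    {"id": "REG-001", "envelope": {"action": "data.export"}},
    {"id": "REG-002", "envelope": {"action": "payment.send"}},
    # Hypothetical test for a feature that has since been retired.
    {"id": "REG-003", "envelope": {"action": "legacy.sync"}},
]

print(stale_tests(tests, {"data.export", "payment.send"}))  # prints ['REG-003']
```

A flagged test is a candidate for review, not automatic deletion: the action may have been renamed, in which case the test should be updated instead of removed.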

Coverage Gaps

Compare the regression suite against the current threat model. Are there known threats without corresponding tests? Are there compliance requirements without corresponding tests? Fill gaps proactively.
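
The comparison can be sketched as a set difference between the identifiers in the threat model (or compliance register) and the `source` fields of the existing tests. This assumes every threat and requirement has a stable identifier; `RT-2025-007` is a hypothetical red team finding added for illustration.

```python
def coverage_gaps(required_sources, tests):
    """Return required source ids that no regression test references."""
    covered = {test["source"] for test in tests}
    return sorted(set(required_sources) - covered)


tests = [
    {"id": "REG-001", "source": "INC-2025-0042"},
    {"id": "REG-002", "source": "EU-AI-ACT-ART-14"},
]

# Identifiers the suite is expected to cover; the last one is hypothetical.
required = ["INC-2025-0042", "EU-AI-ACT-ART-14", "RT-2025-007"]

print(coverage_gaps(required, tests))  # prints ['RT-2025-007']
```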

Every safety regression that reaches production is a test that should have existed. Write the test now.
