Unit tests verify that individual safety components work in isolation. Integration tests verify that they work correctly when connected together and when processing realistic agent workflows. A policy engine that passes all unit tests might still fail in integration if the envelope format from the SDK does not match what the engine expects.
Integration tests exercise the full safety pipeline:
Each step must correctly consume the output of the previous step.
Set up a test environment with all safety components running. Use a test database for receipts. Configure Aegis with production-equivalent rules. Load a test policy that covers the scenarios you want to exercise. The environment should mirror production as closely as possible.
Submit a legitimate action envelope that should be allowed. Verify the action is permitted, a receipt is created, and the response contains the correct decision.
Submit an envelope that should be denied by policy. Verify the action is blocked, the denial reason is correct, and a receipt records the denial.
Submit an envelope with parameters that should trigger Aegis. Verify the scan detects the issue, the action is handled according to policy (blocked or flagged), and the scan result appears in the receipt.
Submit an envelope that requires approval. Verify the approval request is created, simulate an approval response, and verify the action proceeds after approval.
Run integration tests in CI on every change to any safety component. Use Docker Compose or similar tooling to spin up the test environment automatically. Tear it down after tests complete.
# docker-compose.test.yml
services:
postgres:
image: postgres:16
control-plane:
build: ./packages/control-plane
depends_on: [postgres]
test-runner:
build: ./tests/integration
depends_on: [control-plane]
Integration tests should also cover failure modes. What happens when the database is unreachable? What happens when Aegis scanning times out? Verify that the system fails closed in each case.
Integration tests are where you discover the gaps between what you think the system does and what it actually does.
Explore more guides on AI agent safety, prompt injection, and building secure systems.
View All Guides