Not all safety incidents require the same response. A minor anomaly in an agent's action frequency is different from an active data exfiltration attempt. Severity classification assigns a level to each incident based on its impact, enabling teams to prioritize responses and allocate resources appropriately.
SEV-1: Active harm is occurring or imminent. Examples: confirmed data exfiltration, policy bypass allowing unauthorized actions, safety scanner completely nonfunctional, agent executing actions on production systems without authorization.
Response: Immediate. All available responders engage. Affected agents are shut down. Leadership is notified within 15 minutes.
SEV-2: Significant safety degradation but no confirmed active harm. Examples: elevated rate of policy evaluation errors, Aegis scanner returning unexpected results, anomalous agent behavior that has not yet caused damage, approval workflow bypass.
Response: Within 1 hour. The on-call engineer investigates. Affected agents may be throttled or put into degraded mode.
SEV-3: Potential safety concern that requires investigation. Examples: unusual patterns in audit logs, minor anomaly detection alerts, single agent showing behavioral drift, failed health checks on non-critical agents.
Response: Within 4 hours. Assigned during business hours. No immediate operational impact.
SEV-4: Informational findings that should be tracked. Examples: dead policy rules identified during audit, minor configuration drift, documentation gaps, non-recurring false positive alerts.
Response: Tracked in backlog. Addressed during regular maintenance cycles.
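The four levels above can be captured as data so that tooling and runbooks agree on response targets. The following is a minimal Python sketch, not an Authensor API: the names `Severity`, `ResponsePolicy`, and `response_deadline` are hypothetical, and the level names follow the SEV-1/SEV-2 convention this guide uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    SEV1 = 1  # active harm occurring or imminent
    SEV2 = 2  # significant safety degradation, no confirmed active harm
    SEV3 = 3  # potential concern that requires investigation
    SEV4 = 4  # informational finding to track


@dataclass(frozen=True)
class ResponsePolicy:
    respond_within: Optional[timedelta]  # None means "tracked in backlog"
    action: str


# Response targets taken from the level descriptions in this guide.
RESPONSE_POLICIES = {
    Severity.SEV1: ResponsePolicy(
        timedelta(0),
        "all responders engage; shut down affected agents; "
        "notify leadership within 15 minutes",
    ),
    Severity.SEV2: ResponsePolicy(
        timedelta(hours=1),
        "on-call engineer investigates; throttle or degrade affected agents",
    ),
    Severity.SEV3: ResponsePolicy(
        timedelta(hours=4),
        "assign during business hours",
    ),
    Severity.SEV4: ResponsePolicy(
        None,
        "track in backlog; address during regular maintenance",
    ),
}


def response_deadline(severity: Severity, detected_at: datetime) -> Optional[datetime]:
    """Return when a response is due, or None for backlog items."""
    respond_within = RESPONSE_POLICIES[severity].respond_within
    if respond_within is None:
        return None
    return detected_at + respond_within
```

Keeping the targets in one table means a change to a response window is made in a single place instead of being scattered across alerting rules.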
Classify each incident based on three factors:
Impact scope: How many agents, users, or systems are affected?
Safety degradation: Is the safety posture weakened? By how much?
Active exploitation: Is someone actively exploiting the issue, or is it a latent vulnerability?
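One way to make the three factors operational is a small decision function. The sketch below is illustrative only: the `IncidentFacts` fields, the 0.0 to 1.0 degradation scale, and the numeric thresholds are assumptions chosen for the example, not values prescribed by this guide, and each team would need to calibrate its own.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


@dataclass(frozen=True)
class IncidentFacts:
    affected_count: int        # impact scope: agents, users, or systems affected
    safety_degradation: float  # assumed scale: 0.0 = intact, 1.0 = controls nonfunctional
    actively_exploited: bool   # True if exploitation is confirmed, False if latent


# Hypothetical thresholds; tune these to your own environment.
WIDE_SCOPE = 10
MAJOR_DEGRADATION = 0.5


def classify(facts: IncidentFacts) -> Severity:
    """Map the three classification factors to a severity level."""
    # Active exploitation or a fully nonfunctional control is always SEV-1.
    if facts.actively_exploited or facts.safety_degradation >= 1.0:
        return Severity.SEV1
    # Major degradation or wide scope without confirmed harm is SEV-2.
    if facts.safety_degradation >= MAJOR_DEGRADATION or facts.affected_count > WIDE_SCOPE:
        return Severity.SEV2
    # Any remaining degradation or affected entity warrants investigation.
    if facts.safety_degradation > 0.0 or facts.affected_count > 0:
        return Severity.SEV3
    # Nothing affected and posture intact: informational only.
    return Severity.SEV4
```

Checking active exploitation first reflects the ordering of the levels: confirmed harm outranks scope and degradation, so a single exploited agent still classifies as SEV-1.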
Authensor's Sentinel monitoring can automatically classify incidents based on preconfigured rules. Map monitoring alert types to severity levels. Critical alerts trigger SEV-1 classification automatically. Lower-severity alerts create tracked incidents for review.
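The mapping from alert types to severity levels can be as simple as a lookup table. The sketch below does not use or describe Sentinel's actual configuration format; the alert type strings and the functions `severity_for_alert` and `needs_immediate_page` are hypothetical names invented for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


# Hypothetical alert type names, loosely based on the examples in this guide.
ALERT_SEVERITY = {
    "data_exfiltration_confirmed": Severity.SEV1,
    "safety_scanner_down": Severity.SEV1,
    "policy_eval_error_rate_high": Severity.SEV2,
    "approval_workflow_bypass": Severity.SEV2,
    "agent_behavioral_drift": Severity.SEV3,
    "audit_log_anomaly": Severity.SEV3,
    "config_drift_minor": Severity.SEV4,
    "dead_policy_rule": Severity.SEV4,
}

# Assumption: unmapped alert types are investigated rather than silently ignored.
DEFAULT_SEVERITY = Severity.SEV3


def severity_for_alert(alert_type: str) -> Severity:
    """Look up the severity for an alert type, falling back to the default."""
    return ALERT_SEVERITY.get(alert_type, DEFAULT_SEVERITY)


def needs_immediate_page(alert_type: str) -> bool:
    """Only SEV-1 alerts trigger an immediate page; the rest become tracked incidents."""
    return severity_for_alert(alert_type) == Severity.SEV1
```

Defaulting unknown alert types to a level that requires investigation is a design choice made for this sketch: it avoids dropping a new alert type that nobody has mapped yet.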
Define clear escalation paths for each severity level. SEV-1 goes directly to incident commander. SEV-2 goes to the on-call engineer with escalation to incident commander if not resolved within the SLA. Document these paths and test them regularly.
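Escalation paths are easier to test when they are written down as code. The sketch below encodes the SEV-1 and SEV-2 paths described above. Several details are assumptions: the role names are placeholders, the SEV-2 SLA is taken to be the 1 hour response window, and the SEV-3 and SEV-4 owners are inferred from the response descriptions rather than stated as escalation paths in this guide.

```python
from datetime import datetime, timedelta
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


# Placeholder role names; substitute your own rotation or paging targets.
INCIDENT_COMMANDER = "incident-commander"
ON_CALL_ENGINEER = "on-call-engineer"
ASSIGNED_ENGINEER = "assigned-engineer"  # assumption for SEV-3
BACKLOG = "backlog"                      # assumption for SEV-4

# Assumption: the SEV-2 SLA equals the 1 hour response window.
SEV2_SLA = timedelta(hours=1)


def current_owner(severity: Severity, opened_at: datetime, now: datetime,
                  resolved: bool = False) -> str:
    """Return who owns the incident at time `now`."""
    if severity == Severity.SEV1:
        # SEV-1 goes directly to the incident commander.
        return INCIDENT_COMMANDER
    if severity == Severity.SEV2:
        # SEV-2 escalates if still unresolved after the SLA has elapsed.
        if not resolved and now - opened_at > SEV2_SLA:
            return INCIDENT_COMMANDER
        return ON_CALL_ENGINEER
    if severity == Severity.SEV3:
        return ASSIGNED_ENGINEER
    return BACKLOG
```

A function like this can be exercised in unit tests with fixed timestamps, which is one inexpensive way to test escalation paths regularly without paging anyone.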
Clear severity classification prevents both under-reaction and over-reaction. Under-reaction leaves real harm unaddressed, over-reaction spends responder time on minor issues, and both erode trust.