← Back to Learn
monitoringbest-practicesdeployment

Monitoring Alert Rules Template

Authensor

Effective monitoring requires well-tuned alert rules that surface real issues without overwhelming the team with false positives. This template provides starting points for common agent monitoring scenarios. Tune the thresholds based on your agent's baseline behavior.

Action Rate Alerts

alerts:
  - name: "high-action-rate"
    description: "Agent executing actions faster than expected"
    metric: "actions_per_minute"
    condition: "value > baseline_mean + (3 * baseline_stddev)"
    window: "5m"
    severity: "warning"

  - name: "action-rate-spike"
    description: "Sudden increase in action rate"
    metric: "actions_per_minute"
    condition: "rate_of_change > 200%"
    window: "1m"
    severity: "critical"

Error and Denial Alerts

  - name: "high-denial-rate"
    description: "Policy denying more actions than usual"
    metric: "denial_rate"
    condition: "value > 0.3"
    window: "10m"
    severity: "warning"

  - name: "repeated-denied-tool"
    description: "Agent repeatedly attempting a denied action"
    metric: "consecutive_denials_same_tool"
    condition: "value >= 5"
    window: "5m"
    severity: "critical"

  - name: "error-rate-spike"
    description: "Tool execution errors increasing"
    metric: "error_rate"
    condition: "value > 0.1"
    window: "5m"
    severity: "warning"

Behavioral Drift Alerts

  - name: "tool-distribution-shift"
    description: "Agent using a different mix of tools than baseline"
    metric: "tool_usage_distribution"
    condition: "kl_divergence > 0.5"
    window: "1h"
    severity: "warning"

  - name: "new-tool-usage"
    description: "Agent calling a tool it has never used before"
    metric: "unique_tools"
    condition: "new_tool_detected"
    window: "1m"
    severity: "info"

  - name: "session-length-anomaly"
    description: "Agent sessions running longer than expected"
    metric: "session_duration"
    condition: "value > baseline_p99"
    window: "per_session"
    severity: "warning"

Content Safety Alerts

  - name: "aegis-detection-spike"
    description: "Content scanner flagging more content than baseline"
    metric: "aegis_detections_per_minute"
    condition: "value > baseline_mean + (2 * baseline_stddev)"
    window: "10m"
    severity: "warning"

  - name: "pii-exposure-detected"
    description: "PII detected in agent output"
    metric: "pii_detection"
    condition: "count > 0"
    window: "1m"
    severity: "critical"

Start with these rules and adjust thresholds after two weeks of baseline data collection. Alert fatigue is the primary risk: if your team ignores alerts because there are too many, the monitoring system has failed. Tune aggressively to minimize false positives.

Keep learning

Explore more guides on AI agent safety, prompt injection, and building secure systems.

View All Guides