Session risk scoring is a technique that assigns a running risk score to an AI agent session based on the actions taken during that session. The score increases when the agent takes risky actions and decreases over time. Policies can reference the risk score to become more or less restrictive based on how the session is going.
Each action contributes to the session risk score, and the score decays over time, so a single risky action does not permanently elevate the session risk. A typical timeline:
- Session starts: risk = 0
- Agent reads files: risk = 0
- Agent writes a file: risk = 0.1
- Agent tries a blocked command: risk = 0.4
- Time passes: risk decays to 0.3
- Agent tries another blocked command: risk = 0.7
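The accumulate-and-decay mechanics in the timeline above can be sketched as follows. This is a minimal illustration, not the library's implementation; the `SessionRisk` class and its parameters are hypothetical.

```typescript
// Sketch of a session risk score with linear decay (hypothetical, not the real API).
class SessionRisk {
  private score = 0;
  private lastUpdate: number;

  constructor(
    private decayRate = 0.01, // points shed per second
    private maxScore = 1.0,
    now: number = Date.now(),
  ) {
    this.lastUpdate = now;
  }

  // Apply decay for the time elapsed since the last update.
  private decay(now: number): void {
    const elapsedSec = (now - this.lastUpdate) / 1000;
    this.score = Math.max(0, this.score - this.decayRate * elapsedSec);
    this.lastUpdate = now;
  }

  // Record an event with a given risk weight, clamped to maxScore.
  record(weight: number, now: number = Date.now()): number {
    this.decay(now);
    this.score = Math.min(this.maxScore, this.score + weight);
    return this.score;
  }

  current(now: number = Date.now()): number {
    this.decay(now);
    return this.score;
  }
}
```

Replaying the timeline's score deltas (+0.4, ten seconds of decay at 0.01/s, then +0.4) reproduces the 0.4 → 0.3 → 0.7 progression shown above.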
Policies can reference the risk score to change behavior dynamically:
```yaml
rules:
  - tool: "file.write"
    action: allow
    when:
      context.riskScore:
        lt: 0.5
  - tool: "file.write"
    action: escalate
    when:
      context.riskScore:
        gte: 0.5
    reason: "Elevated session risk requires approval for writes"
```
When the session starts, file writes are allowed. After the agent accumulates enough risk (from denied actions or suspicious behavior), file writes require approval.
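A policy engine could resolve such rules roughly like this. This is a simplified sketch assuming first-match evaluation; the `Rule` shape and `evaluate` function are illustrative, not the library's actual API.

```typescript
type Action = "allow" | "escalate" | "deny";

// Illustrative rule shape: lt/gte bounds on the session risk score.
interface Rule {
  tool: string;
  action: Action;
  maxRisk?: number; // matches when riskScore < maxRisk (lt)
  minRisk?: number; // matches when riskScore >= minRisk (gte)
  reason?: string;
}

// Mirrors the YAML policy above: allow writes at low risk, escalate at >= 0.5.
const rules: Rule[] = [
  { tool: "file.write", action: "allow", maxRisk: 0.5 },
  {
    tool: "file.write",
    action: "escalate",
    minRisk: 0.5,
    reason: "Elevated session risk requires approval for writes",
  },
];

// First matching rule wins.
function evaluate(tool: string, riskScore: number): Rule | undefined {
  return rules.find(
    (r) =>
      r.tool === tool &&
      (r.maxRisk === undefined || riskScore < r.maxRisk) &&
      (r.minRisk === undefined || riskScore >= r.minRisk),
  );
}
```

At a risk score of 0.2 the first rule matches and the write is allowed; at 0.5 or above, the escalate rule matches instead.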
Different events contribute different amounts of risk:
| Event | Risk Weight |
|-------|-------------|
| Allowed action | 0.0 |
| Escalated action | 0.1 |
| Blocked action | 0.3 |
| Aegis threat detected | 0.5 |
| Sentinel anomaly alert | 0.4 |
| Multiple denials in window | 0.3 |
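The weights above can be represented as a simple lookup that feeds the running score. The event keys here are illustrative names, not the library's identifiers.

```typescript
// Risk weight per event type, matching the table above (illustrative keys).
const RISK_WEIGHTS: Record<string, number> = {
  allowed: 0.0,
  escalated: 0.1,
  blocked: 0.3,
  aegisThreat: 0.5,
  sentinelAnomaly: 0.4,
  multipleDenials: 0.3,
};

// Add an event's weight to the score, clamped to the configured maximum.
function addRisk(score: number, event: string, maxScore = 1.0): number {
  return Math.min(maxScore, score + (RISK_WEIGHTS[event] ?? 0));
}
```

Clamping at `maxScore` keeps the score bounded, so even a burst of high-weight events saturates at 1.0 rather than growing without limit.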
```typescript
const guard = createGuard({
  policy,
  riskScoring: {
    enabled: true,
    decayRate: 0.01,     // risk decays by 0.01 per second
    maxScore: 1.0,
    blockThreshold: 0.9, // auto-block all actions above this score
  },
});
```
Static policies treat every session the same. But a session where the agent has attempted five blocked actions is inherently riskier than a session where every action has been allowed. Risk scoring lets your safety system respond proportionally to the actual threat level in the current session.
This creates a natural escalation path: as an agent behaves more suspiciously, the system becomes more restrictive. This is useful for catching slow-moving attacks where each individual action is not enough to trigger a static rule.