agent-safety · explainer · best-practices

What is fail-closed design in AI safety?

Authensor

Fail-closed is a design principle where the default behavior, when the system encounters an error or an unrecognized situation, is to deny the action. The opposite is fail-open, where the default is to allow.

Why fail-closed matters for AI agents

AI agents operate with real-world tools. When the safety system encounters a situation it was not designed for, the choice is binary: allow the action (fail-open) or block it (fail-closed).

Fail-open risk: An unrecognized tool call, a policy parsing error, or a network timeout causes the safety layer to skip enforcement. The action goes through unchecked.

Fail-closed behavior: An unrecognized tool call is blocked. A policy parsing error blocks all actions until the policy is fixed. A timeout blocks the action and logs the failure.

The cost of a false positive (blocking a legitimate action) is an inconvenience. The cost of a false negative (allowing a dangerous action) can be data loss, financial damage, or security breach. Fail-closed optimizes for the right side of this tradeoff.

Where fail-closed applies

No matching rule: If no policy rule matches the tool call, the action is denied. This inverts the permissive default of many access-control setups, where "no rule" means "no restriction."

Policy load failure: If the policy file is missing, corrupted, or invalid, all actions are denied until a valid policy is loaded.
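This fallback can be sketched in a few lines. The `loadPolicy` helper, the `DENY_ALL` constant, and the JSON rule format below are illustrative assumptions, not a real API:

```typescript
interface Policy {
  rules: { action: 'allow' | 'block'; reason: string }[];
}

// Deny-everything policy used whenever a real one cannot be loaded.
// An empty rule set denies by default because no rule can ever match.
const DENY_ALL: Policy = { rules: [] };

// Hypothetical loader: a missing, corrupted, or structurally invalid
// policy file yields DENY_ALL, so a load failure can never fail open.
function loadPolicy(raw: string | null): Policy {
  if (raw === null) return DENY_ALL; // file missing
  try {
    const parsed = JSON.parse(raw);
    if (!Array.isArray(parsed.rules)) return DENY_ALL; // invalid shape
    return parsed as Policy;
  } catch {
    return DENY_ALL; // corrupted / unparseable
  }
}
```

The key design choice is that the error path returns a value the rest of the system already treats as "deny everything," rather than a null that callers might forget to check.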

Scanner failure: If the content scanner throws an error, the action is denied. A broken scanner should not silently permit content threats.
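One way to sketch this wrapping (the `scanOrBlock` helper and the scanner signature are hypothetical):

```typescript
interface Decision {
  action: 'allow' | 'block';
  reason: string;
}

// Hypothetical wrapper: any exception thrown by the scanner becomes a
// block decision instead of a silently skipped check.
function scanOrBlock(
  scan: (text: string) => boolean, // true = threat detected
  text: string
): Decision {
  try {
    return scan(text)
      ? { action: 'block', reason: 'Scanner flagged content' }
      : { action: 'allow', reason: 'Scanner passed' };
  } catch {
    return { action: 'block', reason: 'Scanner error (fail-closed)' };
  }
}
```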

Network failure: If the SDK cannot reach the control plane to fetch the latest policy, it uses the last cached policy. If there is no cached policy, it denies all actions.
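A minimal sketch of this cache-then-deny fallback, assuming a hypothetical `currentPolicy` helper and fetch function:

```typescript
interface Policy {
  rules: { action: 'allow' | 'block'; reason: string }[];
}

// Last policy successfully fetched from the control plane, if any.
let cachedPolicy: Policy | null = null;

// Hypothetical helper: try to refresh from the control plane; on network
// failure, fall back to the cache; with no cache, return an empty rule
// set, which the evaluator denies by default.
async function currentPolicy(
  fetchLatest: () => Promise<Policy>
): Promise<Policy> {
  try {
    cachedPolicy = await fetchLatest(); // refresh cache on success
  } catch {
    // Control plane unreachable: keep whatever we had.
  }
  return cachedPolicy ?? { rules: [] };
}
```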

Timeout: If an approval request times out, the action is denied. Unanswered escalations do not auto-approve.
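A common way to implement this is to race the approval against a timer, so that silence resolves to a block. The helper below is a hypothetical sketch:

```typescript
interface Decision {
  action: 'allow' | 'block';
  reason: string;
}

// Hypothetical sketch: race a pending human approval against a timeout.
// If nobody answers within `ms`, the result is a block, never an allow.
function withApprovalTimeout(
  approval: Promise<Decision>,
  ms: number
): Promise<Decision> {
  const timeout = new Promise<Decision>((resolve) =>
    setTimeout(
      () => resolve({ action: 'block', reason: 'Approval timed out (fail-closed)' }),
      ms
    )
  );
  return Promise.race([approval, timeout]);
}
```

Note that the timeout branch resolves to a deny decision rather than rejecting, so callers handle both outcomes through the same code path.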

Implementing fail-closed

In code, fail-closed means every function that evaluates safety returns "deny" on error:

interface Envelope { tool: string; args: Record<string, unknown> }
interface Rule { action: 'allow' | 'block'; reason: string }
interface Policy { rules: Rule[] }
interface Decision { action: 'allow' | 'block'; reason: string }

// matches() is assumed to test whether a rule applies to the envelope.
declare function matches(rule: Rule, envelope: Envelope): boolean;

function evaluate(envelope: Envelope, policy: Policy): Decision {
  try {
    // Normal policy evaluation: first matching rule wins
    for (const rule of policy.rules) {
      if (matches(rule, envelope)) {
        return { action: rule.action, reason: rule.reason };
      }
    }
    // No rule matched: deny
    return { action: 'block', reason: 'No matching rule (fail-closed)' };
  } catch (error) {
    // Error during evaluation: deny
    const message = error instanceof Error ? error.message : String(error);
    return { action: 'block', reason: `Evaluation error: ${message}` };
  }
}

The tradeoff

Fail-closed systems are more conservative. They will occasionally block legitimate actions that the policy author forgot to allow. This is intentional. The friction of writing explicit allow rules forces you to think through what the agent should be able to do. It is better to start restrictive and loosen over time than to start permissive and discover gaps after an incident.
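As a concrete illustration (the rule schema here is hypothetical), a starting policy might allow only the few tools the agent genuinely needs, relying on the no-match default to deny everything else:

```typescript
// Hypothetical starter policy: explicit allows for known-safe tools.
// Anything not listed falls through to the fail-closed default.
const policy = {
  rules: [
    { match: { tool: 'read_file' }, action: 'allow', reason: 'Read-only access' },
    { match: { tool: 'web_search' }, action: 'allow', reason: 'No side effects' },
    // No rule for 'delete_file', 'send_email', etc. — those are denied.
  ],
};
```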
