Human-in-the-loop (HITL) is a design pattern where an AI system includes checkpoints that require human review and approval before the system takes certain actions. The human is literally in the loop: the system cannot complete its task without human input at these checkpoints.
AI agents make mistakes. They hallucinate, misinterpret instructions, and get manipulated by prompt injection. For low-stakes tasks, these mistakes are tolerable. For high-stakes tasks, they are not.
A human reviewer catches mistakes that the AI cannot catch on its own. There are several common patterns for inserting that review:
Pre-execution review: The agent proposes an action and waits for approval before executing it. This is the most common pattern.
Batch review: The agent queues up a set of actions, and a human reviews and approves them as a batch. Efficient for high-volume, similar actions.
Exception-based review: The agent operates autonomously until it encounters an action the policy flags as risky. Only flagged actions go to a human.
Confidence-based escalation: The agent escalates when its own confidence in the action is below a threshold. "I'm not sure this is the right API endpoint, please confirm."
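The last two patterns compose naturally into a single routing decision. Here is a minimal sketch; the tool names, the 0.8 threshold, and the function name are illustrative assumptions, not part of any standard API:

```python
# Sketch: decide whether a proposed action runs automatically or goes to a human.
# RISKY_TOOLS and CONFIDENCE_THRESHOLD are illustrative assumptions.
RISKY_TOOLS = {"file.write", "email.send"}
CONFIDENCE_THRESHOLD = 0.8

def route_action(tool: str, confidence: float) -> str:
    """Return 'execute' or 'escalate' for a proposed agent action."""
    if tool in RISKY_TOOLS:
        return "escalate"  # exception-based review: the policy flags this tool
    if confidence < CONFIDENCE_THRESHOLD:
        return "escalate"  # confidence-based escalation: the agent is unsure
    return "execute"
```

In practice the threshold and the risky-tool set come from your policy, not hard-coded constants, but the shape of the decision is the same.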
Exception-based HITL maps directly to policy escalation rules:
```yaml
rules:
  # Low-risk: automatic
  - tool: "search.web"
    action: allow
  - tool: "file.read"
    action: allow
  # Medium-risk: human review
  - tool: "file.write"
    action: escalate
    reason: "File writes require review"
  - tool: "email.send"
    action: escalate
    reason: "Outbound email requires review"
  # High-risk: blocked
  - tool: "shell.execute"
    action: block
```
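Evaluating a tool call against rules like these is a first-match lookup. The sketch below mirrors the config above in plain Python dicts; the rule shape and default-deny fallback are illustrative assumptions, not a real policy library:

```python
# Sketch: first-match evaluation of a tool call against an ordered rule list.
# The dict shape mirrors the YAML config above; it is illustrative, not a library API.
RULES = [
    {"tool": "search.web",    "action": "allow"},
    {"tool": "file.read",     "action": "allow"},
    {"tool": "file.write",    "action": "escalate", "reason": "File writes require review"},
    {"tool": "email.send",    "action": "escalate", "reason": "Outbound email requires review"},
    {"tool": "shell.execute", "action": "block"},
]

def evaluate(tool: str) -> dict:
    """Return the first rule matching the tool; unknown tools are denied by default."""
    for rule in RULES:
        if rule["tool"] == tool:
            return rule
    return {"tool": tool, "action": "block", "reason": "No matching rule (default deny)"}
```

Defaulting to block for unmatched tools is the safer choice: a new tool added to the agent cannot silently bypass review just because nobody wrote a rule for it yet.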
HITL slows things down. Every escalation pauses the agent and waits for a human to respond. This creates a bottleneck. The key is to calibrate: escalate too much and you eliminate the productivity benefit of the agent. Escalate too little and you accept more risk.
Start with aggressive escalation (many things require approval) and relax over time as you build confidence in the agent and your policies. Track the approval rate: if 99% of escalations are approved, you can likely convert some of those rules to auto-allow.
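Tracking the approval rate per tool makes that relaxation decision data-driven. A minimal sketch, assuming a 99% threshold and a minimum sample size before any rule is considered for auto-allow (both numbers are illustrative):

```python
# Sketch: track escalation outcomes per tool and flag rules that could
# be converted to auto-allow. Threshold and min_samples are assumptions.
from collections import defaultdict

class ApprovalTracker:
    def __init__(self, threshold: float = 0.99, min_samples: int = 100):
        self.threshold = threshold
        self.min_samples = min_samples
        self.counts = defaultdict(lambda: {"approved": 0, "total": 0})

    def record(self, tool: str, approved: bool) -> None:
        """Record one human decision on an escalated action."""
        self.counts[tool]["total"] += 1
        if approved:
            self.counts[tool]["approved"] += 1

    def auto_allow_candidates(self) -> list[str]:
        """Tools whose approval rate suggests the escalate rule can be relaxed."""
        return [
            tool for tool, c in self.counts.items()
            if c["total"] >= self.min_samples
            and c["approved"] / c["total"] >= self.threshold
        ]
```

The minimum sample size matters: a 100% approval rate over five escalations tells you much less than 99% over five hundred.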
HITL only works if the human reviewer actually reads and evaluates the request. "Approve all" is not a review. Present the reviewer with clear context: what the agent wants to do, why the policy flagged it, and what the consequences might be. Make approval a deliberate act, not a rubber stamp.
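One way to make that context explicit is to treat the escalation itself as a structured object rather than a bare yes/no prompt. The field and class names below are hypothetical; adapt them to your review tooling:

```python
# Sketch: the context a reviewer should see for each escalation.
# Class and field names are illustrative assumptions, not a standard schema.
from dataclasses import dataclass

@dataclass
class EscalationRequest:
    tool: str            # what the agent wants to do
    arguments: dict      # the exact parameters it will run with
    policy_reason: str   # why the policy flagged this action
    consequences: str    # what happens if the reviewer approves

    def render(self) -> str:
        """Format the request for display to a human reviewer."""
        return (
            f"Action: {self.tool}({self.arguments})\n"
            f"Flagged because: {self.policy_reason}\n"
            f"If approved: {self.consequences}"
        )
```

Showing the exact arguments, not just the tool name, is the point: "send an email" is easy to rubber-stamp, while "send this text to this external address" invites actual scrutiny.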