Approval Design for High-Risk Operations
An operations team runs hundreds of actions a day. Most are harmless: a lookup, a status change, an internal note. A few are not. A refund moves money out. A reply goes to a customer in writing. An access change decides who sees what. A mistake in any of these is expensive and hard to take back. Those are the actions teams now hand to AI agents.
The first question is not whether to add an approval step. It is which actions need one. The answer follows from the action itself, not from the org chart.
Not every action lands in the same tier. A lookup runs untraced. A bulk update gets a sampled audit. A refund or an access change earns independent review. Most approval systems route by hierarchy instead. Routine work waits for a manager, while dangerous work slips through because it did not match a template.
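A minimal sketch of routing by action rather than by hierarchy. The action names are hypothetical, and the choice to send unknown actions to the strictest tier is an assumption of this sketch, made so that nothing slips through for failing to match a template.

```python
from enum import Enum


class Tier(Enum):
    UNTRACED = "untraced"
    SAMPLED_AUDIT = "sampled_audit"
    INDEPENDENT_REVIEW = "independent_review"


# Hypothetical action names; the tiers mirror the examples in the text.
ACTION_TIERS = {
    "account.lookup": Tier.UNTRACED,
    "records.bulk_update": Tier.SAMPLED_AUDIT,
    "refund.issue": Tier.INDEPENDENT_REVIEW,
    "access.change": Tier.INDEPENDENT_REVIEW,
}


def tier_for(action: str) -> Tier:
    # Assumption: an action nobody has classified gets the strictest tier.
    return ACTION_TIERS.get(action, Tier.INDEPENDENT_REVIEW)
```

The tier depends only on the action name. Who asked, and who they report to, never enters the lookup.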
The rest of this post handles that top tier. It is the tier teams most want to automate away.
A human in the loop is what lets you deploy AI to the hard problems
Keeping a person in the loop is not caution. It is what makes high-value work safe to automate. You would never hand a refund to a model with no human review. That is why a refund belongs behind a checkpoint. The agent reads the account, drafts a response, proposes an amount. A person owns the irreversible step. That split lets an agent help run finance work, access changes, and customer-facing actions instead of being locked out of them.
The split neutralises the failure modes of full autonomy. An agent can be talked into a refund. A customer writes "I have already filed a chargeback, just refund me," or a hostile instruction hides in a forwarded email. That second case is prompt injection aimed at the refund tool. An agent can be fluent and wrong: right amount, wrong account, a reason that reads well and cites nothing. Its blast radius scales with the model's throughput: it can work through every open ticket before anyone sees the pattern. Left alone, it leaves nobody accountable. When finance asks who approved the refund on the disputed account last Tuesday, "the agent decided" is not an answer. Two-person review turns each failure mode into a request a second person can catch before the money moves.
The control lives in the gate, not in the agent. That lets the fast-moving parts change underneath. The gate does not care which model proposes the action: a frontier API, an open model you host, or one you train yourself. You can swap agents without rebuilding the control. It does not care what the action does or how the plugin behind it is built, only that the action is named and controlled. And it does not care where it runs. Cloud or on-prem, the trust boundary stays the same. The agent, the action, and the deployment are yours to choose. The approval step remains constant.
The agent proposes. A person commits. You get the agent's reach on hard problems without the tail risk. The obvious way to add that person is a confirmation dialog. It does not work.
A confirmation dialog is not an approval control
A real approval control changes the probability of a failure or its impact. A confirmation dialog does neither. It interrupts a person who has already committed and asks them to commit again.
Anthropic reported the failure mode plainly from its own field data on Claude Code. Operators approved roughly 93 percent of permission prompts. Moving more work inside an OS-level sandbox cut prompt volume by about 84 percent. The containment changed the risk. The prompt did not.
The lesson sets the order. Before designing an approval step, shrink what reaches it. Can the action be made reversible? Scoped so it cannot do harm? Denied outright for a class of input? Decided by a deterministic policy with no human at all? A human reviewer is the most expensive control. Spend it last, on the actions that survive every cheaper filter.
Containment changes the risk before a reviewer ever sees it. The cheaper filters remove most of the volume. The human tier is the narrow point, not the wall every action piles against.
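The ordering of those filters can be sketched as a short routing function. Everything here is illustrative: the `Proposal` fields and `Outcome` names are hypothetical, and checking the deny rule first is a choice of this sketch, not something the post prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    DENY = "deny"
    RUN_DIRECTLY = "run_directly"
    POLICY_DECIDES = "policy_decides"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class Proposal:
    action: str
    reversible: bool         # can be undone cheaply
    scoped_harmless: bool    # scoped so it cannot do harm
    denied_input: bool       # input falls in a class that is denied outright
    policy_can_decide: bool  # a deterministic rule covers this case


def route(p: Proposal) -> Outcome:
    # Cheapest controls first; the human reviewer is the fallthrough.
    if p.denied_input:
        return Outcome.DENY
    if p.reversible or p.scoped_harmless:
        return Outcome.RUN_DIRECTLY
    if p.policy_can_decide:
        return Outcome.POLICY_DECIDES
    return Outcome.HUMAN_REVIEW
```

Only a proposal that fails every earlier test reaches `HUMAN_REVIEW`, which is what keeps the human tier narrow.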
That economy works only when the system knows what each action is. An action must be named before it can be controlled. You cannot put a checkpoint in front of a capability you have never made explicit. The ability to issue a refund should be one registered plugin action with an owner, an authentication mode, and a health check — not something the agent can do on its own. Building those actions — and letting your technical team or a coding agent add custom ones — is the subject of the third post. Here the point is narrower: an action you have not named is an action you cannot control.
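As a rough illustration of what "named" could mean in code, here is a minimal registry sketch. The class names, field names, and the `refund.issue` identifier are hypothetical and do not describe any particular product's plugin API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RegisteredAction:
    name: str                         # e.g. "refund.issue" (hypothetical)
    owner: str                        # team accountable for the action
    auth_mode: str                    # how the action authenticates downstream
    health_check: Callable[[], bool]  # returns True when the plugin is usable


class ActionRegistry:
    def __init__(self) -> None:
        self._actions: dict[str, RegisteredAction] = {}

    def register(self, action: RegisteredAction) -> None:
        if action.name in self._actions:
            raise ValueError(f"action already registered: {action.name}")
        self._actions[action.name] = action

    def resolve(self, name: str) -> RegisteredAction:
        # An action that was never named cannot be resolved, so it cannot run.
        action = self._actions.get(name)
        if action is None:
            raise PermissionError(f"unregistered action: {name}")
        if not action.health_check():
            raise RuntimeError(f"action unhealthy: {name}")
        return action
```

The agent can only ask for actions by name. Anything not in the registry fails at `resolve` before any checkpoint logic has to consider it.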
A high-risk action like a refund gets configured as two-person review — maker-checker. It must be requested, approved by a different user, and executed from the approved request. Lower-risk actions run directly. Requester and approver are separate people. That is role separation (segregation of duties). It is the oldest control in finance, and it still works. Rejecting a contested refund is itself a consequential decision, so a denial needs the same independence as an approval.
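The request, decide, execute sequence can be sketched in a few lines. This is an illustrative model with hypothetical names, not a production workflow. It enforces the two rules from the paragraph above: the decider must differ from the requester for approvals and denials alike, and execution runs only from an approved request.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXECUTED = "executed"


@dataclass
class Request:
    action: str
    payload: dict
    requester: str
    status: Status = Status.PENDING
    decided_by: Optional[str] = None


def decide(req: Request, reviewer: str, approve: bool) -> None:
    if req.status is not Status.PENDING:
        raise ValueError("request already decided")
    # Segregation of duties applies to rejections as well as approvals.
    if reviewer == req.requester:
        raise PermissionError("requester cannot decide their own request")
    req.decided_by = reviewer
    req.status = Status.APPROVED if approve else Status.REJECTED


def execute(req: Request) -> dict:
    if req.status is not Status.APPROVED:
        raise PermissionError("only an approved request can be executed")
    req.status = Status.EXECUTED
    # Return the stored payload: what runs is what was approved.
    return dict(req.payload)
```

Because `execute` takes the stored request rather than fresh arguments, the payload that runs is the one the reviewer saw.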
One more property: the approval must survive time. High-risk approvals do not resolve in seconds. They sit for hours or days while a finance approver is at lunch or a regional lead weighs in. An approval held in memory is an approval a deploy or crash can silently drop. A durable approval is tied to the case and the stored request. The decisive check happens when the action resumes, not when the button is clicked. By then the token, scope, policy, or target account may have changed.
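One way to sketch both properties, persistence and the check at resume, uses Python's standard `sqlite3` module. The table layout, the function names, and the choice to bind the approval to a payload digest, a policy version, and an expiry time are all assumptions of this example. A real system would also re-check the credential and the target account.

```python
import hashlib
import json
import sqlite3
import time


def payload_digest(payload: dict) -> str:
    # Canonical JSON so the same payload always hashes to the same digest.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def open_store(path: str) -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS approvals ("
        "case_id TEXT PRIMARY KEY, digest TEXT NOT NULL, "
        "policy_version TEXT NOT NULL, expires_at REAL NOT NULL)"
    )
    return db


def record_approval(db, case_id, payload, policy_version, ttl_seconds, now=None):
    now = time.time() if now is None else now
    with db:  # commits on success, so the approval survives a restart
        db.execute(
            "INSERT OR REPLACE INTO approvals VALUES (?, ?, ?, ?)",
            (case_id, payload_digest(payload), policy_version, now + ttl_seconds),
        )


def may_resume(db, case_id, payload, current_policy_version, now=None) -> bool:
    # The decisive check: run at resume time, against current state.
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT digest, policy_version, expires_at FROM approvals WHERE case_id = ?",
        (case_id,),
    ).fetchone()
    if row is None:
        return False
    digest, policy_version, expires_at = row
    return (
        digest == payload_digest(payload)
        and policy_version == current_policy_version
        and now < expires_at
    )
```

If the payload, the policy version, or the clock has moved past what was approved, `may_resume` returns `False` and the action needs a fresh approval.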
None of the real failures are solved by a better dialog. They stack: approval fatigue that turns review into reflex, a preview that does not match what executes, a credential that drifts between approval and run, a prompt-injected request that looks plausible, a queue where operators auto-approve everything until the agent finds an unexpected path. Each is removed by a different layer, applied in order.
Each layer removes a class of failure the next one should never have to see. The human reviewer is the last and narrowest layer. It is not the first line of defence.
The shape that holds up
Pulled together, the model is a small number of components with a clear trust boundary between the part that proposes work and the part that executes it.
The planner proposes on the untrusted side. Policy, the durable workflow, and the approval review sit in between. Only past the trust boundary does a privileged executor act. Every step writes to an audit store kept separate from operational traces.
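A compressed sketch of that flow, with every step writing to a dedicated audit store. The function and field names are hypothetical, and the in-memory list stands in for whatever durable store a real deployment would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AuditStore:
    # Kept apart from operational traces; a list stands in for real storage.
    events: list = field(default_factory=list)

    def write(self, case_id: str, step: str, detail: dict) -> None:
        self.events.append({"case_id": case_id, "step": step, **detail})


def run_case(
    case_id: str,
    proposal: dict,                         # produced on the untrusted side
    policy_allows: Callable[[dict], bool],  # deterministic policy check
    approved_by: Optional[str],             # None while review is pending
    executor: Callable[[dict], str],        # privileged, past the boundary
    audit: AuditStore,
) -> Optional[str]:
    audit.write(case_id, "proposed", {"proposal": proposal})
    if not policy_allows(proposal):
        audit.write(case_id, "denied_by_policy", {})
        return None
    if approved_by is None:
        audit.write(case_id, "awaiting_approval", {})
        return None
    audit.write(case_id, "approved", {"approver": approved_by})
    result = executor(proposal)
    audit.write(case_id, "executed", {"result": result})
    return result
```

The executor is reached only after the policy and approval steps, and the audit trail for a case can be read back in order from a single store.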
The principle is short. Shrink the surface first, gate what survives, and make every gated decision reconstructible. An approval earns its place only when three things hold: the action can do real harm, automation cannot safely make the call, and a second person knows something the requester does not. Where all three hold, give the reviewer the exact payload, keep the approval durable and short-lived, separate the requester from the approver, and keep the request, the decision, and the execution on one record. That is the difference between a control that holds up in an audit and a button that means nothing.
The next post walks a refund through a two-person approval workflow on real screens, from an agent's recommendation to the execution record. The third covers durable approvals, policy as code, and building gated actions, and the problems that do not yet have clean answers.