Read enough federal responsible-AI policies and one phrase starts to feel like wallpaper: there will be a human in the loop. It appears in every governance document, every oversight framework, every assurance that the AI will be used responsibly. And in a large share of cases it is doing no real work, because the policy never answers the questions that would make it true. Who is that human? What are they trained to do? How much time does each decision give them? How many of them does the workload actually require? Human-in-the-loop is written as a checkbox, but it is actually a staffing model, and the agencies that treat it as a checkbox are the ones whose oversight quietly fails in production.

When the loop is a checkbox

The checkbox version of human-in-the-loop satisfies the policy and not the purpose. It looks like this: the AI makes a recommendation, a human clicks approve, the box is checked. On paper there is human oversight. In practice, if the human has two seconds per decision, no training in what a wrong recommendation looks like, and a queue measured in thousands, the human is a rubber stamp and the oversight is fictional. The loop exists; the judgment does not.

This failure mode is common because it is invisible until it is tested. The metrics look fine — every decision had a human approval — right up until a wrong AI recommendation sails through the human step and reaches a citizen, and the review asks what the human actually did. The honest answer is that the human was given a checkbox and no realistic ability to exercise judgment. The policy was satisfied. The mission was not protected.
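One way to see why the approval metric hides the problem is to put it next to two signals of whether judgment was actually exercised. The sketch below is illustrative only: the `ReviewRecord` schema, the field names, and the sample log are hypothetical and not taken from any agency system.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ReviewRecord:
    """One logged human review of an AI recommendation (hypothetical schema)."""
    seconds_spent: float  # how long the reviewer had the case open
    overridden: bool      # True if the reviewer changed the AI's recommendation


def oversight_summary(records: list[ReviewRecord], total_decisions: int) -> dict[str, float]:
    """Report approval coverage next to review time and override rate."""
    if not records or total_decisions <= 0:
        raise ValueError("need at least one review record and a positive decision count")
    return {
        # The metric the checkbox version reports: share of decisions with a human step.
        "approval_coverage": len(records) / total_decisions,
        # Signals of whether the human step involved any judgment.
        "median_seconds_per_review": median(r.seconds_spent for r in records),
        "override_rate": sum(r.overridden for r in records) / len(records),
    }


# Made-up data: 1,000 decisions, each approved in about two seconds, none overridden.
log = [ReviewRecord(seconds_spent=2.0, overridden=False) for _ in range(1000)]
summary = oversight_summary(log, total_decisions=1000)
print(summary)
```

In this made-up log, coverage is complete while the median review lasts two seconds and nothing is ever overridden. A low override rate is not proof of rubber-stamping on its own, since the AI may simply be right, but combined with near-zero review time it is the pattern the approval metric cannot show.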

"A human with two seconds per decision, no training, and a queue of thousands isn't a safeguard. They're a rubber stamp the policy mistook for oversight."

The loop is actually three different jobs

Part of why human-in-the-loop gets implemented badly is that it names three genuinely different jobs as if they were one. Each requires different people, training, and staffing.

A policy that says 'human in the loop' without specifying which of these three it means has not designed oversight; it has gestured at it. Each role has a different cost, a different skill profile, and a different failure mode when it is understaffed.

The staffing math the business case skips

Here is the calculation most federal AI business cases never run: how many humans does responsible operation of this system actually require, and what does that cost? The math is unforgiving, and skipping it is usually the reason the business case looks better than the reality.

If a system processes a high volume of decisions and each needs a meaningful human review, the reviewer headcount can rival the savings the AI was supposed to deliver. Agencies discover this after deployment: the AI is fast, but doing oversight properly requires a review workforce nobody budgeted, so either the budget breaks or — far more often — the oversight quietly degrades to the checkbox version to fit the staffing that was actually funded. The system stays in production; the human-in-the-loop becomes a fiction; and the gap between the policy and the practice is exactly the headcount nobody costed.
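The calculation itself is short. The sketch below shows one way to run it, assuming a simple model in which review time scales linearly with decision volume. Every name and every number here is a hypothetical placeholder, not a figure from a real program; an agency would substitute its own volumes, review times, and loaded labor costs.

```python
import math
from dataclasses import dataclass


@dataclass
class OversightWorkload:
    """Inputs to the staffing estimate (all values are assumptions to be replaced)."""
    decisions_per_year: int
    review_fraction: float                # share of decisions given a meaningful review
    minutes_per_review: float             # time a trained reviewer needs per decision
    productive_hours_per_reviewer: float  # annual hours actually available for review
    cost_per_reviewer: float              # fully loaded annual cost per reviewer


def reviewers_required(w: OversightWorkload) -> int:
    """Headcount needed to review the workload, rounded up to whole people."""
    review_hours = w.decisions_per_year * w.review_fraction * w.minutes_per_review / 60
    return math.ceil(review_hours / w.productive_hours_per_reviewer)


def oversight_cost(w: OversightWorkload) -> float:
    """Annual cost of the review workforce."""
    return reviewers_required(w) * w.cost_per_reviewer


# Illustrative numbers only.
example = OversightWorkload(
    decisions_per_year=1_000_000,
    review_fraction=1.0,
    minutes_per_review=5.0,
    productive_hours_per_reviewer=1_500.0,
    cost_per_reviewer=120_000.0,
)
projected_savings = 8_000_000.0  # hypothetical savings claimed in the business case

headcount = reviewers_required(example)
cost = oversight_cost(example)
print(f"reviewers required: {headcount}")
print(f"oversight cost: {cost:,.0f}")
print(f"savings net of oversight: {projected_savings - cost:,.0f}")
```

With these placeholder inputs the review workforce comes to 56 people and 6,720,000 a year, which consumes most of the assumed 8,000,000 in savings. Changing `minutes_per_review` from five minutes to a few seconds makes the cost nearly vanish, which is exactly the pressure that pushes oversight toward the checkbox version.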

Designing the human role deliberately

Human-in-the-loop done well is a designed role, costed and staffed like any other part of the system. The agencies that get it right make a consistent set of moves.

Treating oversight as delivery

The reframe that makes human-in-the-loop real is to stop treating it as a governance promise and start treating it as a delivery requirement — a staffed, trained, costed function that ships with the system, not a sentence in the policy. At FCI this is how we think about responsible AI generally: oversight is a deliverable handed over with the model, not an assurance offered after it. An agency that designs the human role deliberately, costs it honestly, and staffs it adequately has oversight that holds when it is tested. An agency that writes 'human in the loop' into the policy and discovers the staffing cost after deployment has a checkbox that satisfies the auditor right up until the moment it matters. The loop was never a checkbox. It was always a staffing model, and the agencies that field trustworthy AI are the ones that budgeted for it as one.[2]