Every AI agent operating in a federal workflow is generating records. The model produces transcripts. The workflow produces decision logs. The reasoning chain produces intermediate outputs. The agent's interactions with federal systems produce a complete audit trail of every prompt, every retrieval, every action taken. Most of this content meets the federal definition of a record.[1] Almost no agency has scheduled it for retention. The volume is growing exponentially, the federal records definition has not moved, and the audit cycle that comes for unscheduled material is two years out. The window for getting ahead of this is now.

What an AI agent actually generates

The output of a federal AI workflow is not one artifact. It is a cascade of artifacts, most of which agencies have not yet classified against the federal records definition.

A typical agentic AI workflow inside a federal agency produces, at minimum: the user's original prompt or question, the agent's interpretation of the request, the retrieval queries the agent issued against agency systems, the records and documents the agent retrieved, the intermediate reasoning chain the agent followed, any tool calls the agent made, the final response delivered to the user, the user's reaction or follow-up, and the operational metadata wrapping all of it — timestamps, identity assertions, system events, escalation triggers. Each item is generated automatically. Each item persists somewhere — in logs, in databases, in observability tooling, in the agent platform's own internal storage. Each item potentially documents a federal decision or transaction.

Chart 01 · What an agentic workflow actually produces

One federal AI request produces nine categories of artifact. Federal records officers have classified two.

The model produces an answer. The workflow produces an evidence trail. Most of the trail is unscheduled.

User request (input) → Agent processes (reasoning + tool calls) → artifacts in three groups:

RECORD (3 artifacts)
Original prompt: documents intent
Decision log: what the agent decided
Final response: delivered to user

GRAY ZONE (4 artifacts)
Retrieval queries: what records the agent read
Reasoning trace: how the agent thought
Tool call log: what the agent did
User reaction: follow-up, override

ADMINISTRATIVE (2 artifacts)
Identity assertions: who, when, where
System events: errors, escalations

The gray zone is most of the volume
Three artifacts are clearly records. Two are administrative. The remaining four sit in a gray zone where agency-specific classification has to happen, and that's where most federal agencies have not yet acted.
Categories shown are typical of an agentic federal AI workflow with retrieval, reasoning, and tool-use capabilities. Classification of gray-zone artifacts depends on the specific use case, the agency's records retention schedule, and whether the artifact documents a federal decision or transaction.
FCI Advisory framework, derived from federal AI deployment observation
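The nine categories in Chart 01 are easier to act on once they are written down as data rather than as a diagram. The following is a minimal sketch, not an agency schema: the names `Tier`, `ArtifactCategory`, `CHART_01`, `tier_counts`, and `needs_agency_decision` are hypothetical, and the tier assigned to each category simply mirrors the chart.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    RECORD = "record"
    GRAY_ZONE = "gray zone"
    ADMINISTRATIVE = "administrative"


@dataclass(frozen=True)
class ArtifactCategory:
    name: str
    documents: str  # what the artifact is evidence of, per Chart 01
    tier: Tier


# Hypothetical catalog mirroring Chart 01; an agency would substitute its own.
CHART_01 = (
    ArtifactCategory("original prompt", "intent", Tier.RECORD),
    ArtifactCategory("decision log", "what the agent decided", Tier.RECORD),
    ArtifactCategory("final response", "what was delivered to the user", Tier.RECORD),
    ArtifactCategory("retrieval queries", "what records the agent read", Tier.GRAY_ZONE),
    ArtifactCategory("reasoning trace", "how the agent thought", Tier.GRAY_ZONE),
    ArtifactCategory("tool call log", "what the agent did", Tier.GRAY_ZONE),
    ArtifactCategory("user reaction", "follow-up, override", Tier.GRAY_ZONE),
    ArtifactCategory("identity assertions", "who, when, where", Tier.ADMINISTRATIVE),
    ArtifactCategory("system events", "errors, escalations", Tier.ADMINISTRATIVE),
)


def tier_counts(catalog=CHART_01):
    """Count how many artifact categories fall in each tier."""
    return Counter(category.tier for category in catalog)


def needs_agency_decision(catalog=CHART_01):
    """Names of the gray-zone categories, which still need agency-specific classification."""
    return [category.name for category in catalog if category.tier is Tier.GRAY_ZONE]
```

Calling `needs_agency_decision()` returns the four gray-zone categories, which is the working list a records officer and an AI program lead would need to settle first.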

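The federal records definition at 44 U.S.C. § 3301 is broad and has not been narrowed by recent regulation. Its core language, "all recorded information, regardless of form or characteristics, made or received by a Federal agency under Federal law or in connection with the transaction of public business," applies to content the agency produces via an AI workflow exactly as it applies to email, contracts, and signed forms. The statute also requires that the material be preserved or appropriate for preservation as evidence of the agency's activities or for its informational value, which is why classification turns on whether an artifact documents a decision or transaction. The agent did not invent a new category of content; it accelerated the production of an old one.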

The federal records definition did not change. The volume did.

NARA's records framework was built against an operational tempo where federal records were generated by federal employees taking deliberate actions over hours or days. A contract gets drafted. A determination gets signed. An email gets sent. The volume of records produced by an agency in a given week was bounded by the number of records-eligible decisions the agency's workforce made that week.

That bound is gone. An agentic AI workflow can produce records at machine speed — hundreds of decision logs an hour, thousands of reasoning traces in a day, millions of prompt-response pairs across a year. The volume is no longer human-bounded; it is compute-bounded. And the federal records governance machinery that was designed for the slower tempo is now receiving a stream of records-eligible content at orders of magnitude beyond what it was sized for.

Chart 02 · Records-eligible content under federal management

Federal AI workloads will produce more records-eligible content in 2027 than all federal IT systems combined produced in 2020.

The records governance machinery was sized for the dotted line. The solid line is what's actually coming.

[Line chart: indexed volume of records-eligible content, 2020 through 2028, y-axis to 10×. Curve labels: AI 9.5×, Legacy 1.4×. Crossover ≈ FY26.]

The crossover is months away
AI-generated records-eligible content overtakes traditional IT system output around FY26 by directional volume. The governance machinery built for the legacy curve is now operating against the AI curve, and the records-management staffing was sized against neither.
Indexed volume of records-eligible content under federal management, with 2020 traditional-IT output normalized to 1×. AI volume reflects projected output from federal agentic deployments and chat-style AI services. Specific agency curves vary; directional shape is consistent across the FCI engagement base.
FCI Advisory projection, calibrated against current federal AI procurement and deployment trajectory

The projected volume curve is not subtle. Federal AI workloads scaling through the next four years will generate more records-eligible content than every federal IT system in the prior decade combined. None of that content currently has a default home in the agency's records management architecture. The default behavior — content sitting in agent platform logs, observability stacks, or model-vendor infrastructure — is operational expedience, not records compliance.

"Federal records governance was designed against human tempo. AI agents work at compute tempo. The federal records definition has not changed; the volume of records-eligible content arriving at it has, and the governance machinery is now operating in conditions it was not built for."

The retention question nobody scheduled

The mechanical step that should follow records-eligibility classification is retention scheduling — how long the content is retained, where it is stored, how it gets dispositioned, how it is preserved in formats that remain readable through the retention window. Federal records officers have done this work for decades against email, paper records, electronic case files, and similar artifacts.[2] The work has not yet been done against AI-generated artifacts.

The schedule design problem is non-trivial. Some categories of AI output are clearly records and require retention against existing schedules — final decisions, formal communications, official outputs. Other categories are clearly transitory — internal reasoning chains the agent did not surface, retrieval queries the agent later refined or abandoned. Many categories sit in a gray zone — intermediate decision logs that document how an agent arrived at a final action, transcripts of agent-to-system interactions that may be needed for audit, prompt-response pairs that document operational decisions even if the user never saw them.

Chart 03 · Federal agencies with retention schedules in place

Five categories of AI-generated record. Two have any meaningful agency-level scheduling.

The categories that resemble traditional outputs (final responses, communications) are getting scheduled. Everything else is unscheduled by default.

Scheduled in production / schedule drafted, not in production:

Final AI responses (delivered outputs, communications): 32% / 60%
Agent decision logs (what the agent decided and why): 14% / 33%
Prompt-response pairs (what was asked and answered): 7% / 21%
Reasoning traces (intermediate model output): 4% / 13%
Training-data lineage (provenance of model inputs): 6% / 17%

The gap is structural
Roughly two-thirds of federal agencies have no schedule for the four less-visible categories. Inconsistency across agencies (the same artifact treated differently at different agencies) is the dominant pattern, and federal records governance does not tolerate inconsistency for long.
Percentage of federal agencies with AI deployments in production that have established formal retention schedules (left) and drafted schedules pending implementation (right). Estimates reflect FCI engagement observation. The gap between scheduled and unscheduled is the unmanaged volume the audit cycle will eventually examine.
FCI Advisory observation across federal AI / records engagements, FY26-Q1
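The "roughly two-thirds" figure can be checked against the Chart 03 percentages. The sketch below assumes that the two figures for each category describe non-overlapping groups of agencies, so that whatever remains has no schedule at all; the chart does not state this explicitly, and the names `CHART_03` and `no_schedule_share` are hypothetical.

```python
# (scheduled in production %, schedule drafted but not in production %), from Chart 03
CHART_03 = {
    "final AI responses": (32, 60),
    "agent decision logs": (14, 33),
    "prompt-response pairs": (7, 21),
    "reasoning traces": (4, 13),
    "training-data lineage": (6, 17),
}


def no_schedule_share(scheduled, drafted):
    """Percent of agencies with neither a live nor a drafted schedule.

    Assumes the two groups do not overlap (an assumption, not stated in the chart).
    """
    return 100 - scheduled - drafted


gaps = {name: no_schedule_share(*pair) for name, pair in CHART_03.items()}

# The four less-visible categories are everything except final responses.
less_visible = [share for name, share in gaps.items() if name != "final AI responses"]
average_gap = sum(less_visible) / len(less_visible)
```

Under that assumption the unscheduled share runs from 53 percent for decision logs to 83 percent for reasoning traces, averaging about 71 percent across the four less-visible categories, against 8 percent for final responses.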

Federal agencies further along the AI deployment curve have started addressing the schedule design problem, but the pattern is uneven. Most have addressed final outputs and formal communications. Few have scheduled decision logs systematically. Almost none have scheduled intermediate reasoning traces or prompt-response pairs. The schedules that exist are inconsistent across agencies, which means similar artifacts at different agencies are being treated as "schedule indefinitely" at one and "dispose at runtime" at another. Federal records governance does not tolerate this kind of inconsistency for long.

What good schedule design actually requires

Three operational decisions are load-bearing in scheduling AI-generated records, and most agencies have not made any of them deliberately.

The first is the records-eligibility classification framework itself. Which categories of AI output are records, which are transitory, which are non-record administrative artifacts. This decision needs to be made against the federal records definition, not against operational convenience. A category labeled "non-record" because the agency does not want to retain it does not become non-record simply by being labeled.

The second is the retention destination. AI artifacts live in vendor platforms, model APIs, observability tooling, and agent-platform logs by default. Federal records have to live in records-managed environments, typically the agency's Documentum or equivalent enterprise content management (ECM) system, with retention schedules attached. The pipeline that moves AI artifacts from operational platforms into records-managed environments needs to exist before the artifacts accumulate, not after.

The third is the schedule horizons themselves. Some AI-generated content qualifies for short retention under existing schedules (operational logs, system events). Some qualifies for longer retention as documentation of federal decisions (decision logs, final outputs). Some may qualify for permanent retention under NARA's Capstone framework if it documents senior officials' decision-making.[2] The horizons need to be determined deliberately, agency by agency, with NARA-aligned guidance.
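One way to make the horizon decision deliberate is to encode it so that an artifact type with no decided horizon fails loudly instead of falling through to a default. The following is a minimal sketch under stated assumptions: the year values are placeholders for illustration and are not NARA guidance, and the names `Horizon`, `HORIZON_BY_ARTIFACT`, `horizon_for`, and `disposition_date` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Horizon:
    label: str
    years: Optional[int]  # None means permanent retention


# Placeholder horizons for illustration only. Real values come from the
# agency's approved schedule, not from this sketch.
SHORT = Horizon("short", 3)
LONG = Horizon("long", 7)
PERMANENT = Horizon("permanent", None)

HORIZON_BY_ARTIFACT = {
    "operational log": SHORT,
    "system event": SHORT,
    "decision log": LONG,
    "final output": LONG,
}


def horizon_for(artifact_kind: str, documents_senior_official: bool = False) -> Horizon:
    """Look up the decided horizon; refuse to guess for undecided artifact kinds."""
    # Simplification: the text says such content "may" qualify for permanent
    # retention; here the flag is treated as decisive.
    if documents_senior_official:
        return PERMANENT
    try:
        return HORIZON_BY_ARTIFACT[artifact_kind]
    except KeyError:
        raise ValueError(f"no horizon decided for {artifact_kind!r}") from None


def disposition_date(created: date, horizon: Horizon) -> Optional[date]:
    """Earliest disposition date, or None when retention is permanent."""
    if horizon.years is None:
        return None
    target_year = created.year + horizon.years
    try:
        return created.replace(year=target_year)
    except ValueError:
        # February 29 landing in a non-leap year: fall back to February 28.
        return created.replace(year=target_year, day=28)
```

The design choice worth noting is the `ValueError`: a gray-zone artifact such as a reasoning trace has no entry in the table, so the pipeline stops and forces a classification decision rather than silently applying operational log retention.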

Chart 04 · The records-eligibility decision

Three questions determine how a federal records officer should treat any AI-generated artifact.

Simplified, but more discipline than the current state of most agency AI classification.

Start: classify the AI-generated artifact.
Q1. Does it document a decision or transaction?
  No → Outcome A: Administrative. Standard log retention. No special schedule.
  Yes → go to Q2.
Q2. Is the artifact persistent beyond runtime?
  No → Outcome B: Transitory. Dispose at runtime end. No retention required.
  Yes → Outcome C: Federal record. Schedule per NARA-aligned retention policy.

Simplified, deliberately
Real federal records classification involves more variables: legal hold status, FOIA implications, executive-level Capstone considerations, agency-specific schedules. The three questions above are the start, not the entirety. But they are more discipline than most current AI deployments apply.
A simplified decision framework for classifying AI-generated artifacts against the federal records definition. Real records officers run a more granular version of this; the simplification is intended to make the three load-bearing questions visible at decision time.
FCI Advisory framework, derived from federal records modernization engagements

The decision framework above is a simplified version of what records officers actually run when classifying new content categories. The simplification matters: it is more discipline than the current state of federal AI artifact classification, which is mostly absent.
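The Chart 04 decision path is small enough to express as a function that can run at the moment an artifact is written. This is a sketch, not a records officer's full procedure: only the two questions legible in the chart (decision or transaction, persistence beyond runtime) are encoded, and the names `Outcome` and `classify` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    ADMINISTRATIVE = "A: administrative, standard log retention"
    TRANSITORY = "B: transitory, dispose at runtime end"
    FEDERAL_RECORD = "C: federal record, schedule per NARA-aligned policy"


def classify(documents_decision_or_transaction: bool,
             persists_beyond_runtime: bool) -> Outcome:
    """Apply Q1 then Q2 from Chart 04, in that order."""
    # Q1: an artifact that documents no decision or transaction is administrative.
    if not documents_decision_or_transaction:
        return Outcome.ADMINISTRATIVE
    # Q2: it documents a decision but does not outlive the run, so it is transitory.
    if not persists_beyond_runtime:
        return Outcome.TRANSITORY
    # It documents a decision and persists: treat it as a federal record.
    return Outcome.FEDERAL_RECORD
```

For example, `classify(True, True)` returns `Outcome.FEDERAL_RECORD`, which is the case for a decision log retained in platform storage. Legal hold, FOIA, and Capstone checks would sit in front of or behind this function rather than inside it.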

The audit cycle is two years out

Federal records governance ultimately runs through audit. NARA's inspector general framework, GAO records audits, and the agency's own internal records review cycles will eventually examine how federal AI artifacts are being handled. The federal AI workloads that came online in 2024 and 2025 are approaching the point where their first full audit cycle becomes operational — typically two to three years post-deployment.

When that audit cycle arrives, agencies will be asked to produce: the records-eligibility classification framework, the retention schedules, the disposition records, the evidence of preservation, and the audit trail showing how AI-generated content was governed. Agencies that designed this framework before deployment will have answers. Agencies that did not will be trying to retroactively classify and schedule three years of accumulated content while the auditor waits. The retroactive path is dramatically more expensive than the upfront path, and it carries operational risk in the form of records that may have to be reconstructed or that have already been lost.
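The five items an auditor will ask for can serve as a standing readiness check well before the audit arrives. The sketch below is illustrative only; `AUDIT_DELIVERABLES` restates the list in the paragraph above, and `missing_deliverables` is a hypothetical helper.

```python
# The five items named above, in the order an auditor would request them.
AUDIT_DELIVERABLES = (
    "records-eligibility classification framework",
    "retention schedules",
    "disposition records",
    "evidence of preservation",
    "governance audit trail",
)


def missing_deliverables(on_hand):
    """Return the deliverables not yet on hand, preserving the order above."""
    have = set(on_hand)
    return [item for item in AUDIT_DELIVERABLES if item not in have]
```

An agency that has only drafted its schedules would call `missing_deliverables(["retention schedules"])` and get back the other four items, which is the scope of the retroactive work it is carrying.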

What this rules in and out

Four strategic conditions reshape what federal records officers and AI program leads should be coordinating now:

The decision

Federal AI programs are creating records at compute tempo. The federal records framework is operating against that input without having been redesigned for it, and the audit cycle that examines the gap is two years out. The decision for federal records officers and AI program leads is whether to design the records framework now, before the audit cycle arrives, or to discover the gap in retrospect when an inspector asks for the schedule that does not exist. The same work has to be done either way. The timing determines whether the schedule is built deliberately or reconstructed under pressure, at higher cost.[5]