The federal AI deployments getting public attention sit on a small surface area — a chatbot in a customer-facing portal, a policy-search tool layered on a published corpus, a document-summarization workflow against a curated archive. The systems underneath are mostly new and mostly designed to work with each other. The visible work in federal AI is happening on the easy edge.
The interesting deployments are not on that edge. They are the agentic systems being built inside federal HR, finance, logistics, and case-management workflows: systems where the agent reasons across 8–15 underlying applications, half of which were built before iPaaS existed and most of which carry a decade or more of agency-specific customizations. The middleware engineering required to make this work is harder than anything happening at the model layer. It is also, by a wide margin, where federal AI program risk sits.
What an agent actually has to touch
A federal AI agent that handles a single end-to-end workflow — a position-management action, a grievance triage, a benefits adjudication, a contract modification — does not operate against one system. It operates against a constellation.
Figure: A single federal workflow touches 8–15 systems. None were built to be reasoned across at runtime.
The reasoning surface for a single federal workflow is not exotic. It is the same constellation that an experienced agency employee navigates manually every day. What changes when an AI agent is inserted into that workflow is that the navigation has to be machine-traversable in real time, the agent has to maintain transactional consistency across systems that do not share a transaction model, the audit trail has to span every system the agent touched, and the failure modes have to be recoverable without losing operational state. None of this is a model capability. All of it is middleware capability.
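One way to picture the cross-system audit requirement is a trail in which every read or write the agent performs carries the same correlation identifier. The sketch below is illustrative only: the class names, field names, and system names such as "hr_core" are assumptions, not part of any agency system or product.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    correlation_id: str  # ties every event back to one agent decision
    system: str          # hypothetical system name, e.g. "hr_core"
    action: str          # "read" or "write"
    detail: str
    at: str              # UTC timestamp in ISO 8601 form


@dataclass
class AuditTrail:
    # One trail per agent decision; the ID is generated once and reused.
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list = field(default_factory=list)

    def record(self, system: str, action: str, detail: str) -> AuditEvent:
        event = AuditEvent(
            self.correlation_id,
            system,
            action,
            detail,
            datetime.now(timezone.utc).isoformat(),
        )
        self.events.append(event)
        return event

    def systems_touched(self) -> list:
        # Distinct systems in first-touch order.
        return list(dict.fromkeys(e.system for e in self.events))
```

With a trail like this, a reviewer could reconstruct which systems a single decision touched by filtering on one identifier, rather than stitching together logs from each system separately.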
The systems the agent touches were not built to be touched by an agent. They were built, in sequence, over twenty or thirty years, to be touched by human users following procedures and by point-to-point integrations carrying specific transactions between specific systems. The agent is now expected to integrate the constellation at runtime — a class of integration problem the underlying systems were not engineered to support.
Integration complexity by system age
Federal systems do not all carry the same integration complexity. The complexity tracks the age of the system, the depth of customization, and how the system was originally integrated with its neighbors.
Figure: Integration complexity tracks system age, customization depth, and original integration pattern. The pre-2005 band has no API surface an agent can reason against, and most federal agencies have critical workloads in this band.
The newest federal systems — those deployed in the last five years on modern architectural patterns — carry relatively low integration complexity. They expose APIs, they document their schemas, and they were built with the expectation that other systems would talk to them. The integration work is real but bounded.
The middle band (systems deployed roughly 2005–2015) is harder. These systems often expose service interfaces, but the interfaces are partial, the schemas have drifted from their original documentation, and the customizations have created undocumented behavior at the edges. Federal agencies running ERP of this generation typically maintain a layer of integration adapters that has to be carefully reverse-engineered before agentic workloads can reason across the system.
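Schema drift of this kind can be surfaced mechanically before an agent is allowed to depend on an interface. The following is a minimal sketch, assuming the documented schema can be expressed as a mapping of field name to expected Python type; the function name and the sample fields are hypothetical.

```python
def schema_drift(documented: dict, observed_record: dict) -> dict:
    """Compare a documented schema with one record the interface actually returned.

    documented: field name -> expected Python type (an assumed representation)
    observed_record: field name -> value from a live response
    """
    # Fields the documentation promises but the record lacks.
    missing = sorted(k for k in documented if k not in observed_record)
    # Fields the record carries that the documentation never mentions.
    undocumented = sorted(k for k in observed_record if k not in documented)
    # Fields present in both, but with a value of the wrong type.
    mistyped = sorted(
        k
        for k, expected in documented.items()
        if k in observed_record and not isinstance(observed_record[k], expected)
    )
    return {"missing": missing, "undocumented": undocumented, "mistyped": mistyped}
```

Running a check like this against sampled production responses gives the reverse-engineering effort a concrete list of discrepancies to work from, though it will not reveal behavioral customizations that leave the field layout unchanged.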
The oldest band — systems still in production from before 2005 — is the hardest. These systems were built for batch processing and screen-based human interaction. Their integration patterns are file-drop, scheduled extract, or screen-scrape. They have no API surface that an agent can reason against directly. Federal agencies operating critical workloads on this generation of system have two choices when AI workloads arrive: build a middleware abstraction layer that gives the agent an API where the source system does not, or replace the system. Most are doing the first because the second is a multi-year program in its own right.
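The first option, an abstraction layer that gives the agent an API where the source system has none, can be sketched as a read-only adapter over a scheduled extract. This is an illustration under stated assumptions: the pipe-delimited layout, the field names, and the `ExtractAdapter` class are hypothetical, and a real extract may be fixed-width or binary.

```python
import csv
import io


class ExtractAdapter:
    """Read-only lookup API over a scheduled, pipe-delimited extract (assumed format)."""

    def __init__(self, extract_text: str, key_field: str, as_of: str):
        reader = csv.DictReader(io.StringIO(extract_text), delimiter="|")
        # Index rows by key so the agent can query by identifier.
        self._rows = {row[key_field]: row for row in reader}
        # The data is only as fresh as the last batch run, so expose that.
        self.as_of = as_of

    def get(self, key: str):
        row = self._rows.get(key)
        if row is None:
            return None
        # Return staleness alongside the data so callers can reason about it.
        return {"data": dict(row), "as_of": self.as_of}
```

Returning the extract timestamp with every answer is a deliberate choice in this sketch: an agent reasoning over batch data needs to know how stale that data may be. Writes back to the source system are a separate and harder problem that this read-only adapter does not address.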
Where federal AI pilots actually stall
Federal AI pilots fail at predictable points. The model layer is not, in most cases, the failure point. Integration is.
Figure: Pilot stalls cluster at the layers where the integration work is hardest. The model accounts for a small minority of stalls; integration and governance together account for the majority.
Pilots that stall do not stall evenly across the stack. They cluster at the stages where the integration work is hardest — data access from legacy systems, transactional consistency across multiple systems of record, identity and authorization plumbing across federated environments, and the production handover where the pilot has to operate against the real workflow rather than a sanitized test case. The model layer accounts for a small minority of pilot stalls in the agencies FCI has observed. The integration and governance layers together account for the majority.
The pattern is consistent across federal sectors, agency sizes, and AI use cases. Programs that fund the integration work alongside model selection have substantially higher ship rates than programs that fund the model and treat integration as a downstream concern. This is not surprising once stated. It is also not what most federal AI procurement scoping reflects today.
Pre-agentic vs agentic integration architecture
The integration patterns that worked in the pre-agentic federal environment do not survive contact with agentic workloads. The architectures have to change shape.
Figure: The integration architecture changes shape, not just scale. Agentic workloads need a runtime reasoning mesh in the middleware, not scheduled transactions.
Pre-agentic federal integration was largely point-to-point or hub-and-spoke. A specific transaction moved from System A to System B on a known schedule, through a known adapter, with a known schema. The integration map was a collection of pipes, each carrying defined traffic. Federal middleware practice had decades to optimize this pattern; the iPaaS modernization wave of the late 2010s consolidated many of those pipes onto common platforms without fundamentally changing the shape.
Agentic workloads do not fit the pipe model. An agent does not move a defined transaction on a schedule. It reasons across the system landscape in response to a runtime query, decides which systems to read from and write to, and operates against multiple systems within a single workflow. The integration layer has to support arbitrary read access (not just scheduled extracts), runtime authorization decisions (not just pre-provisioned trust), transactional rollback across systems that don't share a transaction manager, and audit logging that spans every system the agent touched in a single decision.
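One common way to approximate rollback across systems that share no transaction manager is a saga with compensating actions: each step is paired with an undo, and a failure triggers the undos in reverse order. The sketch below is a swapped-in illustration of that general pattern, not a description of any agency's middleware; `run_saga`, the `authorize` callback, and the log shape are all assumptions.

```python
class StepFailed(Exception):
    """Raised when a step cannot proceed, e.g. authorization is denied."""


def run_saga(steps, authorize, log):
    """Run steps in order; on failure, compensate completed steps in reverse.

    steps: list of (system, do, undo) tuples, where do and undo are callables
    authorize: callable(system) -> bool, evaluated at runtime for each step
    log: list that receives (system, outcome) tuples as an audit record
    """
    completed = []
    for system, do, undo in steps:
        try:
            if not authorize(system):
                raise StepFailed(f"not authorized for {system}")
            do()
            log.append((system, "done"))
            completed.append((system, undo))
        except Exception as exc:
            # The failing step is assumed to have made no durable change.
            log.append((system, f"failed: {exc}"))
            for done_system, compensate in reversed(completed):
                compensate()
                log.append((done_system, "compensated"))
            return False
    return True
```

Compensation is weaker than a true rollback: other users may observe the intermediate state, and a compensating action can itself fail, which this sketch does not handle. A production design would need retries, idempotent operations, and a durable record of saga progress.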
The federal agencies furthest along on agentic deployment have, almost without exception, modernized their middleware layer first or in parallel. The agencies that have not are discovering, at deployment time, that the integration layer they have cannot absorb the workload. The remediation is a middleware modernization program with an AI deadline attached to it — which is a worse version of the same modernization program done a year earlier with no deadline pressure.
What this rules in and out
Four conditions reshape what federal program leadership should be doing in the current cycle:
- The middleware layer is where federal agentic AI succeeds or fails. Foundation-model performance is becoming a baseline; the differentiator is integration depth. Programs that scope and fund a middleware modernization concurrently with AI deployment ship; programs that treat middleware as a downstream concern stall. The decision to fund the middleware work belongs in the AI program plan, not in a separate IT modernization track.
- Legacy ERP is the load-bearing constraint. Federal HR, finance, and case-management ERP systems carry the agency's most consequential workflows. They are also the systems with the deepest customization and the oldest integration patterns. Agencies trying to deploy agentic AI without a deliberate strategy for the legacy ERP integration layer are scoping past the constraint that will actually bind the program.
- Stall analysis should drive vendor evaluation. Federal AI procurements that score vendors primarily on model benchmarks are scoring on the layer where pilots rarely fail. Procurements that score on integration depth, federal-specific connector inventory, FedRAMP boundary handling, and legacy-system adapter patterns are scoring on the layer where pilots actually fail. The evaluation criteria should match where the risk lives.
- The remediation curve is non-linear. The middleware modernization work an agency does in the 18 months before an AI deployment is fundamentally easier than the same work done in the 18 months after the deployment has stalled. The cost difference is not 2x — it is closer to 4x or 5x in the agencies FCI has observed, because post-stall remediation happens under deadline pressure, with active users, and with accumulated technical debt that pre-deployment work would not have created.
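The point about matching evaluation criteria to where the risk lives can be made concrete with a weighted scoring rubric. The sketch below is purely illustrative: the weights and the 1-to-5 scale are hypothetical assumptions, not figures from FCI or from any procurement, and the criterion names simply restate the criteria listed above.

```python
# Hypothetical weights for illustration only; they sum to 1.0.
WEIGHTS = {
    "integration_depth": 0.35,
    "connector_inventory": 0.20,
    "fedramp_boundary_handling": 0.20,
    "legacy_adapter_patterns": 0.15,
    "model_benchmarks": 0.10,
}


def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores (assumed 1-5 scale) into one weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Under weights like these, a vendor with top model benchmarks but weak integration capability scores below a vendor with average benchmarks and strong integration capability, which is the ordering the stall analysis argues for.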
The decision
Federal agentic AI is being built in the layer of the federal technology stack that gets the least public attention. The model is real but commodity; the integration layer is where the actual engineering decides whether deployments ship. The decision for federal technology leadership is not whether to deploy agentic AI inside HR, finance, or case-management workflows: the procurements are in market and the deployments are happening. The decision is whether the integration and middleware layer underneath is being scoped, funded, and governed as a first-class engineering concern, or whether the program will discover the gap at deployment time and pay for it twice. The agencies treating middleware as a downstream concern are choosing the second path. The agencies treating middleware as the primary scope are the ones whose pilots are shipping.


