When Agentic AI Meets Legacy ERP: The Integration Layer Nobody's Writing About

The federal AI deployments getting public attention sit on a small surface area — a chatbot in a customer-facing portal, a policy-search tool layered on a published corpus, a document-summarization workflow against a curated archive. The systems underneath are mostly new and mostly designed to work with each other. The visible work in federal AI is happening on the easy edge.

The interesting deployments are not on that edge. They are the agentic systems being built inside federal HR, finance, logistics, and case-management workflows, where the agent reasons across 8–15 underlying applications, half of which were built before iPaaS existed and most of which carry a decade or more of agency-specific customizations. The middleware engineering required to make this work is harder than anything happening at the model layer. It is also where most of the federal AI program risk actually lives.

What an agent actually has to touch

A federal AI agent that handles a single end-to-end workflow — a position-management action, a grievance triage, a benefits adjudication, a contract modification — does not operate against one system. It operates against a constellation.

Chart 01 · The agent reasoning surface

A single federal workflow touches 8–15 systems. None were built to be reasoned across at runtime.

The agent has to integrate the constellation at runtime — a class of problem the underlying systems were never engineered to support.

What changes with the agent: The constellation already exists, and agency employees navigate it manually every day. What changes is that the navigation has to be machine-traversable in real time, with transactional consistency, audit coverage, and recoverable failure across systems that don't share a transaction model. None of this is a model capability.

Representative reasoning surface for a single federal end-to-end workflow (position management, grievance triage, benefits adjudication, contract modification). System count and category coverage are typical of mid-to-large federal civilian and defense agencies; specific architectures vary.

FCI Advisory framework, derived from federal middleware engagement observation

The reasoning surface for a single federal workflow is not exotic. It is the same constellation that an experienced agency employee navigates manually every day. What changes when an AI agent is inserted into that workflow is that the navigation has to be machine-traversable in real time, the agent has to maintain transactional consistency across systems that do not share a transaction model, the audit trail has to span every system the agent touched, and the failure modes have to be recoverable without losing operational state. None of this is a model capability. All of it is middleware capability.
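The audit requirement described above can be made concrete with a small sketch. It assumes a single workflow identifier is attached to every call the agent makes, so that records from different systems can later be joined into one trace. The class name `AuditTrace`, the system names, and the two stand-in functions are hypothetical and are not taken from any federal system or product.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditTrace:
    """Collects one record per system call made during a single agent decision."""
    workflow_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    records: list = field(default_factory=list)

    def call(self, system, action, fn, *args):
        """Run fn(*args) against a named system and log the outcome either way."""
        entry = {
            "workflow_id": self.workflow_id,
            "system": system,
            "action": action,
            "at": datetime.now(timezone.utc).isoformat(),
        }
        try:
            result = fn(*args)
            entry["status"] = "ok"
            return result
        except Exception as exc:
            entry["status"] = f"failed: {exc}"
            raise
        finally:
            # The record is written whether the call succeeded or failed.
            self.records.append(entry)


# Hypothetical stand-ins for two systems of record.
def read_position(position_id):
    return {"position_id": position_id, "grade": "GS-12"}


def update_payroll(position_id, grade):
    return True


trace = AuditTrace()
position = trace.call("hr_system", "read_position", read_position, "P-100")
trace.call("payroll_system", "update_payroll", update_payroll, "P-100", position["grade"])

for record in trace.records:
    print(record["workflow_id"], record["system"], record["action"], record["status"])
```

The point of the sketch is that the trace is assembled at decision time by the layer that brokers the calls, rather than reconstructed afterward from per-system logs.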

The systems the agent touches were not built to be touched by an agent. They were built, in sequence, over twenty or thirty years, to be touched by human users following procedures and by point-to-point integrations carrying specific transactions between specific systems. The agent is now expected to integrate the constellation at runtime — a class of integration problem the underlying systems were not engineered to support.

Integration complexity by system age

Federal systems do not all carry the same integration complexity. The complexity tracks the age of the system, the depth of customization, and how the system was originally integrated to its neighbors.

Chart 02 · Integration complexity by system age

Complexity tracks system age, customization depth, and original integration pattern.

The pre-2005 band has no API surface an agent can reason against. Most federal agencies have critical workloads in this band.

Modern federal systems (built 2015–2025)
Complexity score: 2/10
Integration: REST / GraphQL
Agent ready: Yes, with adapters
Share of federal estate: ~22%
Exposed APIs, documented schemas, built with the expectation that other systems would talk to them. Integration work is real but bounded.

Service-era ERP and case management (built 2005–2015)
Complexity score: 6/10
Integration: SOAP / partial REST
Agent ready: With abstraction work
Share of federal estate: ~46%
Partial service interfaces, schemas drifted from original documentation, customizations created undocumented behavior at the edges. Adapter layer requires reverse-engineering before agentic workloads can land.

Batch-and-screen generation (pre-2005)
Complexity score: 9/10
Integration: File / extract / screen-scrape
Agent ready: Only via abstraction
Share of federal estate: ~32%
Built for batch processing and screen-based human interaction. No API surface an agent can reason against directly. Federal agencies operating critical workloads on this generation must build a middleware abstraction layer or replace the system.

The binding constraint: Nearly a third of the federal system estate sits in the pre-2005 band. Agencies with critical workflows running on these systems face a choice when AI workloads arrive: build a middleware abstraction layer, or replace the system. Most are doing the first because the second is a multi-year program of its own.

Three federal system age cohorts with integration complexity score, typical integration patterns, agent-readiness assessment, and approximate share of the federal estate. Share figures reflect FCI's engagement observation; the directional shape is consistent across federal civilian and defense.

FCI Advisory framework, derived from federal integration engagement observation

The newest federal systems, those deployed since roughly 2015 on modern architectural patterns, carry relatively low integration complexity. They expose APIs, they document their schemas, and they were built with the expectation that other systems would talk to them. The integration work is real but bounded.

The middle band — systems deployed roughly 2005–2015 — is harder. These systems often expose service interfaces but the interfaces are partial, the schemas have drifted from their original documentation, and the customizations have created undocumented behavior at the edges. Federal agencies running ERP at this generation typically maintain a layer of integration adapters that has to be carefully reverse-engineered before agentic workloads can reason across the system.

The oldest band — systems still in production from before 2005 — is the hardest. These systems were built for batch processing and screen-based human interaction. Their integration patterns are file-drop, scheduled extract, or screen-scrape. They have no API surface that an agent can reason against directly. Federal agencies operating critical workloads on this generation of system have two choices when AI workloads arrive: build a middleware abstraction layer that gives the agent an API where the source system does not, or replace the system. Most are doing the first because the second is a multi-year program in its own right.
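A middleware abstraction layer of the kind described above can be sketched in a few lines. The example assumes the legacy system produces a nightly fixed-width extract; the adapter parses it and offers the agent a read-only lookup. The field layout, the `LegacyHRAdapter` class, and the sample records are all hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass

# Hypothetical fixed-width layout for a nightly extract: (field name, start, end).
LAYOUT = [("employee_id", 0, 6), ("grade", 6, 11), ("org_code", 11, 15)]


@dataclass(frozen=True)
class EmployeeRecord:
    employee_id: str
    grade: str
    org_code: str


def parse_extract(text):
    """Parse a fixed-width extract into records keyed by employee_id."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the extract
        fields = {name: line[start:end].strip() for name, start, end in LAYOUT}
        records[fields["employee_id"]] = EmployeeRecord(**fields)
    return records


class LegacyHRAdapter:
    """Read-only lookup API over the most recent extract."""

    def __init__(self, extract_text):
        self._records = parse_extract(extract_text)

    def get_employee(self, employee_id):
        """Return the record for employee_id, or None if it is not in the extract."""
        return self._records.get(employee_id)


# Hypothetical two-line extract in the layout above.
SAMPLE_EXTRACT = "E00001GS-12HR01\nE00002GS-09FIN2\n"

adapter = LegacyHRAdapter(SAMPLE_EXTRACT)
print(adapter.get_employee("E00002"))
```

An adapter like this gives the agent an API, but the data is only as fresh as the last extract. Write paths and freshness guarantees are the harder part of the abstraction work and are not shown here.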

Where federal AI pilots actually stall

Federal AI pilots fail at predictable points. The model layer is not, in most cases, the failure point. Integration is.

Chart 03 · Where federal AI pilots stall

Pilots cluster at the layers where the integration work is hardest.

The model accounts for a small minority of stalls. Integration and governance together account for the majority.

Layer · Stall category · Typical cause
Integration · Legacy data access · Pre-2005 systems, no API surface
Integration · Cross-system consistency · No shared transaction model
Governance · Identity & authorization · Federated identity, runtime authz
Deploy · Production handover · Pilot-to-prod gap
Governance · Audit & records spanning · Cross-system audit trail
Model · Model performance · Benchmark, suitability
Other · Procurement & scope · Contracting, scope drift, other

Stall structure: Integration accounts for ~49% of stalls. Governance accounts for ~27%. Deploy handover accounts for ~13%. The model layer accounts for ~11%. The bar lengths show where the difficulty actually lives, which is typically not where vendor evaluation criteria look.

Distribution of federal AI pilot stall causes across seven failure categories, FY24-Q4 through FY26-Q1. Sorted high to low. Percentages reflect FCI's engagement observation; the directional shape is consistent across federal sectors and AI use cases.

FCI Advisory analysis of federal AI pilot outcomes

Pilots that stall do not stall evenly across the stack. They cluster at the stages where the integration work is hardest — data access from legacy systems, transactional consistency across multiple systems of record, identity and authorization plumbing across federated environments, and the production handover where the pilot has to operate against the real workflow rather than a sanitized test case. The model layer accounts for a small minority of pilot stalls in the agencies FCI has observed. The integration and governance layers together account for the majority.

The pattern is consistent across federal sectors, agency sizes, and AI use cases. Programs that fund the integration work alongside model selection have substantially higher ship rates than programs that fund the model and treat integration as a downstream concern. This is not surprising once stated. It is also not what most federal AI procurement scoping reflects today.

"The model layer accounts for a small minority of pilot stalls. The integration layer is where the federal AI program risk actually lives. Programs scoped to the model and not the middleware are scoping to the easy part."

Pre-agentic vs agentic integration architecture

The integration patterns that worked in the pre-agentic federal environment do not survive contact with agentic workloads. The architectures have to change shape.

Chart 04 · Architecture comparison

The integration architecture changes shape, not just scale.

Agentic workloads do not fit the pipe model. The middleware has to support a runtime reasoning mesh, not scheduled transactions.

Capability · Pre-agentic (pipe model) · Agentic (reasoning mesh)

Integration shape
Pre-agentic: Point-to-point or hub-and-spoke. Specific transactions on a known schedule via known adapters.
Agentic: Runtime mesh. Agent decides at query time which systems to read from and write to within a single workflow.

Data access
Pre-agentic: Scheduled extracts. Batch files. Defined ETL windows.
Agentic: Arbitrary read at runtime. The agent issues queries the integration layer never saw before.

Authorization
Pre-agentic: Pre-provisioned trust between systems. Static service accounts.
Agentic: Runtime authorization decisions. Identity, scope, and consent evaluated per-action by the agent.

Transactions
Pre-agentic: Per-pipe semantics. Each integration manages its own consistency.
Agentic: Cross-system rollback. The agent acts across systems that don't share a transaction manager; failure has to be recoverable.

Audit
Pre-agentic: Per-pipe logs. Reconciliation reports. End-of-period audits.
Agentic: Workflow-spanning trace. Audit trail covers every system the agent touched in a single decision, at decision time rather than after.

Failure mode
Pre-agentic: Pipe stops; queues back up; operations notified.
Agentic: Partial-completion recovery. Agent must resume from a known consistent state across multiple systems of record.

The remediation curve: Agencies furthest along on agentic deployment have modernized their middleware layer first or in parallel. Agencies that have not are discovering, at deployment time, that the integration layer they have cannot absorb the workload. The remediation is a middleware modernization program with an AI deadline attached, a worse version of the same program done a year earlier with no deadline pressure.

Six capability dimensions of the federal middleware layer, compared across the pre-agentic pipe model and the agentic reasoning mesh. Both are ideal-typical patterns; real architectures sit on a continuum between them and most federal agencies operate hybrid postures.

FCI Advisory framework, derived from federal middleware engagement observation

Pre-agentic federal integration was largely point-to-point or hub-and-spoke. A specific transaction moved from System A to System B on a known schedule, through a known adapter, with a known schema. The integration map was a collection of pipes, each carrying defined traffic. Federal middleware practice had decades to optimize this pattern; the iPaaS modernization wave of the late 2010s consolidated many of those pipes onto common platforms without fundamentally changing the shape.

Agentic workloads do not fit the pipe model. An agent does not move a defined transaction on a schedule. It reasons across the system landscape in response to a runtime query, decides which systems to read from and write to, and operates against multiple systems within a single workflow. The integration layer has to support arbitrary read access (not just scheduled extracts), runtime authorization decisions (not just pre-provisioned trust), transactional rollback across systems that don't share a transaction manager, and audit logging that spans every system the agent touched in a single decision.
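One common way to approximate rollback across systems that do not share a transaction manager is the saga pattern: each step is paired with a compensating action, and a failure triggers the compensations for completed steps in reverse order. The sketch below illustrates that pattern under simplifying assumptions. It is not drawn from the article's source material, and the in-memory dictionaries and step names are hypothetical stand-ins for real systems of record.

```python
class SagaError(Exception):
    """Raised when a workflow fails after completed steps have been compensated."""


def run_saga(steps):
    """Run (name, action, compensate) steps in order.

    If an action raises, run the compensations of the already completed
    steps in reverse order, then raise SagaError.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for _, undo in reversed(completed):
                undo()
            raise SagaError(f"step {name!r} failed: {exc}") from exc
        completed.append((name, compensate))
    return [name for name, _ in completed]


# Hypothetical in-memory stand-ins for two systems of record.
hr = {"P-100": "GS-11"}
finance = {}


def update_grade():
    hr["P-100"] = "GS-12"


def revert_grade():
    hr["P-100"] = "GS-11"


def post_budget_line():
    # Simulate the second system being unavailable mid-workflow.
    raise RuntimeError("finance system unavailable")


def remove_budget_line():
    finance.pop("P-100", None)


try:
    run_saga([
        ("update_grade", update_grade, revert_grade),
        ("post_budget_line", post_budget_line, remove_budget_line),
    ])
except SagaError as err:
    print(err)

print(hr["P-100"])  # the HR change was compensated after the finance step failed
```

A production version would also have to handle compensations that themselves fail and persist progress so the workflow can resume after a crash. Those are the partial-completion recovery concerns the comparison above names.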

The federal agencies furthest along on agentic deployment have, almost without exception, modernized their middleware layer first or in parallel. The agencies that have not are discovering, at deployment time, that the integration layer they have cannot absorb the workload. The remediation is a middleware modernization program with an AI deadline attached to it — which is a worse version of the same modernization program done a year earlier with no deadline pressure.

What this rules in and out

Four conditions reshape what federal program leadership should be doing in the current cycle:

The middleware layer is where federal agentic AI succeeds or fails. Foundation-model performance is becoming a baseline; the differentiator is integration depth. Programs that scope and fund a middleware modernization concurrently with AI deployment ship; programs that treat middleware as a downstream concern stall. The decision to fund the middleware work belongs in the AI program plan, not in a separate IT modernization track.
Legacy ERP is the load-bearing constraint. Federal HR, finance, and case-management ERP systems carry the agency's most consequential workflows. They are also the systems with the deepest customization and the oldest integration patterns. Agencies trying to deploy agentic AI without a deliberate strategy for the legacy ERP integration layer are scoping past the constraint that will actually bind the program.
Stall analysis should drive vendor evaluation. Federal AI procurements that score vendors primarily on model benchmarks are scoring on the layer where pilots rarely fail. Procurements that score on integration depth, federal-specific connector inventory, FedRAMP boundary handling, and legacy-system adapter patterns are scoring on the layer where pilots actually fail. The evaluation criteria should match where the risk lives.
The remediation curve is non-linear. The middleware modernization work an agency does in the 18 months before an AI deployment is fundamentally easier than the same work done in the 18 months after the deployment has stalled. The cost difference is not 2x — it is closer to 4x or 5x in the agencies FCI has observed, because post-stall remediation happens under deadline pressure, with active users, and with accumulated technical debt that pre-deployment work would not have created.
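The third condition above, matching evaluation criteria to where the risk lives, can be illustrated with a small worked calculation. The sketch weights each evaluation layer by the approximate stall shares reported earlier (integration ~49%, governance ~27%, deploy handover ~13%, model ~11%). Using stall shares directly as weights is one possible approach and is an assumption of this example, not FCI's scoring method. The two vendors and their 0–10 scores are invented for illustration.

```python
# Approximate stall shares from the stall analysis, in percent.
STALL_SHARE = {"integration": 49, "governance": 27, "deploy": 13, "model": 11}


def weighted_score(scores):
    """Weight per-layer scores (0-10) by stall share and return a 0-10 total."""
    return sum(STALL_SHARE[layer] * scores[layer] for layer in STALL_SHARE) / 100


# Hypothetical vendors: A is strongest at the model layer, B at integration.
vendor_a = {"integration": 3, "governance": 4, "deploy": 5, "model": 9}
vendor_b = {"integration": 8, "governance": 7, "deploy": 6, "model": 6}

score_a = weighted_score(vendor_a)
score_b = weighted_score(vendor_b)

print(f"Vendor A: {score_a:.2f}")
print(f"Vendor B: {score_b:.2f}")
```

Under these weights the vendor with the stronger benchmark result scores lower overall, because the model layer carries only about a tenth of the weight.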

The decision

Federal agentic AI is being built in the layer of the federal technology stack that gets the least public attention. The model is real but commodity; the integration layer is where the actual engineering decides whether deployments ship. The decision for federal technology leadership is not whether to deploy agentic AI inside HR, finance, or case-management workflows — the procurements are in market and the deployments are happening. The decision is whether the integration and middleware layer underneath is being scoped, funded, and governed as a first-class engineering concern, or whether the program will discover the gap at deployment time and pay for it twice. The agencies treating middleware as a downstream concern are choosing the second path. The agencies treating middleware as the primary scope are the ones whose pilots are shipping.⁵
