Federal AI hallucinations make news. The technology press treats them as a model problem: a failure mode of foundation models that better models will eventually solve. Inside federal AI deployments, the pattern looks different. Hallucinations are not happening because the model is wrong about the world. They are happening because the model is reading bad data and summarizing it confidently. The federal records the model retrieves are dirty, contradictory, weakly classified, and stale, and no model upgrade will fix that. The fix is data quality, and the tool most federal Documentum operators already own for surfacing it is sitting underused. Whether an agency puts that tool to work is now the difference between AI pilots that work and AI pilots that quietly stop.

What the model is actually reading

Federal AI pilots that use retrieval-augmented generation patterns are reading from federal Documentum repositories. Those repositories have been accumulating content since the 2000s, with each successive generation of operators contributing to a content estate that has never been comprehensively audited for quality. The content the model retrieves on any given query may include the current version of a record, an outdated draft, a near-duplicate with different metadata, a misclassified document filed under the wrong type, and a record whose retention metadata says it should have been disposed three years ago. The model has no way to distinguish among these. It returns a confident answer, citing the records as sources. The answer is wrong because the source set is wrong.

This is not a model failure. It is a data quality failure that the model is exposing for the first time. The records have been bad for years. The manual processes that previously consumed them tolerated the badness because human readers applied judgment, ignored stale versions, and mentally reclassified what the system had filed wrong. The AI applies no such judgment. It reads what is there.

Chart 01 · Federal records data quality by age

The records federal AI workloads retrieve from were accumulating quality debt long before AI arrived.

Older records carry the accumulated decisions of operators who left a decade ago. AI retrieves from all of them indiscriminately.

Composite quality score by record-creation cohort (AI retrieval threshold: 80%):
Pre-2005: 38%
2005 – 2009: 52%
2010 – 2014: 64%
2015 – 2019: 73%
2020+: 82%

Where most records sit: The two oldest cohorts together hold over half the federal estate and sit well below the data quality threshold federal AI retrieval requires. AI pilots running against those cohorts hallucinate not because the model failed but because the retrieved content was unfit for retrieval.
Composite quality score combining metadata completeness, classification accuracy, version consistency, and lineage integrity. Scores reflect a typical federal Documentum environment at the cohort midpoint. The 80% line is the directional threshold below which AI retrieval accuracy degrades sharply.
FCI Advisory observation across federal Documentum data-quality audits, FY24-Q4 through FY26-Q1

The data quality of federal records is heavily correlated with their age. Records created in the past five years sit inside content management environments with better classification discipline, cleaner type hierarchies, and more consistent metadata. Records from the 2000s carry the accumulated decisions of operators who left a decade ago, classification schemes that have since been deprecated, and type-based content models that nobody has rationalized in the intervening years. The federal AI workloads do not get to choose which records they retrieve from. They retrieve from everything.
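The composite score behind Chart 01 can be sketched in a few lines. The chart caption names four components but gives no weights, so the equal weighting below is an assumption, as are the function and variable names. The cohort scores and the 80% threshold are the ones reported in the chart.

```python
from statistics import mean

AI_RETRIEVAL_THRESHOLD = 80.0  # directional threshold from Chart 01


def composite_quality_score(metadata_completeness, classification_accuracy,
                            version_consistency, lineage_integrity):
    """Equal-weight average of four component scores, each on a 0-100 scale.

    Equal weighting is an assumption; the article does not state weights.
    """
    return mean([metadata_completeness, classification_accuracy,
                 version_consistency, lineage_integrity])


# Cohort composite scores as reported in Chart 01.
COHORT_SCORES = {
    "Pre-2005": 38,
    "2005-2009": 52,
    "2010-2014": 64,
    "2015-2019": 73,
    "2020+": 82,
}


def cohorts_below_threshold(scores, threshold=AI_RETRIEVAL_THRESHOLD):
    """Return the cohorts whose composite score falls below the threshold."""
    return [cohort for cohort, score in scores.items() if score < threshold]


if __name__ == "__main__":
    print(cohorts_below_threshold(COHORT_SCORES))
```

On the chart's numbers, every cohort except 2020+ falls below the threshold, which is why retrieval that cannot filter by cohort inherits the quality of the oldest records.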

Data quality has been a federal problem before AI made it expensive

The federal records estate has known about its data quality problems for years. Cleanup has been deferred not because the problems were invisible but because the cost of cleanup was visible and the cost of dirty data was not. Every modernization cycle since 2010 has included some line item for data quality or metadata enrichment — and most have been narrowed or eliminated under budget pressure, with the agency reasoning that records were "good enough" for current workflows.

This reasoning was correct for the workflows it was applied to. Manual case adjudication, FOIA processing with human review, contract management with human-in-the-loop oversight — all of these absorb data quality issues that an automated system could not. The AI workloads now arriving do not have human-in-the-loop tolerance. The cost of dirty data has shifted from invisible to visible, and the bill is coming due on records that have been quietly degrading for two decades.

"Federal AI hallucination is not a model failure. It is a data quality failure that the model is exposing. The records have been bad for years; the AI just stopped tolerating them."

What DqMan actually surfaces

DqMan — Documentum Query Manager — is part of the standard Documentum administrative toolkit.[1] It is not new. It is not vendor-marketed for AI. It has been in the platform for over a decade. What it does is allow Documentum administrators to run sophisticated queries against the content estate, surfacing specific data quality conditions: incomplete metadata, duplicate or near-duplicate records, misclassified content, broken type hierarchies, stale retention assignments, orphaned versions, and inconsistent classification across related records. The tool is not subtle; the data is.
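To give a sense of what such queries look like, here is a minimal sketch that keeps a few illustrative DQL checks in a Python dictionary. The check names, the conditions, the cutoff date, and the version-count limit are all assumptions chosen for illustration; the attributes referenced (`title`, `subject`, `r_content_size`, `r_modify_date`, `i_chronicle_id`) are standard `dm_document` attributes, but any real audit would need queries tuned to the agency's own type hierarchy.

```python
# Hypothetical check names mapped to illustrative DQL statements.
# The conditions and cutoffs are assumptions, not a recommended audit set.
DQ_CHECKS = {
    # Documents missing basic descriptive metadata.
    "incomplete_metadata": (
        "SELECT r_object_id, object_name FROM dm_document "
        "WHERE title IS NULLSTRING OR subject IS NULLSTRING"
    ),
    # Same name and same content size: candidates for duplicate review.
    "possible_duplicates": (
        "SELECT object_name, r_content_size, count(*) FROM dm_document "
        "GROUP BY object_name, r_content_size HAVING count(*) > 1"
    ),
    # Records untouched since an assumed cutoff date.
    "stale_records": (
        "SELECT r_object_id, object_name, r_modify_date FROM dm_document "
        "WHERE r_modify_date < DATE('01/01/2010', 'mm/dd/yyyy')"
    ),
    # Version trees with an unusually large number of versions.
    "version_sprawl": (
        "SELECT i_chronicle_id, count(*) FROM dm_document (ALL) "
        "GROUP BY i_chronicle_id HAVING count(*) > 10"
    ),
}


def render_audit_listing(checks):
    """Return a plain-text listing: a bracketed check name, then its query."""
    blocks = [f"[{name}]\n{query}" for name, query in checks.items()]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(render_audit_listing(DQ_CHECKS))
```

Keeping the checks as named data rather than ad hoc queries makes it easier to rerun the same audit after remediation and compare counts.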

Chart 02 · Hallucination rate by source data quality

AI hallucination is a function of the data the model retrieves, not the model itself.

Same model. Same prompts. The variable is source quality.

Hallucination rate by source quality score:
Below 50%: 58%
50 – 60%: 38%
60 – 70%: 22%
70 – 80%: 12%
Above 80%: 4%

The model is not the variable: Hallucination rate scales inversely with source data quality across every model tested. The "model upgrade" path moves a vendor's score from, say, 22% hallucination to 19%. The data quality path moves the same workload from 22% to 4%. The leverage is not where procurement is looking.
Hallucination rate measured across federal RAG deployments running comparable foundation models on content corpora at different data quality scores. "Hallucination" defined as the model producing an answer that contradicts or fabricates against the underlying retrieved record set.
FCI Advisory observation across federal AI pilot deployments, FY25-Q3 through FY26-Q1
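The leverage comparison in Chart 02 is simple arithmetic, sketched below. The function name is hypothetical; the rates (22% to 19% for a model upgrade, 22% to 4% for data quality work) are the article's own illustrative figures.

```python
def relative_reduction(before, after):
    """Fraction of the original rate that was removed (0.0 to 1.0)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Illustrative figures from the Chart 02 commentary.
model_upgrade = relative_reduction(22, 19)   # about 0.136
data_quality = relative_reduction(22, 4)     # about 0.818

if __name__ == "__main__":
    print(f"model upgrade removes {model_upgrade:.1%} of hallucinations")
    print(f"data quality work removes {data_quality:.1%} of hallucinations")
```

On those figures the data quality path removes about six times as much of the hallucination rate as the model upgrade path.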

The dimensions DqMan surfaces map almost exactly to the dimensions federal AI workloads care about — completeness, consistency, classification accuracy, lineage integrity, duplication, and freshness. The match is not coincidental. The data quality dimensions that matter for human readers are the same ones that matter for machine readers, just with different tolerance. Manual processes tolerated 30% metadata completeness; AI retrieval requires 80%. Manual processes tolerated misclassification rates of 15–20%; agentic retrieval requires under 5%. The tool has always been able to surface the gap. The work of running the audits has not been done.
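The tolerance gap described above can be expressed as a small readiness check. This is a sketch under stated assumptions: the class and function names are hypothetical, the manual misclassification limit uses the upper end of the 15–20% range, and "under 5%" is simplified to "at most 5%".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    min_metadata_completeness: float  # percent, must be at or above
    max_misclassification: float      # percent, must be at or below


# Figures from the paragraph above; boundary handling is an assumption.
MANUAL = Tolerance(min_metadata_completeness=30.0, max_misclassification=20.0)
AI_RETRIEVAL = Tolerance(min_metadata_completeness=80.0, max_misclassification=5.0)


def shortfalls(completeness, misclassification, tolerance):
    """Return the gap, in percentage points, for each dimension that fails."""
    gaps = {}
    if completeness < tolerance.min_metadata_completeness:
        gaps["metadata_completeness"] = tolerance.min_metadata_completeness - completeness
    if misclassification > tolerance.max_misclassification:
        gaps["misclassification"] = misclassification - tolerance.max_misclassification
    return gaps


if __name__ == "__main__":
    # A hypothetical environment: 45% complete metadata, 17% misclassified.
    print(shortfalls(45.0, 17.0, MANUAL))
    print(shortfalls(45.0, 17.0, AI_RETRIEVAL))
```

The same hypothetical environment passes the manual tolerance with no gaps and fails the AI tolerance on both dimensions, which is the shift the paragraph describes.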

Why federal operators underuse the tool

Three operational realities have kept DqMan on the shelf across most federal Documentum environments.

The first is staffing. Federal Documentum administrators were trained primarily on operational tasks — keeping the environment running, supporting end users, enforcing retention disposition, managing custom-code deployments. Systematic data quality auditing is a separate skill. It requires DQL fluency, pattern recognition across large content estates, and judgment about which findings matter operationally. Few federal operators were hired against that skill, and few have been developed into it.

The second is incentives. A clean data quality report is invisible — nothing breaks, nothing improves visibly. A dirty data quality report surfaces work that someone has to fund. Federal program incentives have favored leaving the audit unrun. The cost of running the audit is small; the cost of acting on it is large; the budget owner of the second number is rarely the same person who would benefit from the first.

The third is output legibility. DqMan surfaces findings as DQL result sets, query traces, and metadata anomaly listings. Translating those into "your AI pilot will hallucinate on this content corpus" requires interpretation that the tool does not provide on its own. Many agencies have run partial audits, gotten outputs they could not act on, and put the tool back on the shelf.
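One way to close the legibility gap is to translate an audited quality score into the hallucination band reported in Chart 02. The sketch below does only that lookup. The function names and the corpus name are hypothetical, and treating each band's lower bound as inclusive (so a score of exactly 80 lands in the top band) is an assumption the chart does not settle.

```python
from bisect import bisect_right

# Band edges and observed hallucination rates from Chart 02.
_BAND_EDGES = [50, 60, 70, 80]
_BAND_RATES = [58, 38, 22, 12, 4]  # percent, one per band


def expected_hallucination_rate(quality_score):
    """Look up the Chart 02 hallucination rate for a 0-100 quality score."""
    if not 0 <= quality_score <= 100:
        raise ValueError("quality_score must be between 0 and 100")
    return _BAND_RATES[bisect_right(_BAND_EDGES, quality_score)]


def audit_summary(corpus_name, quality_score):
    """Turn a score into a one-line statement a program owner can act on."""
    rate = expected_hallucination_rate(quality_score)
    return (f"{corpus_name}: quality score {quality_score}%, "
            f"expected hallucination rate about {rate}%")


if __name__ == "__main__":
    print(audit_summary("pre-2005 case files", 38))
```

The lookup is directional, not predictive: it restates the chart's observed bands and should not be read as a model of any specific environment.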

Chart 03 · What DqMan surfaces vs what AI needs

The dimensions DqMan can audit are the same dimensions federal AI workloads require.

The gap is wider than most operators realize before running the audit. The audit is the prerequisite.

Radar axes: Completeness, Consistency, Classification, Duplication, Lineage, Freshness. Series: what the AI workload requires; typical federal environment.

The shape of the gap: Six dimensions, six different shortfalls. The largest gaps tend to be in Classification (where misfiled records pollute retrieval) and Lineage (where the model can't trace a record's provenance). Those are also the dimensions DqMan is best suited to audit.
Composite radar showing typical federal Documentum environment data quality across six dimensions, compared with the requirements of federal AI retrieval workloads. Scores are illustrative aggregates representing the directional shape of the gap; specific environments vary.
FCI Advisory framework, derived from federal Documentum data-quality engagements

The capability gap in the typical federal Documentum environment, against what AI workloads actually require, is wider than most operators realize until they run the audit. The audit is the prerequisite — running it converts a vague worry about AI accuracy into a concrete punch list of remediations.

What changes when the tool gets used

Across federal agencies that have run systematic DqMan-enabled data quality audits before deploying AI workloads, the operational pattern is consistent. Retrieval precision improves from the 40–50% range into the 75–85% range. AI hallucination rates drop from 25–35% to under 10%. Time-to-answer in agentic workflows compresses severalfold (roughly 7x in the paired measurements, from 8.4s to 1.2s) as the model stops chasing contradictory sources. Audit defensibility, the agency's ability to defend AI-driven decisions to an inspector, moves from partial to full.

Chart 04 · Before and after systematic data quality work

Same model, same agency, same pilot. The variable that moved was the source data.

Four federal AI program metrics measured before and after DqMan-enabled data quality remediation.

Retrieval precision: 41% before, 79% after
Hallucination rate: 31% before, 6% after
Time-to-answer (relative): 8.4s before, 1.2s after
Audit defensibility: Partial before, Full after

The intervention moves four metrics at once: The data quality work doesn't pick one metric and improve it. It improves retrieval, hallucination, latency, and defensibility together, because all four are downstream of the same source.
Paired before/after measurement on federal AI pilots after systematic data-quality remediation on the underlying Documentum content corpus. Each pair represents the same agency, same model, same prompts — the data quality variable was the only change.
FCI Advisory observation across federal data-quality-led AI deployments, FY25-Q2 through FY26-Q1

The pattern is not specifically about DqMan. It is about systematic data quality work that DqMan happens to enable in federal Documentum environments. The same pattern shows up in non-Documentum federal AI deployments where comparable data quality tooling is used. The lesson holds across tools: the AI accuracy problem is mostly a data quality problem in disguise, and the data quality problem is solvable with tooling that federal agencies in many cases already own.

What this rules in and out

Four strategic conditions reshape how federal CIOs should be thinking about AI pilot readiness:

The decision

Federal agencies running AI pilots that hallucinate have a tooling question and a staffing question to ask before they have a model question. The tooling exists, often inside the agency's existing Documentum environment. The skills are scarce but not exotic. The decision for federal CIOs is whether to fund a data quality audit and the remediation it surfaces before the next AI pilot, or to discover the same findings the hard way after the pilot has failed to ship. The cost shape is the same either way. The timing changes who pays.[4]