The Witnesses Turn State’s Evidence — Luminity Digital
The Provenance Gap · Post 3 of 3 · May 2026


Posts 1 and 2 named the structural defect and the substrate that would close it. Post 3 walks the regulatory frame from the inside. Two regulatory stewards and one federal funder counting on the stewards. Each reaches the same gap from a different direction. None of them can close it on the current substrate.

May 2026 · Tom M. Gomez · Luminity Digital · 15 Min Read
Post 1 named the structural defect: the question healthcare’s safety culture wants to ask of an agentic action (what was the intent?) cannot be load-bearing when the system that produced the action also produces the answer. Post 2 named the substrate that would close the gap: four surfaces, Source, Action, Context, and Scope (SACS), at which alignment-grade provenance has to live. The standard contemplated it. The deployment did not populate it. Agents are widening a gap that was already there.

This post closes the series, not by proposing a substrate to be built (that work is downstream of this argument) but by walking the regulatory frame from the inside. The field is counting on three institutional actors to make agentic AI in healthcare safely deployable: two regulatory stewards and one federal funder counting on the stewards. Each operates in a different capacity. Each reaches the same gap from a different direction. None of them can close it without the substrate.

The witnesses the field is calling on are about to turn state’s evidence, not because they are hostile but because they are honest. Asked whether they can authorize agentic clinical AI on the current substrate, they have to answer no. That answer is in the documents. This post reads the documents.

The frameworks the field is counting on

The field is counting on three institutional actors to make agentic AI in healthcare safely deployable. Each operates in a different capacity.

HIPAA — the existing statutory privacy regime — governs the confidentiality of protected health information. It was enacted in 1996 and has been the foundation of patient privacy enforcement for thirty years.

SaMD — the FDA’s framework for Software as a Medical Device — governs the regulatory authorization of clinical software. It is the framework most directly built for clinical software, and the one most directly invoked when a vendor brings an AI tool to market.

ARPA-H — a U.S. federal funding agency within the Department of Health and Human Services supporting high-risk, high-reward biomedical and health research — is currently funding ADVOCATE, an ambitious supervisory-agent experiment for clinical AI validation. The program requires its grantees to seek FDA qualification for the tools they build. ARPA-H is not a regulator. It is the funder of a regulatory experiment by proxy.

Two Stewards, One Funder, Three Capacities

HIPAA — statutory privacy regime. Governs confidentiality of PHI.

SaMD — FDA framework for clinical software authorization. Governs whether software can be marketed for clinical use.

ARPA-H ADVOCATE — federal funder of a regulatory experiment. Funds grantees building toward FDA qualification of supervisory clinical AI.

Two stewards. One funder. Three different capacities. Each reaches the substrate gap from a different direction. The argument that follows takes each in turn — briefly for HIPAA, at length for SaMD, in detail for ARPA-H ADVOCATE — and shows where each one’s structural assumptions break against agentic deployment built on a substrate that cannot attest at SACS-level.

HIPAA: the framework that was never built for this

HIPAA is the easy witness. The gap is obvious once named.

HIPAA was designed for the privacy of protected health information at rest and in transit. Its enforcement model assumes a human actor whose intent and authorization can be subpoenaed, deposed, and held to account through professional and institutional process. The Privacy Rule and the Security Rule together govern who may access PHI, under what circumstances, with what safeguards. The framework has been the load-bearing patient-privacy regime in U.S. healthcare for three decades, and it does that job well.

It does not do the job Post 1 and Post 2 named.

For agentic action, HIPAA can detect confidentiality breaches — unauthorized access, improper disclosure, lost devices, breached databases. It cannot detect SACS-level provenance failures. It cannot attest whether a tool response the agent acted on came from where the agent claimed it came from. It cannot attest whether the agent’s emitted action was bound to authenticated source data. It cannot attest whether content of unverified provenance entered the agent’s working context. It cannot attest whether the action stayed inside the credential’s sanctioned envelope. HIPAA was not asked to do these things. It was not designed to do them. The framework’s silence on agentic provenance is not a defect of the framework — it is a fact about what the framework was built to govern.
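
One way to make the four missing attestations concrete is a per-action record with one flag per SACS surface. The sketch below is illustrative only: the class, its field names, and the mapping of each sentence above to a surface are assumptions of this example, not part of HIPAA, of the SACS definition in Post 2, or of any published standard.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SacsAttestation:
    """Hypothetical per-action record: one flag per SACS surface."""
    source_verified: bool   # tool response came from where the agent claimed
    action_bound: bool      # emitted action is bound to authenticated source data
    context_clean: bool     # no content of unverified provenance entered the context
    scope_respected: bool   # action stayed inside the credential's sanctioned envelope


def unattested_surfaces(record: SacsAttestation) -> list[str]:
    """Return the names of the surfaces this record fails to attest."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# An action can involve no confidentiality breach and still fail every check.
record = SacsAttestation(False, False, False, False)
print(unattested_surfaces(record))
```

The point of the sketch is the shape of the question, not the implementation: a breach-detection regime never has to populate a record like this, so it has nothing to say when every flag is false.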

The reason this matters for Post 3 is the institutional posture it creates. HIPAA’s enforcement apparatus — the Office for Civil Rights, the audit and breach notification regime, the civil monetary penalties — gives the field a reasonable confidence that something is watching. That confidence is well-placed for the questions HIPAA asks. It is misplaced for the questions agentic deployment raises. The framework that the field has the strongest enforcement intuitions about does not have a structural answer for the gap this series is mapping. It cannot. The witness has to be called for something else.

SaMD: the framework that was built for clinical software, and still cannot close this gap

SaMD is the harder witness. The Software as a Medical Device framework is the FDA’s framework most directly built for clinical software. It is the framework most often invoked when a clinical AI tool is brought to market. It is the framework that, structurally, has the best claim to authorize agentic clinical AI.

It still cannot close this gap. And the field is now openly acknowledging that fact.

The SaMD framework, as it currently operates, rests on a structural assumption: that the software being authorized has an intended use stable enough to be validated. Validation under SaMD asks whether the software, in its specified operating envelope, performs its intended function safely and effectively. The framework grew out of a world where clinical decision-support tools were largely static — rule engines, lookup tables, deterministic algorithms with versioned outputs. For that world, intended use and validated function were nearly identical, and the gating question — is this software safe and effective for its intended use? — could be answered through pre-market clinical evaluation.

Agentic AI in clinical scope does not satisfy the structural assumption underneath SaMD. Its intended use is not stable. Its behavior across a session is conditioned on retrieved context that the framework does not see, on tool calls the framework does not attest, on reasoning traces the framework cannot independently verify. The agent’s action emerges from the interaction of model, context, and tool surface — and the substrate underneath each of those interactions is, as Posts 1 and 2 established, not alignment-grade in any current deployment.

The field has begun to name this gap. A recent npj Digital Medicine paper introducing the Unified Nomenclature for Digital Clinical Software (UNDCS) taxonomy makes the structural argument explicit. The authors observe that current regulatory categories — SaMD, clinical decision support, software accessory — do not adequately distinguish between static clinical software, AI-enabled clinical software with defined boundaries, and autonomous agentic clinical AI operating across heterogeneous tool surfaces. The UNDCS taxonomy proposes new categories specifically to address what existing frameworks cannot characterize. The gap is named, by name, in peer-reviewed work the field is now circulating.

This is not a critique of SaMD. The framework was built for the clinical software the field deployed. The world changed. Intended use as a gating concept assumes a substrate that attests what the software did. For static software, the audit trail in the EHR is sufficient. For agentic deployment, it is not. The gating concept depends on exactly the substrate this series has been mapping — and the substrate is not there.

What this means in practice: the FDA can clear an agentic clinical AI tool today. The clearance authorizes the tool’s intended use as specified. The clearance does not — because it cannot — attest that the tool’s actions in deployment will satisfy SACS-level provenance. The framework most built for clinical software, asked to authorize agentic systems on the current substrate, becomes a witness to its own limit. The witness does not refuse. The witness answers honestly. The honest answer is that the framework can authorize the tool but cannot attest the action.

The Field Has Spoken

SaMD has turned state’s evidence. The UNDCS taxonomy authors, writing in npj Digital Medicine, name the structural gap explicitly: current regulatory categories do not cover autonomous agentic clinical AI. The framework most built for clinical software is the framework that has begun, in peer-reviewed publication, to acknowledge its own structural limit.

ARPA-H ADVOCATE: the experiment that meets the gap from the other side

ARPA-H is not a regulator. It is a federal funding agency within the Department of Health and Human Services supporting high-risk, high-reward biomedical and health research. The agency operates on the DARPA pattern: program managers set ambitious technical milestones, fund teams to attempt them, pull the plug if they do not deliver. The cultural posture is innovation-forward, fast-iterating, willing to take on problems other federal mechanisms cannot. The agency has produced consequential work in its short institutional history and is generally well-regarded as a vehicle for difficult research.

This posture matters for what comes next, because the program under examination — ADVOCATE — is a serious response to a real problem. The program acknowledges, in its own published documents, that the current paradigm for clinical AI deployment is not sustainable. ADVOCATE is funded to attempt a better paradigm. That work deserves intellectual respect, regardless of where its architecture settles. What follows is structural analysis, not institutional critique.

ADVOCATE — a multi-track ARPA-H program funded under ARPA-H-SOL-26-142 — is building an agentic AI system for cardiovascular disease management, with a supervisory-agent architecture for clinical AI validation as one of its central technical objectives. The program is structured over 39 months in two phases. It has three technical areas. TA1 develops a patient-facing autonomous AI agent capable of managing chronic cardiovascular disease — adjusting appointments, medications, diet, and exercise — with the explicit goal of providing specialist-level care continuously. TA2 — the most ambitious track structurally — builds a supervisory agent capable of monitoring TA1’s clinical AI in production to ensure continued safety and effectiveness. TA3 establishes the health-system integration and scalability studies. The program’s stated transformative goal, in its own language, is to create “a first-of-its-kind, reliable, FDA-authorized clinical agentic AI system that serves around the clock as a new, digital member of the clinical care team.” The program manager has described the objective as building “a technology that can essentially serve as a clinician-extender: an autonomous agent smart enough to understand patients’ treatment needs, which can both provide certain care autonomously as well as engage the clinical team as needed.”

The structural objective, named explicitly in the program’s materials, is to move past the current paradigm in which clinical AI outputs require provider review before action. That is, ADVOCATE is funded to build what the field would call AI-acting-and-AI-supervising for clinical use, with the explicit goal of reducing the human-in-the-loop requirement at scale while maintaining FDA-authorized safety.

TA2 — the supervisory agent — is the most structurally consequential component for this series. The ADVOCATE materials describe the supervisory agent as a disease-agnostic tool capable of real-time monitoring of clinical AI agents to ensure their continued safety and effectiveness across deployments. The program is structured to support FDA market authorization for TA1 and, separately, Medical Device Development Tool (MDDT) qualification for TA2 — the supervisory tool that the FDA’s own evaluators could use to validate other clinical AI tools post-deployment. The program anticipates close engagement with the FDA throughout the 39-month performance period.

The program’s solicitation, an ARPA-H Innovative Solutions Opening (ISO), names Model Context Protocol (MCP) as the orchestration layer for the supervisory agent’s interactions with the AI tools it supervises. MCP, as specified, does not include attestation primitives at the protocol level. The protocol carries content. It does not carry, structurally, the kind of cryptographic attestation that Source-level provenance, the first surface of SACS, would require. This is not a defect of MCP. The protocol was designed for context-sharing in agent orchestration, not for substrate-level attestation. But the ADVOCATE architecture, as published, treats the orchestration layer as if it carried attestation, and proposes to build a regulatory-grade supervisory tool on top of a substrate that does not, today, attest what passes through it.
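
To illustrate the difference between carrying content and carrying attestation, here is a minimal sketch of a signed envelope around a tool response. Everything in it is an assumption of this example: the envelope layout, the function names, and the use of HMAC with a shared key as a stand-in for whatever signature scheme a real substrate would adopt (most likely an asymmetric one). None of it is part of the MCP specification or of the ADVOCATE materials.

```python
import hashlib
import hmac
import json


def _canonical(body: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_tool_response(source_id: str, payload: dict, key: bytes) -> dict:
    """Wrap a tool response in a hypothetical envelope tagged by its source."""
    body = {"source_id": source_id, "payload": payload}
    tag = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_tool_response(envelope: dict, key: bytes) -> bool:
    """Check that the body is unchanged and was tagged with the expected key."""
    expected = hmac.new(key, _canonical(envelope["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["tag"])


key = b"demo-key-not-for-production"  # hypothetical shared secret
envelope = sign_tool_response("lab-results-service", {"potassium_mmol_l": 4.1}, key)
print(verify_tool_response(envelope, key))  # True

# Content altered in transit no longer verifies.
envelope["body"]["payload"]["potassium_mmol_l"] = 6.9
print(verify_tool_response(envelope, key))  # False
```

A bare content channel delivers the altered value exactly as readily as the original one. The envelope is what lets a downstream consumer, including a supervisory agent, tell the two apart.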

This is the structural meeting point. ADVOCATE meets the substrate gap from the regulatory side. The supervisory agent has to deliver SaMD-grade evidence about the clinical AI it monitors. The substrate underneath the supervisory agent has to attest what the supervised AI did — Source, Action, Context, Scope, at each interaction. The substrate, as Posts 1 and 2 established, cannot currently do this at alignment-grade. The protocol the architecture relies on does not, as specified, do this either. The gap meets ADVOCATE from the substrate side. They meet in the middle. There is nothing there.

The Supervisor’s Alibi Problem

A supervisory agent monitoring intent without an attested artifact substrate underneath it inherits Post 1’s structural problem one level up. The supervisor’s evaluation of the clinical AI’s action is itself an output of a probabilistic system. The supervisor’s claim about what the clinical AI did is bounded by the same provenance failure that bounded the clinical AI’s claim about what it did. Two systems whose outputs are not anchored externally do not resolve the alibi problem. They produce two alibis instead of one.

ADVOCATE is funded to run for 39 months in two phases, with multiple competitive down-selections at transition points. The program will produce real research, real datasets, real tooling, and real publications. Whether it produces what it is targeted to produce — an FDA-authorized patient-facing autonomous cardiovascular AI agent and a supervisory tool the FDA can qualify under MDDT for production validation of clinical AI — depends in part on whether the substrate underneath the supervisor reaches alignment-grade in the funded window. As of the publication of this post, the substrate is not there. The HL7 AI Transparency Implementation Guide referenced in Post 2 is in development. The FHIR Provenance resource is unpopulated at the individual level in U.S. healthcare deployments. The protocol layer ADVOCATE relies on does not carry attestation primitives in its specification.

The funder is asking for what the substrate cannot yet supply. The funder is not wrong to ask. The substrate is not wrong to be where it is. The architecture as published assumes the substrate will be there when the program needs it. That assumption is what this series has been mapping. It is the central structural question of agentic AI in healthcare. ADVOCATE is the highest-stakes federally-funded place where that question is currently being asked. And the question is being asked of a substrate that has not, yet, been built to answer it.

Closing the series

Two stewards. One funder. Three capacities. Three witnesses called.

HIPAA could not close the gap because it was not built to. SaMD cannot close the gap because the gating concept it depends on — intended use — assumes a substrate that does not, in current deployments, attest what the agent did. ADVOCATE meets the gap from the funder side and inherits the same problem one level up. Each witness, asked the question honestly, has to answer that the substrate is what would close the gap, and the substrate is not yet there.

This is the structural conclusion the series rests on:

The gating question for agentic AI in healthcare has to change. From what was the intent of the action — the question healthcare’s century-old safety culture was built to ask — to what does the substrate attest about the action. Both questions matter. Only one can be answered structurally. Only one can be load-bearing for systems whose actor cannot be subpoenaed.

The substrate has to come first. Before the regulatory frame can authorize agentic clinical AI at scale, the substrate that the regulatory frame depends on has to be built. The HL7 AI Transparency IG is one starting point. The FHIR Provenance resource, populated at the individual level, is another. The four surfaces — Source, Action, Context, Scope — are the architectural locations where the substrate work has to be done. Until each surface reaches alignment-grade, the field is deploying agentic systems on top of infrastructure that cannot attest what the systems do.
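
For readers who have not seen one, the sketch below shows roughly what a populated FHIR Provenance resource for a single agent-emitted action could look like, along with a trivial completeness check. It is an illustration under stated assumptions: the element names follow the FHIR R4 Provenance resource as I read it, the resource ids are hypothetical, and taking "populated at the individual level" to mean one Provenance record per individual action is one reading of the phrase, not a definition from the series.

```python
import json

# Hypothetical ids; element names follow the FHIR R4 Provenance resource.
provenance = {
    "resourceType": "Provenance",
    # What the record is about: the action the agent emitted.
    "target": [{"reference": "MedicationRequest/example-order"}],
    # When the provenance record itself was written.
    "recorded": "2026-05-01T12:00:00Z",
    # Who acted: here, the agentic system registered as a Device.
    "agent": [{"who": {"reference": "Device/example-clinical-agent"}}],
    # What the action was derived from: the source data it relied on.
    "entity": [{"role": "source", "what": {"reference": "Observation/example-lab"}}],
}

# Elements R4 marks as required on Provenance.
REQUIRED = ("target", "recorded", "agent")


def missing_elements(resource: dict) -> list[str]:
    """List required Provenance elements that are absent or empty."""
    return [name for name in REQUIRED if not resource.get(name)]


print(json.dumps(provenance, indent=2))
print(missing_elements(provenance))                      # []
print(missing_elements({"resourceType": "Provenance"}))  # ['target', 'recorded', 'agent']
```

Passing this check says only that a record exists and is well-formed. Whether its contents can be trusted is the attestation question the four surfaces raise.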

The Naming

The substrate is upstream of regulation, not produced by it.

The regulatory frame cannot lead. The two regulatory stewards and the federal funder examined in this post are doing what they were built to do, within the capacities they have. None of them can build the substrate the agentic deployment requires. The substrate is built upstream of regulation, not by it. The standards body specifies it. The deployment populates it. The enforcement frame requires it. The regulatory frame can only attest what the substrate makes attestable. Until the substrate exists, regulation can authorize tools but cannot govern actions.

This series sits inside a larger structural argument Luminity has been developing across the catalog. The Captured Vertical examined what happens when the deployment surface for clinical agentic AI consolidates inside a single closed-stack vendor. The Great Compression series mapped the broader pattern of agentic AI infrastructure compressing into a small number of substrate-providing platforms. The Provenance Gap series, closing here, names the substrate work that has to happen before either the closed stack or the open one can deliver alignment-grade agentic systems in clinical scope. The structural argument across the catalog is consistent: the substrate is the work. Everything else is downstream.

The patient is still alive in this post, because the deployments have not yet reached the scale at which the structural defect produces routine consequence. That is the window the field has. It is not a wide window. The deployment pattern is accelerating. The substrate work is lagging behind deployment. The gap is widening. Each post in this series has named one piece of what the field will have to do before that window closes.

The witnesses have turned state’s evidence. The frameworks the field is counting on cannot close this gap on the current substrate. The substrate is the work. The rest is downstream.

The Post 3 Claim

The three institutional actors that the field is counting on — HIPAA and SaMD as regulatory stewards, the ARPA-H ADVOCATE program as a federal funder counting on the stewards — each reach the substrate gap from a different direction, and none of them can close it on the current infrastructure. HIPAA was not built to attest agentic provenance. SaMD’s gating concept of intended use presumes a substrate that attests what the software did; for agentic deployment, that substrate is not alignment-grade. ARPA-H ADVOCATE’s supervisory-agent architecture is asking for regulatory-grade outputs from a substrate the protocol layer cannot today attest. The substrate is what would close the gap. The substrate is upstream of regulation, not produced by it. Until the four surfaces — Source, Action, Context, Scope — reach alignment-grade, the regulatory frame can authorize tools but cannot govern actions. The substrate is the work. The rest is downstream.

The Provenance Gap Series Closes Here

Three posts. Post 1 named the structural defect in intent verification for agents. Post 2 named the substrate — SACS, the four surfaces where alignment-grade provenance has to live. Post 3 walked the regulatory frame from the inside and showed that none of the institutional actors the field is counting on can close the gap without the substrate. The substrate is the work. The rest is downstream.
