If the failure is upstream (wrong-document retrieval) and at the boundary (fabricated or misattributed citation), then the fix is not a stronger generator.
It is a more disciplined retrieval layer and an enforced citation contract. The evidence points to three moves: retrieve by legal structure rather than surface similarity, enforce citation and permit abstention at generation, and verify the citation against ground truth before it is surfaced.
Structure beats similarity
The retrieval failures of Post 1 share a root cause: ranking by semantic resemblance over text fragments, in corpora where relevant and irrelevant documents read alike. The fix is to retrieve by legal structure. Modeling doctrine at the level of statutory factors and citation-weighted graphs, rather than by flat similarity, produces retrieval that tracks legal relevance instead of surface overlap and improves the doctrinal precision of what reaches the model [1]. This is the structural turn that the broader generative-IR literature frames as the foundation of a grounded system: the retrieval layer is not a search box bolted onto a model; it is the part of the architecture that decides what the model is allowed to reason over [3].
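To make "retrieve by legal structure" concrete, here is a minimal sketch under stated assumptions: a toy corpus where each document carries a hierarchy level, an effective-date window, and its outgoing citations. The hierarchy weights, the PageRank-style authority score, the blending parameter `alpha`, and every name below are hypothetical illustrations, not the method of [1]. What it shows is the three structural signals (hierarchy, authority, temporality) overriding raw similarity.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical authority weights by position in the legal hierarchy.
HIERARCHY_WEIGHT = {"statute": 1.0, "regulation": 0.8, "guidance": 0.5}


@dataclass
class Doc:
    doc_id: str
    level: str                                  # key into HIERARCHY_WEIGHT
    effective: date                             # date the provision took effect
    repealed: Optional[date] = None             # None while still in force
    cites: list = field(default_factory=list)   # doc_ids this document cites


def citation_authority(docs, damping=0.85, iters=50):
    """PageRank-style authority over the citation graph, by power iteration."""
    ids = [d.doc_id for d in docs]
    n = len(ids)
    score = {i: 1.0 / n for i in ids}
    for _ in range(iters):
        nxt = {i: (1 - damping) / n for i in ids}
        for d in docs:
            targets = [t for t in d.cites if t in score]
            if targets:
                share = damping * score[d.doc_id] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # A document citing nothing in the corpus spreads its mass evenly.
                for i in ids:
                    nxt[i] += damping * score[d.doc_id] / n
        score = nxt
    return score


def in_force(doc, as_of):
    """Temporal validity: effective on or before as_of and not yet repealed."""
    return doc.effective <= as_of and (doc.repealed is None or as_of < doc.repealed)


def structural_rank(docs, similarity, as_of, alpha=0.5):
    """Drop out-of-force documents, then blend similarity with structure."""
    authority = citation_authority(docs)
    top = max(authority.values())
    ranked = []
    for d in docs:
        if not in_force(d, as_of):
            continue
        structural = HIERARCHY_WEIGHT[d.level] * authority[d.doc_id] / top
        score = alpha * similarity[d.doc_id] + (1 - alpha) * structural
        ranked.append((d.doc_id, round(score, 3)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy corpus: the repealed regulation is the most "similar" document.
docs = [
    Doc("stat_A", "statute", date(2010, 1, 1)),
    Doc("reg_C", "regulation", date(2015, 1, 1), cites=["stat_A"]),
    Doc("guid_B", "guidance", date(2018, 1, 1), cites=["stat_A", "reg_C"]),
    Doc("reg_old", "regulation", date(2000, 1, 1),
        repealed=date(2012, 1, 1), cites=["stat_A"]),
]
similarity = {"stat_A": 0.70, "reg_C": 0.60, "guid_B": 0.80, "reg_old": 0.90}
ranked = structural_rank(docs, similarity, as_of=date(2020, 6, 1))
print(ranked)
```

In the toy run, the repealed regulation with the highest similarity (0.90) is never returned, and the heavily cited statute outranks the more similar but uncited guidance document. A linear blend is only one design choice; a production system might treat hierarchy and temporality as hard filters and learn the rest.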
The point is not a better embedding. It is that legal relevance is structural — hierarchy, authority, temporality — and a retriever blind to that structure will keep returning the confident wrong document.
Enforce the citation, permit the abstention
Better retrieval narrows the failure; it does not close the citation boundary. That requires changing the generation contract. A deployed pattern in a high-stakes US regulatory setting shows the shape: over IRS and state tax materials, the system enforces citation during generation, preserves page-level provenance, and, critically, abstains when the retrieved evidence is insufficient rather than answering anyway [2]. Each of the three directly counters a Post 1 failure: enforced citation counters fabrication, provenance counters wrong-document mismatch, and permitted abstention counters the model’s tendency to answer when it should decline.
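The contract can be approximated, after the fact, as a gate on the generator's output. [2] enforces citation during generation, which this sketch does not attempt. It assumes the generator emits claims, each tagged with a `(doc_id, page)` citation, and that retrieval returns scored passages carrying page-level provenance. `enforce_contract`, `min_evidence`, and the document identifiers are hypothetical names, not the deployed system's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Passage:
    doc_id: str
    page: int          # page-level provenance
    text: str
    score: float       # retrieval score, assumed to lie in [0, 1]


@dataclass(frozen=True)
class Claim:
    text: str
    cite: Optional[Tuple[str, int]]  # (doc_id, page) attached by the generator


@dataclass
class Answer:
    abstained: bool
    claims: List[Tuple[str, str]] = field(default_factory=list)  # (claim, "doc_id p.N")
    reason: str = ""


def enforce_contract(claims, passages, min_evidence=0.5):
    """Fail-closed citation gate: every claim must cite a retrieved page,
    and weak retrieval triggers abstention instead of an answer."""
    if not passages or max(p.score for p in passages) < min_evidence:
        return Answer(abstained=True, reason="retrieved evidence below threshold")
    retrieved = {(p.doc_id, p.page) for p in passages}
    accepted = []
    for c in claims:
        if c.cite is None or c.cite not in retrieved:
            # Uncited, or cited to a page that was never retrieved.
            return Answer(abstained=True, reason=f"unsupported claim: {c.text!r}")
        accepted.append((c.text, f"{c.cite[0]} p.{c.cite[1]}"))
    if not accepted:
        return Answer(abstained=True, reason="no cited claims")
    return Answer(abstained=False, claims=accepted)


# Hypothetical documents and claims, for illustration only.
passages = [
    Passage("fed_guidance_A", 42, "...", 0.82),
    Passage("state_rule_B", 3, "...", 0.61),
]
supported = [Claim("Claim grounded on page 42.", ("fed_guidance_A", 42))]
miscited = [Claim("Claim cited to an unretrieved page.", ("fed_guidance_A", 99))]

ok = enforce_contract(supported, passages)
bad = enforce_contract(miscited, passages)
weak = enforce_contract(supported, [Passage("fed_guidance_A", 42, "...", 0.2)])
print(ok, bad, weak, sep="\n")
```

This gate fails closed: a single claim cited to a page that was never retrieved makes the whole answer abstain rather than silently dropping that claim. Dropping unsupported claims and answering with the remainder is a defensible alternative, but it can change the meaning of an answer without telling the reader.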
This is the move that converts “we added RAG” into something defensible. The system is no longer trusted to cite faithfully because the model is good; it is constrained to cite, to show where the citation came from, and to stop when it cannot.
Verify the citation against ground truth
The last move assumes even an enforced citation can be wrong, and checks it before it is surfaced. Here the strongest methods are jurisdiction-neutral, and the series is candid about that. Citation grounding verifies a generated legal citation against a ground-truth citation graph, decomposing the check into whether the provision exists, whether it is contextually relevant, and whether it was valid at the relevant date — turning “is this citation real” into a measurable, three-part test [4]. Attribution-based re-ranking improves citation faithfulness by ranking passages on how much they actually drive the answer rather than how similar they look, correcting the very mismatch Post 1 identified [5]. And retrieval-grounded verification of citations against external sources catches fabricated or metadata-corrupted references outright [6].
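The three-part test described in [4] can be sketched against a toy ground-truth graph. Assumptions: the graph is a dictionary keyed by citation string, each provision records topic tags and an effective-date window, and "contextually relevant" is reduced to topic-tag overlap, a deliberate simplification of whatever relevance model [4] uses. The `Hyp. Code` citations and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Provision:
    cite: str
    topics: FrozenSet[str]          # simplified stand-in for legal context
    effective: date
    repealed: Optional[date] = None


def check_citation(cite, claim_topics, as_of, graph):
    """Decompose 'is this citation real?' into three separate checks."""
    prov = graph.get(cite)
    exists = prov is not None
    relevant = exists and bool(prov.topics & claim_topics)
    valid_at_date = exists and prov.effective <= as_of and (
        prov.repealed is None or as_of < prov.repealed
    )
    return {
        "exists": exists,
        "relevant": relevant,
        "valid_at_date": valid_at_date,
        "surface": exists and relevant and valid_at_date,
    }


# Hypothetical ground-truth graph.
graph = {
    "Hyp. Code § 101": Provision(
        "Hyp. Code § 101", frozenset({"residency", "filing"}), date(2015, 1, 1)),
    "Hyp. Code § 202": Provision(
        "Hyp. Code § 202", frozenset({"penalty"}), date(2001, 1, 1),
        repealed=date(2019, 1, 1)),
}
as_of = date(2021, 1, 1)
fabricated = check_citation("Hyp. Code § 999", {"filing"}, as_of, graph)
off_topic = check_citation("Hyp. Code § 101", {"penalty"}, as_of, graph)
repealed = check_citation("Hyp. Code § 202", {"penalty"}, as_of, graph)
sound = check_citation("Hyp. Code § 101", {"filing"}, as_of, graph)
print(fabricated, off_topic, repealed, sound, sep="\n")
```

Returning the three checks separately, rather than a single pass/fail, keeps the failure mode visible: a fabricated citation, an off-topic one, and a repealed one each call for a different response.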
These are method anchors, not US-native legal deployments — citation grounding and attribution re-ranking are demonstrated on non-US corpora, and citation verification is shown cross-domain. The principle is established and measurable; its US-legal instantiation is the work still arriving. Stating that is part of the discipline this series is about.
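Attribution-based re-ranking [5] scores a passage by how much the answer depends on it. One common way to approximate that dependence is leave-one-out ablation: remove each passage and measure how much support for the answer drops. The sketch below uses lexical token overlap as a stand-in for model-based support, an assumption made so the example runs without a model; it is not the method of [5], and the function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens; a crude stand-in for real analysis."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support(answer, passages):
    """Toy support: share of answer tokens found anywhere in the passages."""
    ans = tokens(answer)
    if not ans:
        return 0.0
    ctx = set().union(*(tokens(p) for p in passages))
    return len(ans & ctx) / len(ans)


def attribution_rerank(answer, passages):
    """Leave-one-out attribution: how much support is lost without each passage."""
    full = support(answer, passages)
    scored = []
    for i in range(len(passages)):
        rest = passages[:i] + passages[i + 1:]
        scored.append((full - support(answer, rest), i))
    # Highest attribution first; sort is stable, so ties keep retrieval order.
    scored.sort(key=lambda pair: -pair[0])
    return [(passages[i], round(gain, 3)) for gain, i in scored]


answer = "Returns are due April 15 under rule 101."
passages = [
    "General discussion of returns and due dates.",         # resembles the topic
    "Rule 101 sets April 15 as the date returns are due.",  # carries the answer
]
ranked = attribution_rerank(answer, passages)
print(ranked)
```

The passage that merely resembles the topic scores zero because removing it changes nothing, while the passage that actually carries the answer rises to the top. A real system would replace `support` with a model-based measure, such as the change in the generator's likelihood of the answer.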
The grounding layer, made to hold
Put the three together and the grounding layer stops being the weak link. Structure-aware retrieval delivers the right authority; an enforced citation contract with abstention makes the model cite it or stop; verification confirms the citation is real and relevant before it reaches the reader. Each move answers a specific Post 1 failure, and together they produce a retrieval-and-citation layer that carries its own evidence — which is exactly what the assurance stack of Series 23 requires of its grounding layer.
That is the relationship between the two series. Series 23 argued, at the level of architecture, that defensibility is structural. This series went into one of those structures and showed, with the failure evidence and the fixes, why the grounding layer is the hardest and most consequential one to get right.
The fix for legal RAG is structure and citation discipline, not a better model. Retrieve by legal structure, enforce the citation and permit the abstention, and verify the citation against ground truth — and the layer where legal AI fails becomes the layer where it earns trust.
A retrieval-and-citation layer that returns the right authority, cites it faithfully, and declines when it cannot is the difference between a system that reads grounded and a system that is.
This concludes Why Legal RAG Fails. For the architectural frame this series sits inside — the full assurance stack from grounding to governance — see Assurance by Architecture (Series 23).
