The obvious objection to this whole series is that better-trained models will close the coordination gap on their own.
This closer takes that objection seriously and tests it against the training literature.
The bet, stated plainly
The capability bet is that coordination is a side effect of intelligence — make each agent smart enough and teamwork follows. The anchor of this series already cut against it: CooperBench found that coding skill provides no protection against coordination overhead, with its weakest individual coder retaining the most capability under cooperation and a mid-tier coder the least (Khatua et al., 2026, CooperBench: Why Coding Agents Cannot be Your Teammates Yet, arXiv:2601.13295v2, preprint). The training literature lets us test the bet directly, by separating training that targets the individual from training that targets the team.
Training the individual: a clean win on the wrong axis
The strongest recent evidence on individual-agent training is also the clearest demonstration that capability is the wrong axis. A reinforcement-learning pipeline, rejection fine-tuning followed by multi-turn RL, takes an open-weight 72B model from 11.4% to 39.0% on SWE-bench Verified, competitive with much larger models and achieved without teacher distillation (Golubev et al., 2025, Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning, arXiv:2508.03501v2, preprint). It is a genuine advance in long-horizon, tool-using competence. It is also entirely single-agent: there is no coordination, no division of labor, no second agent anywhere in the method. It answers “can training raise individual agent skill?” with an emphatic yes and leaves “can training fix teamwork?” untouched. Read against CooperBench, that is the whole point: this is precisely the kind of capability that was shown not to predict coordination, now more than tripled on its own benchmark, with no reason to expect the coordination gap to move.
Training the team: a qualified yes
Coordination does respond to training when training targets it directly. The MAGRPO method models multi-LLM collaboration as a cooperative reinforcement-learning problem and optimizes a shared reward over the joint output. Under that single joint reward, the agents develop coordination behaviors they were never prompted to perform (fallback schemes, coordinator roles, strategy filters), and joint return rises far above untuned multi-agent prompting (Liu et al., 2025, LLM Collaboration with Multi-Agent Reinforcement Learning, arXiv:2508.04652v7, preprint). This is real evidence that teamwork can be installed by training rather than hoped for. The qualifications are equally real, and the series keeps them visible: the results are on two agents, roughly three-billion-parameter models, and short horizons; the rewards are hand-built metric proxies; and the best results lean on a frontier model as an in-the-loop advisor, raising the question of how much is learned cooperation versus distilled guidance. The honest reading is that training for coordination works in the small and is unproven at the scale enterprises deploy.
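To make "a shared reward over the joint output" concrete, here is a minimal sketch of a group-relative advantage computed from a joint reward. It illustrates the general idea only and is not the MAGRPO implementation: the function names, the toy reward, and the choice to normalize by the group's standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

JointOutput = Tuple[str, ...]  # one response per agent, scored together


def group_relative_advantages(
    joint_samples: Sequence[JointOutput],
    joint_reward: Callable[[JointOutput], float],
    eps: float = 1e-8,
) -> List[float]:
    """Score each sampled joint output once, then normalize within the group.

    Every agent that contributed to joint sample i receives the same
    advantage, so an agent's training signal only improves when the
    team's combined output improves.
    (Assumption: mean/std normalization; the paper may differ.)
    """
    rewards = [joint_reward(sample) for sample in joint_samples]
    baseline = mean(rewards)
    spread = pstdev(rewards)
    return [(r - baseline) / (spread + eps) for r in rewards]


def toy_joint_reward(sample: JointOutput) -> float:
    """Hypothetical proxy: reward two agents for producing complementary parts."""
    first, second = sample
    return 1.0 if {first, second} == {"helper", "main"} else 0.0


if __name__ == "__main__":
    group = [
        ("helper", "main"),    # complementary: rewarded
        ("main", "main"),      # duplicated work: not rewarded
        ("helper", "helper"),  # duplicated work: not rewarded
        ("main", "helper"),    # complementary: rewarded
    ]
    print(group_relative_advantages(group, toy_joint_reward))
```

Nothing in the reward mentions roles, yet the only way for either agent to earn a positive advantage is to avoid duplicating its partner. The reward function itself is the hand-built proxy that the qualifications above point to.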
There is a deeper point in how MAGRPO works. It still requires a structural reward design and feedback scaffolding to train against: training tunes the agents inside a structure; it does not dissolve the need for the structure. The same is true of adaptive partner-modeling: the A-ToM agent improves coordination not by reasoning harder but by wrapping the model in a lightweight loop that aligns its reasoning depth with its partner's (Mu et al., 2026, Adaptive Theory of Mind for LLM-based Multi-Agent Coordination, arXiv:2603.16264v1, AAAI 2026, preprint). Every training approach that improves coordination does so by building structure around or into the learning process, not by removing the need for it.
The theoretical promise, and its honest distance
There is a route on which coordination becomes a guaranteed property of training rather than an emergent hope. In cooperative multi-agent reinforcement learning, sufficiently high entropy regularization can force independently trained agents to converge on a single shared convention, making them compatible by construction — and the approach sets a new inter-seed cross-play record on Hanabi while doing it (Forkel et al., 2026, High Entropy Regularization Leads to Symmetry Equivariant Policies in Dec-POMDPs, arXiv:2511.22581v4, preprint). It is the cleanest statement of coordination by construction in the literature: align conventions at training time so independent agents interoperate without per-pair negotiation. Two honesty notes belong here. The result is in tabular multi-agent reinforcement learning on cooperative games, not language-model agents — the symmetries it aligns are formal game automorphisms, not natural-language conventions, and the bridge to LLM systems is the reader’s analogy, not the paper’s claim. And the authors themselves downgraded the convergence guarantee in an erratum, disclosing a flaw they could not repair, so the proof now holds only for an idealized exact-gradient case. It is a powerful conceptual prior and not a deployable technique.
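The intuition behind convention alignment can be seen in a toy that is far simpler than the paper's setting. The sketch below is not the authors' algorithm and makes no claim about Hanabi or Dec-POMDPs. It assumes a two-action matching game, treats a softmax temperature as the entropy weight, and uses an initial bias as a stand-in for a random seed. All names are hypothetical.

```python
import math
from typing import List


def softmax(values: List[float], temperature: float) -> List[float]:
    """Numerically stable softmax; higher temperature means more entropy."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def train_pair(init_bias: float, temperature: float, steps: int = 200) -> List[float]:
    """Self-play in a two-action matching game (payoff 1 on a match, else 0).

    Each agent repeatedly takes an entropy-regularized best response to its
    partner, which for a one-step game is a softmax over expected payoffs.
    `init_bias` plays the role of a random seed (an assumption of this toy).
    """
    p = [0.5 + init_bias, 0.5 - init_bias]
    q = list(p)
    for _ in range(steps):
        # Expected payoff of an action equals the probability the partner plays it.
        p, q = softmax(q, temperature), softmax(p, temperature)
    return p


def cross_play(policy_a: List[float], policy_b: List[float]) -> float:
    """Expected payoff when agents from two separate training runs are paired."""
    return sum(a * b for a, b in zip(policy_a, policy_b))


if __name__ == "__main__":
    for temperature in (0.1, 1.0):
        seed_one = train_pair(init_bias=+0.1, temperature=temperature)
        seed_two = train_pair(init_bias=-0.1, temperature=temperature)
        print(temperature, round(cross_play(seed_one, seed_two), 3))
```

At the low temperature each run locks onto its own arbitrary convention, so pairing agents across runs yields a payoff near zero. At the high temperature both runs settle on the same symmetric policy and cross-play reaches 0.5, the best any policy that treats the two interchangeable actions alike can do in this game. The toy shows compatibility across seeds, not strong performance, and it sits even further from language-model agents than the paper does.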
What production is actually doing
The most telling signal is what the leading production system does today. Anthropic’s multi-agent Research system achieves its coordination through orchestration structure and prompt design — a lead agent that decomposes and delegates, subagents with bounded scope, shared artifacts passed by reference — not through training agents to coordinate (Hadfield et al., 2025, How we built our multi-agent research system, Anthropic Engineering). A production team with every incentive and resource to train coordination into its models instead engineers it into the architecture, because that is what works now. That is the practical verdict on the capability bet: the deployable lever today is structure, and training for coordination is a research frontier that, where it works at all, works by adding structure of its own.
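The three structural elements named above (a lead that decomposes and delegates, subagents with bounded scope, artifacts passed by reference) can be sketched in a few lines. This is an illustrative skeleton, not Anthropic's implementation or API: every class and function name here is hypothetical, and the stub worker stands in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ArtifactStore:
    """Shared store: agents hand each other keys, not payloads."""
    items: Dict[str, str] = field(default_factory=dict)

    def put(self, key: str, content: str) -> str:
        self.items[key] = content
        return key  # the reference that gets passed around

    def get(self, key: str) -> str:
        return self.items[key]


@dataclass(frozen=True)
class Task:
    """A bounded assignment: one question and one output slot."""
    task_id: str
    question: str


Subagent = Callable[[Task, ArtifactStore], str]


def lead_agent(
    goal: str,
    decompose: Callable[[str], List[str]],
    worker: Subagent,
    store: ArtifactStore,
) -> List[str]:
    """Decompose the goal, delegate each scoped task, collect references."""
    tasks = [Task(f"task-{i}", q) for i, q in enumerate(decompose(goal))]
    return [worker(task, store) for task in tasks]


def stub_worker(task: Task, store: ArtifactStore) -> str:
    """Stand-in for a model call: writes its result, returns only the key."""
    return store.put(task.task_id, f"findings for: {task.question}")


if __name__ == "__main__":
    store = ArtifactStore()
    refs = lead_agent(
        goal="compare two caching strategies",
        decompose=lambda g: [f"{g}: latency", f"{g}: consistency"],
        worker=stub_worker,
        store=store,
    )
    print(refs)
    print(store.get(refs[0]))
```

No agent in this sketch is trained to coordinate. Who works on what and how results are handed back are fixed by the call structure, and the lead only ever sees references. That is the sense in which coordination is engineered into the architecture.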
The series, closed
Six posts, one argument. The coordination gap is structural, not a capability shortfall (Post 1). It lives at integration, not communication, so message volume is the wrong telemetry (Post 2). Structure closes the spatial half of it cleanly and leaves the semantic half open (Post 3). Governance makes the remaining cooperation observable and repairable (Post 4). The human is a load-bearing element whose placement, not presence, determines value (Post 5). And training, the natural escape hatch, turns out to confirm the thesis rather than dissolve it: capability training does not produce coordination, coordination training works only in the small and only by building structure into the process, and the theoretical route to coordination-by-construction lives, for now, in environments unlike the ones enterprises run. The conclusion an architect can act on is unchanged from the first post: build the structure — but building is not hand-building. The coordination structures this series describes are arriving as a provider-delivered substrate, a coordination-grade harness to adopt and govern: the shift named in The Provider Is the New Enterprise OS, and one of the architecture decisions enterprises are already making. That landing is the Luminity read, resting on the production evidence rather than the papers. Build the structure, and recognize it is arriving as a substrate to govern. Do not wait for the model.
The training literature confirms rather than retires the case for coordination by construction. Capability and coordination are different axes: training that raises one does not move the other, and the clearest individual-capability result in the corpus is single-agent by design.
Where training does improve coordination, it does so by engineering structure into the learning process — which is also what the leading production system has chosen. That structure is increasingly a provider-delivered substrate: a coordination-grade harness the enterprise adopts and governs rather than hand-builds — our read of where the viable path runs, continuous with The Provider Is the New Enterprise OS, not a finding of the corpus. Build the structure. Do not wait for the model.
