Can Training Fix Teamwork?

This is Post 6 of 6, the closer of Coordination by Construction, Series 19. The earlier posts established that the coordination gap is structural (The Coordination Gap Is an Architecture Problem), lives at integration (Talking Is Not Coordinating), has a buildable answer (Coordination by Construction), needs governance (Observable, Repairable Cooperation), and runs through a load-bearing human (The Human Is a Design Element). This post tests the natural escape hatch: whether better-trained models close the gap on their own. It runs alongside Series 17, Assurance, which frames assurance as a property built into the architecture; coordination by construction is that same discipline applied to how agents work together. Across the six posts, the argument rests on 13 research papers plus Anthropic’s production account, the evidence base for treating coordination as something engineered, not awaited.

The obvious objection to this whole series is that better-trained models will close the coordination gap on their own.

This closer takes that objection seriously and tests it against the training literature.

The bet, stated plainly

The capability bet is that coordination is a side effect of intelligence — make each agent smart enough and teamwork follows. The anchor of this series already cut against it: CooperBench found that coding skill provides no protection against coordination overhead, with its weakest individual coder retaining the most capability under cooperation and a mid-tier coder the least (Khatua et al., 2026, CooperBench: Why Coding Agents Cannot be Your Teammates Yet, arXiv:2601.13295v2, preprint). The training literature lets us test the bet directly, by separating training that targets the individual from training that targets the team.

Training the individual: a clean win on the wrong axis

The strongest recent evidence on individual-agent training is also the clearest demonstration that capability is the wrong axis. A reinforcement-learning pipeline (rejection fine-tuning followed by multi-turn RL) takes an open-weight 72B model from 11.4% to 39.0% on SWE-bench Verified, competitive with much larger models and achieved without teacher distillation (Golubev et al., 2025, Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning, arXiv:2508.03501v2, preprint). It is a genuine advance in long-horizon, tool-using competence. It is also entirely single-agent: there is no coordination, no division of labor, no second agent anywhere in the method. It answers “can training raise individual agent skill?” with an emphatic yes and leaves “can training fix teamwork?” untouched. Read against CooperBench, that is the whole point: this is precisely the kind of capability that was shown not to predict coordination, now more than tripled (39.0 / 11.4 ≈ 3.4×), with no reason to expect the coordination gap to move.

Training the team: a qualified yes

Coordination does respond to training when training targets it directly. Modeling multi-LLM collaboration as a cooperative reinforcement-learning problem and optimizing a shared reward over the joint output (the MAGRPO method) produces coordination behaviors the agents were never prompted to perform: fallback schemes, coordinator roles, and strategy filters emerge under a single joint reward, and joint return rises far above untuned multi-agent prompting (Liu et al., 2025, LLM Collaboration with Multi-Agent Reinforcement Learning, arXiv:2508.04652v7, preprint). This is real evidence that teamwork can be installed by training rather than hoped for. The qualifications are equally real, and the series keeps them visible: the results cover two agents, models of roughly three billion parameters, and short horizons; the rewards are hand-built metric proxies; and the best results lean on a frontier model as an in-the-loop advisor, which raises the question of how much is learned cooperation and how much is distilled guidance. The honest reading is that training for coordination works in the small and is unproven at the scale enterprises deploy.
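To make "a shared reward over the joint output" concrete, here is a minimal Python sketch of a group-relative advantage computed from one joint reward and credited to both agents. It is not the MAGRPO implementation: the subtask names, the `joint_reward` proxy, and the mean/std normalization (borrowed from single-agent GRPO) are all assumptions for illustration, and the paper's exact estimator may differ.

```python
from statistics import mean, pstdev

# Hypothetical subtasks the two-agent team must cover between them.
REQUIRED = {"parse", "validate", "store"}

def joint_reward(a_parts: set[str], b_parts: set[str]) -> float:
    """Hypothetical hand-built proxy: reward coverage of the joint output,
    penalize work that both agents duplicated."""
    coverage = len((a_parts | b_parts) & REQUIRED) / len(REQUIRED)
    overlap = len(a_parts & b_parts) / len(REQUIRED)
    return coverage - 0.5 * overlap

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style baseline (assumed here): normalize each reward against its group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One group of sampled joint rollouts: (agent A's subtasks, agent B's subtasks).
group = [
    ({"parse", "validate"}, {"store"}),     # clean division of labor
    ({"parse", "validate"}, {"validate"}),  # duplicated work, "store" missed
    ({"parse"}, {"parse"}),                 # both agents did the same thing
]
rewards = [joint_reward(a, b) for a, b in group]
advantages = group_relative_advantages(rewards)

# The same advantage is credited to both agents for a given joint rollout,
# so neither can improve its signal except by improving the team's output.
per_agent = {"agent_a": advantages, "agent_b": advantages}
```

The design point the sketch isolates is that the reward function never sees one agent's output alone. Whatever division of labor emerges does so because it raises the joint score, which is also why the reward proxy itself is a piece of hand-built structure.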

There is a deeper point in how MAGRPO works. It still requires a structured reward design and feedback scaffolding to train against: training tunes the agents inside a structure; it does not dissolve the need for the structure. The same is true of adaptive partner-modeling: the A-ToM agent improves coordination not by reasoning harder but by wrapping the model in a lightweight loop that aligns its reasoning depth with its partner’s (Mu et al., 2026, Adaptive Theory of Mind for LLM-based Multi-Agent Coordination, arXiv:2603.16264v1, AAAI 2026, preprint). Every training approach surveyed here that improves coordination does so by building structure around or into the learning process, not by removing the need for it.

The theoretical promise, and its honest distance

There is a route on which coordination becomes a guaranteed property of training rather than an emergent hope. In cooperative multi-agent reinforcement learning, sufficiently high entropy regularization can force independently trained agents to converge on a single shared convention, making them compatible by construction — and the approach sets a new inter-seed cross-play record on Hanabi while doing it (Forkel et al., 2026, High Entropy Regularization Leads to Symmetry Equivariant Policies in Dec-POMDPs, arXiv:2511.22581v4, preprint). It is the cleanest statement of coordination by construction in the literature: align conventions at training time so independent agents interoperate without per-pair negotiation. Two honesty notes belong here. The result is in tabular multi-agent reinforcement learning on cooperative games, not language-model agents — the symmetries it aligns are formal game automorphisms, not natural-language conventions, and the bridge to LLM systems is the reader’s analogy, not the paper’s claim. And the authors themselves downgraded the convergence guarantee in an erratum, disclosing a flaw they could not repair, so the proof now holds only for an idealized exact-gradient case. It is a powerful conceptual prior and not a deployable technique.
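The intuition behind entropy regularization and shared conventions can be shown in a one-step toy. This is the reader's illustration, not the paper's Dec-POMDP setting or its proof: the Q-values are hypothetical, and the only fact relied on is the standard result that the maximizer of expected value plus `tau` times entropy is a softmax over `Q / tau`.

```python
import math

def soft_policy(q: list[float], tau: float) -> list[float]:
    """Maximizer of E_pi[Q] + tau * H(pi) in a one-step problem: softmax(Q / tau)."""
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp((x - m) / tau) for x in q]
    total = sum(weights)
    return [w / total for w in weights]

def greedy_policy(q: list[float], tie_break: int) -> list[float]:
    """Unregularized argmax; the tie-break stands in for an arbitrary, seed-dependent convention."""
    best = max(q)
    ties = [i for i, x in enumerate(q) if x == best]
    choice = ties[tie_break % len(ties)]
    return [1.0 if i == choice else 0.0 for i in range(len(q))]

def entropy(p: list[float]) -> float:
    return -sum(x * math.log(x) for x in p if x > 0)

# Hypothetical values: actions 0 and 1 are interchangeable, action 2 is worse.
q = [1.0, 1.0, 0.2]

# Two independently trained greedy agents can settle on incompatible conventions.
convention_a = greedy_policy(q, tie_break=0)
convention_b = greedy_policy(q, tie_break=1)

# The regularized solution is unique and treats the symmetric actions identically.
soft = soft_policy(q, tau=0.5)
```

In this toy the regularized policy has no tie to break, so two agents solving the same problem independently arrive at the same answer. Raising `tau` pushes the policy further toward uniform, which is the trade the technique makes: compatibility is bought with a less sharply peaked policy.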

What the production view is actually doing

The most telling signal is what the leading production system does today. Anthropic’s multi-agent Research system achieves its coordination through orchestration structure and prompt design — a lead agent that decomposes and delegates, subagents with bounded scope, shared artifacts passed by reference — not through training agents to coordinate (Hadfield et al., 2025, How we built our multi-agent research system, Anthropic Engineering). A production team with every incentive and resource to train coordination into its models instead engineers it into the architecture, because that is what works now. That is the practical verdict on the capability bet: the deployable lever today is structure, and training for coordination is a research frontier that, where it works at all, works by adding structure of its own.
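The three structural elements named above (a lead that decomposes and delegates, subagents with bounded scope, artifacts passed by reference) can be sketched in a few lines of Python. This is an assumption-laden illustration, not Anthropic's implementation: every class and function name is hypothetical, the model calls are replaced by a stand-in string, and the subagents run sequentially for simplicity.

```python
from dataclasses import dataclass, field

@dataclass
class ArtifactStore:
    """Shared store: agents exchange short reference keys, not full payloads."""
    _items: dict[str, str] = field(default_factory=dict)

    def put(self, key: str, content: str) -> str:
        self._items[key] = content
        return key  # the reference that gets passed around

    def get(self, key: str) -> str:
        return self._items[key]

@dataclass(frozen=True)
class Task:
    topic: str
    allowed_tools: frozenset[str]  # bounded scope for the subagent

def subagent(task: Task, store: ArtifactStore) -> str:
    """Hypothetical worker: stays inside its scope and returns only a reference."""
    # Stand-in for a model call restricted to task.allowed_tools.
    finding = f"notes on {task.topic} via {sorted(task.allowed_tools)}"
    return store.put(f"artifact/{task.topic}", finding)

def lead_agent(question: str, topics: list[str], store: ArtifactStore) -> str:
    """Hypothetical lead: decompose, delegate, then integrate by dereferencing."""
    tasks = [Task(topic=t, allowed_tools=frozenset({"search"})) for t in topics]
    refs = [subagent(t, store) for t in tasks]
    body = "\n".join(store.get(r) for r in refs)
    return f"{question}\n{body}"

store = ArtifactStore()
report = lead_agent("Can training fix teamwork?", ["MAGRPO", "entropy"], store)
```

Nothing in the sketch depends on how capable the model behind `subagent` is. The coordination lives in the task boundaries and the store, which is the sense in which the structure, not the training, is the deployable lever.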

The series, closed

Six posts, one argument. The coordination gap is structural, not a capability shortfall (Post 1). It lives at integration, not communication, so message volume is the wrong telemetry (Post 2). Structure closes the spatial half of it cleanly and leaves the semantic half open (Post 3). Governance makes the remaining cooperation observable and repairable (Post 4). The human is a load-bearing element whose placement, not presence, determines value (Post 5). And training, the natural escape hatch, turns out to confirm the thesis rather than dissolve it: capability training does not produce coordination, coordination training works only in the small and only by building structure into the process, and the theoretical route to coordination-by-construction lives, for now, in environments unlike the ones enterprises run. The conclusion an architect can act on is unchanged from the first post: build the structure — but building is not hand-building. The coordination structures this series describes are arriving as a provider-delivered substrate, a coordination-grade harness to adopt and govern: the shift named in The Provider Is the New Enterprise OS, and one of the architecture decisions enterprises are already making. That landing is the Luminity read, resting on the production evidence rather than the papers. Build the structure, and recognize it is arriving as a substrate to govern. Do not wait for the model.

The Hard Claim

The training literature confirms, rather than retires, the case for coordination by construction. Capability and coordination are different axes: training that raises one does not move the other, and the clearest individual-capability result in the corpus is single-agent by design.

Where training does improve coordination, it does so by engineering structure into the learning process — which is also what the leading production system has chosen. That structure is increasingly a provider-delivered substrate: a coordination-grade harness the enterprise adopts and governs rather than hand-builds — our read of where the viable path runs, continuous with The Provider Is the New Enterprise OS, not a finding of the corpus. Build the structure. Do not wait for the model.


Better-trained models will not close the coordination gap on their own. Capability and coordination are different axes: training that raises individual capability does not move coordination, and where training does improve coordination it works only in the small and only by engineering structure into the learning process. The deployable answer today is architectural — which is what the leading production system has chosen. Build the structure; do not wait for the model.

The question: Will better-trained models close the coordination gap on their own?
Single-agent RL: Lifts one agent from 11.4% to 39.0% on SWE-bench Verified, a clean win on the individual axis with no coordination component. Capability is not coordination.
MAGRPO: Training for coordination directly does install it (emergent fallback, coordinator, and strategy-filter schemes), but on two agents, ~3B models, and short horizons, with structural reward scaffolding.
Dec-POMDP entropy: A theoretical route to coordination-by-construction via entropy regularization, but in MARL rather than LLMs, and the convergence guarantee was downgraded to an exact-gradient case in an erratum.
The precedent: Anthropic’s production system coordinates through orchestration and prompt design, not coordination training.
The conclusion: Build the structure. Do not wait for the model.

