Multi-agent cooperation through in-context co-player inference
Authors: Marissa A. Weis, Maciej Wołczyk, Rajai Nasser, Rif A. Saurous, Blaise Agüera y Arcas, João Sacramento, Alexander Meulemans (equal contribution; Google, Paradigms of Intelligence Team)
Venue: arXiv preprint
Published: February 2026
This paper addresses a core challenge in multi-agent reinforcement learning (MARL): enabling robust cooperation among self-interested agents without relying on hand-coded coordination, explicit opponent models, or artificial separations such as naive learners versus meta-learners.
The authors show that training sequence-model agents (e.g., transformer-style architectures) against a diverse pool of co-players naturally induces in-context learning. Agents use recent interaction history as context to infer and adapt to a partner's strategy within an episode, acting as fast-timescale learners without being explicitly programmed to do so.
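To make the in-context interface concrete, here is a minimal sketch in the iterated Prisoner's Dilemma. The paper's agents are learned transformer policies; the hand-written rule below is a hypothetical stand-in that only illustrates the interface: the action is a function of the whole episode history, so adaptation happens at inference time with no weight updates.

```python
# Toy sketch of a history-conditioned ("in-context") policy for the iterated
# Prisoner's Dilemma. Hypothetical stand-in for a trained sequence model.

C, D = "C", "D"  # cooperate / defect

def in_context_policy(history):
    """history: list of (my_action, co_player_action) pairs seen this episode."""
    if not history:
        return C  # open cooperatively
    # Infer the co-player's strategy from context: its observed cooperation rate.
    coop_rate = sum(1 for _, a in history if a == C) / len(history)
    # Reciprocate: cooperate with partners inferred to be cooperative.
    return C if coop_rate >= 0.5 else D

# A cooperative partner is met with cooperation...
print(in_context_policy([(C, C), (C, C)]))  # -> C
# ...while a defector is punished within the same episode, no retraining needed.
print(in_context_policy([(C, D), (C, D)]))  # -> D
```

The same within-episode adaptation emerges in the paper from training against a diverse co-player pool, rather than from any hand-written rule like the one above.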
This in-context adaptation creates vulnerability to exploitation (e.g., extortion strategies), which in turn generates mutual pressure during decentralized training, leading agents to shape each other's learning dynamics toward cooperative equilibria. The mechanism mirrors prior "learning-aware" MARL ideas but emerges organically from standard decentralized RL, sequence-model inductive biases, and co-player diversity; no hardcoded assumptions are required.
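The exploitation pressure can be illustrated with a self-contained toy simulation (hypothetical strategies and standard PD payoffs, not from the paper): a naive in-context adapter that reciprocates based on observed cooperation rate can be gamed by a co-player that defects exactly often enough to stay at the adapter's threshold, out-earning mutual cooperation at the adapter's expense.

```python
# Toy simulation: exploiting a naive in-context adapter in the iterated
# Prisoner's Dilemma. Strategies and threshold are hypothetical illustrations.

C, D = "C", "D"
# Standard PD payoffs (row player, column player): T=5, R=3, P=1, S=0.
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def naive_adapter(history):
    # Cooperates as long as the co-player's observed cooperation rate >= 1/2.
    if not history:
        return C
    coop_rate = sum(1 for _, a in history if a == C) / len(history)
    return C if coop_rate >= 0.5 else D

def exploiter(history):
    # Alternates C, D: keeps its measured cooperation rate exactly at the
    # adapter's threshold while defecting every other round.
    return C if len(history) % 2 == 0 else D

def play(p1, p2, rounds=100):
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        a1, a2 = p1(h1), p2(h2)
        r1, r2 = PAYOFF[(a1, a2)]
        s1, s2 = s1 + r1, s2 + r2
        h1.append((a1, a2))
        h2.append((a2, a1))
    return s1, s2

adapter_score, exploiter_score = play(naive_adapter, exploiter)
print(adapter_score, exploiter_score)  # -> 150 400
```

Over 100 rounds the exploiter scores 400 versus 300 for mutual cooperation, while the adapter collects only 150. Under decentralized training, being exploitable in this way is costly, which is the mutual shaping pressure described above.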
The work demonstrates emergent cooperation (e.g., in iterated Prisoner's Dilemma settings) and suggests a scalable, decentralized path toward cooperative multi-agent systems that leverages the power of large sequence models.