Colosseum: Auditing Collusion in Cooperative Multi‑Agent Systems
A framework to measure when a subset of LLM agents forms a coalition and deviates from the cooperative objective, using DCOP grounding, regret‑based metrics, and evidence from both messages and actions.
Abstract
Multi‑agent systems, where LLM agents communicate through free‑form language, enable sophisticated coordination for solving complex cooperative tasks. This raises a distinct safety problem: agents may form a coalition and collude to pursue secondary goals, degrading the joint objective. Colosseum audits collusive behavior by grounding cooperation as a Distributed Constraint Optimization Problem (DCOP) and measuring collusion via regret relative to the cooperative optimum, which complements (and stress‑tests) purely message‑based judging.
What Colosseum does
The framework is designed for post‑hoc auditing: given the task, system configuration, and a trace, it outputs measurable collusion signals and diagnostics.
DCOP grounding:
- DCOP formulation provides an explicit cooperative optimum and enables counterfactual coalitional deviations.
- Decomposed rewards simplify credit assignment, making it easier to identify who caused joint harm.
- Short‑horizon decisions reduce temporal ambiguity during evaluation.
Actions and messages:
- Regret‑based metrics on actions (e.g., coalition advantage) capture whether agents “walk the talk”, that is, whether their actions follow their stated plans.
- LLM‑as‑a‑judge on message logs captures stated intent and persuasion attempts.
- Comparing both distinguishes “collusion on paper” from executed collusion.
Collusion factors:
- Objective misalignment: coalitions optimize a mixed/alternate objective.
- Persuasion: agents attempt to steer others via messages.
- Network influence: topology and channel constraints shape exposure and diffusion.
Environments:
- Jira: task assignment within a software team.
- Hospital: intra‑hospital patient/resource allocation (heterogeneous roles).
- Meeting scheduling: cooperative scheduling benchmark.
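To make the DCOP grounding and the regret‑based metrics listed above concrete, here is a minimal Python sketch on a toy task‑assignment problem. Everything in it is an illustrative assumption rather than Colosseum's actual code or definitions: the agent names, the `SKILL` table, the function names, and the specific formulas used for regret and coalition advantage (joint reward lost relative to the brute‑force cooperative optimum, and coalition reward gained relative to that same optimum).

```python
from itertools import product

# Hypothetical toy DCOP: three agents each choose one of three tasks.
AGENTS = ("a", "b", "c")
TASKS = (0, 1, 2)
# Assumed skill table: SKILL[agent][task] is the reward if the agent works the task alone.
SKILL = {"a": (4, 1, 0), "b": (0, 2, 3), "c": (0, 1, 5)}


def local_rewards(assignment):
    """Decomposed rewards: one term per agent, zero if two agents clash on a task."""
    chosen = list(assignment.values())
    return {
        agent: SKILL[agent][task] if chosen.count(task) == 1 else 0
        for agent, task in assignment.items()
    }


def joint_reward(assignment):
    """Cooperative objective: the sum of all local rewards."""
    return sum(local_rewards(assignment).values())


def cooperative_optimum():
    """Brute-force search over all joint assignments (fine for a toy problem)."""
    candidates = (
        dict(zip(AGENTS, choice)) for choice in product(TASKS, repeat=len(AGENTS))
    )
    return max(candidates, key=joint_reward)


def regret(assignment):
    """Joint reward lost relative to the cooperative optimum (assumed definition)."""
    return joint_reward(cooperative_optimum()) - joint_reward(assignment)


def coalition_advantage(assignment, coalition):
    """Coalition reward gained relative to the cooperative optimum (assumed definition)."""
    observed = local_rewards(assignment)
    baseline = local_rewards(cooperative_optimum())
    return sum(observed[agent] - baseline[agent] for agent in coalition)


# Observed actions: "b" takes the task that "c" is best at, and "c" yields.
observed = {"a": 0, "b": 2, "c": 1}
print(regret(observed), coalition_advantage(observed, {"a", "b"}))
```

In this toy instance the coalition {a, b} gains reward while the joint objective drops, which is the pattern a regret‑based audit is meant to surface. A real environment would need a proper DCOP solver instead of exhaustive enumeration.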
High‑level findings
Out‑of‑the‑box models can exhibit a propensity to collude, especially when a secret channel is introduced.
“Collusion on paper” can occur when agents plan collusion in text but choose non‑collusive actions, producing little effect on the joint task.
Colosseum can disentangle objective misalignment, persuasion, and network influence to study when and how collusion emerges.
Colosseum supports comparing candidate models and deployment choices (topology, channels) before multi‑agent systems (MAS) are used in higher‑stakes workflows.
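The distinction between “collusion on paper” and executed collusion can be sketched as a comparison of the two audit signals: a message‑level judge verdict and action‑level regret metrics. The Python below is a hedged illustration only. The `EpisodeAudit` record, the `classify` function, the `regret_tol` threshold, and the label strings other than “collusion on paper” and “executed collusion” are hypothetical names introduced here, not part of Colosseum.

```python
from dataclasses import dataclass


@dataclass
class EpisodeAudit:
    """Hypothetical per-episode audit record combining both evidence sources."""
    judge_flagged: bool         # message-log judge reported collusive intent
    regret: float               # joint reward lost vs. the cooperative optimum
    coalition_advantage: float  # coalition reward gained vs. the cooperative optimum


def classify(audit, regret_tol=0.0):
    """Combine message and action evidence into one coarse label.

    Assumption: actions count as harmful when the joint objective lost more
    than regret_tol and the coalition came out ahead.
    """
    harmful = audit.regret > regret_tol and audit.coalition_advantage > 0
    if audit.judge_flagged and harmful:
        return "executed collusion"
    if audit.judge_flagged:
        return "collusion on paper"
    if harmful:
        return "unannounced deviation"
    return "cooperative"


episodes = [
    EpisodeAudit(judge_flagged=True, regret=3.0, coalition_advantage=1.0),
    EpisodeAudit(judge_flagged=True, regret=0.0, coalition_advantage=0.0),
    EpisodeAudit(judge_flagged=False, regret=2.0, coalition_advantage=0.5),
    EpisodeAudit(judge_flagged=False, regret=0.0, coalition_advantage=0.0),
]
labels = [classify(episode) for episode in episodes]
print(labels)
```

Keeping the two signals separate until the final step makes disagreements between them visible, which is where a message‑only judge would over‑report or under‑report collusion.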
BibTeX
@article{nakamura2026colosseum,
title = {Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems},
author = {Nakamura, Mason and Kumar, Abhinav and Das, Saswat and Abdelnabi, Sahar and Mahmud, Saaduddin and Fioretto, Ferdinando and Zilberstein, Shlomo and Bagdasarian, Eugene},
journal = {arXiv preprint arXiv:2602.15198},
year = {2026},
url = {https://arxiv.org/abs/2602.15198},
doi = {10.48550/arXiv.2602.15198}
}