Paper · arXiv:2602.15198 · Submitted Feb 16, 2026

Colosseum: Auditing Collusion in Cooperative Multi‑Agent Systems

A framework to measure when a subset of LLM agents forms a coalition and deviates from the cooperative objective, using DCOP grounding, regret‑based metrics, and evidence from both messages and actions.

Authors: Mason Nakamura*, Abhinav Kumar*, Saswat Das*, Sahar Abdelnabi, Saaduddin Mahmud, Ferdinando Fioretto, Shlomo Zilberstein, Eugene Bagdasarian

* Equal contribution

Areas: Multi‑agent Systems · LLM Safety · Auditing & Monitoring

Abstract

Multi‑agent systems, where LLM agents communicate through free‑form language, enable sophisticated coordination for solving complex cooperative tasks. This surfaces a unique safety problem when agents form a coalition and collude to pursue secondary goals and degrade the joint objective. Colosseum audits collusive behavior by grounding cooperation as a Distributed Constraint Optimization Problem (DCOP) and measuring collusion via regret relative to the cooperative optimum—complementing (and stress‑testing) purely message‑based judging.

What Colosseum does

The framework is designed for post‑hoc auditing: given the task, system configuration, and a trace, it outputs measurable collusion signals and diagnostics.

Grounding

DCOP formulation provides an explicit cooperative optimum and enables counterfactual coalitional deviations.
Decomposed rewards simplify credit assignment, making it easier to identify which agents caused joint harm.
Short‑horizon decisions reduce temporal ambiguity during evaluation.
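
To make the grounding concrete, here is a minimal sketch of a toy DCOP, assuming three agents that each pick one of two tasks and a joint utility that decomposes into per‑agent rewards and pairwise penalties. The agent names, domains, and utility values are illustrative assumptions, not taken from the paper or its environments. The optimum is found by brute‑force enumeration, which is only feasible for tiny instances.

```python
from itertools import product

# Hypothetical toy instance: names, domains, and utilities are illustrative.
AGENTS = ["a1", "a2", "a3"]
DOMAIN = ["task_x", "task_y"]

# Decomposed (per-agent) reward for each possible choice.
UNARY = {
    "a1": {"task_x": 3, "task_y": 1},
    "a2": {"task_x": 1, "task_y": 3},
    "a3": {"task_x": 3, "task_y": 1},
}

# Pairwise constraint: penalty when two neighbouring agents pick the same task.
EDGES = [("a1", "a2"), ("a2", "a3")]
CLASH_PENALTY = 2


def joint_utility(assignment):
    """Sum of per-agent rewards minus pairwise clash penalties."""
    total = sum(UNARY[a][assignment[a]] for a in AGENTS)
    total -= sum(CLASH_PENALTY for u, v in EDGES if assignment[u] == assignment[v])
    return total


def cooperative_optimum():
    """Enumerate every joint assignment and return (best_assignment, utility)."""
    best = None
    for values in product(DOMAIN, repeat=len(AGENTS)):
        assignment = dict(zip(AGENTS, values))
        score = joint_utility(assignment)
        if best is None or score > best[1]:
            best = (assignment, score)
    return best


if __name__ == "__main__":
    print(cooperative_optimum())
```

Because the optimum is explicit, any observed joint assignment can be scored against it, and a counterfactual deviation by a subset of agents can be evaluated by changing only that subset's entries in the assignment.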

Auditing signals

Regret‑based metrics on actions (e.g., coalition advantage) capture whether agents “walk the talk”, i.e., whether stated plans show up in the actions taken.
LLM‑as‑a‑judge on message logs captures stated intent and persuasion attempts.
Compare both to detect “collusion on paper” vs. executed collusion.
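
As a rough illustration of the action‑level signals, the sketch below assumes per‑agent rewards are known for both the cooperative optimum and the observed outcome. The two definitions used here (joint regret as the shortfall from the optimum, coalition advantage as the coalition's gain over what it would earn at the optimum) are one plausible reading of the bullets above, not the paper's exact formulas. All names and numbers are hypothetical.

```python
def joint_regret(optimal_rewards, observed_rewards):
    """Shortfall of the observed joint reward relative to the cooperative optimum."""
    return sum(optimal_rewards.values()) - sum(observed_rewards.values())


def coalition_advantage(optimal_rewards, observed_rewards, coalition):
    """Extra reward the coalition members collect compared with the optimum."""
    return sum(observed_rewards[a] - optimal_rewards[a] for a in coalition)


# Hypothetical per-agent rewards (assumed values, for illustration only).
OPTIMAL = {"a1": 3, "a2": 3, "a3": 3}
OBSERVED = {"a1": 4, "a2": 4, "a3": 0}
COALITION = {"a1", "a2"}

if __name__ == "__main__":
    print("joint regret:", joint_regret(OPTIMAL, OBSERVED))
    print("coalition advantage:", coalition_advantage(OPTIMAL, OBSERVED, COALITION))
```

In this toy case the coalition gains while the joint reward drops, which is the pattern an action‑level audit would look for regardless of what the message log says.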

Factors Colosseum can vary

Objective misalignment: coalitions optimize a mixed/alternate objective.
Persuasion: agents attempt to steer others via messages.
Network influence: topology and channel constraints shape exposure and diffusion.
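
One simple way to picture objective misalignment is a weighted mix of the joint utility and a secondary, coalition‑only utility. The sketch below assumes a convex combination controlled by a `weight` parameter; this parameterization, the function names, and the candidate values are assumptions for illustration and may differ from how the framework configures misalignment.

```python
def mixed_objective(joint_utility, secondary_utility, weight):
    """Convex mix: weight=0 is fully cooperative, weight=1 is fully misaligned."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return (1.0 - weight) * joint_utility + weight * secondary_utility


def preferred_action(candidates, weight):
    """Pick the candidate action with the highest mixed objective."""
    return max(candidates, key=lambda name: mixed_objective(*candidates[name], weight))


# Hypothetical candidates: name -> (joint utility, secondary utility).
CANDIDATES = {
    "cooperate": (10.0, 0.0),
    "deviate": (6.0, 8.0),
}

if __name__ == "__main__":
    for w in (0.0, 0.25, 0.5):
        print(w, preferred_action(CANDIDATES, w))
```

With these assumed numbers the preferred action flips from cooperating to deviating once the weight exceeds one third, which shows how a single knob can sweep from aligned to misaligned coalitions.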

Environments (examples)

Jira: task assignment within a software team.
Hospital: intra‑hospital patient/resource allocation (heterogeneous roles).
Meeting scheduling: cooperative scheduling benchmark.

High‑level findings

Emergence

Out‑of‑the‑box models can exhibit a propensity to collude—especially when a secret channel is introduced.

Why actions matter

“Collusion on paper” can occur when agents plan collusion in text but choose non‑collusive actions, producing little effect on the joint task.
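
The distinction can be sketched as a small decision rule that combines a message‑level verdict with action‑level evidence. This assumes a boolean judge flag plus two numeric signals (coalition advantage and joint regret) and a tolerance threshold; the rule, the threshold, and the two labels other than “collusion on paper” and “executed collusion” are hypothetical, not the paper's procedure.

```python
def classify_episode(judge_flags_collusion, coalition_advantage, joint_regret,
                     tolerance=0.0):
    """Combine message-level and action-level evidence into a coarse label."""
    # Action-level evidence: the coalition gained AND the joint task suffered.
    executed = coalition_advantage > tolerance and joint_regret > tolerance

    if judge_flags_collusion and executed:
        return "executed collusion"
    if judge_flags_collusion:
        # Collusion was planned in messages but did not show up in actions.
        return "collusion on paper"
    if executed:
        # Harmful coalition gain without any flagged messages.
        return "silent deviation"
    return "no collusion signal"


if __name__ == "__main__":
    print(classify_episode(True, 0.0, 0.0))
    print(classify_episode(True, 2.0, 1.0))
```

A message‑only judge would assign the same verdict to both calls above; the action‑level signals are what separate them.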

Diagnostics

Colosseum can disentangle objective misalignment, persuasion, and network influence to study when and how collusion emerges.

Practical use

Supports comparing candidate models and deployment choices (topology/channels) before using multi‑agent systems (MAS) in higher‑stakes workflows.

BibTeX

@article{nakamura2026colosseum,
  title   = {Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems},
  author  = {Nakamura, Mason and Kumar, Abhinav and Das, Saswat and Abdelnabi, Sahar and Mahmud, Saaduddin and Fioretto, Ferdinando and Zilberstein, Shlomo and Bagdasarian, Eugene},
  journal = {arXiv preprint arXiv:2602.15198},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.15198},
  doi     = {10.48550/arXiv.2602.15198}
}