Skip to yearly menu bar Skip to main content


Poster
in
Workshop: Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures

Talk Isn’t Always Cheap: Understanding Failure Modes in Multi-Agent Debate

Harsh Satija · Andrea Wynn · Gillian Hadfield


Abstract:

While multi-agent debate has been proposed as a promising strategy for improving AI reasoning ability, we find that debate can sometimes be harmful rather than helpful. While prior work has exclusively focused on debates within homogeneous groups of agents, we explore how diversity in model capabilities influences the dynamics and outcomes of multi-agent interactions. Through a series of experiments, we demonstrate that debate can lead to a decrease in accuracy over time — even in settings where stronger (i.e., more capable) models outnumber their weaker counterparts. Our analysis reveals that models frequently shift from correct to incorrect answers in response to peer reasoning, favoring agreement over challenging flawed reasoning. These results highlight important failure modes in the exchange of reasons during multi-agent debate, suggesting that naive applications of debate may cause performance degradation when agents are neither incentivized nor adequately equipped to resist persuasive but incorrect reasoning.

Chat is not available.