Poster
The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback
Côme Fiegel · Pierre Menard · Tadashi Kozuno · Michal Valko · Vianney Perchet
East Exhibition Hall A-B #E-1902
We study the design of efficient algorithms that learn optimal strategies for playing games. We focus on a setting where two independent algorithms learn by repeatedly playing against each other, without any communication.

Most existing works assume that the algorithms can freely experiment against each other and only need to output a good final strategy. In this article, we add a constraint: the strategies actually played must improve over time. In particular, an algorithm is not allowed to submit a potentially "dumb" strategy at a given round just as a test. We show that this constraint makes learning harder, and we propose two methods that are mathematically near-optimal.

This research applies to learning actual games such as poker, but not only: many important machine learning problems can be cast in this framework. For example, when two language models each propose a response, the winner can be the one chosen by the user.
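To make the setting concrete, here is a minimal sketch of the interaction protocol in a small zero-sum matrix game: two players, each with its own strategy, repeatedly play against each other with no communication, and each observes only the payoff of the single action pair that was played (bandit feedback). The specific game, learning rate, and Exp3-style update rule below are illustrative assumptions, not the near-optimal algorithms from the paper; the sketch only shows why the "last iterate" (the strategy currently being played) is what must improve over time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-sum matrix game: row player gets A[i, j], column player gets -A[i, j].
# (Matching Pennies, chosen only as an illustration.)
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
n_rows, n_cols = A.shape

# Each player keeps its own strategy and never sees the other's strategy or
# the full payoff matrix.
log_w_row = np.zeros(n_rows)
log_w_col = np.zeros(n_cols)
eta = 0.05   # illustrative learning rate
T = 10_000

def softmax(log_w):
    z = log_w - log_w.max()
    p = np.exp(z)
    return p / p.sum()

for t in range(T):
    p = softmax(log_w_row)   # current (last-iterate) strategy of the row player
    q = softmax(log_w_col)   # current (last-iterate) strategy of the column player

    # Each player samples an action from the strategy it currently commits to:
    # no separate "test" strategies are allowed, so exploration must be part of
    # the played strategy itself.
    i = rng.choice(n_rows, p=p)
    j = rng.choice(n_cols, p=q)

    # Bandit feedback: each player observes only its own realized payoff.
    r_row = A[i, j]
    r_col = -A[i, j]

    # Importance-weighted estimate + multiplicative-weights update (Exp3-style,
    # used here only to illustrate the protocol, not the paper's methods).
    g_row = np.zeros(n_rows); g_row[i] = r_row / p[i]
    g_col = np.zeros(n_cols); g_col[j] = r_col / q[j]
    log_w_row += eta * g_row
    log_w_col += eta * g_col

# Last-iterate convergence asks that (p, q) itself approaches equilibrium,
# not merely the time-average of the strategies played over the T rounds.
print("final row strategy:", softmax(log_w_row))
print("final col strategy:", softmax(log_w_col))
```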