Poster
Best of Both Worlds: Regret Minimization versus Minimax Play
Adrian Müller · Jon Schneider · Efstratios Panteleimon Skoulakis · Luca Viano · Volkan Cevher
West Exhibition Hall B2-B3 #W-818
When repeatedly playing a game such as Rock-Paper-Scissors or Poker against an unknown opponent, the following dilemma arises: should one (a) compute a strong strategy, such as a minimax strategy, and play it in every round, or (b) run a learning algorithm that automatically adapts to the opponent's play over time? The first approach (a) guarantees that, in expectation, one loses nothing against the opponent. Yet this static approach comes at the cost of potentially missing out on systematically winning against a weak opponent. The second approach (b), by contrast, guarantees systematic wins against such weak opponents. However, it also risks losing a significant amount while the algorithm is still learning. In this paper, we show that, perhaps surprisingly, it is possible to essentially guarantee the benefits of both approaches in many games of interest, even when one does not observe all of the information the learning algorithm could benefit from. This implies that in such games, one can indeed hope to systematically win against weak opponents while risking only a small expected loss, even if the opponent turns out to be strong.
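To make the dilemma concrete, here is a minimal illustrative sketch (not taken from the paper, and not the paper's algorithm): a toy simulation of repeated Rock-Paper-Scissors against a weak, biased opponent, comparing (a) always playing the minimax strategy with (b) the standard Hedge (multiplicative weights) regret-minimization algorithm. The opponent distribution, horizon, and learning rate below are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Row player's payoff matrix for Rock-Paper-Scissors:
# rows/columns = (Rock, Paper, Scissors); A[i, j] is the payoff of playing i against j.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

T = 10_000                              # number of rounds (arbitrary)
opponent = np.array([0.5, 0.3, 0.2])    # hypothetical weak opponent that over-plays Rock

# (a) Static minimax play: the uniform strategy secures the game value (0 for RPS).
minimax = np.ones(3) / 3
minimax_total = T * (minimax @ A @ opponent)

# (b) Hedge / multiplicative weights, adapting to the opponent's observed moves.
eta = np.sqrt(np.log(3) / T)            # standard learning-rate choice
weights = np.ones(3)
hedge_total = 0.0
for _ in range(T):
    p = weights / weights.sum()         # current mixed strategy
    j = rng.choice(3, p=opponent)       # opponent's (i.i.d., biased) move
    hedge_total += p @ A[:, j]          # expected payoff this round
    weights *= np.exp(eta * A[:, j])    # multiplicative update on rewards

print(f"minimax play, total expected payoff: {minimax_total:.1f}")
print(f"Hedge,        total expected payoff: {hedge_total:.1f}")
```

Against this weak opponent, Hedge's cumulative payoff grows roughly linearly in the number of rounds, while the static minimax strategy only secures the game value of 0; against a strong opponent, however, Hedge may lose a nontrivial amount while it is still learning. The poster's result concerns guaranteeing (essentially) the best of both behaviors simultaneously, which this toy comparison does not attempt to do.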