Poster
Best of Both Worlds: Regret Minimization versus Minimax Play
Adrian Müller · Jon Schneider · Efstratios Panteleimon Skoulakis · Luca Viano · Volkan Cevher
West Exhibition Hall B2-B3 #W-818
When repeatedly playing a game such as Rock-Paper-Scissors or Poker against an unknown opponent, the following dilemma arises: should one (a) compute a strong strategy, such as a minimax strategy, and play it in every round, or (b) run a learning algorithm that automatically adapts to the opponent's play over time? The first approach (a) guarantees that, in expectation, one loses nothing against the opponent. Yet this static approach comes at the cost of potentially missing out on systematically winning against a weak opponent. The second approach (b), by contrast, guarantees systematic wins against such weak opponents. However, it also risks losing a significant amount while the algorithm is still learning. In this paper, we show that, perhaps surprisingly, it is possible to essentially guarantee the benefits of both approaches in many games of interest, even when one does not observe all of the information the learning algorithm could benefit from. This implies that in such games, one can indeed hope to systematically win against weak opponents while risking only a small expected loss, even if the opponent turns out to be strong.
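To make the dilemma concrete, here is a minimal illustrative sketch (not taken from the paper, and not the paper's algorithm): a toy simulation of repeated Rock-Paper-Scissors against a weak, biased opponent, comparing (a) always playing the minimax strategy with (b) the standard Hedge (multiplicative weights) regret-minimization algorithm. The opponent distribution, horizon, and learning rate below are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Row player's payoff matrix for Rock-Paper-Scissors:
# rows/columns = (Rock, Paper, Scissors); A[i, j] is the payoff of playing i against j.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

T = 10_000                              # number of rounds (arbitrary)
opponent = np.array([0.5, 0.3, 0.2])    # hypothetical weak opponent that over-plays Rock

# (a) Static minimax play: the uniform strategy secures the game value (0 for RPS).
minimax = np.ones(3) / 3
minimax_total = T * (minimax @ A @ opponent)

# (b) Hedge / multiplicative weights, adapting to the opponent's observed moves.
eta = np.sqrt(np.log(3) / T)            # standard learning-rate choice
weights = np.ones(3)
hedge_total = 0.0
for _ in range(T):
    p = weights / weights.sum()         # current mixed strategy
    j = rng.choice(3, p=opponent)       # opponent's (i.i.d., biased) move
    hedge_total += p @ A[:, j]          # expected payoff this round
    weights *= np.exp(eta * A[:, j])    # multiplicative update on rewards

print(f"minimax play, total expected payoff: {minimax_total:.1f}")
print(f"Hedge,        total expected payoff: {hedge_total:.1f}")
```

Against this weak opponent, Hedge's cumulative payoff grows roughly linearly in the number of rounds, while the static minimax strategy only secures the game value of 0; against a strong opponent, however, Hedge may lose a nontrivial amount while it is still learning. The poster's result concerns guaranteeing (essentially) the best of both behaviors simultaneously, which this toy comparison does not attempt to do.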