Poster
Learning Imperfect Information Extensive-form Games with Last-iterate Convergence under Bandit Feedback
Canzhe Zhao · Yutian Cheng · Jing Dong · Baoxiang Wang · Shuai Li
West Exhibition Hall B2-B3 #W-706
In competitive games like poker, players must make decisions without knowing their opponent’s moves—a setting known as imperfect information. Finding optimal strategies (Nash equilibria) in such games is challenging, especially when players learn only from their own gameplay (bandit feedback) rather than observing the full game dynamics.

Our work introduces a new algorithm that lets players efficiently improve their strategies over time, guaranteeing convergence to a near-optimal solution. Unlike prior methods, which either require full knowledge of the game or lack convergence guarantees, our approach uses a carefully designed regularization technique to balance exploration and exploitation. We prove that our method converges reliably even with this limited feedback, and we demonstrate its effectiveness across various game scenarios. Potential applications include AI for games, strategic decision-making, and real-world negotiations where information is incomplete.

In short: we propose a practical way for AI (or humans) to learn strong strategies in competitive, hidden-information settings, with theoretical guarantees and strong empirical results.
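To give a flavor of the two core ideas—importance-weighted bandit feedback and regularization that stabilizes the last iterate—here is a minimal sketch on the simplest possible case: a 2x2 zero-sum matrix game (matching pennies) rather than an extensive-form game. This is an illustrative toy, not the paper's algorithm: each player samples one action per round, sees only that payoff, builds an unbiased importance-weighted loss estimate, and runs mirror descent with an entropy penalty (strength `mu`, a hypothetical parameter here) that pulls the last iterate toward the regularized equilibrium instead of letting it cycle. All function and parameter names below are our own for illustration.

```python
import numpy as np

def bandit_regularized_md(A, eta=0.01, mu=0.5, gamma=0.1, T=20000, seed=0):
    """Entropy-regularized mirror descent for both players of the zero-sum
    game with row-player payoff matrix A (entries in [-1, 1]), under bandit
    feedback: each round, each player observes only the payoff of the
    sampled action pair and estimates losses via importance weighting."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    x = np.ones(n) / n          # row player's mixed strategy (maximizer)
    y = np.ones(m) / m          # column player's mixed strategy (minimizer)
    for _ in range(T):
        # Mix in uniform exploration so importance weights stay bounded.
        px = (1 - gamma) * x + gamma / n
        py = (1 - gamma) * y + gamma / m
        i = rng.choice(n, p=px)
        j = rng.choice(m, p=py)
        # Losses rescaled to [0, 1]: row wants A[i, j] large, column small.
        loss_row = (1 - A[i, j]) / 2
        loss_col = (1 + A[i, j]) / 2
        # Unbiased importance-weighted loss estimates (zero off the
        # sampled action).
        lx = np.zeros(n); lx[i] = loss_row / px[i]
        ly = np.zeros(m); ly[j] = loss_col / py[j]
        # Regularized multiplicative update: the extra mu * log(x) term
        # contracts the *last iterate* toward the regularized equilibrium
        # rather than only the time-average converging.
        x = x * np.exp(-eta * (lx + mu * np.log(x)))
        x /= x.sum()
        y = y * np.exp(-eta * (ly + mu * np.log(y)))
        y /= y.sum()
    return x, y

# Matching pennies: the unique Nash equilibrium is uniform for both
# players, so here the entropy-regularized equilibrium coincides with it.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
x, y = bandit_regularized_md(A)
```

Without the `mu * log(x)` term this reduces to Exp3-style multiplicative weights, whose last iterate is known to orbit the equilibrium in matching pennies; the regularizer is what damps that rotation, which is the intuition (in a much simpler setting) behind last-iterate guarantees under bandit feedback.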