Poster
Policy-Regret Minimization in Markov Games with Function Approximation
Thanh Nguyen-Tang · Raman Arora
West Exhibition Hall B2-B3 #W-908
In many real-world situations, from autonomous vehicles to online recommendation systems, multiple decision-makers (or agents) must interact with one another while learning how to make better choices. When these agents have competing goals, learning can become difficult and unpredictable. In this work, we design a learning algorithm that helps a decision-maker perform well even when facing an opponent that learns and adapts over time. Unlike prior work, our approach applies to complex settings where the environment is too large to fully memorize and where the opponent's behavior must be approximated. We show that our algorithm learns to make good decisions over time, and we provide the first theoretical guarantees for success in such challenging situations. Our results could help build safer and more reliable AI systems that learn through repeated interaction.