

Poster

Policy-Regret Minimization in Markov Games with Function Approximation

Thanh Nguyen-Tang · Raman Arora

West Exhibition Hall B2-B3 #W-908
Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract: We study the policy-regret minimization problem in dynamically evolving environments, modeled as Markov games between a learner and a strategic, adaptive opponent. We propose a general algorithmic framework that achieves the optimal $\mathcal{O}(\sqrt{T})$ policy regret for a wide class of large-scale problems characterized by an Eluder-type condition, extending beyond the tabular settings of previous work. Importantly, our framework uncovers a simpler yet powerful algorithmic approach for handling reactive adversaries, demonstrating that leveraging opponent learning in such settings is key to attaining the optimal $\mathcal{O}(\sqrt{T})$ policy regret.
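For context, a standard formalization of policy regret (as opposed to external regret) against an adaptive opponent compares the learner's realized cumulative reward with the reward it would have collected by committing to a single fixed policy from round one, with the opponent's responses re-simulated against that fixed policy. A hedged sketch of this notion, in the per-round form familiar from the online-learning literature, is below; the paper's Markov-game formulation replaces per-round rewards with episodic values, and its exact definition may differ in details.

$$
\mathrm{PolicyReg}(T) \;=\; \max_{\pi \in \Pi}\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\big(\pi,\; \nu_t(\pi, \dots, \pi)\big)\right] \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\big(\pi_t,\; \nu_t(\pi_1, \dots, \pi_t)\big)\right],
$$

where $\pi_t$ is the learner's policy at round $t$, $\nu_t(\cdot)$ denotes the adaptive opponent's response to the learner's play history, and $r_t$ is the learner's reward. The first term re-plays the opponent against the fixed comparator policy, which is what distinguishes policy regret from external regret when the opponent reacts to the learner.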

Lay Summary:

In many real-world situations—from autonomous vehicles to online recommendation systems—multiple decision-makers (or agents) must interact with one another while learning how to make better choices. But when these agents have competing goals, learning can become difficult and unpredictable. In this work, we design a learning algorithm that helps a decision-maker perform well even when facing an opponent that learns and adapts over time. Unlike past work, our approach works in complex settings where the environment is too large to fully memorize and where the opponent's behavior must be approximated. We show that our algorithm learns to make good decisions over time and provides the first theoretical guarantees for success in such challenging situations. Our results could help build safer and more reliable AI systems that learn through repeated interaction.
