Poster
in
Workshop: Exploration in AI Today (EXAIT)
Oracle-Efficient Adversarial Reinforcement Learning via Max-Following
Sikata Sengupta · Zakaria Mhammedi · Teodor Vanislavov Marinov
Keywords: [ reinforcement learning ] [ machine learning theory ] [ max-following ] [ ensembling ]
Learning the optimal policy in reinforcement learning (RL) with large state and action spaces remains a notoriously difficult problem from both computational and statistical perspectives. A recent line of work addresses this challenge by aiming to compete with, or improve upon, a given base class of policies. One approach, known as max-following, selects at each state the policy from the base class whose estimated value function is highest. In this paper, we extend the max-following framework to the setting of regret minimization under adversarial initial states and limited feedback. Our algorithm is oracle-efficient, achieves no-regret guarantees with respect to the base class (and to the worst approximate max-following policy), and avoids any dependence on the size of the state or action space. It also attains the optimal rate in terms of the number of episodes. Additionally, we establish a lower bound on the regret of any max-following algorithm as a function of β, a parameter that quantifies the approximation slack in the benchmark policy class. Finally, we empirically validate our theoretical findings on the Linear Quadratic Regulator (LQR) problem.
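To make the max-following selection rule described in the abstract concrete, here is a minimal sketch of that rule under illustrative assumptions; the names (`base_policies`, `value_estimates`) and interface are hypothetical and not taken from the paper's algorithm, which additionally handles adversarial initial states and limited feedback.

```python
# Minimal sketch of the max-following selection rule: at each state, act with
# the base policy whose estimated value at that state is highest.
# Interface and names are illustrative assumptions, not the paper's algorithm.

from typing import Callable, Sequence, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def max_following_action(
    state: State,
    base_policies: Sequence[Callable[[State], Action]],   # pi_1, ..., pi_K
    value_estimates: Sequence[Callable[[State], float]],  # estimated V^{pi_k}
) -> Action:
    # Pick the index k maximizing the estimated value V^{pi_k}(state) ...
    best_k = max(range(len(base_policies)),
                 key=lambda k: value_estimates[k](state))
    # ... and follow that policy's action at the current state.
    return base_policies[best_k](state)
```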