ICML Poster Polynomial-Time Approximability of Constrained Reinforcement Learning

Poster

Polynomial-Time Approximability of Constrained Reinforcement Learning

Jeremy McMahan

West Exhibition Hall B2-B3 #W-1016

[ Abstract ] [ Lay Summary ]

[ Poster] [ OpenReview]

Thu 17 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract: We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time $(0,\epsilon)$-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as $P \neq NP$. The generality of our approach results in answers to several long-standing open complexity questions in the constrained reinforcement learning literature. Specifically, we are the first to prove polynomial-time approximability for the following settings: policies under chance constraints, deterministic policies under multiple expectation constraints, policies under non-homogeneous constraints (i.e., constraints of different types), and policies under constraints for continuous-state processes.

Lay Summary:

In many sequential decision-making settings, agents must obey multiple constraints of different forms. For example, an autonomous vehicle may be required to make consistent, deterministic, decisions that obeys 1) an anytime constraint on fuel consumption to ensure the destination is reached, 2) a expectation constraint on the CO2 consumption to minimize impact on the environment, and 3) a chance constraint that manages the risk of wear and tear on the vehicle throughout the route. We capture such problems through a general cost-criterion Markov Decision Process setting. Our main contribution is a general algorithm for these problems with strong theoretical guarantees. The existence of our algorithm provides the first ever provable approximation guarantees for many popular settings, answering several open questions, some of which have been open for 25 years.

Chat is not available.