Poster
Polynomial-Time Approximability of Constrained Reinforcement Learning
Jeremy McMahan
West Exhibition Hall B2-B3 #W-1016
In many sequential decision-making settings, agents must obey multiple constraints of different forms. For example, an autonomous vehicle may be required to make consistent, deterministic, decisions that obeys 1) an anytime constraint on fuel consumption to ensure the destination is reached, 2) a expectation constraint on the CO2 consumption to minimize impact on the environment, and 3) a chance constraint that manages the risk of wear and tear on the vehicle throughout the route. We capture such problems through a general cost-criterion Markov Decision Process setting. Our main contribution is a general algorithm for these problems with strong theoretical guarantees. The existence of our algorithm provides the first ever provable approximation guarantees for many popular settings, answering several open questions, some of which have been open for 25 years.