Poster
Sleeping Reinforcement Learning
Simone Drago · Marco Mussi · Alberto Maria Metelli
West Exhibition Hall B2-B3 #W-707
In Reinforcement Learning, an agent learns which actions to perform, i.e., a behavior, in order to solve a sequential decision-making problem. The standard assumption is that, at each decision step, the agent selects an action from a fixed and immutable action space. However, in real-world applications, not all actions may be available at every decision stage, with their availability depending on the environment state, on domain-specific constraints, or on other (potentially stochastic) exogenous factors. To address these scenarios, we propose the Sleeping Reinforcement Learning paradigm, which extends the standard episodic tabular Reinforcement Learning setting with an action availability model. We study two scenarios, namely action availability revealed for the entire episode and availability revealed for a single stage at a time, and two action availability models, namely independent and Markovian. Using the regret (i.e., how much is lost w.r.t. always making optimal decisions) as a performance index, we derive lower bounds, i.e., the theoretical limits, on the regret and propose algorithms based on state-of-the-art methods for standard RL that match these lower bounds up to logarithmic terms.
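To make the setting concrete, here is a minimal, purely illustrative sketch of an episodic tabular interaction loop where action availability is revealed stage by stage under an independent availability model. It is not the authors' algorithm; the state/action sizes, reward model, availability probabilities `p_awake`, and the `greedy_among_awake` baseline policy are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tabular episodic MDP: S states, A actions, horizon H.
S, A, H = 4, 3, 5
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = next-state distribution
R = rng.uniform(size=(S, A))                # mean rewards in [0, 1]
p_awake = np.array([0.9, 0.6, 0.8])         # per-action availability probabilities (assumed)

def sample_availability():
    """Independent availability model: each action is awake with probability
    p_awake[a], resampled at every stage; at least one action is forced awake."""
    mask = rng.random(A) < p_awake
    if not mask.any():
        mask[rng.integers(A)] = True
    return mask

def run_episode(policy):
    """Play one episode; the agent only sees the awake actions at each stage."""
    s, total = 0, 0.0
    for h in range(H):
        mask = sample_availability()   # availability revealed one stage at a time
        a = policy(h, s, mask)         # the policy must pick an awake action
        total += R[s, a]
        s = rng.choice(S, p=P[s, a])
    return total

def greedy_among_awake(h, s, mask):
    """Toy baseline: pick the awake action with the highest mean reward."""
    awake = np.flatnonzero(mask)
    return awake[np.argmax(R[s, awake])]

print(run_episode(greedy_among_awake))
```

In the other scenario studied in the paper, the availability mask for all stages would be drawn and revealed once at the start of the episode instead of inside the stage loop; under the Markovian model, the mask at each stage would depend on the mask at the previous stage rather than being resampled independently.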