Poster
Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction
Yiting He · Zhishuai Liu · Weixin Wang · Pan Xu
West Exhibition Hall B2-B3 #W-715
When training an agent to make decisions by interacting with a simulated environment, it is crucial that the agent continues to perform well even if the real environment differs slightly from the simulator. But how can we determine whether such robust learning is possible, and under what conditions?

To address this question, we introduce a simple metric that compares how easily the agent can reach certain states in the training (nominal) environment versus in the perturbed environment. This measure captures how hard it is to explore the nominal environment and gather enough information to estimate the perturbed dynamics. When this quantity remains bounded, we design a learning algorithm and prove that robust learning is achievable, and we provide sample complexity estimates for learning such a robust policy.

Our findings offer a quantitative framework for assessing the impact of environmental changes on learning performance and help guide the development of algorithms that remain effective under uncertainty.
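As a rough illustration of the idea (a toy sketch, not the paper's actual definition or algorithm), the Python snippet below compares how often each state is visited under a nominal versus a perturbed transition kernel in a small tabular MDP and reports the largest mismatch. The function names (`occupancy`, `visitation_ratio`), the toy MDP, and the exact ratio used are all illustrative assumptions.

```python
import numpy as np

def occupancy(P, pi, mu0, horizon):
    """Average state-visitation distribution over a finite horizon for a
    deterministic policy pi under transition kernel P[s, a, s'] and
    initial state distribution mu0."""
    nS = P.shape[0]
    d = np.zeros(nS)
    dist = mu0.copy()
    for _ in range(horizon):
        d += dist
        # One-step transition under pi: next[t] = sum_s dist[s] * P[s, pi[s], t]
        dist = np.einsum('s,st->t', dist, P[np.arange(nS), pi, :])
    return d / horizon

def visitation_ratio(P_nominal, P_perturbed, pi, mu0, horizon, eps=1e-12):
    """Illustrative reachability mismatch: the largest factor by which a
    state is visited more often under the perturbed dynamics than under
    the nominal ones. A bounded value suggests that exploring the nominal
    environment still covers the states that matter after perturbation."""
    d_nom = occupancy(P_nominal, pi, mu0, horizon)
    d_per = occupancy(P_perturbed, pi, mu0, horizon)
    return np.max(d_per / np.maximum(d_nom, eps))

# Hypothetical example: 2 states, 1 action; the perturbation makes state 1 stickier.
P_nom = np.array([[[0.9, 0.1]], [[0.5, 0.5]]])  # shape (nS, nA, nS)
P_per = np.array([[[0.7, 0.3]], [[0.2, 0.8]]])
pi = np.zeros(2, dtype=int)                     # always take action 0
mu0 = np.array([1.0, 0.0])
print(visitation_ratio(P_nom, P_per, pi, mu0, horizon=20))
```

In this toy example the ratio stays small, so data gathered in the nominal environment remains informative about the perturbed one; a ratio that blows up would signal that some states relevant under the perturbation are nearly unreachable during training.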