Poster
Convergence Analysis of Policy Gradient Methods with Dynamic Stochasticity
Alessandro Montenegro · Marco Mussi · Matteo Papini · Alberto Maria Metelli
West Exhibition Hall B2-B3 #W-808
Reinforcement Learning (RL) is a subfield of machine learning in which agents learn, through interaction with an environment, the optimal behavior in sequential decision-making problems. Among the various families of RL methods, policy gradient (PG) approaches have demonstrated notable success in tackling continuous control tasks. These methods directly learn the parameters of stochastic (hyper)policies, exploring either at the action level or at the parameter level, with the amount of exploration governed by the stochasticity of the (hyper)policy. While theoretical convergence guarantees for PG methods typically assume a fixed level of exploration, practitioners often adjust it dynamically during training. In this work, we bridge this gap between theory and practice by providing convergence guarantees for PG methods under a dynamically changing level of exploration, thus offering a theoretical foundation for a common empirical practice.
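To make the practice being analyzed concrete, here is a minimal sketch (not the authors' algorithm) of a REINFORCE-style policy gradient update on a toy one-step continuous-control problem, where a Gaussian policy's standard deviation (the exploration level) is shrunk over training via an assumed schedule; all names, such as `sigma_schedule` and the toy reward, are illustrative.

```python
# Minimal sketch: action-level exploration with a Gaussian policy whose
# standard deviation (exploration level) is annealed during training.
import numpy as np

rng = np.random.default_rng(0)

def reward(action):
    # Toy one-step task: the best deterministic action is 2.0.
    return -(action - 2.0) ** 2

theta = 0.0            # mean of the Gaussian policy pi(a) = N(theta, sigma^2)
lr = 0.05
n_iters, batch = 200, 64

def sigma_schedule(t):
    # Dynamically decreasing exploration level (the common practice the
    # paper studies theoretically): start wide, decay towards a small floor.
    return max(0.05, 0.98 ** t)

for t in range(n_iters):
    sigma = sigma_schedule(t)
    actions = rng.normal(theta, sigma, size=batch)
    returns = reward(actions)
    # REINFORCE estimator: grad log pi(a) = (a - theta) / sigma^2
    grad = np.mean(returns * (actions - theta) / sigma**2)
    theta += lr * grad

print(f"learned mean action: {theta:.3f} (optimum is 2.0)")
```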