Poster
Actor-Critics Can Achieve Optimal Sample Efficiency
Kevin Tan · Wei Fan · Yuting Wei
West Exhibition Hall B2-B3 #W-917
Reinforcement learning is a type of machine learning in which an agent learns by trying different actions and receiving feedback, much as people learn through trial and error. One popular paradigm, the actor-critic framework, combines two components: one that decides what to do (the actor) and one that evaluates how good those decisions are (the critic). However, current actor-critic methods can be slow and sample-inefficient, especially because the actor must explore and try new things guided by inexact critic estimates, while the critic must continually evaluate an ever-changing actor. It has been an open question whether one can devise an actor-critic algorithm that converges at an optimal rate when the critic can be arbitrarily parameterized, whether with deep neural networks, linear regression, random forests, or some other class of machine learning models. We provide a method that does so by exploring strategically, using past experience more effectively, and occasionally restarting the decision-making process to avoid getting stuck. We also show that incorporating previously collected data, rather than learning only from scratch, can speed things up even further. Our approach not only improves learning efficiency but also resolves a long-standing question in the field about whether such systems can be sample-efficient.
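To make the three ingredients concrete, here is a minimal, purely illustrative Python sketch of a generic actor-critic loop that adds an optimism bonus for strategic exploration, replays past transitions to reuse experience, and periodically restarts the actor. The toy environment, variable names, and update rules are hypothetical simplifications for intuition only; they are not the paper's algorithm and carry none of its guarantees.

```python
# Illustrative sketch only: a tiny tabular actor-critic with (1) an optimistic
# exploration bonus, (2) replay of past experience, and (3) periodic restarts.
# Everything here (the chain MDP, step sizes, restart schedule) is hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy chain MDP: 5 states, 2 actions (left/right), reward 1 at the right end.
N_STATES, N_ACTIONS, HORIZON = 5, 2, 10

def step(state, action):
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

Q = np.zeros((N_STATES, N_ACTIONS))        # critic: tabular Q estimates
logits = np.zeros((N_STATES, N_ACTIONS))   # actor: softmax policy parameters
counts = np.ones((N_STATES, N_ACTIONS))    # visit counts drive the bonus
replay = []                                # past transitions get reused

def policy(state):
    p = np.exp(logits[state] - logits[state].max())
    return p / p.sum()

for episode in range(500):
    # (3) Occasional restart: reset the actor to uniform to avoid getting stuck.
    if episode % 100 == 0:
        logits[:] = 0.0

    # Collect one episode with the current actor.
    state = 0
    for _ in range(HORIZON):
        action = rng.choice(N_ACTIONS, p=policy(state))
        next_state, reward = step(state, action)
        replay.append((state, action, reward, next_state))
        counts[state, action] += 1
        state = next_state

    # (2) Critic update: replay a random batch of past transitions.
    idx = rng.choice(len(replay), size=min(64, len(replay)), replace=False)
    for i in idx:
        s, a, r, s2 = replay[i]
        target = r + 0.9 * Q[s2].max()
        Q[s, a] += 0.1 * (target - Q[s, a])

    # (1) Actor update: follow the critic's estimates plus an exploration
    # bonus that shrinks as state-action pairs are visited more often.
    optimistic_Q = Q + 1.0 / np.sqrt(counts)
    for s in range(N_STATES):
        p = policy(s)
        logits[s] += 0.5 * (optimistic_Q[s] - np.dot(p, optimistic_Q[s]))

print("Learned greedy actions per state:", Q.argmax(axis=1))
```

In this sketch the critic could just as well be a neural network, a linear model, or a random forest fit to the replayed transitions; the point of the poster's result is that the same recipe can be made provably sample-efficient under such general critic parameterizations.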