Poster
Reinforcement Learning with Segment Feedback
Yihan Du · Anna Winnicki · Gal Dalal · Shie Mannor · R Srikant
West Exhibition Hall B2-B3 #W-601
Standard reinforcement learning (RL) assumes that we can observe a reward for each state-action pair. However, in real-world applications such as autonomous driving, it is often difficult and costly to collect a reward for each state-action pair. While prior works have considered collecting a reward for a whole trajectory, it is unclear whether such trajectory feedback remains efficient when trajectories are long. In this work, we consider a model called RL with segment feedback, where each trajectory is divided into multiple segments and we collect a reward for each segment. This model offers a general paradigm that fills the gap between per-state-action feedback and trajectory feedback. Under this model, an interesting and important question is how segments impact learning performance. Our theoretical and experimental results show that: under binary feedback, where we observe a binary outcome generated according to the reward function, segments help accelerate learning; surprisingly, under sum feedback, where we observe the sum of random rewards, segments do not help expedite learning much.
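To make the feedback model concrete, below is a minimal Python sketch of how segment feedback could be simulated for a single episode: the step rewards are split into contiguous segments, and each segment yields either sum feedback (the sum of its random rewards) or binary feedback (a Bernoulli outcome driven by the segment's cumulative reward). The function name `segment_feedback` and the logistic link used for the binary case are illustrative assumptions, not necessarily the paper's exact observation model.

```python
import numpy as np

def segment_feedback(rewards, num_segments, mode="sum", rng=None):
    """Illustrative sketch of segment feedback for one trajectory.

    `rewards` is the per-step (random) reward sequence of one episode; the
    trajectory is split into `num_segments` contiguous segments. The sigmoid
    link for binary feedback is an assumption made for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    segments = np.array_split(np.asarray(rewards, dtype=float), num_segments)
    feedback = []
    for seg in segments:
        total = seg.sum()
        if mode == "sum":
            # Sum feedback: observe the sum of the random rewards in the segment.
            feedback.append(total)
        else:
            # Binary feedback: a Bernoulli outcome whose probability depends on
            # the segment's cumulative reward (here via a logistic link).
            p = 1.0 / (1.0 + np.exp(-total))
            feedback.append(int(rng.random() < p))
    return feedback

# Example: an H = 12 step episode divided into m = 3 segments.
rng = np.random.default_rng(0)
episode_rewards = rng.normal(loc=0.2, scale=1.0, size=12)
print(segment_feedback(episode_rewards, 3, mode="sum", rng=rng))
print(segment_feedback(episode_rewards, 3, mode="binary", rng=rng))
```

Setting `num_segments = 1` recovers trajectory feedback, while `num_segments` equal to the episode length recovers per-state-action feedback, which is the sense in which segment feedback interpolates between the two regimes.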