Poster
Imitation Learning from a Single Temporally Misaligned Video
William Huey · Yuki (Huaxiaoyue) Wang · Anne Wu · Yoav Artzi · Sanjiban Choudhury
West Exhibition Hall B2-B3 #W-709
Teaching robots new tasks usually requires detailed instructions about which actions are good at every moment, a process we call "designing a reward function." This is difficult and time-consuming. An easier alternative is to show the robot a video demonstrating how to solve the task. However, these demonstrations often move at a different speed than the robot can, which makes it difficult or even impossible for the robot to follow them exactly. We find, both in theory and in practice, that traditional methods fail when the demonstration runs at a different speed.

Our solution is to treat the frames of the video as a sequence of subgoals that the robot must achieve at some point in time, rather than matching the timing exactly. Specifically, we define the reward function as how well the robot matches every subgoal in the exact order shown in the video. We can then teach the robot to try different actions and repeat those that earn high rewards, a technique known as reinforcement learning.

Our work focuses on robot videos, but it lays the foundation for learning from human videos, which typically move at different speeds than robots.
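To make the core idea concrete, here is a minimal sketch (not the authors' exact algorithm) of an "ordered subgoal coverage" reward: a dynamic program that matches every demonstration frame to a rollout frame while preserving their order, so the reward is insensitive to speed differences. The function name, embedding inputs, and Euclidean distance are all illustrative assumptions.

```python
import numpy as np

def ordered_coverage_reward(rollout, demo):
    """Hypothetical sketch: score how well a robot rollout reaches every
    demo frame (subgoal) in order, regardless of timing.

    rollout: (T, d) array of rollout state embeddings.
    demo:    (K, d) array of demonstration frame embeddings (subgoals).
    Returns a scalar; better ordered coverage -> higher reward.
    """
    T, K = len(rollout), len(demo)
    if T < K:
        # Cannot match every subgoal to a distinct rollout frame.
        return -np.inf

    # Pairwise distances between rollout frames and demo subgoals.
    dist = np.linalg.norm(rollout[:, None, :] - demo[None, :, :], axis=-1)

    # dp[t, k]: minimum total matching cost of covering subgoals 0..k
    # using rollout frames 0..t, with subgoal matches strictly increasing
    # in time (order preserved, timing free).
    dp = np.full((T, K), np.inf)
    dp[0, 0] = dist[0, 0]
    for t in range(1, T):
        dp[t, 0] = min(dp[t - 1, 0], dist[t, 0])  # first subgoal, any frame
        for k in range(1, min(K, t + 1)):
            dp[t, k] = min(
                dp[t - 1, k],                   # subgoal k matched earlier
                dp[t - 1, k - 1] + dist[t, k],  # subgoal k matched at frame t
            )

    # Negate the average per-subgoal cost so better coverage scores higher.
    return -dp[T - 1, K - 1] / K
```

In a reinforcement learning loop, a reward of this form could be computed on each rollout and fed to any standard policy-optimization algorithm; because the matching must cover all subgoals in order, skipping a step or reaching goals out of sequence is penalized, while moving faster or slower than the demonstration is not.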