Invited Talk in Workshop: Workshop on Computer Use Agents
Victor Zhong - Building and Evaluating Generalist Agents
Talk: Building and Evaluating Generalist Agents

Abstract: How can we build generalist language agents that assist us in the digital and physical world? First, we will discuss OSWorld, a new interactive, executable testbed for generalist agents that follow natural language instructions to perform long-horizon, real-world tasks in virtual machines in real time. Second, we will examine recent and ongoing efforts to train generalist agents, including learning from both human and automatic language feedback. Third, we will introduce AgentArena, a newly launched dynamic evaluation platform for OS agents that challenges state-of-the-art foundation model agents. Finally, we will explore ongoing and future directions in building and evaluating generalist agents.