

Invited Talk
in
Workshop: Workshop on Computer Use Agents

Victor Zhong - Building and Evaluating Generalist Agents

Sat 19 Jul 4:05 p.m. PDT — 4:30 p.m. PDT

Abstract:

How can we build generalist language agents that assist us in the digital and physical world? First, we will discuss OSWorld, a new interactive, executable testbed for generalist agents that follow natural language instructions to perform long-horizon, real-world tasks in real time inside virtual machines. Second, we will examine recent and ongoing efforts to train generalist agents, including learning from both human and automatic language feedback. Third, we will introduce AgentArena, a newly launched dynamic evaluation platform for OS agents that challenges state-of-the-art foundation model agents. Finally, we will explore ongoing and future directions in building and evaluating generalist agents.
