Workshop on Computer Use Agents

David Barber · Doina Precup · Andrei Nica · Roberta Raileanu · Harshil Shah · Boyuan Zheng · Shuyan Zhou

West Meeting Room 211-214

Sat 19 Jul, 8:30 a.m. PDT

Computer use models are attracting significant interest in academia and industry due to their ability to perform complex tasks in non-deterministic environments. However, they are far from being ready for unattended deployment, as evidenced by their performance on the OSWorld benchmark where they achieve only a small fraction of human performance. The rapid evolution of these agents raises important questions regarding their accuracy, safe deployment, and potential impact on the future of work. The topics we would like to cover are:

- Learning Algorithms: which new architectures and learning techniques (e.g. memory mechanisms for extended tasks, exploration strategies) can enhance the intrinsic ability of computer use agents to acquire, represent, and refine knowledge?
- Orchestration: what novel frameworks or control methods (e.g. dynamic task planning, modular coordination, multi-agent systems) can efficiently manage and integrate multiple learning components to optimize overall agent performance?
- Interfaces: how should agents perceive and act within their environments (e.g., via APIs or UI interactions), and should we design unified systems or specialized agents for different modalities?
- Guardrails, safety & societal implications: what guardrails do we need in order to make computer use models safe for deployment "in the wild" while ensuring that they have a positive impact on society?
- Benchmarking & tools: how can we develop robust environments and evaluation metrics that capture the diversity of real-world settings? Do we need new tools or frameworks to make research on computer use more efficient and accessible?
- Human-agent interaction: how will future interactions evolve? Should we optimize agents for full autonomy or design them as personalized, human-centric collaborators?
- Broader applications: what are some practical applications for computer use agents across domains such as healthcare, scientific research, software engineering and testing etc.?
- Capability horizon: what breakthroughs or engineering challenges are required to enable agents orders of magnitude more capable than today, and what implications would such advances have?

Timezone: America/Los_Angeles

Schedule

Sat 8:30 a.m. - 8:40 a.m.	Opening remarks ( Intro )
Sat 8:40 a.m. - 9:05 a.m.	Nouha Dziri - OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety ( Invited Talk )
Sat 9:05 a.m. - 9:30 a.m.	Zhiyong Wu - Large Scale Reinforcement Learning for General Computer Agents ( Invited Talk )
Sat 9:30 a.m. - 9:40 a.m.	Spotlight "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows" - Rob Xiangru Tang ( Accepted Paper Spotlight )
Sat 9:40 a.m. - 10:30 a.m.	Posters & Coffee break ( Poster Session )
Sat 10:30 a.m. - 10:55 a.m.	Qingyun Wu ( Invited Talk )
Sat 10:55 a.m. - 11:20 a.m.	Yu Su - The Intelligence Feedback Loop: From Biological Inspiration to Augmented Cognition ( Invited Talk )
Sat 11:20 a.m. - 11:45 a.m.	Ruslan Salakhutdinov - Scaling up Multimodal AI Agents ( Invited Talk )
Sat 11:45 a.m. - 11:55 a.m.	Spotlight "Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search" - Sam Holt ( Accepted Paper Spotlight )
Sat 1:00 p.m. - 1:15 p.m.	Sercan Arık ( Invited Talk )
Sat 1:15 p.m. - 2:15 p.m.	Panel discussion - Ruslan Salakhutdinov, Alexandre Drouin, Qingyun Wu, Victor Zhong, Nouha Dziri, Yu Su ( Panel )
Sat 2:15 p.m. - 2:40 p.m.	Alexandre Drouin - Computer-use agents in the enterprise: progress and key challenges ( Invited Talk )
Sat 2:40 p.m. - 2:50 p.m.	Spotlight "How to Train Your LLM Web Agent: A Statistical Diagnosis" - Massimo Caccia ( Accepted Paper Spotlight )
Sat 2:50 p.m. - 3:00 p.m.	Spotlight "OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents" - Maksym Andriushchenko ( Accepted Paper Spotlight )
Sat 3:00 p.m. - 3:50 p.m.	Posters & Coffee break ( Poster Session )
Sat 3:50 p.m. - 4:05 p.m.	Graham Neubig ( Invited Talk )
Sat 4:05 p.m. - 4:30 p.m.	Victor Zhong - Building and Evaluating Generalist Agents ( Invited Talk )
Sat 4:30 p.m. - 4:55 p.m.	Alane Suhr - Training Language-Conditioned Agents with Reinforcement Learning ( Invited Talk )
Sat 4:55 p.m. - 5:05 p.m.	Closing Remarks ( Closing Remarks )
Sat 5:05 p.m. - 6:00 p.m.	Poster & Social ( Poster Session )
-	Weathering the CUA Storm: Mapping Security Threats in the Rapid Rise of Computer Use Agents ( Poster )	Dan Jones · Martin Pouliot · Giorgio Severi · Joris de Gruyter · Gary Lopez Munoz · Santiago Zanella-Beguelin · Justin Song · Amanda Minnich · Pamela Cortez
-	Universal Retrieval for Multimodal Trajectory Modeling ( Poster )	Xuan Zhang · Ziyan Jiang · Rui Meng · Yifei Leng · Zhenbang Xiao · Zhiruo Wang · Yanni Shawn
-	UI-Evol: Automatic Knowledge Evolving for Computer Use Agents ( Poster )	Ziyun Zhang · Xinyi Liu · Xiaoyi Zhang · Jun Wang · Gang Chen · Yan Lu
-	BIMgent: Towards Autonomous Building Modeling via Computer-use Agents ( Poster )	Zihan Deng · Changyu Du · Stavros Nousias · André Borrmann
-	OSWorld-Gold: Benchmarking the Efficiency of Computer-Use Agents ( Poster )	Reyna Abhyankar · Qi Qi · Yiying Zhang
-	InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection ( Poster )	Yuhang Liu · Pengxiang Li · Zishu Wei · Congkai Xie · Xueyu Hu · Xinchen Xu · Shengyu Zhang · Xiaotian Han · Hongxia Yang · Fei Wu
-	EARL: Early Intent Recognition in GUI Tasks Using Theory of Mind ( Poster )	Shraddha Vijay Pawar · Balavarun Pedapudi · Pramod Kaushik · Sarath Sivaprasad · Mario Fritz · Shirish Karande
-	OS-MAP: How Far Can Computer Use Agents Go in Breadth and Depth? ( Poster )	Xuetian Chen · Yinghao Chen · Xinfeng Yuan · ZhuoPeng · Lu Chen · Yuekeng Li · Zhoujia Zhang · Yingqian Huang · Leyan Huang · Jiaqing Liang · Tianbao Xie · Zhiyong Wu · Qiushi Sun · Biqing Qi · Bowen Zhou
-	AgentSearchBench: Evaluating Agentic Search with Agent-as-a-Judge ( Poster )	Boyu Gou · Zanming Huang · Yuting Ning · Yu Gu · Michael Lin · Botao Yu · Andrei Kopanev · Weijian Qi · Yiheng Shu · Jiaman Wu · Chan Hee Song · Bernal Jimenez Gutierrez · Yifei Li · Zeyi Liao · Hanane Nour Moussa · Tianshu Zhang · Jian Xie · Tianci Xue · Shijie Chen · Boyuan Zheng · Kai Zhang · Zhaowei Cai · Viktor Rozgic · Morteza Ziyadi · Huan Sun · Yu Su
-	WebGames: Challenging General-Purpose Web-Browsing AI Agents ( Poster )	George Thomas · Filippos Christianos · Alexander Chan · Rohit Midha · Jikun Kang · Wenqi Wu · Fraser Greenlee · Andrew Toulis · Marvin Purtorab
-	OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents ( Oral )	Thomas Kuntz · Agatha Duzan · Hao Zhao · Francesco Croce · Zico Kolter · Nicolas Flammarion · Maksym Andriushchenko
-	WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks ( Poster )	Ivan Evtimov · Arman Zharmagambetov · Aaron Grattafiori · Chuan Guo · Kamalika Chaudhuri
-	EconEvals: Benchmarks and Litmus Tests for LLM Agents in Unknown Environments ( Poster )	Sara Fish · Julia Shephard · Minkai Li · Ran Shorrer · Yannai A. Gonczarowski
-	Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment ( Poster )	Siliang Zeng · Quan Wei · William Brown · Oana Frunza · Yuriy Nevmyvaka · Yang Zhao · Mingyi Hong
-	ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents ( Poster )	Ido Levy · Ben Wiesel · Sami Marreed · Alon Oved · Avi Yaeli · Segev Shlomov
-	Dynamic Risk Assessments for Offensive Cybersecurity Agents ( Poster )	Boyi Wei · Benedikt Stroebl · Jiacen Xu · Joie Zhang · Zhou Li · Peter Henderson
-	DoomArena: A framework for Testing AI Agents Against Evolving Security Threats ( Poster )	Léo Boisvert · Abhay Puri · Gabriel Huang · Mihir Bansal · Chandra Kiran Evuru · Avinandan Bose · Maryam Fazel · Quentin Cappart · Alexandre Lacoste · Alexandre Drouin · Krishnamurthy Dvijotham
-	API Agents vs. GUI Agents: Divergence and Convergence ( Poster )	Chaoyun Zhang · Shilin He · Liqun Li · Si Qin · Yu Kang · Qingwei Lin · Saravanakumar Rajmohan · Dongmei Zhang
-	Semantic Context for Tool Orchestration ( Poster )	Robert Müller
-	Reimagining ABM with LLM Agents via Shachi ( Poster )	So Kuroki · Yingtao Tian · Kou Misaki · Takashi Ikegami · Takuya Akiba · Yujin Tang
-	AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents ( Poster )	Arman Zharmagambetov · Chuan Guo · Ivan Evtimov · Maya Pavlova · Russ Salakhutdinov · Kamalika Chaudhuri
-	Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning ( Poster )	Léo Boisvert · Abhay Puri · Chandra Kiran Evuru · Joshua Kazdan · Avinandan Bose · Quentin Cappart · Maryam Fazel · Sai Rajeswar Mudumba · Jason Stanley · Nicolas Chapados · Alexandre Drouin · Krishnamurthy Dvijotham
-	WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning ( Poster )	Zhepei Wei · Wenlin Yao · Yao Liu · Weizhi Zhang · Qin Lu · Liang Qiu · Changlong Yu · Puyang Xu · Chao Zhang · Bing Yin · Hyokun Yun · Lihong Li
-	GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning ( Poster )	Zhen Xiang · Linzhi Zheng · Yanjie Li · Junyuan Hong · Qinbin Li · Han Xie · Jiawei Zhang · Zidi Xiong · Chulin Xie · Nathaniel Bastian · Carl Yang · Dawn Song · Bo Li
-	Replacing thinking with tool usage enables reasoning in small language models ( Poster )	Corrado Rainone · Tim Bakker · Roland Memisevic
-	Toward Autonomous UI Exploration: The UIExplorer Benchmark ( Poster )	Andrei Nica · Akshaya Shanbhogue · Harshil Shah · Aleix Cambray · Tudor Berariu · Lucas Maystre · David Barber
-	ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows ( Oral )	Qiushi Sun · Zhoumianze Liu · Chang Ma · Zichen Ding · Fangzhi Xu · Zhangyue Yin · Haiteng Zhao · Zhenyu Wu · Kanzhi Cheng · Zhaoyang Liu · Jianing Wang · Qintong Li · Robert Tang · Tianbao Xie · Xiachong Feng · Xiang Li · Ben Kao · Wenhai Wang · Biqing Qi · Lingpeng Kong · Zhiyong Wu
-	WebQuest: A Benchmark for Multimodal QA on Web Page Sequences ( Poster )	Maria Wang · Srinivas Sunkara · Jason Lin · Gilles Baechler · Fedir Zubach · Lei Shu · Yun Zhu · Jindong Chen
-	VerificAgent: Integrating Expert Knowledge and Fact-Checked Memory for Robust Domain-Specific Task Planning ( Poster )	Thong Nguyen · Shubhang Desai · Yash Jain · Tanvir Aumi · Vishal Chowdhary
-	How to Train Your LLM Web Agent: A Statistical Diagnosis ( Oral )	Dheeraj Vattikonda · Santhoshi Ravichandran · Emiliano Penaloza · Hadi Nekoei · Megh Thakkar · Thibault de Chezelles · Nicolas Gontier · Miguel Muñoz-Mármol · Sahar Omidi Shayegan · Stefania Raimondo · Xue Liu · Alexandre Drouin · Laurent Charlin · Alex Piche · Alexandre Lacoste · Massimo Caccia
-	Coding Agents with Multimodal Browsing are Generalist Problem Solvers ( Poster )	Aditya Bharat Soni · Boxuan Li · Xingyao Wang · Valerie Chen · Graham Neubig
-	Context manipulation attacks: Web agents are susceptible to corrupted memory ( Poster )	Atharv Singh Patlan · Ashwin Hebbar · Pramod Viswanath · Prateek Mittal
-	Improving LLM Agent Planning for Computer Use via In-Context Learning with Atomic Fact Augmentation and Lookahead Search ( Oral )	Samuel Holt · Max Ruiz Luyten · Thomas Pouplin · Mihaela van der Schaar