ICML Nouha Dziri - OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety

Invited Talk in Workshop: Workshop on Computer Use Agents

Sat 19 Jul 8:40 a.m. PDT — 9:05 a.m. PDT

Abstract:

Talk: OpenAgentSafety is a flexible simulation framework designed to evaluate AI agent safety across eight key risk categories. It features over 350 multi-turn tasks with real tool use, spanning both benign and adversarial scenarios. Our evaluation of five leading LLMs shows a concerning trend: unsafe behavior emerged in over half of the safety-critical tasks. These results highlight the urgent need to prioritize safety alongside performance in agent development.

Bio: Nouha Dziri is an AI research scientist at the Allen Institute for AI (Ai2). Her research investigates a wide variety of problems across NLP and AI including building state-of-the-art language models and understanding their limits and inner workings. She also works on AI safety to ensure the responsible deployment of LLMs while enhancing their reasoning capabilities. Prior to Ai2, she worked at Google DeepMind, Microsoft Research and Mila. She earned her PhD from the University of Alberta and the Alberta Machine Intelligence Institute. Her work has been published in top-tier AI venues including NeurIPS, ICML, ICLR, TACL, ACL, NAACL and EMNLP. She was recently awarded the runner-up Best Paper Award at NAACL 2025.
