ICML Beyond Pass/Fail: Extracting Behavioral Insights from Large-Scale AI Agent Safety Evaluations

Invited Speaker in Workshop: Workshop on Technical AI Governance

Cozmin Ududec

Sat 19 Jul 9:15 a.m. PDT — 9:40 a.m. PDT

Abstract:

Automated LLM-based agent evaluations have become standard for assessing AI capabilities in both industry and government, but current reporting practices focus on what agents accomplish and offer little resolution on how they accomplish it. In this talk, I will discuss how UK AISI mines evaluation transcripts to (i) detect issues in evaluation tasks that could lead to mis-estimating capabilities, and (ii) understand how agent capabilities are evolving. I will survey a selection of AISI's methods, tools, and results, and outline research opportunities for better analysis instruments and their connection to safety and governance.
