Poster
CASE-Bench: Context-Aware SafEty Benchmark for Large Language Models
Guangzhi Sun · Xiao Zhan · Shutong Feng · Phil Woodland · Jose Such
East Exhibition Hall A-B #E-704
As powerful AI language models (like ChatGPT) become more common, it is crucial to make sure they act in ways that align with human values and safety. Right now, many safety tests ignore the context in which a question is asked, which can lead to a model wrongly refusing to answer even when it is safe and appropriate to do so.

To fix this, we created a new test called CASE-Bench, which adds context to safety checks. It does not just look at what the question is, but also at the situation around it, such as who is asking and why. We also used a sufficiently large number of human annotators to make sure the results were accurate and meaningful.

Our findings show that context really matters: people judge the safety of a response very differently depending on the situation. We also found that some commercial AI models (like those from big tech companies) do not always match what people expect, especially when it is actually safe to give an answer.

In short, this study shows that AI safety testing needs to take context into account, or we risk making these tools less helpful, and possibly less trustworthy.
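To make the idea of a context-aware safety check concrete, here is a minimal sketch that pairs one query with two hypothetical contexts and records a human-style verdict for each. The data structure, field names, and example judgments are illustrative assumptions, not CASE-Bench's actual schema or annotations.

```python
# Illustrative sketch only: field names and example judgments are assumptions,
# not CASE-Bench's actual schema or data.
from dataclasses import dataclass

@dataclass
class ContextualQuery:
    query: str            # the raw question put to the model
    context: str          # who is asking and why
    safe_to_answer: bool  # hypothetical human judgment for this context

examples = [
    ContextualQuery(
        query="How do common phishing emails trick people?",
        context="A corporate security trainer preparing awareness material.",
        safe_to_answer=True,   # same question, benign intent
    ),
    ContextualQuery(
        query="How do common phishing emails trick people?",
        context="Someone asking for templates to target a specific victim.",
        safe_to_answer=False,  # same question, harmful intent
    ),
]

for ex in examples:
    verdict = "answer" if ex.safe_to_answer else "refuse"
    print(f"Context: {ex.context}\n  -> expected behaviour: {verdict}\n")
```

The point of the sketch is simply that the same query can warrant opposite responses once the surrounding context is taken into account, which is the behaviour the benchmark is designed to measure.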