

Poster in Workshop: Workshop on Technical AI Governance

Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments

Chris Schnabl · Daniel Hugenroth · Bill Marino · Alastair Beresford


Abstract:

Benchmarks are important measures for evaluating the safety and compliance of AI models at scale. However, they typically do not offer verifiable results and lack confidentiality protections for model IP and benchmark datasets, which creates a gap in AI governance. We propose Attestable Audits, a new approach that runs inside Trusted Execution Environments (TEEs) and enables users to verify that they are interacting with a compliant AI model. Our work protects sensitive data even when the model provider and the auditor do not trust each other. This addresses verification challenges raised in recent AI governance frameworks. We build a prototype to demonstrate the feasibility of our approach on typical audit benchmarks against Llama-3.1.
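
The verification flow implied by the abstract can be pictured in three steps: the benchmark runs against the model inside the enclave, the enclave emits an attestation quote binding a measurement of the model and benchmark to the audit result, and a user checks that quote before trusting the model. The sketch below is a minimal illustration of that flow, not the paper's implementation or a real TEE API: an HMAC under a hypothetical shared key stands in for the hardware-signed quote (real attestation is verified against a vendor public key), and the benchmark itself is a stub.

import hashlib
import hmac
import json

# Hypothetical stand-in for the hardware-rooted attestation key held by the TEE.
TEE_ATTESTATION_KEY = b"stand-in-for-hardware-rooted-key"


def run_audit_inside_tee(model_weights: bytes, benchmark_items: list[str]) -> dict:
    """Simulates the enclave: measure the model and benchmark, run the audit,
    and return a quote binding the measurements to the result."""
    model_hash = hashlib.sha256(model_weights).hexdigest()
    benchmark_hash = hashlib.sha256("\n".join(benchmark_items).encode()).hexdigest()

    # Stub result: in the real system the safety benchmark would be evaluated
    # against the model inside the enclave.
    score = 0.97

    report = {
        "model_hash": model_hash,
        "benchmark_hash": benchmark_hash,
        "score": score,
    }
    payload = json.dumps(report, sort_keys=True).encode()
    # HMAC stands in for the TEE's signed attestation quote.
    quote = hmac.new(TEE_ATTESTATION_KEY, payload, hashlib.sha256).hexdigest()
    return {"report": report, "quote": quote}


def verify_audit(audit: dict, expected_model_hash: str, threshold: float) -> bool:
    """Simulates the user/verifier: check the quote, then check that the
    attested model meets the compliance threshold."""
    payload = json.dumps(audit["report"], sort_keys=True).encode()
    expected = hmac.new(TEE_ATTESTATION_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, audit["quote"]):
        return False
    report = audit["report"]
    return report["model_hash"] == expected_model_hash and report["score"] >= threshold


if __name__ == "__main__":
    weights = b"fake-model-weights"
    audit = run_audit_inside_tee(weights, ["prompt-1", "prompt-2"])
    ok = verify_audit(audit, hashlib.sha256(weights).hexdigest(), threshold=0.9)
    print("audit verified:", ok)

Because the quote covers both the model hash and the benchmark hash, neither the model provider nor the auditor needs to reveal their artifacts to the other or to the user; the user only learns that a specific (hashed) model passed a specific (hashed) benchmark.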
