Oral in Workshop: AI Heard That! ICML 2025 Workshop on Machine Learning for Audio

BLAB: Brutally Long Audio Bench

Orevaoghene M Ahia · Martijn Bartelds · Kabir Ahuja · Hila Gonen · Valentin Hofmann · Siddhant Arora · Stella Li · Vishal Puttagunta · Mofetoluwa Adeyemi · Charishma Buchireddy · Ben Walls · Noah Bennett · Shinji Watanabe · Noah Smith · Yulia Tsvetkov · Sachin Kumar

Sat 19 Jul 10:10 a.m. PDT — 10:30 a.m. PDT
 
presentation: AI Heard That! ICML 2025 Workshop on Machine Learning for Audio
Sat 19 Jul 9 a.m. PDT — 5 p.m. PDT

Abstract:

We introduce Brutally Long Audio Bench (BLAB), a challenging long-form audio benchmark that evaluates audio LMs on localization, duration estimation, emotion, and counting tasks using audio segments averaging 51 minutes in length. BLAB consists of 833+ hours of diverse, full-length audio clips, each paired with human-annotated, text-based natural language questions and answers. Our audio data were collected from permissively licensed sources and underwent a human-assisted filtering process to ensure task compliance. We evaluate six open-source and proprietary audio LMs on BLAB and find that all of them, including advanced models such as Gemini 2.0 Pro and GPT-4o, struggle with its tasks. In general, audio LMs struggle with long-form speech: they perform poorly on localization, temporal reasoning, and counting, and they struggle to understand non-phonemic information, relying more on prompts than on audio content. BLAB serves as a challenging evaluation framework for developing audio LMs with robust long-form audio understanding capabilities.
