Workshop
AI Heard That! ICML 2025 Workshop on Machine Learning for Audio
Alice Baird · Sander Dieleman · Chris Donahue · Brian Kulis · David Liu · Rachel Manzelli · Shrikanth Narayanan
West Meeting Room 118-120
Sat 19 Jul, 9 a.m. PDT
The Machine Learning for Audio workshop at ICML 2025 will cover a broad range of tasks and challenges involving audio data. These include, but are not limited to: speech modeling, generation of environmental and other ambient sounds, novel generative models, raw-audio music generation, text-to-speech methods, denoising of speech and music, data augmentation, classification of acoustic events, transcription, source separation, and multimodal problems.
Timezone: America/Los_Angeles
Schedule
Sat 9:00 a.m. - 9:10 a.m. | Opening Remarks (Intro)
Sat 9:10 a.m. - 9:40 a.m. | On The Role of Speech in Speech Language Models (Yossi Adi) (Invited Talk)
Sat 9:40 a.m. - 10:10 a.m. | Recomposer: Event-roll-guided generative audio editing (Dan Ellis) (Invited Talk)
Sat 10:10 a.m. - 10:30 a.m. | BLAB: Brutally Long Audio Bench (Oral) | Orevaoghene M Ahia · Martijn Bartelds · Kabir Ahuja · Hila Gonen · Valentin Hofmann · Siddhant Arora · Stella Li · Vishal Puttagunta · Mofetoluwa Adeyemi · Charishma Buchireddy · Ben Walls · Noah Bennett · Shinji Watanabe · Noah Smith · Yulia Tsvetkov · Sachin Kumar
Sat 10:30 a.m. - 10:50 a.m. | KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation (Oral) | Yoonjin Chung · Pilsun Eu · Junwon Lee · Keunwoo Choi · Juhan Nam · Ben Sangbae Chon
Sat 10:50 a.m. - 11:20 a.m. | Coffee Break
Sat 11:20 a.m. - 11:50 a.m. | Invited Talk 3 (Albert Gu) (Invited Talk)
Sat 11:50 a.m. - 12:20 p.m. | Foundation models for seismic waveforms: moving toward a unified framework for earthquake science (Laura Laurenti) (Invited Talk)
Sat 12:20 p.m. - 1:30 p.m. | Lunch
Sat 1:30 p.m. - 3:00 p.m. | Poster & Demo Session (Poster Session)
Sat 3:00 p.m. - 3:30 p.m. | Design considerations for learned audio representations (James Betker) (Invited Talk)
Sat 3:30 p.m. - 4:00 p.m. | Towards Understanding the Communication of Animals (Pratyusha Sharma) (Invited Talk)
Sat 4:00 p.m. - 4:20 p.m. | MMMG: A Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation (Oral) | Jihan Yao · Yushi Hu · Yujie Yi · Bin Han · Shangbin Feng · Guang Yang · Bingbing Wen · Ranjay Krishna · Lucy Lu Wang · Yulia Tsvetkov · Noah Smith · Banghua Zhu
Sat 4:20 p.m. - 4:40 p.m. | Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance (Oral) | Shehzeen Hussain · Paarth Neekhara · Xuesong Yang · Edresson Casanova · Subhankar Ghosh · Roy Fejgin · Mikyas Desta · Rafael Valle · Jason Li
Sat 4:40 p.m. - 5:00 p.m. | Large-Scale Training Data Attribution for Music Generative Models via Unlearning (Oral) | Woosung Choi · Junghyun (Tony) Koo · Kin Wai Cheuk · Joan Serrà · Marco Martínez-Ramírez · Yukara Ikemiya · Naoki Murata · Yuhta Takida · WeiHsiang Liao · Yuki Mitsufuji
Sat 5:00 p.m. | Wrap-up and Open Conversation (Closing Remarks)
UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching (Poster) | Neta Glazer · Aviv Navon · Yael Segal · Aviv Shamsian · Hilit Segev · Asaf Buchnick · Menachem Pirchi · Gil Hetz · Joseph Keshet
Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond (Poster) | Jessie Richter-Powell · Antonio Torralba · Jonathan Lorraine
“I am bad”: Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models (Poster) | Isha Gupta · David Khachaturov · Robert Mullins
SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval (Poster) | Yueqian Lin · Yuzhe Fu · Jingyang Zhang · Yudong Liu · Jianyi Zhang · Jingwei Sun · Hai Li · Yiran Chen
Audio-Visual Speech Separation via Bottleneck Iterative Network (Poster) | Sidong Zhang · Shiv Shankar · Trang Nguyen · Andrea Fanelli · Madalina Fiterau
ViRN: Variational Inference and Distribution Trilateration for Long-Tailed Continual Representation Learning (Poster) | Hao Dai · Chong Tang · Jagmohan Chauhan
Fine-tuning Speech Models on Brain Responses Strengthens Semantic Content (Poster) | Nishitha Vattikonda · Aditya Vaidya · Richard Antonello · Alexander Huth
Early Attentive Sparsification Accelerates Neural Speech Transcription (Poster) | Zifei Xu · Sayeh Sharify · Hesham Mostafa · Tristan Webb · Wanzin Yazar · Xin Wang
Unsupervised Multi-channel Speech Dereverberation via Diffusion (Poster) | Yulun Wu · Zhongweiyang Xu · Jianchong Chen · Zhong-Qiu Wang · Romit Roy Choudhury
The Sound of Syntax: Finetuning and Comprehensive Evaluation of Language Models for Speech Pathology (Poster) | Fagun Patel · Duc Nguyen · Sang Truong · Jody Vaynshtok · Sanmi Koyejo · Nick Haber
WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling (Poster) | Qihui Yang · Taylor Berg-Kirkpatrick · Julian McAuley · Zachary Novack
Bridging the Language Gap: Synthetic Voice Diversity via Latent Mixup for Equitable Speech Recognition (Poster) | Wesley Bian · Xiaofeng Lin · Guang Cheng
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs (Poster) | Pooneh Mousavi · Yingzhi Wang · Mirco Ravanelli · Cem Subakan
Broadband Ground Motion Synthesis by Diffusion Model with Minimal Condition (Poster) | Jaehyuk Lee · Jaeheun Jung · Yeajin Lee · ChangHae Jung · Donghun Lee
Visual and Aural Explanations for Transformer-Based Deepfake Detection (Poster) | Georgia Channing · Juil Sock · Phil Torr · Christian Schroeder de Witt
Evaluation of Neural Surrogates for Physical Modelling Synthesis of Nonlinear Elastic Plates (Poster) | Carlos De La Vega Martin
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance (Poster) | Shehzeen Hussain · Paarth Neekhara · Xuesong Yang · Edresson Casanova · Subhankar Ghosh · Roy Fejgin · Mikyas Desta · Rafael Valle · Jason Li
Large-Scale Training Data Attribution for Music Generative Models via Unlearning (Poster) | Woosung Choi · Junghyun (Tony) Koo · Kin Wai Cheuk · Joan Serrà · Marco Martínez-Ramírez · Yukara Ikemiya · Naoki Murata · Yuhta Takida · WeiHsiang Liao · Yuki Mitsufuji
MMMG: A Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation (Poster) | Jihan Yao · Yushi Hu · Yujie Yi · Bin Han · Shangbin Feng · Guang Yang · Bingbing Wen · Ranjay Krishna · Lucy Lu Wang · Yulia Tsvetkov · Noah Smith · Banghua Zhu
KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation (Poster) | Yoonjin Chung · Pilsun Eu · Junwon Lee · Keunwoo Choi · Juhan Nam · Ben Sangbae Chon
BLAB: Brutally Long Audio Bench (Poster) | Orevaoghene M Ahia · Martijn Bartelds · Kabir Ahuja · Hila Gonen · Valentin Hofmann · Siddhant Arora · Stella Li · Vishal Puttagunta · Mofetoluwa Adeyemi · Charishma Buchireddy · Ben Walls · Noah Bennett · Shinji Watanabe · Noah Smith · Yulia Tsvetkov · Sachin Kumar