Invited Talk in Workshop: AI Heard That! ICML 2025 Workshop on Machine Learning for Audio
On The Role of Speech in Speech Language Models (Yossi Adi)
Spoken language models (SLMs) have emerged as a promising research direction. Yet the term ‘spoken language model’ is interpreted differently across research groups. In this talk, I will present my perspective on the current landscape of SLMs, with particular emphasis on their scaling behavior and evaluation methods. Building on this foundation, I will discuss current SLM evaluations and what I believe the role of speech and audio in these models should be. I will conclude by introducing a novel data generation pipeline, alongside empirical evidence that fine-tuning SLMs on this data leads to notable improvements in prosodic understanding without compromising performance on other downstream tasks.
Yossi Adi is an Assistant Professor at the School of Computer Science and Engineering at the Hebrew University of Jerusalem and a Research Scientist on the FAIR team at Meta. He completed his Ph.D. in computer science at Bar-Ilan University and is a recipient of the IAAI Best Doctoral Dissertation Award and the Alon Scholarship. His research interests are in speech and language processing using machine learning and deep learning models. His work spans core machine learning and deep learning algorithms, their applications to spoken language processing, and the impact of the technology on social systems.