Poster in Workshop: AI Heard That! ICML 2025 Workshop on Machine Learning for Audio

Fine-tuning Speech Models on Brain Responses Strengthens Semantic Content

Nishitha Vattikonda · Aditya Vaidya · Richard Antonello · Alexander Huth


Abstract:

Speech encoding models use auditory representations to predict how the human brain responds to spoken language stimuli. The most performant encoding models linearly map the hidden states of artificial neural networks to brain data, but this linear restriction may limit their effectiveness. In this work, we use low-rank adaptation (LoRA) to fine-tune a WavLM-based encoding model end-to-end on a brain encoding objective, producing a model we name BrainWavLM. We show that fine-tuning on fMRI responses improves average encoding performance, and does so with greater stability than fine-tuning without LoRA. Linear probes reveal that the brain data strengthened semantic representations in the speech model without any explicit annotations. Our results demonstrate that brain fine-tuning produces best-in-class speech encoding models, and that brain data may be a promising source of semantic representations in artificial networks.
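
As a concrete illustration of the setup the abstract describes, the sketch below wraps a pretrained WavLM backbone with LoRA adapters and trains a linear head on a voxelwise regression objective. This is a minimal example under stated assumptions, not the authors' released code: the microsoft/wavlm-base checkpoint, the LoRA rank and target modules, mean-pooling over time frames, the voxel count, and the mean-squared-error loss are all illustrative choices.

import torch
import torch.nn as nn
from peft import LoraConfig, get_peft_model
from transformers import WavLMModel

class BrainEncoder(nn.Module):
    """WavLM backbone with LoRA adapters plus a linear head onto voxel responses."""
    def __init__(self, n_voxels: int, lora_rank: int = 8):
        super().__init__()
        backbone = WavLMModel.from_pretrained("microsoft/wavlm-base")
        # Inject low-rank adapters into the attention projections; every other
        # backbone weight stays frozen, so only the adapters and the head train.
        lora_cfg = LoraConfig(r=lora_rank, lora_alpha=16,
                              target_modules=["q_proj", "v_proj"],
                              lora_dropout=0.05)
        self.backbone = get_peft_model(backbone, lora_cfg)
        self.head = nn.Linear(backbone.config.hidden_size, n_voxels)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, samples) -> (batch, frames, hidden) -> mean-pool -> (batch, n_voxels)
        hidden = self.backbone(waveform).last_hidden_state
        return self.head(hidden.mean(dim=1))

# Toy training step: predict fMRI responses from audio with an MSE objective.
model = BrainEncoder(n_voxels=10000)
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
audio = torch.randn(2, 16000)    # placeholder: 1 s of 16 kHz audio per item
fmri = torch.randn(2, 10000)     # placeholder: measured BOLD responses per item
loss = nn.functional.mse_loss(model(audio), fmri)
loss.backward()
optimizer.step()

Because the base weights stay frozen under LoRA, the gradient signal from the brain encoding objective flows only through the low-rank adapters and the linear head, which is one plausible reason the abstract reports more stable fine-tuning than updating the full backbone.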
