

Invited Talk
in
Workshop: AI Heard That! ICML 2025 Workshop on Machine Learning for Audio

Design considerations for learned audio representations (James Betker)

Sat 19 Jul 3 p.m. PDT — 3:30 p.m. PDT

Abstract:

In this presentation, we'll explore design considerations for training various types of autoencoders to learn representations whose latents are well suited to training large generative models that perceive and generate audio. We'll cover autoencoder architectures, bottleneck types, losses, and strategies for both learning great latent spaces and training high-fidelity decoders on top of those spaces. We'll also discuss how these design choices interact with the downstream priors you train and, ultimately, the software you build around generative audio models.

James Betker is a research scientist at OpenAI, where he is one of the audio leads for GPT-4o. He is also the lead author of DALL-E 3. Previously, he created TorToiSe, a popular open-source text-to-speech system. He also had a long tenure as a senior software engineer at Garmin, where he developed vehicular navigation systems. His research interests include generative models for audio and images.
