

Poster in Workshop: AI Heard That! ICML 2025 Workshop on Machine Learning for Audio

KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation

Yoonjin Chung · Pilsun Eu · Junwon Lee · Keunwoo Choi · Juhan Nam · Ben Sangbae Chon

Presentation: AI Heard That! ICML 2025 Workshop on Machine Learning for Audio
Sat 19 Jul, 9 a.m. to 5 p.m. PDT

Abstract:

In this paper, we introduce the Kernel Audio Distance (KAD), a novel, distribution-free, unbiased, and computationally efficient metric based on Maximum Mean Discrepancy (MMD). We propose it as an alternative to the widely adopted Fréchet Audio Distance (FAD), which suffers from significant limitations, including reliance on Gaussian assumptions, sensitivity to sample size, and high computational complexity. Through analysis and empirical validation, we demonstrate KAD's advantages: (1) faster convergence with smaller sample sizes, enabling reliable evaluation with limited data; (2) lower computational cost, with scalable GPU acceleration; and (3) stronger alignment with human perceptual evaluations. We open-source the KAD toolkit, kadtk, providing an official benchmark metric for generative audio models.
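
To make the MMD idea concrete, here is a minimal sketch of the standard unbiased estimator of squared MMD between two sets of embeddings. It is not the kadtk API and may differ from the authors' implementation: the function names, the Gaussian (RBF) kernel, and the fixed bandwidth argument are all assumptions chosen for illustration. Inputs are assumed to be NumPy arrays of shape (num_samples, embedding_dim), for example embeddings of reference and generated audio.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a and rows of b (illustrative choice)."""
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2 a.b
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth):
    """Unbiased estimate of squared MMD between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    # Exclude the diagonal (i == j) terms; this is what removes the bias.
    sum_xx = k_xx.sum() - np.trace(k_xx)
    sum_yy = k_yy.sum() - np.trace(k_yy)
    return sum_xx / (m * (m - 1)) + sum_yy / (n * (n - 1)) - 2.0 * k_xy.mean()
```

Unlike a Fréchet-style distance, nothing here fits a mean and covariance, so no Gaussian assumption is made about the embeddings. Because the estimator is unbiased, its value can be slightly negative when the two sample sets come from the same distribution. The cost is dominated by three kernel matrices, which are plain matrix products and therefore map naturally onto a GPU.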
