Poster in Workshop: AI Heard That! ICML 2025 Workshop on Machine Learning for Audio
Unsupervised Multi-channel Speech Dereverberation via Diffusion
Yulun Wu · Zhongweiyang Xu · Jianchong Chen · Zhong-Qiu Wang · Romit Roy Choudhury
We consider the problem of multi-channel single-speaker blind dereverberation, in which multi-channel reverberant mixtures are used to recover the clean anechoic speech. To solve this problem, we propose USD-DPS, Unsupervised Speech Dereverberation via Diffusion Posterior Sampling. USD-DPS uses an unconditional clean speech diffusion model as a strong prior and solves the problem by posterior sampling. At each diffusion sampling step, we estimate the room impulse responses (RIRs) of all microphone channels, which are then used to enforce a multi-channel mixture-consistency constraint for diffusion guidance. For multi-channel RIR estimation, we estimate the reference-channel RIR by optimizing the parameters of a sub-band RIR signal model with the Adam optimizer, and we estimate the non-reference channels' RIRs analytically using forward convolutive prediction (FCP). We find that this combination strikes a good balance between sampling efficiency and RIR prior modeling, yielding superior performance among unsupervised dereverberation approaches.
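The non-reference-channel step above admits a closed-form solution: FCP estimates a channel's RIR as the linear filter that best predicts that channel's mixture from the current clean-speech estimate, i.e. a least-squares deconvolution. The sketch below is a minimal time-domain illustration of this idea, not the paper's implementation (which operates on sub-band/STFT representations); the function name, filter length, and Tikhonov regularization term are illustrative assumptions.

```python
import numpy as np

def fcp_filter(s_hat, y_q, filter_len=64, eps=1e-8):
    """Illustrative FCP-style RIR estimate for one non-reference channel.

    Solves min_g || y_q - g * s_hat ||^2 in closed form, where s_hat is the
    current clean-speech estimate and y_q is the channel-q mixture.
    """
    n = len(y_q)
    # Build the (truncated) convolution matrix X so that X @ g == g * s_hat.
    X = np.zeros((n, filter_len))
    for k in range(filter_len):
        X[k:, k] = s_hat[: n - k]
    # Regularized normal equations; eps is an assumed stabilizer, not from the paper.
    g = np.linalg.solve(X.T @ X + eps * np.eye(filter_len), X.T @ y_q)
    return g
```

In a posterior-sampling loop, a filter like this would be recomputed at each diffusion step from the current denoised estimate, and the re-synthesized mixtures `g_q * s_hat` compared against the observed channels to form the guidance gradient.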