Poster

Gaussian Mixture Flow Matching Models

Hansheng Chen · Kai Zhang · Hao Tan · Zexiang Xu · Fujun Luan · Leonidas Guibas · Gordon Wetzstein · Sai Bi

East Exhibition Hall A-B #E-3011
Thu 17 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract: Diffusion models approximate the denoising distribution as a Gaussian and predict its mean, whereas flow matching models reparameterize the Gaussian mean as flow velocity. However, they underperform in few-step sampling due to discretization error and tend to produce over-saturated colors under classifier-free guidance (CFG). To address these limitations, we propose a novel Gaussian mixture flow matching (GMFlow) model: instead of predicting the mean, GMFlow predicts dynamic Gaussian mixture (GM) parameters to capture a multi-modal flow velocity distribution, which can be learned with a KL divergence loss. We demonstrate that GMFlow generalizes previous diffusion and flow matching models where a single Gaussian is learned with an $L_2$ denoising loss. For inference, we derive GM-SDE/ODE solvers that leverage analytic denoising distributions and velocity fields for precise few-step sampling. Furthermore, we introduce a novel probabilistic guidance scheme that mitigates the over-saturation issues of CFG and improves image generation quality. Extensive experiments demonstrate that GMFlow consistently outperforms flow matching baselines in generation quality, achieving a Precision of 0.942 with only 6 sampling steps on ImageNet 256$\times$256.
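As a rough illustration of the training objective described in the abstract, the sketch below computes the negative log-likelihood of a target velocity under a predicted Gaussian mixture. Minimizing the expected negative log-likelihood over data samples is equivalent, up to a constant, to minimizing the KL divergence from the data distribution to the model. This is not the authors' implementation: the function names `gm_nll` and `log_sum_exp`, the isotropic components, and the single shared `log_sigma` are all assumptions made for brevity.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gm_nll(u, logits, means, log_sigma=0.0):
    """Negative log-likelihood of velocity u under an isotropic Gaussian mixture.

    u:         list of D floats, the target velocity sample
    logits:    list of K floats, unnormalized mixture weights
    means:     list of K lists of D floats, the component means
    log_sigma: shared log standard deviation (an assumption of this sketch)
    """
    d = len(u)
    log_z = log_sum_exp(logits)  # normalizer for the mixture weights
    joint = []
    for logit, mean in zip(logits, means):
        sq_dist = sum((ui - mi) ** 2 for ui, mi in zip(u, mean))
        # log density of an isotropic Gaussian N(u; mean, sigma^2 I)
        log_comp = (-0.5 * sq_dist / math.exp(2.0 * log_sigma)
                    - d * log_sigma
                    - 0.5 * d * math.log(2.0 * math.pi))
        joint.append(logit - log_z + log_comp)
    return -log_sum_exp(joint)
```

With a single component and a fixed `log_sigma`, the value returned by `gm_nll` is the squared error between `u` and the mean, scaled by a constant, plus a term that does not depend on the mean. This is consistent with the abstract's statement that the single-Gaussian case corresponds to an $L_2$ denoising loss.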

Lay Summary:

Image generation models like diffusion and flow matching have revolutionized digital content creation but still face challenges. They often need many computational steps to generate high-quality images and tend to produce overly vivid, unrealistic colors when guided to follow specific styles or prompts.

To address these issues, we developed Gaussian Mixture Flow Matching (GMFlow). Unlike previous methods that predict only one possible outcome for each step of image creation, GMFlow predicts multiple possible outcomes simultaneously, capturing a richer set of variations through something called a Gaussian mixture. We then designed specialized algorithms to efficiently generate high-quality images with fewer steps and less computational effort. Additionally, we introduced a probabilistic approach to better control image styles, reducing unrealistic colors.

Our method significantly improves image generation quality, producing clearer, more realistic images faster. This advancement means generating high-quality visuals becomes quicker and more reliable, benefiting applications ranging from digital art to realistic virtual environments.
