

Oral Sessions

Oral 2A Diffusion Models

West Exhibition Hall C

Moderator: Qing Qu

Tue 15 Jul 3:30 p.m. PDT — 4:30 p.m. PDT

Tue 15 July 15:30 - 15:45 PDT

DeFoG: Discrete Flow Matching for Graph Generation

Yiming Qin · Manuel Madeira · Dorina Thanou · Pascal Frossard

Graph generative models are essential across diverse scientific domains because they capture complex distributions over relational data. Among them, graph diffusion models achieve superior performance but face inefficient sampling and limited flexibility due to the tight coupling between training and sampling stages. We introduce DeFoG, a novel graph generative framework that disentangles sampling from training, enabling a broader design space for more effective and efficient model optimization. DeFoG employs a discrete flow-matching formulation that respects the inherent symmetries of graphs. We theoretically ground this disentangled formulation by explicitly relating the training loss to the sampling algorithm and showing that DeFoG faithfully replicates the ground truth graph distribution. Building on these foundations, we thoroughly investigate DeFoG's design space and propose novel sampling methods that significantly enhance performance and reduce the required number of refinement steps. Extensive experiments demonstrate state-of-the-art performance across synthetic, molecular, and digital pathology datasets, covering both unconditional and conditional generation settings. DeFoG also outperforms most diffusion-based models with just 5–10% of their sampling steps.

Tue 15 July 15:45 - 16:00 PDT

MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models

Jeffrey A. Chan-Santiago · Praveen Tirupattur · Gaurav Kumar Nayak · Gaowen Liu · Mubarak Shah

Dataset distillation has emerged as an effective strategy, significantly reducing training costs and facilitating more efficient model deployment. Recent advances have leveraged generative models to distill datasets by capturing the underlying data distribution. Unfortunately, existing methods require model fine-tuning with distillation losses to encourage diversity and representativeness. However, these methods do not guarantee sample diversity, limiting their performance. We propose a mode-guided diffusion model leveraging a pre-trained diffusion model without the need to fine-tune with distillation losses. Our approach addresses dataset diversity in three stages: Mode Discovery to identify distinct data modes, Mode Guidance to enhance intra-class diversity, and Stop Guidance to mitigate artifacts in synthetic samples that affect performance. We evaluate our approach on ImageNette, ImageIDC, ImageNet-100, and ImageNet-1K, achieving accuracy improvements of 4.4%, 2.9%, 1.6%, and 1.6%, respectively, over state-of-the-art methods. Our method eliminates the need for fine-tuning diffusion models with distillation losses, significantly reducing computational costs.

Tue 15 July 16:00 - 16:15 PDT

Inductive Moment Matching

Linqi (Alex) Zhou · Stefano Ermon · Jiaming Song

Diffusion models and Flow Matching generate high-quality samples but are slow at inference, and distilling them into few-step models often leads to instability and extensive tuning. To resolve these trade-offs, we propose Inductive Moment Matching (IMM), a new class of generative models for one- or few-step sampling with a single-stage training procedure. Unlike distillation, IMM does not require pre-training initialization and optimization of two networks; and unlike Consistency Models, IMM guarantees distribution-level convergence and remains stable under various hyperparameters and standard model architectures. IMM surpasses diffusion models on ImageNet-256x256 with 2.13 FID using only 8 inference steps and achieves state-of-the-art 2-step FID of 2.05 on CIFAR-10 for a model trained from scratch.

Tue 15 July 16:15 - 16:30 PDT

Outstanding Paper
Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions

Jaeyeon Kim · Kulin Shah · Vasilis Kontonis · Sham Kakade · Sitan Chen

In recent years, masked diffusion models (MDMs) have emerged as a promising alternative approach for generative modeling over discrete domains. Compared to autoregressive models (ARMs), MDMs trade off complexity at training time with flexibility at inference time. At training time, they must learn to solve an exponentially large number of infilling problems, but at inference time, they can decode tokens in essentially arbitrary order. In this work we closely examine these two competing effects. On the training front, we theoretically and empirically demonstrate that MDMs indeed train on computationally intractable subproblems compared to their autoregressive counterparts. On the inference front, we show that a suitable strategy for adaptively choosing the token decoding order significantly enhances the capabilities of MDMs, allowing them to sidestep hard subproblems. On logic puzzles like Sudoku, we show that adaptive inference can boost solving accuracy in pretrained MDMs from $<7\%$ to $\approx 90\%$, even outperforming ARMs that were explicitly trained via teacher forcing to learn the right order of decoding.
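
The idea of adaptively choosing the decoding order can be illustrated with a minimal sketch. The abstract does not specify the selection rule, so the one used here (unmask the position whose top predicted probability is highest) is an assumption chosen for simplicity, not necessarily the authors' strategy. The names `adaptive_decode`, `toy_predict`, `MASK`, and `VOCAB` are hypothetical, and `toy_predict` is a hand-written stand-in for a trained masked diffusion model: it is certain about a position only when a neighbouring token is already known.

```python
from typing import Callable, List, Optional, Sequence

MASK = None   # hypothetical marker for a position that is still masked
VOCAB = 5     # hypothetical vocabulary size for the toy task


def toy_predict(tokens: Sequence[Optional[int]]) -> List[List[float]]:
    """Stand-in for a model. Toy rule: each token equals its left neighbour + 1 (mod VOCAB)."""
    probs = []
    for i, tok in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else MASK
        right = tokens[i + 1] if i + 1 < len(tokens) else MASK
        if tok is not MASK:
            guess = tok
        elif left is not MASK:
            guess = (left + 1) % VOCAB
        elif right is not MASK:
            guess = (right - 1) % VOCAB
        else:
            guess = None  # no known neighbour: this is a "hard" position for now
        if guess is None:
            probs.append([1.0 / VOCAB] * VOCAB)  # uniform, i.e. no information
        else:
            probs.append([1.0 if v == guess else 0.0 for v in range(VOCAB)])
    return probs


def adaptive_decode(
    tokens: Sequence[Optional[int]],
    predict: Callable[[Sequence[Optional[int]]], List[List[float]]],
) -> List[int]:
    """Fill masked positions one at a time, most confident position first."""
    tokens = list(tokens)
    while any(t is MASK for t in tokens):
        probs = predict(tokens)
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        # Assumed heuristic: pick the masked position with the largest top probability.
        best = max(masked, key=lambda i: max(probs[i]))
        # Commit the most likely token at that position, then re-predict.
        tokens[best] = max(range(len(probs[best])), key=lambda v: probs[best][v])
    return tokens


result = adaptive_decode([MASK, MASK, 2, MASK], toy_predict)
```

In this toy run, position 0 starts with no known neighbour, so a fixed left-to-right order would have to guess it from a uniform distribution. The adaptive order fills the positions next to the known token first, after which position 0 becomes easy.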