Poster
in
Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models
CoDM: A Co-design Framework for Efficient Sparse Diffusion Models
Xiaolong Wu · Xiang Gao · Xiyun Song · Zongfang Lin · Heather Yu · Xianfeng GU
Diffusion models have emerged as a powerful class of generative models that excel at capturing complex data distributions and producing realistic, high-fidelity samples. However, these benefits come at the cost of expensive computation and memory requirements due to their iterative denoising process. The cost is especially significant for high-resolution images, videos, 3D data, or long sequences. In this paper, we propose CoDM, a co-design framework that seamlessly integrates model compression techniques with the sparse tensor cores of NVIDIA Hopper H100 GPUs. By leveraging specialized hardware capabilities and jointly optimizing the model compression scheme and storage format, CoDM achieves significant model speedup while maintaining data generation quality. Specifically, our approach enhances diffusion models through several key strategies: reducing inference steps and model weights through a novel hierarchical pruning scheme, improving memory efficiency via a new sparse storage format, and leveraging TensorRT optimization and the specialized cores of GPU hardware accelerators. This co-design approach addresses the computational challenges of diffusion models, making them more accessible for real-world applications. Experimental results in a text-to-image application demonstrate that our approach surpasses the state of the art, achieving a 7.4-fold speedup on the ImageNet (256×256) dataset and an 11.5-fold speedup on the CIFAR-10 (32×32) dataset, all while preserving the quality of the generated images with a similar or lower Fréchet Inception Distance (FID) score.
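The abstract does not spell out CoDM's hierarchical pruning scheme, but the sparse tensor cores it targets accelerate NVIDIA's 2:4 structured sparsity pattern (at most two nonzeros in every group of four consecutive weights). As a rough illustration only, not the paper's method, a minimal NumPy sketch of magnitude-based 2:4 pruning looks like this (the function name `prune_2_4` is our own):

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """Magnitude-based 2:4 structured pruning: in each group of 4
    consecutive weights along the flattened last axis, zero out the
    2 smallest by absolute value. This yields the sparsity pattern
    that NVIDIA sparse tensor cores can accelerate; it is only a
    stand-in for CoDM's actual hierarchical pruning scheme."""
    orig_shape = w.shape
    groups = w.reshape(-1, 4)                        # groups of 4 weights
    smallest = np.argsort(np.abs(groups), axis=1)[:, :2]  # 2 smallest per group
    pruned = groups.copy()
    np.put_along_axis(pruned, smallest, 0.0, axis=1)
    return pruned.reshape(orig_shape)

# Example: each group of four keeps its two largest-magnitude weights.
w = np.array([[0.9, -0.1, 0.5, 0.05, -0.7, 0.2, 0.01, 0.3]])
print(prune_2_4(w))
```

After pruning, only the nonzero values and small per-group metadata need to be stored, which is what makes a compact sparse storage format and hardware acceleration possible.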