

Poster in Affinity Workshop: New In ML

Instant Condition Enhancer for Diffusion

Weiyi Hong · Zhang Baosheng


Abstract:

Text-to-image generation has long been a hot topic in computer vision. The most effective framework for the task is the diffusion model. Our work investigates the condition enhancement problem in the diffusion model. We trained an Instant Condition Enhancer based on the diffusion transformer model, utilizing a pre-trained vision encoder to optimize the condition information across the denoising time steps. We analyzed the relationship between improved image generation quality and the additional computational overhead measured in GFLOPs. We found that incorporating our highly compatible Instant Condition Enhancer module into a diffusion transformer used as the base diffusion model led to an average 29.88% reduction in FID-50k (lower is better) on the class-conditional ImageNet 256 × 256 benchmark compared to the diffusion transformer model alone, while adding only 0.6% more GFLOPs.
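The two headline figures in the abstract are relative changes against the baseline diffusion transformer. The sketch below illustrates that arithmetic only. It assumes the average is an unweighted mean over model configurations, which the abstract does not state. The function names and all numeric inputs are hypothetical placeholders, not values from the paper.

```python
from typing import Iterable, List


def relative_change_percent(baseline: float, new: float) -> float:
    """Signed percentage change of `new` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) * 100.0 / baseline


def mean(values: Iterable[float]) -> float:
    """Unweighted arithmetic mean (assumed averaging scheme)."""
    items: List[float] = list(values)
    if not items:
        raise ValueError("need at least one value")
    return sum(items) / len(items)


# Hypothetical placeholder numbers, NOT results reported by the authors.
baseline_fid = [20.0, 10.0]   # FID-50k of two baseline configurations
enhanced_fid = [15.0, 7.0]    # FID-50k with the enhancer module added

# A reduction is the negated relative change (FID is lower-is-better).
fid_reductions = [
    -relative_change_percent(b, e) for b, e in zip(baseline_fid, enhanced_fid)
]
average_fid_reduction = mean(fid_reductions)

# Hypothetical compute cost of the baseline and the enhanced model.
baseline_gflops = 100.0
enhanced_gflops = 101.0
gflops_overhead = relative_change_percent(baseline_gflops, enhanced_gflops)

print(f"average FID reduction: {average_fid_reduction:.2f}%")
print(f"additional GFLOPs: {gflops_overhead:.2f}%")
```

Read this way, the abstract's claim is that the averaged FID-50k reduction (29.88%) is large relative to the added compute (0.6%).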
