Poster
Improving Compositional Generation with Diffusion Models Using Lift Scores
Chenning Yu · Sicun Gao
West Exhibition Hall B2-B3 #W-207
We introduce a novel resampling criterion based on lift scores for improving compositional generation in diffusion models. Using lift scores, we evaluate whether a generated sample aligns with each individual condition and then compose the results to determine whether the composed prompt is satisfied. Our key insight is that lift scores can be efficiently approximated using only the original diffusion model, requiring no additional training or external modules. We develop an optimized variant that achieves relatively low computational overhead during inference while maintaining effectiveness. Through extensive experiments, we demonstrate that lift scores significantly improve condition alignment for compositional generation across 2D synthetic data, CLEVR position tasks, and text-to-image synthesis. Our code is available at github.com/rainorangelemon/complift.
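The abstract's per-condition check can be illustrated with a toy sketch. The lift of a condition c on a sample x is log p(x|c) − log p(x); one common way to approximate such log-likelihood gaps with only a diffusion model is via differences in denoising errors, so the estimator below is an assumption for illustration, not necessarily the paper's exact formulation. The names `lift_score` and `satisfies_composed_prompt` are hypothetical.

```python
from statistics import mean

def lift_score(uncond_errors, cond_errors):
    """Approximate lift = log p(x|c) - log p(x) for one condition c.

    Each list holds per-noise-level denoising errors of the same sample
    under the unconditional and condition-c predictions. Treating a lower
    denoising error as a higher (variational) log-likelihood, the error
    gap estimates the lift. This estimator is an illustrative assumption.
    """
    return mean(uncond_errors) - mean(cond_errors)

def satisfies_composed_prompt(uncond_errors, per_condition_errors):
    """Accept a sample only if every individual condition has positive lift
    (a hypothetical composition rule for the resampling criterion)."""
    return all(lift_score(uncond_errors, errs) > 0
               for errs in per_condition_errors)

# Toy numbers: the sample denoises more easily given condition A
# (positive lift) but not given condition B, so a resampler would
# reject it for the composed prompt "A and B".
uncond = [0.9, 1.1]
cond_a = [0.5, 0.6]   # easier to denoise given A -> positive lift
cond_b = [1.2, 1.3]   # harder given B -> non-positive lift
print(satisfies_composed_prompt(uncond, [cond_a]))          # True
print(satisfies_composed_prompt(uncond, [cond_a, cond_b]))  # False
```

In an actual pipeline, the denoising errors for each condition come from the same pretrained model evaluated with different prompts, which is why no extra training or external modules are needed.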
Modern AI image generation tools can create pictures from written prompts like "an elephant with glasses." But when a prompt includes multiple conditions, especially in combination, the results often miss important details.

We developed a method to better check whether each part of a prompt is actually being followed. Think of it as a checklist that evaluates whether the final image matches each condition before declaring the whole prompt satisfied. Our method uses a simple quantity called a "lift score" that works directly with existing models: no retraining or extra tools are needed. We also built an efficient variant that runs faster without losing accuracy.

In experiments ranging from simple shapes to complex scenes and real images, our method showed clear improvements in following prompts correctly. This could help make AI-generated content more accurate, controllable, and reliable, especially when combining multiple ideas into one request.