Poster
Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?
Yujin Han · Andi Han · Wei Huang · Chaochao Lu · Difan Zou
West Exhibition Hall B2-B3 #W-819
Our study reveals a new failure mode of diffusion models (DMs), a previously underexplored and unsolved challenge! In our work, we ask: can image generative models like diffusion models truly capture the underlying rules embedded in image data? For instance, can DMs accurately understand fine-grained rules in an image, such as how the height of the sun influences the length of a shadow? This question is crucial for using DMs to faithfully reconstruct the physical world.

Through extensive experiments on both synthetic tasks and real-world datasets, our findings provide a clear answer: DMs can learn coarse rules (e.g., the sun and the shadow should appear on opposite sides of an object), but they struggle to capture fine-grained rules (e.g., the precise geometric constraints between the sun's height and the shadow's length). Our theoretical analysis suggests that the root cause lies in a mismatch between the optimization objective of DMs and the underlying rules embedded in the data, which leads to persistent constant errors in rule learning. Worse still, addressing this issue with conventional techniques, such as introducing guidance during sampling, yields only limited improvement. A key bottleneck is that fine-grained rules typically manifest as weak signals within the data, making them difficult to capture and leverage for effective guidance.
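To make the coarse-versus-fine distinction concrete, here is a toy sketch (our own illustration, not the paper's actual benchmark or metrics) of the sun-shadow example: for an object of height h lit by a sun at elevation angle theta, basic geometry gives shadow_length = h / tan(theta). A generated image can satisfy the coarse rule (shadow on the side opposite the sun) while still violating this precise quantitative constraint.

```python
import math

def shadow_length(object_height: float, sun_elevation_deg: float) -> float:
    """Exact shadow length implied by the geometric rule:
    shadow = h / tan(theta), with theta the sun's elevation angle."""
    theta = math.radians(sun_elevation_deg)
    return object_height / math.tan(theta)

def violates_fine_rule(object_height: float, sun_elevation_deg: float,
                       observed_shadow: float, tol: float = 0.05) -> bool:
    """Flag a sample that breaks the fine-grained quantitative rule,
    even when the coarse rule (shadow opposite the sun) holds."""
    expected = shadow_length(object_height, sun_elevation_deg)
    return abs(observed_shadow - expected) / expected > tol

# A higher sun gives a shorter shadow; at 45 degrees, shadow equals height.
print(round(shadow_length(1.0, 45.0), 3))   # 1.0
print(violates_fine_rule(1.0, 45.0, 1.3))   # True: 30% off the geometric rule
```

Under this toy view, the paper's finding is that DM samples tend to pass the coarse check but fail the fine-grained one; the tolerance `tol` here is an arbitrary choice for illustration.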