Poster
ToMA: Token Merge with Attention for Diffusion Models
Wenbo Lu · Shaoyi Zheng · Yuxuan Xia · Shenji Wan
East Exhibition Hall A-B #E-3112
AI image generators such as Stable Diffusion paint stunning pictures, but they must process thousands of tiny image pieces—called tokens—at every step. Shuffling so many tokens makes creation slow and energy-hungry. Earlier shortcuts tried to merge similar tokens, yet the extra bookkeeping erased most of the speed gains.

Our work presents ToMA (Token Merge with Attention), a plug-in that lets the model spot and temporarily group tokens that carry nearly the same information. We choose these groups with a fast, easy-to-compute rule that picks a small, diverse set of “representative” tokens, then use the same GPU-friendly math the model already employs for its internal reasoning. After the heavy thinking is done, ToMA cleanly spreads the results back to every original token, so image quality stays intact.

In practice, ToMA cuts the time to create a high-resolution image by roughly one-quarter on today’s hardware while keeping visual scores nearly unchanged. Faster generation means lower energy use, smoother creative workflows, and wider public access to top-tier generative art tools.
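The select–merge–unmerge idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's algorithm: the greedy farthest-point selection rule, the softmax temperature, and all function names here are assumptions standing in for ToMA's actual components. It only shows the overall shape of the pipeline: pick diverse representatives, merge every token onto them with one matrix multiply, and later scatter the results back.

```python
# Illustrative sketch of a ToMA-style merge/unmerge (assumed details, not the
# paper's exact method): greedy diverse selection, softmax soft-assignment.
import numpy as np

def merge_unmerge(tokens, k, temperature=10.0):
    """Merge n tokens down to k representatives, then scatter back.

    tokens: (n, d) array of token features; k: number of representatives.
    Returns (merged, restored) with shapes (k, d) and (n, d).
    """
    normed = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)

    # 1) Pick a small, diverse set of representatives with a cheap greedy
    #    farthest-point rule (a stand-in for the paper's selection rule).
    reps = [0]
    max_sim = normed @ normed[0]               # similarity to the chosen set
    for _ in range(k - 1):
        nxt = int(np.argmin(max_sim))          # token least like any rep so far
        reps.append(nxt)
        max_sim = np.maximum(max_sim, normed @ normed[nxt])

    # 2) Merge: soft-assign every token to the representatives with
    #    attention-style softmax weights, so merging is a single matmul.
    sim = normed @ normed[reps].T              # (n, k) cosine similarities
    w = np.exp(temperature * (sim - sim.max(axis=1, keepdims=True)))
    w /= w.sum(axis=1, keepdims=True)          # each row sums to 1
    merged = w.T @ tokens                      # (k, d) merged tokens

    # ... the model's heavy attention layers would run on the k tokens here ...

    # 3) Unmerge: spread results back to all n original token positions.
    restored = w @ merged                      # (n, d)
    return merged, restored

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))
merged, restored = merge_unmerge(x, k=8)
print(merged.shape, restored.shape)
```

Because both the merge and unmerge steps are plain matrix multiplies, they map onto the same GPU kernels the diffusion model already uses, which is why the bookkeeping overhead stays small.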