Poster
OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance
Yongqiang Yao · Jingru Tan · Feizhao Zhang · Jiahao Hu · Yazhe Niu · JinXin · Bo Li · Pengfei Liu · Ruihao Gong · Dahua Lin · Ningyi Xu
East Exhibition Hall A-B #E-2906
Vision-language models, which understand both images and text, are becoming more powerful, but training them is slow and inefficient on large compute clusters. We found that this happens because the vision and language parts of the model have very different computational profiles, leading to an uneven workload across devices.

To fix this, we created OmniBal, a new training method that balances the work more evenly. It does this in three ways: by grouping training data into better-balanced batches, by splitting the model into better-balanced parts, and by managing memory more efficiently during training.

These improvements work together to make training faster and more stable. In our tests, OmniBal sped up training by about 1.8× compared to current methods, and it generalizes well across different models and datasets.

This research matters because it helps developers train large multi-modal models more efficiently, saving time, energy, and computing resources.
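The data-grouping idea can be illustrated with a minimal sketch. The snippet below is a hypothetical greedy load balancer, not OmniBal's actual algorithm: it estimates each sample's cost from its image-patch and text-token counts (the 2:1 weighting and the `balance_batches` helper are illustrative assumptions) and assigns samples to the currently least-loaded device, so that image-heavy and text-heavy samples end up spread evenly.

```python
import heapq

def balance_batches(samples, num_devices):
    """Greedily assign samples to devices so the estimated compute cost
    (image patches + text tokens) is as even as possible.

    `samples` is a list of (num_image_patches, num_text_tokens) pairs.
    The cost model is a stand-in, not the paper's actual formulation.
    """
    def cost(sample):
        # Weight image patches more heavily, assuming the vision encoder
        # is the costlier component (the 2x factor is an assumption).
        patches, tokens = sample
        return 2.0 * patches + tokens

    # Classic longest-processing-time heuristic: sort by descending cost,
    # then place each sample on the least-loaded device via a min-heap.
    groups = [(0.0, i, []) for i in range(num_devices)]  # (load, id, bucket)
    heapq.heapify(groups)
    for sample in sorted(samples, key=cost, reverse=True):
        load, i, bucket = heapq.heappop(groups)
        bucket.append(sample)
        heapq.heappush(groups, (load + cost(sample), i, bucket))
    return [bucket for _, _, bucket in sorted(groups, key=lambda g: g[1])]

if __name__ == "__main__":
    # Toy mix of image-heavy and text-heavy samples.
    samples = [(576, 40), (0, 900), (576, 200), (144, 500), (288, 60), (0, 1200)]
    for i, group in enumerate(balance_batches(samples, num_devices=2)):
        print(f"device {i}: {group}")
```

Even this simple heuristic keeps the per-device cost gap small on the toy data; the same intuition, applied jointly with balanced model partitioning and memory management, is what drives the speedup described above.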