Poster
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent
Yongxian Wei · Anke Tang · Li Shen · Zixuan Hu · Chun Yuan · Xiaochun Cao
East Exhibition Hall A-B #E-1304
Merging multiple models presents a promising approach to multi-task learning. We posit that the fundamental objective of model merging is for the merged model to perform as closely as possible to the task-specific models on their respective tasks. Building on this insight, we formulate model merging as a constrained optimization problem—minimizing the performance gap between the merged model and individual models while preserving shared knowledge—and solve it using adaptive projective gradient descent. Our method is entirely data-free. Experiments demonstrate that this plug-and-play approach consistently achieves state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
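The constrained-optimization view above can be illustrated with a toy sketch. This is a hypothetical simplification, not the paper's actual algorithm: it treats merging as gradient descent on the squared weight-space gap to each task-specific model, with every step projected onto the span of the task vectors so the update stays within task-specific directions (the function name `merge_models` and all hyperparameters are illustrative assumptions).

```python
import numpy as np

def merge_models(base, task_weights, num_steps=200, lr=0.1):
    """Toy, data-free merging sketch (hypothetical, not the paper's method).

    Minimizes 0.5 * sum_t ||theta - w_t||^2 by gradient descent,
    projecting each gradient onto the subspace spanned by the task
    vectors (w_t - base) so updates preserve directions outside it.
    """
    # Task vectors: per-task deviations from the shared base model.
    task_vectors = np.stack([w - base for w in task_weights])  # (T, d)
    # Orthonormal basis of the task-vector subspace via QR.
    Q, _ = np.linalg.qr(task_vectors.T)  # (d, T)
    theta = base.copy()
    for _ in range(num_steps):
        # Gradient of the summed squared gaps to each task model.
        grad = sum(theta - w for w in task_weights)
        # Project the step onto the task-vector subspace.
        grad = Q @ (Q.T @ grad)
        theta -= (lr / len(task_weights)) * grad
    return theta
```

Under this simplified quadratic objective, the projected iterates converge to the average of the task-specific weights; the paper's adaptive projective gradient descent refines this idea with its own projection and step-size rules.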