

Poster

Splitting with Importance-aware Updating for Heterogeneous Federated Learning with Large Language Models

Yangxu Liao · Wenke Huang · Guancheng Wan · Jian Liang · Bin Yang · Mang Ye

East Exhibition Hall A-B #E-2807
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

Federated learning provides an efficient, privacy-preserving distributed training framework for large language models, addressing the growing scarcity of publicly available training data while enabling the use of private datasets. Although integrating large language model fine-tuning with federated learning has emerged as a promising research direction, non-IID instruction-following scenarios have received limited attention. Our key insight is to decompose client updates into consensus and divergence components, enabling the model to maintain its core capabilities while adapting to domain-specific knowledge. We propose a novel federated learning framework called FedICU (Splitting with ImportanCe-aware Updating for Heterogeneous Federated Learning with Large Language Models), which introduces an aggregation mechanism that dynamically balances these components according to their contribution to global model performance, together with an importance-aware parameter updating strategy that prevents catastrophic forgetting and domain overfitting. Extensive experiments across diverse domains demonstrate that FedICU significantly outperforms existing federated learning approaches in both generalization performance and domain adaptation. Our code is available at https://github.com/liaosunny123/FedICU.
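
To make the split-and-aggregate idea concrete, below is a minimal, hypothetical sketch in PyTorch of decomposing a flattened client update into a consensus component and a divergence component, with a simple magnitude-based importance mask applied before aggregation. All function names and the fixed blending weight alpha are illustrative assumptions; the paper's actual mechanism balances the components dynamically based on their contribution to global model performance, so this is a sketch of the general technique rather than the authors' implementation (see the repository above for that).

# Hypothetical sketch; names and the fixed alpha are illustrative, not FedICU's code.
import torch

def split_update(client_update: torch.Tensor, consensus_dir: torch.Tensor):
    """Decompose a flattened client update into a consensus component
    (projection onto the averaged update direction) and a divergence
    component (the orthogonal residual)."""
    unit = consensus_dir / (consensus_dir.norm() + 1e-12)
    consensus = (client_update @ unit) * unit
    divergence = client_update - consensus
    return consensus, divergence

def importance_mask(update: torch.Tensor, keep_ratio: float = 0.1):
    """Keep only the largest-magnitude entries of an update (a simple
    stand-in for an importance-aware parameter selection)."""
    k = max(1, int(keep_ratio * update.numel()))
    threshold = update.abs().topk(k).values.min()
    return update * (update.abs() >= threshold)

def aggregate(client_updates, alpha: float = 0.7, keep_ratio: float = 0.1):
    """Blend consensus and divergence components across clients, weighting
    the consensus part by a fixed alpha (a simplification of the dynamic,
    performance-based balancing described in the abstract) and sparsifying
    the divergence part with the importance mask."""
    stacked = torch.stack(client_updates)   # (num_clients, dim)
    consensus_dir = stacked.mean(dim=0)     # average update direction
    agg = torch.zeros_like(consensus_dir)
    for u in client_updates:
        cons, div = split_update(u, consensus_dir)
        agg += alpha * cons + (1 - alpha) * importance_mask(div, keep_ratio)
    return agg / len(client_updates)

# Toy usage with random "updates" from three clients.
updates = [torch.randn(1000) for _ in range(3)]
global_delta = aggregate(updates)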

Lay Summary:

Large language models like ChatGPT are powerful tools, but training them requires high-quality datasets. In recent years, publicly available high-quality data has been gradually exhausted, and attention has shifted to private datasets, which have yielded significant results. Federated learning helps by allowing many individuals or organizations to train a shared model without moving their data: each participant trains locally and only sends model updates. However, when different users have very different data or needs, this approach can degrade the shared model's performance for everyone. Our research addresses this issue. We designed a new method called FedICU, which helps the shared model learn both general knowledge and user-specific needs without sacrificing overall performance. It does so by carefully separating what is common across users from what is unique to each one, and then combining these updates in an intelligent way. It also sends only the most important parts of each update, saving both time and computing resources. As a result, powerful language models can be trained on private, diverse data while minimizing the loss of generalization caused by heterogeneous datasets and maintaining overall performance across a range of downstream tasks.
