ICML Poster What Limits Bidirectional Model's Generative Capabilities? A Uni-Bi-Directional Mixture-of-Expert Method For Bidirectional Fine-tuning

Poster

What Limits Bidirectional Model's Generative Capabilities? A Uni-Bi-Directional Mixture-of-Expert Method For Bidirectional Fine-tuning

Zuchao Li · Yonghua Hei · Qiwei Li · Lefei Zhang · Ping Wang · hai zhao · qi baoyuan · Liu Guoming

East Exhibition Hall A-B #E-2701

[ Abstract ] [ Lay Summary ]

[ Poster] [ OpenReview]

Thu 17 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Large Language Models (LLMs) excel in generation tasks, yet their causal attention mechanisms limit performance in embedding tasks. While bidirectional modeling may enhance embeddings, naively fine-tuning unidirectional models bidirectionally severely degrades generative performance.To investigate this trade-off, we analyze attention weights as dependence indicators and find that bidirectional fine-tuning increases subsequent dependence, impairing unidirectional generation. Through systematic Transformer module evaluations, we discover the FFN layer is least affected by such dependence. Leveraging this discovery, we propose UBMoE-LLM, a novel Uni-Bi-directional Mixture-of-Experts LLM, which integrates the original unidirectional FFN with a bidirectionally fine-tuned FFN via unsupervised contrastive learning. This MoE-based approach enhances embedding performance while preserving robust generation.Extensive experiments across diverse datasets and model scales validate our attention dependence metric and demonstrate UBMoE-LLM’s superior generative quality and reduced hallucination. Code is available at: https://github.com/heiyonghua/ubmoe_llm.

Lay Summary:

Large Language Models (LLMs) excel in generation tasks, yet their causal attention mechanisms limit performance in embedding tasks. While bidirectional modeling may enhance embeddings, naively fine-tuning unidirectional models bidirectionally severely degrades generative performance. To investigate this trade-off, we analyze attention weights as dependence indicators and find that bidirectional fine-tuning increases subsequent dependence, impairing unidirectional generation. Through systematic Transformer module evaluations, we discover the FFN layer is least affected by such dependence. Leveraging this discovery, we propose UBMoE-LLM, a novel Uni-Bi-directional Mixture-of-Experts LLM, which integrates the original unidirectional FFN with a bidirectionally fine-tuned FFN via unsupervised contrastive learning. This MoE-based approach enhances embedding performance while preserving robust generation. Extensive experiments across diverse datasets and model scales validate our attention dependence metric and demonstrate UBMoE-LLM’s superior generative quality and reduced hallucination.

Chat is not available.