

Spotlight in Workshop: 2nd Generative AI for Biology Workshop

MINT: Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications

Da Wu · Zhanliang Wang · Quan Nguyen · Zhuoran Xu · Kai Wang

Keywords: [ Preference Optimization ] [ Large Language Models ] [ Multimodal Models ] [ Tissue Type Classification ] [ Rare Disease Prediction ]


Abstract:

The scarcity of high-quality multimodal biomedical data limits the effective fine-tuning of Large Language Models (LLMs) for specialized tasks. We introduce MINT (Multimodal Integrated kNowledge Transfer), a framework that aligns unimodal decoder-only models with domain-specific patterns from multimodal biomedical data through preference optimization, primarily implemented with the Odds Ratio Preference Optimization (ORPO) framework. MINT leverages upstream multimodal machine learning models to transfer domain expertise to downstream text-only or image-only LLMs, as demonstrated in two applications: (1) rare genetic disease prediction from text; (2) tissue type classification from cell nucleus images. In both cases, MINT-based models outperform those enhanced with alternative approaches such as supervised fine-tuning and retrieval-augmented generation, and in some scenarios even surpass much larger foundation models. Our study highlights how MINT effectively grafts the classification strengths of encoder-only models onto large decoder-only models, enhancing reasoning and reducing hallucination in biomedical applications.
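The page does not include the paper's code, so as an illustration only, here is a minimal PyTorch sketch of the ORPO objective the abstract names (Hong et al., 2024): a supervised fine-tuning loss on the preferred completion plus a log-odds-ratio penalty against the dispreferred one. The function name `orpo_loss`, the `lam` weight, and the use of length-normalized log-probabilities as inputs are assumptions for the sketch, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen: torch.Tensor,
              logp_rejected: torch.Tensor,
              lam: float = 0.1) -> torch.Tensor:
    """Sketch of the ORPO objective (Hong et al., 2024) -- assumed form.

    Inputs are length-normalized log-probabilities log P(y|x) of the
    preferred ("chosen") and dispreferred ("rejected") completions,
    each of shape (batch,). `lam` weights the odds-ratio term.
    """
    # log odds(y|x) = log P - log(1 - P), computed stably in log space
    log_odds_chosen = logp_chosen - torch.log1p(-torch.exp(logp_chosen))
    log_odds_rejected = logp_rejected - torch.log1p(-torch.exp(logp_rejected))
    # Odds-ratio term: -log sigmoid(log-odds ratio of chosen over rejected)
    or_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    # Standard SFT negative log-likelihood on the preferred completion
    sft_loss = -logp_chosen
    return (sft_loss + lam * or_loss).mean()

# Toy usage with made-up probabilities. In MINT, the preference pairs
# would presumably be derived from the upstream multimodal model's
# outputs (the abstract does not specify the pair-construction details).
logp_chosen = torch.log(torch.tensor([0.6, 0.7]))
logp_rejected = torch.log(torch.tensor([0.2, 0.1]))
print(orpo_loss(logp_chosen, logp_rejected))
```

One property worth noting: because ORPO folds the preference signal into the SFT loss, it needs no separate reference model, which is consistent with the abstract's framing of transferring knowledge directly into a single downstream decoder-only LLM.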
