Poster
in
Workshop: 1st Workshop on Foundation Models for Structured Data (FMSD)
Towards Generalizable Multimodal ECG Representation Learning with LLM-extracted Clinical Entities
Mingsheng Cai · Jiuming Jiang · Wenhao Huang · Che Liu · Rossella Arcucci
Electrocardiogram (ECG) recordings are essential for cardiac diagnostics but require large-scale annotation for supervised learning. In this work, we propose a supervised pre-training framework for multimodal ECG representation learning that leverages Large Language Model (LLM)-based clinical entity extraction from ECG reports to build structured cardiac queries. By fusing ECG signals with standardized queries rather than categorical labels, our model enables zero-shot classification of unseen conditions. Experiments on six downstream datasets demonstrate a competitive zero-shot AUC of 77.20%, outperforming state-of-the-art self-supervised and multimodal baselines by 4.98%. Our findings suggest that incorporating structured clinical knowledge via LLM-extracted entities yields more semantically aligned and generalizable ECG representations than typical contrastive or generative objectives.
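The zero-shot classification idea described above — scoring an ECG against standardized text queries in a shared embedding space instead of against fixed categorical labels — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the encoders here (`encode_ecg`, a random projection, and `encode_query`, a deterministic bag-of-words hash) are toy stand-ins for the actual learned ECG and query encoders, and the query strings are hypothetical examples of LLM-extracted clinical entities.

```python
import numpy as np

EMBED_DIM = 8  # toy shared embedding dimension (assumption, not from the paper)

def encode_ecg(signal: np.ndarray) -> np.ndarray:
    """Toy ECG encoder: fixed random projection into the shared space."""
    W = np.random.default_rng(1).normal(size=(signal.size, EMBED_DIM))
    z = signal @ W
    return z / np.linalg.norm(z)

def encode_query(query: str) -> np.ndarray:
    """Toy text encoder: deterministic bag-of-words hashing of the query."""
    z = np.zeros(EMBED_DIM)
    for tok in query.lower().split():
        z[sum(ord(c) for c in tok) % EMBED_DIM] += 1.0
    return z / (np.linalg.norm(z) + 1e-8)

def zero_shot_scores(signal: np.ndarray, queries: list[str]) -> np.ndarray:
    """Cosine similarity between the ECG embedding and each query embedding.
    New (unseen) conditions are handled by simply adding new query strings."""
    z_ecg = encode_ecg(signal)
    return np.array([z_ecg @ encode_query(q) for q in queries])

# Hypothetical structured cardiac queries built from extracted entities.
queries = ["atrial fibrillation", "sinus bradycardia", "normal sinus rhythm"]
signal = np.random.default_rng(0).normal(size=500)  # placeholder ECG signal
scores = zero_shot_scores(signal, queries)
prediction = queries[int(np.argmax(scores))]
```

Because the label set is just a list of text queries, evaluating on an unseen condition requires no retraining — only appending its query string, which is what enables the zero-shot transfer reported across the six downstream datasets.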