

Poster in Affinity Workshop: LatinX in AI

Bridging the Gap in Clinical AI: Zero-Shot Multimodal ECG Analysis with Textual Explanations

Luiz Facury de Souza · Jose Fernandes · Pedro Dutenhefner · Turi Rezende · Gisele Pappa · Gabriela Paixão · Antonio Ribeiro · Wagner Jr.


Abstract:

Clinical trust in classification models remains a critical barrier to deploying automated ECG analysis, despite the transformative potential of deep learning. Although recent models have made significant progress in classifying arrhythmias, their reliance on opaque "black-box" reasoning limits their practical utility: clinicians often dismiss classic explainability methods such as attention maps as unconvincing evidence for a diagnosis. To address this, we present a multimodal system that generates human-understandable explanations by aligning ECG features with textual diagnostic criteria drawn from expert annotations. For example, the left bundle branch block (LBBB) abnormality is classified not just as a label but through explicit links to descriptors such as 'widened QRS complex (>120 ms)'. Trained on paired ECG-text data, our approach performs zero-shot classification, matching supervised performance using a Vision Transformer (ViT) without task-specific fine-tuning. By extracting sub-features directly from clinical narratives, such as rhythm irregularities or morphological anomalies, the model grounds its predictions in domain-specific knowledge, mirroring clinician reasoning. This fusion of interpretable feature mapping and robust performance advances a paradigm where AI-driven diagnostics complement, rather than conflict with, the need for transparency in medical decision-making.
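
As a rough illustration of the zero-shot step described in the abstract, the sketch below scores one ECG embedding against the embeddings of several textual descriptors and picks the closest one. The abstract does not state the scoring rule, so everything here is an assumption: the cosine-similarity-plus-softmax scheme (in the style of contrastive image-text models), the function names `cosine` and `zero_shot_classify`, the `temperature` value, and the hand-made toy vectors that stand in for the outputs of the ECG and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(ecg_embedding, descriptor_embeddings, temperature=0.1):
    """Score an ECG embedding against text-descriptor embeddings.

    Returns (best_descriptor, probabilities), where probabilities is a
    softmax over temperature-scaled cosine similarities.
    """
    names = list(descriptor_embeddings)
    logits = [
        cosine(ecg_embedding, descriptor_embeddings[name]) / temperature
        for name in names
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(names, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Toy, hand-made vectors standing in for encoder outputs (not real data).
text_embeddings = {
    "widened QRS complex (>120 ms)": [0.9, 0.1, 0.0],
    "irregular rhythm": [0.0, 1.0, 0.2],
    "normal sinus rhythm": [0.1, 0.0, 1.0],
}
ecg_embedding = [0.8, 0.2, 0.1]

best, probs = zero_shot_classify(ecg_embedding, text_embeddings)
print(best)
```

Because the prediction is the best-matching descriptor rather than a bare class index, the matched text can double as the explanation shown to the clinician, which is the kind of grounding the abstract argues for.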
