Poster in Affinity Workshop: LatinX in AI
Bridging the Gap in Clinical AI: Zero-Shot Multimodal ECG Analysis with Textual Explanations
Luiz Facury de Souza · Jose Fernandes · Pedro Dutenhefner · Turi Rezende · Gisele Pappa · Gabriela Paixão · Antonio Ribeiro · Wagner Jr.
Clinical trust in classification models remains a critical barrier to deploying automated ECG analysis, despite the transformative potential of deep learning. Although recent models have made significant progress in classifying arrhythmias, their reliance on opaque "black-box" reasoning limits their practical utility: clinicians often dismiss classic explainability methods, such as attention maps, as unconvincing evidence for a diagnosis. To address this, we present a multimodal system that generates human-understandable explanations by aligning ECG features with textual diagnostic criteria drawn from expert annotations. For example, left bundle branch block (LBBB) is reported not just as a label but through explicit links to descriptors such as 'widened QRS complex (>120 ms)'. Trained on paired ECG-text data, our approach performs zero-shot classification, matching supervised performance using a Vision Transformer (ViT) without task-specific fine-tuning. By extracting sub-features, such as rhythm irregularities or morphological anomalies, directly from clinical narratives, the model grounds its predictions in domain-specific knowledge, mirroring clinician reasoning. This fusion of interpretable feature mapping and robust performance advances a paradigm in which AI-driven diagnostics complement, rather than conflict with, the need for transparency in medical decision-making.
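As a rough illustration of the zero-shot step described above, the sketch below scores one ECG embedding against text embeddings of diagnostic descriptors and returns the best-matching label together with the descriptor that supports it. It is not the authors' implementation: the function names, the mean-over-descriptors scoring rule, the temperature value, and the three-dimensional toy vectors are all assumptions made for illustration, and a real system would obtain the embeddings from trained ECG and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=0.1):
    """Numerically stable softmax; the temperature is an assumed value."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(ecg_embedding, descriptor_embeddings):
    """Pick the label whose descriptors best match the ECG embedding.

    descriptor_embeddings maps label -> {descriptor text -> embedding}.
    Scoring a class by the mean similarity of its descriptors is an
    assumption of this sketch, not something stated in the abstract.
    """
    labels = list(descriptor_embeddings)
    class_scores = []
    best_descriptor = {}
    for label in labels:
        sims = {
            text: cosine(ecg_embedding, emb)
            for text, emb in descriptor_embeddings[label].items()
        }
        class_scores.append(sum(sims.values()) / len(sims))
        # The closest descriptor doubles as the textual explanation.
        best_descriptor[label] = max(sims, key=sims.get)
    probs = softmax(class_scores)
    top = max(range(len(labels)), key=lambda i: probs[i])
    return labels[top], best_descriptor[labels[top]], dict(zip(labels, probs))


# Toy, hand-made vectors standing in for encoder outputs (hypothetical).
TOY_DESCRIPTORS = {
    "LBBB": {
        "widened QRS complex (>120 ms)": [0.9, 0.1, 0.0],
        "morphological anomaly (placeholder wording)": [0.6, 0.4, 0.1],
    },
    "other (placeholder class)": {
        "rhythm irregularity (placeholder wording)": [0.0, 0.2, 0.9],
    },
}
TOY_ECG_EMBEDDING = [0.8, 0.2, 0.1]

label, explanation, probabilities = zero_shot_classify(
    TOY_ECG_EMBEDDING, TOY_DESCRIPTORS
)
print(label, "because of:", explanation)
```

Because no classifier head is trained, adding a new diagnosis in this sketch only requires adding its descriptor texts and their embeddings to the dictionary.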