Poster in Workshop: Actionable Interpretability
Persistent Demographic Information in X-ray Foundation Embeddings: a Risk for a Safe and Fair Deployment in Healthcare
Filipe Santos · Aldo Marzullo · Alessandro Quarta · João Sousa · Susana Vieira · Leo Celi · Francesco Calimeri · Laleh Seyyed-Kalantari
Vector embeddings from medical imaging foundation models improve efficiency in downstream tasks, but they may also leak sensitive demographic information, raising fairness and privacy concerns and limiting their safe deployment in the real world. In this paper, we analyze chest X-ray embeddings from MIMIC-CXR and CheXpert produced by two foundation models (CXR Foundation, BiomedCLIP) and reveal significant, persistent leakage of demographic information. We investigate whether demographic attributes (age, sex, ethnicity, and insurance type) are implicitly encoded in patient representations. Using predictive modeling, we show that machine learning models can reliably infer these attributes even when they are not explicitly included. We also identify specific embedding dimensions associated with demographic prediction and demonstrate that altering or removing them has minimal impact on downstream task performance. These findings expose persistent fairness vulnerabilities in medical AI systems and indicate that demographic information is deeply and robustly encoded, making simple feature-removal strategies inadequate for bias mitigation.
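The probing-and-ablation approach described in the abstract can be sketched as follows. This is an illustrative stand-in on synthetic data, not the paper's pipeline: the real work probes CXR Foundation and BiomedCLIP embeddings of MIMIC-CXR and CheXpert images, and the embedding size, attribute, probe choice, and signal model below are all assumptions. The sketch fits a linear probe to infer a demographic attribute from embeddings, zeroes out the most predictive dimensions, and shows that a retrained probe still recovers the attribute when the signal is distributed across many dimensions.

```python
# Hedged sketch (synthetic data): probe embeddings for a demographic
# attribute, then test whether zeroing the most predictive dimensions
# removes the leakage. All names and sizes here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 4000, 128  # hypothetical: 4000 patients, 128-dim embeddings

# Synthetic binary attribute (e.g. sex) encoded along a direction that
# is spread across many embedding dimensions rather than a few.
attr = rng.integers(0, 2, size=n)
w = rng.normal(size=d)
w /= np.linalg.norm(w)
X = rng.normal(size=(n, d)) + 2.0 * attr[:, None] * w[None, :]

X_tr, X_te, y_tr, y_te = train_test_split(X, attr, random_state=0)
probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)  # well above the 0.5 chance level

# "Simple feature removal": zero the 5 most predictive dimensions,
# then retrain the probe on the ablated embeddings.
top = np.argsort(np.abs(probe.coef_[0]))[-5:]
X_tr_abl, X_te_abl = X_tr.copy(), X_te.copy()
X_tr_abl[:, top] = 0.0
X_te_abl[:, top] = 0.0
probe_abl = LogisticRegression(max_iter=2000).fit(X_tr_abl, y_tr)
acc_abl = probe_abl.score(X_te_abl, y_te)  # still far above chance

print(f"probe accuracy: {acc:.2f}, after ablation: {acc_abl:.2f}")
```

Because the demographic direction is distributed over the whole embedding, deleting the few dimensions with the largest probe weights barely dents a retrained probe's accuracy, mirroring the abstract's conclusion that simple feature removal is inadequate.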