Poster in Affinity Workshop: LatinX in AI
Integrating ViT-Derived Visual Semantics into Credit Scoring Models for Informal Businesses in Latin America
Alejandro Mildiner · Michael Moreno · Viviana Siless
Access to formal credit remains a significant barrier for micro-entrepreneurs in Latin America. This work introduces a novel credit scoring methodology that integrates image-based cluster features, extracted via Vision Transformer (ViT) models, into a tabular classification pipeline. Using images submitted by loan applicants, we generate high-dimensional embeddings with ViT (Dosovitskiy et al., 2021), CLIP (Radford et al., 2021), and I-JEPA (Assran et al., 2023), and apply unsupervised clustering to discover latent visual patterns correlated with creditworthiness. Each applicant is then encoded based on their proximity to the learned clusters, yielding categorical features that represent visual similarity to known payer profiles. These features are incorporated into XGBoost models alongside financial and demographic data. Our results show that visual-cluster-based features improve predictive performance: they outperform a baseline model that uses traditional credit bureau indicators and alternative data (AUC 0.79), reaching an AUC of 0.843 with I-JEPA. This approach demonstrates how computer vision can provide interpretable, transferable insights from visual content, offering a new pathway toward fairer, more inclusive credit evaluation in underserved economies (Salcedo-Perez & Patino, 2018).
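The cluster-encoding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means with a deterministic farthest-point initialization is assumed here, the two-dimensional "embeddings" stand in for real ViT, CLIP, or I-JEPA vectors, and the tabular columns and all function names are hypothetical. The sketch learns centroids, assigns each applicant the index of the nearest one, and appends that categorical feature (one-hot encoded) to the tabular row that a classifier such as XGBoost would consume.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def init_centroids(points, k):
    """Farthest-point initialization (an assumption, chosen for determinism)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point whose distance to its closest centroid is largest.
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_centroids(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_centroid(p, centroids)].append(p)
        for j, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def one_hot(label, k):
    """One-hot encoding of a categorical cluster label."""
    return [1 if j == label else 0 for j in range(k)]


# Toy stand-ins for image embeddings (real ones have hundreds of dimensions).
toy_embeddings = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                  [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
# Hypothetical tabular features, e.g. monthly income and years in business.
toy_tabular = [[1200.0, 3], [900.0, 1], [1500.0, 5],
               [700.0, 2], [1100.0, 4], [800.0, 1]]

K = 2
centroids = kmeans(toy_embeddings, K)
cluster_labels = [nearest_centroid(e, centroids) for e in toy_embeddings]

# Final design matrix: tabular columns followed by the visual-cluster feature.
design_matrix = [row + one_hot(label, K)
                 for row, label in zip(toy_tabular, cluster_labels)]
```

In practice the number of clusters would be tuned on held-out data, and the centroids would be fitted only on training applicants so that the visual feature does not leak information from the evaluation set.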