Poster
Generalization Analysis for Supervised Contrastive Representation Learning under Non-IID Settings
Minh Hieu Nong · Antoine Ledent
West Exhibition Hall B2-B3 #W-910
Contrastive Representation Learning (CRL) is a powerful machine learning framework that enhances data representations by pulling similar data pairs together while pushing dissimilar pairs apart. It requires each training dataset to take the form of a collection of small groups, where each group is composed of two similar objects (referred to as ‘anchors’) together with a set of other objects which are known to be very different from the two anchor objects. We study CRL in the context of generalization theory, which is concerned with estimating the amount of data necessary for models to attain a desired performance (also referred to as the ‘sample complexity’). Previous work has explored CRL in settings where the groups are statistically independent of each other. In our work, we study the setting where the groups are formed from a finite pool of labeled examples, allowing objects to be recycled across groups, which breaks the assumption of statistical independence that is central to classical learning theory. Under some assumptions on the proportion of objects in each class, we show that the sample complexity is no worse than in the fully independent setting. Experimentally, we demonstrate that models which reuse objects across different groups can outperform models which do not.
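The following is a minimal, illustrative sketch (not the authors' exact formulation) of the data construction and loss described above: groups are sampled from a single finite labeled pool, so the same example can appear in many groups, and each group pairs two similar objects with k dissimilar ones under a standard logistic contrastive loss. The encoder architecture, pool sizes, and function names are assumptions made for the example.

```python
# Illustrative sketch: supervised contrastive groups recycled from a finite
# labeled pool (non-IID across groups), with a logistic contrastive loss.
import random
import torch
import torch.nn as nn

def build_groups(labels, num_groups, k):
    """Sample groups (anchor, second anchor, k negatives) from a finite pool.

    Because indices are drawn from the same pool for every group, the same
    example may be reused in many groups, so groups are not independent.
    """
    by_class = {}
    for idx, y in enumerate(labels.tolist()):
        by_class.setdefault(y, []).append(idx)
    classes = list(by_class.keys())

    groups = []
    for _ in range(num_groups):
        c = random.choice(classes)
        a, p = random.choices(by_class[c], k=2)               # two similar objects
        other = [d for d in classes if d != c]
        negs = [random.choice(by_class[random.choice(other)])  # k dissimilar objects
                for _ in range(k)]
        groups.append((a, p, negs))
    return groups

def contrastive_loss(encoder, features, groups):
    """Logistic loss: log(1 + sum_i exp(f(x)·f(x_i^-) - f(x)·f(x^+)))."""
    losses = []
    for a, p, negs in groups:
        z_a, z_p = encoder(features[a]), encoder(features[p])
        z_n = encoder(features[torch.tensor(negs)])
        pos = z_a @ z_p                      # similarity to the second anchor
        neg = z_n @ z_a                      # similarities to the negatives
        losses.append(torch.log1p(torch.exp(neg - pos).sum()))
    return torch.stack(losses).mean()

# Toy usage: a small encoder on a finite pool of 200 examples from 3 classes.
torch.manual_seed(0)
X = torch.randn(200, 16)
y = torch.randint(0, 3, (200,))
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
groups = build_groups(y, num_groups=500, k=5)   # objects recycled across groups
loss = contrastive_loss(encoder, X, groups)
loss.backward()
```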