Spotlight Poster
A Generalization Theory for Zero-Shot Prediction
Ronak Mehta · Zaid Harchaoui
West Exhibition Hall B2-B3 #W-905
Wed 16 Jul 10 a.m. PDT — 11 a.m. PDT
A modern paradigm for generalization in machine learning and AI consists of pre-training a task-agnostic foundation model, generally obtained using self-supervised and multimodal contrastive learning. The resulting representations can be used for prediction on a downstream task for which no labeled data is available. We present a theoretical framework to better understand this approach, called zero-shot prediction. We identify the target quantities that zero-shot prediction aims to learn, or learns in passing, and the key conditional independence relationships that enable its generalization ability.
Traditional machine learning approaches rely on fitting models to a set of input-output examples. For data-scarce applications such as classifying medical images, there may not be enough such examples to produce a performant classifier. Zero-shot prediction is a method in which models trained for complex tasks can be combined and reused to build classifiers for new applications without any additional labeled training examples. This remarkable modern technique does not yet enjoy the same level of mathematical understanding as the classical approach outlined above. We aim to address this gap by providing a theoretical model of zero-shot prediction in which the qualities of the data and task that make the method succeed or fail can be expressed and analyzed mathematically.
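The zero-shot recipe described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the two encoders below are fixed random linear maps into a shared embedding space, standing in for pretrained multimodal (e.g., contrastively trained) image and text encoders; all dimensions and names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for pretrained encoders: random linear maps into a shared
# embedding space (a real system would use large pretrained networks).
D_IMG, D_TXT, D_EMB = 32, 16, 8
W_img = rng.normal(size=(D_EMB, D_IMG))
W_txt = rng.normal(size=(D_EMB, D_TXT))

def embed_image(x):
    # Map an image feature vector into the shared space, unit-normalized.
    z = W_img @ x
    return z / np.linalg.norm(z)

def embed_text(t):
    # Map a class-description feature vector into the shared space.
    z = W_txt @ t
    return z / np.linalg.norm(z)

def zero_shot_predict(x, class_texts):
    # Predict the class whose text embedding has the highest cosine
    # similarity to the image embedding -- no labeled examples needed.
    img = embed_image(x)
    scores = [img @ embed_text(t) for t in class_texts]
    return int(np.argmax(scores))

# Toy "class description" vectors for two classes, and one toy image.
class_texts = [rng.normal(size=D_TXT) for _ in range(2)]
x = rng.normal(size=D_IMG)
pred = zero_shot_predict(x, class_texts)
```

The key point mirrored from the text: the classifier is assembled entirely from the reused encoders and the class descriptions, so no downstream labeled training data enters the construction.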