ICML Poster A General Representation-Based Approach to Multi-Source Domain Adaptation

Poster

A General Representation-Based Approach to Multi-Source Domain Adaptation

Ignavier Ng · Yan Li · Zijian Li · Yujia Zheng · Guangyi Chen · Kun Zhang

West Exhibition Hall B2-B3 #W-821

[ Abstract ] [ Lay Summary ]

[ OpenReview]

Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

A central problem in unsupervised domain adaptation is determining what to transfer from labeled source domains to an unlabeled target domain. To handle high-dimensional observations (e.g., images), a line of approaches use deep learning to learn latent representations of the observations, which facilitate knowledge transfer in the latent space. However, existing approaches often rely on restrictive assumptions to establish identifiability of the joint distribution in the target domain, such as independent latent variables or invariant label distributions, limiting their real-world applicability. In this work, we propose a general domain adaptation framework that learns compact latent representations to capture distribution shifts relative to the prediction task and address the fundamental question of what representations should be learned and transferred. Notably, we first demonstrate that learning representations based on all the predictive information, i.e., the label's Markov blanket in terms of the learned representations, is often underspecified in general settings. Instead, we show that, interestingly, general domain adaptation can be achieved by partitioning the representations of Markov blanket into those of the label's parents, children, and spouses. Moreover, its identifiability guarantee can be established. Building on these theoretical insights, we develop a practical, nonparametric approach for domain adaptation in a general setting, which can handle different types of distribution shifts.

Lay Summary:

A major challenge in machine learning is adapting models trained on multiple labeled sources to make predictions in a new, unlabeled domain. We address this by developing a new framework that learns compact, meaningful representations of data that capture how the underlying distributions change across domains. This helps answer the fundamental question of what information should be transferred, leading to more reliable adaptation in diverse, real-world scenarios.

Chat is not available.