ICML Poster Latent Variable Causal Discovery under Selection Bias

Haoyue Dai · Yiwen Qiu · Ignavier Ng · Xinshuai Dong · Peter Spirtes · Kun Zhang

East Exhibition Hall A-B #E-1803

Thu 17 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Addressing selection bias in latent variable causal discovery is important yet underexplored, largely due to a lack of suitable statistical tools: while various tools beyond basic conditional independencies have been developed to handle latent variables, none have been adapted for selection bias. We make an attempt by studying rank constraints, which, as a generalization of conditional independence constraints, exploit the ranks of covariance submatrices in linear Gaussian models. We show that although selection can significantly complicate the joint distribution, the ranks in the biased covariance matrices still preserve meaningful information about both causal structures and selection mechanisms. We provide a graph-theoretic characterization of such rank constraints. Using this tool, we demonstrate that the one-factor model, a classical latent variable model, can be identified under selection bias. Simulations and real-world experiments confirm the effectiveness of using our rank constraints.
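As a rough illustration of what "ranks of covariance submatrices" means, the sketch below builds the population covariance of a one-factor linear Gaussian model without any selection and checks the rank of a cross-covariance block. It shows only the classical, unbiased rank constraint, not the paper's characterization under selection bias. The function names, loadings, and noise variances are hypothetical values chosen for the example.

```python
import numpy as np

def one_factor_covariance(loadings, noise_vars, latent_var=1.0):
    """Population covariance of X_i = loadings[i] * L + noise_i with one latent L.

    Assumes independent noise terms; all numbers passed in are illustrative.
    """
    loadings = np.asarray(loadings, dtype=float)
    noise_vars = np.asarray(noise_vars, dtype=float)
    return latent_var * np.outer(loadings, loadings) + np.diag(noise_vars)

def cross_cov_rank(cov, rows, cols, tol=1e-9):
    """Numerical rank of the covariance submatrix between two index sets."""
    sub = cov[np.ix_(rows, cols)]
    return int(np.linalg.matrix_rank(sub, tol=tol))

# Hypothetical one-factor model with four measured variables.
loadings = [0.9, -1.3, 0.7, 1.1]
noise_vars = [0.5, 0.8, 0.3, 0.6]
cov = one_factor_covariance(loadings, noise_vars)

# Cross-covariance between {X0, X1} and {X2, X3}: every path between the
# two sets goes through the single latent variable, so the block has rank 1.
rank_cross = cross_cov_rank(cov, [0, 1], [2, 3])

# The full covariance matrix is positive definite, hence full rank.
rank_full = cross_cov_rank(cov, [0, 1, 2, 3], [0, 1, 2, 3])

print(rank_cross, rank_full)
```

With sample data instead of a population covariance, the rank would have to be assessed with a statistical test rather than a fixed numerical tolerance.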

Lay Summary:

Understanding causal relations from data is a central goal in science, but it is hard when the variables of interest are latent constructs that cannot be directly observed. For instance, in a psychological survey, measured responses serve merely as proxies for latent personality traits. At the same time, the data is often not randomly sampled from the global population: individuals with certain traits may be more willing to take a psychological survey. Ignoring such "selection bias" can lead to incorrect causal discovery results.

This paper tackles both challenges at once: how to figure out the causal relations among both latent and observed variables when (1) we cannot observe everything, and (2) our data is biased by selection. We propose a new method that looks at how observed variables are connected, specifically the mathematical "rank" of their dependencies, which is graphically informative about both latent causation and selection mechanisms, and that identifies the causal structure using these rank constraints.

Our approach may serve as a valuable tool for researchers in the social sciences, psychology, and other fields who want to understand the true causal relations beyond the possibly incomplete variables they observe and behind the possibly biased data they collect. For researchers familiar with causal discovery, it is worth noting that while various tools beyond basic conditional independencies have been developed to handle latent variables since FCI, none seems to have been adapted for selection bias. This work may be a first attempt to bridge that gap.
