Oral Sessions
Oral 3E: Causality and Domain Generalization
West Ballroom D
Moderator: Bryon Aragam
One-Step Generalization Ratio Guided Optimization for Domain Generalization
Sumin Cho · Dongwon Kim · Kwangsu Kim
Domain Generalization (DG) aims to train models that generalize to unseen target domains, but such models often overfit to domain-specific features arising from undesired correlations. Gradient-based DG methods typically guide gradients in a dominant direction but often inadvertently reinforce spurious correlations. Recent work has employed dropout to regularize overconfident parameters but has not explicitly adjusted gradient alignment or ensured balanced parameter updates. We propose GENIE (Generalization-ENhancing Iterative Equalizer), a novel optimizer that leverages the One-Step Generalization Ratio (OSGR) to quantify each parameter's contribution to loss reduction and assess gradient alignment. By dynamically equalizing OSGR via a preconditioning factor, GENIE prevents a small subset of parameters from dominating optimization, thereby promoting domain-invariant feature learning. Theoretically, GENIE balances convergence contribution and gradient alignment among parameters, achieving higher OSGR while retaining SGD's convergence rate. Empirically, it outperforms existing optimizers and enhances performance when integrated with various DG and single-DG methods.
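The abstract does not spell out the OSGR formula or the preconditioning factor, so the sketch below is only an assumption-laden illustration of the general idea in PyTorch: track a per-parameter proxy for how much each parameter has recently contributed to loss reduction, and rescale its gradient so that no small subset of parameters dominates the update. The class name and the EMA-based proxy are hypothetical stand-ins, not GENIE itself.

```python
import torch

class ToyEqualizingSGD(torch.optim.Optimizer):
    """Illustrative optimizer (not the authors' GENIE): damp parameters whose
    recent gradient magnitude is disproportionately large, so per-parameter
    contributions to the one-step loss reduction are more balanced."""

    def __init__(self, params, lr=1e-3, beta=0.9, eps=1e-8):
        super().__init__(params, dict(lr=lr, beta=beta, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, beta, eps = group["lr"], group["beta"], group["eps"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "ema_sq" not in state:
                    state["ema_sq"] = torch.zeros_like(p)
                # Crude proxy for each coordinate's recent gradient energy.
                state["ema_sq"].mul_(beta).addcmul_(p.grad, p.grad, value=1 - beta)
                # Preconditioning factor: shrink dominant coordinates, then
                # renormalize so the average step size matches plain SGD.
                precond = 1.0 / (state["ema_sq"].sqrt() + eps)
                precond = precond / precond.mean().clamp_min(eps)
                p.add_(p.grad * precond, alpha=-lr)
```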
An Improved Clique-Picking Algorithm for Counting Markov Equivalent DAGs via Super Cliques Transfer
Lifu Liu · Shiyuan He · Jianhua Guo
Efficiently counting Markov equivalent directed acyclic graphs (DAGs) is crucial in graphical causal analysis. Wienöbst et al. (2023) introduced a polynomial-time algorithm, known as the Clique-Picking algorithm, to count the number of Markov equivalent DAGs for a given completed partially directed acyclic graph (CPDAG). This algorithm iteratively selects a root clique, determines fixed orientations with outgoing edges from the clique, and generates the unresolved undirected connected chordal graphs (UCCGs). In this work, we propose a more efficient approach to UCCG generation by utilizing previously computed results for different root cliques. Our method introduces the concept of super cliques within rooted clique trees, enabling their efficient transfer between trees with different root cliques. The proposed algorithm effectively reduces the computational complexity of the Clique-Picking method, particularly when the number of cliques is substantially smaller than the number of vertices and edges.
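To make the counting task concrete, here is a hedged brute-force baseline in Python using networkx (this is not the Clique-Picking algorithm or the proposed improvement): it enumerates every orientation of a small chordal graph and keeps those that are acyclic and create no v-structures, which is exactly the set of Markov equivalent DAGs. Its exponential cost is what polynomial-time counting, and cheaper UCCG generation in particular, avoids.

```python
from itertools import product
import networkx as nx

def count_equivalent_dags(edges):
    """Count orientations of the undirected edge list `edges` that are acyclic
    and create no v-structure a -> c <- b with a and b non-adjacent."""
    count = 0
    for bits in product([0, 1], repeat=len(edges)):
        dag = nx.DiGraph()
        for (u, v), forward in zip(edges, bits):
            if forward:
                dag.add_edge(u, v)
            else:
                dag.add_edge(v, u)
        if not nx.is_directed_acyclic_graph(dag):
            continue
        has_vstructure = any(
            a != b and not dag.has_edge(a, b) and not dag.has_edge(b, a)
            for c in dag.nodes
            for a in dag.predecessors(c)
            for b in dag.predecessors(c)
        )
        if not has_vstructure:
            count += 1
    return count

# A 4-cycle with a chord is chordal, so it is a valid UCCG.
print(count_equivalent_dags([(1, 2), (2, 3), (3, 4), (1, 4), (1, 3)]))
```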
Polynomial-Delay MAG Listing with Novel Locally Complete Orientation Rules
Tian-Zuo Wang · Wen-Bo Du · Zhi-Hua Zhou
A maximal ancestral graph (MAG) is widely used to characterize the causal relations among observable variables in the presence of latent variables. However, given observational data, only a partial ancestral graph representing a Markov equivalence class (MEC) of MAGs is identifiable, and it generally contains uncertain causal relations. Owing to these uncertainties, MAG listing, i.e., listing all the MAGs in the MEC, is critical for many downstream tasks. In this paper, we present the first polynomial-delay MAG listing method, where delay refers to the time needed to output each successive MAG, by introducing enumerated structural knowledge in the form of singleton background knowledge (BK). To incorporate such knowledge, we propose sound and locally complete orientation rules. By recursively introducing singleton BK and applying these rules, our method outputs all and only the MAGs in the MEC with polynomial delay. Additionally, while the proposed rules enable more efficient MAG listing, we present two counterexamples showing that existing orientation rules, including ours, are not yet complete for incorporating general BK; these counterexamples motivate two further rules. Experimental results validate the efficiency of the proposed MAG listing method.
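Since the orientation rules themselves are the paper's contribution and are not given in the abstract, the following Python skeleton only illustrates the recursive, polynomial-delay enumeration pattern described above: commit one piece of singleton background knowledge, close under the orientation rules, and recurse. `pick_undecided_mark` and `apply_orientation_rules` are hypothetical callbacks standing in for the paper's machinery; with sound and locally complete rules, no branch dead-ends, so the time between successive outputs stays polynomial.

```python
def list_mags(pag, pick_undecided_mark, apply_orientation_rules, emit):
    """Schematic recursion for listing the members of an equivalence class.

    pick_undecided_mark(g): return an undecided edge mark of g, or None if g
        is fully oriented (i.e., a single MAG).
    apply_orientation_rules(g, mark, value): return the graph obtained by
        committing `value` for `mark` and closing under the orientation rules,
        or None if that commitment is inconsistent.
    emit(g): called once for each listed MAG.
    """
    mark = pick_undecided_mark(pag)
    if mark is None:
        emit(pag)  # fully oriented: output one MAG
        return
    for value in ("tail", "arrowhead"):  # the two ways to resolve the mark
        refined = apply_orientation_rules(pag, mark, value)
        if refined is not None:
            list_mags(refined, pick_undecided_mark, apply_orientation_rules, emit)
```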
Sanity Checking Causal Representation Learning on a Simple Real-World System
Juan L. Gamella · Simon Bing · Jakob Runge
We evaluate methods for causal representation learning (CRL) on a simple, real-world system where these methods are expected to work. The system consists of a controlled optical experiment specifically built for this purpose, which satisfies the core assumptions of CRL and where the underlying causal factors (the inputs to the experiment) are known, providing a ground truth. We select methods representative of different approaches to CRL and find that they all fail to recover the underlying causal factors. To understand the failure modes of the evaluated algorithms, we perform an ablation on the data by substituting the real data-generating process with a simpler synthetic equivalent. The results reveal a reproducibility problem, as most methods already fail on this synthetic ablation despite its simple data-generating process. Additionally, we observe that common assumptions on the mixing function are crucial for the performance of some of the methods but do not hold in the real data. Our efforts highlight the contrast between the theoretical promise of the state of the art and the challenges in its application. We hope the benchmark serves as a simple, real-world sanity check to further develop and validate methodology, bridging the gap towards CRL methods that work in practice. We make all code and datasets publicly available at
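As a hedged illustration of the kind of sanity check the benchmark supports, the snippet below scores recovered latents against known ground-truth factors with a permutation-matched mean correlation coefficient (MCC), a metric commonly used in identifiability studies; the metric choice, function names, and toy data are illustrative assumptions, not the paper's exact evaluation protocol.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mcc(z_true, z_hat):
    """Mean absolute correlation between true and estimated factors after the
    best one-to-one matching; 1.0 means recovery up to permutation and sign."""
    d = z_true.shape[1]
    corr = np.abs(np.corrcoef(z_true.T, z_hat.T))[:d, d:]
    rows, cols = linear_sum_assignment(-corr)  # maximize total matched correlation
    return corr[rows, cols].mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 3))                               # ground-truth factors
z_hat = z[:, [2, 0, 1]] + 0.1 * rng.normal(size=(1000, 3))   # permuted, noisy estimate
print(mcc(z, z_hat))                                         # close to 1.0
```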