Poster
An Efficient Search-and-Score Algorithm for Ancestral Graphs using Multivariate Information Scores for Complex Non-linear and Categorical Data
Nikita Lagrange · Herve Isambert
East Exhibition Hall A-B #E-1306
The likelihood function is a fundamental concept in machine learning, quantifying how "likely" a given model is to explain observed data. Consequently, when no prior information about possible models is available, selecting the model that maximizes the likelihood provides the most plausible explanation for the data.

Typically, identifying the best explanatory model involves maximizing the likelihood across a set of candidate models. For directed acyclic graph (DAG) models, structures in which variables are represented as nodes connected by directed edges without forming cycles, the global likelihood function conveniently decomposes into local likelihood terms, each involving one observed variable and its parent nodes. In practice, however, not all relevant variables may be observed in the dataset.

This paper addresses this limitation by extending the likelihood formulation to handle DAGs with unobserved variables. Such hidden variables introduce edges with two arrowheads, indicating an unobserved common cause between observed variables. We show that the likelihood of these generalized "ancestral graphs" similarly decomposes into local contributions involving specific subsets of observed variables, and we propose an estimation of these local likelihood contributions directly from observed data. We also introduce an efficient search-and-score algorithm that does not assume simple linear relations between variables (unlike most other state-of-the-art methods), thereby providing a causal discovery method with hidden variables for complex non-linear and categorical data, which are common in real-world applications.
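As a point of reference for the decomposition mentioned above, the classical DAG factorization (a standard result, not the paper's ancestral-graph score itself) can be written as

\log P(x_1, \dots, x_n) = \sum_{i=1}^{n} \log P\big(x_i \mid \mathrm{pa}(x_i)\big),

where \mathrm{pa}(x_i) denotes the values of the parents of node i in the DAG. The ancestral-graph likelihood studied in the paper decomposes analogously, schematically \log P(\mathbf{x}) = \sum_{c} \log P_c(\mathbf{x}_{S_c}), where each S_c is a specific subset of observed variables; the precise definition of these subsets and of the local terms is given in the paper.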