Track: Oral 5C Probablistic Models

Thu 17 July 10:00 - 10:15 PDT

Rényi Neural Processes

Xuesong Wang · He Zhao · Edwin V. Bonilla

Neural Processes (NPs) are deep probabilistic models that represent stochastic processes by conditioning their prior distributions on a set of context points. Despite their advantages in uncertainty estimation for complex distributions, NPs enforce parameterization coupling between the conditional prior model and the posterior model. We show that this coupling amounts to prior misspecification and revisit the NP objective to address this issue. More specifically, we propose Rényi Neural Processes (RNP), a method that replaces the standard KL divergence with the Rényi divergence, dampening the effects of the misspecified prior during posterior updates. We validate our approach across multiple benchmarks including regression and image inpainting tasks, and show significant performance improvements of RNPs in real-world problems. Our extensive experiments show consistently better log-likelihoods over state-of-the-art NP models.

Thu 17 July 10:15 - 10:30 PDT

A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization

Nuojin Cheng · Leonard Papenmeier · Stephen Becker · Luigi Nardi

Bayesian optimization is a widely used method for optimizing expensive black-box functions, with Expected Improvement being one of the most commonly used acquisition functions. In contrast, information-theoretic acquisition functions aim to reduce uncertainty about the function’s optimum and are often considered fundamentally distinct from EI. In this work, we challenge this prevailing perspective by introducing a unified theoretical framework, Variational Entropy Search, which reveals that EI and information-theoretic acquisition functions are more closely related than previously recognized. We demonstrate that EI can be interpreted as a variational inference approximation of the popular information-theoretic acquisition function, named Max-value Entropy Search. Building on this insight, we propose VES-Gamma, a novel acquisition function that balances the strengths of EI and MES. Extensive empirical evaluations across both low- and high-dimensional synthetic and real-world benchmarks demonstrate that VES-Gamma is competitive with state-of-the-art acquisition functions and in many cases outperforms EI and MES.

Thu 17 July 10:30 - 10:45 PDT

Outstanding Paper

Score Matching with Missing Data

Josh Givens · Song Liu · Henry Reeve

Score matching is a vital tool for learning the distribution of data with applications across many areas including diffusion processes, energy based modelling, and graphical model estimation. Despite all these applications, little work explores its use when data is incomplete. We address this by adapting score matching (and its major extensions) to work with missing data in a flexible setting where data can be partially missing over any subset of the coordinates. We provide two separate score matching variations for general use, an importance weighting (IW) approach, and a variational approach. We provide finite sample bounds for our IW approach in finite domain settings and show it to have especially strong performance in small sample lower dimensional cases. Complementing this, we show our variational approach to be strongest in more complex high-dimensional settings which we demonstrate on graphical model estimation tasks on both real and simulated data.

Thu 17 July 10:45 - 11:00 PDT

Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs

Jie Hu · Yi-Ting Ma · Do-Young Eun

We propose a *history-driven target (HDT)* framework in Markov Chain Monte Carlo (MCMC) to improve any random walk algorithm on discrete state spaces, such as general undirected graphs, for efficient sampling from target distribution $\\boldsymbol{\\mu}$. With broad applications in network science and distributed optimization, recent innovations like the self-repellent random walk (SRRW) achieve near-zero variance by prioritizing under-sampled states through transition kernel modifications based on past visit frequencies. However, SRRW's reliance on explicit computation of transition probabilities for all neighbors at each step introduces substantial computational overhead, while its strict dependence on time-reversible Markov chains excludes advanced non-reversible MCMC methods. To overcome these limitations, instead of direct modification of transition kernel, HDT introduces a history-dependent target distribution $\\boldsymbol{\\pi}[\\mathbf{x}]$ to replace the original target $\\boldsymbol{\\mu}$ in any graph sampler, where $\\mathbf{x}$ represents the empirical measure of past visits. This design preserves lightweight implementation by requiring only local information between the current and proposed states and achieves compatibility with both reversible and non-reversible MCMC samplers, while retaining unbiased samples with target distribution $\\boldsymbol{\\mu}$ and near-zero variance performance. Extensive experiments in graph sampling demonstrate consistent performance gains, and a memory-efficient Least Recently Used (LRU) cache ensures scalability to large general graphs.

Main Navigation

Oral Sessions

Oral 5C Probablistic Models

West Ballroom B

Rényi Neural Processes

A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization

Score Matching with Missing Data

Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs