Track: Poster Session 2 West

Poster

#W-100

EGPlace: An Efficient Macro Placement Method via Evolutionary Search with Greedy Repositioning Guided Mutation

ji deng · Zhao Li · Ji Zhang · Jun Gao

Macro placement, which involves optimizing the positions of modules, is a critical phase in modern integrated circuit design and significantly influences chip performance. The growing complexity of integrated circuits demands increasingly sophisticated placement solutions. Existing approaches have evolved along two primary paths (e.g., constructive and adjustment methods), but they face significant practical limitations that affect real-world chip design. Recent hybrid frameworks such as WireMask-EA have attempted to combine these strategies, but significant technical barriers still remain, including the computational overhead from separated layout adjustment and reconstruction that often require complete layout rebuilding, the inefficient exploration of design spaces due to random mutation operations, and the computational complexity of mask-based construction methods that limit scalability. To overcome these limitations, we introduce EGPlace, a novel evolutionary optimization framework that combines guided mutation strategies with efficient layout reconstruction. EGPlace introduces two key innovations: a greedy repositioning-guided mutation operator that systematically identifies and optimizes critical layout regions, and an efficient mask computation algorithm that accelerates layout evaluation. Our extensive evaluation using ISPD2005 and Ariane RISC-V CPU benchmarks demonstrate that EGPlace reduces wirelength by \textbf{10.8\%} and \textbf{9.3\%} compared to WireMask-EA and the state-of-the-art reinforcement learning-based constructive method EfficientPlace, respectively, while achieving speedups of 7.8$\times$ and 2.8$\times$ over these methods.

Poster

#W-1000

Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization

Cameron Jakub · Mihai Nica

Despite remarkable performance on a variety of tasks, many properties of deep neural networks are not yet theoretically understood. One such mystery is the depth degeneracy phenomenon: the deeper you make your network, the closer your network is to a constant function on initialization. In this paper, we examine the evolution of the angle between two inputs to a ReLU neural network as a function of the number of layers. By using combinatorial expansions, we find precise formulas for how fast this angle goes to zero as depth increases. These formulas capture microscopic fluctuations that are not visible in the popular framework of infinite width limits, and leads to qualitatively different predictions. We validate our theoretical results with Monte Carlo experiments and show that our results accurately approximate finite network behaviour. We also empirically investigate how the depth degeneracy phenomenon can negatively impact training of real networks. The formulas are given in terms of the mixed moments of correlated Gaussians passed through the ReLU function. We also find a surprising combinatorial connection between these mixed moments and the Bessel numbers that allows us to explicitly evaluate these moments.

Poster

#W-1001

Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements

Arya Mazumdar · Neha Sangwan

We consider the problem of *exact* recovery of a $k$-sparse binary vector from generalized linear measurements (such as *logistic regression*). We analyze the *linear estimation* algorithm (Plan, Vershynin, Yudovina, 2017), and also show information theoretic lower bounds on the number of required measurements. As a consequence of our results, for noisy one bit quantized linear measurements ($\mathsf{1bCS}$), we obtain a sample complexity of $O((k+\sigma^2)\log{n})$, where $\sigma^2$ is the noise variance. This is shown to be optimal due to the information theoretic lower bound. We also obtain tight sample complexity characterization for logistic regression.Since $\mathsf{1bCS}$ is a strictly harder problem than noisy linear measurements ($\mathsf{SparseLinearReg}$) because of added quantization, the same sample complexity is achievable for $\mathsf{SparseLinearReg}$. While this sample complexity can be obtained via the popular lasso algorithm, linear estimation is computationally more efficient. Our lower bound holds for any set of measurements for $\mathsf{SparseLinearReg}$ (similar bound was known for Gaussian measurement matrices) and is closely matched by the maximum-likelihood upper bound. For $\mathsf{SparseLinearReg}$, it was conjectured in Gamarnik and Zadik, 2017 that there is a statistical-computational gap and the number of measurements should be at least $(2k+\sigma^2)\log{n}$ for efficient algorithms to exist. It is worth noting that our results imply that there is no such statistical-computational gap for $\mathsf{1bCS}$ and logistic regression.

Poster

#W-1002

How Distributed Collaboration Influences the Diffusion Model Training? A Theoretical Perspective

Jing Qiao · Yu Liu · YUAN YUAN · Xiao Zhang · Zhipeng Cai · Dongxiao Yu

This paper examines the theoretical performance of distributed diffusion models in environments where computational resources and data availability vary significantly among workers. Traditional models centered on single-worker scenarios fall short in such distributed settings, particularly when some workers are resource-constrained. This discrepancy in resources and data diversity challenges the assumption of accurate score function estimation foundational to single-worker models. We establish the inaugural generation error bound for distributed diffusion models in resource-limited settings, establishing a linear relationship with the data dimension $d$ and consistency with established single-worker results. Our analysis highlights the critical role of hyperparameter selection in influencing the training dynamics, which are key to the performance of model generation. This study provides a streamlined theoretical approach to optimizing distributed diffusion models, paving the way for future research in this area.

Poster

#W-1003

The Price of Linear Time: Error Analysis of Structured Kernel Interpolation

Alexander Moreno · Justin Xiao · Jonathan Mei

Structured Kernel Interpolation (SKI) scales Gaussian Processes (GPs) by approximating the kernel matrix via inducing point interpolation, achieving linear computational complexity. However, it lacks rigorous theoretical error analysis. This paper bridges this gap by proving error bounds for the SKI Gram matrix and examining their effect on hyperparameter estimation and posterior inference. We further provide a practical guide to selecting the number of inducing points under convolutional cubic interpolation: they should grow as (n^{d/3}) for error control. Crucially, we identify two dimensionality regimes for the SKI Gram matrix spectral norm error vs. complexity trade-off. For (d<3), \textit{any} error tolerance can achieve linear time for sufficiently large sample size. For (d\geq 3), the error must \textit{increase} with sample size for our guarantees to hold. Our analysis provides key insights into SKI's scalability-accuracy trade-offs, establishing precise conditions for achieving linear-time GP inference with controlled error.

Poster

#W-1005

The Underlying Universal Statistical Structure of Natural Datasets

Noam Levi · Yaron Oz

We study universal properties in real-world complex and synthetically generated datasets. Our approach is to analogize data to a physical system and employ tools from statistical physics and Random Matrix Theory (RMT) to reveal their underlying structure.Examining the local and global eigenvalue statistics of feature-feature covariance matrices, we find: (i) bulk eigenvalue power-law scaling vastly differs between uncorrelated Gaussian and real-world data, (ii) this power law behavior is reproducible using Gaussian data with long-range correlations, (iii) all dataset types exhibit chaotic RMT universality, (iv) RMT statistics emerge at smaller dataset sizes than typical training sets, correlating with power-law convergence, (v) Shannon entropy correlates with RMT structure and requires fewer samples in strongly correlated datasets. These results suggest natural image Gram matrices can be approximated by Wishart random matrices with simple covariance structure, enabling rigorous analysis of neural network behavior.

Poster

#W-1006

A Reductions Approach to Risk-Sensitive Reinforcement Learning with Optimized Certainty Equivalents

Kaiwen Wang · Dawen Liang · Nathan Kallus · Wen Sun

We study risk-sensitive RL where the goal is learn a history-dependent policy that optimizes some risk measure of cumulative rewards.We consider a family of risks called the optimized certainty equivalents (OCE), which captures important risk measures such as conditional value-at-risk (CVaR), entropic risk and Markowitz's mean-variance. In this setting, we propose two meta-algorithms: one grounded in optimism and another based on policy gradients, both of which can leverage the broad suite of risk-neutral RL algorithms in an augmented Markov Decision Process (MDP). Via a reductions approach, we leverage theory for risk-neutral RL to establish novel OCE bounds in complex, rich-observation MDPs. For the optimism-based algorithm, we prove bounds that generalize prior results in CVaR RL and that provide the first risk-sensitive bounds for exogenous block MDPs. For the gradient-based algorithm, we establish both monotone improvement and global convergence guarantees under a discrete reward assumption. Finally, we empirically show that our algorithms learn the optimal history-dependent policy in a proof-of-concept MDP, where all Markovian policies provably fail.

Poster

#W-1008

Towards Theoretical Understanding of Sequential Decision Making with Preference Feedback

Simone Drago · Marco Mussi · Alberto Maria Metelli

The success of sequential decision-making approaches, such as reinforcement learning (RL), is closely tied to the availability of a reward feedback. However, designing a reward function that encodes the desired objective is a challenging task. In this work, we address a more realistic scenario: sequential decision making with preference feedback provided, for instance, by a human expert. We aim to build a theoretical basis linking preferences, (non-Markovian) utilities, and (Markovian) rewards, and we study the connections between them. First, we model preference feedback using a partial (pre)order over trajectories, enabling the presence of incomparabilities that are common when preferences are provided by humans but are surprisingly overlooked in existing works. Second, to provide a theoretical justification for a common practice, we investigate how a preference relation can be approximated by a multi-objective utility. We introduce a notion of preference-utility compatibility and analyze the computational complexity of this transformation, showing that constructing the minimum-dimensional utility is NP-hard. Third, we propose a novel concept of preference-based policy dominance that does not rely on utilities or rewards and discuss the computational complexity of assessing it. Fourth, we develop a computationally efficient algorithm to approximate a utility using (Markovian) rewards and quantify the error in terms of the suboptimality of the optimal policy induced by the approximating reward. This work aims to lay the foundation for a principled approach to sequential decision making from preference feedback, with promising potential applications in RL from human feedback.

Poster

#W-1009

Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment

Audrey Huang · Adam Block · Qinghua Liu · Nan Jiang · Akshay Krishnamurthy · Dylan Foster

Recent work on inference-time alignment has established benefits of increasing inference-time computation in language models, but naively scaling compute through techniques like Best-of-N sampling can cause performance to degrade due to reward hacking. Toward a theoretical understanding of how to best leverage additional computation, we formalize inference-time alignment as improving a pre-trained policy’s responses for a prompt of interest, given access to an imperfect reward model. We analyze the performance of inference-time alignment algorithms in terms of (i) response quality, and (ii) compute, and provide new results that highlight the importance of the pre-trained policy’s coverage over high-quality responses for performance and compute scaling: (1) We show that Best-of-N alignment with an ideal N can achieve optimal performance under stringent notions of coverage, but provably suffers from reward hacking when N is large, and fails to achieve tight guarantees under more realistic coverage conditions; (2) We introduce InferenceTimePessimism, a new algorithm which mitigates reward hacking through deliberate use of inference-time compute, implementing pessimism in the face of uncertainty; we prove that its performance is optimal and scaling-monotonic, i.e., does ot degrade as N increases. We complement our theoretical results with experiments that demonstrate the practicality of our algorithm across a variety of tasks and models.

Poster

#W-101

Cradle: Empowering Foundation Agents towards General Computer Control

Weihao Tan · Wentao Zhang · Xinrun Xu · Haochong Xia · gang Ding · Boyu Li · Bohan Zhou · Junpeng Yue · Jiechuan Jiang · Yewen Li · Ruyi An · Molei Qin · Chuqiao Zong · Longtao Zheng · YuJie Wu · Xiaoqiang Chai · Yifei Bi · Tianbao Xie · Pengjie Gu · Xiyun Li · Ceyao Zhang · Long Tian · Chaojie Wang · Xinrun Wang · Börje F. Karlsson · Bo An · Shuicheng YAN · Zongqing Lu

Despite their success in specific scenarios, existing foundation agents still struggle to generalize across various virtual scenarios, mainly due to the dramatically different encapsulations of environments with manually designed observation and action spaces. To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through the most unified and standardized interface, i.e., using screenshots as input and keyboard and mouse actions as output. We introduce Cradle, a modular and flexible LMM-powered framework, as a preliminary attempt towards GCC. Enhanced by six key modules, Information Gathering, Self-Reflection, Task Inference, Skill Curation, Action Planning, and Memory, Cradle is able to understand input screenshots and output executable code for low-level keyboard and mouse control after high-level planning and information retrieval, so that Cradle can interact with any software and complete long-horizon complex tasks without relying on any built-in APIs. Experimental results show that Cradle exhibits remarkable generalizability and impressive performance across four previously unexplored commercial video games (Red Dead Redemption 2, Cities:Skylines, Stardew Valley and Dealer's Life 2), five software applications (Chrome, Outlook, Feishu, Meitu and CapCut), and a comprehensive benchmark, OSWorld. With a unified interface to interact with any software, Cradle greatly extends the reach of foundation agents thus paving the way for generalist agents.

Spotlight Poster

#W-1010

The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes

Pedro Santos · Alberto Sardinha · Francisco S. Melo

The general-utility Markov decision processes (GUMDPs) framework generalizes the MDPs framework by considering objective functions that depend on the frequency of visitation of state-action pairs induced by a given policy. In this work, we contribute with the first analysis on the impact of the number of trials, i.e., the number of randomly sampled trajectories, in infinite-horizon GUMDPs. We show that, as opposed to standard MDPs, the number of trials plays a key-role in infinite-horizon GUMDPs and the expected performance of a given policy depends, in general, on the number of trials. We consider both discounted and average GUMDPs, where the objective function depends, respectively, on discounted and average frequencies of visitation of state-action pairs. First, we study policy evaluation under discounted GUMDPs, proving lower and upper bounds on the mismatch between the finite and infinite trials formulations for GUMDPs. Second, we address average GUMDPs, studying how different classes of GUMDPs impact the mismatch between the finite and infinite trials formulations. Third, we provide a set of empirical results to support our claims, highlighting how the number of trajectories and the structure of the underlying GUMDP influence policy evaluation.

Poster

#W-1011

Near-Optimal Sample Complexity for MDPs via Anchoring

Jongmin Lee · Mario Bravo · Roberto Cominetti

We study a new model-free algorithm to compute $\varepsilon$-optimal policies for average reward Markov decision processes, in the weakly communicating setting. Given a generative model, our procedure combines a recursive sampling technique with Halpern's anchored iteration, and computes an $\varepsilon$-optimal policy with sample and time complexity $\widetilde{O}(|\mathcal{S}||\mathcal{A}|\||h\||^{2}/\varepsilon^{2})$ both in high probability and in expectation. To our knowledge, this is the best complexity among model-free algorithms, matching the known lower bound up to a factor $ \||h\|| $. Although the complexity bound involves the span seminorm $ \||h\|| $ of the unknown bias vector, the algorithm requires no prior knowledge and implements a stopping rule which guarantees with probability 1 that the procedure terminates in finite time. We also analyze how these techniques can be adapted for discounted MDPs.

Poster

#W-1012

Low-Rank Thinning

Annabelle Carrell · Albert Gong · Abhishek Shetty · Raaz Dwivedi · Lester Mackey

The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.

Poster

#W-1013

PAC-Bayes Analysis for Recalibration in Classification

Masahiro Fujisawa · Futoshi Futami

Nonparametric estimation using uniform-width binning is a standard approach for evaluating the calibration performance of machine learning models.However, existing theoretical analyses of the bias induced by binning are limited to binary classification, creating a significant gap with practical applications such as multiclass classification. Additionally, many parametric recalibration algorithms lack theoretical guarantees for their generalization performance.To address these issues, we conduct a generalization analysis of calibration error using the probably approximately correct Bayes framework. This approach enables us to derive the first optimizable upper bound for generalization error in the calibration context. On the basis of our theory, we propose a generalization-aware recalibration algorithm. Numerical experiments show that our algorithm enhances the performance of Gaussian process-based recalibration across various benchmark datasets and models.

Poster

#W-1014

When to Forget? Complexity Trade-offs in Machine Unlearning

Martin Van Waerebeke · Marco Lorenzi · Giovanni Neglia · Kevin Scaman

Machine Unlearning (MU) aims at removing the influence of specific data points from a trained model, striving to achieve this at a fraction of the cost of full model retraining. In this paper, we analyze the efficiency of unlearning methods and establish the first upper and lower bounds on minimax computation times for this problem, characterizing the performance of the most efficient algorithm against the most difficult objective function. Specifically, for strongly convex objective functions and under the assumption that the forget data is inaccessible to the unlearning method, we provide a phase diagram for the unlearning complexity ratio---a novel metric that compares the computational cost of the best unlearning method to full model retraining. The phase diagram reveals three distinct regimes: one where unlearning at a reduced cost is infeasible, another where unlearning is trivial because adding noise suffices, and a third where unlearning achieves significant computational advantages over retraining. These findings highlight the critical role of factors such as data dimensionality, the number of samples to forget, and privacy constraints in determining the practical feasibility of unlearning.

Poster

#W-1015

Optimal Decision Tree Pruning Revisited: Algorithms and Complexity

Juha Harviainen · Frank Sommer · Manuel Sorge · Stefan Szeider

We present a comprehensive classical and parameterized complexity analysis of decision tree pruning operations, extending recent research on the complexity of learning small decision trees. Thereby, we offer new insights into the computational challenges of decision tree simplification, a crucial aspect of developing interpretable and efficient machine learning models. We focus on fundamental pruning operations of subtree replacement and raising, which are used in heuristics. Surprisingly, while optimal pruning can be performed in polynomial time for subtree replacement, the problem is NP-complete for subtree raising. Therefore, we identify parameters and combinations thereof that lead to fixed-parameter tractability or hardness, establishing a precise borderline between these complexity classes. For example, while subtree raising is hard for small domain size $D$ or number $d$ of features, it can be solved in $D^{2d} \cdot |I|^{O(1)}$ time, where $|I|$ is the input size. We complement our theoretical findings with preliminary experimental results, demonstrating the practical implications of our analysis.

Poster

#W-1016

Learning Minimum-Size BDDs: Towards Efficient Exact Algorithms

Christian Komusiewicz · André Schidler · Frank Sommer · Manuel Sorge · Luca Staus

Binary decision diagrams (BDDs) are widely applied tools to compactly represent labeled data as directed acyclic graphs; for efficiency and interpretability reasons small BDDs are preferred.Given labeled data, minimizing BDDs is NP-complete and thus recent research focused on the influence of parameters such as the solution size $s$ on the complexity [Ordyniak et al., AAAI 2024].Our main positive result is an algorithm that is efficient if in particular $s$, the domain size $D$, and the Hamming distance between any two data points is small, improving on previous running-time bounds.This algorithm is inspired by the witness-tree paradigm that was recently successful for computing decision trees [Komusiewicz et al., ICML 2023], whose extension to BDDs was open.We extend our algorithmic results to the case where we allow a small number of misclassified data points and complement them with lower bounds that show that the running times are tight from multiple points of view.We show that our main algorithm holds practical promise by providing a proof-of-concept implementation.

Poster

#W-1017

Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents

Junyan Liu · Lillian Ratliff

We study the repeated principal-agent bandit game, where the principal indirectly explores an unknown environment by incentivizing an agent to play arms. Unlike prior work that assumes a greedy agent with full knowledge of reward means, we consider a self-interested learning agent who iteratively updates reward estimates and may explore arbitrarily with some probability.As a warm-up, we first consider a self-interested learning agent without exploration.We propose algorithms for both i.i.d. and linear reward settings with bandit feedback in a finite horizon $T$, achieving regret bounds of $\widetilde{O}(\sqrt{T})$ and $\widetilde{O}(T^{\frac{2}{3}})$, respectively. Specifically, these algorithms rely on a novel elimination framework coupled with new search algorithms which accommodate the uncertainty from the agent's learning behavior. We then extend the framework to handle an exploratory learning agent and develop an algorithm to achieve a $\widetilde{O}(T^{\frac{2}{3}})$ regret bound in i.i.d. reward setup by enhancing the robustness of our elimination framework to the potential agent exploration. Finally, when our agent model reduces to that in (Dogan et al., 2023a), we propose an algorithm based on our robust framework, which achieves a $\widetilde{O}(\sqrt{T})$ regret bound, significantly improving upon their $\widetilde{O}(T^{\frac{11}{12}})$ bound.

Poster

#W-1018

Leveraging Offline Data in Linear Latent Contextual Bandits

Chinmaya Kausik · Kevin Tan · Ambuj Tewari

Leveraging offline data is an attractive way to accelerate online sequential decision-making. However, it is crucial to account for latent states in users or environments in the offline data, and latent bandits form a compelling model for doing so. In this light, we design end-to-end latent bandit algorithms capable of handing uncountably many latent states. We focus on a linear latent contextual bandit — a linear bandit where each user has its own high-dimensional reward parameter in $\mathbb{R}^{d_A}$, but reward parameters across users lie in a low-rank latent subspace of dimension $d_K \ll d_A$. First, we provide an offline algorithm to learn this subspace with provable guarantees. We then present two online algorithms that utilize the output of this offline algorithm to accelerate online learning. The first enjoys $\tilde O(\min(d_A\sqrt{T}, d_K\sqrt{T}(1+\sqrt{d_AT/d_KN})))$ regret guarantees, so that the effective dimension is lower when the size $N$ of the offline dataset is larger. We prove a matching lower bound on regret, showing that our algorithm is minimax optimal. The second is a practical algorithm that enjoys only a slightly weaker guarantee, but is computationally efficient. We also establish the efficacy of our methods using experiments on both synthetic data and real-life movie recommendation data from MovieLens. Finally, we theoretically establish the generality of the latent bandit model by proving a de Finetti theorem for stateless decision processes.

Poster

#W-1019

Contextual Bandits for Unbounded Context Distributions

Puning Zhao · Rongfei Fan · Shaowei Wang · Li Shen · Qixin Zhang · Zong Ke · Tianhang Zheng

Nonparametric contextual bandit is an important model of sequential decision making problems. Under $\alpha$-Tsybakov margin condition, existing research has established a regret bound of $\tilde{O}\left(T^{1-\frac{\alpha+1}{d+2}}\right)$ for bounded supports. However, the optimal regret with unbounded contexts has not been analyzed. The challenge of solving contextual bandit problems with unbounded support is to achieve both exploration-exploitation tradeoff and bias-variance tradeoff simultaneously. In this paper, we solve the nonparametric contextual bandit problem with unbounded contexts. We propose two nearest neighbor methods combined with UCB exploration. The first method uses a fixed $k$. Our analysis shows that this method achieves minimax optimal regret under a weak margin condition and relatively light-tailed context distributions. The second method uses adaptive $k$. By a proper data-driven selection of $k$, this method achieves an expected regret of $\tilde{O}\left(T^{1-\frac{(\alpha+1)\beta}{\alpha+(d+2)\beta}}+T^{1-\beta}\right)$, in which $\beta$ is a parameter describing the tail strength. This bound matches the minimax lower bound up to logarithm factors, indicating that the second method is approximately optimal.

Poster

#W-1020

Exploiting Curvature in Online Convex Optimization with Delayed Feedback

Hao Qiu · Emmanuel Esposito · Mengxiao Zhang

In this work, we study the online convex optimization problem with curved losses and delayed feedback.When losses are strongly convex, existing approaches obtain regret bounds of order $d_{\max} \ln T$, where $d_{\max}$ is the maximum delay and $T$ is the time horizon. However, in many cases, this guarantee can be much worse than $\sqrt{d_{\mathrm{tot}}}$ as obtained by a delayed version of online gradient descent, where $d_{\mathrm{tot}}$ is the total delay.We bridge this gap by proposing a variant of follow-the-regularized-leader that obtains regret of order $\min\\{\sigma_{\max}\ln T, \sqrt{d_{\mathrm{tot}}}\\}$, where $\sigma_{\max}$ is the maximum number of missing observations.We then consider exp-concave losses and extend the Online Newton Step algorithm to handle delays with an adaptive learning rate tuning, achieving regret $\min\\{d_{\max} n\ln T, \sqrt{d_{\mathrm{tot}}}\\}$ where $n$ is the dimension.To our knowledge, this is the first algorithm to achieve such a regret bound for exp-concave losses.We further consider the problem of unconstrained online linear regression and achieve a similar guarantee by designing a variant of the Vovk-Azoury-Warmuth forecaster with a clipping trick.Finally, we implement our algorithms and conduct experiments under various types of delay and losses, showing an improved performance over existing methods.

Poster

#W-103

Agent-as-a-Judge: Evaluate Agents with Agents

Mingchen Zhuge · Changsheng Zhao · Dylan Ashley · Wenyi Wang · Dmitrii Khizbullin · Yunyang Xiong · Zechun Liu · Ernie Chang · Raghuraman Krishnamoorthi · Yuandong Tian · Yangyang Shi · Vikas Chandra · Jürgen Schmidhuber

Contemporary evaluation techniques are inadequate for agentic systems. These approaches either focus exclusively on final outcomes---ignoring the step-by-step nature of the thinking done by agentic systems---or require excessive manual labour. To address this, we introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems. This is a natural extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving processes for more precise evaluations. We apply the Agent-as-a-Judge framework to the task of code generation. To overcome issues with existing benchmarks and provide a proof-of-concept testbed for Agent-as-a-Judge, we present DevAI, a new benchmark of 55 realistic AI code generation tasks. DevAI includes rich manual annotations, like a total of 365 hierarchical solution requirements, which make it particularly suitable for an agentic evaluator. We benchmark three of the top code-generating agentic systems using Agent-as-a-Judge and find that our framework dramatically outperforms LLM-as-a-Judge and is as reliable as our human evaluation baseline. Altogether, we believe that this work represents a concrete step towards enabling vastly more sophisticated agentic systems. To help that, our dataset and the full implementation of Agent-as-a-Judge will be publically available at https://github.com/metauto-ai/agent-as-a-judge

Poster

#W-104

StealthInk: A Multi-bit and Stealthy Watermark for Large Language Models

Ya Jiang · Chuxiong Wu · Massieh Kordi Boroujeny · Brian Mark · Kai Zeng

Watermarking for large language models (LLMs) offers a promising approach to identifying AI-generated text. Existing approaches, however, either compromise the distribution of original generated text by LLMs or are limited to embedding zero-bit information that only allows for watermark detection but ignores identification. We present StealthInk, a stealthy multi-bit watermarking scheme that preserves the original text distribution while enabling the embedding of provenance data, such as userID, TimeStamp, and modelID, within LLM-generated text. This enhances fast traceability without requiring access to the language model's API or prompts. We derive a lower bound on the number of tokens necessary for watermark detection at a fixed equal error rate, which provides insights on how to enhance the capacity. Comprehensive empirical evaluations across diverse tasks highlight the stealthiness, detectability, and resilience of StealthInk, establishing it as an effective solution for LLM watermarking applications.

Poster

#W-105

What can large language models do for sustainable food?

Anna Thomas · Adam Yee · Andrew Mayne · Maya Mathur · Dan Jurafsky · Kristina Gligoric

Food systems are responsible for a third of human-caused greenhouse gas emissions. We investigate what Large Language Models (LLMs) can contribute to reducing the environmental impacts of food production. We define a typology of design and prediction tasks based on the sustainable food literature and collaboration with domain experts, and evaluate six LLMs on four tasks in our typology. For example, for a sustainable protein design task, food science experts estimated that collaboration with an LLM can reduce time spent by 45% on average, compared to 22% for collaboration with another expert human food scientist. However, for a sustainable menu design task, LLMs produce suboptimal solutions when instructed to consider both human satisfaction and climate impacts. We propose a general framework for integrating LLMs with combinatorial optimization to improve reasoning capabilities. Our approach decreases emissions of food choices by 79% in a hypothetical restaurant while maintaining participants' satisfaction with their set of choices. Our results demonstrate LLMs' potential, supported by optimization techniques, to accelerate sustainable food development and adoption.

Poster

#W-106

MultiPDENet: PDE-embedded Learning with Multi-time-stepping for Accelerated Flow Simulation

Qi Wang · Yuan Mi · Wang Haoyun · Yi Zhang · Ruizhi Chengze · Hongsheng Liu · Ji-Rong Wen · Hao Sun

Solving partial differential equations (PDEs) by numerical methods meet computational cost challenge for getting the accurate solution since fine grids and small time steps are required. Machine learning can accelerate this process, but struggle with weak generalizability, interpretability, and data dependency, as well as suffer in long-term prediction. To this end, we propose a PDE-embedded network with multiscale time stepping (MultiPDENet), which fuses the scheme of numerical methods and machine learning, for accelerated simulation of flows. In particular, we design a convolutional filter based on the structure of finite difference stencils with a small number of parameters to optimize, which estimates the equivalent form of spatial derivative on a coarse grid to minimize the equation's residual. A Physics Block with a 4th-order Runge-Kutta integrator at the fine time scale is established that embeds the structure of PDEs to guide the prediction. To alleviate the curse of temporal error accumulation in long-term prediction, we introduce a multiscale time integration approach, where a neural network is used to correct the prediction error at a coarse time scale. Experiments across various PDE systems, including the Navier-Stokes equations, demonstrate that MultiPDENet can accurately predict long-term spatiotemporal dynamics, even given small and incomplete training data, e.g., spatiotemporally down-sampled datasets. MultiPDENet achieves the state-of-the-art performance compared with other neural baseline models, also with clear speedup compared to classical numerical methods

Poster

#W-107

Perceptually Constrained Precipitation Nowcasting Model

Wenzhi Feng · Xutao Li · Zhe Wu · Kenghong Lin · Demin Yu · Yunming Ye · Yaowei Wang

Most current precipitation nowcasting methods aim to capture the underlying spatiotemporal dynamics of precipitation systems by minimizing the mean square error (MSE). However, these methods often neglect effective constraints on the data distribution, leading to unsatisfactory prediction accuracy and image quality, especially for long forecast sequences. To address this limitation, we propose a precipitation nowcasting model incorporating perceptual constraints. This model reformulates precipitation nowcasting as a posterior MSE problem under such constraints. Specifically, we first obtain the posteriori mean sequences of precipitation forecasts using a precipitation estimator. Subsequently, we construct the transmission between distributions using rectified flow. To enhance the focus on distant frames, we design a frame sampling strategy that gradually increases the corresponding weights. We theoretically demonstrate the reliability of our solution, and experimental results on two publicly available radar datasets demonstrate that our model is effective and outperforms current state-of-the-art models.

Poster

#W-108

DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra

Montgomery Bohde · Mrunali Manjrekar · Runzhong Wang · Shuiwang Ji · Connor Coley

Mass spectrometry plays a fundamental role in elucidating the structures of unknown molecules and subsequent scientific discoveries. One formulation of the structure elucidation task is the conditional de novo generation of molecular structure given a mass spectrum. Toward a more accurate and efficient scientific discovery pipeline for small molecules, we present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. The encoder utilizes a transformer architecture and models mass spectra domain knowledge such as peak formulae and neutral losses, and the decoder is a discrete graph diffusion model restricted by the heavy-atom composition of a known chemical formula. To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs, which are available in virtually infinite quantities, compared to structure-spectrum pairs that number in the tens of thousands. Extensive experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation. We provide several ablations to demonstrate the effectiveness of our diffusion and pretraining approaches and show consistent performance scaling with increasing pretraining dataset size. DiffMS code is publicly available at https://github.com/coleygroup/DiffMS.

Poster

#W-109

Physics-Informed Generative Modeling of Wireless Channels

Benedikt Böck · Andreas Oeldemann · Timo Mayer · Francesco Rossetto · Wolfgang Utschick

Learning the site-specific distribution of the wireless channel within a particular environment of interest is essential to exploit the full potential of machine learning (ML) for wireless communications and radar applications. Generative modeling offers a promising framework to address this problem. However, existing approaches pose unresolved challenges, including the need for high-quality training data, limited generalizability, and a lack of physical interpretability. To address these issues, we combine the physics-related compressibility of wireless channels with generative modeling, in particular, sparse Bayesian generative modeling (SBGM), to learn the distribution of the underlying physical channel parameters. By leveraging the sparsity-inducing characteristics of SBGM, our methods can learn from compressed observations received by an access point (AP) during default online operation. Moreover, they are physically interpretable and generalize over system configurations without requiring retraining.

Poster

#W-110

Diagonal Symmetrization of Neural Network Solvers for the Many-Electron Schrödinger Equation

Kevin Han Huang · Ni Zhan · Elif Ertekin · Peter Orbanz · Ryan P. Adams

Incorporating group symmetries into neural networks has been a cornerstone of success in many AI-for-science applications. Diagonal groups of isometries, which describe the invariance under a simultaneous movement of multiple objects, arise naturally in many-body quantum problems. Despite their importance, diagonal groups have received relatively little attention, as they lack a natural choice of invariant maps except in special cases. We study different ways of incorporating diagonal invariance in neural network ansatze trained via variational Monte Carlo methods, and consider specifically data augmentation, group averaging and canonicalization. We show that, contrary to standard ML setups, in-training symmetrization destabilizes training and can lead to worse performance. Our theoretical and numerical results indicate that this unexpected behavior may arise from a unique computational-statistical tradeoff not found in standard ML analyses of symmetrization. Meanwhile, we demonstrate that post hoc averaging is less sensitive to such tradeoffs and emerges as a simple, flexible and effective method for improving neural network solvers.

Poster

#W-1100

Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents

Shuo Han · German Espinosa · Junda Huang · Daniel A. Dombeck · Malcolm MacIver · Bradly Stadie

Recent advances in reinforcement learning (RL) have demonstrated impressive capabilities in complex decision-making tasks. This progress raises a natural question: how do these artificial systems compare to biological agents, which have been shaped by millions of years of evolution? To help answer this question, we undertake a comparative study of biological mice and RL agents in a predator-avoidance maze environment. Through this analysis, we identify a striking disparity: RL agents consistently demonstrate a lack of self-preservation instinct, readily risking ``death'' for marginal efficiency gains. These risk-taking strategies are in contrast to biological agents, which exhibit sophisticated risk-assessment and avoidance behaviors. Towards bridging this gap between the biological and artificial, we propose two novel mechanisms that encourage more naturalistic risk-avoidance behaviors in RL agents. Our approach leads to the emergence of naturalistic behaviors, including strategic environment assessment, cautious path planning, and predator avoidance patterns that closely mirror those observed in biological systems.

Poster

#W-111

Context-Informed Neural ODEs Unexpectedly Identify Broken Symmetries: Insights from the Poincaré–Hopf Theorem

In Huh · Changwook Jeong · Muhammad Alam

Out-Of-Domain (OOD) generalization is a significant challenge in learning dynamical systems, especially when they exhibit bifurcation, a sudden topological transition triggered by a model parameter crossing a critical threshold. A prevailing belief is that machine learning models, unless equipped with strong priors, struggle to generalize across bifurcations due to the abrupt changes in data characteristics. Contrary to this belief, we demonstrate that context-dependent Neural Ordinary Differential Equations (NODEs), trained solely on localized, pre-bifurcation, symmetric data and without physics-based priors, can still identify post-bifurcation, symmetry-breaking behaviors, even in a zero-shot manner. We interpret this capability to the model's implicit utilization of topological invariants, particularly the Poincaré index, and offer a formal explanation based on the Poincaré–Hopf theorem. We derive the conditions under which NODEs can recover—or erroneously hallucinate—broken symmetries without explicit training. Building on this insight, we showcase a topological regularizer inspired by the Poincaré–Hopf theorem and validate it empirically on phase transitions of systems described by the Landau–Khalatnikov equation.

Poster

#W-112

KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors

Benson Chen · Tomasz Danel · Gabriel Dreiman · Patrick McEnaney · Nikhil Jain · Kirill Novikov · Spurti Akki · Joshua L. Turnbull · Virja Pandya · Boris Belotserkovskii · Jared Weaver · Ankita Biswas · Dat Nguyen · Kent Gorday · Mohammad M Sultan · Nathaniel Stanley · Daniel Whalen · Divya Kanichar · Christoph Klein · Emily Fox · R. Watts

DNA-Encoded Libraries (DELs) represent a transformative technology in drug discovery, facilitating the high-throughput exploration of vast chemical spaces. Despite their potential, the scarcity of publicly available DEL datasets presents a bottleneck for the advancement of machine learning methodologies in this domain. To address this gap, we introduce KinDEL, one of the largest publicly accessible DEL datasets and the first one that includes binding poses from molecular docking experiments. Focused on two kinases, Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1), KinDEL includes 81 million compounds, offering a rich resource for computational exploration. Additionally, we provide comprehensive biophysical assay validation data, encompassing both on-DNA and off-DNA measurements, which we use to evaluate a suite of machine learning techniques, including novel structure-based probabilistic models. We hope that our benchmark, encompassing both 2D and 3D structures, will help advance the development of machine learning models for data-driven hit identification using DELs.

Poster

#W-113

Wyckoff Transformer: Generation of Symmetric Crystals

Nikita Kazeev · Wei Nong · Ignat Romanov · Ruiming Zhu · Andrey Ustyuzhanin · Shuya Yamazaki · Kedar Hippalgaonkar

Crystal symmetry plays a fundamental role in determining its physical, chemical, and electronic properties such as electrical and thermal conductivity, optical and polarization behavior, and mechanical strength. Almost all known crystalline materials have internal symmetry. However, this is often inadequately addressed by existing generative models, making the consistent generation of stable and symmetrically valid crystal structures a significant challenge. We introduce WyFormer, a generative model that directly tackles this by formally conditioning on space group symmetry. It achieves this by using Wyckoff positions as the basis for an elegant, compressed, and discrete structure representation. To model the distribution, we develop a permutation-invariant autoregressive model based on the Transformer encoder and an absence of positional encoding. Extensive experimentation demonstrates WyFormer's compelling combination of attributes: it achieves best-in-class symmetry-conditioned generation, incorporates a physics-motivated inductive bias, produces structures with competitive stability, predicts material properties with competitive accuracy even without atomic coordinates, and exhibits unparalleled inference speed.

Poster

#W-114

Modeling All-Atom Glycan Structures via Hierarchical Message Passing and Multi-Scale Pre-training

Minghao Xu · Jiaze Song · Keming Wu · Xiangxin Zhou · Bin Cui · Wentao Zhang

Understanding the various properties of glycans with machine learning has shown some preliminary promise. However, previous methods mainly focused on modeling the backbone structure of glycans as graphs of monosaccharides (i.e., sugar units), while they neglected the atomic structures underlying each monosaccharide, which are actually important indicators of glycan properties. We fill this blank by introducing the GlycanAA model for All-Atom-wise Glycan modeling. GlycanAA models a glycan as a heterogeneous graph with monosaccharide nodes representing its global backbone structure and atom nodes representing its local atomic-level structures. Based on such a graph, GlycanAA performs hierarchical message passing to capture from local atomic-level interactions to global monosaccharide-level interactions. To further enhance model capability, we pre-train GlycanAA on a high-quality unlabeled glycan dataset, deriving the PreGlycanAA model. We design a multi-scale mask prediction algorithm to endow the model about different levels of dependencies in a glycan. Extensive benchmark results show the superiority of GlycanAA over existing glycan encoders and verify the further improvements achieved by PreGlycanAA. We maintain all resources at https://github.com/kasawa1234/GlycanAA.

Spotlight Poster

#W-115

Elucidating the Design Space of Multimodal Protein Language Models

Cheng-Yen Hsieh · Xinyou Wang · Daiheng Zhang · Dongyu Xue · Fei YE · Shujian Huang · Zaixiang Zheng · Quanquan Gu

Multimodal protein language models (PLMs) integrate sequence and token-based structural information, serving as a powerful foundation for protein modeling, generation, and design. However, the reliance on tokenizing 3D structures into discrete tokens causes substantial loss of fidelity about fine-grained structural details and correlations. In this paper, we systematically elucidate the design space of multimodal PLMs to overcome their limitations. We identify tokenization loss and inaccurate structure token predictions by the PLMs as major bottlenecks.To address these, our proposed design space covers improved generative modeling, structure-aware architectures and representation learning, and data exploration. Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling.The effective design methods dramatically improve the structure generation diversity, and notably, folding abilities of our 650M model by reducing the RMSD from 5.52 to 2.36 on PDB testset, even outperforming 3B baselines and on par with the specialized folding models.Project page and code: https://bytedance.github.io/dplm/dplm-2.1.

Poster

#W-116

LDMol: A Text-to-Molecule Diffusion Model with Structurally Informative Latent Space Surpasses AR Models

Jinho Chang · Jong Chul YE

With the emergence of diffusion models as a frontline generative model, many researchers have proposed molecule generation techniques with conditional diffusion models. However, the unavoidable discreteness of a molecule makes it difficult for a diffusion model to connect raw data with highly complex conditions like natural language. To address this, here we present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation. By recognizing that the suitable latent space design is the key to the diffusion model performance, we employ a contrastive learning strategy to extract novel feature space from text data that embeds the unique characteristics of the molecule structure. Experiments show that LDMol outperforms the existing autoregressive baselines on the text-to-molecule generation benchmark, being one of the first diffusion models that outperforms autoregressive models in textual data generation with a better choice of the latent domain.Furthermore, we show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing, demonstrating its versatility as a diffusion model.

Poster

#W-117

RISE: Radius of Influence based Subgraph Extraction for 3D Molecular Graph Explanation

Jingxiang Qu · Wenhan Gao · Jiaxing Zhang · Xufeng Liu · Hua Wei · Haibin Ling · Yi Liu

3D Geometric Graph Neural Networks (GNNs) have emerged as transformative tools for modeling molecular data. Despite their predictive power, these models often suffer from limited interpretability, raising concerns for scientific applications that require reliable and transparent insights. While existing methods have primarily focused on explaining molecular substructures in 2D GNNs, the transition to 3D GNNs introduces unique challenges, such as handling the implicit dense edge structures created by a cutoff radius. To tackle this, we introduce a novel explanation method specifically designed for 3D GNNs, which localizes the explanation to the immediate neighborhood of each node within the 3D space. Each node is assigned an radius of influence, defining the localized region within which message passing captures spatial and structural interactions crucial for the model's predictions. This method leverages the spatial and geometric characteristics inherent in 3D graphs. By constraining the subgraph to a localized radius of influence, the approach not only enhances interpretability but also aligns with the physical and structural dependencies typical of 3D graph applications, such as molecular learning.

Poster

#W-118

Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning

Runzhong Wang · Rui-Xi Wang · Mrunali Manjrekar · Connor Coley

Molecular machine learning has gained popularity with the advancements of geometric deep learning. In parallel, retrieval-augmented generation has become a principled approach commonly used with language models. However, the optimal integration of retrieval augmentation into molecular machine learning remains unclear. Graph neural networks stand to benefit from clever matching to understand the structural alignment of retrieved molecules to a query molecule. Neural graph matching offers a compelling solution by explicitly modeling node and edge affinities between two structural graphs while employing a noise-robust, end-to-end neural network to learn affinity metrics. We apply this approach to mass spectrum simulation and introduce MARASON, a novel model that incorporates neural graph matching to enhance a fragmentation-based neural network. Experimental results highlight the effectiveness of our design, with MARASON achieving 27% top-1 accuracy, a substantial improvement over the non-retrieval state-of-the-art accuracy of 19%. Moreover, MARASON outperforms both naive retrieval-augmented generation methods and traditional graph matching approaches. Code is publicly available at https://github.com/coleygroup/ms-pred.

Poster

#W-119

Steering Protein Language Models

Long-Kai Huang · Rongyi Zhu · Bing He · Jianhua Yao

Protein Language Models (PLMs), pre-trained on extensive evolutionary data from natural proteins, have emerged as indispensable tools for protein design. While powerful, PLMs often struggle to produce proteins with precisely specified functionalities or properties due to inherent challenges in controlling their outputs. In this work, we investigate the potential of Activation Steering, a technique originally developed for controlling text generation in Large Language Models (LLMs), to direct PLMs toward generating protein sequences with targeted properties. We propose a simple yet effective method that employs activation editing to steer PLM outputs, and extend this approach to protein optimization through a novel editing site identification module. Through comprehensive experiments on lysozyme-like sequence generation and optimization, we demonstrate that our methods can be seamlessly integrated into both auto-encoding and autoregressive PLMs without requiring additional training. These results highlight a promising direction for precise protein engineering using foundation models.

Poster

#W-120

On the Query Complexity of Verifier-Assisted Language Generation

Edoardo Botta · Yuchen Li · Aashay Mehta · Jordan Ash · Cyril Zhang · Andrej Risteski

Recently, a plethora of works have proposedinference-time algorithms (e.g. best-of-n), whichincorporate verifiers to assist the generation process. Their quality-efficiency trade-offs have beenempirically benchmarked on a variety of constrained generation tasks, but the algorithmic design landscape is still largely poorly understood.In this paper, we develop a mathematical framework for reasoning about constrained generationusing a pre-trained language model generator oracle and a process verifier—which can decidewhether a prefix can be extended to a string whichsatisfies the constraints of choice. We show thateven in very simple settings, access to a verifiercan render an intractable problem (information-theoretically or computationally) to a tractableone. In fact, we show even simple algorithms,like tokenwise rejection sampling, can enjoy significant benefits from access to a verifier. Empirically, we show that a natural modification oftokenwise rejection sampling, in which the sampler is allowed to "backtrack" (i.e., erase the finalfew generated tokens) has robust and substantivebenefits over natural baselines (e.g. (blockwise)rejection sampling, nucleus sampling)—both interms of computational efficiency, accuracy anddiversity.

Poster

#W-121

Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation

Mohit Pandey · Gopeshh Subbaraj · Artem Cherkasov · Martin Ester · Emmanuel Bengio

Generative Flow Networks (GFlowNets) have recently emerged as a suitable framework for generating diverse and high-quality molecular structures by learning from rewards treated as unnormalized distributions. Previous works in this framework often restrict exploration by using predefined molecular fragments as building blocks, limiting the chemical space that can be accessed. In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively. We propose an unsupervised pre-training approach using drug-like molecule datasets, which teaches A-GFNs about inexpensive yet informative molecular descriptors such as drug-likeliness, topological polar surface area, and synthetic accessibility scores. These properties serve as proxy rewards, guiding A-GFNs towards regions of chemical space that exhibit desirable pharmacological properties. We further implement a goal-conditioned finetuning process, which adapts A-GFNs to optimize for specific target properties. In this work, we pretrain A-GFN on a subset of ZINC dataset, and by employing robust evaluation metrics we show the effectiveness of our approach when compared to other relevant baseline methods for a wide range of drug design tasks. The code is accessible at https://github.com/diamondspark/AGFN.

Poster

#W-122

Exploring Vision Semantic Prompt for Efficient Point Cloud Understanding

Yixin Zha · Chuxin Wang · Wenfei Yang · Tianzhu Zhang · Feng Wu

A series of pre-trained models have demonstrated promising results in point cloud understanding tasks and are widely applied to downstream tasks through fine-tuning. However, full fine-tuning leads to the forgetting of pretrained knowledge and substantial storage costs on edge devices. To address these issues, Parameter-Efficient Transfer Learning (PETL) methods have been proposed. According to our analysis, we find that existing 3D PETL methods cannot adequately align with semantic relationships of features required by downstream tasks, resulting in suboptimal performance. To ensure parameter efficiency while introducing rich semantic cues, we propose a novel fine-tuning paradigm for 3D pre-trained models. We utilize frozen 2D pre-trained models to provide vision semantic prompts and design a new Hybrid Attention Adapter to efficiently fuse 2D semantic cues into 3D representations with minimal trainable parameters(1.8M). Extensive experiments conducted on datasets including ScanObjectNN, ModelNet40, and ShapeNetPart demonstrate the effectiveness of our proposed paradigm. In particular, our method achieves 95.6% accuracy on ModelNet40 and attains 90.09% performance on the most challenging classification split ScanObjectNN(PB-T50-RS).

Poster

#W-123

SITCOM: Step-wise Triple-Consistent Diffusion Sampling For Inverse Problems

Ismail Alkhouri · Shijun Liang · Cheng-Han Huang · Jimmy Dai · Qing Qu · Saiprasad Ravishankar · Rongrong Wang

Diffusion models (DMs) are a class of generative models that allow sampling from a distribution learned over a training set. When applied to solving inverse problems, the reverse sampling steps are modified to approximately sample from a measurement-conditioned distribution. However, these modifications may be unsuitable for certain settings (e.g., presence of measurement noise) and non-linear tasks, as they often struggle to correct errors from earlier steps and generally require a large number of optimization and/or sampling steps. To address these challenges, we state three conditions for achieving measurement-consistent diffusion trajectories. Building on these conditions, we propose a new optimization-based sampling method that not only enforces standard data manifold measurement consistency and forward diffusion consistency, as seen in previous studies, but also incorporates our proposed step-wise and network-regularized backward diffusion consistency that maintains a diffusion trajectory by optimizing over the input of the pre-trained model at every sampling step. By enforcing these conditions (implicitly or explicitly), our sampler requires significantly fewer reverse steps. Therefore, we refer to our method as Step-wise Triple-Consistent Sampling (SITCOM). Compared to SOTA baselines, our experiments across several linear and non-linear tasks (with natural and medical images) demonstrate that SITCOM achieves competitive or superior results in terms of standard similarity metrics and run-time.

Poster

#W-124

Control and Realism: Best of Both Worlds in Layout-to-Image without Training

Bonan Li · Yinhan Hu · Songhua Liu · Xinchao Wang

Layout-to-Image generation aims to create complex scenes with precise control over the placement and arrangement of subjects. Existing works have demonstrated that pre-trained Text-to-Image diffusion models can achieve this goal without training on any specific data; however, they often face challenges with imprecise localization and unrealistic artifacts. Focusing on these drawbacks, we propose a novel training-free method, WinWinLay. At its core, WinWinLay presents two key strategies—Non-local Attention Energy Function and Adaptive Update—that collaboratively enhance control precision and realism. On one hand, we theoretically demonstrate that the commonly used attention energy function introduces inherent spatial distribution biases, hindering objects from being uniformly aligned with layout instructions. To overcome this issue, non-local attention prior is explored to redistribute attention scores, facilitating objects to better conform to the specified spatial conditions. On the other hand, we identify that the vanilla backpropagation update rule can cause deviations from the pre-trained domain, leading to out-of-distribution artifacts. We accordingly introduce a Langevin dynamics-based adaptive update scheme as a remedy that promotes in-domain updating while respecting layout constraints. Extensive experiments demonstrate that WinWinLay excels in controlling element placement and achieving photorealistic visual fidelity, outperforming the current state-of-the-art methods.

Poster

#W-200

Learning Adaptive Lighting via Channel-Aware Guidance

Qirui Yang · Peng-Tao Jiang · Hao Zhang · Jinwei Chen · Bo Li · Huanjing Yue · Jingyu Yang

Learning lighting adaptation is a crucial step in achieving good visual perception and supporting downstream vision tasks. Current research often addresses individual light-related challenges, such as high dynamic range imaging and exposure correction, in isolation. However, we identify shared fundamental properties across these tasks:i) different color channels have different light properties, and ii) the channel differences reflected in the spatial and frequency domains are different. Leveraging these insights, we introduce the channel-aware Learning Adaptive Lighting Network (LALNet), a multi-task framework designed to handle multiple light-related tasks efficiently. Specifically, LALNet incorporates color-separated features that highlight the unique light properties of each color channel, integrated with traditional color-mixed features by Light Guided Attention (LGA). The LGA utilizes color-separated features to guide color-mixed features focusing on channel differences and ensuring visual consistency across all channels. Additionally, LALNet employs dual domain channel modulation for generating color-separated features and a mixed channel modulation and light state space module for producing color-mixed features. Extensive experiments on four representative light-related tasks demonstrate that LALNet significantly outperforms state-of-the-art methods on benchmark tests and requires fewer computational resources. We provide an anonymous online demo at LALNet.

Poster

#W-201

Fine-Grained Captioning of Long Videos through Scene Graph Consolidation

Sanghyeok Chu · Seonguk Seo · Bohyung Han

Recent advances in vision-language models have led to impressive progress in caption generation for images and short video clips. However, these models remain constrained by their limited temporal receptive fields, making it difficult to producecoherent and comprehensive captions for long videos. While several methods have been proposed to aggregate information across video segments, they often rely on supervised fine-tuning or incur significant computational overhead. To address these challenges, we introduce a novel framework for long video captioning based on graph consolidation. Our approach first generates segment-level captions, corresponding to individual frames or short video intervals, using off-the-shelf visual captioning models. These captions are then parsed into individual scene graphs, which are subsequently consolidated into a unified graph representation that preserves both holistic context and fine-grained details throughout the video. A lightweight graph-to-text decoder then produces the final video-level caption. This framework effectively extends the temporal understanding capabilities of existing models without requiring any additional fine-tuning on long video datasets. Experimental results show that our method significantly outperforms existing LLM-based consolidation approaches, achieving strong zero-shot performance while substantially reducing computational costs.

Poster

#W-202

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Xingyu Fu · Minqian Liu · Zhengyuan Yang · John Corring · Yijuan Lu · Jianwei Yang · Dan Roth · Dinei Florencio · Cha Zhang

Structured image understanding, such as interpreting tables and charts, requires strategically refocusing across various structures and texts within an image, forming a reasoning sequence to arrive at the final answer. However, current multimodal large language models (LLMs) lack this multihop selective attention capability. In this work, we introduce ReFocus, a simple yet effective framework that equips multimodal LLMs with the ability to generate ``visual thoughts'' by performing visual editing on the input image through code, shifting and refining their visual focuses. Specifically, ReFocus enables multimodal LLMs to generate Python codes to call tools and modify the input image, sequentially drawing boxes, highlighting sections, and masking out areas, thereby enhancing the visual reasoning process. We experiment upon a wide range of structured image understanding tasks involving tables and charts. ReFocus largely improves performance on all tasks over GPT-4o without visual editing, yielding an average gain of 11.0% on table tasks and 6.8% on chart tasks. We present an in-depth analysis of the effects of different visual edits, and reasons why ReFocus can improve the performance without introducing additional information. Further, we collect a 14k training set using ReFocus, and prove that such visual chain-of-thought with intermediate information offers a better supervision than standard VQA data, reaching a 8.0% average gain over the same model trained with QA pairs and 2.6% over CoT.

Poster

#W-203

TUMTraf VideoQA: Dataset and Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes

Xingcheng Zhou · Konstantinos Larintzakis · Hao Guo · Walter Zimmer · Mingyu Liu · Hu Cao · Jiajie Zhang · Venkatnarayanan Lakshminarasimhan · Leah Strand · Alois Knoll

We present TUMTraf VideoQA, a novel dataset and benchmark designed for spatio-temporal video understanding in complex roadside traffic scenarios. The dataset comprises 1,000 videos, featuring 85,000 multiple-choice QA pairs, 2,300 object captioning, and 5,700 object grounding annotations, encompassing diverse real-world conditions such as adverse weather and traffic anomalies. By incorporating tuple-based spatio-temporal object expressions, TUMTraf VideoQA unifies three essential tasks—multiple-choice video question answering, referred object captioning, and spatio-temporal object grounding—within a cohesive evaluation framework. We further introduce the TraffiX-Qwen baseline model, enhanced with visual token sampling strategies, providing valuable insights into the challenges of fine-grained spatio-temporal reasoning. Extensive experiments demonstrate the dataset’s complexity, highlight the limitations of existing models, and position TUMTraf VideoQA as a robust foundation for advancing research in intelligent transportation systems. The dataset and benchmark are publicly available to facilitate further exploration.

Poster

#W-204

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

Kaifeng Gao · Jiaxin Shi · Hanwang Zhang · Chunping Wang · Jun Xiao · Long Chen

With the advance of diffusion models, today's video generation has achieved impressive quality. To extend the generation length and facilitate real-world applications, a majority of video diffusion models (VDMs) generate videos in an autoregressive manner, i.e., generating subsequent clips conditioned on the last frame(s) of the previous clip. However, existing autoregressive VDMs are highly inefficient and redundant: The model must re-compute all the conditional frames that are overlapped between adjacent clips. This issue is exacerbated when the conditional frames are extended autoregressively to provide the model with long-term context. In such cases, the computational demands increase significantly (i.e., with a quadratic complexity w.r.t. the autoregression step). In this paper, we propose Ca2-VDM, an efficient autoregressive VDM with Causal generation and Cache sharing. For causal generation, it introduces unidirectional feature computation, which ensures that the cache of conditional frames can be precomputed in previous autoregression steps and reused in every subsequent step, eliminating redundant computations. For cache sharing, it shares the cache across all denoising steps to avoid the huge cache storage cost. Extensive experiments demonstrated that our Ca2-VDM achieves state-of-the-art quantitative and qualitative video generation results and significantly improves the generation speed. Code is available: https://github.com/Dawn-LX/CausalCache-VDM

Poster

#W-205

BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training

Chenyi yang · Wenjie Nie · Yuxin Zhang · Yuhang Wu · Xiawu Zheng · GUANNAN JIANG · Rongrong Ji

N:M sparsity stands as a progressively important tool for DNN compression, achieving practical speedups by stipulating at most N non-zero components within M sequential weights. Unfortunately, most existing works identify the N:M sparse mask through dense backward propagation to update all weights, which incurs exorbitant training costs. In this paper, we introduce BAME, a method that maintains consistent sparsity throughout the N:M sparse training process. BAME perpetually keeps both sparse forward and backward propagation, while iteratively performing weight pruning-and-regrowing within designated weight blocks to tailor the N:M mask. These blocks are selected through a joint assessment based on accumulated mask oscillation frequency and expected loss reduction of mask adaptation, thereby ensuring stable and efficient identification of the optimal N:M mask. Our empirical results substantiate the effectiveness of BAME, illustrating it performs comparably to or better than previous works that fully maintaining dense backward propagation during training. For instance, BAME attains a 72.0% top-1 accuracy while training a 1:16 sparse ResNet-50 on ImageNet, eclipsing SR-STE by 0.5%, despite achieving 2.37 training FLOPs reduction. Code is released at \url{https://github.com/BAME-xmu/BAME}

Poster

#W-206

DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization

Zhenglin Zhou · Xiaobo Xia · Fan Ma · Hehe Fan · Yi Yang · Tat-Seng Chua

Text-to-3D generation automates 3D content creation from textual descriptions, which offers transformative potential across various fields. However, existing methods often struggle to align generated content with human preferences, limiting their applicability and flexibility. To address these limitations, in this paper, we propose DreamDPO, an optimization-based framework that integrates human preferences into the 3D generation process, through direct preference optimization. Practically, DreamDPO first constructs pairwise examples, then validates their alignment with human preferences using reward or large multimodal models, and lastly optimizes the 3D representation with a preference-driven loss function. By leveraging relative preferences, DreamDPO reduces reliance on precise quality evaluations while enabling fine-grained controllability through preference-guided optimization. Experiments demonstrate that DreamDPO achieves state-of-the-art results, and provides higher-quality and more controllable 3D content compared to existing methods. The code and models will be open-sourced.

Poster

#W-207

Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning

Xue Zhao · Qinying Gu · Xinbing Wang · Chenghu Zhou · Nanyang Ye

Improving the generalization of multi-camera 3D object detection is essential for safe autonomous driving in the real world. In this paper, we consider a realistic yet more challenging scenario, which aims to improve the generalization when only single source data available for training, as gathering diverse domains of data and collecting annotations is time-consuming and labor-intensive. To this end, we propose the Fourier Cross-View Learning (FCVL) framework including Fourier Hierarchical Augmentation (FHiAug), an augmentation strategy in the frequency domain to boost domain diversity, and Fourier Cross-View Semantic Consistency Loss to facilitate the model to learn more domain-invariant features from adjacent perspectives. Furthermore, we provide theoretical guarantees via augmentation graph theory. To the best of our knowledge, this is the first study to explore generalizable multi-camera 3D object detection with a single source. Extensive experiments on various testing domains have demonstrated that our approach achieves the best performance across various domain generalization methods.

Poster

#W-208

Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing

Zhuoying Li · Zhu Xu · Yuxin Peng · Yang Liu

Instruction-based image editing, which aims to modify the image faithfully towards instruction while preserving irrelevant content unchanged, has made advanced progresses. However, there still lacks a comprehensive metric for assessing the editing quality. Existing metrics either require high costs concerning human evaluation, which hinders large-scale evaluation, or adapt from other tasks and lose specified concerns, failing to comprehensively evaluate the modification of instruction and the preservation of irrelevant regions, resulting in biased evaluation. To tackle it, we introduce a new metric Balancing Preservation Modification (BPM), that tailored for instruction-based image editing by explicitly disentangling the image into editing-relevant and irrelevant regions for specific consideration. We first identify and locate editing-relevant regions, followed by a two-tier process to assess editing quality: Region-Aware Judge evaluates whether the position and size of the edited region align with instruction, and Semantic-Aware Judge further assesses the instruction content compliance within editing-relevant regions as well as content preservation within irrelevant regions, yielding comprehensive and interpretable quality assessment. Moreover, the editing-relevant region localization in BPM can be integrated into image editing approaches to improve the editing quality, manifesting its wild application. We verify the effectiveness of BPM metric on comprehensive instruction-editing data, and the re- sults show that we yield the highest alignment with human evaluation compared to existing metrics, indicating efficacy. The code is available at https://joyli-x.github.io/BPM/.

Poster

#W-209

HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration

Yushi Huang · Zining Wang · Ruihao Gong · Jing Liu · Xinjie Zhang · Jinyang Guo · Xianglong Liu · Jun Zhang

Diffusion Transformers (DiTs) excel in generative tasks but face practical deployment challenges due to high inference costs. Feature caching, which stores and retrieves redundant computations, offers the potential for acceleration. Existing learning-based caching, though adaptive, overlooks the impact of the prior timestep. It also suffers from misaligned objectives-*aligned predicted noise vs. high-quality images*-between training and inference. These two discrepancies compromise both performance and efficiency.To this end, we *harmonize* training and inference with a novel learning-based *caching* framework dubbed **HarmoniCa**. It first incorporates *Step-Wise Denoising Training* (SDT) to ensure the continuity of the denoising process, where prior steps can be leveraged. In addition, an *Image Error Proxy-Guided Objective* (IEPO) is applied to balance image quality against cache utilization through an efficient proxy to approximate the image error. Extensive experiments across $8$ models, $4$ samplers, and resolutions from $256\times256$ to $2K$ demonstrate superior performance and speedup of our framework. For instance, it achieves over $40\\%$ latency reduction (*i.e.*, $2.07\times$ theoretical speedup) and improved performance on PixArt-$\alpha$. Remarkably, our *image-free* approach reduces training time by $25\\%$ compared with the previous method. Our code is available at https://github.com/ModelTC/HarmoniCa.

Poster

#W-210

Exploring Invariance in Images through One-way Wave Equations

Yinpeng Chen · Dongdong Chen · Xiyang Dai · Mengchen Liu · Yinan Feng · Youzuo Lin · Lu Yuan · Zicheng Liu

In this paper, we empirically demonstrate that natural images can be reconstructed with high fidelity from compressed representations using a simple first-order norm-plus-linear autoregressive (FINOLA) process—without relying on explicit positional information. Through systematic analysis, we observe that the learned coefficient matrices ($\mathbf{A}$ and $\mathbf{B}$) in FINOLA are typically invertible, and their product, $\mathbf{AB}^{-1}$, is diagonalizable across training runs. This structure enables a striking interpretation: FINOLA’s latent dynamics resemble a system of one-way wave equations evolving in a compressed latent space. Under this framework, each image corresponds to a unique solution of these equations. This offers a new perspective on image invariance, suggesting that the underlying structure of images may be governed by simple, invariant dynamic laws. Our findings shed light on a novel avenue for understanding and modeling visual data through the lens of latent-space dynamics and wave propagation.

Poster

#W-211

LRA-QViT: Integrating Low-Rank Approximation and Quantization for Robust and Efficient Vision Transformers

Beom Jin Kang · NamJoon Kim · Hyun Kim

Recently, transformer-based models have demonstrated state-of-the-art performance across various computer vision tasks, including image classification, detection, and segmentation. However, their substantial parameter count poses significant challenges for deployment in resource-constrained environments such as edge or mobile devices. Low-rank approximation (LRA) has emerged as a promising model compression technique, effectively reducing the number of parameters in transformer models by decomposing high-dimensional weight matrices into low-rank representations. Nevertheless, matrix decomposition inherently introduces information loss, often leading to a decline in model accuracy. Furthermore, existing studies on LRA largely overlook the quantization process, which is a critical step in deploying practical vision transformer (ViT) models. To address these challenges, we propose a robust LRA framework that preserves weight information after matrix decomposition and incorporates quantization tailored to LRA characteristics. First, we introduce a reparameterizable branch-based low-rank approximation (RB-LRA) method coupled with weight reconstruction to minimize information loss during matrix decomposition. Subsequently, we enhance model accuracy by integrating RB-LRA with knowledge distillation techniques. Lastly, we present an LRA-aware quantization method designed to mitigate the large outliers generated by LRA, thereby improving the robustness of the quantized model. To validate the effectiveness of our approach, we conducted extensive experiments on the ImageNet dataset using various ViT-based models. Notably, the Swin-B model with RB-LRA achieved a 31.8\% reduction in parameters and a 30.4\% reduction in GFLOPs, with only a 0.03\% drop in accuracy. Furthermore, incorporating the proposed LRA-aware quantization method reduced accuracy loss by an additional 0.83\% compared to naive quantization.

Poster

#W-212

Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences

Yunhong Lu · Qichao Wang · Hengyuan Cao · Xiaoyin Xu · Min Zhang

Direct Preference Optimization (DPO) aligns text-to-image (T2I) generation models with human preferences using pairwise preference data. Although substantial resources are expended in collecting and labeling datasets, a critical aspect is often neglected: preferences vary across individuals and should be represented with more granularity. To address this, we propose SmPO-Diffusion, a novel method for modeling preference distributions to improve the DPO objective, along with a numerical upper bound estimation for the diffusion optimization objective. First, we introduce a smoothed preference distribution to replace the original binary distribution. We employ a reward model to simulate human preferences and apply preference likelihood averaging to improve the DPO loss, such that the loss function approaches zero when preferences are similar. Furthermore, we utilize an inversion technique to simulate the trajectory preference distribution of the diffusion model, enabling more accurate alignment with the optimization objective. Our approach effectively mitigates issues of excessive optimization and objective misalignment present in existing methods through straightforward modifications. Experimental results demonstrate that our method achieves state-of-the-art performance in preference evaluation tasks, surpassing baselines across various metrics, while reducing the training costs.

Spotlight Poster

#W-213

Flopping for FLOPs: Leveraging Equivariance for Computational Efficiency

Georg Bökman · David Nordström · Fredrik Kahl

Incorporating geometric invariance into neural networks enhances parameter efficiency but typically increases computational costs.This paper introduces new equivariant neural networksthat preserve symmetry while maintaining a comparable number of floating-point operations (FLOPs) per parameter to standard non-equivariant networks. We focus on horizontal mirroring (flopping) invariance, common in many computer vision tasks.The main idea is to parametrize the feature spaces in terms of mirror-symmetric and mirror-antisymmetric features, i.e., irreps of the flopping group.This decomposes the linear layers to be block-diagonal, requiring half the number of FLOPs.Our approach reduces both FLOPs and wall-clock time,providing a practical solution for efficient, scalable symmetry-aware architectures.

Poster

#W-214

More Than Meets the Eye: Enhancing Multi-Object Tracking Even with Prolonged Occlusions

Bishoy Galoaa · Somaieh Amraee · Sarah Ostadabbas

This paper introduces MOTE (MOre Than meets the Eye), a novel multi-object tracking (MOT) algorithm designed to address the challenges of tracking occluded objects. By integrating deformable detection transformers with a custom disocclusion matrix, MOTE significantly enhances the ability to track objects even when they are temporarily hidden from view. The algorithm leverages optical flow to generate features that are processed through a softmax splatting layer, which aids in the creation of a disocclusion matrix. This matrix plays a crucial role in maintaining track consistency by estimating the motion of occluded objects. MOTE's architecture includes modifications to the enhanced track embedding module (ETEM), which allows it to incorporate these advanced features into the track query layer embeddings. This integration ensures that the model not only tracks visible objects but also accurately predicts the trajectories of occluded ones, much like the human visual system. The proposed method is evaluated on multiple datasets, including MOT17, MOT20, and DanceTrack, where it achieves impressive tracking metrics--82.0 MOTA and 66.3 HOTA on the MOT17 dataset, 81.7 MOTA and 65.8 HOTA on the MOT20 dataset, and 93.2 MOTA and 74.2 HOTA on the DanceTrack dataset. Notably, MOTE excels in reducing identity switches and maintaining consistent tracking in complex real-world scenarios with frequent occlusions, outperforming existing state-of-the-art methods across all tested benchmarks.

Poster

#W-216

GenZSL: Generative Zero-Shot Learning Via Inductive Variational Autoencoder

Shiming Chen · Dingjie Fu · Salman Khan · Fahad Khan

Remarkable progress in zero-shot learning (ZSL) has been achieved using generative models. However, existing generative ZSL methods merely generate (imagine) the visual features from scratch guided by the strong class semantic vectors annotated by experts, resulting in suboptimal generative performance and limited scene generalization. To address these and advance ZSL, we propose an inductive variational autoencoder for generative zero-shot learning, dubbed GenZSL. Mimicking human-level concept learning, GenZSL operates by inducting new class samples from similar seen classes using weak class semantic vectors derived from target class names (i.e., CLIP text embedding). To ensure the generation of informative samples for training an effective ZSL classifier, our GenZSL incorporates two key strategies. Firstly, it employs class diversity promotion to enhance the diversity of class semantic vectors. Secondly, it utilizes target class-guided information boosting criteria to optimize the model. Extensive experiments conducted on three popular benchmark datasets showcase the superiority and potential of our GenZSL with significant efficacy and efficiency over f-VAEGAN, e.g., 24.7\% performance gains and more than $60\times$ faster training speed on AWA2. Codes are available at \url{https://github.com/shiming-chen/GenZSL}.

Poster

#W-217

Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation

Zhihua Liu · Amrutha Saseendran · Lei Tong · Xilin He · Fariba Yousefi · Nikolay Burlutskiy · Dino Oglic · Tom Diethe · Philip Teare · Huiyu Zhou · Chen Jin

Open-set image segmentation poses a significant challenge because existing methods often demand extensive training or fine-tuning and generally struggle to segment unified objects consistently across diverse text reference expressions. Motivated by this, we propose Segment Anyword, a novel training-free visual concept prompt learning approach for open-set language grounded segmentation that relies on token-level cross-attention maps from a frozen diffusion model to produce segmentation surrogates or mask prompts, which are then refined into targeted object masks. Initial prompts typically lack coherence and consistency as the complexity of the image-text increases, resulting in suboptimal mask fragments. To tackle this issue, we further introduce a novel linguistic-guided visual prompt regularization that binds and clusters visual prompts based on sentence dependency and syntactic structural information, enabling the extraction of robust, noise-tolerant mask prompts, and significant improvements in segmentation accuracy. The proposed approach is effective, generalizes across different open-set segmentation tasks, and achieves state-of-the-art results of 52.5 (+6.8 relative) mIoU on Pascal Context 59, 67.73 (+25.73 relative) cIoU on gRefCOCO, and 67.4 (+1.1 relative to fine-tuned methods) mIoU on GranDf, which is the most complex open-set grounded segmentation task in the field.

Poster

#W-218

Balanced Learning for Domain Adaptive Semantic Segmentation

Wangkai Li · Rui Sun · Bohao Liao · Zhaoyang Li · Tianzhu Zhang

Unsupervised domain adaptation (UDA) for semantic segmentation aims to transfer knowledge from a labeled source domain to an unlabeled target domain.Despite the effectiveness of self-training techniques in UDA, they struggle to learn each class in a balanced manner due to inherent class imbalance and distribution shift in both data and label space between domains.To address this issue, we propose Balanced Learning for Domain Adaptation (BLDA), a novel approach to directly assess and alleviate class bias without requiring prior knowledge about the distribution shift.First, we identify over-predicted and under-predicted classes by analyzing the distribution of predicted logits.Subsequently, we introduce a post-hoc approach to align the logits distributions across different classes using shared anchor distributions.To further consider the network's need to generate unbiased pseudo-labels during self-training, we estimate logits distributions online and incorporate logits correction terms into the loss function.Moreover, we leverage the resulting cumulative density as domain-shared structural knowledge to connect the source and target domains.Extensive experiments on two standard UDA semantic segmentation benchmarks demonstrate that BLDA consistently improves performance, especially for under-predicted classes, when integrated into various existing methods.

Poster

#W-219

CSV-Occ: Fusing Multi-frame Alignment for Occupancy Prediction with Temporal Cross State Space Model and Central Voting Mechanism

Ziming Zhu · Yu Zhu · Jiahao Chen · Xiaofeng Ling · Huanlei Chen · Lihua Sun

Recently, image-based 3D semantic occupancy prediction has become a hot topic in 3D scene understanding for autonomous driving. Compared with the bounding box form of 3D object detection, the ability to describe the fine-grained contours of any obstacles in the scene is the key insight of voxel occupancy representation, which facilitates subsequent tasks of autonomous driving. In this work, we propose CSV-Occ to address the following two challenges: (1) Existing methods fuse temporal information based on the attention mechanism, but are limited by high complexity. We extend the state space model to support multi-input sequence interaction and conduct temporal modeling in a cascaded architecture, thereby reducing the computational complexity from quadratic to linear. (2) Existing methods are limited by semantic ambiguity, resulting in the centers of foreground objects often being predicted as empty voxels. We enable the model to explicitly vote for the instance center to which the voxels belong and spontaneously learn to utilize the other voxel features of the same instance to update the semantics of the internal vacancies of the objects from coarse to fine. Experiments on the Occ3D-nuScenes dataset show that our method achieves state-of-the-art in camera-based 3D semantic occupancy prediction and also performs well on lidar point cloud semantic segmentation on the nuScenes dataset. Therefore, we believe that CSV-Occ is beneficial to the community and industry of autonomous vehicles.

Poster

#W-220

Understanding Complexity in VideoQA via Visual Program Generation

Cristobal Eyzaguirre · Igor Vasiljevic · Achal Dave · Jiajun Wu · Rareș Ambruș · Thomas Kollar · Juan Carlos Niebles · Pavel Tokmakov

We propose a data-driven approach to analyzing query complexity in Video Question Answering (VideoQA). Previous efforts in benchmark design have relied on human expertise to design challenging questions, yet we experimentally show that humans struggle to predict which questions are difficult for machine learning models. Our automatic approach leverages recent advances in code generation for visual question answering, using the complexity of generated code as a proxy for question difficulty. We demonstrate that this measure correlates significantly better with model performance than human estimates. To operationalize this insight, we propose an algorithm for estimating question complexity from code. It identifies fine-grained primitives that correlate with the hardest questions for any given set of models, making it easy to scale to new approaches in the future. Finally, to further illustrate the utility of our method, we extend it to automatically generate complex questions, constructing a new benchmark that is 1.9 times harder than the popular NExT-QA.

Poster

#W-221

DVI:A Derivative-based Vision Network for INR

RUNZHAO YANG · Xiaolong Wu · Zhihong Zhang · Fabian Zhang · Tingxiong Xiao · Zongren Li · Kunlun He · Jinli Suo

Recent advancements in computer vision have seen Implicit Neural Representations (INR) becoming a dominant representation form for data due to their compactness and expressive power. To solve various vision tasks with INR data, vision networks can either be purely INR-based, but are thereby limited by simplistic operations and performance constraints, or include raster-based methods, which then tend to lose crucial structural information of the INR during the conversion process. To address these issues, we propose DVI, a novel Derivative-based Vision network for INR, capable of handling a variety of vision tasks across various data modalities, while achieving the best performance among the existing methods by incorporating state of the art raster-based methods into a INR based architecture. DVI excels by extracting semantic information from the high order derivative map of the INR, then seamlessly fusing it into a pre-existing raster-based vision network, enhancing its performance with deeper, task-relevant semantic insights. Extensive experiments on five vision tasks across three data modalities demonstrate DVI's superiority over existing methods. Additionally, our study encompasses comprehensive ablation studies to affirm the efficacy of each element of DVI, the influence of different derivative computation techniques and the impact of derivative orders. Reproducible codes are provided in the supplementary materials.

Poster

#W-300

RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning

Yuanhuiyi Lyu · Xu Zheng · Lutao Jiang · Yibo Yan · Xin Zou · Huiyu Zhou · Linfeng Zhang · Xuming Hu

Recent text-to-image generative models, e.g., Stable Diffusion V3 and Flux, have achieved notable progress. However, these models are strongly restricted to their limited knowledge, a.k.a., their own fixed parameters, that are trained with closed datasets. This leads to significant hallucinations or distortions when facing fine-grained and unseen novel real-world objects, e.g., the appearance of the Tesla Cybertruck. To this end, we present the first real-object-based retrieval-augmented generation framework (RealRAG), which augments fine-grained and unseen novel object generation by learning and retrieving real-world images to overcome the knowledge gaps of generative models. Specifically, to integrate missing memory for unseen novel object generation, we train a reflective retriever by self-reflective contrastive learning, which injects the generator's knowledge into the sef-reflective negatives, ensuring that the retrieved augmented images compensate for the model's missing knowledge. Furthermore, the real-object-based framework integrates fine-grained visual knowledge for the generative models, tackling the distortion problem and improving the realism for fine-grained object generation. Our Real-RAG is superior in its modular application to all types of state-of-the-art text-to-image generative models and also delivers remarkable performance boosts with all of them, such as a gain of 16.18\% FID score with the auto-regressive model on the Stanford Car benchmark.

Poster

#W-301

PiD: Generalized AI-Generated Images Detection with Pixelwise Decomposition Residuals

Xinghe Fu · Zhiyuan Yan · Zheng Yang · Taiping Yao · Yandan Zhao · Shouhong Ding · Xi Li

Fake images, created by recently advanced generative models, have become increasingly indistinguishable from real ones, making their detection crucial, urgent, and challenging. This paper introduces PiD (Pixelwise Decomposition Residuals), a novel detection method that focuses on residual signals within images. Generative models are designed to optimize high-level semantic content (principal components), often overlooking low-level signals (residual components). PiD leverages this observation by disentangling residual components from images, encouraging the model to uncover more underlying and general forgery clues independent of semantic content. Compared to prior approaches that rely on reconstruction techniques or high-frequency information, PiD is computationally efficient and does not rely on any generative models for reconstruction. Specifically, PiD operates at the pixel level, mapping the pixel vector to another color space (e.g., YUV) and then quantizing the vector. The pixel vector is mapped back to the RGB space and the quantization loss is taken as the residual for AIGC detection. Our experiment results are striking and highly surprising: PiD achieves 98% accuracy on the widely used GenImage benchmark, highlighting the effectiveness and generalization performance.

Poster

#W-302

Perceptual-GS: Scene-adaptive Perceptual Densification for Gaussian Splatting

Hongbi ZHOU · Zhangkai NI

3D Gaussian Splatting (3DGS) has emerged as a powerful technique for novel view synthesis. However, existing methods struggle to adaptively optimize the distribution of Gaussian primitives based on scene characteristics, making it challenging to balance reconstruction quality and efficiency. Inspired by human perception, we propose scene-adaptive perceptual densification for Gaussian Splatting (Perceptual-GS), a novel framework that integrates perceptual sensitivity into the 3DGS training process to address this challenge. We first introduce a perception-aware representation that models human visual sensitivity while constraining the number of Gaussian primitives. Building on this foundation, we develop a perceptual sensitivity-adaptive distribution to allocate finer Gaussian granularity to visually critical regions, enhancing reconstruction quality and robustness. Extensive evaluations on multiple datasets, including BungeeNeRF for large-scale scenes, demonstrate that Perceptual-GS achieves state-of-the-art performance in reconstruction quality, efficiency, and robustness. The code is publicly available at: https://github.com/eezkni/Perceptual-GS

Poster

#W-303

WorldSimBench: Towards Video Generation Models as World Simulators

Yiran Qin · Zhelun Shi · Jiwen Yu · Xijun Wang · Enshen Zhou · Lijun Li · Zhenfei Yin · Xihui Liu · Lu Sheng · Jing Shao · LEI BAI · Ruimao Zhang

Recent advancements in predictive models have demonstrated exceptional capabilities in predicting the future state of objects and scenes. However, the lack of categorization based on inherent characteristics continues to hinder the progress of predictive model development. Additionally, existing benchmarks are unable to effectively evaluate higher-capability, highly embodied predictive models from an embodied perspective. In this work, we classify the functionalities of predictive models into a hierarchy and take the first step in evaluating World Simulators by proposing a dual evaluation framework called WorldSimBench. WorldSimBench includes Explicit Perceptual Evaluation and Implicit Manipulative Evaluation, encompassing human preference assessments from the visual perspective and action-level evaluations in embodied tasks, covering three representative embodied scenarios: Open-Ended Embodied Environment, Autonomous, Driving, and Robot Manipulation. In the Explicit Perceptual Evaluation, we introduce the HF-Embodied Dataset, a video assessment dataset based on fine-grained human feedback, which we use to train a Human Preference Evaluator that aligns with human perception and explicitly assesses the visual fidelity of World Simulater. In the Implicit Manipulative Evaluation, we assess the video-action consistency of World Simulators by evaluating whether the generated situation-aware video can be accurately translated into the correct control signals in dynamic environments. Our comprehensive evaluation offers key insights that can drive further innovation in video generation models, positioning World Simulators as a pivotal advancement toward embodied artificial intelligence.

Poster

#W-304

FeatSharp: Your Vision Model Features, Sharper

Mike Ranzinger · Greg Heinrich · Pavlo Molchanov · Bryan Catanzaro · Andrew Tao

The feature maps of vision encoders are fundamental to myriad modern AI tasks, ranging from core perception algorithms (e.g. semantic segmentation, object detection, depth perception, etc.) to modern multimodal understanding in vision-language models (VLMs). Currently, in computer vision, the frontier of general purpose vision backbones is Vision Transformers (ViT), typically trained using contrastive loss (e.g. CLIP). A key problem with most off-the-shelf ViTs, particularly CLIP, is that these models are inflexibly low resolution. Most run at $224 \times 224$px, while the "high-resolution" versions are around $378-448$px, but still inflexible. We introduce a novel method to coherently and cheaply upsample the feature maps of low-resolution vision encoders while picking up on fine-grained details that would otherwise be lost due to resolution. We demonstrate the effectiveness of this approach on core perception tasks as well as within agglomerative model training using RADIO as a way of providing richer targets for distillation. Code available at https://github.com/NVlabs/FeatSharp

Poster

#W-305

GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing

Akashah Shabbir · Ilmuz Zaman Mohammed Zumri · Mohammed Bennamoun · Fahad Khan · Salman Khan

Recent advances in large multimodal models (LMMs) have recognized fine-grained grounding as an imperative factor of visual understanding and dialogue. However, the benefits of such representation in LMMs are limited to the natural image domain, and these models perform poorly for remote sensing (RS). The distinct overhead viewpoint, scale variation, and presence of small objects in high-resolution RS imagery present a unique challenge in region-level comprehension. Moreover, the development of the grounding conversation capability of LMMs within RS is hindered by the lack of granular, RS domain-specific grounded data. Addressing these limitations, we propose GeoPixel - the first end-to-end high-resolution RS-LMM that supports pixel-level grounding. This capability allows fine-grained visual perception by generating interleaved masks in conversation. GeoPixel supports up to 4K HD resolution in any aspect ratio, ideal for high-precision RS image analysis. To support the grounded conversation generation (GCG) in RS imagery, we curate a visually grounded dataset GeoPixelD through a semi-automated pipeline that utilizes set-of-marks prompting and spatial priors tailored for RS data to methodically control the data generation process. GeoPixel demonstrates superior performance in pixel-level comprehension, surpassing existing LMMs in both single-target and multi-target segmentation tasks. Our methodological ablation studies validate the effectiveness of each component in the overall architecture. Our code and data will be publicly released.

Poster

#W-306

EFDTR: Learnable Elliptical Fourier Descriptor Transformer for Instance Segmentation

Jiawei Cao · Chaochen Gu · Hao Cheng · Xiaofeng Zhang · Kaijie Wu · Changsheng Lu

Polygon-based object representations efficiently model object boundaries but are limited by high optimization complexity, which hinders their adoption compared to more flexible pixel-based methods. In this paper, we introduce a novel vertex regression loss grounded in Fourier elliptic descriptors, which removes the need for rasterization or heuristic approximations and resolves ambiguities in boundary point assignment through frequency-domain matching.To advance polygon-based instance segmentation, we further propose EFDTR (\textbf{E}lliptical \textbf{F}ourier \textbf{D}escriptor \textbf{Tr}ansformer), an end-to-end learnable framework that leverages the expressiveness of Fourier-based representations. The model achieves precise contour predictions through a two-stage approach: the first stage predicts elliptical Fourier descriptors for global contour modeling, while the second stage refines contours for fine-grained accuracy. Experimental results on the COCO dataset show that EFDTR outperforms existing polygon-based methods, offering a promising alternative to pixel-based approaches. Code is available at \url{https://github.com/chrisclear3/EFDTR}.

Spotlight Poster

#W-307

TimeBase: The Power of Minimalism in Efficient Long-term Time Series Forecasting

Qihe Huang · Zhengyang Zhou · Kuo Yang · Zhongchao Yi · Xu Wang · Yang Wang

Long-term time series forecasting (LTSF) has traditionally relied on large parameters to capture extended temporal dependencies, resulting in substantial computational costs and inefficiencies in both memory usage and processing time. However, time series data, unlike high-dimensional images or text, often exhibit temporal pattern similarity and low-rank structures, especially in long-term horizons. By leveraging this structure, models can be guided to focus on more essential, concise temporal data, improving both accuracy and computational efficiency. In this paper, we introduce TimeBase, an ultra-lightweight network to harness the power of minimalism in LTSF. TimeBase 1) extracts core basis temporal components and 2) transforms traditional point-level forecasting into efficient segment-level forecasting, achieving optimal utilization of both data and parameters. Extensive experiments on diverse real-world datasets show that TimeBase achieves remarkable efficiency and secures competitive forecasting performance. Additionally, TimeBase can also serve as a very effective plug-and-play complexity reducer for any patch-based forecasting models. Code is available at \url{https://github.com/hqh0728/TimeBase}.

Poster

#W-308

Separating Knowledge and Perception with Procedural Data

Adrian Rodriguez-Munoz · Manel Baradad · Phillip Isola · Antonio Torralba

We train representation models with procedural data only, and apply them on visual similarity, classification, and semantic segmentation tasks without further training by using visual memory---an explicit database of reference image embeddings. Unlike prior work on visual memory, our approach achieves full compartmentalization with respect to all real-world images while retaining strong performance. Compared to a model trained on Places, our procedural model performs within 1\% on NIGHTS visual similarity, outperforms by 8\% and 15\% on CUB200 and Flowers102 fine-grained classification, and is within 10\% on ImageNet-1K classification. It also demonstrates strong zero-shot segmentation, achieving an $R^2$ on COCO within 10\% of the models trained on real data. Finally, we analyze procedural versus real data models, showing that parts of the same object have dissimilar representations in procedural models, resulting in incorrect searches in memory and explaining the remaining performance gap.

Poster

#W-309

Semantics-aware Test-time Adaptation for 3D Human Pose Estimation

Qiuxia Lin · Glory Rongyu CHEN · Kerui Gu · Angela Yao

This work highlights a semantics misalignment in 3D human pose estimation. For the task of test-time adaptation, the misalignment manifests as overly smoothed and unguided predictions. The smoothing settles predictions towards some average pose. Furthermore, when there are occlusions or truncations, the adaptation becomes fully unguided. To this end, we pioneer the integration of a semantics-aware motion prior for the test-time adaptation of 3D pose estimation. We leverage video understanding and a well-structured motion-text space to adapt the model motion prediction to adhere to video semantics during test time. Additionally, we incorporate a missing 2D pose completion based on the motion-text similarity. The pose completion strengthens the motion prior's guidance for occlusions and truncations. Our method significantly improves state-of-the-art 3D human pose estimation TTA techniques, with more than 12% decrease in PA-MPJPE on 3DPW and 3DHP.

Poster

#W-310

Are High-Quality AI-Generated Images More Difficult for Models to Detect?

Yao Xiao · Binbin Yang · Weiyan Chen · Jiahao Chen · Zijie Cao · ZiYi Dong · Xiangyang Ji · Liang Lin · Wei Ke · Pengxu Wei

The remarkable evolution of generative models has enabled the generation of high-quality, visually attractive images, often perceptually indistinguishable from real photographs to human eyes. This has spurred significant attention on AI-generated image (AIGI) detection. Intuitively, higher image quality should increase detection difficulty. However, our systematic study on cutting-edge text-to-image generators reveals a counterintuitive finding: AIGIs with higher quality scores, as assessed by human preference models, tend to be more easily detected by existing models. To investigate this, we examine how the text prompts for generation and image characteristics influence both quality scores and detector accuracy. We observe that images from short prompts tend to achieve higher preference scores while being easier to detect. Furthermore, through clustering and regression analyses, we verify that image characteristics like saturation, contrast, and texture richness collectively impact both image quality and detector accuracy. Finally, we demonstrate that the performance of off-the-shelf detectors can be enhanced across diverse generators and datasets by selecting input patches based on the predicted scores of our regression models, thus substantiating the broader applicability of our findings. Code and data are available at \href{https://github.com/Coxy7/AIGI-Detection-Quality-Paradox}{GitHub}.

Poster

#W-311

Position: Democratic AI is Possible. The Democracy Levels Framework Shows How It Might Work.

Aviv Ovadya · Kyle Redman · Luke Thorburn · Quan Ze Chen · Oliver Smith · Flynn Devine · Andrew Konya · Smitha Milli · Manon Revel · Kevin Feng · Amy Zhang · Bilva Chandra · Michiel Bakker · Atoosa Kasirzadeh

This position paper argues that effectively "democratizing AI" requires democratic governance and alignment of AI, and that this is particularly valuable for decisions with systemic societal impacts. Initial steps—such as Meta's Community Forums and Anthropic's Collective Constitutional AI—have illustrated a promising direction, where democratic processes could be used to meaningfully improve public involvement and trust in critical decisions. To more concretely explore what increasingly democratic AI might look like, we provide a "Democracy Levels" framework and associated tools that: (i) define milestones toward meaningfully democratic AI—which is also crucial for substantively pluralistic, human-centered, participatory, and public-interest AI, (ii) can help guide organizations seeking to increase the legitimacy of their decisions on difficult AI governance and alignment questions, and (iii) support the evaluation of such efforts.

Poster

#W-312

Few-Shot Learner Generalizes Across AI-Generated Image Detection

Shiyu Wu · Jing Liu · Jing Li · Yequan Wang

Current fake image detectors trained on large synthetic image datasets perform satisfactorily on limited studied generative models. However, these detectors suffer a notable performance decline over unseen models. Besides, collecting adequate training data from online generative models is often expensive or infeasible. To overcome these issues, we propose Few-Shot Detector (FSD), a novel AI-generated image detector which learns a specialized metric space for effectively distinguishing unseen fake images using very few samples. Experiments show that FSD achieves state-of-the-art performance by $+11.6\%$ average accuracy on the GenImage dataset with only $10$ additional samples. More importantly, our method is better capable of capturing the intra-category commonality in unseen images without further training. Our code is available at https://github.com/teheperinko541/Few-Shot-AIGI-Detector.

Poster

#W-313

CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty

Harry Zhang · Luca Carlone

We introduce CUPS, a novel method for learning sequence-to-sequence 3D human shapes and poses from RGB videos with uncertainty quantification. To improve on top of prior work, we develop a method to {generate and score multiple hypotheses during training}, effectively integrating uncertainty quantification into the learning process. This process results in a deep uncertainty function that is trained end-to-end with the 3D pose estimator. Post-training, the learned deep uncertainty model is used as the conformity score, which can be used to calibrate a conformal predictor in order to {assess} the quality of the output prediction. Since the data in human pose-shape learning is not fully exchangeable, we also {present} two practical bounds for the coverage gap in conformal prediction, developing theoretical backing for the uncertainty bound of our model. Our results indicate thatby taking advantage of deep uncertainty with conformal prediction, our method achieves state-of-the-art performance across variousmetrics and datasets while inheriting the probabilistic guarantees of conformal prediction. Interactive 3D visualization, code, and data will be available at https://sites.google.com/view/champpp.

Poster

#W-314

Contrastive Visual Data Augmentation

Yu Zhou · Bingxuan Li · Mohan Tang · Xiaomeng Jin · Te-Lin Wu · Kuan-Hao Huang · Heng Ji · Kai-Wei Chang · Nanyun Peng

Large multimodal models (LMMs) often struggle to recognize novel concepts, as they rely on pre-trained knowledge and have limited ability to capture subtle visual details. Domain-specific knowledge gaps in training also make them prone to confusing visually similar, commonly misrepresented, or low-resource concepts. To help LMMs better align nuanced visual features with language, improving their ability to recognize and reason about novel or rare concepts, we propose a Contrastive visual Data Augmentation (CoDA) strategy. CoDA extracts key contrastive textual and visual features of target concepts against the known concepts they are misrecognized as, and then uses multimodal generative models to produce targeted synthetic data. Automatic filtering of extracted features and augmented images is implemented to guarantee their quality, as verified by human annotators. We show the effectiveness and efficiency of CoDA on low-resource concept and diverse scene recognition datasets including INaturalist and SUN. We additionally collect NovelSpecies, a benchmark dataset consisting of newly discovered animal species that are guaranteed to be unseen by LMMs. LLaVA-1.6 1-shot updating results on these three datasets show CoDA significantly improves SOTA visual data augmentation strategies by 12.3% (NovelSpecies), 5.1% (SUN), and 6.0% (iNat) absolute gains in accuracy.

Poster

#W-315

Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making

Xu Wan · Wenyue Xu · Chao Yang · Mingyang Sun

Recent advancements in Large Language Models (LLMs) and Reinforcement Learning (RL) have shown significant promise in decision-making tasks. Nevertheless, for large-scale industrial decision problems, both approaches face distinct challenges: LLMs lack real-time long-sequence decision-making capabilities, while RL struggles with sample efficiency in vast action spaces. To bridge this gap, we propose Agents Co-Evolution (ACE), a synergistic framework between LLMs and RL agent for large-scale decision-making scenarios. ACE introduces a dual-role trajectory refinement mechanism where LLMs act as both Policy Actor and Value Critic during RL's training: the Actor refines suboptimal actions via multi-step reasoning and environment validation, while the Critic performs temporal credit assignment through trajectory-level reward shaping. Concurrently, RL agent enhance LLMs' task-specific decision-making via prioritized experience replay.Through extensive experiments across multiple power grid operation challenges with action spaces exceeding 60K discrete actions, ACE demonstrates superior performance over existing RL methods and LLM-based methods.

Poster

#W-316

Causal Invariance-aware Augmentation for Brain Graph Contrastive Learning

Minqi Yu · Jinduo Liu · Junzhong Ji

Deep models are increasingly used to analyze brain graphs for the diagnosis and understanding of brain diseases. However, due to the multi-site data aggregation and individual differences, brain graph datasets exhibit widespread distribution shifts, which impair the model’s generalization ability to the test set, thereby limiting the performance of existing methods. To address these issues, we propose a Causally Invariance-aware Augmentation for brain Graph Contrastive Learning, called CIA-GCL. This method first generates a brain graph by extracting node features based on the topological structure. Then, a learnable brain invariant subgraph is identified based on a causal decoupling approach to capture the maximum label-related invariant information with invariant learning. Around this invariant subgraph, we design a novel invariance-aware augmentation strategy to generate meaningful augmented samples for graph contrast learning. Finally, the extracted invariant subgraph is utilized for brain disease classification, effectively mitigating distribution shifts while also identifying critical local graph structures, enhancing the model’s interpretability. Experiments on three real-world brain disease datasets demonstrate that our method achieves state-of-the-art performance, effectively generalizes to multi-site brain datasets, and provides certain interpretability.

Poster

#W-317

PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models

Alex Velez-Arce · Marinka Zitnik

Existing biomedical benchmarks do not provide end-to-end infrastructure for training, evaluation, and inference of models that integrate multimodal biological data and a broad range of machine learning tasks in therapeutics. We present PyTDC, an open-source machine-learning platform providing streamlined training, evaluation, and inference software for multimodal biological AI models. PyTDC unifies distributed, heterogeneous, continuously updated data sources and model weights and standardizes benchmarking and inference endpoints. This paper discusses the components of PyTDC's architecture and, to our knowledge, the first-of-its-kind case study on the introduced single-cell drug-target nomination ML task. We find state-of-the-art methods in graph representation learning and domain-specific methods from graph theory perform poorly on this task. Though we find a context-aware geometric deep learning method that outperforms the evaluated SoTA and domain-specific baseline methods, the model is unable to generalize to unseen cell types or incorporate additional modalities, highlighting PyTDC's capacity to facilitate an exciting avenue of research developing multimodal, context-aware, foundation models for open problems in biomedical AI.

Poster

#W-318

EmoGrowth: Incremental Multi-label Emotion Decoding with Augmented Emotional Relation Graph

Kaicheng Fu · Changde Du · Jie Peng · Kunpeng Wang · Shuangchen Zhao · Xiaoyu Chen · Huiguang He

Emotion recognition systems face significant challenges in real-world applications, where novel emotion categories continually emerge and multiple emotions often co-occur. This paper introduces multi-label fine-grained class incremental emotion decoding, which aims to develop models capable of incrementally learning new emotion categories while maintaining the ability to recognize multiple concurrent emotions. We propose an Augmented Emotional Semantics Learning (AESL) framework to address two critical challenges: past- and future-missing partial label problems. AESL incorporates an augmented Emotional Relation Graph (ERG) for reliable soft label generation and affective dimension-based knowledge distillation for future-aware feature learning. We evaluate our approach on three datasets spanning brain activity and multimedia domains, demonstrating its effectiveness in decoding up to 28 fine-grained emotion categories. Results show that AESL significantly outperforms existing methods while effectively mitigating catastrophic forgetting. Our code is available at https://github.com/ChangdeDu/EmoGrowth.

Poster

#W-319

ADIOS: Antibody Development via Opponent Shaping

Sebastian Towers · Aleksandra Kalisz · Philippe Robert · Alicia Higueruelo · Francesca Vianello · Chloe Tsai · Harrison Steel · Jakob Foerster

Anti-viral therapies are typically designed to target only the current strains of a virus, a myopic response. However, therapy-induced selective pressures drive the emergence of new viral strains, against which the original myopic therapies are no longer effective. This evolutionary response presents an opportunity: our therapies could both defend against and actively influence viral evolution. This motivates our method ADIOS: Antibody Development vIa Opponent Shaping. ADIOS is a meta-learning framework where the process of antibody therapy design, the outer loop, accounts for the virus's adaptive response, the inner loop. With ADIOS, antibodies are not only robust against potential future variants, they also influence, i.e. shape, which future variants emerge. In line with the opponent shaping literature, we refer to our optimised antibodies as shapers. To demonstrate the value of ADIOS, we build a viral evolution simulator using the Absolut! framework, in which shapers successfully target both current and future viral variants, outperforming myopic antibodies. Furthermore, we show that shapers modify the distribution over viral evolutionary trajectories to result in weaker variants.We believe that our ADIOS paradigm will facilitate the discovery of long-lived vaccines and antibody therapies while also generalising to other domains. Specifically, domains such as antimicrobial resistance, cancer treatment, and others with evolutionarily adaptive opponents. Our code is available at https://github.com/olakalisz/adios.

Poster

#W-320

Modalities Contribute Unequally: Enhancing Medical Multi-modal Learning through Adaptive Modality Token Re-balancing

Jie Peng · Jenna Ballard · Mohan Zhang · Sukwon Yun · Jiayi Xin · Qi Long · Yanyong Zhang · Tianlong Chen

Medical multi-modal learning requires an effective fusion capability of various heterogeneous modalities.One vital challenge is how to effectively fuse modalities when their data quality varies across different modalities and patients.For example, in the TCGA benchmark, the performance of the same modality can differ between types of cancer. Moreover, data collected at different times, locations, and with varying reagents can introduce inter-modal data quality differences ($i.e.$, $\textbf{Modality Batch Effect}$).In response, we propose ${\textbf{A}}$daptive ${\textbf{M}}$odality Token Re-Balan${\textbf{C}}$ing ($\texttt{AMC}$), a novel top-down dynamic multi-modal fusion approach.The core of $\texttt{AMC}$ is to quantify the significance of each modality (Top) and then fuse them according to the modality importance (Down).Specifically, we access the quality of each input modality and then replace uninformative tokens with inter-modal tokens, accordingly.The more important a modality is, the more informative tokens are retained from that modality.The self-attention will further integrate these mixed tokens to fuse multi-modal knowledge.Comprehensive experiments on both medical and general multi-modal datasets demonstrate the effectiveness and generalizability of $\texttt{AMC}$.

Poster

#W-321

Maximum Entropy Reinforcement Learning with Diffusion Policy

Xiaoyi Dong · Jian Cheng · Xi Zhang

The Soft Actor-Critic (SAC) algorithm with a Gaussian policy has become a mainstream implementation for realizing the Maximum Entropy Reinforcement Learning (MaxEnt RL) objective, which incorporates entropy maximization to encourage exploration and enhance policy robustness. While the Gaussian policy performs well on simpler tasks, its exploration capacity and potential performance in complex multi-goal RL environments are limited by its inherent unimodality. In this paper, we employ the diffusion model, a powerful generative model capable of capturing complex multimodal distributions, as the policy representation to fulfill the MaxEnt RL objective, developing a method named MaxEnt RL with Diffusion Policy (MaxEntDP). Our method enables efficient exploration and brings the policy closer to the optimal MaxEnt policy. Experimental results on Mujoco benchmarks show that MaxEntDP outperforms the Gaussian policy and other generative models within the MaxEnt RL framework, and performs comparably to other state-of-the-art diffusion-based online RL algorithms. Our code is available at https://github.com/diffusionyes/MaxEntDP.

Poster

#W-400

Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting

Zhining Liu · Ze Yang · Xiao Lin · Ruizhong Qiu · Tianxin Wei · Yada Zhu · Hendrik Hamann · Jingrui He · Hanghang Tong

Time-series forecasting plays a critical role in many real-world applications. Although increasingly powerful models have been developed and achieved superior results on benchmark datasets, through a fine-grained sample-level inspection, we find that (i) no single model consistently outperforms others across different test samples, but instead (ii) each model excels in specific cases. These findings prompt us to explore how to adaptively leverage the distinct strengths of various forecasting models for different samples. We introduce TimeFuse, a framework for collective time-series forecasting with sample-level adaptive fusion of heterogeneous models. TimeFuse utilizes meta-features to characterize input time series and trains a learnable fusor to predict optimal model fusion weights for any given input. The fusor can leverage samples from diverse datasets for joint training, allowing it to adapt to a wide variety of temporal patterns and thus generalize to new inputs, even from unseen datasets. Extensive experiments demonstrate the effectiveness of TimeFuse in various long-/short-term forecasting tasks, achieving near-universal improvement over the state-of-the-art individual models. Code is available at https://github.com/ZhiningLiu1998/TimeFuse.

Poster

#W-402

Reinforcement Learning Control of a Physical Robot Device for Assisted Human Walking without a Simulator

junmin zhong · Emiliano Quinones Yumbla · Seyed Yousef Soltanian · Ruofan Wu · Wenlong Zhang · Jennie Si

This study presents an innovative reinforcement learning (RL) control approach to facilitate soft exosuit-assisted human walking. Our goal is to address the ongoing challenges in developing reliable RL-based methods for controlling physical devices. To overcome key obstacles—such as limited data, the absence of a simulator for human-robot interaction during walking, the need for low computational overhead in real-time deployment, and the demand for rapid adaptation to achieve personalized control while ensuring human safety—we propose an online Adaptation from an offline Imitating Expert Policy (AIP) approach. Our offline learning mimics human expert actions through real human walking demonstrations without robot assistance. The resulted policy is then used to initialize online actor-critic learning, the goal of which is to optimally personalize robot assistance. In addition to being fast and robust, our online RL method also posses important properties such as learning convergence, dynamic stability, and solution optimality. We have successfully demonstrated our simpleand robust framework for safe robot control on all five tested human participants, without selectively presenting results. The qualitative performance guarantees provided by our online RL, along with the consistent experimental validation of AIP control, represent the first demonstration of online adaptation for softsuit control personalization and serve as important evidence for the use of online RL in controlling a physical device to solve a real-life problem.

Poster

#W-403

Empowering World Models with Reflection for Embodied Video Prediction

Xiaowei Chi · Chun-Kai Fan · Hengyuan Zhang · Xingqun Qi · Rongyu Zhang · Anthony Chen · Chi-Min Chan · Wei Xue · Qifeng Liu · Shanghang Zhang · Yike Guo

Video generation models have made significant progress in simulating future states, showcasing their potential as world simulators in embodied scenarios. However, existing models often lack robust understanding, limiting their ability to perform multi-step predictions or handle Out-of-Distribution (OOD) scenarios. To address this challenge, we propose the Reflection of Generation (RoG), a set of intermediate reasoning strategies designed to enhance video prediction. It leverages the complementary strengths of pre-trained vision-language and video generation models, enabling them to function as a world model in embodied scenarios. To support RoG, we introduce Embodied Video Anticipation Benchmark(EVA-Bench), a comprehensive benchmark that evaluates embodied world models across diverse tasks and scenarios, utilizing both in-domain and OOD datasets. Building on this foundation, we devise a world model, Embodied Video Anticipator (EVA), that follows a multistage training paradigm to generate high-fidelity video frames and apply an autoregressive strategy to enable adaptive generalization for longer video sequences. Extensive experiments demonstrate the efficacy of EVA in various downstream tasks like video generation and robotics, thereby paving the way for large-scale pre-trained models in real-world video prediction applications. The video demos are available at https://sites.google.com/view/icml-eva.

Poster

#W-404

Hierarchical Equivariant Policy via Frame Transfer

Haibo Zhao · Dian Wang · Yizhe Zhu · Xupeng Zhu · Owen Howell · Linfeng Zhao · Yaoyao Qian · Robin Walters · Robert Platt

Recent advances in hierarchical policy learning highlight the advantages of decomposing systems into high-level and low-level agents, enabling efficient long-horizon reasoning and precise fine-grained control. However, the interface between these hierarchy levels remains underexplored, and existing hierarchical methods often ignore domain symmetry, resulting in the need for extensive demonstrations to achieve robust performance. To address these issues, we propose Hierarchical Equivariant Policy (HEP), a novel hierarchical policy framework. We propose a frame transfer interface for hierarchical policy learning, which uses the high-level agent's output as a coordinate frame for the low-level agent, providing a strong inductive bias while retaining flexibility. Additionally, we integrate domain symmetries into both levels and theoretically demonstrate the system's overall equivariance. HEP achieves state-of-the-art performance in complex robotic manipulation tasks, demonstrating significant improvements in both simulation and real-world settings.

Poster

#W-405

VIP: Vision Instructed Pre-training for Robotic Manipulation

Zhuoling Li · LiangLiang Ren · Jinrong Yang · Yong Zhao · Xiaoyang Wu · Zhenhua Xu · Xiang Bai · Hengshuang Zhao

The effectiveness of scaling up training data in robotic manipulation is still limited. A primary challenge in manipulation is the tasks are diverse, and the trained policy would be confused if the task targets are not specified clearly. Existing works primarily rely on text instruction to describe targets. However, we reveal that current robotic data cannot train policies to understand text instruction effectively, and vision is much more comprehensible. Therefore, we introduce utilizing vision instruction to specify targets. A straightforward implementation is training a policy to predict the intermediate actions linking the current observation and a future image. Nevertheless, a single future image does not describe the task target in insufficient detail. To handle this problem, we propose to use sparse point flows to provide more detailed information. Extensive tasks are designed based on real and simulated environments to evaluate the effectiveness of our vision instructed pre-training (VIP) method. The results indicate VIP improves the performance on diverse tasks significantly, and the derived policy can complete competitive tasks like ``opening the lid of a tightly sealed bottle''.

Poster

#W-406

Test-Time Adaptation for Online Vision-Language Navigation with Feedback-based Reinforcement Learning

Sung June Kim · Gyeongrok Oh · Heeju Ko · Daehyun Ji · Dongwook Lee · Byung-Jun Lee · Sujin Jang · Sangpil Kim

Navigating in an unfamiliar environment during deployment poses a critical challenge for a vision-language navigation (VLN) agent. Yet, test-time adaptation (TTA) remains relatively underexplored in robotic navigation, leading us to the fundamental question: what are the key properties of TTA for online VLN? In our view, effective adaptation requires three qualities: 1) flexibility in handling different navigation outcomes, 2) interactivity with external environment, and 3) maintaining a harmony between plasticity and stability. To address this, we introduce FeedTTA, a novel TTA framework for online VLN utilizing feedback-based reinforcement learning. Specifically, FeedTTA learns by maximizing binary episodic feedback, a practical setup in which the agent receives a binary scalar after each episode that indicates the success or failure of the navigation. Additionally, we propose a gradient regularization technique that leverages the binary structure of FeedTTA to achieve a balance between plasticity and stability during adaptation. Our extensive experiments on challenging VLN benchmarks demonstrate the superior adaptability of FeedTTA, even outperforming the state-of-the-art offline training methods in REVERIE benchmark with a single stream of learning.

Poster

#W-407

TS-SNN: Temporal Shift Module for Spiking Neural Networks

Kairong Yu · Tianqing Zhang · Qi Xu · Gang Pan · Hongwei Wang

Spiking Neural Networks (SNNs) are increasingly recognized for their biological plausibility and energy efficiency, positioning them as strong alternatives to Artificial Neural Networks (ANNs) in neuromorphic computing applications. SNNs inherently process temporal information by leveraging the precise timing of spikes, but balancing temporal feature utilization with low energy consumption remains a challenge. In this work, we introduce Temporal Shift module for Spiking Neural Networks (TS-SNN), which incorporates a novel Temporal Shift (TS) module to integrate past, present, and future spike features within a single timestep via a simple yet effective shift operation. A residual combination method prevents information loss by integrating shifted and original features. The TS module is lightweight, requiring only one additional learnable parameter, and can be seamlessly integrated into existing architectures with minimal additional computational cost. TS-SNN achieves state-of-the-art performance on benchmarks like CIFAR-10 (96.72\%), CIFAR-100 (80.28\%), and ImageNet (70.61\%) with fewer timesteps, while maintaining low energy consumption. This work marks a significant step forward in developing efficient and accurate SNN architectures.

Spotlight Poster

#W-408

Discovering Symbolic Cognitive Models from Human and Animal Behavior

Pablo Samuel Castro · Nenad Tomasev · Ankit Anand · Navodita Sharma · Rishika Mohanta · Aparna Dev · Kuba Perlin · Siddhant Jain · Kyle Levin · Noemi Elteto · Will Dabney · Alexander Novikov · Glenn Turner · Maria Eckstein · Nathaniel Daw · Kevin Miller · Kimberly Stachenfeld

Symbolic models play a key role in cognitive science, expressing computationally precise hypotheses about how the brain implements a cognitive process. Identifying an appropriate model typically requires a great deal of effort and ingenuity on the part of a human scientist.Here, we adapt FunSearch (Romera-Paredes et al. 2024), a recently developed tool that uses Large Language Models (LLMs) in an evolutionary algorithm, to automatically discover symbolic cognitive models that accurately capture human and animal behavior.We consider datasets from three species performing a classic reward-learning task that has been the focus of substantial modeling effort, and find that the discovered programs outperform state-of-the-art cognitive models for each.The discovered programs can readily be interpreted as hypotheses about human and animal cognition, instantiating interpretable symbolic learning and decision-making algorithms. Broadly, these results demonstrate the viability of using LLM-powered program synthesis to propose novel scientific hypotheses regarding mechanisms of human and animal cognition.

Poster

#W-409

Hybrid Spiking Vision Transformer for Object Detection with Event Cameras

Qi Xu · Jie Deng · Jiangrong Shen · Biwu Chen · Huajin Tang · Gang Pan

Event-based object detection has attracted increasing attention for its high temporal resolution, wide dynamic range, and asynchronous address-event representation. Leveraging these advantages, spiking neural networks (SNNs) have emerged as a promising approach, offering low energy consumption and rich spatiotemporal dynamics. To further enhance the performance of event-based object detection, this study proposes a novel hybrid spike vision Transformer (HsVT) model. The HsVT model integrates a spatial feature extraction module to capture local and global features, and a temporal feature extraction module to model time dependencies and long-term patterns in event sequences. This combination enables HsVT to capture spatiotemporal features, improving its capability in handling complex event-based object detection tasks. To support research in this area, we developed the Fall Detection dataset as a benchmark for event-based object detection tasks. The Fall DVS detection dataset protects facial privacy and reduces memory usage thanks to its event-based representation. Experimental results demonstrate that HsVT outperforms existing SNN methods and achieves competitive performance compared to ANN-based models, with fewer parameters and lower energy consumption.

Poster

#W-410

Neural Representational Consistency Emerges from Probabilistic Neural-Behavioral Representation Alignment

Yu Zhu · Chunfeng Song · Wanli Ouyang · Shan Yu · Tiejun Huang

Individual brains exhibit striking structural and physiological heterogeneity, yet neural circuits can generate remarkably consistent functional properties across individuals, an apparent paradox in neuroscience. While recent studies have observed preserved neural representations in motor cortex through manual alignment across subjects, the zero-shot validation of such preservation and its generalization to more cortices remain unexplored. Here we present PNBA (Probabilistic Neural-Behavioral Representation Alignment), a new framework that leverages probabilistic modeling to address hierarchical variability across trials, sessions, and subjects, with generative constraints preventing representation degeneration. By establishing reliable cross-modal representational alignment, PNBA reveals robust preserved neural representations in monkey primary motor cortex (M1) and dorsal premotor cortex (PMd) through zero-shot validation. We further establish similar representational preservation in mouse primary visual cortex (V1), reflecting a general neural basis. These findings resolve the paradox of neural heterogeneity by establishing zero-shot preserved neural representations across cortices and species, enriching neural coding insights and enabling zero-shot behavior decoding.

Poster

#W-411

Accurate Identification of Communication Between Multiple Interacting Neural Populations

Belle Liu · Jacob I Sacks · Matthew Golub

Neural recording technologies now enable simultaneous recording of population activity across multiple brain regions, motivating the development of data-driven models of communication between recorded brain regions. Existing models can struggle to disentangle communication from the effects of unrecorded regions and local neural population dynamics. Here, we introduce Multi-Region Latent Factor Analysis via Dynamical Systems (MR-LFADS), a sequential variational autoencoder composed of region-specific recurrent networks. MR-LFADS features structured information bottlenecks, data-constrained communication, and unsupervised inference of unobserved inputs--features that specifically support disentangling of inter-regional communication, inputs from unobserved regions, and local population dynamics. MR-LFADS outperforms existing approaches at identifying communication across dozens of simulations of task-trained multi-region networks. Applied to large-scale electrophysiology, MR-LFADS predicts brain-wide effects of circuit perturbations that were not seen during model fitting. These validations on synthetic and real neural data suggest that MR-LFADS could serve as a powerful tool for uncovering the principles of brain-wide information processing.

Poster

#W-412

Efficient ANN-SNN Conversion with Error Compensation Learning

chang liu · Jiangrong Shen · Xuming Ran · Mingkun Xu · Qi Xu · Yi Xu · Gang Pan

Artificial neural networks (ANNs) have demonstrated outstanding performance in numerous tasks, but deployment in resource-constrained environments remains a challenge due to their high computational and memory requirements. Spiking neural networks (SNNs) operate through discrete spike events and offer superior energy efficiency, providing a bio-inspired alternative. However, current ANN-to-SNN conversion often results in significant accuracy loss and increased inference time due to conversion errors such as clipping, quantization, and uneven activation. This paper proposes a novel ANN-to-SNN conversion framework based on error compensation learning. We introduce a learnable threshold clipping function, dual-threshold neurons, and an optimized membrane potential initialization strategy to mitigate the conversion error. Together, these techniques address the clipping error through adaptive thresholds, dynamically reduce the quantization error through dual-threshold neurons, and minimize the non-uniformity error by effectively managing the membrane potential. Experimental results on CIFAR-10, CIFAR-100, ImageNet datasets show that our method achieves high-precision and ultra-low latency among existing conversion methods. Using only two time steps, our method significantly reduces the inference time while maintains competitive accuracy of 94.75% on CIFAR-10 dataset under ResNet-18 structure. This research promotes the practical application of SNNs on low-power hardware, making efficient real-time processing possible.

Poster

#W-413

MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data

Yuqin Dai · Zhouheng Yao · Chunfeng Song · Qihao Zheng · Weijian Mai · Kunyu Peng · Shuai Lu · Wanli Ouyang · Jian Yang · Jiamin Wu

Brain decoding aims to reconstruct visual perception of human subject from fMRI signals, which is crucial for understanding brain's perception mechanisms. Existing methods are confined to the single-subject paradigm due to substantial brain variability, which leads to weak generalization across individuals and incurs high training costs, exacerbated by limited availability of fMRI data. To address these challenges, we propose MindAligner, an explicit functional alignment framework for cross-subject brain decoding from limited fMRI data. The proposed MindAligner enjoys several merits. First, we learn a Brain Transfer Matrix (BTM) that projects the brain signals of an arbitrary new subject to one of the known subjects, enabling seamless use of pre-trained decoding models. Second, to facilitate reliable BTM learning, a Brain Functional Alignment module is proposed to perform soft cross-subject brain alignment under different visual stimuli with a multi-level brain alignment loss, uncovering fine-grained functional correspondences with high interpretability. Experiments indicate that MindAligner not only outperforms existing methods in visual decoding under data-limited conditions, but also provides valuable neuroscience insights in cross-subject functional analysis. The code will be made publicly available.

Poster

#W-414

Agent Reviewers: Domain-specific Multimodal Agents with Shared Memory for Paper Review

Kai Lu · Shixiong Xu · Jinqiu Li · Kun Ding · Gaofeng Meng

Feedback from peer review is essential to improve the quality of scientific articles. However, at present, many manuscripts do not receive sufficient external feedback for refinement before or during submission. Therefore, a system capable of providing detailed and professional feedback is crucial for enhancing research efficiency. In this paper, we have compiled the largest dataset of paper reviews to date by collecting historical open-access papers and their corresponding review comments and standardizing them using LLM. We then developed a multi-agent system that mimics real human review processes, based on LLMs. This system, named Agent Reviewers, includes the innovative introduction of multimodal reviewers to provide feedback on the visual elements of papers. Additionally, a shared memory pool that stores historical papers' metadata is preserved, which supplies reviewer agents with background knowledge from different fields. Our system is evaluated using ICLR 2024 papers and achieves superior performance compared to existing AI-based review systems. Comprehensive ablation studies further demonstrate the effectiveness of each module and agent in this system.

Poster

#W-415

ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior

Zhongweiyang Xu · Xulin Fan · Zhong-Qiu Wang · Xilin Jiang · Romit Roy Choudhury

Blind Speech Separation (BSS) aims to separate multiple speech sources from audio mixturesrecorded by a microphone array. The problem ischallenging because it is a blind inverse problem,i.e., the microphone array geometry, the room impulse response (RIR), and the speech sources, areall unknown. We propose ArrayDPS to solve theBSS problem in an unsupervised, array-agnostic,and generative manner. The core idea builds ondiffusion posterior sampling (DPS), but unlikeDPS where the likelihood is tractable, ArrayDPSmust approximate the likelihood by formulatinga separate optimization problem. The solution to the optimization approximates room acousticsand the relative transfer functions between microphones. These approximations, along with thediffusion priors, iterate through the ArrayDPSsampling process and ultimately yield separatedvoice sources. We only need a simple single-speaker speech diffusion model as a prior, alongwith the mixtures recorded at the microphones; nomicrophone array information is necessary. Evaluation results show that ArrayDPS outperformsall baseline unsupervised methods while beingcomparable to supervised methods in terms ofSDR. Audio demos and codes are provided at:https://arraydps.github.io/ArrayDPSDemo/ andhttps://github.com/ArrayDPS/ArrayDPS.

Poster

#W-416

PertEval-scFM: Benchmarking Single-Cell Foundation Models for Perturbation Effect Prediction

Aaron Wenteler · Martina Occhetta · Nikhil Branson · Victor Curean · Magdalena Huebner · William Dee · William Connell · Siu Chung · Alex Hawkins-Hooker · Yasha Ektefaie · César Miguel Valdez Córdova · Amaya Gallagher-Syed

In silico modeling of transcriptional responses to perturbations is crucial for advancing our understanding of cellular processes and disease mechanisms. We present PertEval-scFM, a standardized framework designed to evaluate models for perturbation effect prediction. We apply PertEval-scFM to benchmark zero-shot single-cell foundation model (scFM) embeddings against baseline models to assess whether these contextualized representations enhance perturbation effect prediction. Our results show that scFM embeddings offer limited improvement over simple baseline models in the zero-shot setting, particularly under distribution shift. Overall, this study provides a systematic evaluation of zero-shot scFM embeddings for perturbation effect prediction, highlighting the challenges of this task and the limitations of current-generation scFMs. Our findings underscore the need for specialized models and high-quality datasets that capture a broader range of cellular states. Source code and documentation can be found at: https://github.com/aaronwtr/PertEval.

Poster

#W-417

SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model

Zhao Yang · jiwei zhu · Bing Su

Inspired by the success of unsupervised pre-training paradigms, researchers have applied these approaches to DNA pre-training. However, we argue that these approaches alone yield suboptimal results because pure DNA sequences lack sufficient information, since their functions are regulated by genomic profiles like chromatin accessibility. Here, we demonstrate that supervised training for genomic profile prediction serves as a more effective alternative to pure sequence pre-training. Furthermore, considering the multi-species and multi-profile nature of genomic profile prediction, we introduce our Species-Profile Adaptive Collaborative Experts (SPACE) that leverages Mixture of Experts (MoE) to better capture the relationships between DNA sequences across different species and genomic profiles, thereby learning more effective DNA representations. Through extensive experiments across various tasks, our model achieves state-of-the-art performance, establishing that DNA models trained with supervised genomic profiles serve as powerful DNA representation learners.

Poster

#W-418

Protein Structure Tokenization: Benchmarking and New Recipe

Xinyu Yuan · Zichen Wang · Marcus Collins · Huzefa Rangwala

Recent years have witnessed a surge in the development of protein structural tokenization methods, which chunk protein 3D structures into discrete or continuous representations. Structure tokenization enables the direct application of powerful techniques like language modeling for protein structures, and large multimodal models to integrate structures with protein sequences and functional texts. Despite the progress, the capabilities and limitations of these methods remain poorly understood due to the lack of a unified evaluation framework. We first introduce StructTokenBench, a framework that comprehensively evaluates the quality and efficiency of structure tokenizers, focusing on fine-grained local substructures rather than global structures, as typical in existing benchmarks. Our evaluations reveal that no single model dominates all benchmarking perspectives. Observations of codebook under-utilization led us to develop AminoAseed, a simple yet effective strategy that enhances codebook gradient updates and optimally balances codebook size and dimension for improved tokenizer utilization and quality. Compared to the leading model ESM3, our method achieves an average of 6.31\% performance improvement across 24 supervised tasks, with sensitivity and utilization rates increased by 12.83\% and 124.03\%, respectively. Source code and model weights are available at https://github.com/KatarinaYuan/StructTokenBench.

Poster

#W-419

BoxLM: Unifying Structures and Semantics of Medical Concepts for Diagnosis Prediction in Healthcare

Yanchao Tan · Hang Lv · Yunfei Zhan · Guofang Ma · Bo Xiong · Carl Yang

Language Models (LMs) have advanced diagnosis prediction by leveraging the semantic understanding of medical concepts in Electronic Health Records (EHRs). Despite these advancements, existing LM-based methods often fail to capture the structures of medical concepts (e.g., hierarchy structure from domain knowledge). In this paper, we propose BoxLM, a novel framework that unifies the structures and semantics of medical concepts for diagnosis prediction. Specifically, we propose a structure-semantic fusion mechanism via box embeddings, which integrates both ontology-driven and EHR-driven hierarchical structures with LM-based semantic embeddings, enabling interpretable medical concept representations.Furthermore, in the box-aware diagnosis prediction module, an evolve-and-memorize patient box learning mechanism is proposed to model the temporal dynamics of patient visits, and a volume-based similarity measurement is proposed to enable accurate diagnosis prediction. Extensive experiments demonstrate that BoxLM consistently outperforms state-of-the-art baselines, especially achieving strong performance in few-shot learning scenarios, showcasing its practical utility in real-world clinical settings.

Poster

#W-420

Drug-TTA: Test-Time Adaptation for Drug Virtual Screening via Multi-task Meta-Auxiliary Learning

Ao Shen · Ming'zhi Yuan · Yingfan MA · Jie Du · qiao Huang · Manning Wang

Virtual screening is a critical step in drug discovery, aiming at identifying potential drugs that bind to a specific protein pocket from a large database of molecules. Traditional docking methods are time-consuming, while learning-based approaches supervised by high-precision conformational or affinity labels are limited by the scarcity of training data. Recently, a paradigm of feature alignment through contrastive learning has gained widespread attention. This method does not require explicit binding affinity scores, but it suffers from the issue of overly simplistic construction of negative samples, which limits their generalization to more difficult test cases. In this paper, we propose Drug-TTA, which leverages a large number of self-supervised auxiliary tasks to adapt the model to each test instance. Specifically, we incorporate the auxiliary tasks into both the training and the inference process via meta-learning to improve the performance of the primary task of virtual screening. Additionally, we design a multi-scale feature based Auxiliary Loss Balance Module (ALBM) to balance the auxiliary tasks to improve their efficiency. Extensive experiments demonstrate that Drug-TTA achieves state-of-the-art (SOTA) performance in all five virtual screening tasks under a zero-shot setting, showing an average improvement of 9.86\% in AUROC metric compared to the baseline without test-time adaptation.

Poster

#W-421

CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models

David Dai · Peilin Chen · Malinda Lu · Daniel A. Li · Haowen Wei · Hejie Cui · Paul Pu Liang

Recent advances in clinical AI have enabled remarkable progress across many clinical domains. However, existing benchmarks and models are primarily limited to a small set of modalities and tasks, which hinders the development of large-scale multimodal methods that can make holistic assessments of patient health and well-being. To bridge this gap, we introduce Clinical Large-scale Integrative Multimodal Benchmark (CLIMB), a comprehensive clinical benchmark unifying diverse clinical data across imaging, language, temporal, and graph modalities. CLIMB comprises 4.51 million patient samples totaling 19.01 terabytes distributed across 2D imaging, 3D video, time series, graphs, and multimodal data. Through extensive empirical evaluation, we demonstrate that multitask pretraining significantly improves performance on understudied domains, achieving up to 29% improvement in ultrasound and 23% in ECG analysis over single-task learning. Pretraining on CLIMB also effectively improves models' generalization capability to new tasks, and strong unimodal encoder performance translates well to multimodal performance when paired with task-appropriate fusion strategies. Our findings provide a foundation for new architecture designs and pretraining strategies to adavance clinical AI research. Code is released at https://github.com/DDVD233/climb.

Poster

#W-500

Optimal Information Retention for Time-Series Explanations

Jinghang Yue · Jing Wang · Lu Zhang · Shuo Zhang · Da Li · Zhaoyang Ma · Youfang Lin

Explaining deep models for time-series data is crucial for identifying key patterns in sensitive domains, such as healthcare and finance. However, due to the lack of unified optimization criterion, existing explanation methods often suffer from redundancy and incompleteness, where irrelevant patterns are included or key patterns are missed in explanations. To address this challenge, we propose the Optimal Information Retention Principle, where conditional mutual information defines minimizing redundancy and maximizing completeness as optimization objectives. We then derive the corresponding objective function theoretically. As a practical framework, we introduce an explanation framework ORTE, learning a binary mask to eliminate redundant information while mining temporal patterns of explanations. We decouple the discrete mapping process to ensure the stability of gradient propagation, while employing contrastive learning to achieve precise filtering of explanatory patterns through the mask, thereby realizing a trade-off between low redundancy and high completeness. Extensive quantitative and qualitative experiments on synthetic and real-world datasets demonstrate that the proposed principle significantly improves the accuracy and completeness of explanations compared to baseline methods. The code is available at https://github.com/moon2yue/ORTE_public.

Poster

#W-501

Supervised Contrastive Learning from Weakly-Labeled Audio Segments for Musical Version Matching

Joan Serrà · Recep Oguz Araz · Dmitry Bogdanov · Yuki Mitsufuji

Detecting musical versions (different renditions of the same piece) is a challenging task with important applications. Because of the ground truth nature, existing approaches match musical versions at the track level (e.g., whole song). However, most applications require to match them at the segment level (e.g., 20s chunks). In addition, existing approaches resort to classification and triplet losses, disregarding more recent losses that could bring meaningful improvements. In this paper, we propose a method to learn from weakly annotated segments, together with a contrastive loss variant that outperforms well-studied alternatives. The former is based on pairwise segment distance reductions, while the latter modifies an existing loss following decoupling, hyper-parameter, and geometric considerations. With these two elements, we do not only achieve state-of-the-art results in the standard track-level evaluation, but we also obtain a breakthrough performance in a segment-level evaluation. We believe that, due to the generality of the challenges addressed here, the proposed methods may find utility in domains beyond audio or musical version matching.

Poster

#W-502

FLAM: Frame-Wise Language-Audio Modeling

Yusong Wu · Christos Tsirigotis · Ke Chen · Cheng-Zhi Anna Huang · Aaron Courville · Oriol Nieto · Prem Seetharaman · Justin Salamon

Recent multi-modal audio-language models (ALMs) excel at text-audio retrieval but struggle with frame-wise audio understanding. Prior works use temporal-aware labels or unsupervised training to improve frame-wise capabilities, but they still lack fine-grained labeling capability to pinpoint when an event occurs. While traditional sound event detection models can precisely localize events, they are limited to pre-defined categories, making them ineffective for real-world scenarios with out-of-distribution events. In this work, we introduce FLAM, an open-vocabulary contrastive audio-language model capable of localizing specific sound events. FLAM employs a memory-efficient and calibrated frame-wise objective with logit adjustment to address spurious correlations, such as event dependencies and label imbalances during training. To enable frame-wise supervision, we leverage a large-scale dataset with diverse audio events, LLM-generated captions and simulation. Experimental results and case studies demonstrate that FLAM significantly improves the open-vocabulary localization capability while maintaining strong performance in global retrieval and downstream tasks.

Poster

#W-503

The Case for Learned Provenance-based System Behavior Baseline

Yao Zhu · Zhenyuan LI · yangyang wei · Shouling Ji

Provenance graphs describe data flows and causal dependencies of host activities, enabling to track the data propagation and manipulation throughout the systems, which provide a foundation for intrusion detection. However, these Provenance-based Intrusion Detection Systems (PIDSes) face significant challenges in storage, representation, and analysis, which impede the efficacy of machine learning models such as Graph Neural Networks (GNNs) in processing and learning from these graphs. This paper presents a novel learning-based anomaly detection method designed to efficiently embed and analyze large-scale provenance graphs. Our approach integrates dynamic graph processing with adaptive encoding, facilitating compact embeddings that effectively address out-of-vocabulary (OOV) elements and adapt to normality shifts in dynamic real-world environments. Subsequently, we incorporate this refined baseline into a tag-propagation framework for real-time detection. Our evaluation demonstrates the method's accuracy and adaptability in anomaly path mining, significantly advancing the state-of-the-art in handling and analyzing provenance graphs for anomaly detection.

Poster

#W-504

Enhancing Graph Contrastive Learning for Protein Graphs from Perspective of Invariance

YUSONG WANG · Shiyin Tan · Jialun Shen · Yicheng Xu · Haobo Song · Qi Xu · Prayag Tiwari · Mingkun Xu

Graph Contrastive Learning (GCL) improves Graph Neural Network (GNN)-based protein representation learning by enhancing its generalization and robustness. Existing GCL approaches for protein representation learning rely on 2D topology, where graph augmentation is solely based on topological features, ignoring the intrinsic biological properties of proteins. Besides, 3D structure-based protein graph augmentation remains unexplored, despite proteins inherently exhibiting 3D structures. To bridge this gap, we propose novel biology-aware graph augmentation strategies from the perspective of invariance and integrate them into the protein GCL framework. Specifically, we introduce Functional Community Invariance (FCI)-based graph augmentation, which employs spectral constraints to preserve topology-driven community structures while incorporating residue-level chemical similarity as edge weights to guide edge sampling and maintain functional communities. Furthermore, we propose 3D Protein Structure Invariance (3-PSI)-based graph augmentation, leveraging dihedral angle perturbations and secondary structure rotations to retain critical 3D structural information of proteins while diversifying graph views.Extensive experiments on four different protein-related tasks demonstrate the superiority of our proposed GCL protein representation learning framework.

Poster

#W-506

Fast and Low-Cost Genomic Foundation Models via Outlier Removal

Haozheng Luo · Chenghao Qiu · Maojiang Su · Zhihan Zhou · Zoe Mehta · Guo Ye · Jerry Yao-Chieh Hu · Han Liu

To address the challenge of scarce computational resources in genomic modeling, we introduce GERM, a genomic foundation model optimized for accessibility and adaptability. GERM improves upon models like DNABERT-2 by eliminating outliers that hinder low-rank adaptation and post-training quantization, enhancing both efficiency and robustness. We replace the vanilla attention layer with an outlier-free mechanism inspired by associative memory models. By removing outliers during both pre-training and fine-tuning, this approach accelerates adaptation, reduces computational costs, and enhances quantization robustness within acceptable loss margins. Additionally, we propose GERM-T, a strategy that employs small-step continual learning within the outlier-free framework, leveraging original checkpoints to avoid retraining from scratch. Empirically, GERM improves fine-tuning performance by 37.98% and quantization by 64.34% over the baseline model. It also reduces average kurtosis by 92.14% and maximum infinity norm by 82.77%. Compared to leading methods, GERM consistently delivers superior performance, offering a practical solution for genomic modeling in resource-constrained settings.

Poster

#W-507

Analytical Lyapunov Function Discovery: An RL-based Generative Approach

Haohan Zou · Jie Feng · Hao Zhao · Yuanyuan Shi

Despite advances in learning-based methods, finding valid Lyapunov functions for nonlinear dynamical systems remains challenging. Current neural network approaches face two main issues: challenges in scalable verification and limited interpretability. To address these, we propose an end-to-end framework using transformers to construct analytical Lyapunov functions (local), which simplifies formal verification, enhances interpretability, and provides valuable insights for control engineers. Our framework consists of a transformer-based trainer that generates candidate Lyapunov functions and a falsifier that verifies candidate expressions and refines the model via risk-seeking policy gradient. Unlike Alfarano et al. (2024), which utilizes pre-training and seeks global Lyapunov functions for low-dimensional systems, our model is trained from scratch via reinforcement learning (RL) and succeeds in finding local Lyapunov functions for high-dimensional and non-polynomial systems. Given the symbolic nature of the Lyapunov function candidates, we employ efficient optimization methods for falsification during training and formal verification tools for the final verification. We demonstrate the efficiency of our approach on a range of nonlinear dynamical systems with up to ten dimensions and show that it can discover Lyapunov functions not previously identified in the control literature. Full implementation is available on Github.

Poster

#W-508

OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling

Hongliang Lu · Zhonglin Xie · Yaoyu Wu · Can Ren · Yuxuan Chen · Zaiwen Wen

Despite the rapid development of large language models (LLMs), a fundamental challenge persists: the lack of high-quality optimization modeling datasets hampers LLMs' robust modeling of practical optimization problems from natural language descriptions (NL). This data scarcity also contributes to the generalization difficulties experienced by learning-based methods.To address these challenges, we propose a scalable framework for synthesizing a high-quality dataset, named OptMATH. Starting from curated seed data with mathematical formulations (MF), this framework automatically generates problem data (PD) with controllable complexity. Then, a back-translation step is employed to obtain NL. To verify the correspondence between the NL and the PD, a forward modeling step followed by rejection sampling is used. The accepted pairs constitute the training part of OptMATH. Then a collection of rejected pairs is identified and further filtered. This collection serves as a new benchmark for optimization modeling, containing difficult instances whose lengths are much longer than these of NL4OPT and MAMO.Through extensive experiments, we demonstrate that models of various sizes (0.5B-32B parameters) trained on OptMATH achieve superior results on multiple modeling benchmarks, thereby validating the effectiveness and scalability of our approach. The OptMATH dataset and related resources are available at \url{https://github.com/optsuite/OptMATH}.

Poster

#W-509

Stochastic Deep Restoration Priors for Imaging Inverse Problems

Yuyang Hu · Albert Peng · Weijie Gan · Peyman Milanfar · Mauricio Delbracio · Ulugbek Kamilov

Deep neural networks trained as image denoisers are widely used as priors for solving imaging inverse problems. We introduce Stochastic deep Restoration Priors (ShaRP), a novel framework that stochastically leverages an ensemble of deep restoration models beyond denoisers to regularize inverse problems. By using generalized restoration models trained on a broad range of degradations beyond simple Gaussian noise, ShaRP effectively addresses structured artifacts and enables self-supervised training without fully sampled data. We prove that ShaRP minimizes an objective function involving a regularizer derived from the score functions of minimum mean square error (MMSE) restoration operators. We also provide theoretical guarantees for learning restoration operators from incomplete measurements. ShaRP achieves state-of-the-art performance on tasks such as magnetic resonance imaging reconstruction and single-image super-resolution, surpassing both denoiser- and diffusion-model-based methods without requiring retraining.

Poster

#W-510

ROS: A GNN-based Relax-Optimize-and-Sample Framework for Max-$k$-Cut Problems

Yeqing Qiu · Ye XUE · Akang Wang · Yiheng Wang · Qingjiang Shi · Zhiquan Luo

The Max-$k$-Cut problem is a fundamental combinatorial optimization challenge that generalizes the classic $\mathcal{NP}$-complete Max-Cut problem. While relaxation techniques are commonly employed to tackle Max-$k$-Cut, they often lack guarantees of equivalence between the solutions of the original problem and its relaxation. To address this issue, we introduce the Relax-Optimize-and-Sample (ROS) framework. In particular, we begin by relaxing the discrete constraints to the continuous probability simplex form. Next, we pre-train and fine-tune a graph neural network model to efficiently optimize the relaxed problem. Subsequently, we propose a sampling-based construction algorithm to map the continuous solution back to a high-quality Max-$k$-Cut solution. By integrating geometric landscape analysis with statistical theory, we establish the consistency of function values between the continuous solution and its mapped counterpart. Extensive experimental results on random regular graphs and the Gset benchmark demonstrate that the proposed ROS framework effectively scales to large instances with up to $20,000$ nodes in just a few seconds, outperforming state-of-the-art algorithms. Furthermore, ROS exhibits strong generalization capabilities across both in-distribution and out-of-distribution instances, underscoring its effectiveness for large-scale optimization tasks.

Poster

#W-511

An in depth look at the Procrustes-Wasserstein distance: properties and barycenters

Davide Adamo · Marco Corneli · Manon Vuillien · Emmanuelle Vila

Due to its invariance to rigid transformations such as rotations and reflections, Procrustes-Wasserstein (PW) was introduced in the literature as an optimal transport (OT) distance, alternative to Wasserstein and more suited to tasks such as the alignment and comparison of point clouds. Having that application in mind, we carefully build a space of discrete probability measures and show that over that space PW actually is a distance. Algorithms to solve the PW problems already exist, however we extend the PW framework by discussing and testing several initialization strategies. We then introduce the notion of PW barycenter and detail an algorithm to estimate it from the data. The result is a new method to compute representative shapes from a collection of point clouds. We benchmark our method against existing OT approaches, demonstrating superior performance in scenarios requiring precise alignment and shape preservation. We finally show the usefulness of the PW barycenters in an archaeological context. Our results highlight the potential of PW in advancing 2D and 3D point cloud analysis for machine learning and computational geometry applications.

Poster

#W-512

MVA: Linear Attention with High-order Query-Keys Integration and Multi-level Vocabulary Decomposition

ning wang · Zekun Li · Tongxin Bai · Man Yao · Zhen Qin · Guoqi Li

Linear attention offers the advantages of linear inference time and fixed memory usage compared to Softmax attention. However, training large-scale language models with linear attention from scratch remains prohibitively expensive and exhibits significant performance gaps compared to Softmax-based models. To address these challenges, we focus on transforming pre-trained Softmax-based language models into linear attention models. We unify mainstream linear attention methods using a high-order QK integration theory and a multi-level vocabulary decomposition. Specifically, the QK integration theory explains the efficacy of combining linear and sparse attention from the perspective of information collection across different frequency bands. The multi-level vocabulary decomposition exponentially expands memory capacity by recursively exploiting compression loss from compressed states. Through detailed error analysis, we demonstrate superior approximation of Softmax attention achieved by our approach. To further improve performance and reduce training costs, we adopt a soft integration strategy with attention scores, effectively combining a sliding window mechanism. With less than 100M tokens, our method fine-tunes models to achieve linear complexity while retaining 99\% of their original performance. Compared to state-of-the-art linear attention model and method, our approach improves MMLU scores by 1.2 percentage points with minimal fine-tuning. Furthermore, even without the sliding window mechanism, our method achieves state-of-the-art performance on all test sets with 10B tokens.

Poster

#W-513

Safe-EF: Error Feedback for Non-smooth Constrained Optimization

Rustem Islamov · Yarden As · Ilyas Fatkhullin

Federated learning faces severe communication bottlenecks due to the high dimensionality of model updates. Communication compression with contractive compressors (e.g., Top-$K$) is often preferable in practice but can degrade performance without proper handling. Error feedback (EF) mitigates such issues but has been largely restricted for smooth, unconstrained problems, limiting its real-world applicability where non-smooth objectives and safety constraints are critical. We advance our understanding of EF in the canonical non-smooth convex setting by establishing new lower complexity bounds for first-order algorithms with contractive compression. Next, we propose Safe-EF, a novel algorithm that matches our lower bound (up to a constant) while enforcing safety constraints essential for practical applications. Extending our approach to the stochastic setting, we bridge the gap between theory and practical implementation. Extensive experiments in a reinforcement learning setup, simulating distributed humanoid robot training, validate the effectiveness of Safe-EF in ensuring safety and reducing communication complexity.

Poster

#W-514

Causality Inspired Federated Learning for OOD Generalization

Jiayuan Zhang · Xuefeng Liu · Jianwei Niu · Shaojie Tang · Haotian Yang · Xinghao Wu

The out-of-distribution (OOD) generalization problem in federated learning (FL) has recently attracted significant research interest. A common approach, derived from centralized learning, is to extract causal features which exhibit causal relationships with the label. However, in FL, the global feature extractor typically captures only invariant causal features shared across clients and thus discards many other causal features that are potentially useful for OOD generalization. To address this problem, we propose FedUni, a simple yet effective architecture trained to extract all possible causal features from any input. FedUni consists of a comprehensive feature extractor, designed to identify a union of all causal feature types in the input, followed by a feature compressor, which discards potential \textit{inactive} causal features. With this architecture, FedUni can benefit from collaborative training in FL while avoiding the cost of model aggregation (i.e., extracting only invariant features). In addition, to further enhance the feature extractor's ability to capture causal features, FedUni add a causal intervention module on the client side, which employs a counterfactual generator to generate counterfactual examples that simulate distributions shifts. Extensive experiments and theoretical analysis demonstrate that our method significantly improves OOD generalization performance.

Poster

#W-515

Beyond Communication Overhead: A Multilevel Monte Carlo Approach for Mitigating Compression Bias in Distributed Learning

Ze'ev Zukerman · Bassel Hamoud · Kfir Levy

Distributed learning methods have gained substantial momentum in recent years, with communication overhead often emerging as a critical bottleneck. Gradient compression techniques alleviate communication costs but involve an inherent trade-off between the empirical efficiency of biased compressors and the theoretical guarantees of unbiased compressors. In this work, we introduce a novel Multilevel Monte Carlo (MLMC) compression scheme that leverages biased compressors to construct statistically unbiased estimates. This approach effectively bridges the gap between biased and unbiased methods, combining the strengths of both. To showcase the versatility of our method, we apply it to popular compressors, like Top-$k$ and bit-wise compressors, resulting in enhanced variants. Furthermore, we derive an adaptive version of our approach to further improve its performance. We validate our method empirically on distributed deep learning tasks.

Poster

#W-516

Efficient Optimization with Orthogonality Constraint: a Randomized Riemannian Submanifold Method

Andi Han · Pierre-Louis Poirion · Akiko Takeda

Optimization with orthogonality constraints frequently arises in various fields such as machine learning. Riemannian optimization offers a powerful framework for solving these problems by equipping the constraint set with a Riemannian manifold structure and performing optimization intrinsically on the manifold. This approach typically involves computing a search direction in the tangent space and updating variables via a retraction operation. However, as the size of the variables increases, the computational cost of the retraction can become prohibitively high, limiting the applicability of Riemannian optimization to large-scale problems. To address this challenge and enhance scalability, we propose a novel approach that restricts each update on a random submanifold, thereby significantly reducing the per-iteration complexity. We introduce two sampling strategies for selecting the random submanifolds and theoretically analyze the convergence of the proposed methods. We provide convergence results for general nonconvex functions and functions that satisfy Riemannian Polyak–Łojasiewicz condition as well as for stochastic optimization settings. Additionally, we demonstrate how our approach can be generalized to quotient manifolds derived from the orthogonal manifold. Extensive experiments verify the benefits of the proposed method, across a wide variety of problems.

Poster

#W-517

Attention-Only Transformers via Unrolled Subspace Denoising

Peng Wang · Yifu Lu · Yaodong Yu · Druv Pai · Qing Qu · Yi Ma

Despite the popularity of transformers in practice, their architectures are empirically designed and neither mathematically justified nor interpretable. Moreover, as indicated by many empirical studies, some components of transformer architectures may be redundant. To derive a fully interpretable transformer architecture with only necessary components, we contend that the goal of representation learning is to compress a set of noisy initial token representations towards a mixture of low-dimensional subspaces. To compress these noisy token representations, an associated denoising operation naturally takes the form of a multi-head (subspace) self-attention. By unrolling such iterative denoising operations into a deep network, we arrive at a highly compact architecture that consists of \textit{only} self-attention operators with skip connections at each layer. Moreover, we show that each layer performs highly efficient denoising: it improves the signal-to-noise ratio of token representations \textit{at a linear rate} with respect to the number of layers. Despite its simplicity, extensive experiments on vision and language tasks demonstrate that such a transformer achieves performance close to that of standard transformer architectures such as GPT-2 and CRATE.

Poster

#W-518

Recommendations with Sparse Comparison Data: Provably Fast Convergence for Nonconvex Matrix Factorization

Suryanarayana Sankagiri · Jalal Etesami · Matthias Grossglauser

In this paper, we consider a recommender system that elicits user feedback through pairwise comparisons instead of ratings. We study the problem of learning personalised preferences from such comparison data via collaborative filtering. Similar to the classical matrix completion setting, we assume that users and items are endowed with low-dimensional latent features. These features give rise to user-item utilities, and the comparison outcomes are governed by a discrete choice model over these utilities. The task of learning these features is then formulated as a maximum likelihood problem over the comparison dataset. Despite the resulting optimization problem being nonconvex, we show that gradient-based methods converge exponentially to the latent features, given a warm start. Importantly, this result holds in a sparse data regime, where each user compares only a few pairs of items. Our main technical contribution is to extend key concentration results commonly used in matrix completion to our model. Simulations reveal that the empirical performance of the method exceeds theoretical predictions, even when some assumptions are relaxed. Our work demonstrates that learning personalised recommendations from comparison data is both computationally and statistically efficient.

Poster

#W-519

Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed

Savelii Chezhegov · Klyukin Yaroslav · Andrei Semenov · Aleksandr Beznosikov · Alexander Gasnikov · Samuel Horváth · Martin Takac · Eduard Gorbunov

Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models. Typically, the noise in the stochastic gradients is heavy-tailed for the later ones. Gradient clipping provably helps to achieve good high-probability convergence for such noises. However, despite the similarity between AdaGrad/Adam and Clip-SGD, the current understanding of the high-probability convergence of AdaGrad/Adam-type methods is limited in this case. In this work, we prove that AdaGrad/Adam (and their delayed version) can have provably bad high-probability convergence if the noise is heavy-tailed. We also show that gradient clipping fixes this issue, i.e., we derive new high-probability convergence bounds with polylogarithmic dependence on the confidence level for AdaGrad-Norm and Adam-Norm with clipping and with/without delay for smooth convex/non-convex stochastic optimization with heavy-tailed noise. We extend our results to the case of AdaGrad/Adam with delayed stepsizes. Our empirical evaluations highlight the superiority of clipped versions of AdaGrad/Adam in handling the heavy-tailed noise.

Poster

#W-521

Clipped SGD Algorithms for Performative Prediction: Tight Bounds for Stochastic Bias and Remedies

Qiang Li · Michal Yemini · Hoi To Wai

This paper studies the convergence of clipped stochastic gradient descent (SGD) algorithms with decision-dependent data distribution. Our setting is motivated by privacy preserving optimization algorithms that interact with performative data where the prediction models can influence future outcomes. This challenging setting involves the non-smooth clipping operator and non-gradient dynamics due to distribution shifts. We make two contributions in pursuit for a performative stable solution with these algorithms. First, we characterize the stochastic bias with projected clipped SGD (PCSGD) algorithm which is caused by the clipping operator that prevents PCSGD from reaching a stable solution. When the loss function is strongly convex, we quantify the lower and upper bounds for this stochastic bias and demonstrate a bias amplification phenomenon with the sensitivity of data distribution. When the loss function is non-convex, we bound the magnitude of stationarity bias. Second, we propose remedies to mitigate the bias either by utilizing an optimal step size design for PCSGD, or to apply the recent DiceSGD algorithm [Zhang et al., 2024]. Our analysis is also extended to show that the latter algorithm is free from stochastic bias in the performative setting. Numerical experiments verify our findings.

Spotlight Poster

#W-600

Hyperspherical Normalization for Scalable Deep Reinforcement Learning

Hojoon Lee · Youngdo Lee · Takuma Seno · Donghu Kim · Peter Stone · Jaegul Choo

Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstable optimization.In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norm by hyperspherical normalization; and (ii) using a distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using the soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains.

Poster

#W-601

Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models

Minh-Tung Luu · Younghwan Lee · Donghoon Lee · Sunho Kim · MinJun Kim · Chang Yoo

Designing effective reward functions remains a fundamental challenge in reinforcement learning (RL), as it often requires extensive human effort and domain expertise. While RL from human feedback has been successful in aligning agents with human intent, acquiring high-quality feedback is costly and labor-intensive, limiting its scalability. Recent advancements in foundation models present a promising alternative--leveraging AI-generated feedback to reduce reliance on human supervision in reward learning. Building on this paradigm, we introduce ERL-VLM, an enhanced rating-based RL method that effectively learns reward functions from AI feedback. Unlike prior methods that rely on pairwise comparisons, ERL-VLM queries large vision-language models (VLMs) for absolute ratings of individual trajectories, enabling more expressive feedback and improved sample efficiency. Additionally, we propose key enhancements to rating-based RL, addressing instability issues caused by data imbalance and noisy labels. Through extensive experiments across both low-level and high-level control tasks, we demonstrate that ERL-VLM significantly outperforms existing VLM-based reward generation methods. Our results demonstrate the potential of AI feedback for scaling RL with minimal human intervention, paving the way for more autonomous and efficient reward learning.

Poster

#W-602

Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning

Motoki Omura · Kazuki Ota · Takayuki Osa · Yusuke Mukuta · Tatsuya Harada

For continuous action spaces, actor-critic methods are widely used in online reinforcement learning (RL). However, unlike RL algorithms for discrete actions, which generally model the optimal value function using the Bellman optimality operator, RL algorithms for continuous actions typically model Q-values for the current policy using the Bellman operator. These algorithms for continuous actions rely exclusively on policy updates for improvement, which often results in low sample efficiency. This study examines the effectiveness of incorporating the Bellman optimality operator into actor-critic frameworks. Experiments in a simple environment show that modeling optimal values accelerates learning but leads to overestimation bias. To address this, we propose an annealing approach that gradually transitions from the Bellman optimality operator to the Bellman operator, thereby accelerating learning while mitigating bias. Our method, combined with TD3 and SAC, significantly outperforms existing approaches across various locomotion and manipulation tasks, demonstrating improved performance and robustness to hyperparameters related to optimality.

Poster

#W-603

Constrained Exploitability Descent: An Offline Reinforcement Learning Method for Finding Mixed-Strategy Nash Equilibrium

Runyu Lu · Yuanheng Zhu · Dongbin Zhao

This paper proposes Constrained Exploitability Descent (CED), a model-free offline reinforcement learning (RL) algorithm for solving adversarial Markov games (MGs). CED combines the game-theoretical approach of Exploitability Descent (ED) with policy constraint methods from offline RL. While policy constraints can perturb the optimal pure-strategy solutions in single-agent scenarios, we find the side effect less detrimental in adversarial games, where the optimal policy can be a mixed-strategy Nash equilibrium. We theoretically prove that, under the uniform coverage assumption on the dataset, CED converges to a stationary point in deterministic two-player zero-sum Markov games. We further prove that the min-player policy at the stationary point follows the property of mixed-strategy Nash equilibrium in MGs. Compared to the model-based ED method that optimizes the max-player policy, our CED method no longer relies on a generalized gradient. Experiments in matrix games, a tree-form game, and an infinite-horizon soccer game verify that CED can find an equilibrium policy for the min-player as long as the offline dataset guarantees uniform coverage. Besides, CED achieves a significantly lower NashConv compared to an existing pessimism-based method and can gradually improve the behavior policy even under non-uniform data coverages. When combined with neural networks, CED also outperforms behavior cloning and offline self-play in a large-scale two-team robotic combat game.

Poster

#W-604

Q-Supervised Contrastive Representation: A State Decoupling Framework for Safe Offline Reinforcement Learning

Zhihe Yang · Yunjian Xu · Yang Zhang

Safe offline reinforcement learning (RL), which aims to learn the safety-guaranteed policy without risky online interaction with environments, has attracted growing recent attention for safety-critical scenarios. However, existing approaches encounter out-of-distribution problems during the testing phase, which can result in potentially unsafe outcomes. This issue arises due to the infinite possible combinations of reward-related and cost-related states. In this work, we propose State Decoupling with Q-supervised Contrastive representation (SDQC), a novel framework that decouples the global observations into reward- and cost-related representations for decision-making, thereby improving the generalization capability for unfamiliar global observations. Compared with the classical representation learning methods, which typically require model-based estimation (e.g., bisimulation), we theoretically prove that our Q-supervised method generates a coarser representation while preserving the optimal policy, resulting in improved generalization performance. Experiments on DSRL benchmark problems provide compelling evidence that SDQC surpasses other baseline algorithms, especially for its exceptional ability to achieve almost zero violations in more than half of the tasks, while the state-of-the-art algorithm can only achieve the same level of success in a quarter of the tasks. Further, we demonstrate that SDQC possesses superior generalization ability when confronted with unseen environments.

Poster

#W-605

CERTAIN: Context Uncertainty-aware One-Shot Adaptation for Context-based Offline Meta Reinforcement Learning

Hongtu Zhou · Ruiling Yang · Yakun Zhu · Haoqi Zhao · Hai Zhang · Di Zhang · Junqiao Zhao · Chen Ye · Changjun Jiang

Existing context-based offline meta-reinforcement learning (COMRL) methods primarily focus on task representation learning and given-context adaptation performance. They often assume that the adaptation context is collected using task-specific behavior policies or through multiple rounds of collection. However, in real applications, the context should be collected by a policy in a one-shot manner to ensure efficiency and safety. We find that intrinsic context ambiguity across multiple tasks and out-of-distribution (OOD) issues due to distribution shift significantly affect the performance of one-shot adaptation, which has been largely overlooked in most COMRL research. To address this problem, we propose using heteroscedastic uncertainty in representation learning to identify ambiguous and OOD contexts, and train an uncertainty-aware context collecting policy for effective one-shot online adaptation. The proposed method can be integrated into various COMRL frameworks, including classifier-based, reconstrution-based and contrastive learning-based approaches. Empirical evaluations on benchmark tasks show that our method can improve one-shot adaptation performance by up to 36% and zero-shot adaptation performance by up to 34% compared to existing baseline COMRL methods.

Poster

#W-606

Habitizing Diffusion Planning for Efficient and Effective Decision Making

Haofei Lu · Yifei Shen · Dongsheng Li · Junliang Xing · Dongqi Han

Diffusion models have shown great promise in decision-making, also known as diffusion planning. However, the slow inference speeds limit their potential for broader real-world applications. Here, we introduce Habi, a general framework that transforms powerful but slow diffusion planning models into fast decision-making models, which mimics the cognitive process in the brain that costly goal-directed behavior gradually transitions to efficient habitual behavior with repetitive practice. Even using a laptop CPU, the habitized model can achieve an average 800+ Hz decision-making frequency (faster than previous diffusion planners by orders of magnitude) on standard offline reinforcement learning benchmarks D4RL, while maintaining comparable or even higher performance compared to its corresponding diffusion planner. Our work proposes a fresh perspective of leveraging powerful diffusion models for real-world decision-making tasks. We also provide robust evaluations and analysis, offering insights from both biological and engineering perspectives for efficient and effective decision-making.

Poster

#W-607

Off-Policy Evaluation under Nonignorable Missing Data

Han Wang · Yang Xu · Wenbin Lu · Rui Song

Off-Policy Evaluation (OPE) aims to estimate the value of a target policy using offline data collected from potentially different policies. In real-world applications, however, logged data often suffers from missingness. While OPE has been extensively studied in the literature, a theoretical understanding of how missing data affects OPE results remains unclear. In this paper, we investigate OPE in the presence of monotone missingness and theoretically demonstrate that the value estimates remain unbiased under ignorable missingness but can be biased under nonignorable (informative) missingness. To retain the consistency of value estimation, we propose an inverse probability weighting value estimator and conduct statistical inference to quantify the uncertainty of the estimates. Through a series of numerical experiments, we empirically demonstrate that our proposed estimator yields a more reliable value inference under missing data.

Poster

#W-608

Strategic Planning: A Top-Down Approach to Option Generation

Max Ruiz Luyten · Antonin Berthon · Mihaela van der Schaar

Real-world human decision-making often relies on strategic planning, where high-level goals guide the formulation of sub-goals and subsequent actions, as evidenced by domains such as healthcare, business, and urban policy. Despite notable successes in controlled settings, conventional reinforcement learning (RL) follows a bottom-up paradigm, which can struggle to adapt to real-world complexities such as sparse rewards and limited exploration budgets. While methods like hierarchical RL and environment shaping provide partial solutions, they frequently rely on either ad-hoc designs (e.g. choose the set of high-level actions) or purely data-driven discovery of high-level actions that still requires significant exploration. In this paper, we introduce a top-down framework for RL that explicitly leverages human-like strategy to reduce sample complexity, guide exploration, and enable high-level decision-making. We first formalize the Strategy Problem, which frames policy generation as finding distributions over policies that balance specificity and value. Building on this definition, we propose the Strategist agent—an iterative framework that leverages large language models (LLMs) to synthesize domain knowledge into a structured representation of actionable strategies and sub-goals. We further develop a reward shaping methodology that translates these strategies expressed in natural language into quantitative feedback for RL methods. Empirically, we demonstrate a significantly faster convergence than conventional PPO. Taken together, our findings highlight that top-down strategic exploration opens new avenues for enhancing RL on real-world decision problems.

Poster

#W-609

Action-Dependent Optimality-Preserving Reward Shaping

Grant Forbes · Jianxun Wang · Leonardo Villalobos-Arias · Arnav Jhala · David Roberts

Recent RL research has utilized reward shaping--particularly complex shaping rewards such as intrinsic motivation (IM)--to encourage agent exploration in sparse-reward environments. While often effective, ``reward hacking'' can lead to the shaping reward being optimized at the expense of the extrinsic reward, resulting in a suboptimal policy. Potential-Based Reward Shaping (PBRS) techniques such as Generalized Reward Matching (GRM) and Policy-Invariant Explicit Shaping (PIES) have mitigated this. These methods allow for implementing IM without altering optimal policies. In this work we show that they are effectively unsuitable for complex, exploration-heavy environments with long-duration episodes. To remedy this, we introduce Action-Dependent Optimality Preserving Shaping (ADOPS), a method of converting intrinsic rewards to an optimality-preserving form that allows agents to utilize IM more effectively in the extremely sparse environment of Montezuma's Revenge. We also prove ADOPS accommodates reward shaping functions that cannot be written in a potential-based form: while PBRS-based methods require the cumulative discounted intrinsic return be independent of actions, ADOPS allows for intrinsic cumulative returns to be dependent on agents' actions while still preserving the optimal policy set. We show how action-dependence enables ADOPS's to preserve optimality while learning in complex, sparse-reward environments where other methods struggle.

Poster

#W-610

Task-Agnostic Pre-training and Task-Guided Fine-tuning for Versatile Diffusion Planner

Chenyou Fan · Chenjia Bai · Zhao Shan · Haoran He · Yang Zhang · Zhen Wang

Diffusion models have demonstrated their capabilities in modeling trajectories of multi-tasks. However, existing multi-task planners or policies typically rely on task-specific demonstrations via multi-task imitation, or require task-specific reward labels to facilitate policy optimization via Reinforcement Learning (RL). They are costly due to the substantial human efforts required to collect expert data or design reward functions. To address these challenges, we aim to develop a versatile diffusion planner capable of leveraging large-scale inferior data that contains task-agnostic sub-optimal trajectories, with the ability to fast adapt to specific tasks. In this paper, we propose SODP, a two-stage framework that leverages Sub-Optimal data to learn a Diffusion Planner, which is generalizable for various downstream tasks. Specifically, in the pre-training stage, we train a foundation diffusion planner that extracts general planning capabilities by modeling the versatile distribution of multi-task trajectories, which can be sub-optimal and has wide data coverage. Then for downstream tasks, we adopt RL-based fine-tuning with task-specific rewards to quickly refine the diffusion planner, which aims to generate action sequences with higher task-specific returns. Experimental results from multi-task domains including Meta-World and Adroit demonstrate that SODP outperforms state-of-the-art methods with only a small amount of data for reward-guided fine-tuning.

Poster

#W-611

Wasserstein Policy Optimization

David Pfau · Ian Davies · Diana Borsa · João Madeira Araujo · Brendan Tracey · Hado van Hasselt

We introduce Wasserstein Policy Optimization (WPO), an actor-critic algorithm for reinforcement learning in continuous action spaces. WPO can be derived as an approximation to Wasserstein gradient flow over the space of all policies projected into a finite-dimensional parameter space (e.g., the weights of a neural network), leading to a simple and completely general closed-form update. The resulting algorithm combines many properties of deterministic and classic policy gradient methods. Like deterministic policy gradients, it exploits knowledge of the gradient of the action-value function with respect to the action. Like classic policy gradients, it can be applied to stochastic policies with arbitrary distributions over actions -- without using the reparameterization trick. We show results on the DeepMind Control Suite and a magnetic confinement fusion task which compare favorably with state-of-the-art continuous control methods.

Spotlight Poster

#W-612

Controlling Underestimation Bias in Constrained Reinforcement Learning for Safe Exploration

Shiqing Gao · Jiaxin Ding · Luoyi Fu · Xinbing Wang

Constrained Reinforcement Learning (CRL) aims to maximize cumulative rewards while satisfying constraints. However, existing CRL algorithms often encounter significant constraint violations during training, limiting their applicability in safety-critical scenarios. In this paper, we identify the underestimation of the cost value function as a key factor contributing to these violations. To address this issue, we propose the Memory-driven Intrinsic Cost Estimation (MICE) method, which introduces intrinsic costs to mitigate underestimation and control bias to promote safer exploration. Inspired by flashbulb memory, where humans vividly recall dangerous experiences to avoid risks, MICE constructs a memory module that stores previously explored unsafe states to identify high-cost regions. The intrinsic cost is formulated as the pseudo-count of the current state visiting these risk regions. Furthermore, we propose an extrinsic-intrinsic cost value function that incorporates intrinsic costs and adopts a bias correction strategy. Using this function, we formulate an optimization objective within the trust region, along with corresponding optimization methods. Theoretically, we provide convergence guarantees for the proposed cost value function and establish the worst-case constraint violation for the MICE update. Extensive experiments demonstrate that MICE significantly reduces constraint violations while preserving policy performance comparable to baselines.

Spotlight Poster

#W-613

Temporal Difference Flows

Jesse Farebrother · Matteo Pirotta · Andrea Tirinzoni · REMI MUNOS · Alessandro Lazaric · Ahmed Touati

Predictive models of the future are fundamental for an agent's ability to reason and plan. A common strategy learns a world model and unrolls it step-by-step at inference, where small errors can rapidly compound. Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states, avoiding cumulative inference errors. While GHMs can be conveniently learned by a generative analog to temporal difference (TD) learning, existing methods are negatively affected by bootstrapping predictions at train time and struggle to generate high-quality predictions at long horizons. This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods. Theoretically, we establish a new convergence result and primarily attribute TD-Flow's efficacy to reduced gradient variance during training. We further show that similar arguments can be extended to diffusion-based methods. Empirically, we validate TD-Flow across a diverse set of domains on both generative metrics and downstream tasks, including policy evaluation. Moreover, integrating TD-Flow with recent behavior foundation models for planning over policies demonstrates substantial performance gains, underscoring its promise for long-horizon decision-making.

Poster

#W-614

Learning Fused State Representations for Control from Multi-View Observations

Zeyu Wang · Yao-Hui Li · Xin Li · Hongyu Zang · Romain Laroche · Riashat Islam

Multi-View Reinforcement Learning (MVRL) seeks to provide agents with multi-view observations, enabling them to perceive environment with greater effectiveness and precision. Recent advancements in MVRL focus on extracting latent representations from multiview observations and leveraging them in control tasks. However, it is not straightforward to learn compact and task-relevant representations, particularly in the presence of redundancy, distracting information, or missing views. In this paper, we propose Multi-view Fusion State for Control (MFSC), firstly incorporating bisimulation metric learning into MVRL to learn task-relevant representations. Furthermore, we propose a multiview-based mask and latent reconstruction auxiliary task that exploits shared information across views and improves MFSC’s robustness in missing views by introducing a mask token. Extensive experimental results demonstrate that our method outperforms existing approaches in MVRL tasks. Even in more realistic scenarios with interference or missing views, MFSC consistently maintains high performance. The project code is available at https://github.com/zpwdev/MFSC.

Poster

#W-615

Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks

Luise Ge · Michael Lanier · Anindya Sarkar · Bengisu Guresti · Chongjie Zhang · Yevgeniy Vorobeychik

Many dynamic decision problems, such as robotic control, involve a series of tasks, many of which are unknown at training time.Typical approaches for these problems, such as multi-task and meta reinforcement learning, do not generalize well when the tasks are diverse. On the other hand, approaches that aim to tackle task diversity, such as using task embedding as policy context and task clustering, typically lack performance guarantees and require a large number of training tasks. To address these challenges, we propose a novel approach for learning a policy committee that includes at least one near-optimal policy with high probability for tasks encountered during execution. While we show that this problem is in general inapproximable, we present two practical algorithmic solutions.The first yields provable approximation and task sample complexity guarantees when tasks are low-dimensional (the best we can do due to inapproximability), whereas the second is a general and practical gradient-based approach. In addition, we provide a provable sample complexity bound for few-shot learning. Our experiments on MuJoCo and Meta-World show that the proposed approach outperforms state-of-the-art multi-task, meta-, and task clustering baselines in training, generalization, and few-shot learning, often by a large margin. Our code is available at https://github.com/CERL-WUSTL/PACMAN.

Poster

#W-616

Model-Based Exploration in Monitored Markov Decision Processes

Alireza Kazemipour · Matthew Taylor · Michael Bowling

A tenet of reinforcement learning is that the agent always observes rewards. However, this is not true in many realistic settings, e.g., a human observer may not always be available to provide rewards, sensors may be limited or malfunctioning, or rewards may be inaccessible during deployment. Monitored Markov decision processes (Mon-MDPs) have recently been proposed to model such settings. However, existing Mon-MDP algorithms have several limitations: they do not fully exploit the problem structure, cannot leverage a known monitor, lack worst-case guarantees for "unsolvable" Mon-MDPs without specific initialization, and offer only asymptotic convergence proofs. This paper makes three contributions. First, we introduce a model-based algorithm for Mon-MDPs that addresses these shortcomings. The algorithm employs two instances of model-based interval estimation: one to ensure that observable rewards are reliably captured, and another to learn the minimax-optimal policy. Second, we empirically demonstrate the advantages. We show faster convergence than prior algorithms in more than four dozen benchmarks, and even more dramatic improvements when the monitoring process is known. Third, we present the first finite-sample bound on performance. We show convergence to a minimax-optimal policy even when some rewards are never observable.

Poster

#W-617

ADDQ: Adaptive distributional double Q-learning

Leif Döring · Benedikt Wille · Maximilian Birr · Mihail Bîrsan · Martin Slowik

Bias problems in the estimation of Q-values are a well-known obstacle that slows down convergence of Q-learning and actor-critic methods. One of the reasons of the success of modern RL algorithms is partially a direct or indirect overestimation reduction mechanism. We introduce an easy to implement method built on top of distributional reinforcement learning (DRL) algorithms to deal with the overestimation in a locally adaptive way. Our framework ADDQ is simple to implement, existing DRL implementations can be improved with a few lines of code. We provide theoretical backup and experimental results in tabular, Atari, and MuJoCo environments, comparisons with state-of-the-art methods, and a proof of convergence in the tabular case.

Poster

#W-618

Zero-Shot Offline Imitation Learning via Optimal Transport

Thomas Rupf · Marco Bagatella · Nico Gürtler · Jonas Frey · Georg Martius

Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent's immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks. The code is available at https://github.com/martius-lab/zilot.

Poster

#W-619

Gap-Dependent Bounds for Federated $Q$-Learning

Haochen Zhang · Zhong Zheng · Lingzhou Xue

We present the first gap-dependent analysis of regret and communication cost for on-policy federated $Q$-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing FRL methods focus on worst-case scenarios, leading to $\sqrt{T}$-type regret bounds and communication cost bounds with a $\log T$ term scaling with the number of agents $M$, states $S$, and actions $A$, where $T$ is the average total number of steps per agent. In contrast, our novel framework leverages the benign structures of MDPs, such as a strictly positive suboptimality gap, to achieve a $\log T$-type regret bound and a refined communication cost bound that disentangles exploration and exploitation. Our gap-dependent regret bound reveals a distinct multi-agent speedup pattern, and our gap-dependent communication cost bound removes the dependence on $MSA$ from the $\log T$ term. Notably, our gap-dependent communication cost bound also yields a better global switching cost when $M=1$, removing $SA$ from the $\log T$ term.

Poster

#W-620

Neural Event-Triggered Control with Optimal Scheduling

Luan Yang · Jingdong Zhang · Qunxi Zhu · Wei Lin

Learning-enabled controllers with stability certificate functions have demonstrated impressive empirical performance in addressing control problems in recent years. Nevertheless, directly deploying the neural controllers onto actual digital platforms requires impractically excessive communication resources due to a continuously updating demand from the closed-loop feedback controller. We introduce a framework aimed at learning the event-triggered controller (ETC) with optimal scheduling, i.e., minimal triggering times, to address this challenge in resource-constrained scenarios. Our proposed framework, denoted by Neural ETC, includes two practical algorithms: the path integral algorithm based on directly simulating the event-triggered dynamics, and the Monte Carlo algorithm derived from new theoretical results regarding lower bound of inter-event time. Furthermore, we propose a projection operation with an analytical expression that ensures theoretical stability and schedule optimality for Neural ETC. Compared to the conventional neural controllers, our empirical results show that the Neural ETC significantly reduces the required communication resources while enhancing the control performance in constrained communication resources scenarios.

Poster

#W-621

Understanding High-Dimensional Bayesian Optimization

Leonard Papenmeier · Matthias Poloczek · Luigi Nardi

Recent work reported that simple Bayesian optimization (BO) methods perform well for high-dimensional real-world tasks, seemingly contradicting prior work and tribal knowledge. This paper investigates why. We identify underlying challenges that arise in high-dimensional BO and explain why recent methods succeed. Our empirical analysis shows that vanishing gradients caused by Gaussian process (GP) initialization schemes play a major role in the failures of high-dimensional Bayesian optimization (HDBO) and that methods that promote local search behaviors are better suited for the task. We find that maximum likelihood estimation (MLE) of GP length scales suffices for state-of-the-art performance. Based on this, we propose a simple variant of MLE called MSR that leverages these findings to achieve state-of-the-art performance on a comprehensive set of real-world applications. We present targeted experiments to illustrate and confirm our findings.

Poster

#W-700

Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Guozheng Ma · Lu Li · Zilin Wang · Li Shen · Pierre-Luc Bacon · Dacheng Tao

Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic reset and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyond their dense counterparts with state-of-the-art architectures. This is achieved through simple one-shot random pruning, where a predetermined percentage of network weights are randomly removed once before training. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity and stronger resistance to optimization challenges like plasticity loss and gradient interference. We further extend our evaluation to visual and streaming RL scenarios, demonstrating the consistent benefits of network sparsity.

Spotlight Poster

#W-701

Policy-labeled Preference Learning: Is Preference Enough for RLHF?

Taehyun Cho · Seokhun Ju · Seungyub Han · Dohyeong Kim · Kyungjae Lee · Jungwoo Lee

To design reward that align with human goals, Reinforcement Learning from Human Feedback (RLHF) has emerged as a prominent technique for learning reward functions from human preferences and optimizing models using reinforcement learning algorithms. However, existing RLHF methods often misinterpret trajectories as being generated by an optimal policy, causing inaccurate likelihood estimation and suboptimal learning. To address this, we propose Policy-labeled Preference Learning (PPL) within the Direct Preference Optimization (DPO) framework, which resolves these likelihood mismatch problems by modeling human preferences with regret, reflecting the efficiency of executed policies. Additionally, we introduce a contrastive KL regularization term derived from regret-based principles to enhance sequential contrastive learning. Experiments in high-dimensional continuous control environments demonstrate PPL's significant improvements in offline RLHF performance and its effectiveness in online settings.

Poster

#W-702

Reinforcement Learning with Adaptive Reward Modeling for Expensive-to-Evaluate Systems

Hongyuan Su · Yu Zheng · Yuan Yuan · Yuming Lin · Depeng Jin · Yong Li

Training reinforcement learning (RL) agents requires extensive trials and errors, which becomes prohibitively time-consuming in systems with costly reward evaluations.To address this challenge, we propose adaptive reward modeling (AdaReMo) which accelerates RL training by decomposing the complicated reward function into multiple localized fast reward models approximating direct reward evaluation with neural networks.These models dynamically adapt to the agent’s evolving policy by fitting the currently explored subspace with the latest trajectories, ensuring accurate reward estimation throughout the entire training process while significantly reducing computational overhead.We empirically show that AdaReMo not only achieves over 1,000 times speedup but also improves the performance by 14.6% over state-of-the-art approaches across three expensive-to-evaluate systems---molecular generation, epidemic control, and spatial planning.Code and data for the project are provided at https://github.com/tsinghua-fib-lab/AdaReMo.

Poster

#W-703

Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning

Bryan L. M. de Oliveira · Luana G. B. Martins · Bruno Brandão · Murilo L. da Luz · Telma Woerle de Lima Soares · Luckeciano C. Melo

Effective visual representation learning is crucial for reinforcement learning (RL) agents to extract task-relevant information from raw sensory inputs and generalize across diverse environments. However, existing RL benchmarks lack the ability to systematically evaluate representation learning capabilities in isolation from other learning challenges. To address this gap, we introduce the Sliding Puzzles Gym (SPGym), a novel benchmark that transforms the classic 8-tile puzzle into a visual RL task with images drawn from arbitrarily large datasets. SPGym's key innovation lies in its ability to precisely control representation learning complexity through adjustable grid sizes and image pools, while maintaining fixed environment dynamics, observation, and action spaces. This design enables researchers to isolate and scale the visual representation challenge independently of other learning components. Through extensive experiments with model-free and model-based RL algorithms, we uncover fundamental limitations in current methods' ability to handle visual diversity. As we increase the pool of possible images, all algorithms exhibit in- and out-of-distribution performance degradation, with sophisticated representation learning techniques often underperforming simpler approaches like data augmentation. These findings highlight critical gaps in visual representation learning for RL and establish SPGym as a valuable tool for driving progress in robust, generalizable decision-making systems.

Poster

#W-704

IL-SOAR : Imitation Learning with Soft Optimistic Actor cRitic

Stefano Viel · Luca Viano · Volkan Cevher

This paper introduces the SOAR framework for imitation learning. SOAR is an algorithmic template which learns a policy from expert demonstrations with a primal-dual style algorithm which alternates cost and policy updates. Within the policy updates the SOAR framework prescribe to use an actor critic method with multiple critics to estimate the critic uncertainty and therefore build an optimistic critic fundamental to drive exploration.When instantiated to the tabular setting, we get a provable algorithms dubbed FRA with guarantees matching the best known results in $\epsilon$.Practically, the SOAR template is shown to boost consistently the performance of primal dual IL algorithms building on actor critic routines for the policy updates. Approximately, thanks to SOAR, the required number of episodes to achieve the same performance is reduced by a half.

Poster

#W-705

Provably Efficient Exploration in Inverse Constrained Reinforcement Learning

Bo Yue · Jian Li · Guiliang Liu

Optimizing objective functions subject to constraints is fundamental in many real-world applications. However, these constraints are often not readily defined and must be inferred from expert agent behaviors, a problem known as Inverse Constraint Inference. Inverse Constrained Reinforcement Learning (ICRL) is a common solver for recovering feasible constraints in complex environments, relying on training samples collected from interactive environments. However, the efficacy and efficiency of current sampling strategies remain unclear. We propose a strategic exploration framework for sampling with guaranteed efficiency to bridge this gap. By defining the feasible cost set for ICRL problems, we analyze how estimation errors in transition dynamics and the expert policy influence the feasibility of inferred constraints. Based on this analysis, we introduce two exploratory algorithms to achieve efficient constraint inference via 1) dynamically reducing the bounded aggregate error of cost estimations or 2) strategically constraining the exploration policy around plausibly optimal ones. Both algorithms are theoretically grounded with tractable sample complexity, and their performance is validated empirically across various environments.

Poster

#W-706

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

Sunwoo Lee · Jaebak Hwang · Yonghyeon Jo · Seungyul Han

Traditional robust methods in multi-agent reinforcement learning (MARL) often struggle against coordinated adversarial attacks in cooperative scenarios. To address this limitation, we propose the Wolfpack Adversarial Attack framework, inspired by wolf hunting strategies, which targets an initial agent and its assisting agents to disrupt cooperation. Additionally, we introduce the Wolfpack-Adversarial Learning for MARL (WALL) framework, which trains robust MARL policies to defend against the proposed Wolfpack attack by fostering system-wide collaboration. Experimental results underscore the devastating impact of the Wolfpack attack and the significant robustness improvements achieved by WALL. Our code is available at https://github.com/sunwoolee0504/WALL.

Poster

#W-707

Learning Progress Driven Multi-Agent Curriculum

Wenshuai Zhao · Zhiyuan Li · Joni Pajarinen

The number of agents can be an effective curriculum variable for controlling the difficulty of multi-agent reinforcement learning (MARL) tasks. Existing work typically uses manually defined curricula such as linear schemes. We identify two potential flaws while applying existing reward-based automatic curriculum learning methods in MARL: (1) The expected episode return used to measure task difficulty has high variance; (2) Credit assignment difficulty can be exacerbated in tasks where increasing the number of agents yields higher returns which is common in many MARL tasks. To address these issues, we propose to control the curriculum by using a TD-error based learning progress measure and by letting the curriculum proceed from an initial context distribution to the final task specific one. Since our approach maintains a distribution over the number of agents and measures learning progress rather than absolute performance, which often increases with the number of agents, we alleviate problem (2). Moreover, the learning progress measure naturally alleviates problem (1) by aggregating returns. In three challenging sparse-reward MARL benchmarks, our approach outperforms state-of-the-art baselines.

Poster

#W-708

Distributionally Robust Multi-Agent Reinforcement Learning for Dynamic Chute Mapping

Guangyi Liu · Suzan Iloglu · Michael Caldara · Joseph Durham · Michael Zavlanos

In Amazon robotic warehouses, the destination-to-chute mapping problem is crucial for efficient package sorting. Often, however, this problem is complicated by uncertain and dynamic package induction rates, which can lead to increased package recirculation. To tackle this challenge, we introduce a Distributionally Robust Multi-Agent Reinforcement Learning (DRMARL) framework that learns a destination-to-chute mapping policy that is resilient to adversarial variations in induction rates. Specifically, DRMARL relies on group distributionally robust optimization (DRO) to learn a policy that performs well not only on average but also on each individual subpopulation of induction rates within the group that capture, for example, different seasonality or operation modes of the system. This approach is then combined with a novel contextual bandit-based estimator of the worst-case induction distribution for each state-action pair, significantly reducing the cost of exploration and thereby increasing the learning efficiency and scalability of our framework. Extensive simulations demonstrate that DRMARL achieves robust chute mapping in the presence of varying induction distributions, reducing package recirculation by an average of 80% in the simulation scenario.

Spotlight Poster

#W-709

Graph Diffusion for Robust Multi-Agent Coordination

Xianghua Zeng · Hang Su · Zhengyi Wang · Zhiyuan LIN

Offline multi-agent reinforcement learning (MARL) struggles to estimate out-of-distribution states and actions due to the absence of real-time environmental feedback. While diffusion models show promise in addressing these challenges, their application primarily focuses on independently diffusing the historical trajectories of individual agents, neglecting crucial multi-agent coordination dynamics and reducing policy robustness in dynamic environments. In this paper, we propose MCGD, a novel Multi-agent Coordination framework based on Graph Diffusion models to improve the effectiveness and robustness of collaborative policies. Specifically, we begin by constructing a sparse coordination graph that includes continuous node attributes and discrete edge attributes to effectively identify the underlying dynamics of multi-agent interactions. Next, we derive transition probabilities between edge categories and present adaptive categorical diffusion to capture the structure diversity of multi-agent coordination. Leveraging this coordination structure, we define neighbor-dependent forward noise and develop anisotropic diffusion to enhance the action diversity of each agent. Extensive experiments across various multi-agent environments demonstrate that MCGD significantly outperforms existing state-of-the-art baselines in coordination performance and policy robustness in dynamic environments.

Poster

#W-710

Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration

Andreas Kontogiannis · Konstantinos Papathanasiou · Yi Shen · Giorgos Stamou · Michael Zavlanos · George Vouros

Learning to cooperate in distributed partially observable environments with no communication abilities poses significant challenges for multi-agent deep reinforcement learning (MARL). This paper addresses key concerns in this domain, focusing on inferring state representations from individual agent observations and leveraging these representations to enhance agents' exploration and collaborative task execution policies. To this end, we propose a novel state modelling framework for cooperative MARL, where agents infer meaningful belief representations of the non-observable state, with respect to optimizing their own policies, while filtering redundant and less informative joint state information. Building upon this framework, we propose the MARL SMPE$^2$ algorithm. In SMPE$^2$, agents enhance their own policy's discriminative abilities under partial observability, explicitly by incorporating their beliefs into the policy network, and implicitly by adopting an adversarial type of exploration policies which encourages agents to discover novel, high-value states while improving the discriminative abilities of others. Experimentally, we show that SMPE$^2$ outperforms a plethora of state-of-the-art MARL algorithms in complex fully cooperative tasks from the MPE, LBF, and RWARE benchmarks.

Poster

#W-711

HYGMA: Hypergraph Coordination Networks with Dynamic Grouping for Multi-Agent Reinforcement Learning

Chiqiang Liu · Dazi Li

Cooperative multi-agent reinforcement learning faces significant challenges in effectively organizing agent relationships and facilitating information exchange, particularly when agents need to adapt their coordination patterns dynamically. This paper presents a novel framework that integrates dynamic spectral clustering with hypergraph neural networks to enable adaptive group formation and efficient information processing in multi-agent systems. The proposed framework dynamically constructs and updates hypergraph structures through spectral clustering on agents' state histories, enabling higher-order relationships to emerge naturally from agent interactions. The hypergraph structure is enhanced with attention mechanisms for selective information processing, providing an expressive and efficient way to model complex agent relationships. This architecture can be implemented in both value-based and policy-based paradigms through a unified objective combining task performance with structural regularization. Extensive experiments on challenging cooperative tasks demonstrate that our method significantly outperforms state-of-the-art approaches in both sample efficiency and final performance. The code is available at: https://github.com/mysteryelder/HYGMA.

Poster

#W-712

Reidentify: Context-Aware Identity Generation for Contextual Multi-Agent Reinforcement Learning

Zhiwei XU · Kun Hu · Xin Xin · Weiliang Meng · Yiwei Shi · Hangyu Mao · Bin Zhang · dapeng Li · Jiangjin Yin

Generalizing multi-agent reinforcement learning (MARL) to accommodate variations in problem configurations remains a critical challenge in real-world applications, where even subtle differences in task setups can cause pre-trained policies to fail. To address this, we propose Context-Aware Identity Generation (CAID), a novel framework to enhance MARL performance under the Contextual MARL (CMARL) setting. CAID dynamically generates unique agent identities through the agent identity decoder built on a causal Transformer architecture. These identities provide contextualized representations that align corresponding agents across similar problem variants, facilitating policy reuse and improving sample efficiency. Furthermore, the action regulator in CAID incorporates these agent identities into the action-value space, enabling seamless adaptation to varying contexts. Extensive experiments on CMARL benchmarks demonstrate that CAID significantly outperforms existing approaches by enhancing both sample efficiency and generalization across diverse context variants.

Spotlight Poster

#W-713

Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination

Kunal Jha · Wilka Carvalho · Yancheng Liang · Simon Du · Max Kleiman-Weiner · Natasha Jaques

Zero-shot coordination (ZSC), the ability to adapt to a new partner in a cooperative task, is a critical component of human-compatible AI. While prior work has focused on training agents to cooperate on a single task, these specialized models do not generalize to new tasks, even if they are highly similar. Here, we study how reinforcement learning on a distribution of environments with a single partner enables learning general cooperative skills that support ZSC with many new partners on many new problems. We introduce two Jax-based, procedural generators that create billions of solvable coordination challenges. We develop a new paradigm called Cross-Environment Cooperation (CEC), and show that it outperforms competitive baselines quantitatively and qualitatively when collaborating with real people. Our findings suggest that learning to collaborate across many unique scenarios encourages agents to develop general norms, which prove effective for collaboration with different partners. Together, our results suggest a new route toward designing generalist cooperative agents capable of interacting with humans without requiring human data.

Poster

#W-714

Improving Transformer World Models for Data-Efficient RL

Antoine Dedieu · Joseph Ortiz · Xinghua Lou · Carter Wendelken · J Swaroop Guntupalli · Wolfgang Lehrach · Miguel Lazaro-Gredilla · Kevin Murphy

We present an approach to model-based RL that achieves a new state of the art performance on the challenging Craftax-classic benchmark, an open-world 2D survival game that requires agents to exhibit a wide range of general abilities---such as strong generalization, deep exploration, and long-term reasoning. With a series of careful design choices aimed at improving sample efficiency, our MBRL algorithm achieves a reward of 69.66% after only 1M environment steps, significantly outperforming DreamerV3, which achieves $53.2\%$, and, for the first time, exceeds human performance of 65.0%. Our method starts by constructing a SOTA model-free baseline, using a novel policy architecture that combines CNNs and RNNs.We then add three improvements to the standard MBRL setup: (a) "Dyna with warmup", which trains the policy on real and imaginary data, (b) "nearest neighbor tokenizer" on image patches, which improves the scheme to create the transformer world model (TWM) inputs, and (c) "block teacher forcing", which allows the TWM to reason jointly about the future tokens of the next timestep.

Poster

#W-715

Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction

Yiting He · Zhishuai Liu · Weixin Wang · Pan Xu

Off-dynamics reinforcement learning (RL), where training and deployment transition dynamics are different, can be formulated as learning in a robust Markov decision process (RMDP) where uncertainties in transition dynamics are imposed. Existing literature mostly assumes access to generative models allowing arbitrary state-action queries or pre-collected datasets with a good state coverage of the deployment environment, bypassing the challenge of exploration. In this work, we study a more realistic and challenging setting where the agent is limited to online interaction with the training environment. To capture the intrinsic difficulty of exploration in online RMDPs, we introduce the supremal visitation ratio, a novel quantity that measures the mismatch between the training dynamics and the deployment dynamics. We show that if this ratio is unbounded, online learning becomes exponentially hard. We propose the first computationally efficient algorithm that achieves sublinear regret in online RMDPs with $f$-divergence based transition uncertainties. We also establish matching regret lower bounds, demonstrating that our algorithm achieves optimal dependence on both the supremal visitation ratio and the number of interaction episodes. Finally, we validate our theoretical results through comprehensive numerical experiments.

Poster

#W-716

Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning

Cheol Kim · Jai Moondra · Shresth Verma · Madeleine Pollack · Lingkai Kong · Milind Tambe · Swati Gupta

In many real-world applications of Reinforcement Learning (RL), deployed policies have varied impacts on different stakeholders, creating challenges in reaching consensus on how to effectively aggregate their preferences. Generalized $p$-means form a widely used class of social welfare functions for this purpose, with broad applications in fair resource allocation, AI alignment, and decision-making. This class includes well-known welfare functions such as Egalitarian, Nash, and Utilitarian welfare. However, selecting the appropriate social welfare function is challenging for decision-makers, as the structure and outcomes of optimal policies can be highly sensitive to the choice of $p$. To address this challenge, we study the concept of an $\alpha$-approximate portfolio in RL, a set of policies that are approximately optimal across the family of generalized $p$-means for all $p \in [-\infty, 1]$. We propose algorithms to compute such portfolios and provide theoretical guarantees on the trade-offs among approximation factor, portfolio size, and computational efficiency. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach in summarizing the policy space induced by varying $p$ values, empowering decision-makers to navigate this landscape more effectively.

Poster

#W-717

Flow-based Domain Randomization for Learning and Sequencing Robotic Skills

Aidan Curtis · Eric Li · Michael S Noseworthy · Nishad Gothoskar · Sachin Chitta · Hui Li · Leslie Kaelbling · Nicole Carey

Domain randomization in reinforcement learning is an established technique for increasing the robustness of control policies learned in simulation. By randomizing properties of the environment during training, the learned policy can be robust to uncertainty along the randomized dimensions. While the environment distribution is typically specified by hand, in this paper we investigate the problem of automatically discovering this sampling distribution via entropy-regularized reward maximization of a neural sampling distribution in the form of a normalizing flow. We show that this architecture is more flexible and results in better robustness than existing approaches to learning simple parameterized sampling distributions. We demonstrate that these policies can be used to learn robust policies for contact-rich assembly tasks. Additionally, we explore how these sampling distributions, in combination with a privileged value function, can be used for out-of-distribution detection in the context of an uncertainty-aware multi-step manipulation planner.

Spotlight Poster

#W-718

Monte-Carlo Tree Search with Uncertainty Propagation via Optimal Transport

Tuan Dam · Pascal Stenger · Lukas Schneider · Joni Pajarinen · Carlo D'Eramo · Odalric-Ambrym Maillard

This paper introduces a novel backup strategy for Monte-Carlo Tree Search (MCTS) tailored for highly stochastic and partially observable Markov decision processes. We adopt a probabilistic approach, modeling both value and action-value nodes as Gaussian distributions, to introduce a novel backup operator that computes value nodes as the Wasserstein barycenter of their action-value children nodes; thus, propagating the uncertainty of the estimate across the tree to the root node. We study our novel backup operator when using a novel combination of $L^1$-Wasserstein barycenter with $\alpha$-divergence, by drawing a crucial connection to the generalized mean backup operator. We complement our probabilistic backup operator with two sampling strategies, based on optimistic selection and Thompson sampling, obtaining our Wasserstein MCTS algorithm. We provide theoretical guarantees of asymptotic convergence of $\mathcal{O}(n^{-1/2})$, with $n$ as the number of visited trajectories, to the optimal policy and an empirical evaluation on several stochastic and partially observable environments, where our approach outperforms well-known related baselines.

Poster

#W-719

Subgoal-Guided Policy Heuristic Search with Learned Subgoals

Jake Tuero · Michael Buro · Levi Lelis

Policy tree search is a family of tree search algorithms that use a policy to guide the search. These algorithms provide guarantees on the number of expansions required to solve a given problem that are based on the quality of the policy. While these algorithms have shown promising results, the process in which they are trained requires complete solution trajectories to train the policy. Search trajectories are obtained during a trial-and-error search process. When the training problem instances are hard, learning can be prohibitively costly, especially when starting from a randomly initialized policy. As a result, search samples are wasted in failed attempts to solve these hard instances. This paper introduces a novel method for learning subgoal-based policies for policy tree search algorithms. The subgoals and policies conditioned on subgoals are learned from the trees that the search expands while attempting to solve problems, including the search trees of failed attempts. We empirically show that our policy formulation and training method improve the sample efficiency of learning a policy and heuristic function in this online setting.

Poster

#W-720

Online Robust Reinforcement Learning Through Monte-Carlo Planning

Tuan Dam · Kishan Panaganti · Brahim Driss · Adam Wierman

Monte Carlo Tree Search (MCTS) is a powerful framework for solving complex decision-making problems, yet it often relies on the assumption that the simulator and the real-world dynamics are identical. Although this assumption helps achieve the success of MCTS in games like Chess, Go, and Shogi, the real-world scenarios incur ambiguity due to their modeling mismatches in low-fidelity simulators. In this work, we present a new robust variant of MCTS that mitigates dynamical model ambiguities. Our algorithm addresses transition dynamics and reward distribution ambiguities to bridge the gap between simulation-based planning and real-world deployment. We incorporate a robust power mean backup operator and carefully designed exploration bonuses to ensure finite-sample convergence at every node in the search tree. We show that our algorithm achieves a convergence rate of $\mathcal{O}(n^{-1/2})$ for the value estimation at the root node, comparable to that of standard MCTS. Finally, we provide empirical evidence that our method achieves robust performance in planning problems even under significant ambiguity in the underlying reward distribution and transition dynamics.

Poster

#W-721

DiLQR: Differentiable Iterative Linear Quadratic Regulator via Implicit Differentiation

Shuyuan Wang · Philip D. Loewen · Michael Forbes · Bhushan Gopaluni · Wei Pan

While differentiable control has emerged as a powerful paradigm combining model-free flexibility with model-based efficiency, the iterative Linear Quadratic Regulator (iLQR) remains underexplored as a differentiable component. The scalability of differentiating through extended iterations and horizons poses significant challenges, hindering iLQR from being an effective differentiable controller. This paper introduces DiLQR, a framework that facilitates differentiation through iLQR, allowing it to serve as a trainable and differentiable module, either as or within a neural network. A novel aspect of this framework is the analytical solution that it provides for the gradient of an iLQR controller through implicit differentiation, which ensures a constant backward cost regardless of iteration, while producing an accurate gradient. We evaluate our framework on imitation tasks on famous control benchmarks. Our analytical method demonstrates superior computational performance, achieving up to $\textbf{128x}$ speedup and a minimum of $\textbf{21x}$ speedup compared to automatic differentiation. Our method also demonstrates superior learning performance ($\mathbf{10^6x}$) compared to traditional neural network policies and better model loss with differentiable controllers that lack exact analytical gradients. Furthermore, we integrate our module into a larger network with visual inputs to demonstrate the capacity of our method for high-dimensional, fully end-to-end tasks. Codes can be found on the project homepage~\url{https://sites.google.com/view/dilqr/}.

Poster

#W-800

Convex Markov Games: A New Frontier for Multi-Agent Reinforcement Learning

Ian Gemp · Andreas Haupt · Luke Marris · Siqi Liu · Georgios Piliouras

Behavioral diversity, expert imitation, fairness, safety goals and others give rise to preferences in sequential decision making domains that do not decompose additively across time. We introduce the class of convex Markov games that allow general convex preferences over occupancy measures. Despite infinite time horizon and strictly higher generality than Markov games, pure strategy Nash equilibria exist. Furthermore, equilibria can be approximated empirically by performing gradient descent on an upper bound of exploitability. Our experiments reveal novel solutions to classic repeated normal-form games, find fair solutions in a repeated asymmetric coordination game, and prioritize safe long-term behavior in a robot warehouse environment. In the prisoner's dilemma, our algorithm leverages transient imitation to find a policy profile that deviates from observed human play only slightly, yet achieves higher per-player utility while also being three orders of magnitude less exploitable.

Poster

#W-801

Beyond Self-Interest: How Group Strategies Reshape Content Creation in Recommendation Platforms?

Yaolong Yu · Fan Yao · Sinno Jialin Pan

We employ a game-theoretic framework to study the impact of a specific strategic behavior among creators---group behavior---on recommendation platforms. In this setting, creators within a group collaborate to maximize their collective utility.We show that group behavior has a limited effect on the game's equilibrium when the group size is small. However, when the group size is large, group behavior can significantly alter content distribution and user welfare.Specifically, in a top-$K$ recommendation system with exposure-based rewards, we demonstrate that user welfare can suffer a significant loss due to group strategies, and user welfare does not necessarily increase with larger values of $K$ or more random matching, contrasting sharply with the individual creator case. Furthermore, we investigate user welfare guarantees through the lens of the Price of Anarchy (PoA). In the general case, we establish a negative result on the bound of PoA with exposure rewards, proving that it can be arbitrarily large. We then investigate a user engagement rewarding mechanism, which mitigates the issues caused by large group behavior, showing that $\text{PoA}\leq K+1$ in the general case and $\text{PoA}\leq 2$ in the binary case. Empirical results from simulations further support the effectiveness of the user engagement rewarding mechanism.

Spotlight Poster

#W-803

Reducing Variance of Stochastic Optimization for Approximating Nash Equilibria in Normal-Form Games

Linjian Meng · Wubing Chen · Wenbin Li · Tianpei Yang · Youzhi Zhang · Yang Gao

Nash equilibrium (NE) plays an important role in game theory. How to efficiently compute an NE in NFGs is challenging due to its complexity and non-convex optimization property. Machine Learning (ML), the cornerstone of modern artificial intelligence, has demonstrated remarkable empirical performance across various applications including non-convex optimization. To leverage non-convex stochastic optimization techniques from ML for approximating an NE, various loss functions have been proposed. Among these, only one loss function is unbiased, allowing for unbiased estimation under the sampled play. Unfortunately, this loss function suffers from high variance, which degrades the convergence rate. To improve the convergence rate by mitigating the high variance associated with the existing unbiased loss function, we propose a novel surrogate loss function named Nash Advantage Loss (NAL). NAL is theoretically proved unbiased and exhibits significantly lower variance than the existing unbiased loss function. Experimental results demonstrate that the algorithm minimizing NAL achieves a significantly faster empirical convergence rates compared to other algorithms, while also reducing the variance of estimated loss value by several orders of magnitude.

Poster

#W-804

Function Encoders: A Principled Approach to Transfer Learning in Hilbert Spaces

Tyler Ingebrand · Adam Thorpe · Ufuk Topcu

A central challenge in transfer learning is designing algorithms that can quickly adapt and generalize to new tasks without retraining. Yet, the conditions of when and how algorithms can effectively transfer to new tasks is poorly characterized. We introduce a geometric characterization of transfer in Hilbert spaces and define three types of inductive transfer: interpolation within the convex hull, extrapolation to the linear span, and extrapolation outside the span. We propose a method grounded in the theory of function encoders to achieve all three types of transfer. Specifically, we introduce a novel training scheme for function encoders using least-squares optimization, prove a universal approximation theorem for function encoders, and provide a comprehensive comparison with existing approaches such as transformers and meta-learning on four diverse benchmarks. Our experiments demonstrate that the function encoder outperforms state-of-the-art methods on four benchmark tasks and on all three types of transfer.

Poster

#W-805

Permutation Equivariant Neural Networks for Symmetric Tensors

Edward Pearce-Crump

Incorporating permutation equivariance into neural networks has proven to be useful in ensuring that models respect symmetries that exist indata. Symmetric tensors, which naturally appearin statistics, machine learning, and graph theory,are essential for many applications in physics,chemistry, and materials science, amongst others. However, existing research on permutationequivariant models has not explored symmetrictensors as inputs, and most prior work on learningfrom these tensors has focused on equivarianceto Euclidean groups. In this paper, we presenttwo different characterisations of all linear permutation equivariant functions between symmetricpower spaces of $\mathbb{R}^{n}$. We show on two tasks thatthese functions are highly data efficient comparedto standard MLPs and have potential to generalisewell to symmetric tensors of different sizes.

Poster

#W-806

Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression

Jingfeng Wu · Peter Bartlett · Matus Telgarsky · Bin Yu

In overparameterized logistic regression, gradient descent (GD) iterates diverge in norm while converging in direction to the maximum $\ell_2$-margin solution---a phenomenon known as the implicit bias of GD. This work investigates additional regularization effects induced by early stopping in well-specified high-dimensional logistic regression. We first demonstrate that the excess logistic risk vanishes for early stopped GD but diverges to infinity for GD iterates at convergence. This suggests that early stopped GD is well-calibrated, whereas asymptotic GD is statistically inconsistent. Second, we show that to attain a small excess zero-one risk, polynomially many samples are sufficient for early stopped GD, while exponentially many samples are necessary for any interpolating estimator, including asymptotic GD. This separation underscores the statistical benefits of early stopping in the overparameterized regime. Finally, we establish nonasymptotic bounds on the norm and angular differences between early stopped GD and $\ell_2$-regularized empirical risk minimizer, thereby connecting the implicit regularization of GD with explicit $\ell_2$-regularization.

Poster

#W-807

A Rescaling-Invariant Lipschitz Bound Based on Path-Metrics for Modern ReLU Network Parameterizations

Antoine Gonon · Nicolas Brisebarre · Elisa Riccietti · Rémi Gribonval

Robustness with respect to weight perturbations underpins guarantees for generalization, pruning and quantization. Existingguarantees rely on *Lipschitz bounds in parameter space*, cover only plain feed-forward MLPs, and break under the ubiquitous neuron-wise rescaling symmetry of ReLU networks. We prove a new Lipschitz inequality expressed through the $\ell^{1}$-*path-metric* of the weights. The bound is (i) *rescaling-invariant* by construction and (ii) applies to any ReLU-DAG architecture with any combination of convolutions, skip connections, pooling, and frozen (inference-time) batch-normalization —thus encompassing ResNets, U-Nets, VGG-style CNNs, and more. By respecting the network’s natural symmetries, the new bound strictly sharpens prior parameter-space bounds and can be computed in two forward passes. To illustrate its utility, we derive from it a symmetry-aware pruning criterion andshow—through a proof-of-concept experiment on a ResNet-18 trained on ImageNet—that its pruning performance matches that of classical magnitude pruning, while becoming totally immune to arbitrary neuron-wise rescalings.

Poster

#W-808

Deep Ridgelet Transform and Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines

Sho Sonoda · Yuka Hashimoto · Isao Ishikawa · Masahiro Ikeda

We present a constructive universal approximation theorem for learning machines equipped with joint-group-equivariant feature maps, called the joint-equivariant machines, based on the group representation theory. ``Constructive'' here indicates that the distribution of parameters is given in a closed-form expression known as the ridgelet transform. Joint-group-equivariance encompasses a broad class of feature maps that generalize classical group-equivariance. Particularly, fully-connected networks are *not* group-equivariant *but* are joint-group-equivariant. Our main theorem also unifies the universal approximation theorems for both shallow and deep networks. Until this study, the universality of deep networks has been shown in a different manner from the universality of shallow networks, but our results discuss them on common ground. Now we can understand the approximation schemes of various learning machines in a unified manner. As applications, we show the constructive universal approximation properties of four examples: depth-$n$ joint-equivariant machine, depth-$n$ fully-connected network, depth-$n$ group-convolutional network, and a new depth-$2$ network with quadratic forms whose universality has not been known.

Poster

#W-809

Eigen Analysis of Conjugate Kernel and Neural Tangent Kernel

Xiangchao Li · Xiao Han · Qing Yang

In this paper, we investigate deep feedforward neural networks with random weights. The input data matrix $\boldsymbol{X}$ is drawn from a Gaussian mixture model. We demonstrate that certain eigenvalues of the conjugate kernel and neural tangent kernel may lie outside the support of their limiting spectral measures in the high-dimensional regime. The existence and asymptotic positions of such isolated eigenvalues are rigorously analyzed. Furthermore, we provide a precise characterization of the entrywise limit of the projection matrix onto the eigenspace associated with these isolated eigenvalues. Our findings reveal that the eigenspace captures inherent group features present in $\boldsymbol{X}$. This study offers a quantitative analysis of how group features from the input data evolve through hidden layers in randomly weighted neural networks.

Poster

#W-810

Towards characterizing the value of edge embeddings in Graph Neural Networks

Dhruv Rohatgi · Tanya Marwah · Zachary Lipton · Jianfeng Lu · Ankur Moitra · Andrej Risteski

Graph neural networks (GNNs) are the dominant approach to solving machine learning problems defined over graphs. Despite much theoretical and empirical work in recent years, our understanding of finer-grained aspects of architectural design for GNNs remains impoverished. In this paper, we consider the benefits of architectures that maintain and update edge embeddings. On the theoretical front, under a suitable computational abstraction for a layer in the model, as well as memory constraints on the embeddings, we show that there are natural tasks on graphical models for which architectures leveraging edge embeddings can be much shallower. Our techniques are inspired by results on time-space tradeoffs in theoretical computer science. Empirically, we show architectures that maintain edge embeddings almost always improve on their node-based counterparts---frequently significantly so in topologies that have "hub" nodes.

Poster

#W-811

Optimal Task Order for Continual Learning of Multiple Tasks

Ziyan Li · Naoki Hiratani

Continual learning of multiple tasks remains a major challenge for neural networks. Here, we investigate how task order influences continual learning and propose a strategy for optimizing it. Leveraging a linear teacher-student model with latent factors, we derive an analytical expression relating task similarity and ordering to learning performance. Our analysis reveals two principles that hold under a wide parameter range: (1) tasks should be arranged from the least representative to the most typical, and (2) adjacent tasks should be dissimilar. We validate these rules on both synthetic data and real-world image classification datasets (Fashion-MNIST, CIFAR-10, CIFAR-100), demonstrating consistent performance improvements in both multilayer perceptrons and convolutional neural networks. Our work thus presents a generalizable framework for task-order optimization in task-incremental continual learning.

Spotlight Poster

#W-812

Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions

Fabiola Ricci · Lorenzo Bardone · Sebastian Goldt

Deep neural networks learn structured features from complex, non-Gaussian inputs, but the mechanisms behind this process remain poorly understood. Our work is motivated by the observation that the first-layer filters learnt by deep convolutional neural networks from natural images resemble those learnt by independent component analysis (ICA), a simple unsupervised method that seeks the most non-Gaussian projections of its inputs. This similarity suggests that ICA provides a simple, yet principled model for studying feature learning. Here, we leverage this connection to investigate the interplay between data structure and optimisation in feature learning for the most popular ICA algorithm, FastICA, and stochastic gradient descent (SGD), which is used to train deep networks. We rigorously establish that FastICA requires at least $n\gtrsim d^4$ samples to recover a single non-Gaussian direction from $d$-dimensional inputs on a simple synthetic data model. We show that vanilla online SGD outperforms FastICA, and prove that the optimal sample complexity $n\gtrsim d^2$ can be reached by smoothing the loss, albeit in a data-dependent way. We finally demonstrate the existence of a search phase for FastICA on ImageNet, and discuss how the strong non-Gaussianity of said images compensates for the poor sample complexity of FastICA.

Spotlight Poster

#W-813

Algorithms with Calibrated Machine Learning Predictions

Judy Hanwen Shen · Ellen Vitercik · Anders Wikum

The field of algorithms with predictions incorporates machine learning advice in the design of online algorithms to improve real-world performance. A central consideration is the extent to which predictions can be trusted—while existing approaches often require users to specify an aggregate trust level, modern machine learning models can provide estimates of prediction-level uncertainty. In this paper, we propose calibration as a principled and practical tool to bridge this gap, demonstrating the benefits of calibrated advice through two case studies: the ski rental and online job scheduling problems. For ski rental, we design an algorithm that achieves near-optimal prediction-dependent performance and prove that, in high-variance settings, calibrated advice offers more effective guidance than alternative methods for uncertainty quantification. For job scheduling, we demonstrate that using a calibrated predictor leads to significant performance improvements over existing methods. Evaluations on real-world data validate our theoretical findings, highlighting the practical impact of calibration for algorithms with predictions.

Poster

#W-815

On the Similarities of Embeddings in Contrastive Learning

Chungpa Lee · Sehee Lim · Kibok Lee · Jy-yong Sohn

Contrastive learning (CL) operates on a simple yet effective principle: embeddings of positive pairs are pulled together, while those of negative pairs are pushed apart. Although various forms of contrastive loss have been proposed and analyzed from different perspectives, prior works lack a comprehensive framework that systematically explains a broad class of these objectives. In this paper, we present a unified framework for understanding CL, which is based on analyzing the cosine similarity between embeddings of positive and negative pairs. In full-batch settings, we show that perfect alignment of positive pairs is unattainable when similarities of negative pairs fall below a certain threshold, and that this misalignment can be alleviated by incorporating within-view negative pairs. In mini-batch settings, we demonstrate that smaller batch sizes incur stronger separation among negative pairs within batches, which leads to higher variance in similarities of negative pairs. To address this limitation of mini-batch CL, we introduce an auxiliary loss term that reduces the variance of similarities of negative pairs in CL. Empirical results demonstrate that incorporating the proposed loss consistently improves the performance of CL methods in small-batch training.

Poster

#W-816

Measuring Diversity: Axioms and Challenges

Mikhail Mironov · Liudmila Prokhorenkova

This paper addresses the problem of quantifying diversity for a set of objects. First, we conduct a systematic review of existing diversity measures and explore their undesirable behavior in certain cases. Based on this review, we formulate three desirable properties (axioms) of a reliable diversity measure: monotonicity, uniqueness, and continuity. We show that none of the existing measures has all three properties and thus these measures are not suitable for quantifying diversity. Then, we construct two examples of measures that have all the desirable properties, thus proving that the list of axioms is not self-contradictory. Unfortunately, the constructed examples are too computationally expensive (NP-hard) for practical use. Thus, we pose an open problem of constructing a diversity measure that has all the listed properties and can be computed in practice or proving that all such measures are NP-hard to compute.

Poster

#W-817

World Model Implanting for Test-time Adaptation of Embodied Agents

Minjong Yoo · Jinwoo Jang · Sihyung Yoon · Honguk Woo

In embodied AI, a persistent challenge is enabling agents to robustly adapt to novel domains without requiring extensive data collection or retraining. To address this, we present a world model implanting framework (WorMI) that combines the reasoning capabilities of large language models (LLMs) with independently learned, domain-specific world models through test-time composition. By allowing seamless implantation and removal of the world models, the embodied agent's policy achieves and maintains cross-domain adaptability. In the WorMI framework, we employ a prototype-based world model retrieval approach, utilizing efficient trajectory-based abstract representation matching, to incorporate relevant models into test-time composition. We also develop a world-wise compound attention method that not only integrates the knowledge from the retrieved world models but also aligns their intermediate representations with the reasoning model's representation within the agent's policy. This framework design effectively fuses domain-specific knowledge from multiple world models, ensuring robust adaptation to unseen domains. We evaluate our WorMI on the VirtualHome and ALFWorld benchmarks, demonstrating superior zero-shot and few-shot performance compared to several LLM-based approaches across a range of unseen domains. These results highlight the framework’s potential for scalable, real-world deployment in embodied agent scenarios where adaptability and data efficiency are essential.

Poster

#W-818

Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning

Jinmin He · Kai Li · Yifan Zang · Haobo Fu · Qiang Fu · Junliang Xing · Jian Cheng

Offline multi-task reinforcement learning aims to learn a unified policy capable of solving multiple tasks using only pre-collected task-mixed datasets, without requiring any online interaction with the environment. However, it faces significant challenges in effectively sharing knowledge across tasks. Inspired by the efficient knowledge abstraction observed in human learning, we propose Goal-Oriented Skill Abstraction (GO-Skill), a novel approach designed to extract and utilize reusable skills to enhance knowledge transfer and task performance. Our approach uncovers reusable skills through a goal-oriented skill extraction process and leverages vector quantization to construct a discrete skill library. To mitigate class imbalances between broadly applicable and task-specific skills, we introduce a skill enhancement phase to refine the extracted skills. Furthermore, we integrate these skills using hierarchical policy learning, enabling the construction of a high-level policy that dynamically orchestrates discrete skills to accomplish specific tasks. Extensive experiments on diverse robotic manipulation tasks within the MetaWorld benchmark demonstrate the effectiveness and versatility of GO-Skill.

Poster

#W-819

Benchmarking Quantum Reinforcement Learning

Nico Meyer · Christian Ufrecht · George Yammine · Georgios Kontes · Christopher Mutschler · Daniel Scherer

Benchmarking and establishing proper statistical validation metrics for reinforcement learning (RL) remain ongoing challenges, where no consensus has been established yet. The emergence of quantum computing and its potential applications in quantum reinforcement learning (QRL) further complicate benchmarking efforts. To enable valid performance comparisons and to streamline current research in this area, we propose a novel benchmarking methodology, which is based on a statistical estimator for sample complexity and a definition of statistical outperformance. Furthermore, considering QRL, our methodology casts doubt on some previous claims regarding its superiority. We conducted experiments on a novel benchmarking environment with flexible levels of complexity. While we still identify possible advantages, our findings are more nuanced overall. We discuss the potential limitations of these results and explore their implications for empirical research on quantum advantage in QRL.

Poster

#W-820

Preference Controllable Reinforcement Learning with Advanced Multi-Objective Optimization

Yucheng Yang · Tianyi Zhou · Mykola Pechenizkiy · Meng Fang

Practical reinforcement learning (RL) usually requires agents to be optimized for multiple potentially conflicting criteria, e.g. speed vs. safety. Although Multi-Objective RL (MORL) algorithms have been studied in previous works, their trained agents often cover limited Pareto optimal solutions and they lack precise controllability of the delicate trade-off among multiple objectives. Hence, the resulting agent is not versatile in aligning with customized requests from different users. To bridge the gap, we develop the ``Preference controllable (PC) RL'' framework, which trains a preference-conditioned meta-policy that takes user preference as input controlling the generated trajectories within the preference region on the Pareto frontier. The PCRL framework is compatible with advanced Multi-Objective Optimization~(MOO) algorithms that are rarely seen in previous MORL approaches. We also proposed a novel preference-regularized MOO algorithm specifically for PCRL. We provide a comprehensive theoretical analysis to justify its convergence and preference controllability.We evaluate PCRL with different MOO algorithms against state-of-the-art MORL baselines in various challenging environments with up to six objectives. In these experiments, our proposed method exhibits significantly better controllability than existing approaches and can generate Pareto solutions with better diversity and utilities.

Poster

#W-821

An Analysis of Quantile Temporal-Difference Learning

Mark Rowland · Remi Munos · Mohammad Gheshlaghi Azar · Yunhao Tang · Georg Ostrovski · Anna Harutyunyan · Karl Tuyls · Marc G. Bellemare · Will Dabney

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed with standard stochastic approximation tools, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points. The core result of this paper is a proof of convergence to the fixed points of a related family of dynamic programming procedures with probability 1, putting QTD on firm theoretical footing. The proof establishes connections between QTD and non-linear differential inclusions through stochastic approximation theory and non-smooth analysis.

Poster

#W-900

The Empirical Mean is Minimax Optimal for Local Glivenko-Cantelli

Doron Cohen · Aryeh Kontorovich · Roi Weiss

We revisit the recently introduced Local Glivenko-Cantelli setting, which studies distribution-dependent uniform convergence rates of the Empirical Mean Estimator (EME). In this work, we investigate generalizations of this setting where arbitrary estimators are allowed rather than just the EME. Can a strictly larger class of measures be learned? Can better risk decay rates be obtained? We provide exhaustive answers to these questions—which are both negative, provided the learner is barred from exploiting some infinite-dimensional pathologies. On the other hand, allowing such exploits does lead to a strictly larger class of learnable measures.

Poster

#W-901

Design Considerations in Offline Preference-based RL

Alekh Agarwal · Christoph Dann · Teodor Vanislavov Marinov

Offline algorithms for Reinforcement Learning from Human Preferences (RLHF), which use only a fixed dataset of sampled responses given an input, and preference feedback among these responses, have gained increasing prominence in the literature on aligning language models. In this paper, we study how the different design choices made in methods such as DPO, IPO, SLiC and many variants influence the quality of the learned policy, from a theoretical perspective. Our treatment yields insights into the choices of loss function, the policy which is used to normalize log-likelihoods, and also the role of the data sampling policy. Notably, our results do not rely on the standard reparameterization-style arguments used to motivate some of the algorithms in this family, which allows us to give a unified treatment to a broad class of methods. We also conduct a small empirical study to verify some of the theoretical findings on a standard summarization benchmark.

Poster

#W-902

Adversarial Robust Generalization of Graph Neural Networks

Chang Cao · Han Li · Yulong Wang · Rui Wu · Hong Chen

While Graph Neural Networks (GNNs) have shown outstanding performance in node classification tasks, they are vulnerable to adversarial attacks, which are imperceptible changes to input samples. Adversarial training, as a widely used tool to enhance the adversarial robustness of GNNs, has presented remarkable effectiveness in node classification tasks. However, the generalization properties for explaining their behaviors remain not well understood from the theoretical viewpoint. To fill this gap, we develop a high probability generalization bound of general GNNs in adversarial learning through covering number analysis. We estimate the covering number of the GNN model class based on the entire perturbed feature matrix by constructing a cover for the perturbation set. Our results are generally applicable to a series of GNNs. We demonstrate their applicability by investigating the generalization performance of several popular GNN models under adversarial attacks, which reveal the architecture-related factors influencing the generalization gap. Our experimental results on benchmark datasets provide evidence that supports the established theoretical findings.

Poster

#W-903

Ehrenfeucht-Haussler Rank and Chain of Thought

Pablo Barcelo · Alexander Kozachinskiy · Tomasz Steifer

The notion of _rank_ of a Boolean function has been a cornerstone in PAC learning, enabling quasipolynomial-time learning algorithms for polynomial-size decision trees. We present a novel characterization of rank, grounded in the well-known Transformer architecture. We show that the rank of a function $f$ corresponds to the minimum number of _Chain of Thought_ (CoT) steps required by a single-layer Transformer with hard attention to compute $f$. Based on this characterization we establish tight bounds on the number of CoT steps required for specific problems, showing that $\ell$-fold function composition necessitates exactly $\ell$ CoT steps. Furthermore, we analyze the problem of identifying the position of the $k$-th occurrence of 1 in a Boolean sequence, proving that it requires $k$ CoT steps.

Poster

#W-904

Transfer Q-Learning with Composite MDP Structures

Jinhang Chai · Elynn Chen · Lin Yang

To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework where high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific variations. This relaxes the common assumption of purely low-rank transition models, allowing for more realistic scenarios where tasks share core dynamics but maintain individual variations. We introduce UCB-TQL (Upper Confidence Bound Transfer Q-Learning), designed for transfer RL scenarios where multiple tasks share core linear MDP dynamics but diverge along sparse dimensions. When applying UCB-TQL to a target task after training on a source task with sufficient trajectories, we achieve a regret bound of $\tilde{\mathcal{O}}(\sqrt{eH^5N})$ that scales independently of the ambient dimension. Here, $N$ represents the number of trajectories in the target task, while $e$ quantifies the sparse differences between tasks. This result demonstrates substantial improvement over single task RL by effectively leveraging their structural similarities. Our theoretical analysis provides rigorous guarantees for how UCB-TQL simultaneously exploits shared dynamics while adapting to task-specific variations.

Spotlight Poster

#W-905

LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently

Yuanhe Zhang · Fanghui Liu · Yudong Chen

This paper explores how theory can guide and enhance practical algorithms, using Low-Rank Adaptation (LoRA) (Hu et al., 2022) in large language models as a case study. We rigorously prove that, under gradient descent, LoRA adapters align with specific singular subspaces of the one-step full fine-tuning gradient. This result suggests that, by properly initializing the adapters using the one-step full gradient, subspace alignment can be achieved immediately—applicable to both linear and nonlinear models. Building on our theory, we propose a theory-driven algorithm, LoRA-One, where the linear convergence (as well as generalization) is built and incorporating preconditioners theoretically helps mitigate the effects of ill-conditioning. Besides, our theory reveals connections between LoRA-One and other gradient-alignment-based methods, helping to clarify misconceptions in the design of such algorithms. LoRA-One achieves significant empirical improvements over LoRA and its variants across benchmarks in natural language understanding, mathematical reasoning, and code generation. Code is available at: https://github.com/YuanheZ/LoRA-One.

Poster

#W-906

Weak-to-Strong Generalization Even in Random Feature Networks, Provably

Marko Medvedev · Kaifeng Lyu · Dingli Yu · Sanjeev Arora · Zhiyuan Li · Nati Srebro

Weak-to-Strong Generalization (Burns et al.,2024) is the phenomenon whereby a strong student, say GPT-4, learns a task from a weak teacher, say GPT-2, and ends up significantly outperforming the teacher. We show that this phenomenon does not require a complex and pretrained learner like GPT-4, can arise even in simple non-pretrained models, simply due to the size advantage of the student. But, we also show that there are inherint limits to the extent of such weak to strong generalization. We consider students and teachers that are random feature models, described by two-layer networks with a random and fixed bottom layer and trained top layer. A ‘weak’ teacher, with a small number of units (i.e. random features), is trained on the population, and a ‘strong’ student, with a much larger number of units (i.e. random features), is trained only on labels generated by the weak teacher. We demonstrate, prove, and understand how the student can outperform the teacher, even though trained only on data labeled by the teacher. We also explain how such weak-to-strong generalization is enabled by early stopping. We then show the quantitative limits of weak-to-strong generalization in this model, and in fact in a much broader class of models, for arbitrary teacher and student feature spaces and a broad class of learning rules, including when the student features are pre-trained or otherwise more informative. In particular, we show that in such models the student’serror can only approach zero if the teacher’s error approaches zero, and a strong student cannot “boost” a slightly-better-then-chance teacher to obtain a small error.

Poster

#W-907

Generalization and Robustness of the Tilted Empirical Risk

Gholamali Aminian · Amir R. Asadi · Tian Li · Ahmad Beirami · Gesine Reinert · Samuel Cohen

The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, Li et al. (2021) proposed the {\it tilted empirical risk} (TER) as a non-linear risk metric for machine learning applications such as classification and regression problems. In this work, we examine the generalization error of the tilted empirical risk in the robustness regime under \textit{negative tilt}. Our first contribution is to provide uniform and information-theoretic bounds on the {\it tilted generalization error}, defined as the difference between the population risk and the tilted empirical risk, under negative tilt for unbounded loss function under bounded $(1+\epsilon)$-th moment of loss function for some $\epsilon\in(0,1]$ with a convergence rate of $O(n^{-\epsilon/(1+\epsilon)})$ where $n$ is the number of training samples, revealing a novel application for TER under no distribution shift. Secondly, we study the robustness of the tilted empirical risk with respect to noisy outliers at training time and provide theoretical guarantees under distribution shift for the tilted empirical risk. We empirically corroborate our findings in simple experimental setups where we evaluate our bounds to select the value of tilt in a data-driven manner.

Poster

#W-908

Stability and Generalization Analysis of Decentralized SGD: Sharper Bounds Beyond Lipschitzness and Smoothness

Shuang Zeng · Yunwen Lei

Decentralized SGD (D-SGD) is a popular optimization method to train large-scale machine learning models. In this paper, we study the generalization behavior of D-SGD for both smooth and nonsmooth problems by leveraging the algorithm stability. For convex and smooth problems, we develop stability bounds involving the training errors to show the benefit of optimization in generalization. This improves the existing results by removing the Lipschitzness assumption and implying fast rates in a low-noise condition. We also develop the first optimal stability-based generalization bounds for D-SGD applied to nonsmooth problems. We further develop optimization error bounds which imply minimax optimal excess risk rates. Our novelty in the analysis consists of an error decomposition to use the co-coercivity of functions as well as the control of a neighboring-consensus error.

Poster

#W-909

Understanding Generalization in Quantum Machine Learning with Margins

TAK HUR · Daniel Kyungdeock Park

Understanding and improving generalization capabilities is crucial for both classical and quantum machine learning (QML). Recent studies have revealed shortcomings in current generalization theories, particularly those relying on uniform bounds, across both classical and quantum settings. In this work, we present a margin-based generalization bound for QML models, providing a more reliable framework for evaluating generalization. Our experimental studies on the quantum phase recognition dataset demonstrate that margin-based metrics are strong predictors of generalization performance, outperforming traditional metrics like parameter count. By connecting this margin-based metric to quantum information theory, we demonstrate how to enhance the generalization performance of QML through a classical-quantum hybrid approach when applied to classical data.

Poster

#W-910

Generalization Analysis for Supervised Contrastive Representation Learning under Non-IID Settings

Minh Hieu Nong · Antoine Ledent

Contrastive Representation Learning (CRL) has achieved impressive success in various domains in recent years. Nevertheless, the theoretical understanding of the generalization behavior of CRL has remained limited. Moreover, to the best of our knowledge, the current literature only analyzes generalization bounds under the assumption that the data tuples used for contrastive learning are independently and identically distributed. However, in practice, we are often limited to a fixed pool of reusable labeled data points, making it inevitable to recycle data across tuples to create sufficiently large datasets. Therefore, the tuple-wise independence condition imposed by previous works is invalidated. In this paper, we provide a generalization analysis for the CRL framework under non-$i.i.d.$ settings that adheres to practice more realistically. Drawing inspiration from the literature on U-statistics, we derive generalization bounds which indicate that the required number of samples in each class scales as the logarithm of the covering number of the class of learnable feature representations associated to that class. Next, we apply our main results to derive excess risk bounds for common function classes such as linear maps and neural networks.

Poster

#W-911

A Unified Framework for Generalization Error Analysis of Learning with Arbitrary Discrete Weak Features

Kosuke Sugiyama · Masato Uchida

In many real-world applications, predictive tasks inevitably involve low-quality input features (Weak Features; WFs) which arise due to factors such as misobservations, missingness, or partial observations. While several methods have been proposed to estimate the true values of specific types of WFs and to solve a downstream task, a unified theoretical framework that comprehensively addresses these methods remains underdeveloped. In this paper, we propose a unified framework called Weak Features Learning (WFL), which accommodates arbitrary discrete WFs and a broad range of learning algorithms, and we demonstrate its validity. Furthermore, we introduce a class of algorithms that learn both the estimation model for WFs and the predictive model for a downstream task and perform a generalization error analysis under finite-sample conditions. Our results elucidate the interdependencies between the estimation errors of WFs and the prediction error of a downstream task, as well as the theoretical conditions necessary for the learning approach to achieve consistency. This work establishes a unified theoretical foundation, providing generalization error analysis and performance guarantees, even in scenarios where WFs manifest in diverse forms.

Poster

#W-912

On the Convergence of Continuous Single-timescale Actor-critic

Xuyang Chen · Lin Zhao

Actor-critic algorithms have been instrumental in boosting the performance of numerous challenging applications involving continuous control, such as highly robust and agile robot motion control. However, their theoretical understanding remains largely underdeveloped. Existing analyses mostly focus on finite state-action spaces and on simplified variants of actor-critic, such as double-loop updates with i.i.d. sampling, which are often impractical for real-world applications.We consider the canonical and widely adopted single-timescale updates with Markovian sampling in continuous state-action space. Specifically, we establish finite-time convergence by introducing a novel Lyapunov analysis framework, which provides a unified convergence characterization of both the actor and the critic. Our approach is less conservative than previous methods and offers new insights into the coupled dynamics of actor-critic updates.

Poster

#W-913

Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension

Yijun Dong · Yicheng Li · Yunai Li · Jason Lee · Qi Lei

Weak-to-strong (W2S) generalization is a type of finetuning (FT) where a strong (large) student model is trained on pseudo-labels generated by a weak teacher. Surprisingly, W2S FT often outperforms the weak teacher. We seek to understand this phenomenon through the observation that FT often occurs in intrinsically low-dimensional spaces. Leveraging the low intrinsic dimensionality of FT, we analyze W2S in the ridgeless regression setting from a variance reduction perspective. For a strong student-weak teacher pair with sufficiently expressive low-dimensional feature subspaces $\mathcal{V}_s, \mathcal{V}_w$, we provide an exact characterization of the variance that dominates the generalization error of W2S. This unveils a virtue of discrepancy between the strong and weak models in W2S: the variance of the weak teacher is inherited by the strong student in $\mathcal{V}_s \cap \mathcal{V}_w$, while reduced by a factor of $\mathrm{dim}(\mathcal{V}_s)/N$ in the subspace of discrepancy $\mathcal{V}_w \setminus \mathcal{V}_s$ with $N$ pseudo-labels for W2S. Our analysis further casts light on the sample complexities and the scaling of performance gap recovery in W2S. The analysis is supported by experiments on synthetic regression problems, as well as real vision and NLP tasks.

Poster

#W-914

Retraining with Predicted Hard Labels Provably Increases Model Accuracy

Rudrajit Das · Inderjit Dhillon · Alessandro Epasto · Adel Javanmard · Jieming Mao · Vahab Mirrokni · Sujay Sanghavi · Peilin Zhong

The performance of a model trained with noisy labels is often improved by simply *retraining* the model with its *own predicted hard labels* (i.e., $1$/$0$ labels). Yet, a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable binary classification setting with randomly corrupted labels given to us and prove that retraining can improve the population accuracy obtained by initially training with the given (noisy) labels. To the best of our knowledge, this is the first such theoretical result. Retraining finds application in improving training with local label differential privacy (DP), which involves training with noisy labels. We empirically show that retraining selectively on the samples for which the predicted label matches the given label significantly improves label DP training at no extra privacy cost; we call this consensus-based retraining. For example, when training ResNet-18 on CIFAR-100 with $\epsilon=3$ label DP, we obtain more than $6$% improvement in accuracy with consensus-based retraining.

Poster

#W-915

Revisiting Differentially Private Algorithms for Decentralized Online Learning

Xiaoyu Wang · Wenhao Yang · Chang Yao · Mingli Song · Yuanyu Wan

Although the differential privacy (DP) of decentralized online learning has garnered considerable attention recently, existing algorithms are unsatisfactory due to their inability to achieve $(\epsilon, 0)$-DP over all $T$ rounds, recover the optimal regret in the non-private case, and maintain the lightweight computation under complex constraints. To address these issues, we first propose a new decentralized online learning algorithm satisfying $(\epsilon, 0)$-DP over $T$ rounds, and show that it can achieve $\widetilde{O}(n(\rho^{-1/4}+\epsilon^{-1}\rho^{1/4})\sqrt{T})$ and $\widetilde{O}(n (\rho^{-1/2}+\epsilon^{-1}))$ regret bounds for convex and strongly convex functions respectively, where $n$ is the number of local learners and $\rho$ is the spectral gap of the communication matrix. As long as $\epsilon=\Omega(\sqrt{\rho})$, these bounds nearly match existing lower bounds in the non-private case, which implies that $(\epsilon, 0)$-DP of decentralized online learning may be ensured nearly for free. Our key idea is to design a block-decoupled accelerated gossip strategy that can be incorporated with the classical tree-based private aggregation, and also enjoys a faster average consensus among local learners. Furthermore, we develop a projection-free variant of our algorithm to keep the efficiency under complex constraints. As a trade-off, the above regret bounds degrade to $\widetilde{O}(n(T^{3/4}+\epsilon^{-1}T^{1/4}))$ and $\widetilde{O}(n(T^{2/3}+\epsilon^{-1}))$ respectively, which however are even better than the existing private centralized projection-free online algorithm.

Poster

#W-916

Near-Optimal Consistency-Robustness Trade-Offs for Learning-Augmented Online Knapsack Problems

Mohammadreza Daneshvaramoli · Helia Karisani · Adam Lechowicz · Bo Sun · Cameron Musco · Mohammad Hajiesmaili

This paper introduces a family of learning-augmented algorithms for online knapsack problems that achieve near Pareto-optimal consistency-robustness trade-offs through a simple combination of trusted learning-augmented and worst-case algorithms. Our approach relies on succinct, practical predictions—single values or intervals estimating the minimum value of any item in an offline solution. Additionally, we propose a novel fractional-to-integral conversion procedure, offering new insights for online algorithm design.

Poster

#W-917

Improved Online Confidence Bounds for Multinomial Logistic Bandits

Joongkyu Lee · Min-hwan Oh

In this paper, we propose an improved online confidence bound for multinomial logistic (MNL) models and apply this result to MNL bandits, achieving variance-dependent optimal regret. Recently, Lee & Oh (2024) established an online confidence bound for MNL models and achieved nearly minimax-optimal regret in MNL bandits. However, their results still depend on the norm-boundedness of the unknown parameter $B$ and the maximum size of possible outcomes $K$. To address this, we first derive an online confidence bound of $\mathcal{O} (\sqrt{d \log t} + B )$, which is a significant improvement over the previous bound of $\mathcal{O} (B \sqrt{d} \log t \log K )$ (Lee & Oh, 2024). This is mainly achieved by establishing tighter self-concordant properties of the MNL loss and introducing a novel intermediary term to bound the estimation error. Using this new online confidence bound, we propose a constant-time algorithm, **OFU-MNL++**, which achieves a variance-dependent regret bound of $\mathcal{O} \Big( d \log T \sqrt{ \sum_{t=1}^T \sigma_t^2 } \Big) $ for sufficiently large $T$, where $\sigma_t^2$ denotes the variance of the rewards at round $t$, $d$ is the dimension of the contexts, and $T$ is the total number of rounds. Furthermore, we introduce an Maximum Likelihood Estimation (MLE)-based algorithm that achieves an anytime, **OFU-M$^2$NL**, $\operatorname{poly}(B)$-free regret of $\mathcal{O} \Big( d \log (BT) \sqrt{ \sum_{t=1}^T \sigma_t^2 } \Big) $.

Poster

#W-918

A Parametric Contextual Online Learning Theory of Brokerage

François Bachoc · Tommaso Cesari · Roberto Colomboni

We study the role of contextual information in the online learning problem of brokerage between traders.In this sequential problem, at each time step, two traders arrive with secret valuations about an asset they wish to trade.The learner (a broker) suggests a trading (or brokerage) price based on contextual data about the asset and the market conditions.Then, the traders reveal their willingness to buy or sell based on whether their valuations are higher or lower than the brokerage price.A trade occurs if one of the two traders decides to buy and the other to sell, i.e., if the broker's proposed price falls between the smallest and the largest of their two valuations.We design algorithms for this problem and prove optimal theoretical regret guarantees under various standard assumptions.

Poster

#W-919

Feasible Action Search for Bandit Linear Programs via Thompson Sampling

Aditya Gangrade · Aldo Pacchiano · Clay Scott · Venkatesh Saligrama

We study the 'feasible action search' (FAS) problem for linear bandits, wherein a learner attempts to discover a feasible point for a set of linear constraints $\Phi_* a \ge 0,$ without knowledge of the matrix $\Phi_* \in \mathbb{R}^{m \times d}$. A FAS learner selects a sequence of actions $a_t,$ and uses observations of the form $\Phi_* a_t + \mathrm{noise}$ to either find a point with nearly optimal 'safety margin', or detect that the constraints are infeasible, where the safety margin of an action measures its (signed) distance from the constraint boundary. While of interest in its own right, the FAS problem also directly addresses a key deficiency in the extant theory of 'safe linear bandits' (SLBs), by discovering a safe initialisation for low-regret SLB methods.We propose and analyse a novel efficient FAS-learner. Our method, FAST, is based on Thompson Sampling. It applies a _coupled_ random perturbation to an estimate of $\Phi_*,$ and plays a maximin point of a game induced by this perturbed matrix. We prove that FAST stops in $\tilde{O}(d^3/\varepsilon^2 M_*^2)$ steps, and incurs $\tilde{O}(d^3/|M_*|)$ safety costs, to either correctly detect infeasibility, or output a point that is at least $(1-\varepsilon) M_*$-safe, where $M_*$ is the _optimal safety margin_ of $\Phi_*$. Further, instantiating prior SLB methods with the output of FAS yields the first SLB methods that incur $\tilde{O}(\sqrt{d^3 T/M_*^2})$ regret and $O(1)$ risk without a priori knowledge of a safe action. The main technical novelty lies in the extension of Thompson Sampling to this multiobjective setting, for which we both propose a coupled noise design, and provide an analysis that avoids convexity considerations.

Poster

#W-920

Tokenized Bandit for LLM Decoding and Alignment

Suho Shin · Chenghao Yang · Haifeng Xu · MohammadTaghi Hajiaghayi

We introduce the tokenized linear bandit (TLB) and multi-armed bandit (TMAB), variants of linear and stochastic multi-armed bandit problems inspired by LLM decoding and alignment. In these problems, at each round $t \in [T]$, a user submits a query (context), and the decision maker (DM) sequentially selects a token irrevocably from a token set. Once the sequence is complete, the DM observes a random utility from the user, whose expectation is presented by a sequence function mapping the chosen token sequence to a nonnegative real value that depends on the query.In both problems, we first show that learning is impossible without any structure on the sequence function.We introduce a natural assumption, diminishing distance with more commons (DDMC), and propose algorithms with regret $\tilde{O}(L\sqrt{T})$ and $\tilde{O}(L\sqrt{T^{2/3}})$ for TLB and TMAB, respectively.As a side product, we obtain an (almost) optimality of the greedy decoding for LLM decoding algorithm under DDMC, which justifies the unresaonable effectiveness of greedy decoding in several tasks.This also has an immediate application to decoding-time LLM alignment, when the misaligned utility can be represented as the frozen LLM's utility and a linearly realizable latent function.We finally validate our algorithm's performance empirically as well as verify our assumptions using synthetic and real-world datasets.

Poster

#W-921

Online Linear Classification with Massart Noise

Ilias Diakonikolas · Vasilis Kontonis · Christos Tzamos · Nikos Zarifis

We study the task of online learning in the presence of Massart noise. Specifically, instead of assuming that the online adversary chooses an arbitrary sequence of labels, we assume that the context $\boldsymbol{x}$ is selected adversarially but the label $y$ presented to the learner disagrees with the ground-truth label of $\boldsymbol{x}$ with unknown probability {\em at most} $\eta$. We focus on the fundamental class of $\gamma$-margin linear classifiers and present the first computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$. We point out that the mistake bound achieved by our algorithm is qualitatively tight for computationally efficient algorithms; this follows from the fact that, even in the offline setting, achieving 0-1 error better than $\eta$ requires super-polynomial time under standard complexity assumptions.We extend our online learning model to a $k$-arm contextual bandit setting where the rewards---instead of satisfying commonly used realizability assumptions---are consistent, in expectation, with some linear ranking function with weight vector $\boldsymbol{w}^\ast$. Given a list of contexts $\boldsymbol{x}_1,\ldots \boldsymbol{x}_k$, if $\boldsymbol{w}^*\cdot \boldsymbol{x}_i > \boldsymbol{w}^* \cdot \boldsymbol{x}_j$, the expected reward of action $i$must be larger than that of $j$ by at least $\Delta$. We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k)~ \Delta T - o(T)$ bigger than choosing a random action at every round.

Poster Session

Poster Session 2 West

West Exhibition Hall B2-B3