Poster
The Sparse-Plus-Low-Rank Quasi-Newton Method for Entropic-Regularized Optimal Transport
Chenrui Wang · Yixuan Qiu
West Exhibition Hall B2-B3 #W-620
Entropic-regularized optimal transport (OT) has gained massive attention in machine learning due to its ability to provide scalable solutions for OT-based tasks. However, most existing algorithms, including the Sinkhorn algorithm and its extensions, suffer from relatively slow convergence in many cases. More recently, second-order methods based on the idea of Hessian sparsification have been proposed. Despite their promising results, they have two major issues: first, there is limited theoretical understanding of the effect of sparsification; second, in cases where the transport plan is dense, Hessian sparsification does not perform well. In this paper, we propose a new quasi-Newton method to address these problems. First, we develop new theoretical analyses to understand the benefits of Hessian sparsification, which lay the foundation for highly flexible sparsification schemes. Then we introduce an additional low-rank term in the approximate Hessian to better handle the dense case. Finally, the convergence properties of the proposed algorithm are rigorously analyzed, and various numerical experiments are conducted to demonstrate its improved performance in solving large-scale OT problems.
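As context for the convergence discussion, the sketch below shows the classical Sinkhorn iteration for entropic-regularized OT, the baseline the paper aims to outperform. The function name, iteration count, and toy data are illustrative choices and are not taken from the paper's implementation.

```python
# A minimal Sinkhorn sketch for entropic-regularized OT between two
# discrete distributions a and b with cost matrix C; eps is the
# entropic regularization strength.
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # scale columns to match b
        u = a / (K @ v)                  # scale rows to match a
    return u[:, None] * K * v[None, :]   # plan diag(u) K diag(v)

# Example: transport between two random histograms on a 1-D grid.
rng = np.random.default_rng(0)
a = rng.random(50); a /= a.sum()
b = rng.random(50); b /= b.sum()
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2       # squared-distance cost
P = sinkhorn(a, b, C)
print(P.sum(), np.abs(P.sum(axis=1) - a).max())  # mass ~= 1, row marginals ~= a
```

Each iteration needs only matrix-vector products, which is what makes Sinkhorn scalable; however, the number of iterations can grow sharply as the regularization shrinks, which is the slow convergence this paper targets.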
Optimal transport is a powerful mathematical tool for comparing data distributions: imagine matching piles of sand to holes with as little effort as possible. It is used in machine learning to compare images, text, or any data with structure. A modern variant, entropic-regularized optimal transport, makes this task faster and more stable, but the popular algorithms for it can still be slow in practice.

This paper introduces a new algorithm that speeds things up by improving how the large matrices arising in the computation are handled. The key idea is to approximate a large matrix, called the Hessian matrix, in a smarter way. We approximate it by the sum of two simpler structures: a sparse matrix that contains many zero elements, and a low-rank matrix that can be represented by a few simple vectors. In this way, we capture both the small important details and the overall big picture of the large Hessian matrix.

Our new method is faster, more reliable, and better suited to large-scale data tasks. This makes optimal transport more practical for real-world machine learning problems.
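To make the sparse-plus-low-rank idea concrete, the sketch below approximates a dense symmetric matrix by a thresholded sparse part plus a small low-rank correction, and then solves a linear system with this approximation via the Woodbury identity. This is a hypothetical illustration under simplifying assumptions (entry thresholding and eigenvalue truncation of the residual); it is not the paper's actual construction or code.

```python
# Illustrative sparse-plus-low-rank approximation H ~= S + U diag(s) U^T:
# S keeps the largest entries of H, and U diag(s) U^T captures the top
# eigen-directions of the residual. Solving with the approximation then
# needs only one sparse factorization plus a tiny r x r solve.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def sparse_plus_low_rank(H, keep=0.2, rank=4):
    # Sparse part: threshold so roughly a `keep` fraction of entries survive.
    thresh = np.quantile(np.abs(H), 1.0 - keep)
    S = sp.csc_matrix(np.where(np.abs(H) >= thresh, H, 0.0))
    # Low-rank part: top eigenpairs (by magnitude) of the residual.
    w, V = np.linalg.eigh(H - S.toarray())
    idx = np.argsort(-np.abs(w))[:rank]
    return S, V[:, idx], w[idx]

def woodbury_solve(S, U, s, g):
    # Solve (S + U diag(s) U^T) x = g using the Woodbury identity.
    solve = spla.factorized(S)                       # one sparse LU of S
    Sg = solve(g)
    SU = np.column_stack([solve(U[:, j]) for j in range(U.shape[1])])
    small = np.diag(1.0 / s) + U.T @ SU              # r x r capacitance matrix
    return Sg - SU @ np.linalg.solve(small, U.T @ Sg)

# Toy check on a dense symmetric positive-definite matrix.
rng = np.random.default_rng(1)
n = 200
A = rng.standard_normal((n, n))
H = A @ A.T / n + 5.0 * np.eye(n)
g = rng.standard_normal(n)
S, U, s = sparse_plus_low_rank(H)
x = woodbury_solve(S, U, s, g)
print(np.linalg.norm(S @ x + U @ (s * (U.T @ x)) - g))  # residual ~= 0
```

The point of the Woodbury step is that each quasi-Newton-style solve with the structured approximation costs one sparse factorization plus an r x r dense solve, rather than a full dense factorization, which is what makes second-order steps affordable at scale.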