

Poster

Dimension-Free Adaptive Subgradient Methods with Frequent Directions

Sifan Yang · Yuanyu Wan · Peijia Li · Yibo Wang · Xiao Zhang · Zhewei Wei · Lijun Zhang

West Exhibition Hall B2-B3 #W-912
Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: In this paper, we investigate the acceleration of adaptive subgradient methods through frequent directions (FD), a widely used matrix sketching technique. The state-of-the-art regret bound exhibits a _linear_ dependence on the dimensionality $d$, leading to unsatisfactory guarantees for high-dimensional problems. Additionally, it suffers from an $O(\tau^2 d)$ time complexity per round, which scales quadratically with the sketching size $\tau$. To overcome these issues, we first propose an algorithm named FTSL, which achieves a tighter regret bound that is independent of the dimensionality. The key idea is to integrate FD with adaptive subgradient methods under _the primal-dual framework_ and to add back the cumulative information discarded by FD. To reduce the time complexity, we further utilize fast FD to expedite FTSL, yielding a better complexity of $O(\tau d)$ while maintaining the same regret bound. Moreover, to mitigate the computational cost for optimization problems involving matrix variables (e.g., training neural networks), we adapt FD to Shampoo, a popular optimization algorithm that accounts for the structure of the decision variable, and give a novel analysis under _the primal-dual framework_. Our proposed method obtains an improved dimension-free regret bound. Experimental results verify the efficiency and effectiveness of our approaches.
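
To make the key idea concrete, the following Python snippet is a minimal, illustrative sketch of combining a Frequent Directions summary of past gradients with an AdaGrad-style full-matrix preconditioner, adding back the cumulative mass that FD discards. The class name `FDAdaGradSketch`, the regularizer `reg`, the learning rate, and the diagonal-plus-low-rank preconditioner are our assumptions for illustration; this is not the paper's FTSL algorithm (which is analyzed in the primal-dual framework and, with fast FD, runs in $O(\tau d)$ per round), and the toy version below recomputes an SVD each step for clarity, costing $O(\tau^2 d)$.

```python
import numpy as np


class FDAdaGradSketch:
    """Toy illustration: Frequent Directions (FD) sketch of past gradients,
    used as a low-rank preconditioner, with the cumulative shrinkage that FD
    discards added back. Not the paper's FTSL; an assumed, simplified variant."""

    def __init__(self, dim, sketch_size, lr=0.1, reg=1.0):
        self.tau = sketch_size
        self.lr = lr
        self.reg = reg                       # keeps the preconditioner invertible
        self.B = np.zeros((sketch_size, dim))  # FD sketch of gradients
        self.n_rows = 0                      # number of filled rows in the sketch
        self.alpha = 0.0                     # cumulative mass discarded by FD

    def _shrink(self):
        # Standard FD shrink: subtract the smallest squared singular value.
        _, s, Vt = np.linalg.svd(self.B, full_matrices=False)
        delta = s[-1] ** 2
        s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        self.B = s[:, None] * Vt             # last row is now zero
        self.alpha += delta                  # remember what was thrown away
        self.n_rows = self.tau - 1

    def step(self, x, grad):
        # Insert the new gradient; shrink when the sketch is full.
        if self.n_rows == self.tau:
            self._shrink()
        self.B[self.n_rows] = grad
        self.n_rows += 1
        # Preconditioner ~ reg*I + (B^T B + alpha*I)^{1/2}, applied via the
        # sketch's SVD: eigenvalues sqrt(s^2 + alpha) on the sketched subspace
        # and sqrt(alpha) on its orthogonal complement.
        _, s, Vt = np.linalg.svd(self.B, full_matrices=False)
        diag = self.reg + np.sqrt(s ** 2 + self.alpha)
        g_proj = Vt @ grad
        off = grad - Vt.T @ g_proj           # component outside the sketch
        precond = Vt.T @ (g_proj / diag) + off / (self.reg + np.sqrt(self.alpha))
        return x - self.lr * precond


if __name__ == "__main__":
    # Small demo on a least-squares objective with low effective dimension.
    rng = np.random.default_rng(0)
    d = 200
    A = rng.standard_normal((20, d))
    b = rng.standard_normal(20)
    opt = FDAdaGradSketch(dim=d, sketch_size=10, lr=0.5)
    x = np.zeros(d)
    for _ in range(200):
        grad = A.T @ (A @ x - b)             # gradient of 0.5 * ||Ax - b||^2
        x = opt.step(x, grad)
    print("final loss:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```

The demo uses a low-rank least-squares problem only because the gradients then live in a low-dimensional subspace, which is the regime where a small sketch size $\tau$ can capture most of the second-moment information.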

Lay Summary:

Adaptive subgradient methods, despite their superior theoretical guarantees, are often impractical for large-scale machine learning tasks due to their high computational cost. Existing works have attempted to accelerate these methods using sketching techniques, but their performance is significantly worse than that of standard adaptive methods, limiting their applicability in practice. In this work, we first utilize a classic sketching technique to propose an algorithm that achieves an improved theoretical guarantee. We then enhance its computational efficiency without compromising performance. Furthermore, we extend our approach to optimization problems involving matrix variables (e.g., training neural networks). By integrating our sketching technique into an existing method, we reduce the computational overhead while attaining a better theoretical guarantee.
