Poster
Constrained Online Convex Optimization with Polyak Feasibility Steps
Spencer Hutchinson · Mahnoosh Alizadeh
West Exhibition Hall B2-B3 #W-910
When machine-learning algorithms are deployed in the real world, they must respect system constraints that specify what they may and may not do. For instance, a control algorithm in an autonomous vehicle must never steer the car into an obstacle. We study learning under such constraints through the highly general framework of online convex optimization (OCO), which covers a broad range of adaptive learning tasks.

Our key contribution is an OCO algorithm that always stays within these safety constraints while relying only on local information and limited computation. It does so by incorporating Polyak feasibility steps into every update: small, principled adjustments that pull each decision back toward the safe region whenever it drifts too close to the boundary. These adjustments let the algorithm learn almost as efficiently as it would in an unconstrained setting, making it relevant to a wide range of safety-critical applications.
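To give a flavor of what a Polyak feasibility step looks like, below is a minimal sketch in Python. It assumes a single inequality constraint g(x) <= 0 with a gradient oracle grad_g and uses the classic Polyak step size; the function name, the toy constraint, and the toy loss are illustrative assumptions, not the paper's exact algorithm, which combines such steps with the online update to keep every played decision feasible.

```python
import numpy as np

def polyak_feasibility_step(x, g, grad_g):
    """Move x toward the feasible set {x : g(x) <= 0} with a Polyak-style step.

    If x is already feasible it is returned unchanged; otherwise it is pushed
    along -grad_g(x) with step size g(x) / ||grad_g(x)||^2, the step that would
    exactly zero out a linearized version of the constraint.
    """
    violation = g(x)
    if violation <= 0:
        return x
    grad = grad_g(x)
    return x - (violation / np.dot(grad, grad)) * grad

# Toy usage (illustrative, not from the paper): online gradient descent on a
# quadratic loss, pulled back toward the unit ball {x : ||x||^2 - 1 <= 0} by a
# feasibility step after every gradient update.
g = lambda x: np.dot(x, x) - 1.0
grad_g = lambda x: 2.0 * x

x = np.zeros(2)
eta = 0.1
for t in range(100):
    grad_f = x - np.array([2.0, 0.0])  # gradient of 0.5 * ||x - (2, 0)||^2
    x = polyak_feasibility_step(x - eta * grad_f, g, grad_g)
```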