Poster in Workshop: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)
Scaling Textual Gradients via Sampling-Based Momentum
Zixin Ding · Junyuan Hong · Jiachen (Tianhao) Wang · Zinan Lin · Zhangyang “Atlas” Wang · Yuxin Chen
Abstract:
As prompts become central to Large Language Models (LLMs), optimizing them is vital. Textual Stochastic Gradient Descent (TSGD) offers a data-driven approach that iteratively refines prompts using LLM-suggested updates over minibatches. We empirically show that increasing the amount of training data initially improves TSGD's performance across NLP tasks but can later degrade it, while also raising computational cost. To address this, we propose Textual Stochastic Gradient Descent with Momentum (TSGD-M), a scalable method that reweights prompt sampling based on past batches. Evaluated on 9 NLP tasks across three domains, TSGD-M outperforms TSGD baselines on most tasks and reduces performance variance.
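The abstract does not spell out the reweighting rule, but the general idea of sampling-based momentum can be illustrated with a minimal sketch. In the sketch below, `propose_update` and `score` are hypothetical stand-ins for the LLM-based prompt reviser and the task metric, and the exponential-decay weighting is one plausible instantiation, not necessarily the rule used by TSGD-M.

```python
import random

# Minimal sketch of sampling-based momentum for textual prompt optimization.
# Assumptions (not from the abstract): `propose_update` stands in for an LLM
# call that revises a prompt given minibatch feedback, and `score` stands in
# for the task metric. The exact reweighting rule in TSGD-M may differ.

def propose_update(prompt: str, minibatch: list[str]) -> str:
    # Hypothetical stand-in for an LLM-suggested prompt revision.
    return prompt + f" [revised on {len(minibatch)} examples]"

def score(prompt: str, minibatch: list[str]) -> float:
    # Hypothetical stand-in for a task metric such as accuracy.
    return random.random()

def tsgd_m(init_prompt: str, data: list[str],
           steps: int = 10, batch_size: int = 4, beta: float = 0.9) -> str:
    pool = [init_prompt]   # prompts produced so far
    weights = [1.0]        # momentum-style weights over past prompts
    for _ in range(steps):
        minibatch = random.sample(data, batch_size)
        # Sample the parent prompt in proportion to its decayed weight,
        # rather than always refining the single most recent prompt.
        parent = random.choices(pool, weights=weights, k=1)[0]
        child = propose_update(parent, minibatch)
        # Decay all past weights (the momentum) and admit the new prompt
        # with weight given by its minibatch score.
        weights = [beta * w for w in weights]
        pool.append(child)
        weights.append(score(child, minibatch))
    # Return the prompt with the largest accumulated weight.
    return max(zip(weights, pool))[1]

if __name__ == "__main__":
    data = [f"example {i}" for i in range(20)]
    print(tsgd_m("Answer the question concisely.", data))
```

Sampling the parent from a decayed pool, rather than greedily refining the latest prompt, is what lets past batches keep influencing later updates, analogous to how momentum in numeric SGD averages over past gradients.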