

Invited Talk in Workshop: Tokenization Workshop (TokShop)

Beat them? Join them? Fix them? Tokenization Research in a Downstream World

Yuval Pinter

Fri 18 Jul 9:10 a.m. PDT — 10 a.m. PDT

Abstract:

Despite the crucial position of tokenization in the NLP pipeline, the large-scale attention it has recently received, and a growing scientific community around it, researchers and users of LLMs appear reluctant to adopt and incorporate innovations from the tokenization literature into major systems. In this bird's-eye-view talk, I will classify approaches in technical research into those seeking to innovate within the confines of current practice ("join them"), those seeking to break through these confines ("beat them"), and those that embrace a restorative philosophy ("fix them"). Drawing on earlier examples of such contrasts in NLP history and the ways they played out, I will offer my recommendation for a way forward, hopefully opening up a meaningful discussion for this workshop and beyond. The talk will feature a wide variety of research artifacts, including work from my lab, covering algorithm development, evaluation studies, analysis, and theoretical contributions.
