Invited Talk in Workshop: Tokenization Workshop (TokShop)
Beat them? Join them? Fix them? Tokenization Research in a Downstream World
Yuval Pinter
Despite the crucial position of tokenization in the NLP pipeline, the large-scale attention it has recently received, and a growing scientific community around it, researchers and users of LLMs appear reluctant to adopt and incorporate innovations from the tokenization literature into major systems. In this bird's-eye-view talk, I will classify approaches in technical research into those seeking to innovate within the confines of current practice ("join them"), those seeking to break through these confines ("beat them"), and those that embrace a restorative philosophy ("fix them"). Drawing on earlier examples of such contrasts in NLP history and the ways they played out, I will offer my recommendation for a way forward, hopefully opening up a meaningful discussion for this workshop and beyond. The talk will feature a wide variety of research artifacts, including work from my lab, covering algorithm development, evaluation studies, analysis, and theoretical contributions.