Poster in Workshop: Tokenization Workshop (TokShop)
You Only Train Once: Efficient Tokenizer Selection for Arithmetic in Language Models
Mucong Ding · Sean McLeish · Kazem Meidani · Igor Melnyk · Nam Nguyen · C. Bayan Bruss · Furong Huang
Keywords: [ Tokenizer Selection ] [ Model-Input Co-optimization ] [ Tokenizer Design Space ] [ Arithmetic Reasoning ] [ Efficient Pretraining ]
Tokenization fundamentally shapes how language models perceive and process input, with substantial downstream effects, especially in tasks requiring symbolic or numerical precision. Yet selecting an optimal tokenizer from a vast design space remains computationally prohibitive, since it typically requires full-scale model training for each candidate. Focusing on arithmetic reasoning, we propose You Only Train Once (YOTO), a unified training framework that jointly optimizes the language model and a parameterized distribution over candidate tokenizers. By training a single model on a merged vocabulary and sampling tokenizations adaptively, YOTO enables efficient co-adaptation between the model and the tokenizer. Applied to arithmetic tasks, YOTO discovers high-performing number tokenizers while dramatically reducing evaluation cost. Our results highlight a promising path toward jointly optimizing tokenizers and models in a principled, scalable manner.
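The abstract does not give implementation details, but the core loop it describes (one model over a merged vocabulary, plus a learned distribution over candidate tokenizers sampled per batch) can be sketched. Below is a minimal, hypothetical PyTorch sketch: the candidate list, the `tokenize` helper, and the REINFORCE-style update for the selector are all assumptions for illustration, not the paper's confirmed method.

```python
import torch
import torch.nn.functional as F

# Hypothetical candidate number tokenizers (names are placeholders). All of
# them map text into ONE merged vocabulary, so a single embedding table and
# LM head cover every candidate and the model is trained only once.
CANDIDATES = ["single_digit", "three_digit_chunks", "whole_number"]


class TokenizerSelector(torch.nn.Module):
    """Learnable categorical distribution over candidate tokenizers."""

    def __init__(self, num_candidates: int):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_candidates))

    def sample(self):
        dist = torch.distributions.Categorical(logits=self.logits)
        idx = dist.sample()
        return idx, dist.log_prob(idx)


def train_step(model, selector, tokenize, batch_text,
               opt_model, opt_selector, baseline):
    """One joint update of the LM and the tokenizer distribution.

    `tokenize(texts, name)` is an assumed helper returning ids in the merged
    vocabulary; `baseline` is a caller-maintained running mean of the loss.
    """
    # 1) Sample a tokenizer for this batch from the learned distribution.
    idx, log_prob = selector.sample()
    input_ids = tokenize(batch_text, CANDIDATES[idx])

    # 2) Standard next-token prediction loss under the sampled tokenization.
    logits = model(input_ids[:, :-1])
    lm_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                              input_ids[:, 1:].reshape(-1))

    # 3) The sampling step is non-differentiable, so the selector gets a
    #    REINFORCE-style gradient (one plausible estimator; the abstract
    #    does not specify which is used). A lower-than-baseline loss raises
    #    the sampled tokenizer's probability.
    selector_loss = log_prob * (lm_loss.detach() - baseline)

    opt_model.zero_grad()
    opt_selector.zero_grad()
    (lm_loss + selector_loss).backward()
    opt_model.step()
    opt_selector.step()
    return lm_loss.item()
```

Under this reading, the selector distribution would concentrate on well-performing tokenizers as training proceeds, and the final argmax over its logits can be read off as the selected tokenizer, replacing one full training run per candidate with a single joint run.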