Poster
Regress, Don't Guess: A Regression-like Loss on Number Tokens for Language Models
Jonas Zausinger · Lars Pennig · Anamarija Kozina · Sean Sdahl · Julian Sikora · Adrian Dendorfer · Timofey Kuznetsov · Mohamad Hagog · Nina Wiedemann · Kacper Chlodny · Vincent Limbach · Anna Ketteler · Thorben Prein · Vishwa Singh · Michael Danziger · Jannis Born
East Exhibition Hall A-B #E-2612
Large language models are great at writing documents and answering questions, but when it comes to math, they often make mistakes. A key reason is that these models have no built-in understanding of how numbers relate to one another. For example, they treat the numbers “2” and “3” as just different words, not as digits that are close together.

To address this, we developed a new way to train language models by giving them additional feedback on numbers. Our method, called Number Token Loss (NTL), explicitly teaches models that “2” and “3” are numerically close, while “2” and “9” are farther apart. It measures how far the model’s predicted probabilities over number tokens are, numerically, from the true value, and penalizes predictions in proportion to that distance.

We tested this on math problems and found that it consistently improves performance. Importantly, our method can be used with any language model, is fast to compute, and is easy to integrate.
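To illustrate the idea, here is a minimal sketch of one regression-style way to realize such a loss: the model's predicted distribution over the digit tokens is collapsed into an expected numeric value, which is then compared to the true digit with a squared error. This is a simplified illustration under stated assumptions, not the authors' exact implementation; the function name number_token_loss and the arguments digit_token_ids and digit_values are hypothetical names introduced for this sketch.

```python
import torch
import torch.nn.functional as F

def number_token_loss(logits, target_ids, digit_token_ids, digit_values):
    """Sketch of a regression-like loss on number tokens (illustrative only).

    logits:          (batch, seq_len, vocab_size) raw model outputs
    target_ids:      (batch, seq_len) ground-truth token ids
    digit_token_ids: (10,) vocabulary ids of the tokens "0".."9"
    digit_values:    (10,) numeric values [0., 1., ..., 9.]
    """
    # Only positions whose ground-truth token is a digit contribute.
    is_digit = torch.isin(target_ids, digit_token_ids)
    if not is_digit.any():
        return logits.new_zeros(())

    # Probabilities restricted to the digit tokens at those positions.
    digit_logits = logits[..., digit_token_ids]               # (B, T, 10)
    digit_probs = F.softmax(digit_logits, dim=-1)[is_digit]   # (N, 10)

    # Expected numeric value under the predicted distribution.
    predicted_value = (digit_probs * digit_values).sum(dim=-1)  # (N,)

    # Numeric value of each ground-truth digit token:
    # map token id -> index within digit_token_ids -> numeric value.
    target_idx = (target_ids[is_digit].unsqueeze(-1) == digit_token_ids).float().argmax(dim=-1)
    true_value = digit_values[target_idx]

    # Regression-style penalty: predicting "3" when the answer is "2"
    # costs less than predicting "9".
    return F.mse_loss(predicted_value, true_value)
```

In practice such a term would be added to the usual cross-entropy objective (e.g. total = cross_entropy + lambda * number_token_loss, with lambda a weighting hyperparameter), so the model still learns ordinary next-token prediction while receiving the extra numeric feedback.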