

Poster

Learning to Route LLMs with Confidence Tokens

Yu-Neng Chuang · Prathusha Sarma · Parikshit Gopalan · John Boccio · Sara Bolouki · Xia Hu · Helen Zhou

East Exhibition Hall A-B #E-1208
Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Large language models (LLMs) have demonstrated impressive performance on several tasks and are increasingly deployed in real-world applications. However, especially in high-stakes settings, it becomes vital to know when the output of an LLM may be unreliable. Depending on whether an answer is trustworthy, a system can then choose to route the question to another expert, or otherwise fall back on a safe default behavior. In this work, we study the extent to which LLMs can reliably indicate confidence in their answers, and how this notion of confidence can translate into downstream accuracy gains. We propose Self-Reflection with Error-based Feedback (Self-REF), a lightweight training strategy that teaches LLMs to reliably express confidence in the correctness of their answers. Self-REF introduces confidence tokens into the LLM, from which a confidence score can be extracted. Compared to conventional approaches such as verbalizing confidence and examining token probabilities, we demonstrate empirically that confidence tokens yield significant improvements on downstream routing and rejection learning tasks.
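The downstream decision the abstract describes can be sketched as a simple threshold rule: once a confidence score has been extracted from the model's confidence token, a query is either answered locally, routed to a stronger model, or rejected. This is an illustrative sketch only; the function and threshold names (`decide`, `accept_threshold`, `route_threshold`) are hypothetical and not from the paper, which learns this behavior rather than hard-coding it.

```python
# Hedged sketch of confidence-based routing/rejection.
# Assumes `confidence` in [0, 1] was already extracted from the
# confidence token emitted after the model's answer (per Self-REF).

def decide(answer: str, confidence: float,
           accept_threshold: float = 0.8,
           route_threshold: float = 0.3) -> tuple:
    """Return an (action, payload) pair based on the confidence score.

    - High confidence: accept the local model's answer.
    - Middling confidence: route the query to a stronger expert model.
    - Low confidence: reject (fall back to a safe default behavior).
    """
    if confidence >= accept_threshold:
        return ("accept", answer)
    if confidence >= route_threshold:
        return ("route_to_expert", None)
    return ("reject", None)

print(decide("Paris", 0.95))   # confident: keep the local answer
print(decide("Paris", 0.50))   # uncertain: defer to a stronger LLM
print(decide("Paris", 0.05))   # unreliable: safe default / rejection
```

In practice the thresholds would be tuned on held-out data to trade off local-model cost against the accuracy of the expert model, which is the routing problem the paper studies.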

Lay Summary:

Large language models (LLMs) often give answers without indicating how confident they are. This can be risky in situations where wrong answers have serious consequences. We introduce a method called Self-REF that helps LLMs signal when their answers might be unreliable by assigning a confidence score. We demonstrate that this confidence score is valuable for learning when to route a query to another, more powerful LLM, or alternatively to reject the query. Compared to existing approaches on four datasets and two base LLMs, Self-REF performs best on both LLM routing and LLM rejection learning tasks.
