Invited Talk
in
Workshop: Tiny Titans: The next wave of On-Device Learning for Foundation Models (TTODLer-FM)
Invited Keynote #2
Dan Alistarh
Abstract:
The last few years have seen an explosion of interest in AI efficiency. One of the holy grails of the area has been training and running inference on models end-to-end in low precision, for instance by leveraging the quantized matrix-multiplication support on modern GPUs. In this talk, I will present some of our lab's recent work on this topic, investigating low-precision training of LLMs. Specifically, I will cover QuEST, a new state-of-the-art algorithm for quantized training; discuss the limits of current approaches, characterized via scaling laws; and describe fast kernel support for low-precision training.
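For context on the kind of low-precision matrix multiplication the abstract refers to, below is a minimal, illustrative Python sketch of symmetric per-tensor INT8 quantized matmul with integer accumulation. It is a generic example, not the QuEST algorithm or the kernels discussed in the talk; the function names and the quantization scheme are assumptions made for illustration.

# Illustrative sketch (not QuEST): symmetric per-tensor INT8 quantized matmul.
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization to int8; returns integer codes and a scale.
    scale = np.max(np.abs(x)) / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a, b):
    # Quantize both operands, multiply with int32 accumulation, rescale to float.
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc.astype(np.float32) * (sa * sb)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8)).astype(np.float32)
B = rng.standard_normal((8, 3)).astype(np.float32)
print(np.max(np.abs(int8_matmul(A, B) - A @ B)))  # small quantization error vs. float matmul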