Poster
Scaling Inference-Efficient Language Models
Song Bian · Minghao Yan · Shivaram Venkataraman
East Exhibition Hall A-B #E-2906
Solving complex tasks often requires large AI models, but these models are slow and expensive to run. We asked: can we build models that are both fast and accurate? To answer this, we studied how design choices, such as a model's depth and width, affect its efficiency. We trained 63 models of varying sizes with varying amounts of training data to discover scaling patterns that link model size, training, and inference speed. These insights enabled us to build a new model, Morph-1B, which makes predictions up to 1.8 times faster while still performing well on real-world benchmarks. This shows that smarter model designs can give us the best of both worlds: accuracy and speed. Our work helps AI developers create more sustainable and accessible systems without sacrificing performance.