

Poster in Workshop: CODEML: Championing Open-source DEvelopment in Machine Learning

TorchAO: PyTorch-Native Training-to-Serving Model Optimization

Andrew Or · Apurva Jain · Daniel Vega-Myhre · Jesse Cai · Charles Hernandez · Zhenrui Zhang · Driss Guessous · Vasiliy Kuznetsov · Christian Puhrsch · Mark Saroufim · Supriya Rao

Fri 18 Jul 2:15 p.m. PDT — 3 p.m. PDT

Abstract:

We present TorchAO, a PyTorch-native model optimization framework that leverages quantization and sparsity to provide an end-to-end, training-to-serving workflow for AI models. TorchAO supports a variety of popular model optimization techniques, including FP8 quantized training, quantization-aware training (QAT), post-training quantization (PTQ), and 2:4 sparsity, and leverages a novel tensor subclass abstraction to represent a variety of widely used, backend-agnostic low-precision data types, including INT4, INT8, FP8, MXFP4, MXFP6, and MXFP8. TorchAO integrates closely with the broader ecosystem at each step of the model optimization pipeline, from pre-training (TorchTitan) to fine-tuning (TorchTune) to serving (vLLM, SGLang, ExecuTorch), connecting an otherwise fragmented space in a single, unified workflow. Having enabled recent launches of the quantized Llama 3.2 1B/3B and LlamaGuard3-8B models, TorchAO is production-ready and fully open-source at https://github.com/pytorch/ao/.
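
To make the workflow concrete, below is a minimal sketch of a post-training quantization (PTQ) call using TorchAO's quantize_ API, as described in the project README at https://github.com/pytorch/ao/. The Int8WeightOnlyConfig name and the toy model are illustrative assumptions; exact config names and defaults may differ across TorchAO versions.

import torch
import torch.nn as nn
from torchao.quantization import quantize_, Int8WeightOnlyConfig

# Any eager-mode PyTorch model works; a small MLP stands in for a real LLM here.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

# Swap each nn.Linear weight to INT8 in place. The quantized weights are
# represented via TorchAO's tensor subclass abstraction, so the result is
# still a regular nn.Module that can be compiled and served as usual.
quantize_(model, Int8WeightOnlyConfig())

out = model(torch.randn(2, 1024))

The same quantize_ entry point is used for other configurations (e.g. INT4 or FP8 weights), which is what allows the framework to hand a single quantized model representation from training tools such as TorchTune to serving stacks such as vLLM or ExecuTorch.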
