Workshop
ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models
Tri Dao · Daniel Y Fu · Max Ryabinin · Daniel Hesslow · Simran Arora · Songlin Yang · Dan Biderman · Beidi Chen · Azalia Mirhoseini · Percy Liang
East Exhibition Hall A
Sat 19 Jul, 8:30 a.m. PDT
As models increase in size and training budget, they not only systematically improve in upstream quality, but also exhibit novel emergent capabilities, unlocking new AI applications. These new capabilities have led to a paradigm shift: large foundation models have become predominant in natural language processing and are growing increasingly common in computer vision, audio processing, and even robotics. This increase in scale raises proportionate difficulties for practitioners: foundation model training and inference lie at a unique interdisciplinary crossroads, combining open problems in algorithms, system design, and software engineering.
In response to these challenges, diverse research directions have emerged and produced promising work: (1) training and inference either at large scale or in resource-constrained scenarios (e.g., with higher network latency and lower bandwidth, in a collaborative manner across a fleet of contributed devices, or with a single GPU); (2) large-scale distributed training approaches, such as 3D parallelism and sharding; and (3) deep system optimizations, with custom languages such as TVM and Triton. These interdisciplinary research directions directly shape the trajectory of research across machine learning.
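To ground direction (3), the de-facto "hello world" for these kernel languages is a blocked, masked vector add. The sketch below is a minimal illustration in Triton (following the API of Triton's public tutorials), included for readers new to the area rather than as workshop material:

```python
# Minimal Triton vector-add kernel: each program instance handles one
# BLOCK_SIZE-wide tile, with masking to guard the ragged final tile.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                      # which tile this instance owns
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # stay in bounds on the last tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# Usage: x = torch.rand(98432, device="cuda"); y = torch.rand_like(x); add(x, y)
```

Real kernels in this space fuse far more work per launch (attention, quantized GEMMs), but the tile-plus-mask pattern above is the common starting point.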
Accordingly, these emerging lines of research are increasingly relevant to machine learning researchers. Indeed, researchers are key stakeholders: on the one hand, researchers may contribute algorithmic insights and novel methods that improve the training and inference of large models (e.g., recent award-winning papers at ICML and NeurIPS); on the other hand, novel research findings may be best demonstrated at scale, which may require training models as efficiently as possible to make the best use of available resources.
The goal of this workshop is to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference. This is the third installment of the ES-FoMo workshop at ICML. This year, we bring additional focus to two trends observed in 2024 and early 2025: (1) test-time compute, popularized by OpenAI o1 and DeepSeek R1, and (2) the emergence of new modeling paradigms and modalities, such as real-time video and decentralized training. We look forward to continuing to grow this community at ICML 2025.
Schedule
Sat 8:30 a.m. - 8:40 a.m. | Opening Remarks
Sat 8:40 a.m. - 9:10 a.m. | Hagay Lupesko: Zero to 50 ExaFLOPS in under a year - lessons from the trenches (Invited Talk)
Sat 9:10 a.m. - 9:40 a.m. | Wanchao Liang: TorchTitan (Invited Talk)
Sat 9:40 a.m. - 10:00 a.m. | Break
Sat 10:30 a.m. - 10:45 a.m. | FPTQuant: Function-Preserving Transforms for LLM Quantization (Oral) | Boris van Breugel · Yelysei Bondarenko · Paul Whatmough · Markus Nagel
Sat 10:45 a.m. - 11:00 a.m. | Cartridges: Lightweight and general-purpose long context representations via self-study (Oral) | Sabri Eyuboglu · Ryan Ehrlich · Simran Arora · Neel Guha · Dylan Zinsley · Emily Liu · Atri Rudra · James Zou · Azalia Mirhoseini · Christopher Re
Sat 11:00 a.m. - 11:15 a.m. | zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression (Oral) | Saibo Geng · Nathan Thomas Elian Ranchin · Yunzhen Yao · Maxime Peyrard · Chris Wendler · Michael Gastpar · Robert West
Sat 11:15 a.m. - 11:30 a.m. | Spotlight Lightning Talks
Sat 11:30 a.m. - 1:00 p.m. | Lunch break
Sat 1:00 p.m. - 2:30 p.m. | Poster Session
Sat 2:30 p.m. - 3:00 p.m. | Avanika Narayan: Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models (Invited Talk)
Sat 3:00 p.m. - 3:30 p.m. | Break
Sat 3:30 p.m. - 3:45 p.m. | Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture (Oral) | Shuchen Xue · Tianyu Xie · Tianyang Hu · Zijin Feng · Jiacheng Sun · Kenji Kawaguchi · Zhenguo Li · Zhi-Ming Ma
Sat 3:45 p.m. - 4:00 p.m. | Hardware-Efficient Attention for Fast Decoding (Oral) | Ted Zadouri · Hubert Strauss · Tri Dao
Sat 4:00 p.m. - 4:30 p.m. | Zachary Charles (Invited Talk)
Sat 4:30 p.m. - 5:00 p.m. | Albert Gu: H-Nets (Invited Talk)
Sat 5:00 p.m. - 5:10 p.m. | Closing Remarks / Awards
- Radio: Rate–Distortion Optimization for Large Language Model Compression (Poster) | Sean I. Young
- Towards Efficient Pre-training: Exploring FP4 Precision in Large Language Models (Poster) | Zhou Jiecheng · DING TANG · Rong Fu · Boni Hu · Haoran Xu · Yi Wang · zhongling su · Liang Liu · PeiZhilin · Hengjie Li · Xingcheng ZHANG · Weiming Zhang
- BlockBPE: Parallel BPE Tokenization (Poster) | Amos You
- The Road Not Taken: Hindsight Exploration for LLMs in Multi-Turn RL (Poster) | Yuki (Huaxiaoyue) Wang · Sanjiban Choudhury
- WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference (Poster) | Sihan Chen · Dan Zhao · Jongwoo Ko · Colby Banbury · HUIPING ZHUANG · Luming Liang · Tianyi Chen
- Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis (Poster) | Gholamali Aminian · Idan Shenfeld · Amir R. Asadi · Ahmad Beirami · Youssef Mroueh
- SPECS: Faster Test-Time Scaling through Speculative Drafts (Poster) | Mert Cemri · Nived Rajaraman · Rishabh Tiwari · Xiaoxuan Liu · Kurt Keutzer · Ion Stoica · Kannan Ramchandran · Ahmad Beirami · Ziteng Sun
- Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective (Poster) | Léo Gagnon · Eric Elmoznino · Sarthak Mittal · Tom Marty · Tejas Kasetty · Dhanya Sridhar · Guillaume Lajoie
- MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models (Poster) | Mugilan Ganesan · Shane Segal · Ankur Aggarwal · Nish Sinnadurai · Sean Lie · Vithursan Thangarasa
- How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach (Poster) | Ayeong Lee · Ethan Che · Tianyi Peng
- MTraining: Efficient Distributed Training for Ultra-Long Contexts via Dynamic Sparse Attention (Poster) | Wenxuan Li · Chengruidong Zhang · Huiqiang Jiang · Yucheng Li · Yuqing Yang · Lili Qiu
- How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need? (Poster) | Tuan Tran · Duy Nguyen · Hoai-Chau Tran · Michael Barz · Khoa Doan · Roger Wattenhofer · Vien Ngo · Mathias Niepert · Daniel Sonntag · Paul Swoboda
- pLSTM: parallelizable Linear Source Transition Mark networks (Poster) | Korbinian Pöppel · Richard Freinschlag · Thomas Schmied · Wei Lin · Sepp Hochreiter
- Training Language Models to Reason Efficiently (Poster) | Daman Arora · Andrea Zanette
- d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning (Poster) | Siyan Zhao · Devaansh Gupta · Qinqing Zheng · Aditya Grover
- Q-Adam-mini: Memory-Efficient 8-bit Quantized Optimizer for Large Language Model Training (Poster) | Yizhou Han · Chaohao Yang · Congliang Chen · Xingjian Wang · Ruoyu Sun
- Kevin: Multi-Turn RL for Generating CUDA Kernels (Poster) | Carlo Baronio · Pietro Marsella · Ben Pan · Simon Guo · Silas Alberti
- Adaptive Self-improvement LLM Agentic System for ML Library Development (Poster) | Genghan Zhang · Weixin Liang · Olivia Hsu · Kunle Olukotun
- Exploring Diffusion Transformer Designs via Grafting (Poster) | Keshigeyan Chandrasegaran · Michael Poli · Daniel Y Fu · Dongjun Kim · Lea Hadzic · Manling Li · Agrim Gupta · Stefano Massaroli · Azalia Mirhoseini · Juan Carlos Niebles · Stefano Ermon · Li Fei-Fei
- Mitigating Over-Smoothing in Mamba2 via Spectral Domain Analysis (Poster) | Seojin Kim · Yehjin Shin · Noseong Park
- ConMeZO: Adaptive Directional Sampling for Gradient-Free Finetuning of Language Models (Poster) | Lejs Behric · Liang Zhang · Bingcong Li · Kiran Thekumparampil
- AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning (Spotlight) | Wei Fu · Jiaxuan Gao · Shusheng Xu · Zhiyu Mei · Chen Zhu · Xujie Shen · Chuyi He · Guo Wei · Jun Mei · Jiashu Wang · Tongkai Yang · Binhang Yuan · Yi Wu
- BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning (Poster) | Xuechen Zhang · Zijian Huang · Yingcong Li · Chenshun Ni · Jiasi Chen · Samet Oymak
- DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic (Poster) | Yuheng Wu · Jianwen Xie · Denghui Zhang · Zhaozhuo Xu
- Large Reasoning Models Know How to Think Efficiently (Poster) | Zeyu Xing · Xing Li · Huiling Zhen · Xianzhi Yu · Mingxuan Yuan · Sinno Jialin Pan
- QuarterMap: Efficient Post-Training Token Pruning for Visual State Space Models (Poster) | Tien-Yu Chi · Hung-Yueh Chiang · Diana Marculescu · Kai-Chiang Wu
- Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs (Poster) | Guoliang HE · Youhe Jiang · Wencong Xiao · Jiang Kaihua · Shuguang Wang · Jun Wang · Du Zixian · Zhuo Jiang · Xinlei Zhang · Binhang Yuan · Eiko Yoneki
- PoLAR: Polar-Decomposed Low-Rank Adapter Representation (Poster) | Kai Lion · Liang Zhang · Bingcong Li · Niao He
- Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning (Poster) | Raghav Singhal · Kaustubh Ponkshe · Rohit Vartak · Lav Varshney · Praneeth Vepakomma
- TinyServe: Query-Aware Cache Selection for Efficient LLM Inference (Poster) | Dong Liu · Yanxuan Yu
- SpecCoT: Accelerating Chain-of-Thought Reasoning through Speculative Exploration (Poster) | Junhan Shi · Yijia Zhu · Zhenning Shi · Dan Zhao · Qing Li · Yong Jiang
- $\mu$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts (Poster) | Toshiaki Koike-Akino · Jing Liu · Ye Wang
- ThinkingViT: Nested Thinking Vision Transformer for Elastic Inference (Poster) | Ali Hojjat · Janek Haberer · Soeren Pirk · Olaf Landsiedel
- Unified Scaling Laws for Compressed Representations (Poster) | Andrei Panferov · Alexandra Volkova · Ionut-Vlad Modoranu · Vage Egiazarian · Mher Safaryan · Dan Alistarh
- Learning to Discover Abstractions for LLM Reasoning (Spotlight) | Yuxiao Qu · Anikait Singh · Yoonho Lee · Amrith Setlur · Russ Salakhutdinov · Chelsea Finn · Aviral Kumar
- Byzantine-Resilient Zero-Order Optimization for Scalable Federated Fine-Tuning of Large Language Models (Poster) | Maximilian Egger · Mayank Bakshi · Rawad Bitar
- Batch-Max: Higher LLM Throughput using Larger Batch Sizes and KV Cache Compression (Poster) | Michael R. Metel · Boxing Chen · Mehdi Rezagholizadeh
- LATTICE: Learning to Efficiently Compress the Memory (Poster) | Mahdi Karami · Vahab Mirrokni
- Outlier-Free Genomic Foundation Models for Resource-Efficient Training and Low-Bit Inference (Poster) | Chenghao Qiu · Haozheng Luo · Maojiang Su · Zhihan Zhou · Zoe Mehta · Guo Ye · Jerry Yao-Chieh Hu · Han Liu
- Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers (Poster) | Sarthak Mittal · Divyat Mahajan · Guillaume Lajoie · Mohammad Pezeshki
- PiKV: KV Cache Management System for MoE Architecture (Poster) | Dong Liu · Yanxuan Yu · Ben Lengerich · Ying Nian Wu · Xuhong Wang
- VScan: A Two-Stage Visual Token Reduction Framework for Accelerating Large Vision-Language Models (Poster) | Ce Zhang · Kaixin Ma · Tianqing Fang · Wenhao Yu · Hongming ZHANG · Zhisong Zhang · Yaqi Xie · Katia Sycara · Haitao Mi · Dong Yu
- KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction (Poster) | Jang-Hyun Kim · Jinuk Kim · Sangwoo Kwon · Jae W. Lee · Sangdoo Yun · Hyun Oh Song
- SD$^2$: Self-Distilled Sparse Drafters (Poster) | Mike Lasby · Nish Sinnadurai · Valavan Manohararajah · Sean Lie · Yani Ioannou · Vithursan Thangarasa
- Autoregressive Language Modeling by Compressed Sequence Mixing (Poster) | Jatin Prakash · Aahlad Puli · Rajesh Ranganath
- Proof-of-Concept for Private Local-to-Cloud LLM Chat via Trusted Execution Environments (Poster) | Avanika Narayan · Dan Biderman · Christopher Re
- ABBA: Highly Expressive Hadamard Product Adaptation for Large Language Models (Spotlight) | Raghav Singhal · Kaustubh Ponkshe · Rohit Vartak · Praneeth Vepakomma
- Think Clearly: Improving Reasoning via Redundant Token Pruning (Poster) | Daewon Choi · Jimin Lee · Jihoon Tack · Woomin Song · Saket Dingliwal · Sai Muralidhar Jayanthi · Bhavana Ganesh · Jinwoo Shin · Aram Galstyan · Sravan Babu Bodapati
- One-Pass to Reason: Token Duplication and Block-Sparse Mask for Efficient Fine-Tuning on Multi-Turn Reasoning (Poster) | Ritesh Goru · Shanay Mehta · Prateek Jain
- AWP: Activation-aware Weight Pruning and Quantization with Projected Gradient Descent (Poster) | Jing Liu · Toshiaki Koike-Akino · Ye Wang · Hassan Mansour · Matthew Brand
- Ultra-Efficient and Effective Large Language Models with Multi-Boolean Architectures (Poster) | Ba-Hien Tran · Van Minh NGUYEN
- Vision Language Model Distillation Using Partial Information Decomposition (Poster) | Stephen Liang
- Compressing Large Language Models to Any Size Without Re-Computation (Poster) | Martin Genzel · Patrick Putzky · Pengfei Zhao · Sebastian Schulze · Mattes Mollenhauer · Robert Seidel · Stefan Dietzel · Thomas Wollmann
- Guided Speculative Inference for Efficient Test-Time Alignment of LLMs (Spotlight) | Jonathan Geuter · Youssef Mroueh · David Alvarez-Melis
- Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation (Poster) | Yehjin Shin · Seojin Kim · Noseong Park
- Flexi-LoRA: Efficient LoRA Finetuning with Input-Adaptive Dynamic Ranks (Poster) | Zongqian Li · Yixuan Su · Han Zhou · Zihao Fu · Nigel Collier
- Toward Dataset Distillation for Regression Problems (Poster) | Jamie Mahowald · Ravi Srinivasan · Zhangyang “Atlas” Wang
- Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile (Poster) | Hangliang Ding · Dacheng Li · Runlong Su · Peiyuan Zhang · Zhijie Deng · Ion Stoica · Hao Zhang
- GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching (Poster) | Guinan Su · Li Shen · Lu Yin · Shiwei Liu · Yanwu Yang · Jonas Geiping
- Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement (Poster) | Xuechen Zhang · Zijian Huang · Chenshun Ni · Ziyang Xiong · Jiasi Chen · Samet Oymak
- Learning Adaptive Parallel Reasoning with Language Models (Poster) | Jiayi Pan · Xiuyu Li · Long (Tony) Lian · Charlie Snell · Yifei Zhou · Adam Yala · Trevor Darrell · Kurt Keutzer · Alane Suhr
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Thinking (Poster) | Sangmin Bae · Yujin Kim · Reza Bayat · Sungnyun Kim · Jiyoun Ha · Tal Schuster · Adam Fisch · Hrayr Harutyunyan · Ziwei Ji · Aaron Courville · Se-Young Yun
- SortedRL: Accelerating RL Training for LLMs through Online Length-aware Scheduling (Poster) | Yiqi Zhang · Huiqiang Jiang · Xufang Luo · Zhihe Yang · Chengruidong Zhang · Yifei Shen · Dongsheng Li · Yuqing Yang · Lili Qiu · Yang You
- Speeding up Speculative Decoding via Sequential Approximate Verification (Poster) | Meiyu Zhong · Noel Teku · Ravi Tandon
- Model Parallelism With Subnetwork Data Parallelism (Poster) | Vaibhav Singh · Zafir Khalid · Eugene Belilovsky · Edouard Oyallon
- Towards Understanding Self-Pretraining for Sequence Classification (Poster) | Omar Coser · Antonio Orvieto
- Is Visual Prompting the Right Setup for Knowledge Transfer in new Foundation Models? (Poster) | Niclas Hergenröther · Antonio Orvieto
- CarbonGearRL: Precision-Elastic, Carbon-Aware Scheduling for Foundation-Model Training (Poster) | Thomas Chen
- Multi-stream Sequence Learning (Poster) | Mohamed Elsayed · Rupam Mahmood
- Cache Saver: A Modular Framework for Efficient, Affordable, and Reproducible LLM Inference (Poster) | Nearchos Potamitis · Lars Klein · Chongyang Xu · Attreyee Mukherjee · Bardia Mohammadi · Niket Tandon · Laurent Bindschaedler · Akhil Arora
- DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness (Poster) | Ahmad Mohammadshirazi · Pinaki Prasad Guha Neogi · Ser-Nam Lim · Rajiv Ramnath
- Exchangeability in Neural Network Architectures and its Application to Dynamic Pruning (Poster) | Pu (Luke) Yi · Tianlang Chen · Yifan Yang · Sara Achour
- TORCHSIM: High Fidelity Runtime and Memory Estimation for Distributed Training (Poster) | Sanket Jayant Purandare · Emma Yang · Andrew Zhao · Qitong Wang · Wei Feng · Alban Desmaison · Andrew Gu · Tianyu Liu · Less Wright · Gokul Nadathur · Stratos Idreos
- Zero-Shot Conversion to Monarch-Structured Attention (Poster) | Can Yaras · Alec Xu · Pierre Abillama · Changwoo Lee · Laura Balzano
- Towards Large Scale Training on Apple Silicon (Poster) | Tycho van der Ouderaa · Mohamed Baioumy · Matt Beton · Seth Howes · Gelu Vrabie · Alex Cheema
- Language System: A Lightweight Ranking Framework for Language Models (Poster) | Chenheng Zhang · Tianqi Du · Jizhe Zhang · Mingqing Xiao · Yifei Wang · Yisen Wang · Zhouchen Lin
- Foreign Sparse Attention: Effective Distillation into Sparse Attention (Poster) | Vijaykaarti Sundarapandiyan · Tom Goldstein · Ashwinee Panda
- MatMuls are Enough for Efficient and Performant Linear-Time Attention (Poster) | Andrew Argatkiny · Ilya Makarov
- Tail-Optimized Caching for LLM Inference (Poster) | Wenxin Zhang · Yueying Li · Tianyi Peng · Ciamac Moallemi
- PoTPTQ: A Two-step Power-of-Two Post-training for LLMs (Poster) | Xinyu Wang · Vahid Partovi Nia · Peng Lu · Jerry Huang · Xiao-Wen Chang · Boxing Chen · Yufei Cui
- A Minimalist Optimizer Design for LLM Pretraining (Poster) | Athanasios Glentis · Jiaxiang Li · Andi Han · Mingyi Hong
- CoDM: A Co-design Framework for Efficient Sparse Diffusion Models (Poster) | Xiaolong Wu · Xiang Gao · Xiyun Song · Zongfang Lin · Heather Yu · Xianfeng GU
- Balancing LoRA Performance and Efficiency with Simple Shard Sharing (Poster) | Jiale Kang · Qingyu Yin
- PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts (Poster) | Zeman Li · Yuan Deng · Peilin Zhong · Meisam Razaviyayn · Vahab Mirrokni
- Steering LLM Reasoning Through Bias-Only Adaptation (Poster) | Viacheslav Sinii · Alexey Gorbatovski · Artem Cherepanov · Boris Shaposhnikov · Nikita Balagansky · Daniil Gavrilov
- InterLoRA: An Adaptive LoRA Structure Based on The Mechanistic Interpretability of Transformer (Poster) | Jihao Gu · Zelin Wang · Yibo Zhang · Ping Gong · Zhisong Bie
- LOGAH: Initialize Large Transformers via Small Graph HyperNetworks (Poster) | xinyu Zhou · Boris Knyazev · Alexia Jolicoeur-Martineau · Jie Fu
- Mu-Parametrization for Mixture of Experts (Poster) | Jan Małaśnicki · Kamil Ciebiera · Mateusz Boruń · Maciej Pióro · Jan Ludziejewski · Maciej Stefaniak · Michał Krutul · Sebastian Jaszczur · Marek Cygan · Kamil Adamczewski · Jakub Krajewski
- Resource-efficient Inference with Foundation Model Programs (Spotlight) | Lunyiu Nie · Zhimin Ding · Kevin Yu · Marco Cheung · Chris Jermaine · Swarat Chaudhuri
- Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models (Poster) | Aneesh Muppidi · Katrina Brown · Rana Shahout
- Training-Free Semantic Deferrals for Open-Ended LLM Cascades (Poster) | Duncan Soiffer · Steven Kolawole · Virginia Smith
- Tensor Product Attention Is All You Need (Poster) | Yifan Zhang · Yifeng Liu · Huizhuo Yuan · Zhen Qin · Yang Yuan · Quanquan Gu · Andrew Yao
- Training-free LLM Verification via Recycling Few-shot Examples (Spotlight) | Dongseok Lee · JIMYUNG HONG · Dongyoung Kim · Jaehyung Kim
- Efficient and Accurate KV-cache Management for Long-Sequence LLMs (Poster) | Yuzhen Mao · Qitong Wang · Martin Ester · Ke Li
- Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching (Poster) | Qizheng Zhang · Michael Wornow · Kunle Olukotun
- FrugalRAG: Learning to retrieve and reason for multi-hop QA (Poster) | Abhinav Java · Srivathsan Koundinyan · Nagarajan Natarajan · Amit Sharma
- HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations (Poster) | Marco Federici · Riccardo Del Chiaro · Boris van Breugel · Paul Whatmough · Markus Nagel
- Private Zeroth-Order Optimization with Public Data (Poster) | Xuchen Gong · Tian Li
- GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization (Poster) | Martin Andrews · Sam Witteveen
- Thinformer: Guaranteed Attention Approximation via Low-Rank Thinning (Poster) | Annabelle Carrell · Albert Gong · Abhishek Shetty · Raaz Dwivedi · Lester Mackey
- Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training (Poster) | Vaibhav Singh · Paul Janson · Paria Mehrbod · Adam Ibrahim · Irina Rish · Eugene Belilovsky · Benjamin Thérien
- A Survey on Prompt Tuning (Poster) | Zongqian Li · Yixuan Su · Nigel Collier
- Continuous Autoregressive Generation with Mixture of Gaussians (Poster) | Alex Quach · Johnson Tsun-Hsuan Wang · Ramin Hasani · Mathias Lechner · Alexander Amini
- Towards Understanding Orthogonalization in Muon (Poster) | Valentyn Boreiko · Zhiqi Bu · Sheng Zha
- MuLoCo: Muon is a practical inner optimizer for DiLoCo (Poster) | Benjamin Thérien · Xiaolong Huang · Irina Rish · Eugene Belilovsky
- An Efficient Row-Based Sparse Fine-Tuning with Low Quantization Error (Poster) | Cen-Jhih Li · Aditya Bhaskara
- Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention (Poster) | Zhihao Zhan · Jianan Zhao · Zhaocheng Zhu · Jian Tang
- Multi-student Diffusion Distillation for Better One-step Generators (Poster) | Yanke Song · Jonathan Lorraine · Weili Nie · Karsten Kreis · James Lucas
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification (Poster) | Penghui Yang · Cunxiao Du · Fengzhuo Zhang · Haonan Wang · Tianyu Pang · Chao Du · Bo An
- Early Attentive Sparsification Accelerates Neural Speech Transcription (Poster) | Zifei Xu · Sayeh Sharify · Hesham Mostafa · Tristan Webb · Wanzin Yazar · Xin Wang
- Adaptive Backbone Selection for Efficient and Real-Time Vision Inference (Poster) | Syed Amir Hamza · Alexander Jesser
- JSONSchemaBench: Evaluating Constrained Decoding with LLMs on Efficiency, Coverage and Quality (Poster) | Saibo Geng · Hudson Cooper · Michal Moskal · Samuel Jenkins · Julian Berman · Nathan Thomas Elian Ranchin · Robert West · Eric Horvitz · Harsha Nori
- Unbounded Memory and Consistent Imagination via Unified Diffusion–SSM World Models (Poster) | Jia-Hua Lee · Bor Jiun Lin · Wei-Fang Sun · Chun-Yi Lee
- LoRA Merging with SVD: Understanding Interference and Preserving Performance (Poster) | Dennis Tang · Prateek Yadav · Yi-Lin Sung · Jaehong Yoon · Mohit Bansal
- SageAttention2++: A More Efficient Implementation of SageAttention2 (Poster) | Jintao Zhang · Xiaoming Xu · Jia wei · Haofeng Huang · Pengle Zhang · Chendong Xiang · Jun Zhu · Jianfei Chen
- Mamba Drafters for Speculative Decoding (Poster) | Daewon Choi · Seunghyuk Oh · Saket Dingliwal · Jihoon Tack · Kyuyoung Kim · Woomin Song · Seojin Kim · Insu Han · Jinwoo Shin · Aram Galstyan · Shubham Katiyar · Sravan Babu Bodapati
- Efficient Temporal Tokenization for Mobility Prediction with Large Language Models (Poster) | Haoyu He · Haozheng Luo · Yan Chen · Qi Wang
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (Spotlight) | Jonas Geiping · Sean McLeish · Neel Jain · John Kirchenbauer · Siddharth Singh · Brian Bartoldson · Bhavya Kailkhura · Abhinav Bhatele · Tom Goldstein
- Quartet: Native FP4 Training Can Be Optimal for Large Language Models (Spotlight) | Roberto Castro · Andrei Panferov · Rush Tabesh · Jiale Chen · Oliver Sieberling · Mahdi Nikdan · Saleh Ashkboos · Dan Alistarh
- Partition Generative Modeling: Masked Modeling Without Masks (Poster) | Justin Deschenaux · Lan Tran · Caglar Gulcehre
- Optimal Formats for Weight Quantisation (Poster) | Douglas Orr · Luka Ribar · Carlo Luschi
- Shrinking the Generation-Verification Gap with Weak Verifiers (Poster) | Jon Saad-Falcon · Estefany Kelly Buchanan · Mayee Chen · Tzu-Heng Huang · Brendan McLaughlin · Tanvir Bhathal · Shang Zhu · Ben Athiwaratkun · Frederic Sala · Scott Linderman · Azalia Mirhoseini · Christopher Re
- SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression (Poster) | Yiqiao Jin · Kartik Sharma · Vineeth Rakesh · Yingtong Dou · Menghai Pan · Mahashweta Das · Srijan Kumar
- Tiny Reward Models (Poster) | Sarah Pan
- Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs (Poster) | Beomsik Cho · Jaehyung Kim
- Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas (Spotlight) | Austin Silveria · Soham Govande · Daniel Y Fu
- Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation (Poster) | Liliang Ren · Congcong Chen · Haoran Xu · Young Jin Kim · Adam Atkinson · Zheng Zhan · Jiankai Sun · Baolin Peng · Liyuan Liu · Shuohang Wang · Hao Cheng · Jianfeng Gao · Weizhu Chen · Yelong Shen
- Context-lite Multi-turn Reinforcement Learning for LLM Agents (Poster) | Chen · Jiayu Chen · Hao Zhu · Jeff Schneider
- Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection (Poster) | Shivam Chandhok · Qian Yang · Oscar Mañas · Kanishk Jain · Aishwarya Agrawal · Leonid Sigal
- LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs (Poster) | Reza Arabpour · Haitz Sáez de Ocáriz Borde · Anastasis Kratsios
- Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling (Poster) | Mónika Farsang · Ramin Hasani · Radu Grosu
- TMA-Adaptive FP8 Grouped GEMM: Eliminating Padding Requirements in Low-Precision Training and Inference on Hopper (Poster) | zhongling su · Rong Fu · Weihan Cao · Jianfei Gao · Minxi Jin · PeiZhilin · Hui Wang
- Privacy Isn’t Free: Benchmarking the Systems Cost of Privacy-Preserving ML (Poster) | Nnaemeka Obiefuna · Samuel Oyeneye · Similoluwa Odunaiya · Iremide Oyelaja · Steven Kolawole
- Demystifying Language Model Forgetting with Low-rank Example Associations (Poster) | Xisen Jin · Xiang Ren
- PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning (Poster) | Zongqian Li · Yixuan Su · Nigel Collier
- Accelerating Linear Attention Design by Unifying Forward & Backward Propagation (Poster) | Zhen Qin · Xuyang Shen · Dong Li · Yiran Zhong
- Accelerated Test-Time Scaling with Model-Free Speculative Sampling (Poster) | Woomin Song · Saket Dingliwal · Sai Muralidhar Jayanthi · Bhavana Ganesh · Jinwoo Shin · Aram Galstyan · Sravan Babu Bodapati
- Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers (Poster) | Woomin Song · Sai Muralidhar Jayanthi · Srikanth Ronanki · Kanthashree Sathyendra · Jinwoo Shin · Aram Galstyan · Shubham Katiyar · Sravan Babu Bodapati
- Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts (Poster) | Haizhong Zheng · Yang Zhou · Brian Bartoldson · Bhavya Kailkhura · Fan Lai · Jiawei Zhao · Beidi Chen
- VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs (Poster) | Raghavv Goel · Sudhanshu Agrawal · Mukul Gagrani · Junyoung Park · Yifan Zao · He Zhang · Tian Liu · Yiping Yang · Xin Yuan · Jiuyuan Lu · Christopher Lott · Mingu Lee
- Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights (Poster) | Jakub Krajewski · Marcin Chochowski · Daniel Korzekwa