Poster in Workshop: Tiny Titans: The next wave of On-Device Learning for Foundation Models (TTODLer-FM)

DiffusionBlocks: Continuous-Time Blockwise Training Through Score-Based Diffusion Models

Makoto Shing · Takuya Akiba

[ Project Page ]
Fri 18 Jul 3 p.m. PDT — 3:45 p.m. PDT

Abstract: Training large neural networks with end-to-end backpropagation creates significant memory bottlenecks, limiting accessibility to state-of-the-art AI research. We propose $\textit{DiffusionBlocks}$, a novel training framework that interprets neural network blocks as performing denoising operations in a continuous-time diffusion process. By partitioning the network into independently trainable blocks and optimizing noise level assignments based on equal cumulative probability mass, our approach achieves both superior memory efficiency and improved performance compared to traditional backpropagation. Experiments on image generation and language modeling tasks demonstrate 4$\times$ memory reduction during training while maintaining or improving performance. DiffusionBlocks provides a promising pathway for democratizing access to large-scale neural network training with limited computational resources.
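The abstract's central algorithmic ingredient is assigning each block a noise-level interval that carries equal cumulative probability mass of the noise distribution. The sketch below illustrates that idea only; it is not the authors' code, and it assumes an EDM-style log-normal distribution over noise levels with hypothetical parameters `p_mean` and `p_std`, which the abstract does not specify.

```python
# Minimal sketch (not the authors' implementation): partition a continuous-time
# noise schedule into B blocks so that each block's interval holds equal
# cumulative probability mass. The log-normal noise-level distribution and its
# parameters (p_mean, p_std) are assumptions borrowed from EDM-style diffusion
# training; the paper's actual choices may differ.
import numpy as np
from scipy.stats import norm


def equal_mass_boundaries(num_blocks: int,
                          p_mean: float = -1.2,
                          p_std: float = 1.2) -> np.ndarray:
    """Noise-level boundaries sigma_0 < ... < sigma_B such that each interval
    [sigma_b, sigma_{b+1}) contains exactly 1/num_blocks of the probability
    mass of log(sigma) ~ Normal(p_mean, p_std)."""
    quantiles = np.linspace(0.0, 1.0, num_blocks + 1)
    log_sigma = norm.ppf(quantiles, loc=p_mean, scale=p_std)  # endpoints are -inf / +inf
    return np.exp(log_sigma)                                  # so sigma spans (0, inf)


if __name__ == "__main__":
    # With four blocks, block b would then be trained independently as a
    # denoiser restricted to noise levels in [boundaries[b], boundaries[b + 1]).
    boundaries = equal_mass_boundaries(num_blocks=4)
    print(boundaries)  # approx. [0.0, 0.134, 0.301, 0.677, inf]
```

Because each block only ever sees noise levels from its own interval, its denoising loss can be optimized without backpropagating through the other blocks, which is consistent with the independent blockwise training and the memory savings described in the abstract.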
