

Poster
in
Workshop: Tiny Titans: The next wave of On-Device Learning for Foundation Models (TTODLer-FM)

Unlocking the Potential of Extremely Low-Bit Sparse Transformers through Adaptive Multi-bit Supermasks and Random Weights

Yasuyuki Okoshi · Hikari Otsuka · Junnosuke Suzuki · Daichi Fujiki · Masato Motomura

[ Project Page ]
Fri 18 Jul 1 p.m. PDT — 1:45 p.m. PDT

Abstract:

We propose Adaptive Supermask (Ada-Sup), a scalable and efficient method for discovering high-quality multi-bit supermasks in an extended Strong Lottery Ticket framework. Building on this method, we introduce TicketLLM, a Transformer-based model that combines pruning, quantization, and random weights to enable compact low-bit sparse representations. Experimental results show that Ada-Sup can find high-quality supermasks at significantly reduced training cost compared to previous methods, in both binary and multi-bit supermask settings. Furthermore, TicketLLM outperforms BitNet b1.58 on a 1.3B-parameter model with the same memory per connection, achieving 0.08 lower perplexity despite operating at a higher sparsity level (50% vs. 33%). These results demonstrate the potential of leveraging supermasks and random weights as a practical and powerful alternative for building lightweight, scalable LLMs.
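To make the idea of combining frozen random weights with a learned mask concrete, here is a minimal, illustrative sketch of supermask training in the Strong Lottery Ticket style. It is not the authors' Ada-Sup or TicketLLM implementation; the class names (SupermaskLinear, TopKMask), the straight-through top-k selection, and the way surviving weights are snapped to a small set of quantized magnitudes (the "multi-bit" part here) are all assumptions made for illustration.

```python
# Illustrative sketch only (not the authors' Ada-Sup / TicketLLM code).
# Weights are random and frozen; only per-weight scores are trained.
# A straight-through estimator passes gradients through the hard top-k mask.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMask(torch.autograd.Function):
    """Keep the top-k scores as 1, zero the rest; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, sparsity):
        k = int((1.0 - sparsity) * scores.numel())
        mask = torch.zeros_like(scores)
        if k > 0:
            idx = torch.topk(scores.flatten(), k).indices
            mask.flatten()[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: gradient w.r.t. scores, none for sparsity.
        return grad_output, None


class SupermaskLinear(nn.Module):
    """Linear layer with frozen random weights and a learned (multi-bit) supermask."""

    def __init__(self, in_features, out_features, sparsity=0.5, levels=(0.5, 1.0)):
        super().__init__()
        # Frozen random weights: never updated during training.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02,
                                   requires_grad=False)
        # Learnable scores decide which connections survive.
        self.scores = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.sparsity = sparsity
        # Quantized magnitude levels for surviving weights (assumed multi-bit scheme).
        self.register_buffer("levels", torch.tensor(levels))

    def forward(self, x):
        mask = TopKMask.apply(self.scores.abs(), self.sparsity)
        # Snap each kept weight's magnitude to its nearest quantization level,
        # keeping the original sign (illustrative quantization of random weights).
        mag = self.weight.abs().unsqueeze(-1)
        nearest = self.levels[(mag - self.levels).abs().argmin(dim=-1)]
        w_q = torch.sign(self.weight) * nearest
        return F.linear(x, w_q * mask)


if __name__ == "__main__":
    layer = SupermaskLinear(16, 8, sparsity=0.5)
    out = layer(torch.randn(4, 16))
    print(out.shape)  # torch.Size([4, 8])
```

In this setup the memory cost per connection is dominated by the mask bits plus a shared random seed, rather than full-precision weights, which is the intuition behind the paper's comparison against low-bit baselines such as BitNet b1.58.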
