Poster in Affinity Workshop: New In ML

LinCal: A Hybrid Attention Architecture Balancing Linear Computation and Global Calibration


Abstract:

Recent advances in reasoning-focused language models have improved step-by-step problem solving, but at a significant computational cost, since their reasoning traces make decoding increasingly expensive. In this work, we propose LinCal, a hybrid attention architecture that interleaves sliding window attention (SWA) layers with periodic full attention layers to balance efficiency against global information integration. LinCal performs local reasoning with SWA and uses the full attention layers for global calibration, allowing reasoning to scale without incurring full-attention cost at every layer. We evaluate this approach across multiple model sizes and mathematical benchmarks, showing that it reduces decoding cost while maintaining competitive accuracy. Ablation studies further examine the impact of window size and context length on performance.
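
The abstract does not give implementation details, so the following is only a minimal sketch of the interleaving idea it describes: most layers attend within a local causal window, and every few layers a full causal attention layer provides global calibration. All names and hyperparameters here (HybridStack, window, full_attn_every, the layer dimensions) are illustrative assumptions, not LinCal's actual configuration.

```python
# Illustrative sketch of SWA layers interleaved with periodic full attention.
# Names and hyperparameters are hypothetical, not taken from the paper.
import torch
import torch.nn as nn


def sliding_window_mask(seq_len: int, window: int, device=None) -> torch.Tensor:
    """Boolean mask, True where attention is allowed: causal and within `window` tokens."""
    i = torch.arange(seq_len, device=device)
    return (i[None, :] <= i[:, None]) & (i[:, None] - i[None, :] < window)


def causal_mask(seq_len: int, device=None) -> torch.Tensor:
    """Standard full causal mask, True where attention is allowed."""
    i = torch.arange(seq_len, device=device)
    return i[None, :] <= i[:, None]


class AttentionBlock(nn.Module):
    """Pre-norm attention + MLP block; the mask decides local (SWA) vs. global attention."""

    def __init__(self, d_model: int, n_heads: int, use_full_attention: bool, window: int):
        super().__init__()
        self.use_full_attention = use_full_attention
        self.window = window
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        allowed = (
            causal_mask(seq_len, x.device)
            if self.use_full_attention
            else sliding_window_mask(seq_len, self.window, x.device)
        )
        # nn.MultiheadAttention expects True at positions that are masked *out*.
        h = self.norm1(x)
        h, _ = self.attn(h, h, h, attn_mask=~allowed, need_weights=False)
        x = x + h
        return x + self.mlp(self.norm2(x))


class HybridStack(nn.Module):
    """Mostly SWA blocks, with a full-attention block every `full_attn_every` layers."""

    def __init__(self, n_layers=12, d_model=256, n_heads=4, window=128, full_attn_every=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            AttentionBlock(
                d_model,
                n_heads,
                use_full_attention=((i + 1) % full_attn_every == 0),  # periodic global calibration
                window=window,
            )
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return x


if __name__ == "__main__":
    x = torch.randn(2, 512, 256)  # (batch, sequence, d_model)
    print(HybridStack()(x).shape)  # torch.Size([2, 512, 256])
```

Under this reading, the SWA blocks keep per-token decoding work bounded by the window size, while the periodic full-attention blocks are the only place where information from the entire context is integrated.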
