Sat 9:00 a.m. - 9:10 a.m.
|
Registration & Opening Remarks
>
|
🔗
|
Sat 9:10 a.m. - 9:55 a.m.
|
Beyond benchmarks: the case for spherical cows in LLM research
(
Invited Talk
)
>
|
Aditi Raghunathan
🔗
|
Sat 9:55 a.m. - 10:40 a.m.
|
Designing Efficient Attention: Insights from an Inference Perspective
(
Invited Talk
)
>
|
Tri Dao
🔗
|
Sat 10:40 a.m. - 11:45 a.m.
|
Poster Session 1
(
Poster Session
)
>
|
🔗
|
Sat 11:45 a.m. - 12:00 p.m.
|
Do Larger Language Models Imply Better Generalization? A Pretraining Scaling Law for Implicit Reasoning
(
Oral presentation
)
>
|
Xinyi Wang · Shawn Tan · Mingyu Jin · William Wang · Rameswar Panda · Yikang Shen
🔗
|
Sat 12:00 p.m. - 12:15 p.m.
|
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
(
Oral presentation
)
>
|
Hanlin Zhu · Shibo Hao · Zhiting Hu · Jiantao Jiao · Stuart Russell · Yuandong Tian
🔗
|
Sat 12:15 p.m. - 12:30 p.m.
|
Stats or Facts: Decomposing Generalization in Language Models with Small-Scale Models
(
Oral presentation
)
>
|
Tina Behnia · Puneesh Deora · Christos Thrampoulidis
🔗
|
Sat 1:30 p.m. - 2:15 p.m.
|
How Jailbreaking 1-Layer Transformers Taught us how to Steer LLMs
(
Invited Talk
)
>
|
Eric Wong
🔗
|
Sat 2:15 p.m. - 3:00 p.m.
|
The Art of Artificial Reasoning for Small Language Models
(
Invited Talk
)
>
|
Yejin Choi
🔗
|
Sat 3:00 p.m. - 3:15 p.m.
|
Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge
(
Oral presentation
)
>
|
Freya Behrens · Lenka Zdeborová
🔗
|
Sat 3:15 p.m. - 3:30 p.m.
|
In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly
(
Oral presentation
)
>
|
Puneesh Deora · Bhavya Vasudeva · Tina Behnia · Christos Thrampoulidis
🔗
|
Sat 3:30 p.m. - 3:45 p.m.
|
Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models
(
Oral presentation
)
>
|
Lillian Sun · Martin Pawelczyk · Zhenting Qi · Aounon Kumar · Himabindu Lakkaraju
🔗
|
Sat 4:00 p.m. - 4:45 p.m.
|
Panel Discussion with Misha Belkin, Stella Biderman, Leonard Tang
(
Panel Discussion
)
>
|
🔗
|
Sat 4:45 p.m. - 6:00 p.m.
|
Poster Session 2
(
Poster Session
)
>
|
🔗
|
-
|
Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models
(
Poster
)
>
link
|
Lillian Sun · Martin Pawelczyk · Zhenting Qi · Aounon Kumar · Himabindu Lakkaraju
🔗
|
-
|
Understanding Attention Glitches with Threshold Relative Attention
(
Poster
)
>
link
|
Mattia Opper · Roland Fernandez · Paul Smolensky · Jianfeng Gao
🔗
|
-
|
Dynamic Low-Rank Training with Spectral Regularization: Achieving Robustness in Compressed Representations
(
Poster
)
>
link
|
Steffen Schotthöfer · Lexie Yang · Stefan Schnake
🔗
|
-
|
How Much Context Does Natural Language Actually Require? An Analysis Using LLMs as Statistical Oracles
(
Poster
)
>
link
|
Vala Vakilian · Sadegh Mahdavi · Christos Thrampoulidis
🔗
|
-
|
Towards Understanding Self-Pretraining for Sequence Classification
(
Poster
)
>
link
|
Omar Coser · Antonio Orvieto
🔗
|
-
|
Review, Remask, Refine: Process-Guided Block Diffusion for Text Generation
(
Poster
)
>
link
|
Nikita Mounier · Parsa Idehpour
🔗
|
-
|
Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features
(
Poster
)
>
link
|
Yize Zhao · Christos Thrampoulidis
🔗
|
-
|
TinyServe: Query-Aware Cache Selection for Efficient LLM Inference
(
Poster
)
>
link
|
Dong Liu · Yanxuan Yu
🔗
|
-
|
Transformers May Learn to Classify In-Context by Context-Adaptive Kernel Gradient Descent
(
Poster
)
>
link
|
Sara Dragutinović · Andrew Saxe · Aaditya Singh
🔗
|
-
|
Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning
(
Poster
)
>
link
|
Zachary Shinnick · Liangze Jiang · Hemanth Saratchandran · Anton Hengel · Damien Teney
🔗
|
-
|
Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
(
Poster
)
>
link
|
Jaeha Lee · Gio Huh · Ning Su · Tony Yu
🔗
|
-
|
Evaluating Generalization and Representation Stability in Small LMs via Prompting, Fine-Tuning and Out-of-Distribution Prompts
(
Poster
)
>
link
|
Rahul Raja · Arpita Vats
🔗
|
-
|
Koopman Autoencoders Learn Neural Representation Dynamics
(
Poster
)
>
link
|
Nishant Suresh Aswani · Saif Jabari
🔗
|
-
|
Is Visual Prompting the Right Setup for Knowledge Transfer in new Foundation Models?
(
Poster
)
>
link
|
Niclas Hergenröther · Antonio Orvieto
🔗
|
-
|
LiteByte: Efficient and Fast-Adapting MLPs for Online Byte-Level Prediction
(
Poster
)
>
link
|
Yu Mao · Yuyan Lin · Xue Liu · Chun Jason Xue
🔗
|
-
|
Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks
(
Poster
)
>
link
|
Shakir Yousefi · Andreas Plesner · Till Aczel · Roger Wattenhofer
🔗
|
-
|
An Empirical Investigation of Initialization Strategies for Kolmogorov–Arnold Networks
(
Poster
)
>
link
|
Spyros Rigas · Dhruv Verma · Georgios Alexandridis · Yixuan Wang
🔗
|
-
|
Performance Plateaus in Inference-Time Scaling for Text-to-Image Diffusion Without External Models
(
Poster
)
>
link
|
Changhyun Choi · Sungha Kim · H. Jin Kim
🔗
|
-
|
CaliPSo: Calibrated Predictive Models with Sharpness as Loss Function
(
Poster
)
>
link
|
Alexandre Capone · Kamron Zaidi · Tianyu Xu · Brian Yang · Geoff Pleiss · Jeff Schneider
🔗
|
-
|
Pruning Increases Orderedness in Weight-Tied Recurrent Computation
(
Poster
)
>
link
|
Yiding Song
🔗
|
-
|
Cross-Validation Error Dynamics in Smaller Datasets
(
Poster
)
>
link
|
Bethany Austhof · Lev Reyzin
🔗
|
-
|
Parity Requires Unified Input Dependence and Negative Eigenvalues in SSMs
(
Poster
)
>
link
|
Behnoush Khavari · Jayesh Khullar · Mehran Shakerinava · Jerry Huang · Siamak Ravanbakhsh · Sarath Chandar
🔗
|
-
|
Understanding How Chess-Playing Language Models Compute Linear Board Representations
(
Poster
)
>
link
|
Aaron Mei
🔗
|
-
|
Gradient descent in presence of extreme flatness and steepness
(
Poster
)
>
link
|
Dravyansh Sharma
🔗
|
-
|
Foundation Models on a Budget: Approximating Blocks in Large Vision Models
(
Poster
)
>
link
|
Irene Cannistraci · Simone Antonelli · Emanuele Palumbo · Thomas Sutter · Emanuele Rodola · Bastian Rieck · Julia Vogt
🔗
|
-
|
Encoding Domain Insights into Multi-modal Fusion: Improved Performance at the Cost of Robustness
(
Poster
)
>
link
|
Jackson Michaels · Sidong Zhang · Madalina Fiterau
🔗
|
-
|
Permutations as a testbed for studying the effect of input representations on learning
(
Poster
)
>
link
|
Sarah Scullen · Davis Brown · Robert Jasper · Henry Kvinge · Helen Jenne
🔗
|
-
|
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
(
Poster
)
>
link
|
Feijiang Han · Xiaodong Yu · Jianheng Tang · Qingyun Zeng · Licheng Guo · Lyle Ungar
🔗
|
-
|
SynDaCaTE: A Synthetic Dataset For Evaluating Part-Whole Hierarchical Inference
(
Poster
)
>
link
|
Jake Levi · Mark van der Wilk
🔗
|
-
|
Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit
(
Poster
)
>
link
|
Valérie Costa · Thomas Fel · Ekdeep Singh Lubana · Bahareh Tolooshams · Demba Ba
🔗
|
-
|
Generative or Discriminative? Revisiting Text Classification in the Era of Transformers
(
Poster
)
>
link
|
Siva Rajesh Kasa · Sumegh Roychowdhury · Karan Gupta · Yaswanth Biruduraju · Santhosh Kasa · Ashutosh Kumar · Pattisapu Priyatam · Arindam Bhattacharya · Shailendra Agarwal · Vijay Huddar
🔗
|
-
|
Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge
(
Poster
)
>
link
|
Freya Behrens · Lenka Zdeborová
🔗
|
-
|
From SGD to Spectra: A Theory of Neural Network Weight Dynamics
(
Poster
)
>
link
|
Brian Olsen · Sam Fatehmanesh · Frank Xiao · Adarsh Kumarappan · Anirudh Gajula
🔗
|
-
|
Efficient B-Tree Insertions Using Proximal Policy Optimization and Hierarchical Attention Models
(
Poster
)
>
link
|
Alexander Kastius · Nick Lechtenbörger · Felix Schulz · Johann Tast · Rainer Schlosser · Ralf Herbrich
🔗
|
-
|
Emergence, pretraining loss and associative recall: a toy model
(
Poster
)
>
link
|
Sultan Daniels · Dylan Davis · Dhruv Gautam · Wentinn Liao · Gireeja Ranade · Anant Sahai
🔗
|
-
|
Learning Gaussian Mixture Models via Transformer Measure Flows
(
Poster
)
>
link
|
Aleksandr Zimin · Anastasiia Kutakh · Yury Polyanskiy · Philippe Rigollet
🔗
|
-
|
Emergence of Hebbian Dynamics in Regularized Non-Local Learners
(
Poster
)
>
link
|
David Koplow · Tomaso A. Poggio · Liu Ziyin
🔗
|
-
|
Decomposed Learning: An Avenue for Mitigating Grokking
(
Poster
)
>
link
|
Gabryel Mason-Williams · Israel Mason-Williams
🔗
|
-
|
In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly
(
Poster
)
>
link
|
Puneesh Deora · Bhavya Vasudeva · Tina Behnia · Christos Thrampoulidis
🔗
|
-
|
Do Larger Language Models Imply Better Generalization? A Pretraining Scaling Law for Implicit Reasoning
(
Poster
)
>
link
|
Xinyi Wang · Shawn Tan · Mingyu Jin · William Wang · Rameswar Panda · Yikang Shen
🔗
|
-
|
Approximate Message Passing on General Factor Graphs using Shallow Neural Networks
(
Poster
)
>
link
|
Leonhard Hennicke · Jan Lemcke · Rainer Schlosser · Ralf Herbrich
🔗
|
-
|
Optimizing Explanations: Nuances Matter When Evaluation Metrics Become Loss Functions
(
Poster
)
>
link
|
Jonas Raedler · Hiwot Belay Tadesse · Weiwei Pan · Finale Doshi-Velez
🔗
|
-
|
Exploring Diverse Solutions for Underdetermined Problems
(
Poster
)
>
link
|
Eric Volkmann · Andreas Radler · Johannes Brandstetter · Arturs Berzins
🔗
|
-
|
Personalizing AI Interventions in Multiple Health Behavioral Change Settings
(
Poster
)
>
link
|
Samantha Marks · Michelle Chang · Eura Nofshin · Weiwei Pan · Finale Doshi-Velez
🔗
|
-
|
Restoring Task-Relevant Information in Synthetic Data: A Small-Scale V-Information View
(
Poster
)
>
link
|
Sid Bharthulwar
🔗
|
-
|
Measuring Memorization and Generalization in Forecasting Models via Structured Perturbations of Chaotic Systems
(
Poster
)
>
link
|
Max Kanwal · Caryn Tran
🔗
|
-
|
Extrapolation by Association: Length Generalization Transfer in Transformers
(
Poster
)
>
link
|
Jack Cai · Nayoung Lee · Avi Schwarzschild · Samet Oymak · Dimitris Papailiopoulos
🔗
|
-
|
Effective Reinforcement Learning for Reasoning in Language Models
(
Poster
)
>
link
|
Lianghuan Huang · Shuo Li · Sagnik Anupam · Insup Lee · Osbert Bastani
🔗
|
-
|
Improving Pathfinding with Anchoring Tokens
(
Poster
)
>
link
|
Huaqing Zhang · Bingbin Liu · Juno Kim · Andrej Risteski
🔗
|
-
|
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
(
Poster
)
>
link
|
Sumedh Hindupur · Ekdeep Singh Lubana · Thomas Fel · Demba Ba
🔗
|
-
|
What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers
(
Poster
)
>
link
|
Pulkit Gopalani · Wei Hu
🔗
|
-
|
The Necessity for Intervention Fidelity: Unintended Side Effects When Steering LLMs
(
Poster
)
>
link
|
Jonas Raedler · Weiyue Li · Alyssa Taliotis · Manasvi Goyal · Siddharth Swaroop · Weiwei Pan
🔗
|
-
|
Neural Stochastic Differential Equations on Compact State-Spaces
(
Poster
)
>
link
|
Yue-Jane Liu · Malinda Lu · Matthew Nock · Yaniv Yacoby
🔗
|
-
|
Quantitative Bounds for Length Generalization in Transformers
(
Poster
)
>
link
|
Zachary Izzo · Eshaan Nichani · Jason Lee
🔗
|
-
|
Geometry of Rank Constraints in Shallow Polynomial Neural Networks
(
Poster
)
>
link
|
Param Mody · Maksym Zubkov
🔗
|
-
|
On the Emergence of Position Bias in Transformers
(
Poster
)
>
link
|
Xinyi Wu · Yifei Wang · Stefanie Jegelka · Ali Jadbabaie
🔗
|
-
|
Stats or Facts: Decomposing Generalization in Language Models with Small-Scale Models
(
Poster
)
>
link
|
Tina Behnia · Puneesh Deora · Christos Thrampoulidis
🔗
|
-
|
Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers
(
Poster
)
>
link
|
Annalisa Belloni · Lorenzo Noci · Antonio Orvieto
🔗
|
-
|
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
(
Poster
)
>
link
|
Hanlin Zhu · Shibo Hao · Zhiting Hu · Jiantao Jiao · Stuart Russell · Yuandong Tian
🔗
|
-
|
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
(
Poster
)
>
link
|
Halil Alperen Gozeten · Muhammed Emrullah Ildiz · Xuechen Zhang · Hrayr Harutyunyan · Ankit Singh Rawat · Samet Oymak
🔗
|
-
|
AdaptMI: Adaptive Skill-based In-context Math Instructions for Small Language Models
(
Poster
)
>
link
|
Yinghui He · Abhishek Panigrahi · Yong Lin · Sanjeev Arora
🔗
|