Methods and Opportunities at Small Scale (MOSS)

Workshop

Bingbin Liu · Enric Boix-Adserà · Elisabetta Cornacchia · Surbhi Goel · Abhishek Panigrahi · Eran Malach · Cyril Zhang · Benjamin Edelman

West Ballroom B

Sat 19 Jul, 8:45 a.m. PDT

The increasing computational demands of modern ML create a critical challenge: thorough experimentation becomes prohibitively expensive precisely when we most need to understand and steer model behavior. Small-scale experiments (<= 1 GPU) offer a powerful approach for systematic investigation, enabling both scientific understanding and practical advances. Recent work demonstrates the endless opportunities at this scale, including: diagnoses and mitigations of training pathologies; minimalistic replications of modern pipelines; elementary synthetic tasks that “stress test” architectures and motivate new designs; and discovery of intriguing phenomena. This workshop aims to highlight how methods and opportunities at small scale can unlock new insights and drive progress. The emphasis will be on advancing scientific understanding (and, optionally, its interplay with theory), without the need to improve state-of-the-art performance.

Timezone: America/Los_Angeles

Schedule

Sat 9:00 a.m. - 9:10 a.m.	Registration & Opening Remarks
Sat 9:10 a.m. - 9:55 a.m.	Beyond benchmarks: the case for spherical cows in LLM research ( Invited Talk )	Aditi Raghunathan
Sat 9:55 a.m. - 10:40 a.m.	Designing Efficient Attention: Insights from an Inference Perspective ( Invited Talk )	Tri Dao
Sat 10:40 a.m. - 11:45 a.m.	Poster Session 1 ( Poster Session )
Sat 11:45 a.m. - 12:00 p.m.	Do Larger Language Models Imply Better Generalization? A Pretraining Scaling Law for Implicit Reasoning ( Oral presentation )	Xinyi Wang · Shawn Tan · Mingyu Jin · William Wang · Rameswar Panda · Yikang Shen
Sat 12:00 p.m. - 12:15 p.m.	Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought ( Oral presentation )	Hanlin Zhu · Shibo Hao · Zhiting Hu · Jiantao Jiao · Stuart Russell · Yuandong Tian
Sat 12:15 p.m. - 12:30 p.m.	Stats or Facts: Decomposing Generalization in Language Models with Small-Scale Models ( Oral presentation )	Tina Behnia · Puneesh Deora · Christos Thrampoulidis
Sat 1:30 p.m. - 2:15 p.m.	How Jailbreaking 1-Layer Transformers Taught us how to Steer LLMs ( Invited Talk )	Eric Wong
Sat 2:15 p.m. - 3:00 p.m.	The Art of Artificial Reasoning for Small Language Models ( Invited Talk )	Yejin Choi
Sat 3:00 p.m. - 3:15 p.m.	Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge ( Oral presentation )	Freya Behrens · Lenka Zdeborová
Sat 3:15 p.m. - 3:30 p.m.	In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly ( Oral presentation )	Puneesh Deora · Bhavya Vasudeva · Tina Behnia · Christos Thrampoulidis
Sat 3:30 p.m. - 3:45 p.m.	Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models ( Oral presentation )	Lillian Sun · Martin Pawelczyk · Zhenting Qi · Aounon Kumar · Himabindu Lakkaraju
Sat 4:00 p.m. - 4:45 p.m.	Panel Discussion with Misha Belkin, Stella Biderman, Leonard Tang ( Panel Discussion )
Sat 4:45 p.m. - 6:00 p.m.	Poster Session 2 ( Poster Session )
-	Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models ( Poster )	Lillian Sun · Martin Pawelczyk · Zhenting Qi · Aounon Kumar · Himabindu Lakkaraju
-	Understanding Attention Glitches with Threshold Relative Attention ( Poster )	Mattia Opper · Roland Fernandez · Paul Smolensky · Jianfeng Gao
-	Dynamic Low-Rank Training with Spectral Regularization: Achieving Robustness in Compressed Representations ( Poster )	Steffen Schotthöfer · Lexie Yang · Stefan Schnake
-	How Much Context Does Natural Language Actually Require? An Analysis Using LLMs as Statistical Oracles ( Poster )	Vala Vakilian · Sadegh Mahdavi · Christos Thrampoulidis
-	Towards Understanding Self-Pretraining for Sequence Classification ( Poster )	Omar Coser · Antonio Orvieto
-	Review, Remask, Refine: Process-Guided Block Diffusion for Text Generation ( Poster )	Nikita Mounier · Parsa Idehpour
-	Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features ( Poster )	Yize Zhao · Christos Thrampoulidis
-	TinyServe: Query-Aware Cache Selection for Efficient LLM Inference ( Poster )	Dong Liu · Yanxuan Yu
-	Transformers May Learn to Classify In-Context by Context-Adaptive Kernel Gradient Descent ( Poster )	Sara Dragutinović · Andrew Saxe · Aaditya Singh
-	Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning ( Poster )	Zachary Shinnick · Liangze Jiang · Hemanth Saratchandran · Anton Hengel · Damien Teney
-	Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO ( Poster )	Jaeha Lee · Gio Huh · Ning Su · Tony YU
-	Evaluating Generalization and Representation Stability in Small LMs via Prompting, Fine-Tuning and Out-of-Distribution Prompts ( Poster )	Rahul Raja · Arpita Vats
-	Koopman Autoencoders Learn Neural Representation Dynamics ( Poster )	Nishant Suresh Aswani · Saif Jabari
-	Is Visual Prompting the Right Setup for Knowledge Transfer in new Foundation Models? ( Poster )	Niclas Hergenröther · Antonio Orvieto
-	LiteByte: Efficient and Fast-Adapting MLPs for Online Byte-Level Prediction ( Poster )	Yu Mao · Yuyan Lin · Xue Liu · Chun Jason Xue
-	Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks ( Poster )	Shakir Yousefi · Andreas Plesner · Till Aczel · Roger Wattenhofer
-	An Empirical Investigation of Initialization Strategies for Kolmogorov–Arnold Networks ( Poster )	Spyros Rigas · Dhruv Verma · Georgios Alexandridis · Yixuan Wang
-	Performance Plateaus in Inference-Time Scaling for Text-to-Image Diffusion Without External Models ( Poster )	Changhyun Choi · Sungha Kim · H. Jin Kim
-	CaliPSo: Calibrated Predictive Models with Sharpness as Loss Function ( Poster )	Alexandre Capone · Kamron Zaidi · Tianyu Xu · Brian Yang · Geoff Pleiss · Jeff Schneider
-	Pruning Increases Orderedness in Weight-Tied Recurrent Computation ( Poster )	Yiding Song
-	Cross-Validation Error Dynamics in Smaller Datasets ( Poster )	Bethany Austhof · Lev Reyzin
-	Parity Requires Unified Input Dependence and Negative Eigenvalues in SSMs ( Poster )	Behnoush Khavari · Jayesh Khullar · Mehran Shakerinava · Jerry Huang · Siamak Ravanbakhsh · Sarath Chandar
-	Understanding How Chess-Playing Language Models Compute Linear Board Representations ( Poster )	Aaron Mei
-	Gradient descent in presence of extreme flatness and steepness ( Poster )	Dravyansh Sharma
-	Foundation Models on a Budget: Approximating Blocks in Large Vision Models ( Poster )	Irene Cannistraci · Simone Antonelli · Emanuele Palumbo · Thomas Sutter · Emanuele Rodola · Bastian Rieck · Julia Vogt
-	Encoding Domain Insights into Multi-modal Fusion: Improved Performance at the Cost of Robustness ( Poster )	Jackson Michaels · Sidong Zhang · Madalina Fiterau
-	Permutations as a testbed for studying the effect of input representations on learning ( Poster )	Sarah Scullen · Davis Brown · Robert Jasper · Henry Kvinge · Helen Jenne
-	ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training ( Poster )	Feijiang Han · Xiaodong Yu · Jianheng Tang · Qingyun Zeng · Licheng Guo · Lyle Ungar
-	SynDaCaTE: A Synthetic Dataset For Evaluating Part-Whole Hierarchical Inference ( Poster )	Jake Levi · Mark van der Wilk
-	Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit ( Poster )	Valérie Costa · Thomas Fel · Ekdeep Singh Lubana · Bahareh Tolooshams · Demba Ba
-	Generative or Discriminative? Revisiting Text Classification in the Era of Transformers ( Poster )	Siva Rajesh Kasa · Sumegh Roychowdhury · Karan Gupta · Yaswanth Biruduraju · SANTHOSH KASA · Ashutosh Kumar · Pattisapu Priyatam · Arindam Bhattacharya · Shailendra Agarwal · Vijay huddar
-	Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge ( Poster )	Freya Behrens · Lenka Zdeborová
-	From SGD to Spectra: A Theory of Neural Network Weight Dynamics ( Poster )	Brian Olsen · Sam Fatehmanesh · Frank Xiao · Adarsh Kumarappan · Anirudh Gajula
-	Efficient B-Tree Insertions Using Proximal Policy Optimization and Hierarchical Attention Models ( Poster )	Alexander Kastius · Nick Lechtenbörger · Felix Schulz · Johann Tast · Rainer Schlosser · Ralf Herbrich
-	Emergence, pretraining loss and associative recall: a toy model ( Poster )	Sultan Daniels · Dylan Davis · Dhruv Gautam · Wentinn Liao · Gireeja Ranade · Anant Sahai
-	Learning Gaussian Mixture Models via Transformer Measure Flows ( Poster )	Aleksandr Zimin · Anastasiia Kutakh · Yury Polyanskiy · Philippe Rigollet
-	Emergence of Hebbian Dynamics in Regularized Non-Local Learners ( Poster )	David Koplow · Tomaso A Poggio · Liu Ziyin
-	Decomposed Learning: An Avenue for Mitigating Grokking ( Poster )	Gabryel Mason-Williams · Israel Mason-Williams
-	In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly ( Poster )	Puneesh Deora · Bhavya Vasudeva · Tina Behnia · Christos Thrampoulidis
-	Do Larger Language Models Imply Better Generalization? A Pretraining Scaling Law for Implicit Reasoning ( Poster )	Xinyi Wang · Shawn Tan · Mingyu Jin · William Wang · Rameswar Panda · Yikang Shen
-	Approximate Message Passing on General Factor Graphs using Shallow Neural Networks ( Poster )	Leonhard Hennicke · Jan Lemcke · Rainer Schlosser · Ralf Herbrich
-	Optimizing Explanations: Nuances Matter When Evaluation Metrics Become Loss Functions ( Poster )	Jonas Raedler · Hiwot Belay Tadesse · Weiwei Pan · Finale Doshi-Velez
-	Exploring Diverse Solutions for Underdetermined Problems ( Poster )	Eric Volkmann · Andreas Radler · Johannes Brandstetter · Arturs Berzins
-	Personalizing AI Interventions in Multiple Health Behavioral Change Settings ( Poster )	Samantha Marks · Michelle Chang · Eura Nofshin · Weiwei Pan · Finale Doshi-Velez
-	Restoring Task-Relevant Information in Synthetic Data: A Small-Scale V-Information View ( Poster )	Sid Bharthulwar
-	Measuring Memorization and Generalization in Forecasting Models via Structured Perturbations of Chaotic Systems ( Poster )	Max Kanwal · Caryn Tran
-	Extrapolation by Association: Length Generalization Transfer in Transformers ( Poster )	Jack Cai · Nayoung Lee · Avi Schwarzschild · Samet Oymak · Dimitris Papailiopoulos
-	Effective Reinforcement Learning for Reasoning in Language Models ( Poster )	Lianghuan Huang · Shuo Li · Sagnik Anupam · Insup Lee · Osbert Bastani
-	Improving Pathfinding with Anchoring Tokens ( Poster )	Huaqing Zhang · Bingbin Liu · Juno Kim · Andrej Risteski
-	Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry ( Poster )	Sumedh Hindupur · Ekdeep Singh Lubana · Thomas Fel · Demba Ba
-	What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers ( Poster )	Pulkit Gopalani · Wei Hu
-	The Necessity for Intervention Fidelity: Unintended Side Effects When Steering LLMs ( Poster )	Jonas Raedler · Weiyue Li · Alyssa Taliotis · Manasvi Goyal · Siddharth Swaroop · Weiwei Pan
-	Neural Stochastic Differential Equations on Compact State-Spaces ( Poster )	Yue-Jane Liu · Malinda Lu · Matthew Nock · Yaniv Yacoby
-	Quantitative Bounds for Length Generalization in Transformers ( Poster )	Zachary Izzo · Eshaan Nichani · Jason Lee
-	Geometry of Rank Constraints in Shallow Polynomial Neural Networks ( Poster )	Param Mody · Maksym Zubkov
-	On the Emergence of Position Bias in Transformers ( Poster )	Xinyi Wu · Yifei Wang · Stefanie Jegelka · Ali Jadbabaie
-	Stats or Facts: Decomposing Generalization in Language Models with Small-Scale Models ( Poster )	Tina Behnia · Puneesh Deora · Christos Thrampoulidis
-	Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers ( Poster )	Annalisa Belloni · Lorenzo Noci · Antonio Orvieto
-	Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought ( Poster )	Hanlin Zhu · Shibo Hao · Zhiting Hu · Jiantao Jiao · Stuart Russell · Yuandong Tian
-	Continuous Chain of Thought Enables Parallel Exploration and Reasoning ( Poster )	Halil Alperen Gozeten · Muhammed Emrullah Ildiz · Xuechen Zhang · Hrayr Harutyunyan · Ankit Singh Rawat · Samet Oymak
-	AdaptMI: Adaptive Skill-based In-context Math Instructions for Small Language Models ( Poster )	Yinghui He · Abhishek Panigrahi · Yong LIN · Sanjeev Arora