Poster in Workshop: Methods and Opportunities at Small Scale (MOSS)
An Empirical Investigation of Initialization Strategies for Kolmogorov–Arnold Networks
Spyros Rigas · Dhruv Verma · Georgios Alexandridis · Yixuan Wang
Keywords: [initialization] [power law] [deep learning] [Kolmogorov–Arnold networks]
Kolmogorov–Arnold Networks (KANs) are a recently introduced class of neural architectures that replace fixed activation functions with trainable ones, offering greater flexibility and interpretability. Although KANs have shown promising results across various tasks, little attention has been paid to how they should be initialized. In this work, we explore alternative initialization strategies, including two variance-preserving methods based on classical ideas and an empirical power-law approach with tunable exponents. Using function fitting as a small-scale testbed, we run a large grid search over architectures and initialization settings. We find that power-law configurations consistently outperform the standard baseline initialization across all architectures. The variance-preserving methods tend to underperform on smaller models but outperform the baseline as networks grow deeper and wider, though they still do not match the performance of power-law initialization. Overall, our results highlight initialization as an important yet underexplored aspect of KANs and point to several directions for future work.
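The abstract does not spell out the exact form of these initializers, so the following minimal NumPy sketch illustrates one plausible reading: KAN spline coefficients drawn from a zero-mean Gaussian whose scale either decays as a tunable power law of the layer's fan-in or follows a classical LeCun-style 1/fan-in variance rule. The function names, the specific scaling formulas, and the parameters `alpha` and `beta` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def power_law_init(shape, fan_in, alpha=1.0, beta=0.5, rng=None):
    """Sample spline coefficients from N(0, sigma^2) with a power-law scale.

    sigma = beta * fan_in ** (-alpha) is an assumed illustrative form;
    alpha and beta play the role of the tunable exponents/prefactors
    mentioned in the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = beta * fan_in ** (-alpha)
    return rng.normal(loc=0.0, scale=sigma, size=shape)

def variance_preserving_init(shape, fan_in, rng=None):
    """LeCun-style scaling (variance 1/fan_in), a classical variance-preserving
    heuristic adapted here to KAN spline coefficients for comparison."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(loc=0.0, scale=np.sqrt(1.0 / fan_in), size=shape)

# Example: initialize the coefficient tensor of a hypothetical KAN layer
# mapping n_in inputs to n_out outputs with G basis functions per edge.
n_in, n_out, G = 8, 16, 12
coef_power = power_law_init((n_out, n_in, G), fan_in=n_in, alpha=1.0, beta=0.5)
coef_lecun = variance_preserving_init((n_out, n_in, G), fan_in=n_in)
print(coef_power.std(), coef_lecun.std())
```

In practice, such initializers would be swept over a grid of exponent values alongside architecture choices, mirroring the grid search described in the abstract.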