Poster in Workshop: Methods and Opportunities at Small Scale (MOSS)
Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
Jaeha Lee · Gio Huh · Ning Su · Tony YU
Keywords: reinforcement learning, transformer model, functional decomposition, polynomial decomposition, symbolic reasoning, beam search
We study the capabilities of small-scale transformer models in symbolic reasoning, focusing on multivariate polynomial decomposition, an NP-hard algebraic task with widespread applications in science and engineering. Our approach combines a fine-grained synthetic data generation pipeline, supervised pretraining, beam search, evaluations of scaling behavior and generalizability, and a novel rank-aware reinforcement learning method, Beam Grouped Relative Policy Optimization (BGRPO), which improves accuracy while reducing inference compute by up to 75%. Our model also demonstrates competitive performance in polynomial simplification, outperforming Mathematica in various cases.
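The abstract describes BGRPO as a rank-aware variant of GRPO applied to beam-search candidates. The sketch below illustrates one plausible reading of that idea: compute GRPO-style group-normalized advantages over the k beam candidates for a single prompt, with rewards weighted by beam rank. The function name `bgrpo_advantages`, the linearly decaying rank weights, and the binary rewards are assumptions introduced here for illustration; the abstract does not specify BGRPO's actual formulation.

```python
import numpy as np

def bgrpo_advantages(rewards, rank_weights=None):
    """Hypothetical rank-aware GRPO advantage computation for one beam group.

    rewards: per-candidate rewards for the k beam-search outputs of a prompt
             (e.g., 1.0 if the candidate is a correct decomposition, else 0.0).
    rank_weights: optional weights emphasizing higher-ranked beam slots; a
                  linearly decaying scheme is assumed here for illustration.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    k = len(rewards)
    if rank_weights is None:
        # Assumed weighting: beam rank i (0 = best) gets weight (k - i) / k.
        rank_weights = (k - np.arange(k)) / k
    weighted = rewards * rank_weights
    # GRPO-style group normalization: subtract the group mean and divide
    # by the group standard deviation (epsilon guards against zero variance).
    return (weighted - weighted.mean()) / (weighted.std() + 1e-8)

# Example: 4 beam candidates, where candidates at ranks 0 and 2 are correct.
print(bgrpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

In a full training loop, these per-candidate advantages would scale the policy-gradient loss on each beam candidate's tokens, analogous to how standard GRPO normalizes advantages within a group of sampled completions.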