Poster in Affinity Workshop: New In ML
K-Score: Kalman Filter as a Principled Alternative to Reward Normalization in Reinforcement Learning
Abstract:
We propose a simple yet effective alternative to reward normalization in policy gradient reinforcement learning by integrating a 1D Kalman filter for online reward estimation. Instead of relying on fixed heuristics, our method recursively estimates the latent reward mean, smoothing high-variance returns and adapting to non-stationary environments. This approach incurs minimal overhead and requires no modification to existing policy architectures. Experiments on LunarLander and CartPole demonstrate that Kalman-filtered rewards significantly accelerate convergence and reduce training variance compared to standard normalization techniques.
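To make the idea concrete, here is a minimal sketch of a 1D Kalman filter for online reward estimation, assuming the latent reward mean follows a random walk observed through noisy per-step rewards. The class name `KalmanRewardFilter`, the noise parameters `q` and `r`, and the choice to scale by the predictive standard deviation are illustrative assumptions, not the authors' published implementation or hyperparameters.

```python
import math


class KalmanRewardFilter:
    """1D Kalman filter tracking the latent reward mean online.

    Model (an assumption for this sketch): the latent mean follows a
    random walk with process variance q, and each raw reward is a noisy
    observation of that mean with observation variance r.
    """

    def __init__(self, q: float = 1e-4, r: float = 1.0):
        self.mean = 0.0  # posterior estimate of the latent reward mean
        self.var = 1.0   # posterior variance of that estimate
        self.q = q       # process noise: how fast the latent mean may drift
        self.r = r       # observation noise: variance of individual rewards

    def update(self, reward: float) -> float:
        # Predict: drift inflates the uncertainty of the current estimate.
        prior_var = self.var + self.q
        # Correct: the Kalman gain blends the prediction with the observation.
        gain = prior_var / (prior_var + self.r)
        self.mean += gain * (reward - self.mean)
        self.var = (1.0 - gain) * prior_var
        return self.mean

    def scale(self, reward: float, eps: float = 1e-8) -> float:
        # Center by the filtered mean and scale by the predictive standard
        # deviation, in place of batch or running-statistics normalization.
        self.update(reward)
        return (reward - self.mean) / math.sqrt(self.var + self.r + eps)
```

In a policy gradient loop, one would call `kf.scale(r_t)` on each raw reward before computing returns. Unlike a fixed running mean/std heuristic, the Kalman gain shrinks automatically as the estimate becomes confident, while the process noise `q` keeps the filter responsive to non-stationary reward scales.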