

Poster

Learning to Reuse Policies in State Evolvable Environments

Ziqian Zhang · Bohan Yang · Lihe Li · Yuqi Bian · Ruiqi Xue · Feng Chen · Yi-Chen Li · Lei Yuan · Yang Yu

West Exhibition Hall B2-B3 #W-602
Thu 17 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract: A policy trained via reinforcement learning (RL) makes decisions based on sensor-derived state features. It is common for state features to evolve, for reasons such as periodic sensor maintenance or the addition of new sensors for performance improvement. The deployed policy fails in the new state space when state features were unseen during training. Previous work tackles this challenge by training a sensor-invariant policy or by generating multiple policies and selecting the appropriate one with limited samples. However, both directions struggle to guarantee performance when faced with unpredictable evolutions. In this paper, we formalize this problem as state evolvable reinforcement learning (SERL), where the agent is required to mitigate policy degradation after state evolutions without costly exploration. We propose **Lapse**, which reuses policies learned from the old state space in two distinct ways. On one hand, Lapse directly reuses the *robust* old policy by composing it with a learned state reconstruction model to handle vanishing sensors. On the other hand, Lapse reuses the behavioral experience from the old policy to train a newly adaptive policy through offline learning, better utilizing new sensors. To leverage the advantages of both policies in different scenarios, we further propose *automatic ensemble weight adjustment* to effectively aggregate them. Theoretically, we justify that robust policy reuse helps mitigate uncertainty and error from both evolution and reconstruction. Empirically, Lapse achieves a significant performance improvement, outperforming the strongest baseline by about $2\times$ in benchmark environments.
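
The abstract describes two reuse paths that are aggregated with an ensemble weight. The sketch below is only an illustration of that structure, not the authors' implementation: the class and function names (`ReconstructionModel`, `lapse_action`, `old_policy`, `new_policy`) are hypothetical, the zero-fill reconstruction stands in for the learned reconstruction model, and the fixed weight stands in for the paper's automatic ensemble weight adjustment.

```python
# Illustrative sketch of the two policy-reuse paths from the abstract.
# All names here are placeholders, not the authors' API.
import numpy as np


class ReconstructionModel:
    """Hypothetical stand-in for the learned state reconstruction model:
    maps an evolved observation back to the old feature layout so the
    frozen old policy can still be queried. Vanished sensors are filled
    with zeros here purely as a placeholder rule."""

    def __init__(self, old_dim, kept_idx):
        self.old_dim = old_dim      # dimensionality of the old state space
        self.kept_idx = kept_idx    # indices of old features still observed

    def __call__(self, new_obs):
        recon = np.zeros(self.old_dim)
        recon[self.kept_idx] = new_obs[: len(self.kept_idx)]
        return recon


def lapse_action(new_obs, old_policy, new_policy, recon, weight):
    """Aggregate the reused robust policy and the offline-trained adaptive
    policy with an ensemble weight in [0, 1]. How the weight is adjusted
    automatically is not specified in the abstract, so it is treated as
    an external input here."""
    a_old = old_policy(recon(new_obs))   # path 1: reconstruct old state, reuse old policy
    a_new = new_policy(new_obs)          # path 2: policy trained offline on old experience
    return weight * a_old + (1.0 - weight) * a_new


if __name__ == "__main__":
    # Toy example: the old state space had 4 features; the evolved space
    # keeps the first 3 and adds 2 new sensors (5 features in total).
    recon = ReconstructionModel(old_dim=4, kept_idx=[0, 1, 2])
    old_policy = lambda s: -0.5 * s[:2]          # dummy 2-D continuous-action policy
    new_policy = lambda s: 0.1 * s[-2:]          # dummy policy exploiting the new sensors
    obs = np.array([0.2, -0.1, 0.4, 0.3, -0.2])  # one evolved observation
    print(lapse_action(obs, old_policy, new_policy, recon, weight=0.7))
```

In this reading, the weight would shift toward the reconstructed old policy when the new policy is unreliable (e.g., little offline data for the new sensors) and toward the adaptive policy when the new sensors carry useful information; the exact adjustment rule is the paper's contribution and is not reproduced here.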

Lay Summary:

Many automated systems rely on sensor data to make decisions, but sensors can change over time, for example due to maintenance or upgrades. When this happens, decision-making policies trained on old sensor data often experience a drop in performance, because they are not prepared for new or missing information.

Previous solutions try to make policies ignore sensor differences, or train several policies and pick the right one, but these approaches often struggle when facing unexpected sensor changes. To address this, we introduce a new framework called state evolvable reinforcement learning (SERL), which aims to maintain reliable performance even as sensors change, without costly trial-and-error.

Our method, called Lapse, reuses knowledge from old sensor setups in two ways: it adapts the old policy to work with missing sensors, and it uses experience from the old setup to train a new, more adaptive policy for new sensors. Lapse can automatically combine both approaches depending on the situation. In tests, Lapse showed significantly better performance than previous methods, helping automated systems stay reliable as their sensors evolve.
