Poster
Multi-objective Linear Reinforcement Learning with Lexicographic Rewards
Bo Xue · Dake Bu · Ji Cheng · Yuanyu Wan · Qingfu Zhang
West Exhibition Hall B2-B3 #W-1006
Reinforcement learning (RL) works well when an agent optimizes for a single goal, but many real-world problems require balancing multiple, sometimes competing, objectives, such as maximizing efficiency while minimizing risk. While single-objective RL has strong theoretical foundations, multi-objective RL (MORL) lacks similar guarantees, making it harder to trust in practical applications. To address this, we focus on lexicographic MORL, where objectives are ranked by importance, e.g., safety first, then performance. We develop the first MORL algorithm with mathematically proven regret bounds, meaning we can quantify how far its performance falls short of the best possible strategy. Our method remains robust even when the environment model is slightly misspecified. This research matters because it provides a principled way to handle real-world tasks where trade-offs are unavoidable, from autonomous driving to healthcare. By guaranteeing performance while respecting priorities, our work helps build more reliable and transparent AI systems.
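
To make the lexicographic ordering concrete, here is a minimal, illustrative sketch (not the paper's algorithm): a policy is preferred if it does better on the highest-priority objective, and lower-priority objectives only break ties within a small tolerance. The objective names, tolerance `eps`, and example values below are hypothetical.

```python
# Toy sketch of lexicographic preference over objective value vectors.
# Objective 0 (e.g., safety) dominates objective 1 (e.g., performance);
# a lower-priority objective matters only when higher-priority ones tie.
from typing import Sequence


def lex_better(a: Sequence[float], b: Sequence[float], eps: float = 1e-6) -> bool:
    """Return True if value vector `a` is lexicographically preferred to `b`."""
    for va, vb in zip(a, b):
        if va > vb + eps:   # strictly better on a higher-priority objective
            return True
        if va < vb - eps:   # strictly worse; later objectives cannot compensate
            return False
    return False            # equal on all objectives (within tolerance)


# Example: policy A is safer, so it is preferred even though B performs better.
policy_A = [0.95, 0.60]  # [safety, performance]
policy_B = [0.90, 0.99]
assert lex_better(policy_A, policy_B)
```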