Poster
Online Robust Reinforcement Learning Through Monte-Carlo Planning
Tuan Dam · Kishan Panaganti · Brahim Driss · Adam Wierman
West Exhibition Hall B2-B3 #W-720
Imagine you're learning to play a video game by practicing on a simulator, but when you finally play the real game, the physics are slightly different: maybe the character jumps a bit lower or moves a bit slower than in the simulator. This gap between practice and reality is a major challenge in artificial intelligence, where programs often train in simplified virtual environments before being deployed in the messy real world. This paper tackles the "simulation-to-reality gap" by making AI planning algorithms robust, meaning they keep working well even when the real world differs from the training environment.

The researchers focus on a popular planning technique called Monte Carlo Tree Search (MCTS), which plays out thousands of possible future scenarios before committing to a decision, much like a chess player who considers many moves and counter-moves before choosing their next play. The difference here is that instead of assuming the rules of the game are perfectly known, the algorithm plans for uncertainty: it allows that the "rules" of the real world might differ somewhat from what it learned in simulation.

The key innovation is building uncertainty directly into the decision-making process. Instead of assuming the best-case scenario, the algorithm prepares for reasonable worst-case scenarios, like a cautious driver who plans a route assuming there might be unexpected traffic rather than optimistically assuming clear roads. Concretely, it considers multiple plausible versions of how the world might behave, chooses actions that perform well across all of them, and balances between being too cautious and too optimistic (the sketch at the end of this summary illustrates the idea).

This research matters because it helps bridge the gap between AI systems that work perfectly in the lab and AI systems that work reliably in the real world. Applications could include autonomous vehicles that handle unexpected road conditions, medical treatment planning that accounts for patient variability, trading systems that remain stable during market volatility, and robots that adapt when the real environment differs from simulation.

The researchers prove that their robust algorithm converges at the same rate as standard, non-robust methods while remaining far more reliable under unexpected conditions. They test it in several scenarios, including gambling problems and navigation tasks, showing that the robust approach maintains steady performance even when the real environment differs significantly from what was expected.

This work is a step toward AI systems that are not just capable but also reliable and trustworthy when deployed. By explicitly planning for uncertainty rather than ignoring it, we can build AI that performs consistently under the messy, unpredictable conditions of the real world.
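To make the "plan for the worst case across multiple possible worlds" idea concrete, here is a minimal Python sketch under simplifying assumptions: a small finite uncertainty set of candidate transition models, single random rollouts standing in for a full search tree, and hypothetical names (ToyModel, robust_action) that are illustrations rather than the paper's actual interfaces or algorithm.

```python
import random

# Hedged sketch of the robust planning idea described above; this is NOT the
# paper's algorithm. The idea: instead of evaluating actions under one
# nominal model, evaluate them under every model in an uncertainty set and
# pick the action whose worst-case value is best (max over actions, min over
# models).

class ToyModel:
    """A tiny 1-D navigation world; `drift` perturbs the dynamics to
    mimic one plausible version of how the real world might behave."""
    def __init__(self, drift):
        self.drift = drift

    def actions(self, state):
        return [-1, +1]                      # move left or right

    def step(self, state, action):
        next_state = state + action + self.drift
        reward = -abs(next_state)            # reward for staying near 0
        return next_state, reward

def rollout_value(model, state, action, depth=20, gamma=0.95):
    """Monte-Carlo estimate of an action's value under one candidate model
    (a stand-in for MCTS's simulation phase)."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        state, reward = model.step(state, action)
        total += discount * reward
        discount *= gamma
        action = random.choice(model.actions(state))
    return total

def robust_action(models, state):
    """Choose the action maximizing the worst-case value across the
    uncertainty set of transition models."""
    def worst_case(action):
        return min(rollout_value(m, state, action) for m in models)
    return max(models[0].actions(state), key=worst_case)

# Usage: plan against three plausible versions of the dynamics at once.
uncertainty_set = [ToyModel(drift) for drift in (-0.2, 0.0, 0.2)]
print(robust_action(uncertainty_set, state=1.0))
```

The paper goes further, integrating this kind of worst-case reasoning into the tree search itself and proving it converges at the same rate as the non-robust version; the sketch above only captures the core max-min decision rule.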