Poster
Provably Efficient RL for Linear MDPs under Instantaneous Safety Constraints in Non-Convex Feature Spaces
Amirhossein Roknilamouki · Arnob Ghosh · Ming Shi · Fatemeh Nourzad · Eylem Ekici · Ness Shroff
West Exhibition Hall B2-B3 #W-1020
Self-driving cars and robots must learn from experience, yet even a single crash during training is unacceptable. We introduce a two-stage strategy that behaves like a student driver. It first cruises cautiously on quiet streets to map out which actions are safe. Then, once confident, it explores more broadly while still staying safe. Despite this cautious start, it learns nearly as efficiently as a risk-taking learner. We prove mathematically that the risk of an accident remains near zero throughout training while the algorithm achieves (nearly) optimal regret, meaning it gives up little reward compared with an ideal learner that already knows the environment. Our analysis introduces a new mathematical tool for the harder case of safety constraints in non-convex feature spaces, and it also corrects a flaw in earlier research. In our simulated driving tests, the system completed every route without a single collision while nearly matching the performance of unsafe approaches, validating our theoretical insights.
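To make the two-stage idea concrete, here is a minimal Python sketch. It is a deliberately simplified, single-step linear-bandit caricature rather than the paper's linear-MDP algorithm, and it does not model non-convexity. It assumes a linear cost cost(φ) = ⟨φ, γ⟩ observed without noise, a known bound S on ‖γ‖, and a threshold τ. All names and values (GAMMA, TAU, REWARD, cost_ucb, run) are hypothetical. Stage 1 only plays actions with ‖φ‖ ≤ τ/S, which are safe by Cauchy-Schwarz. Stage 2 plays the highest-reward candidate whose pessimistic (upper-confidence) cost estimate stays at or below τ.

```python
import math
import random

# Hypothetical toy setting (illustrative values only, not from the paper):
# cost(phi) = <phi, GAMMA> is unknown to the learner; an action is safe iff cost <= TAU.
GAMMA = (0.8, 0.6)   # true constraint parameter, ||GAMMA|| = 1.0
S = 1.0              # assumed known upper bound on ||GAMMA||
TAU = 0.5            # safety threshold
REWARD = (1.0, 0.2)  # known reward direction (simplification)


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def ridge(features, costs, lam):
    """2-D ridge regression: returns the estimate theta and the inverse Gram matrix."""
    a11, a12, a22 = lam, 0.0, lam
    b1 = b2 = 0.0
    for (x1, x2), c in zip(features, costs):
        a11 += x1 * x1
        a12 += x1 * x2
        a22 += x2 * x2
        b1 += x1 * c
        b2 += x2 * c
    det = a11 * a22 - a12 * a12
    inv = ((a22 / det, -a12 / det), (-a12 / det, a11 / det))
    theta = (inv[0][0] * b1 + inv[0][1] * b2, inv[1][0] * b1 + inv[1][1] * b2)
    return theta, inv


def cost_ucb(phi, theta, inv, beta):
    """Pessimistic cost estimate: <phi, theta> + beta * ||phi||_{A^{-1}}."""
    w = (inv[0][0] * phi[0] + inv[0][1] * phi[1],
         inv[1][0] * phi[0] + inv[1][1] * phi[1])
    return dot(phi, theta) + beta * math.sqrt(dot(phi, w))


def run(stage1_steps=20, stage2_steps=30, lam=1.0, seed=0):
    rng = random.Random(seed)
    beta = math.sqrt(lam) * S   # valid confidence width when costs are noiseless
    radius = 0.5 * TAU / S      # any ||phi|| <= TAU / S is safe by Cauchy-Schwarz
    feats, costs, executed = [], [], []

    # Stage 1: cautious exploration inside the known-safe ball.
    for _ in range(stage1_steps):
        ang = rng.uniform(0.0, 2.0 * math.pi)
        phi = (radius * math.cos(ang), radius * math.sin(ang))
        feats.append(phi)
        costs.append(dot(phi, GAMMA))
        executed.append(phi)

    # Stage 2: best-reward action among those whose cost UCB stays <= TAU.
    cands = [(r * math.cos(k * math.pi / 12), r * math.sin(k * math.pi / 12))
             for r in (0.25, 0.5, 1.0, 1.5) for k in range(24)]
    for _ in range(stage2_steps):
        theta, inv = ridge(feats, costs, lam)
        safe = [p for p in cands if cost_ucb(p, theta, inv, beta) <= TAU]
        # Fall back to a known-safe action if no candidate is certified.
        phi = max(safe, key=lambda p: dot(p, REWARD)) if safe else (radius, 0.0)
        feats.append(phi)
        costs.append(dot(phi, GAMMA))
        executed.append(phi)

    violations = sum(dot(p, GAMMA) > TAU + 1e-9 for p in executed)
    return executed, violations
```

With noiseless costs, the ridge estimate's bias on any action φ is at most √λ·S·‖φ‖_{A⁻¹}, so β = √λ·S makes the upper estimate valid and every executed action respects the threshold. With noisy costs, β would need an extra noise-dependent term, as in standard linear-bandit confidence bounds.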