Poster
Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation
Taehyun Cho · Seungyub Han · Seokhun Ju · Dohyeong Kim · Kyungjae Lee · Jungwoo Lee
West Exhibition Hall B2-B3 #W-1014
Reinforcement learning (RL) helps computers learn to make decisions, like choosing the best move in a game or guiding robots through tasks. Traditional RL methods focus only on the average outcome of actions, which may not be enough for safe and reliable decisions in the real world. To address this, researchers have developed distributional RL, a method that considers all possible outcomes and their probabilities, not just the average.

However, handling these full distributions is difficult because they contain an infinite amount of information. This paper introduces a new concept called Bellman Unbiasedness, which makes it possible to estimate key information from these distributions through their moments, such as the mean and variance, without bias, even when working with just a few samples. The authors also propose a new algorithm, SF-LSVI, that learns decision-making strategies effectively and without bias, even under general function approximation (such as neural networks).

This work could make RL more trustworthy and applicable to real-world problems, such as safer robotic control, smarter navigation systems, and better AI decision-making.
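To give a rough intuition for what "unbiased moment estimation from a few samples" means, here is a minimal Python sketch. It is not the paper's SF-LSVI algorithm; the return distribution, sample size, and trial count are made-up values, and it only illustrates that the sample mean and the ddof=1 sample variance are unbiased estimators of a return distribution's first two moments.

```python
import numpy as np

# Illustrative only: a small Monte-Carlo check that the sample mean and the
# ddof=1 sample variance are unbiased estimators of a return distribution's
# mean and variance, even with very few samples per estimate.
# (Hypothetical example; not the paper's SF-LSVI algorithm.)

rng = np.random.default_rng(0)

true_mean, true_std = 1.0, 2.0        # hypothetical return distribution Z(s, a)
n_samples, n_trials = 5, 200_000      # "just a few samples" per trial

means, variances = [], []
for _ in range(n_trials):
    returns = rng.normal(true_mean, true_std, size=n_samples)
    means.append(returns.mean())            # unbiased estimate of E[Z]
    variances.append(returns.var(ddof=1))   # unbiased estimate of Var[Z]

print(f"avg. estimated mean:     {np.mean(means):.3f}  (true {true_mean})")
print(f"avg. estimated variance: {np.mean(variances):.3f}  (true {true_std**2:.1f})")
```

Averaged over many trials, both estimates land on the true values, which is the kind of exactness (no systematic error from finite samples) that Bellman Unbiasedness asks of the statistics being learned.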