Poster
Accurate and Efficient World Modeling with Masked Latent Transformers
Maxime Burchi · Radu Timofte
West Exhibition Hall B2-B3 #W-605
We introduce EMERALD, a world modeling method that simulates environments more accurately and efficiently than previous approaches. World models can improve sample efficiency and safety when training AI agents, because the agent learns from imaginary trajectories generated by the model instead of interacting with the real world. EMERALD uses a spatial hidden state that carries more information, allowing it to simulate the environment more accurately. This added precision improves agent performance in complex visual environments such as Crafter, where small details can be crucial. To keep prediction with spatial states efficient, we also propose using MaskGIT, an efficient parallel decoding algorithm from image and video generation, to predict the spatial hidden states. This makes EMERALD both accurate and efficient compared to previous approaches. We evaluate our method on the Crafter benchmark, where it achieves state-of-the-art performance. EMERALD also generalizes to Atari games, which do not necessarily require a spatial hidden state to perceive crucial details, and achieves strong performance there as well.
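To make the MaskGIT idea concrete, here is a minimal sketch of MaskGIT-style iterative parallel decoding over a flattened grid of latent tokens. It assumes the spatial state is a grid of discrete tokens (as MaskGIT requires) and uses a hypothetical `predict` callable as a stand-in for the world model's masked transformer; it does not show how EMERALD conditions on past states or actions. Every position is predicted in parallel at each step, and only the most confident predictions are committed, following a cosine schedule for how many tokens remain masked. The original MaskGIT samples tokens and adds annealed noise to the confidences; this sketch uses greedy argmax so the result is deterministic. The names `MASK`, `maskgit_decode` and `make_toy_predictor` are illustrative, not from the paper.

```python
import math
from typing import Callable, List

MASK = -1  # hypothetical sentinel id marking a not-yet-predicted token


def cosine_mask_ratio(progress: float) -> float:
    """MaskGIT-style cosine schedule: fraction of tokens still masked at progress in [0, 1]."""
    return math.cos(math.pi / 2 * progress)


def maskgit_decode(
    predict: Callable[[List[int]], List[List[float]]],
    num_tokens: int,
    num_steps: int,
) -> List[int]:
    """Iteratively fill a flattened grid of discrete latent tokens (greedy simplification).

    `predict` is a hypothetical stand-in for the masked transformer: given the
    current tokens (MASK where unknown), it returns one probability row per position.
    """
    tokens = [MASK] * num_tokens
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        probs = predict(tokens)  # all positions are predicted in parallel
        # Greedy token and its probability (confidence) for every masked position.
        candidates = []
        for i in masked:
            row = probs[i]
            best = max(range(len(row)), key=row.__getitem__)
            candidates.append((row[best], i, best))
        # Number of tokens that should still be masked after this step (0 on the last step).
        remain = math.floor(num_tokens * cosine_mask_ratio((step + 1) / num_steps))
        num_commit = min(len(masked), max(1, len(masked) - remain))
        # Commit the most confident predictions; the rest stay masked for later steps.
        candidates.sort(key=lambda c: c[0], reverse=True)
        for _, i, token in candidates[:num_commit]:
            tokens[i] = token
    return tokens


def make_toy_predictor(vocab_size: int, calls: List[int]):
    """Hypothetical toy predictor: position i favors token i % vocab_size, with
    confidence rising along the grid. Records how many tokens were masked per call."""
    def predict(tokens: List[int]) -> List[List[float]]:
        calls.append(sum(t == MASK for t in tokens))
        n = len(tokens)
        rows = []
        for i in range(n):
            conf = 0.6 + 0.3 * i / n
            rest = (1.0 - conf) / (vocab_size - 1)
            row = [rest] * vocab_size
            row[i % vocab_size] = conf
            rows.append(row)
        return rows
    return predict


# Decode a 4x4 grid (16 tokens) in 8 parallel steps instead of 16 sequential ones.
calls: List[int] = []
grid = maskgit_decode(make_toy_predictor(vocab_size=5, calls=calls), num_tokens=16, num_steps=8)
print(grid)
print(calls)  # masked-token count seen at each step
```

The design choice this illustrates is the efficiency argument: an autoregressive model would need one forward pass per token of the spatial state, whereas masked parallel decoding fixes the number of forward passes (here 8 for 16 tokens) independently of grid size.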