Poster
Neurosymbolic World Models for Sequential Decision Making
Leonardo Hernandez Cano · Maxine Perroni-Scharf · Neil Dhir · Arun Ramamurthy · Armando Solar-Lezama
East Exhibition Hall A-B #E-1506
We present Structured World Modeling for Policy Optimization (SWMPO), a framework for the unsupervised learning of neurosymbolic finite state machines (FSMs) that capture environmental structure for policy optimization. Traditional unsupervised world-modeling methods rely on unstructured representations, such as neural networks, that do not explicitly represent high-level patterns within the system (e.g., the distinct dynamics of regions such as \emph{water} and \emph{land}). SWMPO instead models the environment as an FSM, where each state corresponds to a region with distinct dynamics. This structured representation can then be leveraged for tasks such as policy optimization. Previous works that synthesize FSMs for this purpose have been limited to discrete spaces; our FSM synthesis algorithm instead handles continuous spaces and operates in an unsupervised manner, leveraging low-level features from unprocessed, non-visual data, which makes it adaptable across domains. The synthesized FSM models are expressive enough to be used in a model-based reinforcement learning scheme that leverages offline data to efficiently synthesize environment-specific world models. We demonstrate the advantages of SWMPO by benchmarking its environment-modeling capabilities in simulated environments.
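To make the core idea concrete, the abstract's notion of an FSM world model — one dynamics function per region, with transitions triggered by predicates on low-level features — can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation; all names (`Mode`, `FSMWorldModel`, the water/land guards) are assumptions introduced here for exposition.

```python
# Illustrative sketch (NOT the paper's code): an FSM world model where each
# state ("mode") carries its own dynamics, and guard predicates on the
# successor state trigger transitions between modes.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[float, ...]   # continuous robot state
Action = Tuple[float, ...]  # continuous action

@dataclass
class Mode:
    name: str
    dynamics: Callable[[State, Action], State]   # mode-specific transition model
    guards: Dict[str, Callable[[State], bool]]   # next-mode name -> trigger predicate

class FSMWorldModel:
    def __init__(self, modes: List[Mode], start: str):
        self.modes = {m.name: m for m in modes}
        self.current = start

    def step(self, s: State, a: Action) -> State:
        mode = self.modes[self.current]
        s_next = mode.dynamics(s, a)
        # Switch modes when a guard predicate fires on the successor state.
        for target, guard in mode.guards.items():
            if guard(s_next):
                self.current = target
                break
        return s_next

# Toy water/land example: full-speed dynamics on land, half-speed in water,
# with position x < 0 (hypothetically) marking the water region.
land = Mode("land", dynamics=lambda s, a: (s[0] + a[0],),
            guards={"water": lambda s: s[0] < 0.0})
water = Mode("water", dynamics=lambda s, a: (s[0] + 0.5 * a[0],),
             guards={"land": lambda s: s[0] >= 0.0})
wm = FSMWorldModel([land, water], start="land")
```

In this sketch, stepping `wm` from position `0.5` with action `-1.0` moves the robot to `-0.5` and flips the FSM into the `water` mode, after which the half-speed water dynamics apply until the land guard fires again.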
We tackle the problem of automatically discovering and exploiting 'high-level structure' from robot sensors. For instance, consider a robot that operates in (say) two conditions: water and land; we are interested in using sensor data to automatically build a representation (i.e., a model) of the robot's environment that is structured into two 'parts', one for each condition. To this end, we describe a system designed to learn this type of structured model, and then show that the learned representation can be used to efficiently build models of new environments (by reusing the 'parts') and to train the robot 'in simulation' (i.e., using the model).