

Poster in Affinity Workshop: New In ML

DREAM-VLA: A Unified Vision-Language-Action Model for Dynamic Reflection and Error-Aware Mitigation


Abstract: Current Vision-Language-Action (VLA) models are often brittle to execution errors and prone to catastrophic forgetting, limiting their real-world deployment. We introduce DREAM-VLA, a unified VLA model built on the $\pi_0$ backbone and designed for Dynamic Reflection and Error-Aware Mitigation. DREAM-VLA features a dynamic self-reflective loop with augmented textual feedback and a novel three-tier hierarchical architecture: a Chain-of-Thought (CoT)-enhanced VLM, a modular Error-Aware layer with a learnable error-category identification token for continual learning of diverse error types, and an Action Expert layer with specialized adapters for targeted recovery. A two-stage training methodology, combining curriculum learning for foundational awareness with continual learning focused on successful recovery sub-sequences, equips DREAM-VLA with robust error-handling capabilities. This framework aims to significantly improve the reliability and adaptability of embodied agents through structured error detection, reflection, and learned mitigation.
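The abstract names the three tiers but does not specify their concrete form. As an illustration only, the PyTorch sketch below shows one plausible reading of the two lower tiers: an Error-Aware layer that appends a learnable error-category identification token to the VLM's feature sequence and classifies the error type from it, and an Action Expert whose per-category adapters are softly routed by that classification. All class names, dimensions, and the soft-routing choice are assumptions for this sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ErrorAwareLayer(nn.Module):
    """Middle tier (sketch): a learnable error-category token is appended
    to the VLM feature sequence; its contextualized state is used to
    classify the error type."""
    def __init__(self, d_model: int, num_error_types: int):
        super().__init__()
        # Learnable error-category identification token (hypothetical form).
        self.error_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead=8,
                                                  batch_first=True)
        self.classifier = nn.Linear(d_model, num_error_types)

    def forward(self, feats: torch.Tensor):
        tok = self.error_token.expand(feats.size(0), -1, -1)
        h = self.encoder(torch.cat([feats, tok], dim=1))
        err_logits = self.classifier(h[:, -1])   # error-token state -> category
        return h[:, :-1], err_logits

class ActionExpert(nn.Module):
    """Bottom tier (sketch): a shared action head plus one lightweight
    adapter per error category, mixed by the predicted error distribution."""
    def __init__(self, d_model: int, action_dim: int, num_error_types: int):
        super().__init__()
        self.base = nn.Linear(d_model, action_dim)
        self.adapters = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 64), nn.GELU(),
                          nn.Linear(64, action_dim))
            for _ in range(num_error_types)])

    def forward(self, feats: torch.Tensor, err_logits: torch.Tensor):
        pooled = feats.mean(dim=1)
        weights = err_logits.softmax(dim=-1)          # soft routing by error type
        delta = torch.stack([a(pooled) for a in self.adapters], dim=1)
        return self.base(pooled) + (weights.unsqueeze(-1) * delta).sum(dim=1)

# Usage with stand-in features from a CoT-enhanced VLM:
feats = torch.randn(2, 16, 512)                       # (batch, tokens, d_model)
err_layer = ErrorAwareLayer(512, num_error_types=5)
expert = ActionExpert(512, action_dim=7, num_error_types=5)
h, err_logits = err_layer(feats)
action = expert(h, err_logits)                        # (2, 7) recovery-aware action
```

Soft routing over adapters is one way such a modular layer could support continual learning of new error types (a new category adds a token entry and an adapter without overwriting existing ones); the paper may use a different mechanism.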
