

Poster in Affinity Workshop: New In ML

DREAM-VLA: A Unified Vision-Language-Action Model for Dynamic Reflection and Error-Aware Mitigation


Abstract: Current Vision-Language-Action (VLA) models are often brittle to execution errors and prone to catastrophic forgetting, limiting their real-world deployment. We introduce DREAM-VLA, a unified VLA model built on the $\pi_0$ backbone and designed for Dynamic Reflection and Error-Aware Mitigation. DREAM-VLA features a dynamic self-reflective loop with augmented textual feedback and a novel three-tier hierarchical architecture: a Chain-of-Thought (CoT)-enhanced VLM, a modular Error-Aware layer with a learnable error-category identification token for continual learning of diverse error types, and an Action Expert layer with specialized adapters for targeted recovery. A two-stage training methodology, combining curriculum learning for foundational awareness with continual learning focused on successful recovery sub-sequences, equips DREAM-VLA with robust error-handling capabilities. This framework aims to significantly improve the reliability and adaptability of embodied agents through structured error detection, reflection, and learned mitigation.
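The abstract names the three tiers but does not specify their concrete form. As an illustration only, the PyTorch sketch below shows one plausible reading of the two lower tiers: an Error-Aware layer that appends a learnable error-category identification token to the VLM's feature sequence and classifies the error type from it, and an Action Expert whose per-category adapters are softly routed by that classification. All class names, dimensions, and the soft-routing choice are assumptions for this sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ErrorAwareLayer(nn.Module):
    """Middle tier (sketch): a learnable error-category token is appended
    to the VLM feature sequence; its contextualized state is used to
    classify the error type."""
    def __init__(self, d_model: int, num_error_types: int):
        super().__init__()
        # Learnable error-category identification token (hypothetical form).
        self.error_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead=8,
                                                  batch_first=True)
        self.classifier = nn.Linear(d_model, num_error_types)

    def forward(self, feats: torch.Tensor):
        tok = self.error_token.expand(feats.size(0), -1, -1)
        h = self.encoder(torch.cat([feats, tok], dim=1))
        err_logits = self.classifier(h[:, -1])   # error-token state -> category
        return h[:, :-1], err_logits

class ActionExpert(nn.Module):
    """Bottom tier (sketch): a shared action head plus one lightweight
    adapter per error category, mixed by the predicted error distribution."""
    def __init__(self, d_model: int, action_dim: int, num_error_types: int):
        super().__init__()
        self.base = nn.Linear(d_model, action_dim)
        self.adapters = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 64), nn.GELU(),
                          nn.Linear(64, action_dim))
            for _ in range(num_error_types)])

    def forward(self, feats: torch.Tensor, err_logits: torch.Tensor):
        pooled = feats.mean(dim=1)
        weights = err_logits.softmax(dim=-1)          # soft routing by error type
        delta = torch.stack([a(pooled) for a in self.adapters], dim=1)
        return self.base(pooled) + (weights.unsqueeze(-1) * delta).sum(dim=1)

# Usage with stand-in features from a CoT-enhanced VLM:
feats = torch.randn(2, 16, 512)                       # (batch, tokens, d_model)
err_layer = ErrorAwareLayer(512, num_error_types=5)
expert = ActionExpert(512, action_dim=7, num_error_types=5)
h, err_logits = err_layer(feats)
action = expert(h, err_logits)                        # (2, 7) recovery-aware action
```

Soft routing over adapters is one way such a modular layer could support continual learning of new error types (a new category adds a token entry and an adapter without overwriting existing ones); the paper may use a different mechanism.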
