ICML Real-World Autonomy: Building Modular, Voice-Guided Embodied Agents with SLMs and Vision

Demonstration in Expo Demonstration: Real-World Autonomy: Building Modular, Voice-Guided Embodied Agents with SLMs and Vision

Real-World Autonomy: Building Modular, Voice-Guided Embodied Agents with SLMs and Vision

Aastha Varma


Mon 14 Jul 4 p.m. PDT — 7 p.m. PDT

Abstract:

We present a new approach to embodied intelligence, one grounded in modular AI systems that combine small language models (SLMs), vision models, and speech interfaces. This architecture enables fast, intuitive agent behavior, even in low-resource, real-world environments.

Our prototype, an AI-powered exoskeleton, performs physical tasks through natural human interaction. It operates in three modes: Shadow (mimic gestures), Command (respond to voice), and Training (learn by demonstration). High-level reasoning is handled by SLMs, while fast, modular controllers manage low-level control.
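
The split between mode selection, high-level planning, and low-level control could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the presenters' implementation: the `Agent` class, the `plan_with_slm` function, the skill names, and the motor command strings are all hypothetical, and a lookup table stands in for the SLM.

```python
from enum import Enum
from typing import Callable, Dict, List


class Mode(Enum):
    SHADOW = "shadow"      # mimic gestures
    COMMAND = "command"    # respond to voice
    TRAINING = "training"  # learn by demonstration


def plan_with_slm(utterance: str) -> List[str]:
    """Hypothetical stand-in for the SLM: maps a transcribed voice
    command to an ordered list of skill names. A real system would
    query a language model; here a fixed table is used instead."""
    table = {
        "lift the box": ["grasp", "raise"],
        "let go": ["release"],
    }
    return table.get(utterance.strip().lower(), [])


class Agent:
    def __init__(self) -> None:
        # Low-level controllers (assumed): each skill is a fast callable
        # that returns a motor command string.
        self.skills: Dict[str, Callable[[], str]] = {
            "grasp": lambda: "close_gripper",
            "raise": lambda: "elbow_up",
            "release": lambda: "open_gripper",
        }
        # Demonstrations recorded in Training mode, keyed by label.
        self.demonstrations: Dict[str, List[str]] = {}

    def step(self, mode: Mode, observation) -> List[str]:
        """Dispatch one observation according to the active mode."""
        if mode is Mode.SHADOW:
            # observation: gesture names detected by the vision model.
            return [f"mirror:{gesture}" for gesture in observation]
        if mode is Mode.COMMAND:
            # observation: transcribed speech; the planner picks skills,
            # the controllers turn them into motor commands.
            plan = plan_with_slm(observation)
            return [self.skills[name]() for name in plan if name in self.skills]
        if mode is Mode.TRAINING:
            # observation: (label, demonstrated motor commands).
            label, commands = observation
            self.demonstrations[label] = list(commands)
            return []
        raise ValueError(f"unknown mode: {mode}")
```

In this sketch the planner only chooses which skills to run, while each controller is a plain function call, so the slow reasoning step stays out of the fast control path.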

