DEMONSTRATE maps language to control through multi-task demonstrations, enabling robots to follow commands without engineering-tuned examples and providing a novel mechanism for detecting hallucinations before task execution.

Abstract

The integration of large language models (LLMs) into robotic systems holds significant promise for enabling natural language-based control. However, existing approaches often rely on handcrafted prompt examples and lack mechanisms for verifying correctness before execution. DEMONSTRATE presents a framework that removes the need for expert-tuned prompts by replacing in-context examples with demonstrations of low-level tasks. By mapping language descriptions to control objectives using task embeddings and inverse optimal control, the system can generalize to new tasks zero-shot and assess potential hallucinations preemptively. This results in a scalable and more reliable pipeline for deploying LLMs in robotics.

Model Overview

DEMONSTRATE builds upon two modules:

  1. Language Module (LM): Translates natural language commands into embeddings using a pre-trained LLM. These embeddings abstract away syntax and focus on semantic similarity between tasks.
  2. Control Module (CM): Uses multi-task inverse reinforcement learning to associate these embeddings with cost structures derived from demonstrations. This learned parametric mapping, from embedding representations to the task-defining parameters of the cost function, allows for zero-shot synthesis of control problems, reducing reliance on hand-crafted prompts or symbolic formulations, and provides a novel systematic mechanism for detecting hallucinations before task execution (a minimal sketch of both modules follows below).
The result is a system that interprets and executes unseen commands with minimal tuning, while allowing pre-execution detection of unknown commands or hallucinations.
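To make the two modules concrete, the following minimal Python sketch illustrates them under simplifying assumptions: a generic text encoder stands in for the pre-trained LLM, the parametric map is taken to be linear, and the task cost is linear in trajectory features. All names and signatures are illustrative and not taken from the DEMONSTRATE implementation.

# Illustrative sketch of the Language Module and Control Module.
# Assumptions: a generic text encoder, a linear parametric map W, and a
# cost that is linear in trajectory features (none of these are claimed
# to match the exact DEMONSTRATE implementation).
import numpy as np

class LanguageModule:
    """Embeds sub-task descriptions with a pre-trained text encoder."""
    def __init__(self, encoder):
        self.encoder = encoder  # any callable: str -> array-like embedding

    def embed(self, description):
        e = np.asarray(self.encoder(description), dtype=float)
        return e / np.linalg.norm(e)  # normalize so cosine similarity is a dot product

class ControlModule:
    """Maps an embedding to task-defining cost parameters via a learned linear map W."""
    def __init__(self, W):
        self.W = W  # learned offline from demonstrations (see the offline pipeline)

    def task_parameters(self, embedding):
        return self.W @ embedding  # theta: parameters of the task cost

    def cost(self, trajectory_features, theta):
        return float(trajectory_features @ theta)  # feature-based cost used by the controller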

Pipeline

DEMONSTRATE Architecture

DEMONSTRATE’s architecture is based on a two-stage pipeline:

  1. Offline pipeline: A user provides sub-task descriptions together with trajectory demonstrations, and an LLM is employed to compute embedding vectors for the sub-task descriptions. The demonstrations and the embedding vectors are then used to learn (i) a parametric mapping from embedding to task vector, and (ii) the shared multi-task parameters, both used in the online pipeline (see the first sketch after this list).
  2. Online pipeline: The online stage follows the NARRATE pipeline architecture, with an added Sub-task Validation Module and a modified Optimization Designer. A complex user command is first passed to the Task Planner, which divides it into multiple sub-tasks, and another LLM is used to compute their embedding representations. After their similarity to the sub-task examples provided in the offline pipeline is checked for sub-task validation (see the second sketch after this list), they are fed into the parametric map to compute the feature vector used by the Trajectory Generator block.
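The offline learning of the parametric map can be sketched as follows, assuming the task vector of each demonstrated sub-task has already been recovered from its trajectory demonstrations (for example, by inverse optimal control) and that the map is linear and fit with ridge regression; both choices are illustrative simplifications rather than the paper's exact procedure.

# Offline pipeline sketch: fit a linear map W with task_vector ≈ W @ embedding.
# The ridge-regression fit and the linear form of the map are assumptions for
# illustration, not the exact learning procedure of DEMONSTRATE.
import numpy as np

def fit_parametric_map(embeddings, task_vectors, reg=1e-3):
    E = np.stack(embeddings)    # (n_subtasks, d_embed), from the offline LLM
    T = np.stack(task_vectors)  # (n_subtasks, d_task), recovered from demonstrations
    # ridge-regularized least squares: W = T^T E (E^T E + reg * I)^{-1}
    A = E.T @ E + reg * np.eye(E.shape[1])
    return T.T @ E @ np.linalg.inv(A)  # W with shape (d_task, d_embed)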
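Sub-task validation before execution can likewise be sketched as a similarity check of each new sub-task embedding against the demonstrated sub-task embeddings from the offline pipeline; the cosine-similarity criterion and the threshold value below are illustrative assumptions.

# Online sub-task validation sketch: flag a sub-task as a potential hallucination
# (or unknown command) when its embedding is far from every demonstrated sub-task.
# The threshold value is an illustrative assumption.
import numpy as np

def validate_subtask(query_embedding, demo_embeddings, threshold=0.8):
    q = np.asarray(query_embedding, dtype=float)
    q = q / np.linalg.norm(q)
    D = np.stack([np.asarray(d, dtype=float) / np.linalg.norm(d) for d in demo_embeddings])
    sims = D @ q                    # cosine similarities to each demonstrated sub-task
    best = float(np.max(sims))
    return best >= threshold, best  # only validated sub-tasks proceed to the Trajectory Generator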

Results

DEMONSTRATE achieves success rates comparable to or better than the chosen baseline models in simulation and has shown its applicability in real-world experiments.

BibTeX


@article{demonstrate2024,
  title={DEMONSTRATE: Zero-shot Language to Robotic Control via Multi-task Demonstration Learning},
  author={Anonymous},
  journal={Under Review},
  year={2024}
}