CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning
Summary
CLASP is a modular robotic architecture that connects natural language understanding to data-efficient robot task execution. It combines Task-Parameterized Kernelized Movement Primitives (TP-KMPs), which acquire skills from just 2 to 5 kinesthetic demonstrations, with pretrained Vision-Language Models (VLMs) for natural language grounding. During learning, the VLM generates a schema for each skill that details its parameters and preconditions. During execution, the VLM interprets commands to select appropriate skills, reason about parameter bindings, and compose novel behaviors using covariance-weighted composition. When no existing skill or composition suffices, CLASP identifies the capability gap and requests targeted demonstrations, all without fine-tuning the VLM. Validation on a 7-DoF manipulator yielded success rates from 73.3% to 100% across scenarios involving skill selection, composition, and active learning.
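To make the idea of a skill schema concrete, here is a minimal sketch of what such a record might hold. The class name, field names, and the symbolic precondition strings are all hypothetical; the summary only says that schemas detail parameters and preconditions, not how CLASP represents them.

```python
from dataclasses import dataclass, field


@dataclass
class SkillSchema:
    """Hypothetical schema record; the field names are illustrative, not CLASP's."""
    name: str
    description: str
    # Parameter name -> plain-language meaning (e.g. a task frame to bind at run time).
    parameters: dict[str, str] = field(default_factory=dict)
    # Symbolic facts assumed to be required before the skill can run.
    preconditions: list[str] = field(default_factory=list)

    def unmet(self, world_state: set[str]) -> list[str]:
        """Return the preconditions that are missing from the observed world state."""
        return [p for p in self.preconditions if p not in world_state]


# Example schema for an assumed "pick" skill.
pick = SkillSchema(
    name="pick",
    description="Grasp an object from a surface",
    parameters={"target": "frame attached to the object to grasp"},
    preconditions=["gripper_empty", "target_visible"],
)
```

A schema like this gives the language model something structured to reason over when it binds parameters or checks whether a skill applies in the current scene.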
Key takeaway
For robotics engineers building language-driven robot systems, CLASP shows how to get natural language grounding and data efficiency together. Pairing task-parameterized learning with a pretrained vision-language model cuts demonstration requirements to 2-5 per skill. The resulting system interprets commands, composes new behaviors, and flags missing skills so that targeted demonstrations can be collected, all without fine-tuning the VLM.
Key insights
CLASP integrates TP-KMPs with VLMs for data-efficient, language-driven robot skill learning, selection, and composition.
Principles
- Modular design integrates VLM and TP-KMP.
- Few-shot kinesthetic demos enable skill learning.
- VLMs interpret commands for skill selection.
Method
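Each skill is acquired from 2-5 kinesthetic demonstrations, and the VLM generates a schema for it. At execution time, the VLM interprets the command to select skills, bind their parameters, and combine them through covariance-weighted composition. When no skill or composition fits, the system requests targeted demonstrations to fill the gap.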
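One common way to realize covariance-weighted composition is precision-weighted fusion of Gaussian predictions (a product of Gaussians), where each skill contributes most in the dimensions where its demonstrations were most consistent. The sketch below assumes that reading and simplifies to diagonal covariances at a single time step; the function name and data layout are hypothetical, and the paper's exact formulation may differ.

```python
def fuse_gaussians(means, variances):
    """Fuse per-skill Gaussian predictions for one time step.

    means, variances: one list per skill, each holding one value per dimension.
    Assumes diagonal covariances, so every dimension is fused independently.
    """
    dims = len(means[0])
    fused_mean, fused_var = [], []
    for d in range(dims):
        # Precision (inverse variance) acts as the weight of each skill.
        precision = sum(1.0 / v[d] for v in variances)
        weighted = sum(m[d] / v[d] for m, v in zip(means, variances))
        fused_var.append(1.0 / precision)
        fused_mean.append(weighted / precision)
    return fused_mean, fused_var


# Skill A is confident in dimension 0, skill B is confident in dimension 1.
mean, var = fuse_gaussians(
    means=[[0.0, 1.0], [1.0, 2.0]],
    variances=[[0.01, 1.0], [1.0, 0.01]],
)
```

In this example the fused mean stays close to skill A in dimension 0 and close to skill B in dimension 1, which is the behavior that lets two skills be blended without one overriding the other everywhere.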
In practice
- Use TP-KMPs for data-efficient skill acquisition.
- Employ VLMs for language-driven robot control.
- Implement active learning for capability expansion.
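The select, compose, or request-a-demonstration decision can be sketched as a small dispatcher around the VLM's reply. This assumes the VLM is prompted to answer in a JSON format with `decision` and `skills` fields; that format, the function name, and the fallback policy are hypothetical and not taken from the paper.

```python
import json


def dispatch(vlm_reply: str, library: set[str]) -> tuple[str, list[str]]:
    """Map an assumed JSON reply from the VLM to an action and a list of skill names."""
    reply = json.loads(vlm_reply)
    decision = reply.get("decision")
    skills = reply.get("skills", [])
    # Skills the VLM named that are not in the learned library indicate a gap.
    unknown = [s for s in skills if s not in library]
    if unknown or not skills:
        return ("request_demo", unknown)
    if decision == "select" and len(skills) == 1:
        return ("select", skills)
    if decision == "compose" and len(skills) >= 2:
        return ("compose", skills)
    # Anything inconsistent is treated conservatively as a capability gap.
    return ("request_demo", [])


library = {"pick", "place", "pour"}
action = dispatch('{"decision": "compose", "skills": ["pick", "pour"]}', library)
```

Validating the reply against the skill library keeps a hallucinated skill name from reaching the controller and turns it into a request for a new demonstration instead.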
Topics
- Robot Skill Learning
- Vision-Language Models
- Task-Parameterized Learning
- Natural Language Robot Control
- Active Learning
- Kinesthetic Demonstration
Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.