CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning

2026-06-06 · Source: Machine Learning · Field: Technology & Digital — Robotics & Autonomous Systems, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

CLASP is a modular robot architecture that links natural language understanding with data-efficient task execution. It pairs Task-Parameterized Kernelized Movement Primitives (TP-KMPs), which learn each skill from only 2 to 5 kinesthetic demonstrations, with pretrained Vision-Language Models (VLMs) that ground natural language. During learning, the VLM generates a skill schema describing each skill's parameters and preconditions. During execution, the VLM interprets a command, selects the appropriate skills, reasons about how to bind their parameters, and composes novel behaviors through covariance-weighted composition. When no existing skill or composition suffices, CLASP identifies the capability gap and requests a targeted demonstration, all without fine-tuning the VLM. Experiments on a 7-DoF manipulator reported success rates from 73.3% to 100% across skill selection, composition, and active learning scenarios.
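The schema-driven selection and gap-detection loop described above can be sketched as a small data structure plus a selection routine. This is a minimal illustration under stated assumptions, not CLASP's implementation: `SkillSchema`, `SelectionResult`, `select_skill`, and their fields are hypothetical names; simple keyword matching stands in for the VLM's reasoning; and the composition attempt that CLASP makes before requesting a demonstration is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SkillSchema:
    # Hypothetical fields: the summary says VLM-generated schemas detail
    # parameters and preconditions, but not their exact format.
    name: str
    description: str
    parameters: list[str]     # roles the skill must be bound to, e.g. "object"
    preconditions: list[str]  # symbolic facts that must hold before execution

@dataclass
class SelectionResult:
    skill: Optional[SkillSchema]
    bindings: dict[str, str] = field(default_factory=dict)
    request_demo: bool = False  # True signals a capability gap

def select_skill(command_words, scene_facts, scene_objects, library):
    """Keyword-matching stand-in (an assumption) for the VLM's selection and binding step."""
    for skill in library:
        if skill.name not in command_words:
            continue  # skill not mentioned in the command
        if not all(p in scene_facts for p in skill.preconditions):
            continue  # preconditions not satisfied in the current scene
        # Bind each parameter role to the scene object that fills it.
        bindings = {p: scene_objects[p] for p in skill.parameters if p in scene_objects}
        if len(bindings) == len(skill.parameters):
            return SelectionResult(skill, bindings)
    # No applicable skill: flag a gap so the operator can give a targeted demonstration.
    return SelectionResult(None, request_demo=True)

library = [
    SkillSchema("pick", "grasp an object", ["object"], ["gripper_empty"]),
    SkillSchema("place", "put the held object on a target", ["target"], ["holding_object"]),
]
found = select_skill({"pick", "cup"}, {"gripper_empty"}, {"object": "cup"}, library)
gap = select_skill({"pour", "cup"}, {"gripper_empty"}, {"object": "cup"}, library)
print(found.skill.name, found.bindings)  # pick {'object': 'cup'}
print(gap.request_demo)                  # True
```

Returning an explicit `request_demo` flag, rather than raising an error, keeps the capability gap a normal outcome that the caller can route to a demonstration request.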

Key takeaway

For Robotics Engineers designing language-driven robot systems, CLASP shows one way to combine natural language grounding with data efficiency. Consider pairing task-parameterized learning with a pretrained vision-language model: in CLASP, this keeps demonstration requirements to 2-5 per skill while letting the robot interpret complex commands, compose new behaviors, and flag its own skill gaps, with no VLM fine-tuning.

Key insights

CLASP integrates TP-KMPs with VLMs for data-efficient, language-driven robot skill learning, selection, and composition.

Principles

Modular design integrates VLM and TP-KMP.
Few-shot kinesthetic demos enable skill learning.
VLMs interpret commands for skill selection.

Method

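Learn each skill as a TP-KMP from 2-5 kinesthetic demos. The VLM generates a schema for each skill (parameters, preconditions). At run time, the VLM interprets the command to select skills, bind their parameters, and compose them by covariance weighting. If no skill or composition fits, request a targeted demo.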
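The summary does not give CLASP's exact composition rule. A common form of covariance-weighted combination in the task-parameterized learning literature is the product of Gaussians, where each prediction is weighted by its inverse covariance. The sketch below assumes that form and applies it at a single time step; `fuse_gaussians` and the example numbers are hypothetical.

```python
import numpy as np

def fuse_gaussians(means, covs):
    """Covariance-weighted fusion as a product of Gaussians.

    Each prediction is weighted by its precision (inverse covariance), so the
    prediction that is more certain along a direction dominates along it.
    """
    precisions = [np.linalg.inv(np.asarray(S, dtype=float)) for S in covs]
    fused_cov = np.linalg.inv(sum(precisions))
    fused_mean = fused_cov @ sum(P @ np.asarray(m, dtype=float) for P, m in zip(precisions, means))
    return fused_mean, fused_cov

# Hypothetical example: two skills predict the same 2-D via-point at one time step.
# Skill A is confident about x, skill B about y.
mean_a, cov_a = [0.0, 0.0], np.diag([0.01, 1.0])
mean_b, cov_b = [1.0, 1.0], np.diag([1.0, 0.01])
mean, cov = fuse_gaussians([mean_a, mean_b], [cov_a, cov_b])
print(mean)  # close to [0.0099, 0.9901]: x follows A, y follows B
```

Applied at every time step of two skills' predicted trajectories, this yields a blended motion that tracks whichever skill is more certain in each region and direction.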

In practice

Use TP-KMPs for data-efficient skill acquisition.
Employ VLMs for language-driven robot control.
Implement active learning for capability expansion.
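Part of what makes task-parameterized skills data-efficient is that a motion learned relative to a task frame, such as an object's pose, transfers when that frame moves. A minimal sketch of the standard frame mapping is shown below, assuming a frame given by orientation `A` and origin `b`; `frame_to_world`, `rotation_2d`, and the numbers are hypothetical. In TP-KMP-style methods, the per-frame results are then fused in the world frame.

```python
import numpy as np

def frame_to_world(mu_local, sigma_local, A, b):
    """Map a Gaussian expressed in a task frame (A, b) into the world frame.

    mu_world = A @ mu_local + b, sigma_world = A @ sigma_local @ A.T
    """
    A = np.asarray(A, dtype=float)
    mu_world = A @ np.asarray(mu_local, dtype=float) + np.asarray(b, dtype=float)
    sigma_world = A @ np.asarray(sigma_local, dtype=float) @ A.T
    return mu_world, sigma_world

def rotation_2d(theta):
    """Planar rotation matrix used as the frame orientation A."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Hypothetical object frame: rotated 90 degrees and placed at (1, 2) in the world.
A, b = rotation_2d(np.pi / 2), np.array([1.0, 2.0])
mu, sigma = frame_to_world([0.1, 0.0], np.diag([0.01, 0.04]), A, b)
print(mu)     # about [1.0, 2.1]: the approach offset turns with the object
print(sigma)  # about diag(0.04, 0.01): the uncertainty axes rotate too
```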

Topics

Robot Skill Learning
Vision-Language Models
Task-Parameterized Learning
Natural Language Robot Control
Active Learning
Kinesthetic Demonstration

Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.