Bimanual Robot Manipulation via Multi-Agent In-Context Learning

2026-04-23 · Source: cs.MA updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

BiCICLe (Bimanual Coordinated In-Context Learning) is a novel framework enabling Large Language Models (LLMs) to perform few-shot bimanual robot manipulation without fine-tuning. It addresses the challenge of high-dimensional joint action spaces and tight inter-arm coordination by framing bimanual control as a multi-agent leader-follower problem, decoupling the action space into sequential, conditioned single-arm predictions. The framework introduces "Arms' Debate" for iterative refinement and an "LLM-as-Judge" for Best-of-N self-evaluation to select plausible coordinated trajectories. Evaluated on 13 tasks from the TWIN benchmark, BiCICLe achieves up to a 71.1% average success rate, outperforming the best training-free baseline by 6.7 percentage points and surpassing most supervised methods. It also demonstrates strong few-shot generalization on novel tasks not included in the benchmark.

Key takeaway

For research scientists developing bimanual robot control systems, BiCICLe offers a compelling training-free alternative to traditional supervised methods. You should consider adopting its leader-follower decomposition and inference-time refinement strategies (Arms' Debate and LLM-as-Judge selection) to obtain strong performance and out-of-distribution generalization, especially on tasks that require precise inter-arm coordination. Because the approach works from a few in-context demonstrations instead of fine-tuning, it reduces the need for large, task-specific datasets and speeds deployment to novel scenarios.

Key insights

Decoupling bimanual robot control into leader-follower LLM agents enables effective, training-free in-context learning.

Principles

Factor bimanual control into sequential single-arm predictions.
Explicitly condition the follower arm on the leader's plan.
Refine plans iteratively to improve inter-arm synchronization.
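The first two principles can be sketched as a two-step call sequence: the leader predicts from the task and demonstrations alone, and the follower receives the leader's plan inside its own prompt. This is a minimal sketch under stated assumptions, not BiCICLe's implementation. The waypoint format, the prompt layout, and every name below (`plan_bimanual`, `build_prompt`, `stub_leader`, `stub_follower`) are hypothetical, and the stub predictors stand in for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical plan format: each arm's trajectory is a list of (x, y, z) waypoints.
Waypoint = Tuple[float, float, float]
ArmPredictor = Callable[[str], List[Waypoint]]  # prompt text -> waypoints (an LLM call in practice)


@dataclass
class BimanualPlan:
    leader: List[Waypoint]
    follower: List[Waypoint]


def format_waypoints(points: List[Waypoint]) -> str:
    """Serialize waypoints as text so they can be placed in a prompt."""
    return "; ".join(f"{x:.3f},{y:.3f},{z:.3f}" for x, y, z in points)


def parse_waypoints(text: str) -> List[Waypoint]:
    """Inverse of format_waypoints."""
    points = []
    for chunk in text.split(";"):
        x, y, z = (float(v) for v in chunk.strip().split(","))
        points.append((x, y, z))
    return points


def build_prompt(role: str, task: str, demos: List[str],
                 leader_plan: Optional[List[Waypoint]] = None) -> str:
    """Hypothetical prompt layout: role, task, demonstrations, and (follower only) the leader's plan."""
    lines = [f"ROLE: {role}", f"TASK: {task}", *demos]
    if leader_plan is not None:
        lines.append("LEADER_PLAN: " + format_waypoints(leader_plan))
    return "\n".join(lines)


def plan_bimanual(task: str, demos: List[str],
                  predict_leader: ArmPredictor,
                  predict_follower: ArmPredictor) -> BimanualPlan:
    # Step 1: the leader plans from the task and demonstrations alone.
    leader = predict_leader(build_prompt("leader", task, demos))
    # Step 2: the follower's prompt contains the leader's plan, so its prediction is conditioned on it.
    follower = predict_follower(build_prompt("follower", task, demos, leader))
    return BimanualPlan(leader, follower)


# Usage with stub predictors standing in for LLM calls.
def stub_leader(prompt: str) -> List[Waypoint]:
    return [(0.40, 0.20, 0.10), (0.40, 0.20, 0.30)]


def stub_follower(prompt: str) -> List[Waypoint]:
    # Read the leader's plan back out of the prompt and mirror it across y = 0.
    line = next(ln for ln in prompt.splitlines() if ln.startswith("LEADER_PLAN: "))
    return [(x, -y, z) for x, y, z in parse_waypoints(line[len("LEADER_PLAN: "):])]


plan = plan_bimanual("lift the tray with both hands", ["<demonstration 1>"], stub_leader, stub_follower)
print(plan.follower)  # [(0.4, -0.2, 0.1), (0.4, -0.2, 0.3)]
```

Serializing the leader's plan as text is what turns two independent single-arm predictions into a conditioned sequence: the follower never searches the joint action space, only its own, given a fixed leader trajectory.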

Method

BiCICLe uses a leader-follower LLM architecture: the leader arm predicts its trajectory first, and the follower arm then predicts its own trajectory conditioned on the leader's. "Arms' Debate" iteratively refines the plans, and "Best-of-N" samples N candidate plans and uses an LLM-as-Judge to select the most plausible coordinated trajectory.
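The two inference-time stages can be sketched generically. The source does not specify the debate protocol, the judge's scoring criteria, or how the stages are combined, so this sketch assumes a simple loop (a critic raises an objection, the plan is revised, and the loop stops when there is no objection or the round budget runs out), then runs that debate on each sampled draft and lets a judge pick among the refined drafts. The toy plan (two lift heights that must match for a tray to stay level) and all names (`arms_debate`, `best_of_n`, `toy_critique`, `toy_revise`, `toy_judge`, `TOLERANCE`) are hypothetical numeric stand-ins for LLM calls.

```python
import random
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def arms_debate(plan: T, critique: Callable[[T], Optional[str]],
                revise: Callable[[T, str], T], max_rounds: int = 3) -> T:
    """Refine a joint plan: each round a critic arm raises an objection and the plan is revised.

    Stops early once the critic has no objection, or after max_rounds.
    """
    for _ in range(max_rounds):
        feedback = critique(plan)
        if feedback is None:
            break
        plan = revise(plan, feedback)
    return plan


def best_of_n(sample: Callable[[], T], judge: Callable[[T], float], n: int) -> Tuple[T, float]:
    """Draw n candidate plans, score each with a judge, and return the highest-scoring one."""
    candidates = [sample() for _ in range(n)]
    scores = [judge(c) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best], scores[best]


# Toy stand-ins for the LLM calls (hypothetical): a plan is (leader_z, follower_z),
# the lift heights in metres, and the tray stays level only if the heights match.
TOLERANCE = 0.01


def toy_critique(plan: Tuple[float, float]) -> Optional[str]:
    gap = plan[0] - plan[1]
    return None if abs(gap) <= TOLERANCE else f"height mismatch of {gap:+.3f} m"


def toy_revise(plan: Tuple[float, float], feedback: str) -> Tuple[float, float]:
    # An LLM would read `feedback`; here each arm simply moves a quarter of the gap toward the other.
    leader_z, follower_z = plan
    return ((3 * leader_z + follower_z) / 4, (leader_z + 3 * follower_z) / 4)


def toy_judge(plan: Tuple[float, float]) -> float:
    return -abs(plan[0] - plan[1])  # higher is better: smaller height gap


rng = random.Random(0)


def toy_sample() -> Tuple[float, float]:
    draft = (rng.uniform(0.1, 0.3), rng.uniform(0.1, 0.3))
    return arms_debate(draft, toy_critique, toy_revise)


best, score = best_of_n(toy_sample, toy_judge, n=4)
print(best, score)
```

In this toy each revision halves the height gap, so the debate bounds how uncoordinated any single candidate can be, while Best-of-N adds a second, selection-based safeguard on top.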

In practice

Represent observations as text-based object positions for robustness.
Apply voxel downsampling to point clouds to obtain accurate object centroids.
Consider a leader-follower decomposition for asymmetric manipulation tasks.
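The first two practices can be combined into one observation step: downsample each object's segmented point cloud, take its centroid, and write the result into the prompt as text. The source does not explain why downsampling improves the centroid; one plausible reason, assumed here, is that raw depth-camera clouds are unevenly dense, so a plain mean is pulled toward over-sampled surfaces, whereas averaging per voxel first gives each occupied region equal weight. The 1 cm voxel size, the object name, the text format, and the function names are hypothetical; the sketch uses plain Python to stay dependency-free, although point-cloud libraries such as Open3D provide voxel downsampling as well.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def voxel_downsample(points: List[Point], voxel_size: float) -> List[Point]:
    """Keep one point per occupied voxel: the mean of all points that fall inside it."""
    buckets: Dict[Tuple[int, ...], List[Point]] = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    return [tuple(sum(axis) / len(pts) for axis in zip(*pts)) for pts in buckets.values()]


def centroid(points: List[Point]) -> Point:
    """Arithmetic mean of the points along each axis."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def describe_objects(objects: Dict[str, Point]) -> str:
    """Render object positions as plain text for the prompt (hypothetical format)."""
    return "\n".join(f"{name}: ({x:.2f}, {y:.2f}, {z:.2f})" for name, (x, y, z) in objects.items())


# Toy segmented cloud for a hypothetical 10 cm "box": the near face is heavily
# over-sampled (98 points) while the rest of the box contributes only 2 points.
cloud = [(0.005, 0.20, 0.05)] * 98 + [(0.055, 0.20, 0.05), (0.095, 0.20, 0.05)]

raw = centroid(cloud)                           # x is about 0.0064, dragged toward the dense face
even = centroid(voxel_downsample(cloud, 0.01))  # x is about 0.0517, near the box's true center (0.05)
print(describe_objects({"box": even}))          # box: (0.05, 0.20, 0.05)
```

Passing positions as short text lines keeps the observation compact and in the same modality as the LLM's output, which is one reason a textual interface can be more robust than feeding raw sensor data.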

Topics

Bimanual Robot Manipulation
In-Context Learning
Multi-Agent LLM Coordination
BiCICLe Framework
Leader-Follower Decomposition

Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.