Bimanual Robot Manipulation via Multi-Agent In-Context Learning
Summary
BiCICLe (Bimanual Coordinated In-Context Learning) is a novel framework enabling Large Language Models (LLMs) to perform few-shot bimanual robot manipulation without fine-tuning. It addresses the challenge of high-dimensional joint action spaces and tight inter-arm coordination by framing bimanual control as a multi-agent leader-follower problem, decoupling the action space into sequential, conditioned single-arm predictions. The framework introduces "Arms' Debate" for iterative refinement and an "LLM-as-Judge" for Best-of-N self-evaluation to select plausible coordinated trajectories. Evaluated on 13 tasks from the TWIN benchmark, BiCICLe achieves up to a 71.1% average success rate, outperforming the best training-free baseline by 6.7 percentage points and surpassing most supervised methods. It also demonstrates strong few-shot generalization on novel tasks not included in the benchmark.
Key takeaway
For research scientists developing bimanual robot control systems, BiCICLe offers a compelling training-free alternative to traditional supervised methods. You should consider adopting its leader-follower decomposition and inference-time refinement strategies to achieve strong performance and out-of-distribution generalization, especially for tasks requiring precise inter-arm coordination. This approach significantly reduces the need for extensive, task-specific datasets and fine-tuning, accelerating deployment to novel scenarios.
Key insights
Decoupling bimanual robot control into leader-follower LLM agents enables effective, training-free in-context learning.
Principles
- Factor bimanual control into sequential single-arm predictions.
- Explicitly condition the follower arm on the leader's plan.
- Refine plans iteratively to improve inter-arm synchronization.
Method
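BiCICLe uses a leader-follower LLM architecture: one arm (the leader) predicts its trajectory first, and the other arm (the follower) predicts its own trajectory conditioned on the leader's plan. "Arms' Debate" then iteratively refines both plans, and "Best-of-N" uses an LLM-as-Judge to pick the most plausible coordinated trajectory from several sampled candidates.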
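The control flow described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. The names `ArmPolicy`, `Judge`, `plan_once`, and `best_of_n` are hypothetical, the toy policies stand in for LLM calls, and the alternating re-prediction used for the debate rounds is an assumption, since this summary does not describe how each round is structured.

```python
from typing import Callable, List, Optional, Tuple

Waypoint = Tuple[float, float, float]
Trajectory = List[Waypoint]
Plan = Tuple[Trajectory, Trajectory]  # (leader trajectory, follower trajectory)

# Hypothetical interfaces: in a real system each would wrap an LLM prompt.
# An arm policy sees the text observation and, optionally, the other arm's plan.
ArmPolicy = Callable[[str, Optional[Trajectory]], Trajectory]
Judge = Callable[[str, Plan], float]


def plan_once(obs: str, leader: ArmPolicy, follower: ArmPolicy,
              debate_rounds: int = 1) -> Plan:
    """Leader plans first, follower conditions on it, then both refine."""
    leader_traj = leader(obs, None)
    follower_traj = follower(obs, leader_traj)
    # Assumed debate scheme: each arm re-predicts given the other's latest plan.
    for _ in range(debate_rounds):
        leader_traj = leader(obs, follower_traj)
        follower_traj = follower(obs, leader_traj)
    return leader_traj, follower_traj


def best_of_n(obs: str, leader: ArmPolicy, follower: ArmPolicy, judge: Judge,
              n: int = 4, debate_rounds: int = 1) -> Plan:
    """Sample n candidate plans and keep the one the judge scores highest."""
    candidates = [plan_once(obs, leader, follower, debate_rounds) for _ in range(n)]
    return max(candidates, key=lambda plan: judge(obs, plan))


# Toy stand-ins so the sketch runs without any LLM.
def toy_leader(obs: str, other: Optional[Trajectory]) -> Trajectory:
    return [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]


def toy_follower(obs: str, other: Optional[Trajectory]) -> Trajectory:
    # Track the leader's waypoints with a fixed 0.3 lateral offset.
    return [(x, y + 0.3, z) for (x, y, z) in (other or [])]


def toy_judge(obs: str, plan: Plan) -> float:
    # Higher is better: penalize deviation from the 0.3 lateral gap.
    leader_traj, follower_traj = plan
    return -sum(abs((fy - ly) - 0.3)
                for (_, ly, _), (_, fy, _) in zip(leader_traj, follower_traj))


if __name__ == "__main__":
    plan = best_of_n("cube at (0.5, 0.0, 0.0)", toy_leader, toy_follower, toy_judge, n=3)
    print(plan)
```

Keeping the policies and the judge as plain callables means the same loop works whether they are backed by one shared model with different prompts or by separate models. That choice is left open here.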
In practice
- Use text-based object positions for robust observations.
- Employ voxel downsampling for accurate point cloud centroids.
- Consider leader-follower for asymmetric manipulation tasks.
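The first two tips can be combined in a small preprocessing step. The sketch below is an illustration under stated assumptions, not the paper's pipeline: the helper names (`voxel_downsample`, `centroid`, `describe`), the voxel size, and the text format are all hypothetical. It shows why downsampling helps: averaging points per voxel evens out point density, so a densely sampled region of an object does not pull the centroid toward itself.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def voxel_downsample(points: Sequence[Point], voxel_size: float) -> List[Point]:
    """Replace all points that fall in the same voxel with their mean."""
    buckets: Dict[Tuple[int, int, int], List[Point]] = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    return [tuple(sum(axis) / len(b) for axis in zip(*b)) for b in buckets.values()]


def centroid(points: Sequence[Point]) -> Point:
    """Arithmetic mean of a non-empty point set."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def describe(name: str, points: Sequence[Point], voxel_size: float = 0.01) -> str:
    """Render an object's position as text suitable for an LLM prompt."""
    cx, cy, cz = centroid(voxel_downsample(points, voxel_size))
    return f"{name}: position=({cx:.3f}, {cy:.3f}, {cz:.3f})"


if __name__ == "__main__":
    # 100 points piled on one end of the object, a single point on the other end.
    cloud = [(0.0, 0.0, 0.0)] * 100 + [(1.0, 0.0, 0.0)]
    print(centroid(cloud)[0])                     # about 0.0099, biased by density
    print(describe("cube", cloud, voxel_size=0.1))  # cube: position=(0.500, 0.000, 0.000)
```

The voxel size trades detail for density robustness and would need tuning to the sensor and object scale.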
Topics
- Bimanual Robot Manipulation
- In-Context Learning
- Multi-Agent LLM Coordination
- BiCICLe Framework
- Leader-Follower Decomposition
Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.