Learning to Fold: prizewinning solution at LeHome Challenge 2026 (1st place online, 2nd offline)
Summary
The prizewinning LeHome Challenge 2026 solution is a bimanual garment folding system that placed 1st among 62 teams in the online simulation round and 2nd in the real-world final at ICRA 2026. The system improves a vision-language-action (VLA) policy through a reinforcement learning loop in which the policy network acts as its own value function, predicting success, progress, and future task-relevant quantities. These predictions drive advantage estimation, live failure detection, and candidate selection. The solution combines existing RL concepts with engineering and optimization contributions: AWR + RECAP for a flow-matching VLA, an asynchronous distributed training/rollout pipeline built on HuggingFace Hub, inference-time hyperparameter optimization via Thompson sampling, and a sim-to-real recipe with camera-alignment tooling, extensive augmentation, and DAgger-like human-in-the-loop (HIL) data collection.
Key takeaway
For robotics engineers developing bimanual manipulation systems for deformable objects, this LeHome Challenge solution offers a recipe validated in competition. Consider having your VLA policy double as a value function for success prediction within your RL framework. Its asynchronous distributed training via HuggingFace Hub and its sim-to-real pipeline, including DAgger-like HIL data collection, may shorten iteration cycles and improve real-world performance.
Key insights
A bimanual garment folding system leverages a VLA policy with an RL loop, where the policy network also serves as its own value function.
Principles
- Policy networks can serve as their own value functions.
- AWR and RECAP can be combined for VLA flow-matching.
- Asynchronous distributed training enhances RL scalability.
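To illustrate the first principle, here is a minimal Python sketch of how a policy's own success predictions could drive candidate selection and live failure detection. The function names, the threshold, and the patience window are hypothetical and not taken from the original solution. The predicted success values are assumed to be probabilities in [0, 1] produced by the policy network.

```python
def select_candidate(candidates, predicted_success):
    """Return the candidate whose predicted success is highest.

    `candidates` and `predicted_success` are parallel lists; in a real
    system each candidate would be an action chunk sampled from the policy.
    """
    if len(candidates) != len(predicted_success):
        raise ValueError("candidates and predictions must have equal length")
    best = max(range(len(candidates)), key=lambda i: predicted_success[i])
    return candidates[best]


def failure_detected(success_history, threshold=0.2, patience=3):
    """Flag a failure when the last `patience` predictions are all below
    `threshold` (both values are illustrative assumptions)."""
    if len(success_history) < patience:
        return False
    return all(p < threshold for p in success_history[-patience:])
```

For example, `select_candidate(["a", "b", "c"], [0.1, 0.7, 0.4])` picks `"b"`, and a run of three low predictions in a row would trigger `failure_detected`, at which point an episode could be stopped or retried.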
Method
Improve a VLA policy via an RL loop where the policy network predicts success and progress for advantage estimation. Integrate AWR+RECAP, asynchronous distributed training, Thompson sampling for inference-time optimization, and a sim-to-real pipeline with DAgger-like HIL.
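The advantage-weighting step can be sketched as follows. This is a generic AWR-style weighting, not the authors' implementation: it assumes the advantage of a step is the change in predicted success between that step and a later one, and the temperature `beta` and the clip `max_weight` are illustrative values. RECAP and the flow-matching loss itself are not shown.

```python
import math


def awr_weights(values, next_values, beta=1.0, max_weight=20.0):
    """Compute AWR-style weights exp(advantage / beta), clipped from above.

    values[i]      : predicted success at the current step (assumed in [0, 1])
    next_values[i] : predicted success at a later step of the same episode
    """
    weights = []
    for v, v_next in zip(values, next_values):
        advantage = v_next - v  # assumed advantage estimate
        weights.append(min(math.exp(advantage / beta), max_weight))
    return weights


def weighted_loss(per_sample_losses, weights):
    """Weight per-sample imitation losses and normalize by the weight sum."""
    total = sum(w * l for w, l in zip(weights, per_sample_losses))
    return total / sum(weights)
```

Steps where the predicted success rises receive weights above 1 and contribute more to the policy update, while steps where it falls are down-weighted.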
In practice
- Use HuggingFace Hub in an asynchronous distributed training/rollout pipeline.
- Apply Thompson sampling for inference hyperparameter optimization.
- Employ DAgger-like HIL for robust sim-to-real transfer.
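As a rough illustration of the Thompson sampling item above, the sketch below treats each candidate inference-time hyperparameter setting as an arm of a Bernoulli bandit with a Beta posterior over its success rate. The class name, the simulated success rates, and the idea that each rollout yields a binary success signal are assumptions for this example. The original solution's search space and reward are not specified here.

```python
import random


class BetaBernoulliThompson:
    """Thompson sampling over a discrete set of hyperparameter settings."""

    def __init__(self, n_arms, seed=0):
        self.rng = random.Random(seed)
        self.successes = [1] * n_arms  # Beta(1, 1) uniform prior
        self.failures = [1] * n_arms

    def select(self):
        # Draw one sample per arm from its posterior and pick the largest.
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, success):
        if success:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def run_simulation(true_rates, rounds=500, seed=0):
    """Simulate rollouts with hypothetical per-setting success rates."""
    bandit = BetaBernoulliThompson(len(true_rates), seed=seed)
    env_rng = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = bandit.select()
        success = env_rng.random() < true_rates[arm]
        bandit.update(arm, success)
        pulls[arm] += 1
    return pulls
```

Calling `run_simulation([0.2, 0.5, 0.9])` returns how often each setting was tried. Over enough rounds, most rollouts concentrate on the setting with the highest success rate while the others are still explored occasionally.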
Topics
- Bimanual Garment Folding
- Vision-Language-Action Policies
- Reinforcement Learning
- Sim-to-Real Transfer
- Distributed Training
- Thompson Sampling
Best for: MLOps Engineer, AI Engineer, Computer Vision Engineer, Robotics Engineer, Machine Learning Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.