Learning to Fold: prizewinning solution at LeHome Challenge 2026 (1st place online, 2nd offline)

2026-06-25 · Source: Artificial Intelligence · Field: Technology & Digital — Robotics & Autonomous Systems, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

The LeHome Challenge 2026 prizewinning solution, which placed 1st among 62 teams in the online simulation round and 2nd in the real-world final at ICRA 2026, is a bimanual garment folding system that improves a vision-language-action (VLA) policy through a reinforcement learning loop. The policy network acts as its own value function, predicting success, progress, and future task-relevant quantities; these predictions drive advantage estimation, live failure detection, and candidate selection. The solution combines existing RL concepts with engineering and optimization contributions: AWR + RECAP for a flow-matching VLA, an asynchronous distributed training/rollout pipeline that uses HuggingFace Hub, inference-time hyperparameter optimization via Thompson sampling, and a sim-to-real recipe with camera-alignment tooling, extensive augmentation, and DAgger-like human-in-the-loop data collection.

Key takeaway

For robotics engineers building bimanual manipulation systems for deformable objects, this LeHome Challenge solution offers a competition-tested recipe. Consider letting the VLA policy double as a value function for success prediction inside your RL loop, running asynchronous distributed training through HuggingFace Hub, and adopting the sim-to-real pipeline with DAgger-like human-in-the-loop (HIL) data collection. These choices target faster iteration and better real-world performance.

Key insights

A bimanual garment folding system leverages a VLA policy with an RL loop, where the policy network also serves as its own value function.
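
As a rough illustration of how value-style predictions could drive candidate selection and live failure detection, here is a minimal Python sketch. It is not the authors' implementation: the `Candidate` type, the `predicted_success` field, and the `threshold` and `patience` values are all hypothetical, and the score is assumed to be a success probability in [0, 1] produced by the policy network.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """A sampled action chunk plus the policy's own success estimate (hypothetical layout)."""
    actions: List[float]
    predicted_success: float


def select_candidate(candidates: Sequence[Candidate]) -> Candidate:
    """Pick the candidate with the highest predicted success."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=lambda c: c.predicted_success)


def failure_detected(success_history: Sequence[float],
                     threshold: float = 0.2,
                     patience: int = 3) -> bool:
    """Flag a failure when the last `patience` predictions all fall below `threshold`.

    Both values are illustrative assumptions, not numbers from the source.
    """
    if len(success_history) < patience:
        return False
    return all(p < threshold for p in success_history[-patience:])
```

Requiring several consecutive low predictions, rather than a single one, is one simple way to avoid aborting an episode on a momentary dip in the estimate.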

Principles

Policy networks can serve as their own value functions.
AWR (advantage-weighted regression) and RECAP can be combined to train a flow-matching VLA policy.
Asynchronous distributed training enhances RL scalability.

Method

Improve a VLA policy through an RL loop in which the policy network predicts success and progress for advantage estimation. Integrate AWR + RECAP, asynchronous distributed training, Thompson sampling for inference-time hyperparameter optimization, and a sim-to-real pipeline with DAgger-like human-in-the-loop (HIL) data collection.
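
To make the advantage-weighting idea concrete, the sketch below shows generic advantage-weighted regression (AWR) weights in Python. It assumes each training sample comes with an observed return and a value prediction, and it uses the textbook form w = min(exp((R - V) / beta), w_max). The function names, `beta`, and `max_weight` are illustrative assumptions; the sketch does not attempt to show RECAP or the flow-matching loss, whose details the summary does not give.

```python
import math


def awr_weights(returns, values, beta=1.0, max_weight=20.0):
    """Return one weight per sample: min(exp((return - value) / beta), max_weight)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    weights = []
    for ret, val in zip(returns, values):
        advantage = ret - val
        # Bound the exponent so a very large advantage cannot overflow exp().
        exponent = min(advantage / beta, 50.0)
        weights.append(min(math.exp(exponent), max_weight))
    return weights


def weighted_loss(per_sample_losses, weights):
    """Weighted average of per-sample imitation losses, normalised by the weight sum."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / total
```

Samples that did better than the value prediction get weights above 1 and contribute more to the policy update; samples that did worse are down-weighted rather than discarded.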

In practice

Run asynchronous distributed training and rollouts through HuggingFace Hub.
Apply Thompson sampling for inference-time hyperparameter optimization.
Use DAgger-like human-in-the-loop (HIL) data collection for sim-to-real transfer.
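
One way Thompson sampling can choose among inference-time configurations is a Beta-Bernoulli bandit that treats each rollout as a success or a failure. The Python sketch below assumes that framing. The `ThompsonSelector` class, the `config_id` placeholders, and the simulated success rates are all hypothetical, and the summary does not say which hyperparameters the authors tuned.

```python
import random


class ThompsonSelector:
    """Beta-Bernoulli Thompson sampling over a fixed list of configurations."""

    def __init__(self, configs, seed=None):
        self.configs = list(configs)
        if not self.configs:
            raise ValueError("at least one configuration is required")
        self.successes = [0] * len(self.configs)
        self.failures = [0] * len(self.configs)
        self.rng = random.Random(seed)

    def choose(self):
        """Draw a success rate per config from its Beta posterior; return the best index."""
        samples = [
            self.rng.betavariate(1 + s, 1 + f)  # Beta(1, 1) uniform prior
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, index, success):
        """Record the outcome of one rollout run with configuration `index`."""
        if success:
            self.successes[index] += 1
        else:
            self.failures[index] += 1


def simulate(true_rates, rounds, seed=0):
    """Toy stand-in for real rollouts: each config succeeds with a made-up probability."""
    configs = [{"config_id": i} for i in range(len(true_rates))]
    selector = ThompsonSelector(configs, seed=seed)
    env_rng = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        i = selector.choose()
        selector.update(i, env_rng.random() < true_rates[i])
        pulls[i] += 1
    return pulls
```

Calling `simulate([0.2, 0.5, 0.8], rounds=2000)` returns how often each configuration was tried. Over time the sampler concentrates rollouts on the configuration with the highest success rate while still occasionally testing the others.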

Topics

Bimanual Garment Folding
Vision-Language-Action Policies
Reinforcement Learning
Sim-to-Real Transfer
Distributed Training
Thompson Sampling

Best for: MLOps Engineer, AI Engineer, Computer Vision Engineer, Robotics Engineer, Machine Learning Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.