Learning to Fold: prizewinning solution at LeHome Challenge 2026 (1st place online, 2nd offline)

· Source: Artificial Intelligence · Field: Technology & Digital — Robotics & Autonomous Systems, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

The prizewinning LeHome Challenge 2026 solution placed 1st among 62 teams in the online simulation round and 2nd in the real-world final at ICRA 2026. It describes a bimanual garment folding system that improves a vision-language-action (VLA) policy through a reinforcement learning (RL) loop. The policy network serves as its own value function, predicting success, progress, and future task-relevant quantities; these predictions drive advantage estimation, live failure detection, and candidate selection. The solution combines existing RL concepts with engineering and optimization contributions: AWR + RECAP for a flow-matching VLA, an asynchronous distributed training/rollout pipeline built on HuggingFace Hub, inference-time hyperparameter optimization via Thompson sampling, and a sim-to-real recipe featuring camera-alignment tooling, extensive augmentation, and DAgger-like human-in-the-loop data collection.
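To make the advantage-estimation step concrete, here is a minimal sketch of how value predictions could be turned into advantage-weighted regression (AWR) sample weights. It assumes a one-step advantage and the textbook AWR weighting exp(A / beta) with clipping; the function names, `beta`, `gamma`, and `max_weight` are illustrative assumptions, not details from the original article, and RECAP and the flow-matching loss are not modeled here.

```python
import math


def one_step_advantages(rewards, values, gamma=0.99):
    """One-step advantage: A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds one more entry than `rewards` (the value of the final state).
    In a setup like the one summarized above, these values would come from the
    policy network's own success/progress predictions (assumption).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def awr_weights(advantages, beta=1.0, max_weight=20.0):
    """Standard AWR weights: w = min(exp(A / beta), max_weight)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return [min(math.exp(a / beta), max_weight) for a in advantages]


def weighted_loss(per_sample_losses, weights):
    """Weight-normalized average of per-sample imitation losses."""
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / total


if __name__ == "__main__":
    adv = one_step_advantages([0.0, 1.0], [0.5, 0.5, 0.0], gamma=1.0)
    print(adv, awr_weights(adv))
```

Samples with higher estimated advantage receive larger weights, so the policy is pulled toward actions that its own value predictions rate as better than expected.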

Key takeaway

For robotics engineers building bimanual manipulation systems for deformable objects, this solution offers a competition-validated recipe. Consider having the VLA policy double as a value function that predicts success, so the same network can feed your RL loop. The asynchronous distributed training pipeline on HuggingFace Hub and the sim-to-real pipeline with DAgger-like human-in-the-loop (HIL) data collection are the components aimed at faster iteration and better real-world performance.

Key insights

A bimanual garment folding system improves a VLA policy through an RL loop, with the policy network also serving as its own value function.
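One way self-predicted success could support candidate selection and live failure detection is sketched below. This is a hypothetical illustration: the `(action_chunk, predicted_success)` pair format, the `FailureMonitor` class, and the `threshold` and `patience` parameters are assumptions made for the example, not the article's implementation.

```python
def select_candidate(candidates):
    """Pick the candidate with the highest predicted success.

    `candidates` is a list of (action_chunk, predicted_success) pairs, where
    predicted_success is assumed to come from the policy's own value output.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda pair: pair[1])


class FailureMonitor:
    """Flags a likely failure when predicted success stays low for several steps."""

    def __init__(self, threshold=0.2, patience=3):
        self.threshold = threshold  # hypothetical cutoff on predicted success
        self.patience = patience    # consecutive low steps before flagging
        self.low_steps = 0

    def update(self, predicted_success):
        """Record one prediction; return True if a failure should be flagged."""
        if predicted_success < self.threshold:
            self.low_steps += 1
        else:
            self.low_steps = 0  # any recovery resets the counter
        return self.low_steps >= self.patience


if __name__ == "__main__":
    best = select_candidate([("chunk_a", 0.35), ("chunk_b", 0.80)])
    print("selected:", best[0])
    monitor = FailureMonitor(threshold=0.2, patience=2)
    for p in [0.6, 0.15, 0.1]:
        print(p, monitor.update(p))
```

Requiring several consecutive low predictions before flagging keeps a single noisy estimate from triggering a retry or a human intervention.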

Principles

Method

Improve a VLA policy via an RL loop in which the policy network predicts success and progress for advantage estimation. Combine AWR + RECAP, asynchronous distributed training, Thompson sampling for inference-time hyperparameter optimization, and a sim-to-real pipeline with DAgger-like human-in-the-loop (HIL) data collection.
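The Thompson sampling step can be illustrated with a standard Beta-Bernoulli bandit over a small set of inference-time configurations. This is a minimal sketch under stated assumptions: each rollout is scored as a binary success, and the configuration keys shown (`denoise_steps`, `chunk_horizon`) are hypothetical placeholders rather than the hyperparameters tuned in the original work.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a discrete set of configurations."""

    def __init__(self, configs, seed=None):
        self.configs = list(configs)
        # Beta(1, 1) prior for every configuration (uniform over success rate).
        self.alpha = [1.0] * len(self.configs)
        self.beta = [1.0] * len(self.configs)
        self.rng = random.Random(seed)

    def choose(self):
        """Sample a success rate per config and return the index of the best draw."""
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, index, success):
        """Update the posterior of one config with a binary rollout outcome."""
        if success:
            self.alpha[index] += 1.0
        else:
            self.beta[index] += 1.0

    def posterior_mean(self, index):
        return self.alpha[index] / (self.alpha[index] + self.beta[index])


if __name__ == "__main__":
    # Hypothetical configurations and simulated success rates, for illustration only.
    configs = [{"denoise_steps": 5, "chunk_horizon": 20},
               {"denoise_steps": 10, "chunk_horizon": 40}]
    true_rates = [0.3, 0.7]
    sampler = ThompsonSampler(configs, seed=0)
    env = random.Random(1)
    for _ in range(200):
        i = sampler.choose()
        sampler.update(i, env.random() < true_rates[i])
    for i, cfg in enumerate(configs):
        print(cfg, round(sampler.posterior_mean(i), 3))
```

Because each choice is drawn from the posterior, configurations that look promising are tried more often while uncertain ones still receive occasional rollouts, which suits a setting where every evaluation costs a full episode.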

Best for: MLOps Engineer, AI Engineer, Computer Vision Engineer, Robotics Engineer, Machine Learning Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.