LadderMan: Learning Humanoid Perceptive Ladder Climbing

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Robotics & Autonomous Systems, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

LadderMan is a unified system that enables Unitree G1 humanoid robots to climb diverse ladders robustly and to perform manipulation while on the ladder. It uses a scalable two-stage learning pipeline. First, hybrid motion tracking learns multiple climbing experts from a single reference motion. These experts are then distilled into one depth-based visuomotor policy through a hybrid of imitation learning and reinforcement learning (RL). For zero-shot real-world deployment, LadderMan uses the Fast-FoundationStereo vision foundation model to narrow the sim-to-real gap in depth perception. A separate dual-agent manipulation policy lets the robot carry out teleoperated tasks, such as adjusting paintings or replacing light bulbs, while staying balanced. Experiments show robust climbing across ladders of varying geometries and materials, at a human-comparable pace of roughly 3.4 seconds per rung.
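
Fast-FoundationStereo estimates disparity from a rectified stereo image pair, and metric depth then follows from the pinhole stereo relation z = f·B/d, where f is the focal length in pixels, B is the camera baseline, and d is the disparity. The sketch below shows only that conversion step. It assumes the network's disparity map is already available as an array. The focal length, baseline, and optional range clip are hypothetical values, not the G1's actual camera parameters.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, max_depth_m=None, min_disp=1e-3):
    """Convert a disparity map (pixels) to metric depth (metres) via z = f * B / d.

    focal_px and baseline_m are camera parameters; the values used in the
    example below are hypothetical. Pixels with disparity <= min_disp are
    treated as invalid and set to 0. If max_depth_m is given, valid depths
    are clipped to that range (an assumed preprocessing choice).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    if max_depth_m is not None:
        depth[valid] = np.minimum(depth[valid], max_depth_m)
    return depth


# Hypothetical camera: 400 px focal length, 5 cm baseline.
example = disparity_to_depth(np.array([[20.0, 40.0, 0.0]]), focal_px=400.0, baseline_m=0.05)
print(example)
```

With f = 400 px and B = 0.05 m, a disparity of 20 px maps to 1.0 m and 40 px maps to 0.5 m. Clipping to a maximum range is one way to match real depth to the range of the simulated depth images used in training. The summary does not say whether LadderMan does this.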

Key takeaway

For robotics engineers building humanoid systems for industrial or maintenance tasks, LadderMan offers a framework for complex multi-contact locomotion. Its two-stage learning pipeline and sim-to-real perception bridging support reliable ladder climbing and stable on-ladder manipulation across diverse ladder geometries. Its dual-agent approach is worth considering for adding manipulation without compromising balance.

Key insights

LadderMan enables robust humanoid ladder climbing and manipulation through perceptive, sim-to-real learning.

Principles

Hybrid motion tracking learns diverse expert policies from a single reference motion.
Vision foundation models effectively bridge sim-to-real depth perception gaps.
Dual-agent learning decouples lower-body stabilization from upper-body manipulation.
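
The dual-agent principle can be pictured as two policies that each control a disjoint slice of the joint vector. The sketch below is a hypothetical illustration, not LadderMan's implementation. The 12/14 joint split, the residual-to-default-pose action parameterization, and `action_scale` are all assumptions.

```python
import numpy as np

# Hypothetical joint layout for illustration only: 12 lower-body joints
# followed by 14 upper-body joints. The real G1 joint ordering is not
# given in the summary.
LOWER_IDX = np.arange(0, 12)
UPPER_IDX = np.arange(12, 26)
NUM_JOINTS = 26


def compose_action(lower_action, upper_action, default_pose, action_scale=0.25):
    """Merge two agents' outputs into one full-body joint-position target.

    lower_action: output of the stabilizing lower-body agent (length 12).
    upper_action: output of the manipulation upper-body agent (length 14).
    Each agent writes only its own joint slice, as an offset from an
    assumed default pose scaled by action_scale.
    """
    lower_action = np.asarray(lower_action, dtype=np.float64)
    upper_action = np.asarray(upper_action, dtype=np.float64)
    if lower_action.shape != LOWER_IDX.shape or upper_action.shape != UPPER_IDX.shape:
        raise ValueError("action size does not match the joint split")
    target = np.array(default_pose, dtype=np.float64, copy=True)
    target[LOWER_IDX] += action_scale * lower_action
    target[UPPER_IDX] += action_scale * upper_action
    return target
```

Because each agent writes only its own slice, the upper-body (teleoperated manipulation) agent cannot directly override the lower-body targets. In setups like this, the lower-body agent typically also observes upper-body state so that it can compensate for arm motion. The summary does not specify LadderMan's observations.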

Method

A two-stage pipeline first learns expert climbing policies via hybrid motion tracking. It then distills them into a single visuomotor policy using hybrid imitation learning and RL, with a vision foundation model (VFM), Fast-FoundationStereo, providing depth perception.
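
A common way to combine imitation and RL during distillation is to add a behavior-cloning term to a PPO objective. This term pulls the student toward the expert's action on states the student visits. The summary does not give LadderMan's loss, so the following is a sketch under that assumption. `bc_weight` and the squared-error imitation term are hypothetical choices, and the code computes loss values only (no gradients).

```python
import numpy as np


def ppo_clip_loss(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective, negated so it is minimised."""
    ratio = np.exp(np.asarray(log_prob_new, dtype=np.float64)
                   - np.asarray(log_prob_old, dtype=np.float64))
    adv = np.asarray(advantages, dtype=np.float64)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))


def imitation_loss(student_mean, expert_action):
    """Behavior-cloning term: squared error between the depth-based student's
    mean action and the action of the expert queried on the same state
    (assumed here to be the expert trained for the current ladder)."""
    diff = (np.asarray(student_mean, dtype=np.float64)
            - np.asarray(expert_action, dtype=np.float64))
    return np.mean(np.sum(diff ** 2, axis=-1))


def hybrid_distillation_loss(log_prob_new, log_prob_old, advantages,
                             student_mean, expert_action, bc_weight=1.0):
    """Weighted sum of the RL and imitation terms (the weighting is an assumption)."""
    return (ppo_clip_loss(log_prob_new, log_prob_old, advantages)
            + bc_weight * imitation_loss(student_mean, expert_action))
```

In practice the imitation weight is often annealed so that RL can refine the student beyond the experts. The summary does not state whether LadderMan does so.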

In practice

Deploy zero-shot sim-to-real climbing on humanoids like the Unitree G1.
Perform stable on-ladder manipulation via teleoperation.
Use rung-focused masking to improve depth perception robustness.
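
The summary does not define rung-focused masking. The sketch below assumes one plausible reading: depth pixels outside a small band around the rungs are replaced with a constant, so the visuomotor policy sees rung geometry and little else. The boolean `rung_mask` is assumed to be given, for example from simulator segmentation during training. How it would be obtained on the real robot is not described here, and `margin_px` and `fill_value` are hypothetical.

```python
import numpy as np


def dilate_mask(mask, radius):
    """Binary dilation with a (2r+1) x (2r+1) square kernel, using numpy only."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask within the kernel window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def rung_focused_mask(depth, rung_mask, margin_px=2, fill_value=0.0):
    """Keep depth only within margin_px of an (assumed) rung mask; blank the rest."""
    depth = np.asarray(depth, dtype=np.float64)
    keep = dilate_mask(rung_mask, margin_px)
    return np.where(keep, depth, fill_value)
```

Dilating the mask before applying it keeps a little context around each rung edge. This may help if the mask boundaries are imprecise, but the margin size is a tuning assumption.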

Topics

Humanoid Robotics
Ladder Climbing
Reinforcement Learning
Sim-to-Real Transfer
Vision Foundation Models
Loco-Manipulation

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.