Chameleon: Control-Indexed Prospective Memory for Visuomotor Manipulation
Summary
Chameleon is a bio-inspired memory architecture that addresses perceptual aliasing in long-horizon robotic manipulation, where the observation available at decision time is not sufficient to choose the correct action. It was developed by researchers from MARS Lab, Nanyang Technological University; the Institute for Infocomm Research, A*STAR, Singapore; and the National University of Singapore. Chameleon employs geometry-grounded multimodal tokens and a differentiable memory stack for goal-directed recall. It was evaluated on Camo-Dataset, a real-robot UR5e dataset covering episodic recall, spatial tracking, and sequential manipulation under perceptual aliasing. Chameleon consistently outperformed strong baselines such as Diffusion Policy, Flow Matching, and ACT, achieving 100.0% DSR (100.0% κ) in episodic recall, 73.5% DSR (60.3% κ) in spatial tracking, and 72.2% DSR (71.2% κ) in sequential tasks. The system encodes observations into view-consistent patch tokens, maintains a hierarchical memory with episodic and working states, and incorporates a HoloHead for latent imagination, enabling real-time control with an inference latency of 82 ms.
Key takeaway
For robotics engineers developing long-horizon manipulation systems, Chameleon shows that bio-inspired episodic memory can overcome perceptual aliasing that defeats policies conditioned only on current observations. Consider geometry-grounded encoding and goal-directed recall when decisions must be made in non-Markovian environments. In the reported experiments, this approach improved control stability and task completion, particularly when task-relevant states were occluded or transient, reducing failures caused by ambiguous observations.
Key insights
Bio-inspired episodic memory with geometry-grounded encoding and goal-directed recall improves robotic manipulation under perceptual aliasing.
Principles
- Perceptual aliasing requires history-dependent decisions.
- Disambiguating encoding reduces memory interference.
- Goal-directed recall improves decision utility.
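The first principle can be illustrated with a toy example. The sketch below is not taken from the paper: the task, the observation strings, and every function name are hypothetical. It sets up two episode variants that look identical at decision time and compares the best possible reactive policy with one that recalls an earlier cue.

```python
CUES = ("left", "right")        # hypothetical task variants
ALIASED_OBS = "closed_boxes"    # identical decision-time view in both variants

def episode(cue):
    """Observation sequence for one episode: the cue first, then the aliased view."""
    return [f"cue_{cue}", ALIASED_OBS]

def reactive_policy(history, table):
    """Memoryless policy: the action depends only on the latest observation."""
    return table[history[-1]]

def memory_policy(history):
    """History-dependent policy: recall the cue seen earlier in the episode."""
    for obs in history:
        if obs.startswith("cue_"):
            return obs[len("cue_"):]
    return CUES[0]  # fallback when no cue was observed

def success_rate(act_fn):
    """Fraction of task variants in which the chosen action matches the cue."""
    wins = sum(act_fn(episode(cue)) == cue for cue in CUES)
    return wins / len(CUES)

# Try every deterministic reactive policy over the aliased observation.
best_reactive = max(
    success_rate(lambda hist, a=a: reactive_policy(hist, {ALIASED_OBS: a}))
    for a in CUES
)
with_memory = success_rate(memory_policy)
print(best_reactive, with_memory)
```

Because both variants end in the same observation, any deterministic reactive policy commits to one action and is right in only one of the two variants, while the history-dependent policy solves both.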
Method
Chameleon uses a Perception→Memory→Policy pipeline. Perception creates geometry-grounded tokens. Memory uses hierarchical episodic and working states with HoloHead for predictive imagination. Policy generates end-effector trajectories via conditional flow matching.
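To make the policy stage more concrete, here is a minimal sketch of how a conditional flow-matching policy draws an action at inference time: it starts from Gaussian noise and integrates a context-conditioned velocity field from t=0 to t=1. This is an illustration under stated assumptions, not Chameleon's implementation. The `velocity` function is a closed-form stand-in for a learned network, and the memory context is reduced to a single hypothetical target waypoint.

```python
import random

def velocity(x, t, target):
    """Stand-in for a learned network v_theta(x, t | context).

    For a straight-line path that ends at a single point `target`, the exact
    velocity field is (target - x) / (1 - t). A trained model would
    approximate a field of this kind from demonstrations.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]

def sample_action(target, steps=10, seed=0):
    """Euler-integrate dx/dt = v(x, t | context) from noise (t=0) to an action (t=1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # x_0 drawn from a standard normal
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt                              # t stays below 1, so 1 - t is never zero
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical context: an end-effector waypoint (x, y, z) in metres.
context_target = [0.30, -0.10, 0.25]
action = sample_action(context_target)
print(action)
```

With this particular stand-in field the final Euler step lands on the target regardless of the initial noise. A learned field conditioned on richer memory context would instead produce a distribution over trajectories.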
In practice
- Use geometry-grounded tokens for robust perception.
- Implement multi-timescale memory for long-horizon tasks.
- Apply latent imagination to stabilize decision states.
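The multi-timescale idea in the list above can be sketched with a small two-store memory: a fast working state updated at every step, and a sparsely written episodic store that is read with goal-conditioned attention. This is a simplified illustration, not the paper's memory stack. The class name, the exponential-decay update, the fixed write interval, and the dot-product recall are all assumptions made for the example.

```python
import math

class TwoTimescaleMemory:
    """Hypothetical memory with a fast working state and a slow episodic store."""

    def __init__(self, dim, decay=0.5, write_every=2):
        self.working = [0.0] * dim      # fast state, overwritten gradually every step
        self.episodic = []              # slow store, written only every few steps
        self.decay = decay
        self.write_every = write_every
        self.step = 0

    def update(self, token):
        """Blend the new token into the working state; occasionally archive it."""
        self.working = [
            self.decay * w + (1.0 - self.decay) * x
            for w, x in zip(self.working, token)
        ]
        if self.step % self.write_every == 0:
            self.episodic.append(list(token))
        self.step += 1

    def recall(self, query):
        """Softmax attention over episodic entries, keyed by a goal query."""
        if not self.episodic:
            return list(self.working)
        scores = [sum(q * k for q, k in zip(query, e)) for e in self.episodic]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]   # shifted for numerical stability
        total = sum(weights)
        return [
            sum(w * e[i] for w, e in zip(weights, self.episodic)) / total
            for i in range(len(query))
        ]

mem = TwoTimescaleMemory(dim=2)
mem.update([1.0, 0.0])                  # early cue token
for _ in range(4):
    mem.update([0.0, 1.0])              # later, aliased tokens
recalled = mem.recall([5.0, 0.0])       # goal query aligned with the cue
print(mem.working, recalled)
```

After a few aliased steps the cue has almost vanished from the working state, but a query aligned with it still retrieves it from the episodic store, which is the behaviour a long-horizon task needs.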
Topics
- Robotic Manipulation
- Episodic Memory
- Perceptual Aliasing
- UR5e Robot
- Differentiable Memory
- HoloHead
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.