HiRo: A Compact Four-Directional Hierarchical Reservoir Token-Mixer for Efficient Image Classification
Summary
HiRo is a parameter-efficient image classification model that addresses the difficulty of balancing local feature modeling, cross-window interaction, and parameter count in high-performing architectures. It integrates shifted-window partitioning with multi-directional hierarchical reservoir computing. The model divides each image into patches, applies 2D sinusoidal positional encodings, and scans tokens in four directions within local windows. A two-stage slice-and-mix reservoir module, built from fixed reservoirs with trainable closed-loop readouts, processes these directional sequences. Consecutive blocks alternate between regular and shifted windows to provide cross-window interaction. HiRo achieves 99.46% accuracy on MNIST, 85.57% on CIFAR-10, and 59.10% on CIFAR-100 with fewer than 1M trainable parameters, and it uses significantly less memory and time than transformer-style baselines.
Key takeaway
Machine learning engineers building image classification models for resource-constrained environments should consider HiRo's combination of hierarchical reservoir computing and shifted-window partitioning. The design reports competitive accuracy (e.g., 99.46% on MNIST) with under 1M trainable parameters, at significantly lower memory and computational cost than transformer-style baselines. Its fixed reservoirs and multi-directional mixing are worth evaluating for projects that need strong accuracy with minimal overhead.
Key insights
HiRo combines hierarchical reservoir computing and shifted windows for efficient, high-accuracy image classification with few parameters.
Principles
- Fixed reservoirs with trainable readouts reduce parameters.
- Alternate regular and shifted windows for cross-window interaction.
- Multi-directional token scanning enhances feature learning.
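To make the first principle concrete, here is a minimal sketch of a generic echo-state-style reservoir with a separate linear readout, written in plain Python for readability. It is not HiRo's module: the leaky-tanh update, the weight scaling, and the names `make_reservoir`, `run_reservoir`, `readout`, and `count_params` are all assumptions for illustration, and the closed-loop aspect of the readouts and the two-stage slice-and-mix structure are left out.

```python
import math
import random


def make_reservoir(in_dim, res_dim, scale=0.5, seed=0):
    """Draw fixed input and recurrent weights; these are never trained."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(res_dim)]
    # Assumed crude scaling by sqrt(res_dim) to keep the recurrence small.
    w_res = [[rng.uniform(-scale, scale) / math.sqrt(res_dim)
              for _ in range(res_dim)]
             for _ in range(res_dim)]
    return w_in, w_res


def run_reservoir(tokens, w_in, w_res, leak=0.3):
    """Apply a leaky-tanh state update per token; return one state per token."""
    state = [0.0] * len(w_in)
    states = []
    for x in tokens:
        pre = [
            sum(wi * xi for wi, xi in zip(w_in[j], x))
            + sum(wr * sj for wr, sj in zip(w_res[j], state))
            for j in range(len(w_in))
        ]
        state = [(1 - leak) * s + leak * math.tanh(p)
                 for s, p in zip(state, pre)]
        states.append(state)
    return states


def readout(states, w_out):
    """Linear readout: the only weights that would be trained in this sketch."""
    return [[sum(w * s for w, s in zip(row, st)) for row in w_out]
            for st in states]


def count_params(in_dim, res_dim, out_dim):
    """Return (trainable, fixed) weight counts for this sketch."""
    trainable = out_dim * res_dim
    fixed = res_dim * in_dim + res_dim * res_dim
    return trainable, fixed
```

With `in_dim=4`, `res_dim=8`, and `out_dim=3`, `count_params` returns 24 trainable weights against 96 fixed ones, which is the sense in which freezing the reservoir keeps the trainable parameter count low.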
Method
Images are split into patches, given 2D sinusoidal positional encodings, and processed in local windows. Within each window, tokens are scanned in four directions, passed through a two-stage slice-and-mix reservoir module, then fused and averaged for classification.
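The windowing and scanning steps can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the paper's implementation: the summary does not say which four directions are used, so the sketch assumes row-major and column-major orders, each forward and reversed, and it assumes the shift is a cyclic roll of the token grid. The names `window_tokens` and `four_direction_scans` are hypothetical.

```python
def window_tokens(grid, win, shift=0):
    """Split an H x W token grid into non-overlapping win x win windows.

    A non-zero shift cyclically rolls the grid up and left first (an
    assumed shifting scheme), so window borders fall in different places.
    """
    h, w = len(grid), len(grid[0])
    if h % win or w % win:
        raise ValueError("grid size must be divisible by the window size")
    rolled = [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
              for r in range(h)]
    windows = []
    for r0 in range(0, h, win):
        for c0 in range(0, w, win):
            windows.append([row[c0:c0 + win] for row in rolled[r0:r0 + win]])
    return windows


def four_direction_scans(window):
    """Flatten one window into four token sequences (assumed directions)."""
    rows = [t for row in window for t in row]        # row by row
    cols = [t for col in zip(*window) for t in col]  # column by column
    return {
        "row_fwd": rows,
        "row_bwd": rows[::-1],
        "col_fwd": cols,
        "col_bwd": cols[::-1],
    }


# A 4 x 4 grid of token ids, split into 2 x 2 windows.
grid = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
regular = window_tokens(grid, win=2)
shifted = window_tokens(grid, win=2, shift=1)
```

In this toy grid the first regular window holds tokens 0, 1, 4, 5, while the first shifted window holds 5, 6, 9, 10, which sat in four different regular windows. Alternating the two layouts across consecutive blocks is how tokens from neighbouring windows come to interact.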
In practice
- Implement fixed reservoirs for parameter efficiency.
- Use shifted windows for cross-window context.
- Apply 2D sinusoidal positional encodings.
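For the last item, one common way to build 2D sinusoidal positional encodings is to give half of the channels a 1D sine/cosine code of the patch row and the other half a code of the patch column. The sketch below follows that convention as an assumption; the summary does not specify HiRo's exact formulation, and the names `sincos_1d` and `sincos_2d` and the base of 10000 are illustrative choices.

```python
import math


def sincos_1d(pos, dim, base=10000.0):
    """1D sinusoidal code: sin/cos pairs at geometrically spaced frequencies."""
    code = []
    for i in range(0, dim, 2):
        angle = pos / (base ** (i / dim))
        code.extend([math.sin(angle), math.cos(angle)])
    return code


def sincos_2d(height, width, dim):
    """Return a height x width grid of dim-length codes.

    The first half of each code depends only on the row index and the
    second half only on the column index (an assumed layout).
    """
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2
    return [[sincos_1d(r, half) + sincos_1d(c, half) for c in range(width)]
            for r in range(height)]


# Codes for a 3 x 4 patch grid with 8 channels per patch.
pe = sincos_2d(3, 4, 8)
```

These codes are fixed functions of position, so they add no trainable parameters; they would typically be added to the patch embeddings before windowing.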
Topics
- Image Classification
- Reservoir Computing
- Token-Mixer
- Parameter Efficiency
- Computer Vision
- Shifted Windows
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.