HiRo: A Compact Four-Directional Hierarchical Reservoir Token-Mixer for Efficient Image Classification

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Expert, quick

Summary

HiRo is a parameter-efficient image classification model that targets the trade-off between local feature modeling, cross-window interaction, and parameter count. It integrates shifted-window partitioning with multi-directional hierarchical reservoir computing. The model divides each image into patches, applies 2D sinusoidal positional encodings, and scans the resulting tokens in four directions within local windows. A two-stage slice-and-mix reservoir module, built from fixed reservoirs with trainable closed-loop readouts, processes these directional sequences. Consecutive blocks alternate between regular and shifted windows so that information can cross window boundaries. HiRo achieves 99.46% accuracy on MNIST, 85.57% on CIFAR-10, and 59.10% on CIFAR-100 with fewer than 1M trainable parameters and significantly less memory and time than transformer-style baselines.
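To make "fixed reservoirs with trainable readouts" concrete, here is a minimal sketch of a generic echo-state-style reservoir in plain Python. It is not the paper's module: the function names, the weight scaling, and the plain linear readout (standing in for the closed-loop readout, whose exact form this summary does not give) are all assumptions chosen for illustration.

```python
import math
import random


def make_reservoir(input_dim, state_dim, scale=0.5, seed=0):
    """Draw fixed random input and recurrent weights; these are never trained."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(input_dim)]
            for _ in range(state_dim)]
    # Dividing by state_dim keeps the recurrent weights small, a crude way to
    # keep the state bounded (an assumption, not the paper's scaling).
    w_res = [[rng.uniform(-scale, scale) / state_dim for _ in range(state_dim)]
             for _ in range(state_dim)]
    return w_in, w_res


def run_reservoir(tokens, w_in, w_res):
    """Feed a token sequence through the fixed reservoir, one state per token."""
    state = [0.0] * len(w_res)
    states = []
    for token in tokens:
        state = [
            math.tanh(sum(w * u for w, u in zip(row_in, token))
                      + sum(w * s for w, s in zip(row_res, state)))
            for row_in, row_res in zip(w_in, w_res)
        ]
        states.append(state)
    return states


def readout(states, w_out):
    """Linear readout: the only weights that would be trained in this sketch."""
    return [[sum(w * s for w, s in zip(row, state)) for row in w_out]
            for state in states]


w_in, w_res = make_reservoir(input_dim=3, state_dim=5)
tokens = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, 0.5], [0.2, 0.0, -0.3]]
states = run_reservoir(tokens, w_in, w_res)
w_out = [[0.0] * 5 for _ in range(2)]  # trainable readout, zero-initialised here
outputs = readout(states, w_out)
```

In this toy setup the reservoir holds 15 + 25 = 40 fixed weights while the readout holds only 10 trainable ones, which illustrates the general reason reservoir-based mixers can keep trainable parameter counts low.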

Key takeaway

Machine learning engineers building image classifiers for resource-constrained environments should consider HiRo's combination of hierarchical reservoir computing and shifted-window partitioning. The design reaches competitive accuracy (e.g., 99.46% on MNIST) with under 1M trainable parameters, and it uses significantly less memory and time than transformer-style baselines. Its fixed reservoirs and multi-directional mixing are worth evaluating for projects that need strong accuracy with minimal overhead.

Key insights

HiRo combines hierarchical reservoir computing and shifted windows for efficient, high-accuracy image classification with few parameters.

Method

Images are split into patches, given 2D sinusoidal positional encodings, and processed in local windows. Tokens are scanned in four directions, passed through a two-stage slice-and-mix reservoir, then fused and averaged for classification.
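The windowing and scanning steps can be sketched on a small grid of token indices. This is an illustrative sketch only: the summary does not name the four scan directions or say how the shift is implemented, so the forward/reverse row-major and column-major orders and the Swin-style cyclic roll below are assumptions, as are all function names.

```python
def roll_grid(grid, shift):
    """Cyclically shift a 2D token grid up and left by `shift` positions."""
    rows = grid[shift:] + grid[:shift]
    return [row[shift:] + row[:shift] for row in rows]


def partition_windows(grid, window):
    """Split an H x W grid into non-overlapping window x window blocks."""
    height, width = len(grid), len(grid[0])
    return [
        [row[left:left + window] for row in grid[top:top + window]]
        for top in range(0, height, window)
        for left in range(0, width, window)
    ]


def four_direction_scans(block):
    """Return the block's tokens in four scan orders (assumed directions)."""
    row_major = [token for row in block for token in row]
    col_major = [token for col in zip(*block) for token in col]
    return {
        "left_to_right": row_major,
        "right_to_left": row_major[::-1],
        "top_to_bottom": col_major,
        "bottom_to_top": col_major[::-1],
    }


# A 4 x 4 grid of token indices 0..15, split into 2 x 2 windows.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = partition_windows(grid, window=2)
shifted = partition_windows(roll_grid(grid, shift=1), window=2)
scans = four_direction_scans(regular[0])

print(regular[0])  # [[0, 1], [4, 5]]
print(shifted[0])  # [[5, 6], [9, 10]]
print(scans)
```

The first shifted window holds tokens 5, 6, 9, and 10, each of which belongs to a different regular window. That is how alternating regular and shifted blocks lets information move between windows.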

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.