LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation

2026-06-17 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

LEAP (Layer-skipping Efficiency via Adaptive Progression) is a training curriculum for feature-based knowledge distillation of Vision Transformers (ViTs), aimed at deploying Vision Foundation Models (VFMs) on edge devices. It targets the teacher-student gap, in which a small student architecture struggles to imitate the complex feature maps of a much larger teacher. LEAP treats the teacher's intermediate feature maps as a sequence of progressively harder targets, so the student builds foundational representations before moving on to higher-level abstractions. The reported results are faster convergence and higher accuracy: a LEAP-distilled ViT-S reaches 90.1% accuracy on ImageNet-100, a +12.24% improvement over the baseline. In the ImageNet-1K setting, it improves instance retrieval on the Oxford and Paris datasets by +3.84% and +7.75%, respectively. On ImageNet-100, it also saves 25.1% of training FLOPs and 21% of training time.

Key takeaway

For machine learning engineers optimizing Vision Transformer distillation for edge deployment, LEAP is worth evaluating. Its adaptive progression curriculum mitigates the teacher-student gap, with reported results of 90.1% accuracy on ImageNet-100 and a 25.1% reduction in training FLOPs. It can also speed up your model's convergence and improve performance on downstream tasks such as instance retrieval, making smaller architectures more effective.

Key insights

LEAP's adaptive progression through the teacher's intermediate features narrows the ViT distillation gap, boosting accuracy and accelerating convergence.

Principles

Teacher's intermediate features offer progressive learning targets.
Adaptive difficulty selection accelerates model convergence.
Mitigate teacher-student gap with structured curriculum.
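
The principles above can be sketched as a small scheduler that decides which teacher layer the student imitates next. This is a minimal illustration, not the authors' implementation: the class name `LayerCurriculum`, the plateau rule, and the `patience` and `min_delta` parameters are all assumptions, since this summary does not say how LEAP selects difficulty.

```python
class LayerCurriculum:
    """Hypothetical scheduler: chooses the teacher layer used as the target.

    Assumed rule (not from the source): advance one layer deeper once the
    distillation loss has stopped improving for `patience` consecutive steps.
    """

    def __init__(self, num_teacher_layers: int, patience: int = 3,
                 min_delta: float = 1e-3) -> None:
        if num_teacher_layers < 1:
            raise ValueError("num_teacher_layers must be at least 1")
        self.num_teacher_layers = num_teacher_layers
        self.patience = patience
        self.min_delta = min_delta
        self.target_layer = 1          # start with the shallowest (easiest) target
        self._best = float("inf")      # best loss seen at the current layer
        self._stale = 0                # steps without meaningful improvement

    def step(self, loss: float) -> int:
        """Record one loss value and return the layer to use for the next step."""
        if loss < self._best - self.min_delta:
            self._best = loss
            self._stale = 0
        else:
            self._stale += 1

        plateaued = self._stale >= self.patience
        if plateaued and self.target_layer < self.num_teacher_layers:
            self.target_layer += 1     # move to a harder, deeper target
            self._best = float("inf")  # loss scale changes with the new target
            self._stale = 0
        return self.target_layer


if __name__ == "__main__":
    curriculum = LayerCurriculum(num_teacher_layers=4, patience=2)
    for loss in [1.0, 0.5, 0.5, 0.5, 0.9, 0.9, 0.9]:
        print(loss, "->", curriculum.step(loss))
```

In a real training loop, `step` would be called with a smoothed distillation loss, and the returned index would select which teacher feature map to match. A fixed epoch schedule would be a simpler alternative; the plateau rule is used here only to make "adaptive" concrete.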

Method

LEAP employs a training curriculum that uses the teacher's intermediate feature maps as progressively more difficult targets, so the student builds basic representations before tackling complex abstractions. Because a given stage needs the teacher's features only up to the current target layer, teacher inference can stop early at that layer, which saves FLOPs and training time.
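
The early-stopping idea in the method can be illustrated with a toy teacher whose blocks are plain Python callables. This is a sketch under stated assumptions rather than the paper's code: `teacher_features`, `make_counting_blocks`, and the use of mean squared error as the feature-matching loss are all hypothetical choices made for the example.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def teacher_features(blocks: Sequence[Block], x: Vector, target_layer: int) -> Vector:
    """Run only the first `target_layer` teacher blocks (1-indexed) and stop."""
    if not 1 <= target_layer <= len(blocks):
        raise ValueError("target_layer must be in [1, len(blocks)]")
    h = x
    for block in blocks[:target_layer]:   # deeper blocks are never executed
        h = block(h)
    return h


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length feature vectors (assumed loss)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def make_counting_blocks(n: int, calls: List[int]) -> List[Block]:
    """Toy stand-ins for transformer blocks; each one logs its index when run."""
    def make(i: int) -> Block:
        def block(h: Vector) -> Vector:
            calls.append(i)
            return [v + 1.0 for v in h]   # placeholder transformation
        return block
    return [make(i) for i in range(1, n + 1)]


if __name__ == "__main__":
    calls: List[int] = []
    teacher = make_counting_blocks(12, calls)      # 12 blocks is an arbitrary depth
    target = teacher_features(teacher, [0.0, 0.0], target_layer=3)
    student_out = [2.5, 3.0]                       # pretend student feature
    print("blocks executed:", calls)
    print("loss:", mse(student_out, target))
```

With a target at layer 3 of 12, the toy teacher runs only three blocks per step. The actual savings in a ViT depend on how long the curriculum stays at shallow layers, which this summary does not specify.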

In practice

Distill ViT models for edge deployment.
Improve instance retrieval task performance.
Reduce training FLOPs and time.

Topics

Vision Transformers
Knowledge Distillation
Model Efficiency
Edge AI
ImageNet
DINOv2

Code references

KevinZ0217/LEAP

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.