CREST: Deployment-Realistic Hardware-in-the-Loop NAS for Embedded Sensing Systems
Summary
CREST (Cross-platform Runtime Evaluation and Search Tool) is a new hardware-in-the-loop (HIL) neural architecture search (NAS) framework for deploying neural networks on low-power microcontrollers (MCUs). It targets three shortcomings of conventional NAS workflows: they rank candidates by static proxy costs such as FLOPs, treat a single MCU as representative of all targets, and benchmark with continuous back-to-back inference rather than realistic sensing schedules. CREST supports deployment-realistic evaluation by making the workload, model family, target backend, schedule, quantization, and scoring policy configurable. The framework was evaluated on inertial odometry and audio classification across three Arm Cortex-M targets. For inertial odometry, CREST's measured-energy HIL search reduced median per-inference energy by 41.7% relative to FLOPs-based selection and by 40.8% relative to memory-traffic-based selection, at comparable error. FLOPs-based selection also produced infeasible deployments on memory-constrained targets. The results indicate that effective MCU NAS must jointly optimize model architecture, target platform, runtime schedule, and deployment policy.
Key takeaway
For machine learning engineers deploying neural networks on low-power microcontrollers, relying solely on static proxy costs such as FLOPs, or on continuous-inference benchmarks, is insufficient and risky. Adopt hardware-in-the-loop (HIL) NAS frameworks such as CREST to jointly optimize model architecture, target platform, runtime schedule, and deployment policy. This approach helps avoid infeasible deployments and can substantially reduce per-inference energy: for inertial odometry, it cut median per-inference energy by 41.7% relative to FLOPs-based selection. Prioritize realistic evaluation to catch infeasible designs early and to minimize measured, rather than estimated, energy consumption.
Key insights
Deployment-realistic NAS for MCUs requires hardware-in-the-loop evaluation that jointly optimizes architecture, platform, schedule, and policy instead of relying on static proxies.
Principles
- Static proxy costs mis-rank Pareto candidates.
- Continuous-inference tests obscure schedule-dependent energy.
- Optimal architectures vary across target platforms.
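To make the second principle concrete, here is a minimal sketch of a simple duty-cycle energy model, assuming each sensing period consists of one inference, a sleep interval, and a fixed wake-up cost. The model, the `PowerProfile` fields, and all numbers are hypothetical illustrations, not CREST's measurement method or reported results. It shows how a configuration that wins a continuous-inference benchmark can lose once idle and wake-up energy over a realistic schedule are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerProfile:
    """Hypothetical power figures for one (model, target) pairing; illustrative only."""
    active_power_mw: float  # average power while running inference
    sleep_power_mw: float   # average power while idle between sensing windows
    wake_energy_uj: float   # fixed cost per period to wake, sample sensors, and sleep again


def continuous_energy_uj(latency_ms: float, p: PowerProfile) -> float:
    """Energy a back-to-back (continuous-inference) benchmark attributes to one inference."""
    return p.active_power_mw * latency_ms  # mW * ms = uJ


def scheduled_energy_uj(latency_ms: float, period_ms: float, p: PowerProfile) -> float:
    """Energy for one sensing period under a simple duty-cycle model (an assumption)."""
    if latency_ms > period_ms:
        raise ValueError("inference does not fit within the sensing period")
    active = p.active_power_mw * latency_ms
    idle = p.sleep_power_mw * (period_ms - latency_ms)
    return active + idle + p.wake_energy_uj


# Hypothetical configurations: X runs fast but idles at high power; Y is slower but sleeps deeply.
x_profile, x_latency = PowerProfile(30.0, 1.0, 20.0), 5.0
y_profile, y_latency = PowerProfile(10.0, 0.1, 20.0), 20.0
period = 1000.0  # one inference per second

x_cont = continuous_energy_uj(x_latency, x_profile)
y_cont = continuous_energy_uj(y_latency, y_profile)
x_sched = scheduled_energy_uj(x_latency, period, x_profile)
y_sched = scheduled_energy_uj(y_latency, period, y_profile)

print(f"continuous: X={x_cont:.0f} uJ, Y={y_cont:.0f} uJ")
print(f"1 s schedule: X={x_sched:.0f} uJ, Y={y_sched:.0f} uJ")
```

With these made-up figures, X is cheaper back-to-back (150 vs 200 µJ) but Y is cheaper per period under the 1 s schedule (318 vs 1165 µJ), because sleep power dominates once the MCU spends most of each period idle.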
Method
CREST holds the optimizer, HIL measurement boundary, logging, and replay workflow fixed, and exposes the workload, model family, target backend, schedule, quantization, and scoring policy as configurable axes for deployment-realistic evaluation.
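The configurable axes can be pictured as a joint search space. This sketch, assuming a hypothetical `SearchSpace` schema, hypothetical target names, and made-up memory footprints, enumerates combinations of model, target, schedule, and quantization and discards any that do not fit the target's flash or SRAM. It is not CREST's configuration format, and the scoring-policy axis is left to a later selection step. Checking footprints against each target before scoring is one way to avoid the kind of infeasible deployment that FLOPs-based selection produced on memory-constrained targets.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A hypothetical MCU target with its memory budget."""
    name: str
    flash_kb: int
    sram_kb: int


@dataclass(frozen=True)
class SearchSpace:
    """Configurable axes (names are illustrative, not CREST's actual schema)."""
    workload: str
    models: tuple[str, ...]
    targets: tuple[Target, ...]
    periods_ms: tuple[int, ...]  # sensing schedules to evaluate
    quantizations: tuple[str, ...]


# Hypothetical (flash_kb, peak_sram_kb) footprints per (model, quantization).
FOOTPRINT = {
    ("small", "int8"): (40, 20),
    ("small", "fp32"): (160, 80),
    ("large", "int8"): (200, 90),
    ("large", "fp32"): (800, 360),
}


def feasible_configs(space: SearchSpace):
    """Yield every (model, target, period, quantization) whose footprint fits the target."""
    for model, target, period, quant in itertools.product(
        space.models, space.targets, space.periods_ms, space.quantizations
    ):
        flash, sram = FOOTPRINT[(model, quant)]
        if flash <= target.flash_kb and sram <= target.sram_kb:
            yield model, target.name, period, quant


space = SearchSpace(
    workload="inertial_odometry",
    models=("small", "large"),
    targets=(Target("target_a", 1024, 256), Target("target_b", 256, 64)),
    periods_ms=(100, 1000),
    quantizations=("int8", "fp32"),
)
configs = list(feasible_configs(space))
print(len(configs), "of 16 configurations fit")
```

Here only the int8 "small" model fits the hypothetical `target_b`, so half of the 16 combinations are rejected before any hardware time is spent on them.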
In practice
- Use HIL NAS for low-power MCU deployments.
- Evaluate energy with realistic sensing schedules.
- Jointly optimize architecture, platform, and schedule.
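Once candidates run on hardware, selection can use measured cost rather than a proxy. The minimal sketch below assumes a hypothetical `measure` callback that deploys one configuration, runs it under its sensing schedule, and returns measured per-inference energy and task error. It keeps the Pareto front over (energy, error) and then applies one example scoring policy: the lowest-energy candidate within an error budget. The policy, names, and numbers are illustrative assumptions, not CREST's optimizer or data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Measurement:
    config: tuple     # e.g. (model, target, period_ms, quantization)
    energy_uj: float  # measured per-inference energy on the board
    error: float      # task error on held-out data


def dominates(a: Measurement, b: Measurement) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.energy_uj <= b.energy_uj and a.error <= b.error
    better = a.energy_uj < b.energy_uj or a.error < b.error
    return no_worse and better


def pareto_front(ms: Iterable[Measurement]) -> list[Measurement]:
    """Keep only measurements that no other measurement dominates."""
    ms = list(ms)
    return [m for m in ms if not any(dominates(o, m) for o in ms)]


def hil_select(
    configs: Iterable[tuple],
    measure: Callable[[tuple], Measurement],
    error_budget: float,
) -> Optional[Measurement]:
    """Measure every config, then pick the lowest-energy Pareto point within the error budget."""
    front = pareto_front(measure(c) for c in configs)
    within = [m for m in front if m.error <= error_budget]
    return min(within, key=lambda m: m.energy_uj, default=None)


# Stand-in for real HIL runs: hypothetical (energy_uj, error) keyed by config.
FAKE_RESULTS = {
    ("m1",): (100.0, 0.10),
    ("m2",): (150.0, 0.08),
    ("m3",): (200.0, 0.12),  # dominated by m1
    ("m4",): (80.0, 0.20),
}


def fake_measure(config: tuple) -> Measurement:
    energy, error = FAKE_RESULTS[config]
    return Measurement(config, energy, error)


best = hil_select(FAKE_RESULTS, fake_measure, error_budget=0.11)
print(best.config if best else "no candidate within budget")
```

Because the scoring policy is only the last two lines of `hil_select`, it can be swapped (for example, for a latency cap or a weighted energy-error score) without changing how candidates are measured.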
Topics
- Neural Architecture Search
- Microcontrollers
- Hardware-in-the-Loop
- Embedded Sensing Systems
- Energy Efficiency
- Arm Cortex-M
Best for: Research Scientist, AI Scientist, AI Hardware Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.