KATANA: A Fast, Low-Power Mapping of Kalman Filters onto Edge NPUs for Real-Time Tracking
Summary
KATANA is an NPU-aware optimization framework designed for mapping Linear and Extended Kalman Filters (LKF, EKF) onto commercial Neural Processing Units found in contemporary AI-PC SoCs, such as the Intel Core Ultra Series 1 and 2. This framework addresses the critical need for real-time, low-power state estimation in edge deployments like radar surveillance, counter-UAV defense, autonomous driving, and robotics, where traditional CPU execution serializes multi-object tracking updates and custom accelerators are costly. KATANA employs three algebraic graph rewrites: subtract-to-add reformulation via a precomputed negative-projection matrix H_neg, static-shape tensor fusion, and block-diagonal batched parallelization, ensuring 100% of operations execute on the NPU's matrix engine. On the Series 2, the optimized batched EKF achieves 223.35 FPS at 13.43 W, and the LKF reaches 408.73 FPS at 14.05 W, delivering up to a 97.9% reduction in dynamic energy versus CPU implementations.
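The subtract-to-add rewrite can be illustrated with a minimal NumPy sketch. It assumes one plausible reading of the summary: that H_neg is simply the negated measurement matrix, computed once offline, so the innovation z - Hx becomes z + H_neg x. The state layout, the matrix sizes, and the variable names other than H_neg are hypothetical and are not taken from the original article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: state [x, y, vx, vy], measurement [x, y].
n_state, n_meas = 4, 2
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])  # illustrative measurement matrix
H_neg = -H                             # assumed definition, precomputed once

x_pred = rng.normal(size=n_state)      # predicted state
z = rng.normal(size=n_meas)            # incoming measurement

# Textbook innovation: matrix multiply followed by a subtract.
innovation_sub = z - H @ x_pred

# Rewritten innovation: matrix multiply followed by an add only.
innovation_add = z + H_neg @ x_pred

# Both forms agree up to floating-point rounding.
assert np.allclose(innovation_sub, innovation_add)
```

The two forms are algebraically identical, so the rewrite changes only which operator appears in the compute graph, not the filter's output.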
Key takeaway
If you are a robotics or AI hardware engineer building real-time tracking systems on edge platforms, consider the NPUs already present in AI-PC SoCs such as the Intel Core Ultra Series 1 and 2. KATANA shows that mapping Kalman Filters onto these NPUs can cut dynamic energy by up to 97.9% versus CPU implementations while sustaining high frame rates (e.g., 408.73 FPS for the LKF on the Series 2), freeing the CPU and GPU for other critical tasks and extending mission duration or operational range.
Key insights
KATANA enables efficient Kalman Filter execution on edge NPUs, achieving real-time tracking with up to a 97.9% reduction in dynamic energy versus CPU implementations.
Principles
- Edge systems demand low power and real-time processing.
- NPUs can offload Kalman Filters from CPUs/GPUs.
- Algebraic graph rewrites optimize NPU utilization.
Method
KATANA applies subtract-to-add reformulation (H_neg), static-shape tensor fusion, and block-diagonal batched parallelization to map LKF/EKF operations entirely onto the NPU matrix engine.
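Block-diagonal batching can be sketched as follows. This is an illustration of the general idea, not KATANA's implementation: the constant-velocity model, the fixed track count, and the helper block_diag are assumptions introduced here. Each track's matrices are placed on the diagonal of one larger matrix, so the prediction step for all tracks becomes a single fixed-shape matrix product instead of a per-track loop.

```python
import numpy as np

def block_diag(blocks):
    """Place each 2-D block on the diagonal of one zero-padded matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

# Hypothetical constant-velocity model with state [x, y, vx, vy].
dt = 0.1
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
Q = 0.01 * np.eye(4)

num_tracks = 3  # fixed in advance, so every tensor shape is static
rng = np.random.default_rng(1)
states = rng.normal(size=(num_tracks, 4))
covs = np.stack([np.eye(4) * (i + 1) for i in range(num_tracks)])

# Reference: serialized per-track prediction, as a CPU loop would do it.
x_loop = np.stack([F @ x for x in states])
P_loop = np.stack([F @ P @ F.T + Q for P in covs])

# Batched: one large block-diagonal system covering all tracks.
F_big = block_diag([F] * num_tracks)
Q_big = block_diag([Q] * num_tracks)
P_big = block_diag(list(covs))
x_big = states.reshape(-1)  # track states concatenated into one vector

x_batched = (F_big @ x_big).reshape(num_tracks, 4)
P_batched = F_big @ P_big @ F_big.T + Q_big

# Each diagonal block of the batched result matches the per-track result.
assert np.allclose(x_batched, x_loop)
for i in range(num_tracks):
    s = slice(4 * i, 4 * (i + 1))
    assert np.allclose(P_batched[s, s], P_loop[i])
```

The trade-off in this sketch is extra multiplications by zero in exchange for one large, regular matrix product, which is the kind of workload a matrix engine is built for.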
In practice
- Deploy Kalman Filters on Intel Core Ultra NPUs.
- Reduce dynamic energy for multi-object tracking.
- Free CPU/GPU for primary workloads.
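As a rough back-of-the-envelope check, the Series 2 throughput and power figures quoted in the summary can be converted into energy per frame. This sketch assumes the reported wattage is the average power drawn while sustaining the reported frame rate. It is not the dynamic-energy metric behind the 97.9% figure and does not reproduce that comparison; the function name is hypothetical.

```python
# Series 2 figures quoted in the summary: (frames per second, watts).
reported = {"EKF": (223.35, 13.43), "LKF": (408.73, 14.05)}

def energy_per_frame_mj(fps, watts):
    """Average energy per frame in millijoules (watts / FPS = joules per frame)."""
    return 1000.0 * watts / fps

energy = {name: energy_per_frame_mj(fps, w) for name, (fps, w) in reported.items()}
for name, mj in energy.items():
    print(f"{name}: {mj:.1f} mJ per frame")
```

Under that assumption, the batched EKF comes out near 60 mJ per frame and the LKF near 34 mJ per frame.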
Topics
- Kalman Filters
- Edge NPUs
- Real-time Tracking
- AI-PC SoCs
- Power Efficiency
- Multi-Object Tracking
Best for: Computer Vision Engineer, Research Scientist, AI Hardware Engineer, Robotics Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.