KATANA: A Fast, Low-Power Mapping of Kalman Filters onto Edge NPUs for Real-Time Tracking

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Emerging Technologies & Innovation · Depth: Expert, quick

Summary

KATANA is an NPU-aware optimization framework that maps Linear and Extended Kalman Filters (LKF, EKF) onto the commercial Neural Processing Units found in contemporary AI-PC SoCs, such as the Intel Core Ultra Series 1 and 2. It targets real-time, low-power state estimation in edge deployments such as radar surveillance, counter-UAV defense, autonomous driving, and robotics, where CPU execution serializes multi-object tracking updates and custom accelerators are costly. KATANA applies three algebraic graph rewrites: a subtract-to-add reformulation via a precomputed negative-projection matrix H_neg, static-shape tensor fusion, and block-diagonal batched parallelization. Together these ensure that 100% of operations execute on the NPU's matrix engine. On the Series 2, the optimized batched EKF achieves 223.35 FPS at 13.43 W and the LKF reaches 408.73 FPS at 14.05 W, an up to 97.9% reduction in dynamic energy versus CPU implementations.
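To put the throughput and power figures on a common scale, the sketch below divides power by frame rate to get a rough energy per frame. It assumes the reported wattages are average power over the run, which the summary does not state. The result is total energy per frame, not the "dynamic energy" behind the 97.9% figure, so the two should not be compared directly. The function name is illustrative only.

```python
def energy_per_frame_mj(avg_power_w: float, fps: float) -> float:
    """Approximate energy per processed frame in millijoules.

    Assumption: avg_power_w is the average power drawn while sustaining fps.
    """
    return 1000.0 * avg_power_w / fps


# Series 2 figures quoted in the summary.
ekf_mj = energy_per_frame_mj(13.43, 223.35)  # batched EKF
lkf_mj = energy_per_frame_mj(14.05, 408.73)  # LKF

print(f"EKF: {ekf_mj:.2f} mJ/frame, LKF: {lkf_mj:.2f} mJ/frame")
```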

Key takeaway

Robotics and AI hardware engineers building real-time tracking systems on edge platforms should consider the NPUs already present in AI-PC SoCs such as the Intel Core Ultra Series 1 and 2. KATANA demonstrates that mapping Kalman Filters onto these NPUs can cut dynamic energy by up to 97.9% relative to CPU implementations while sustaining high frame rates (e.g., 408.73 FPS for the LKF). This frees the CPU and GPU for other critical tasks and can extend mission duration or operational range.

Key insights

KATANA enables efficient Kalman Filter execution on edge NPUs, achieving real-time tracking with significant power savings.

Principles

Method

KATANA applies subtract-to-add reformulation (H_neg), static-shape tensor fusion, and block-diagonal batched parallelization to map LKF/EKF operations entirely onto the NPU matrix engine.
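A minimal NumPy sketch of two of these rewrites follows, covering only the state-prediction and innovation steps. It rests on assumptions not confirmed by the summary: that H_neg is simply -H, so the innovation z - Hx becomes z + H_neg x, and that block-diagonal batching means stacking every track's state into one vector with one block-diagonal matrix. The function names, the constant-velocity model, and the fixed track count (standing in for static shapes) are illustrative, not KATANA's actual API.

```python
import numpy as np


def block_diag_repeat(mat: np.ndarray, copies: int) -> np.ndarray:
    """Place `copies` copies of `mat` on the diagonal of a larger matrix."""
    return np.kron(np.eye(copies), mat)


def make_batched_lkf(F: np.ndarray, H: np.ndarray, num_tracks: int):
    """Precompute fixed-shape batched operators for `num_tracks` tracks.

    Assumption: H_neg = -H, so the innovation needs only matmul and add.
    """
    F_blk = block_diag_repeat(F, num_tracks)
    H_neg_blk = block_diag_repeat(-H, num_tracks)  # computed once, offline
    return F_blk, H_neg_blk


def predict_and_innovate(x_flat, z_flat, F_blk, H_neg_blk):
    """One batched predict step plus innovation for all tracks at once."""
    x_pred = F_blk @ x_flat                   # every track in one matmul
    innovation = z_flat + H_neg_blk @ x_pred  # equals z - H @ x_pred
    return x_pred, innovation


# Hypothetical 1-D constant-velocity model: state = [position, velocity], dt = 1.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])  # only position is measured

F_blk, H_neg_blk = make_batched_lkf(F, H, num_tracks=3)
x = np.array([0.0, 1.0, 10.0, -2.0, 5.0, 0.5])  # three stacked track states
z = np.array([1.2, 7.9, 5.4])                   # one measurement per track

x_pred, innovation = predict_and_innovate(x, z, F_blk, H_neg_blk)

# Reference: the per-track loop a serial CPU implementation would run.
ref = np.array([z[i] - (H @ (F @ x[2 * i:2 * i + 2]))[0] for i in range(3)])
assert np.allclose(innovation, ref)
```

The block-diagonal matrices are mostly zeros, so this form trades extra multiply-accumulates for a single dense, fixed-shape operation, which is the kind of workload a matrix engine is built for.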

Best for: Computer Vision Engineer, Research Scientist, AI Hardware Engineer, Robotics Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.