KATANA: A Fast, Low-Power Mapping of Kalman Filters onto Edge NPUs for Real-Time Tracking

2026-06-12 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Emerging Technologies & Innovation · Depth: Expert, quick

Summary

KATANA is an NPU-aware optimization framework that maps Linear and Extended Kalman Filters (LKF, EKF) onto the commercial Neural Processing Units in contemporary AI-PC SoCs such as the Intel Core Ultra Series 1 and 2. It targets real-time, low-power state estimation in edge deployments such as radar surveillance, counter-UAV defense, autonomous driving, and robotics, where CPU execution serializes multi-object tracking updates and custom accelerators are costly. KATANA applies three algebraic graph rewrites: a subtract-to-add reformulation that uses a precomputed negative-projection matrix H_neg, static-shape tensor fusion, and block-diagonal batched parallelization. Together these let 100% of operations execute on the NPU's matrix engine. On the Series 2, the optimized batched EKF reaches 223.35 FPS at 13.43 W and the LKF reaches 408.73 FPS at 14.05 W, a reduction in dynamic energy of up to 97.9% versus CPU implementations.
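As a rough cross-check of the Series 2 figures, dividing power by frame rate gives energy per frame. The sketch below assumes the reported wattage is average power over the run; the summary does not say what that figure includes, so this is not the same quantity as the dynamic-energy reduction quoted against the CPU baseline. The function name `joules_per_frame` is illustrative only.

```python
def joules_per_frame(watts: float, fps: float) -> float:
    """Energy per processed frame, assuming `watts` is average power."""
    return watts / fps


# Series 2 figures quoted in the summary.
ekf_j = joules_per_frame(13.43, 223.35)  # batched EKF
lkf_j = joules_per_frame(14.05, 408.73)  # LKF

print(f"EKF: {ekf_j * 1000:.1f} mJ/frame")
print(f"LKF: {lkf_j * 1000:.1f} mJ/frame")
```

Under that assumption the LKF spends a little over half the energy per frame that the EKF does, which follows from its higher frame rate at nearly the same power.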

Key takeaway

Robotics and AI hardware engineers building real-time tracking systems on edge platforms should consider the NPUs already present in AI-PC SoCs such as the Intel Core Ultra Series 1 and 2. KATANA shows that mapping Kalman Filters onto these NPUs can cut dynamic energy by up to 97.9% versus CPU implementations while sustaining high frame rates (e.g., 408.73 FPS for the LKF on the Series 2). That frees the CPU and GPU for other critical tasks and can extend mission duration or operational range.

Key insights

KATANA runs Linear and Extended Kalman Filters entirely on edge NPUs, sustaining real-time tracking rates while cutting dynamic energy by up to 97.9% versus CPU implementations.

Principles

Edge systems demand low power and real-time processing.
Kalman Filters can be offloaded from CPUs/GPUs to NPUs.
Algebraic graph rewrites optimize NPU utilization.

Method

KATANA applies subtract-to-add reformulation (H_neg), static-shape tensor fusion, and block-diagonal batched parallelization to map LKF/EKF operations entirely onto the NPU matrix engine.
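The three rewrites can be illustrated on the innovation step y = z - Hx of a Kalman update. This is a minimal NumPy sketch, not the paper's implementation: the one-dimensional constant-velocity model, the fused matrix `M = [I | H_neg]`, and all function names are assumptions made for illustration, and the summary does not specify how KATANA fuses tensors internally.

```python
import numpy as np


def block_diag(blocks):
    """Place each 2-D block on the diagonal of one zero-filled matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out


# Hypothetical model: state [position, velocity], position-only measurement.
H = np.array([[1.0, 0.0]])

# Subtract-to-add: negate H once, offline, so z - H x becomes z + H_neg x.
H_neg = -H

# Illustrative fusion: [I | H_neg] @ [z; x] folds the add into one
# fixed-shape matrix product.
M = np.hstack([np.eye(H.shape[0]), H_neg])


def innovations_looped(zs, xs):
    """Reference: one subtraction per track, as a serial CPU loop would do."""
    return np.array([z - H @ x for z, x in zip(zs, xs)])


def innovations_batched(zs, xs):
    """All tracks in a single matmul against a block-diagonal matrix."""
    n = len(zs)
    big = block_diag([M] * n)                # shape fixed once n is fixed
    stacked = np.hstack([zs, xs]).reshape(-1)  # [z_0, x_0, z_1, x_1, ...]
    return (big @ stacked).reshape(n, -1)


rng = np.random.default_rng(0)
zs = rng.normal(size=(4, 1))  # 4 tracks, 1 measurement each
xs = rng.normal(size=(4, 2))  # 4 tracks, 2 state components each
assert np.allclose(innovations_looped(zs, xs), innovations_batched(zs, xs))
```

The batched version trades extra zero entries in the block-diagonal matrix for a single, static-shape product, which is the kind of operation a matrix engine executes without falling back to other units.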

In practice

Deploy Kalman Filters on Intel Core Ultra NPUs.
Reduce dynamic energy for multi-object tracking.
Free CPU/GPU for primary workloads.

Topics

Kalman Filters
Edge NPUs
Real-time Tracking
AI-PC SoCs
Power Efficiency
Multi-Object Tracking

Best for: Computer Vision Engineer, Research Scientist, AI Hardware Engineer, Robotics Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.