STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Computer Vision & Pattern Recognition) · Depth: Expert, quick

Summary

STaR-KV (Spatio-Temporal Adaptive Re-weighting) is a training-free KV cache compression framework for GUI Vision-Language Models (VLMs). It targets a deployment bottleneck: the KV cache grows linearly with the number of interaction steps, to the point where UI-TARS-1.5-7B consumes 76 GB of GPU memory on just five screenshots. Existing compression methods collapse visual-token importance into a single saliency map and apply a fixed top-B cutoff. STaR-KV drops both assumptions and calibrates token importance along three axes: (1) subspace-aware scoring based on online spatial mutual information, (2) a temporal stability discount for redundant cache entries, and (3) an entropy-derived temperature that adaptively reshapes the score distribution. Across four GUI benchmarks it achieves the strongest average accuracy among the compared state-of-the-art methods, including GUIKV and SnapKV. Its compression stage adds no FLOPs overhead (measured at -0.07%), and at a 20% KV-cache budget it reduces peak GPU memory by nearly 40%.
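The linear growth mentioned above can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: the layer count, KV-head count, head dimension, and tokens per screenshot are hypothetical values picked for the example, not figures from the article, and it counts only the cache itself, so it does not attempt to reproduce the 76 GB peak (which also includes weights and activations).

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


# Hypothetical configuration, chosen only for illustration (not from the article).
LAYERS, KV_HEADS, HEAD_DIM = 28, 4, 128
TOKENS_PER_SCREENSHOT = 5000


def cache_after_steps(steps, budget=1.0):
    """Cache size after `steps` screenshots, keeping a `budget` fraction of tokens."""
    tokens = round(steps * TOKENS_PER_SCREENSHOT * budget)
    return kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM)


for steps in (1, 5, 10):
    full = cache_after_steps(steps)
    compressed = cache_after_steps(steps, budget=0.2)
    print(f"{steps:>2} steps: full {full / 2**20:8.1f} MiB, "
          f"20% budget {compressed / 2**20:8.1f} MiB")
```

Doubling the number of screenshots doubles the uncompressed cache, which is why a per-step budget matters more as an interaction session gets longer.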

Key takeaway

If you are a Machine Learning Engineer deploying GUI Vision-Language Models and KV caches are driving up GPU memory consumption, consider STaR-KV. This training-free compression framework can cut peak GPU memory by nearly 40% at a 20% KV-cache budget while maintaining or improving accuracy over existing methods. Its subspace-aware scoring and temporal stability discount decide which cache entries to keep, which can help enable deployment on mainstream 80 GB accelerators.

Key insights

KV cache compression for GUI VLMs benefits from spatio-temporal adaptive re-weighting, which moves beyond a single saliency map and a fixed top-B cutoff.

Principles

Method

STaR-KV employs subspace-aware scoring via online spatial mutual information, a temporal stability discount, and an entropy-derived temperature to adaptively reshape KV cache score distributions.
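As a rough illustration of how the three components could fit together, the sketch below scores visual tokens per attention head, reshapes each head's scores with a temperature derived from their entropy, combines heads with caller-supplied weights, and discounts tokens that are stable across screenshots. Everything in it is an assumption made for the example: the summary does not give the actual formulas, so the function names, the entropy-to-temperature mapping, the order of the steps, the `discount` parameter, and the final top-k selection are hypothetical. In particular, `head_weights` only stands in for the online spatial mutual information, which is not computed here.

```python
import math


def entropy_temperature(scores, t_min=0.5, t_max=2.0):
    """Map the normalized entropy of nonnegative scores to a temperature.

    Assumed mapping: a flat (high-entropy) distribution gets a low
    temperature, a peaked (low-entropy) one gets a high temperature.
    """
    total = sum(scores)
    probs = [s / total for s in scores]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    normalized = entropy / math.log(len(probs)) if len(probs) > 1 else 0.0
    return t_max - (t_max - t_min) * normalized


def reshape(scores, temperature):
    """Raise scores to the power 1/temperature and renormalize to sum to 1."""
    powered = [s ** (1.0 / temperature) for s in scores]
    total = sum(powered)
    return [p / total for p in powered]


def select_tokens(head_scores, head_weights, stability, budget, discount=0.5):
    """Return sorted indices of the tokens to keep.

    head_scores:  one list of nonnegative token scores per head (subspace).
    head_weights: one weight per head (placeholder for mutual information).
    stability:    per-token value in [0, 1]; 1 means unchanged since the
                  previous screenshot, so the entry is treated as redundant.
    budget:       fraction of tokens to keep.
    """
    n = len(stability)
    weight_sum = sum(head_weights)
    combined = [0.0] * n
    for scores, weight in zip(head_scores, head_weights):
        reshaped = reshape(scores, entropy_temperature(scores))
        for i, value in enumerate(reshaped):
            combined[i] += (weight / weight_sum) * value
    # Temporal stability discount: shrink the score of redundant tokens.
    discounted = [c * (1.0 - discount * s) for c, s in zip(combined, stability)]
    keep = max(1, math.ceil(budget * n))
    ranked = sorted(range(n), key=lambda i: discounted[i], reverse=True)
    return sorted(ranked[:keep])


head_scores = [[0.70, 0.10, 0.15, 0.05], [0.25, 0.25, 0.25, 0.25]]
head_weights = [0.9, 0.1]
print(select_tokens(head_scores, head_weights, [0.0, 0.0, 0.0, 0.0], 0.5))  # [0, 2]
print(select_tokens(head_scores, head_weights, [1.0, 0.0, 0.0, 0.0], 0.5,
                    discount=0.9))  # [1, 2]
```

In the second call, token 0 has the highest raw score but is marked as fully stable, so the discount pushes it out of the kept set in favor of tokens that carry new information. The plain top-k at the end is a simplification for readability, not a claim about how STaR-KV performs its final selection.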

Best for: MLOps Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.