TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

TRACE (Trajectory Risk-Aware Compression) targets the detection of safety risks in long-horizon LLM agent trajectories. Existing turn-level or short-context detectors often fail to retain and aggregate risk signals that are sparse, delayed, or compositional across an extended context. TRACE introduces a Compressor-Reader design: a Compressor, supervised at the trajectory level, encodes the full trajectory into a compact latent evidence state, and a Reader then judges the raw trajectory using that latent state as a safety reference. The design is intended to aggregate dispersed risk cues and to limit premature loss of evidence. TRACE achieved the best accuracy on the ASSEBench, Pre-Ex-Bench, and R-Judge benchmarks, outperforming strong baselines by up to 12.6 percentage points. On LongSafety, its performance also degraded less as context length increased. Code is available at https://github.com/Peregrine123/TRACE_official.
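To make the failure mode concrete, the following toy sketch (not taken from the paper) shows a compositional risk that a per-turn check cannot see. The step strings, the function names `turn_level_flag` and `trajectory_level_flag`, and the keyword rules are all hypothetical stand-ins for learned detectors.

```python
from typing import List

# Hypothetical trajectory: every step looks harmless when judged alone.
trajectory: List[str] = [
    "read file ~/.ssh/id_rsa",          # sensitive read
    "summarize project README",         # benign filler
    "run unit tests",                   # benign filler
    "send http POST to external host",  # outbound transfer, three steps later
]


def turn_level_flag(step: str) -> bool:
    """Toy turn-level rule: flag a step only if it is risky by itself."""
    return "rm -rf" in step


def trajectory_level_flag(steps: List[str]) -> bool:
    """Toy trajectory-level rule: a sensitive read followed later by an outbound send."""
    read_seen = False
    for step in steps:
        if "id_rsa" in step:
            read_seen = True
        elif read_seen and "POST" in step:
            return True
    return False


per_turn = [turn_level_flag(step) for step in trajectory]
whole = trajectory_level_flag(trajectory)
print(per_turn, whole)
```

No single step trips the per-turn rule, yet the combination of the first and last steps is what matters. That delayed, compositional signal is the kind of evidence the summary says short-context detectors tend to lose.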

Key takeaway

AI security engineers building long-horizon LLM agents should consider a trajectory-level evidence compression system such as TRACE. In the reported results, this approach detects sparse and compositional safety risks that short-context methods miss. A Compressor-Reader architecture aggregates risk cues dispersed across the trajectory and limits the drop in detection performance as context length grows, which makes safety monitoring more dependable on long runs.

Key insights

TRACE uses a Compressor-Reader design to compress long LLM agent trajectories into a latent evidence state for improved safety detection.

Method

TRACE employs a Compressor to encode full trajectories into a compact latent evidence state under trajectory-level supervision. A Reader then judges the raw trajectory using this latent state as a safety reference.
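The two-stage flow described above can be sketched as a minimal interface. This is an illustration under heavy assumptions, not the authors' implementation: in TRACE both components are learned models and the evidence state is a latent representation, whereas here `EvidenceState`, `compress_trajectory`, `read_trajectory`, and the `CUES` lexicon are hypothetical keyword-based stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical cue lexicon; a real Compressor would be a trained model.
CUES: Dict[str, List[str]] = {
    "sensitive_read": ["id_rsa", "password", "api key"],
    "outbound_send": ["post", "upload", "email"],
    "destructive": ["rm -rf", "drop table"],
}


@dataclass(frozen=True)
class EvidenceState:
    """Fixed-size stand-in for the latent evidence state."""
    first_seen: Dict[str, int]  # cue type -> index of first matching step, -1 if absent


def compress_trajectory(steps: List[str]) -> EvidenceState:
    """Scan the full trajectory once and keep a compact record of risk cues."""
    first_seen = {name: -1 for name in CUES}
    for index, step in enumerate(steps):
        lowered = step.lower()
        for name, words in CUES.items():
            if first_seen[name] == -1 and any(word in lowered for word in words):
                first_seen[name] = index
    return EvidenceState(first_seen)


def read_trajectory(steps: List[str], state: EvidenceState) -> str:
    """Judge the raw trajectory, using the evidence state as a reference."""
    if state.first_seen["destructive"] != -1:
        return "unsafe"
    read_index = state.first_seen["sensitive_read"]
    if read_index == -1:
        return "safe"
    # Re-check the raw steps that follow the sensitive read for an outbound send.
    for step in steps[read_index + 1:]:
        if any(word in step.lower() for word in CUES["outbound_send"]):
            return "unsafe"
    return "safe"


steps = [
    "open ~/.ssh/id_rsa",
    "list directory",
    "run tests",
    "upload archive to remote host",
]
state = compress_trajectory(steps)
verdict = read_trajectory(steps, state)
print(state.first_seen, verdict)
```

The point of the split is that the Reader still sees the raw steps but no longer has to rediscover distant evidence unaided: the state tells it where to look. The state size depends on the number of cue types, not on trajectory length, which mirrors the idea of a compact evidence state.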

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.