Evolution of Log-Based Detection Rules in Public Repositories

Source: cs.SE updates on arXiv.org · Field: Technology & Digital (Cybersecurity & Data Privacy, Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, extended

Summary

The study presents the first longitudinal analysis of how log-based detection rules evolve in public repositories, covering the community-driven Sigma project and the curated Splunk Security Content (SSC). The researchers introduce a predicate graph intermediate representation (PGIR) and a tree alignment procedure that compare rule versions by detection logic rather than surface syntax. Applying this method to 6,859 rule histories from 2017–2026, they find that approximately 56% of rules undergo at least one detection logic revision. Evolution is predominantly non-monotonic: over half of rules both add and remove clauses over time, and recurring reversions indicate that changes are often revisited rather than strictly accumulated. Combining structural analysis with LLM-based inference shows that roughly a quarter to a third of rules alternate between expanding coverage and reducing false positives, which reflects ongoing operational trade-offs rather than steady convergence.
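The structural notions in this summary (a detection logic revision, non-monotonic evolution, a reversion) can be illustrated with a minimal Python sketch. It assumes each rule version has already been reduced to a set of canonical clause strings; the function name `summarize_history` and the clause strings are hypothetical, and this is not the paper's PGIR or alignment procedure.

```python
from typing import Dict, FrozenSet, List

Clauses = FrozenSet[str]  # assumption: one canonical string per detection clause


def summarize_history(versions: List[Clauses]) -> Dict[str, object]:
    """Summarize how a rule's clause set changes across consecutive versions."""
    added_any = False
    removed_any = False
    logic_revisions = 0
    reversions = 0
    seen: List[Clauses] = versions[:1]  # clause sets observed so far

    for prev, curr in zip(versions, versions[1:]):
        if curr == prev:
            continue  # commit touched only metadata or formatting
        logic_revisions += 1
        added_any |= bool(curr - prev)    # at least one clause added
        removed_any |= bool(prev - curr)  # at least one clause removed
        if curr in seen:
            reversions += 1  # rule returned to an earlier logic state
        seen.append(curr)

    return {
        "logic_revisions": logic_revisions,
        "non_monotonic": added_any and removed_any,
        "reversions": reversions,
    }


history = [
    frozenset({"Image endswith cmd.exe"}),
    frozenset({"Image endswith cmd.exe", "CommandLine contains whoami"}),
    frozenset({"Image endswith cmd.exe", "CommandLine contains whoami"}),
    frozenset({"Image endswith cmd.exe"}),
    frozenset({"Image endswith cmd.exe", "User not SYSTEM"}),
]
summary = summarize_history(history)
```

In this toy history the third version changes nothing in the logic, the fourth reverts to the first, and the fifth adds a new clause, so the rule counts as non-monotonic with one reversion.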

Key takeaway

If you are an AI security engineer maintaining host-based intrusion detection rules, recognize that rule evolution is rarely a linear path to stability. Your rules will likely oscillate between expanding coverage and reducing false positives, often revisiting prior logic. Use tooling built on a canonical representation such as PGIR to track semantic changes, not just syntactic ones. This helps identify unresolved design tensions and supports more informed, data-driven rule refinement.
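One way to surface this oscillation in your own rule set is to count how often consecutive revisions switch intent. The sketch below assumes each revision already carries an intent label (assigned by a reviewer or a classifier); the label strings, function names, and the `min_switches` threshold are all hypothetical choices, not values from the study.

```python
from typing import List

# Hypothetical intent labels; other labels (e.g. "refactor") are ignored.
EXPAND = "expand_coverage"
REDUCE = "reduce_false_positives"


def count_alternations(intents: List[str]) -> int:
    """Count switches between EXPAND and REDUCE across a revision sequence."""
    relevant = [i for i in intents if i in (EXPAND, REDUCE)]
    return sum(1 for a, b in zip(relevant, relevant[1:]) if a != b)


def oscillates(intents: List[str], min_switches: int = 2) -> bool:
    """Flag a rule whose revisions switch direction at least min_switches times."""
    return count_alternations(intents) >= min_switches


revision_intents = [EXPAND, "refactor", REDUCE, EXPAND]
flagged = oscillates(revision_intents)
```

Rules flagged this way are candidates for a closer design review, since repeated switching suggests the coverage versus false-positive trade-off has not been settled.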

Key insights

Log-based detection rules evolve non-monotonically, balancing coverage and false positives through iterative, often conflicting, revisions.

Method

A predicate graph intermediate representation (PGIR) canonicalizes rule logic, enabling semantic comparison via a four-phase tree alignment algorithm and LLM-based classification of operational intent.
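To show why canonicalization matters for semantic comparison, here is a minimal Python sketch that normalizes a boolean predicate tree so that reordered or redundantly nested conditions compare equal. It is a simplified stand-in, not the paper's PGIR: the tuple encoding, the `canonicalize` and `logic_changed` names, and the example predicates are assumptions, and the four-phase alignment and LLM classification steps are not reproduced.

```python
from typing import Any, Tuple

# Assumed encoding: ("pred", field, op, value) leaves,
# ("not", child), and ("and" | "or", child, child, ...) groups.
Node = Tuple[Any, ...]


def canonicalize(node: Node) -> Node:
    """Return a normal form that ignores child order, nesting, and duplicates."""
    op = node[0]
    if op == "pred":
        return node
    if op == "not":
        return ("not", canonicalize(node[1]))

    children = []
    for child in node[1:]:
        c = canonicalize(child)
        if c[0] == op:
            children.extend(c[1:])  # flatten nested groups of the same operator
        else:
            children.append(c)

    unique = sorted(set(children), key=repr)  # dedupe, then fix a stable order
    if len(unique) == 1:
        return unique[0]  # a one-child group is just that child
    return (op, *unique)


def logic_changed(old: Node, new: Node) -> bool:
    """True when two rule versions differ after canonicalization."""
    return canonicalize(old) != canonicalize(new)


a = ("pred", "Image", "endswith", "\\cmd.exe")
b = ("pred", "CommandLine", "contains", "whoami")

v1 = ("and", a, b)
v2 = ("and", b, ("and", a))  # reordered and redundantly nested
v3 = ("and", a, ("not", b))  # genuinely different logic
```

Here `logic_changed(v1, v2)` is false while `logic_changed(v1, v3)` is true. This sketch treats only ordering, nesting, and duplication as surface noise; equivalences such as De Morgan rewrites would need additional normalization.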

Best for: AI Scientist, AI Security Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.