Evolution of Log-Based Detection Rules in Public Repositories
Summary
The study presents the first longitudinal analysis of how log-based detection rules evolve in public repositories, covering the community-driven Sigma project and the curated Splunk Security Content (SSC). The researchers introduce a predicate graph intermediate representation (PGIR) and a tree alignment procedure that compare rule versions by detection logic rather than surface syntax. Applying this method to 6,859 rule histories spanning 2017–2026, they find that approximately 56% of rules undergo at least one detection logic revision. Evolution is predominantly non-monotonic: over half of rules both add and remove clauses over time, and recurring reversions indicate that changes are often revisited rather than strictly accumulated. Combining structural analysis with LLM-based inference shows that roughly a quarter to a third of rules alternate between expanding coverage and reducing false positives, reflecting ongoing operational trade-offs rather than steady convergence.
Key takeaway
If you are an AI Security Engineer maintaining log-based detection rules, do not expect rule evolution to follow a linear path to stability. Rules are likely to oscillate between expanding coverage and reducing false positives, often revisiting prior logic. Track semantic changes, not just syntactic ones, with tooling built on a canonical representation such as PGIR. Doing so helps surface unresolved design tensions and supports better-informed rule refinement.
Key insights
Log-based detection rules evolve non-monotonically, balancing coverage and false positives through iterative, often conflicting, revisions.
Principles
- Rule evolution reflects ongoing operational trade-offs.
- Structural changes are often coordinated, not isolated.
- Non-monotonic evolution is the dominant pattern.
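To make the non-monotonic pattern concrete, here is a minimal Python sketch that labels a rule history and flags reversions. It assumes each rule version has already been reduced to a set of clause identifiers. The function names, the label strings, and the definition of a reversion (a version whose clause set matches an earlier version after at least one different version in between) are illustrative assumptions, not the study's definitions.

```python
from typing import FrozenSet, List, Sequence, Tuple

Version = FrozenSet[str]  # hypothetical: one rule version as a set of clause ids


def classify_history(versions: Sequence[Version]) -> str:
    """Label a history by whether clauses were added, removed, or both."""
    added = removed = False
    for prev, cur in zip(versions, versions[1:]):
        added = added or bool(cur - prev)      # some clause appeared
        removed = removed or bool(prev - cur)  # some clause disappeared
    if added and removed:
        return "non-monotonic"
    if added:
        return "additions-only"
    if removed:
        return "removals-only"
    return "unchanged"


def find_reversions(versions: Sequence[Version]) -> List[Tuple[int, int]]:
    """Return (earlier_index, later_index) pairs where a clause set reappears."""
    first_seen = {}
    reversions = []
    for i, version in enumerate(versions):
        # A repeat only counts if the immediately preceding version differed.
        if version in first_seen and versions[i - 1] != version:
            reversions.append((first_seen[version], i))
        first_seen.setdefault(version, i)
    return reversions


history = [
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),  # clause c added
    frozenset({"a", "b"}),       # clause c removed again: back to version 0
    frozenset({"a", "b", "d"}),
]
label = classify_history(history)
reverts = find_reversions(history)
```

In this toy history the rule both gains and loses clauses, so it falls into the non-monotonic bucket, and the third version reverts to the first.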
Method
A predicate graph intermediate representation (PGIR) canonicalizes rule logic, enabling semantic comparison via a four-phase tree alignment algorithm and LLM-based classification of operational intent.
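The idea of comparing rules by logic rather than syntax can be illustrated with a much simpler stand-in than PGIR. The sketch below is not the paper's representation or its four-phase alignment algorithm. It assumes a rule is a small boolean tree of field predicates, and the `Pred` and `Node` types, `canonicalize`, and `diff_clauses` are all hypothetical names. Canonicalization flattens nested AND/OR nodes, removes duplicates, and sorts children, so reordering or regrouping clauses does not register as a change.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Pred:
    field: str
    op: str
    value: str


@dataclass(frozen=True)
class Node:
    kind: str  # "and", "or", or "not"
    children: Tuple["Expr", ...]


Expr = Union[Pred, Node]


def canonicalize(expr: Expr) -> Expr:
    """Flatten nested and/or nodes, drop duplicates, and sort children."""
    if isinstance(expr, Pred):
        return expr
    kids = [canonicalize(child) for child in expr.children]
    if expr.kind in ("and", "or"):
        flat = []
        for kid in kids:
            if isinstance(kid, Node) and kid.kind == expr.kind:
                flat.extend(kid.children)  # and(and(x, y), z) -> and(x, y, z)
            else:
                flat.append(kid)
        unique = sorted(set(flat), key=repr)  # order-insensitive and deterministic
        if len(unique) == 1:
            return unique[0]  # and(x) -> x
        return Node(expr.kind, tuple(unique))
    return Node(expr.kind, tuple(kids))


def top_level_clauses(expr: Expr) -> Set[Expr]:
    """Return the conjuncts of the canonical form, or the whole rule as one clause."""
    canon = canonicalize(expr)
    if isinstance(canon, Node) and canon.kind == "and":
        return set(canon.children)
    return {canon}


def diff_clauses(old: Expr, new: Expr) -> Tuple[Set[Expr], Set[Expr]]:
    """Return (added, removed) top-level clauses between two rule versions."""
    before, after = top_level_clauses(old), top_level_clauses(new)
    return after - before, before - after


img = Pred("Image", "endswith", "\\powershell.exe")
cmd = Pred("CommandLine", "contains", "-enc")
usr = Pred("User", "equals", "SYSTEM")

v1 = Node("and", (img, cmd))
# v2 reorders and regroups v1, then adds an exclusion for one user.
v2 = Node("and", (cmd, Node("and", (img,)), Node("not", (usr,))))
added, removed = diff_clauses(v1, v2)
```

Here the reordering and the redundant nested AND in `v2` are ignored, and only the new exclusion clause is reported as added. A set difference over top-level clauses is far coarser than a tree alignment, but it shows why a canonical form is a prerequisite for any semantic comparison.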
In practice
- Use PGIR for semantic rule comparison.
- Implement LLM-based intent classification.
- Develop tooling for canonical rule representations.
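As a rough illustration of the intent-classification step, the sketch below wraps an arbitrary text-completion callable and constrains its answer to a fixed label set. It assumes the added and removed clauses have already been extracted as strings. The label names, the prompt wording, and the `llm` callable are hypothetical and are not taken from the study, and no specific LLM API is implied.

```python
from typing import Callable, Sequence

# Hypothetical label set, loosely following the coverage vs. false-positive trade-off.
LABELS = ("expand_coverage", "reduce_false_positives", "other")


def build_prompt(added: Sequence[str], removed: Sequence[str]) -> str:
    """Describe one revision and ask for exactly one label."""
    return (
        "A detection rule was revised.\n"
        f"Added clauses: {list(added)}\n"
        f"Removed clauses: {list(removed)}\n"
        f"Answer with exactly one of: {', '.join(LABELS)}."
    )


def classify_intent(
    added: Sequence[str],
    removed: Sequence[str],
    llm: Callable[[str], str],
) -> str:
    """Ask the model for a label and fall back to 'other' on unexpected output."""
    reply = llm(build_prompt(added, removed)).strip().lower()
    return reply if reply in LABELS else "other"


def alternates(labels: Sequence[str]) -> bool:
    """True if consecutive non-'other' revisions switch between intents."""
    core = [label for label in labels if label != "other"]
    return any(a != b for a, b in zip(core, core[1:]))


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model call, used only to exercise the code path.
    return "Expand_Coverage\n"


intent = classify_intent(["Image endswith \\pwsh.exe"], [], stub_llm)
```

Injecting the model as a plain callable keeps the sketch independent of any vendor SDK and makes the parsing and fallback logic testable without network access.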
Topics
- Log-Based Detection
- Security Operations Centers
- Rule Evolution Analysis
- Predicate Graph IR
- Sigma Project
- Splunk Security Content
- LLM Intent Inference
Best for: AI Scientist, AI Security Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.