STMutants: A Mutation Testing Dataset for Structured Text Programs in Industrial Automation
Summary
The paper introduces STMutants, the first publicly available mutation testing dataset for IEC 61131-3 Structured Text (ST) programs, which run real-time, safety-critical industrial automation systems. The dataset fills a gap in PLC software testing research by providing a reproducible benchmark. STMutants comprises 110 first-order mutants generated from 11 ST programs drawn from the OSCAT basic library and industrial sources; 108 mutants were retained after observability and equivalence screening. The dataset covers seven mutation operator categories adapted for the PLC domain. Mutant quality is ensured by a four-phase methodology: fault-type profiling, syntactic transformation, compilability verification, and manual equivalence screening (κ=0.87). A baseline evaluation of three large language models (GPT-5.2, Gemini 2.5, Claude Sonnet 4.5) yielded mutation detection accuracies of 86.1%, 94.4%, and 86.1% respectively. The results show significant performance differences between models and expose limitations in temporal reasoning on complex programs.
Key takeaway
Machine Learning Engineers building AI-assisted quality assurance tools for industrial automation can use STMutants to benchmark model performance on IEC 61131-3 Structured Text. Current LLMs struggle with temporal reasoning in complex PLC programs, with accuracy as low as 10% on some programs, so development effort is best spent on stateful, multi-cycle fault propagation analysis. The dataset provides a baseline for advancing dependable PLC software verification.
Key insights
STMutants provides the first public mutation testing benchmark for IEC 61131-3 Structured Text, enabling reproducible research and LLM evaluation for PLC software.
Principles
- Competent Programmer Hypothesis: Real faults are typically small deviations from a correct program.
- Coupling Effect: Tests that detect simple faults also tend to detect more complex faults.
- Sufficient Mutation Operators: A reduced operator set can still achieve high effectiveness.
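To make the idea of a first-order mutant concrete, here is a minimal Python sketch that applies one relational-operator replacement at a time to an ST snippet. The snippet, the replacement table, and the function name are all hypothetical illustrations; they are not taken from the STMutants dataset and may not match the paper's operator definitions.

```python
import re

# Hypothetical ST snippet for illustration; not part of the STMutants dataset.
ST_SOURCE = """IF level >= max_level THEN
    pump := FALSE;
ELSIF level < min_level THEN
    pump := TRUE;
END_IF;"""

# Assumed relational-operator replacements; the paper's operator table may differ.
ROR_TABLE = {">=": ">", "<=": "<", "<>": "=", ">": ">=", "<": "<="}

# Two-character operators are listed first so ">=" is matched before ">".
TOKEN = re.compile(r">=|<=|<>|>|<")


def first_order_mutants(source: str) -> list[str]:
    """Return one mutant per relational operator, each with exactly one change."""
    mutants = []
    for match in TOKEN.finditer(source):
        replacement = ROR_TABLE[match.group(0)]
        # Splice the replacement into the original text at this one position only.
        mutants.append(source[:match.start()] + replacement + source[match.end():])
    return mutants


if __name__ == "__main__":
    for index, mutant in enumerate(first_order_mutants(ST_SOURCE)):
        print(f"--- mutant {index} ---")
        print(mutant)
```

Each mutant differs from the original in a single token, which is what "first-order" means here. A real pipeline would work on a parsed syntax tree rather than a regular expression, and would then check that each mutant compiles.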
Method
Mutants are constructed via a four-phase process: fault-type profiling/operator selection, syntactic transformation, compilability verification, and manual equivalence screening with strong inter-rater agreement (κ=0.87).
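The κ value reported for equivalence screening is an inter-rater agreement statistic. The sketch below assumes it is Cohen's kappa for two raters, which the summary does not state explicitly. The labels in the usage example are hypothetical and are not the paper's screening data.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical verdicts: "eq" = equivalent mutant, "neq" = non-equivalent.
    rater_a = ["eq", "eq"] + ["neq"] * 8
    rater_b = ["eq", "neq"] + ["neq"] * 8
    print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (here, non-equivalent) dominates.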
In practice
- Evaluate automated test generation techniques for ST programs.
- Benchmark AI models and hybrid analysis systems for PLC software.
- Research fault localization methods using known fault locations.
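For benchmarking, detection accuracy is the share of retained mutants that a model flags correctly. The Python sketch below shows the arithmetic. The total of 108 comes from the summary; the per-model detected counts are back-calculated from the reported percentages and are an assumption, not figures quoted from the paper.

```python
def detection_accuracy(detected: int, total: int) -> float:
    """Percentage of mutants detected out of all retained mutants."""
    if total <= 0 or not 0 <= detected <= total:
        raise ValueError("need 0 <= detected <= total and total > 0")
    return 100.0 * detected / total


TOTAL_MUTANTS = 108  # retained mutants, as stated in the summary

# Assumed counts, inferred from the reported 86.1% / 94.4% / 86.1% figures.
ASSUMED_DETECTED = {
    "GPT-5.2": 93,
    "Gemini 2.5": 102,
    "Claude Sonnet 4.5": 93,
}

if __name__ == "__main__":
    for model, detected in ASSUMED_DETECTED.items():
        accuracy = detection_accuracy(detected, TOTAL_MUTANTS)
        print(f"{model}: {detected}/{TOTAL_MUTANTS} = {accuracy:.1f}%")
```

The same function can be applied per program to surface the low-accuracy cases, such as the complex programs where temporal reasoning fails.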
Topics
- Mutation Testing
- Structured Text
- Programmable Logic Controllers
- Industrial Automation
- Large Language Models
- Test Suite Generation
- Fault Localization
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.