STMutants: A Mutation Testing Dataset for Structured Text Programs in Industrial Automation

2026-06-05 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

The paper introduces STMutants, the first publicly available mutation testing dataset for IEC 61131-3 Structured Text (ST), a language used to program real-time, safety-critical industrial automation systems. The dataset fills a gap in PLC software testing research by providing a reproducible benchmark. STMutants comprises 110 first-order mutants generated from 11 ST programs drawn from the OSCAT basic library and industrial sources; 108 mutants were retained after observability and equivalence screening. The mutants span seven mutation operator categories adapted to the PLC domain. Mutant quality is ensured by a four-phase methodology: fault-type profiling, syntactic transformation, compilability verification, and manual equivalence screening (κ=0.87). In a baseline evaluation, three large language models (GPT-5.2, Gemini 2.5, Claude Sonnet 4.5) reached mutation detection accuracies of 86.1%, 94.4%, and 86.1% respectively. The results indicate significant performance differences between models and expose limitations in temporal reasoning on complex programs.
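
As a quick sanity check on the reported percentages, the sketch below back-calculates how many detections each accuracy would imply. It assumes the accuracies were measured over the 108 retained mutants, which the summary does not state explicitly; the implied counts are therefore derived estimates, not figures from the paper.

```python
# Figures taken from the summary above.
RETAINED_MUTANTS = 108
REPORTED_ACCURACY = {
    "GPT-5.2": 0.861,
    "Gemini 2.5": 0.944,
    "Claude Sonnet 4.5": 0.861,
}


def implied_detections(accuracy: float, total: int) -> int:
    """Nearest whole number of detected mutants for a reported accuracy.

    Assumption: `total` is the denominator behind the reported accuracy.
    """
    return round(accuracy * total)


implied = {
    model: implied_detections(acc, RETAINED_MUTANTS)
    for model, acc in REPORTED_ACCURACY.items()
}

for model, count in implied.items():
    # Recompute the percentage from the implied count to check consistency.
    print(f"{model}: ~{count}/{RETAINED_MUTANTS} = {count / RETAINED_MUTANTS:.1%}")
```

Under that assumption, each reported percentage corresponds to a whole number of mutants, which is consistent with a 108-mutant denominator.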

Key takeaway

If you are a machine learning engineer building AI-assisted quality assurance tools for industrial automation, use STMutants to benchmark your models on IEC 61131-3 Structured Text. Current LLMs struggle with temporal reasoning in complex PLC programs, reaching only 10% accuracy on some programs, so focus development on stateful, multi-cycle fault propagation analysis. The dataset provides a baseline for advancing dependable PLC software verification.

Key insights

STMutants provides the first public mutation testing benchmark for IEC 61131-3 Structured Text, enabling reproducible research and LLM evaluation for PLC software.

Principles

Competent Programmer Hypothesis: Real faults are small deviations from a correct program.
Coupling Effect: Tests that detect simple faults also tend to detect more complex faults.
Sufficient Mutation Operators: A reduced set of mutation operators can still achieve high effectiveness.
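
To make the "small deviation" idea concrete, here is a minimal sketch of first-order mutant generation: each mutant differs from the original in exactly one relational operator. The swap table, the regular expression, and the ST snippet are illustrative assumptions, not the operator set or tooling used to build STMutants, and the sketch ignores comments and string literals.

```python
import re

# Hypothetical relational-operator swaps (an assumption, not the paper's operators).
ROR_SWAPS = {">=": "<", "<=": ">", "<>": "=", "<": ">=", ">": "<="}

# Match relational operators only; the lookbehind skips the '>' in ST's '=>'.
RELATIONAL = re.compile(r"<=|>=|<>|<|(?<!=)>")


def first_order_mutants(source: str) -> list[str]:
    """Return one mutant per relational operator occurrence.

    Each mutant changes a single token, so it is a first-order mutant.
    """
    mutants = []
    for match in RELATIONAL.finditer(source):
        replacement = ROR_SWAPS[match.group()]
        mutants.append(source[: match.start()] + replacement + source[match.end():])
    return mutants


# Illustrative Structured Text fragment (not taken from the dataset).
ORIGINAL = """IF level >= high_limit THEN
    pump := FALSE;
ELSIF level < low_limit THEN
    pump := TRUE;
END_IF;"""

mutants = first_order_mutants(ORIGINAL)
for index, mutant in enumerate(mutants, start=1):
    print(f"--- mutant {index} ---")
    print(mutant)
```

A real pipeline would operate on a parsed syntax tree rather than raw text and would then pass each mutant through compilation and equivalence screening.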

Method

Mutants are constructed via a four-phase process: fault-type profiling/operator selection, syntactic transformation, compilability verification, and manual equivalence screening with strong inter-rater agreement (κ=0.87).
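
The inter-rater agreement figure is a Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below shows the standard two-rater computation; the labels are made-up illustrative data, not the paper's screening results, and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Illustrative labels only: "E" = equivalent mutant, "N" = non-equivalent.
rater_a = ["E", "N", "N", "N", "E", "N", "N", "N", "N", "N"]
rater_b = ["E", "N", "N", "N", "N", "N", "N", "N", "N", "N"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

In this toy example the raters agree on 9 of 10 items, yet kappa is noticeably lower than 0.9 because most of that agreement is expected by chance when one label dominates.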

In practice

Evaluate automated test generation techniques for ST programs.
Benchmark AI models and hybrid analysis systems for PLC software.
Research fault localization methods using known fault locations.
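
For the first use case, a generated test suite is typically scored by the fraction of non-equivalent mutants it kills. The sketch below assumes a hypothetical data layout (a mapping from mutant ID to per-test outcomes) and invented mutant IDs; it is not the dataset's actual schema.

```python
def mutation_score(kill_results: dict[str, list[bool]],
                   equivalent: frozenset[str] = frozenset()) -> float:
    """Fraction of non-equivalent mutants killed by at least one test.

    kill_results maps a mutant ID to one boolean per test, True when that
    test's outcome on the mutant differs from its outcome on the original.
    """
    scored = {m: r for m, r in kill_results.items() if m not in equivalent}
    if not scored:
        raise ValueError("no non-equivalent mutants to score")
    killed = sum(1 for outcomes in scored.values() if any(outcomes))
    return killed / len(scored)


def surviving_mutants(kill_results: dict[str, list[bool]],
                      equivalent: frozenset[str] = frozenset()) -> list[str]:
    """IDs of non-equivalent mutants that no test killed."""
    return sorted(m for m, outcomes in kill_results.items()
                  if m not in equivalent and not any(outcomes))


# Hypothetical results for four mutants and two tests.
results = {
    "M1": [False, True],
    "M2": [False, False],
    "M3": [True, True],
    "M4": [False, False],
}
known_equivalent = frozenset({"M4"})  # excluded from the denominator

score = mutation_score(results, known_equivalent)
survivors = surviving_mutants(results, known_equivalent)
print(f"mutation score: {score:.2f}, survivors: {survivors}")
```

Surviving mutants point to behaviors the suite does not check, which is where a test generator would be expected to add cases.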

Topics

Mutation Testing
Structured Text
Programmable Logic Controllers
Industrial Automation
Large Language Models
Test Suite Generation
Fault Localization

Code references

nucleron/matiec

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.