Cluster-Aware Dual-Level Test Specification Generation for Large-Scale Automotive Software Requirements
Summary
A "Cluster-then-Summarize" pipeline automates test specification generation for large-scale automotive software requirements. It targets the manual effort and poor scalability of achieving compliance with Automotive SPICE SWE.6 and ISO 26262. The pipeline has three stages. First, it embeds requirements with the all-MiniLM-L6-v2 sentence transformer, reduces their dimensionality with UMAP, and clusters them with HDBSCAN, selecting "min_cluster_size" adaptively from Silhouette and Calinski–Harabasz scores. Second, a multi-level map-reduce summarization algorithm (batch size 10, merge factor 3) distills each cluster into a concise, domain-conformant description that preserves quantitative thresholds and safety integrity levels. Third, it generates dual-level test specifications, covering both individual requirement verification and cluster-level integration tests, by drawing on cluster topology, nearby-cluster context, and retrieval-augmented generation (RAG) grounded in industry standards. Across seven automotive datasets, the evaluation reports improved integration test coverage, higher summarization fidelity (ROUGE-L 0.3793, BERTScore 0.8908), higher test quality, and 89.59% overall faithfulness.
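The adaptive "min_cluster_size" step can be read as a small model-selection loop. The summary does not say how the two scores are combined, so this minimal sketch assumes that each candidate size has already been clustered (for example by HDBSCAN on UMAP output) into integer labels with noise marked -1, that only non-noise points are scored, and that the winner is the candidate with the best summed rank across Silhouette and Calinski–Harabasz. All function names are hypothetical; the scores are written in plain Python so the arithmetic is visible, whereas scikit-learn's `silhouette_score` and `calinski_harabasz_score` would normally do this job.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def _group(points: List[Point], labels: List[int]) -> Dict[int, List[Point]]:
    """Group points by cluster label, dropping HDBSCAN-style noise (-1)."""
    clusters: Dict[int, List[Point]] = {}
    for p, label in zip(points, labels):
        if label != -1:
            clusters.setdefault(label, []).append(p)
    return clusters


def _centroid(pts: List[Point]) -> List[float]:
    return [sum(coord) / len(pts) for coord in zip(*pts)]


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient over non-noise points (-inf if < 2 clusters)."""
    clusters = _group(points, labels)
    if len(clusters) < 2:
        return float("-inf")
    scores = []
    for label, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of p's own cluster
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: mean distance to the nearest other cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def calinski_harabasz(points: List[Point], labels: List[int]) -> float:
    """Between/within dispersion ratio over non-noise points (-inf if undefined)."""
    clusters = _group(points, labels)
    k = len(clusters)
    n = sum(len(m) for m in clusters.values())
    if k < 2 or n <= k:
        return float("-inf")
    overall = _centroid([p for m in clusters.values() for p in m])
    centroids = {c: _centroid(m) for c, m in clusters.items()}
    between = sum(len(m) * math.dist(centroids[c], overall) ** 2
                  for c, m in clusters.items())
    within = sum(math.dist(p, centroids[c]) ** 2
                 for c, m in clusters.items() for p in m)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def select_min_cluster_size(points: List[Point],
                            labelings: Dict[int, List[int]]) -> int:
    """Pick the candidate min_cluster_size with the best summed rank (assumed rule)."""
    sizes = sorted(labelings)

    def ranks(score_fn) -> Dict[int, int]:
        scores = {s: score_fn(points, labelings[s]) for s in sizes}
        order = sorted(sizes, key=lambda s: scores[s], reverse=True)
        return {s: rank for rank, s in enumerate(order)}

    sil_rank, ch_rank = ranks(silhouette), ranks(calinski_harabasz)
    # Lower rank sum is better; ties go to the smaller min_cluster_size.
    return min(sizes, key=lambda s: (sil_rank[s] + ch_rank[s], s))
```

Rank aggregation sidesteps the fact that the two scores live on very different scales (Silhouette in [-1, 1], Calinski–Harabasz unbounded); a weighted sum of normalized scores would be an equally plausible reading of the summary.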
Key takeaway
Machine Learning Engineers tasked with automating ASPICE SWE.6 compliance for large automotive software requirement sets should consider a cluster-aware, dual-level test generation pipeline. By combining semantic clustering with multi-level summarization, the approach reportedly improves integration test coverage and summarization fidelity. It addresses the scalability limits of manual test specification and aims to produce robust, traceable test specifications for safety-critical systems.
Key insights
Clustering requirements before LLM summarization and dual-level test generation significantly improves integration test coverage and summarization fidelity on large-scale automotive software requirement sets.
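Summarization fidelity is reported partly as ROUGE-L, which scores a candidate summary by its longest common subsequence (LCS) with a reference text. A minimal sketch of sentence-level ROUGE-L F1 follows; lowercased whitespace tokenization and an F1 (beta = 1) combination are assumptions, since the summary does not state the paper's exact ROUGE configuration, and the function names are illustrative.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS precision and LCS recall."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("the brake shall engage within 50 ms", "brake shall engage within 50 ms")` returns 12/13 (about 0.923): the LCS covers all 6 candidate tokens (precision 1) and 6 of the 7 reference tokens (recall 6/7). Because LCS rewards in-order overlap without requiring contiguity, it tolerates paraphrase around preserved key terms such as thresholds.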
Principles
- Cluster topology can drive multi-level test generation.
- Adaptive clustering parameters enhance scalability.
- Map-reduce summarization maintains content fidelity.
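The map-reduce summarization principle can be sketched as follows, using the batch size of 10 and merge factor of 3 given in the summary. `summarize` is a hypothetical placeholder for an LLM call; carrying a lone leftover summary up a level unchanged, instead of re-summarizing it, is a design assumption of this sketch rather than something the source specifies.

```python
from typing import Callable, List

# Hypothetical LLM wrapper: takes a list of texts, returns one summary string.
Summarizer = Callable[[List[str]], str]


def map_reduce_summarize(requirements: List[str], summarize: Summarizer,
                         batch_size: int = 10, merge_factor: int = 3) -> str:
    """Collapse one cluster's requirements into a single summary, level by level."""
    if not requirements:
        raise ValueError("cluster has no requirements")
    if batch_size < 1 or merge_factor < 2:
        raise ValueError("need batch_size >= 1 and merge_factor >= 2")

    # Map: summarize fixed-size batches of raw requirement texts.
    level = [summarize(requirements[i:i + batch_size])
             for i in range(0, len(requirements), batch_size)]

    # Reduce: merge groups of `merge_factor` partial summaries per level
    # until one remains; a lone leftover is carried up unchanged.
    while len(level) > 1:
        groups = [level[i:i + merge_factor]
                  for i in range(0, len(level), merge_factor)]
        level = [summarize(g) if len(g) > 1 else g[0] for g in groups]
    return level[0]
```

With 45 requirements, the map step makes 5 calls (four batches of 10 and one of 5), the first reduce level merges them into 2 summaries, and the second level into 1, for 8 summarizer calls in total. Capping each call at 10 requirements or 3 partial summaries bounds prompt size regardless of cluster size, which is what lets the approach scale.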
Method
Embed requirements (all-MiniLM-L6-v2), reduce dimensionality (UMAP), cluster (HDBSCAN with an automatically selected "min_cluster_size"), then apply multi-level map-reduce summarization to each cluster. Generate individual requirement tests and cluster-level integration tests using RAG and nearby-cluster context.
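The dual-level generation step with nearby-cluster context can be illustrated as prompt assembly. This is a sketch under stated assumptions, not the paper's prompt design: neighbouring clusters are assumed to be ranked by cosine similarity between cluster centroids in embedding space, `standard_snippets` stands in for passages a RAG retriever would return from the standards corpus, and all function names and prompt wording are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def nearby_clusters(centroids: Dict[int, List[float]], cid: int, k: int = 2) -> List[int]:
    """The k clusters whose centroids are most cosine-similar to cluster `cid`."""
    others = [c for c in centroids if c != cid]
    others.sort(key=lambda c: cosine(centroids[cid], centroids[c]), reverse=True)
    return others[:k]


def build_prompts(clusters: Dict[int, List[str]], texts: Dict[str, str],
                  embeddings: Dict[str, Vector], summaries: Dict[int, str],
                  standard_snippets: List[str], k: int = 2) -> List[Tuple[str, int, str]]:
    """Return (level, cluster_id, prompt) triples for both test levels."""
    cents = {c: centroid([embeddings[r] for r in reqs]) for c, reqs in clusters.items()}
    grounding = "\n".join(standard_snippets)
    prompts = []
    for cid, reqs in clusters.items():
        neighbours = "\n".join(f"- Cluster {n}: {summaries[n]}"
                               for n in nearby_clusters(cents, cid, k))
        header = (f"Standards context:\n{grounding}\n"
                  f"Cluster summary: {summaries[cid]}\n"
                  f"Nearby clusters:\n{neighbours}\n")
        # Level 1: one verification test per requirement.
        for r in reqs:
            prompts.append(("individual", cid,
                            header + f"Write a test specification verifying {r}: {texts[r]}"))
        # Level 2: one integration test spanning the whole cluster.
        listed = "\n".join(f"- {r}: {texts[r]}" for r in reqs)
        prompts.append(("integration", cid,
                        header + f"Write an integration test covering:\n{listed}"))
    return prompts
```

Ranking neighbours by centroid similarity is one simple way to expose cluster topology to the model; the paper may define "nearby" differently, for instance by distance in the reduced UMAP space.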
In practice
- Use UMAP+HDBSCAN for scalable requirement clustering.
- Implement map-reduce for LLM summarization of large text sets.
- Inject cluster context for improved LLM test generation.
Topics
- Automotive SPICE SWE.6
- Large Language Models
- Requirements Engineering
- HDBSCAN Clustering
- Test Specification Generation
- Retrieval-Augmented Generation
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.