Retrieval Augmented (Knowledge Graph), and Large Language Model-Driven Design Structure Matrix (DSM) Generation of Cyber-Physical Systems
Summary
This research explores the use of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and Graph-based RAG (GraphRAG) for automating the generation of Design Structure Matrices (DSMs). The study evaluates these methods on two use cases, a power screwdriver and a CubeSat, both of which have established reference architectures. Performance is measured on two tasks: determining the relationships between predefined components, and the harder task of identifying both the components and their relationships. The evaluation uses cell-level metrics (accuracy, precision, recall, and F1-score) alongside global graph-based metrics (edit distance and spectral distance). The findings indicate that model architecture and careful prompt design often influence performance more than model size alone, with specific RAG and GraphRAG configurations showing notable gains. All code is publicly available for reproducibility and expert feedback.
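To make the cell-level metrics concrete, here is a minimal sketch of how a generated DSM could be scored against a reference DSM. It assumes binary matrices over the same ordered component list and ignores the diagonal; both are illustrative assumptions, not details confirmed by the summary. The function name `cell_metrics` and the example matrices are hypothetical.

```python
import numpy as np

def cell_metrics(reference, generated):
    """Compare two binary DSMs cell by cell (assumption: diagonal is ignored)."""
    ref = np.asarray(reference, dtype=bool)
    gen = np.asarray(generated, dtype=bool)
    if ref.ndim != 2 or ref.shape[0] != ref.shape[1] or ref.shape != gen.shape:
        raise ValueError("DSMs must be square and of the same size")

    # Keep only off-diagonal cells: a component's relation to itself is not scored.
    off_diag = ~np.eye(ref.shape[0], dtype=bool)
    r, g = ref[off_diag], gen[off_diag]

    tp = int(np.sum(r & g))     # relationship present in both
    fp = int(np.sum(~r & g))    # predicted but absent from the reference
    fn = int(np.sum(r & ~g))    # in the reference but missed
    tn = int(np.sum(~r & ~g))   # absent in both

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical 3-component example (not data from the study).
reference_dsm = [[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]]
generated_dsm = [[0, 1, 1],
                 [1, 0, 0],
                 [0, 1, 0]]
scores = cell_metrics(reference_dsm, generated_dsm)
```

In this toy example the generated matrix has one spurious and one missing relationship out of six off-diagonal cells, so precision, recall, and F1 all come out to 0.75.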
Key takeaway
For AI scientists developing automated system architecture tools: model architecture and precise prompt engineering often matter more than sheer model size for DSM generation. Curate reference documents for RAG and GraphRAG carefully, because simply adding more data does not guarantee improved accuracy. Matching the model to the relationship type, for example mixtral:8x22b for physical component interactions and llama3.3:70b for abstract system-level relationships, can improve performance and reduce computational overhead.
Key insights
LLMs, especially with RAG and GraphRAG, can automate Design Structure Matrix generation for complex systems.
Principles
- Model architecture can outweigh parameter count in relationship classification.
- Careful prompt design is critical for LLM-based architectural generation.
- Aggregating all references does not consistently improve RAG performance.
Method
The method is a three-step procedure: (1) prepare the references and design configuration, (2) process them with LLM, RAG, and GraphRAG variants, and (3) analyze the results using cell-level and graph-based metrics. The procedure also includes semi-automated document classification and prompt engineering.
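The graph-based metrics in the analysis step can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's implementation: the edit distance here counts differing off-diagonal cells (edge insertions and deletions over a fixed component set), and the spectral distance is the Euclidean distance between the sorted adjacency eigenvalues of the symmetrized matrices. The function names are hypothetical.

```python
import numpy as np

def edge_edit_distance(reference, generated):
    """Count edge insertions/deletions needed to turn one DSM into the other.

    Assumption: both DSMs share the same ordered component list, so only
    edges (off-diagonal cells) can differ.
    """
    ref = np.asarray(reference, dtype=bool)
    gen = np.asarray(generated, dtype=bool)
    if ref.shape != gen.shape:
        raise ValueError("DSMs must have the same shape")
    off_diag = ~np.eye(ref.shape[0], dtype=bool)
    return int(np.sum((ref ^ gen) & off_diag))

def _spectrum(matrix):
    """Sorted eigenvalues of the DSM treated as an undirected graph."""
    a = np.asarray(matrix, dtype=float)
    sym = np.maximum(a, a.T)      # assumption: an edge in either direction counts
    np.fill_diagonal(sym, 0.0)    # drop self-relations
    return np.sort(np.linalg.eigvalsh(sym))

def spectral_distance(reference, generated):
    """Euclidean distance between the two adjacency spectra."""
    ref_spec, gen_spec = _spectrum(reference), _spectrum(generated)
    if ref_spec.shape != gen_spec.shape:
        raise ValueError("DSMs must have the same number of components")
    return float(np.linalg.norm(ref_spec - gen_spec))
```

Unlike the cell-level scores, these values summarize how far the overall structure of the generated architecture is from the reference; two matrices with the same number of wrong cells can still differ in spectral distance.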
In practice
- Use mixtral:8x22b for spatial reasoning tasks in DSM generation.
- Employ llama3.3:70b for high-level, abstract whole-part relationships.
- Tune RAG reference selection to avoid irrelevant or conflicting information.
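The model recommendations above could be wired into a DSM generation loop roughly as follows. This is a sketch under loud assumptions: `ask_model` is a hypothetical callable you supply (it takes a model tag and a prompt and returns the model's text reply), the relationship-type keys and the prompt wording are illustrative, and the pairwise yes/no querying scheme is not confirmed to be the study's prompt design.

```python
from itertools import permutations

# Model tags are the ones named in the text; the keys are illustrative labels.
MODEL_FOR_RELATION = {
    "spatial": "mixtral:8x22b",
    "whole_part": "llama3.3:70b",
}

def build_prompt(source, target, relation):
    """Hypothetical prompt template asking for a one-word answer."""
    return (
        f"Relationship type: {relation}\n"
        f"Does component '{source}' have this relationship "
        f"with component '{target}'?\n"
        "Answer with exactly one word: yes or no."
    )

def generate_dsm(components, relation, ask_model):
    """Fill a binary DSM by querying one ordered component pair at a time.

    ask_model(model_tag, prompt) -> str is an assumed interface; plug in
    whatever client you use to reach the model.
    """
    model = MODEL_FOR_RELATION[relation]
    index = {name: i for i, name in enumerate(components)}
    dsm = [[0] * len(components) for _ in components]
    for source, target in permutations(components, 2):
        answer = ask_model(model, build_prompt(source, target, relation))
        if answer.strip().lower().startswith("yes"):
            dsm[index[source]][index[target]] = 1
    return dsm
```

Keeping the model call behind a plain callable makes it straightforward to swap in a RAG or GraphRAG variant that prepends retrieved reference text to the prompt, and to test the loop with a stub instead of a live model.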
Topics
- Large Language Models
- Retrieval-Augmented Generation
- Knowledge Graphs
- Design Structure Matrix
- Cyber-Physical Systems
Best for: AI Scientist, AI Researcher, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.