From Database to Triple Store with SHACL
Summary
The article introduces Shaclify, an open-source, AI-driven project that uses SHACL (Shapes Constraint Language) as a bridge for migrating data from SQL databases, specifically SQL Server, into a triple store such as Apache Jena Fuseki. The process starts by exporting the database's Data Definition Language (DDL) and its tables as CSV files. An LLM, such as Claude, then generates a SHACL 1.2 schema from the DDL, mapping data types and constraints to SHACL and foreign key associations to IRIs. This SHACL file is subsequently used to generate TARQL transformations, which convert the CSV data into RDF triples in Turtle format. A test HR database with 10 tables and 210 records was converted, generating 2,221 triples with 99.1% SHACL compliance and demonstrating handling of NULL values, foreign keys, and datatype casting.
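To make the DDL-to-SHACL mapping concrete, here is a minimal sketch of how one table's columns might be rendered as a SHACL node shape. It is not Shaclify's implementation (the article has an LLM produce the SHACL). The `ex:` namespace, the `SQL_TO_XSD` table, the column dictionaries, and both function names are assumptions made for illustration, and only core SHACL terms are used.

```python
# Hypothetical namespace and prefixes; a real project would choose its own.
PREFIXES = (
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/hr/> .\n"
)

# Assumed mapping from a few SQL Server column types to XSD datatypes.
SQL_TO_XSD = {
    "int": "xsd:integer",
    "bigint": "xsd:integer",
    "varchar": "xsd:string",
    "nvarchar": "xsd:string",
    "date": "xsd:date",
    "datetime": "xsd:dateTime",
    "decimal": "xsd:decimal",
    "bit": "xsd:boolean",
}


def column_to_property_shape(column):
    """Render one column description as a SHACL property shape (Turtle)."""
    lines = [f"    sh:path ex:{column['name']} ;"]
    if column.get("references"):
        # Foreign key: the value is an IRI of the referenced table's class.
        lines.append("    sh:nodeKind sh:IRI ;")
        lines.append(f"    sh:class ex:{column['references']} ;")
    else:
        lines.append(f"    sh:datatype {SQL_TO_XSD[column['sql_type']]} ;")
    if not column.get("nullable", True):
        # NOT NULL becomes a minimum cardinality of one.
        lines.append("    sh:minCount 1 ;")
    lines.append("    sh:maxCount 1 ;")
    return "  sh:property [\n" + "\n".join(lines) + "\n  ] ;"


def table_to_node_shape(table, columns):
    """Render a table as a SHACL node shape targeting class ex:<table>."""
    body = "\n".join(column_to_property_shape(c) for c in columns)
    return (
        f"{PREFIXES}\n"
        f"ex:{table}Shape a sh:NodeShape ;\n"
        f"  sh:targetClass ex:{table} ;\n"
        f"{body}\n."
    )
```

Calling `table_to_node_shape("Employee", [...])` with a NOT NULL `int` key and a nullable foreign key to `Department` yields one property shape with `sh:datatype xsd:integer` and `sh:minCount 1`, and one with `sh:class ex:Department` and no minimum count.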
Key takeaway
For AI engineers or data scientists tasked with migrating relational data to a knowledge graph, consider the SHACL-driven approach demonstrated by Shaclify. It reduces the manual effort and time typically spent on schema mapping and data transformation by using an LLM for the initial SHACL generation and TARQL for the CSV-to-RDF conversion. The generated SHACL can then be used to create SPARQL queries against the new knowledge graph.
Key insights
SHACL serves as an effective schema abstraction layer for AI-driven data migration from SQL to knowledge graphs.
Principles
- SHACL acts as a universal schematic bridge.
- Generate SHACL once per DDL for schema consistency.
- AI can accelerate complex data integration tasks.
Method
Export SQL DDL and data as CSV. Use an LLM to generate SHACL from DDL. Generate TARQL transformations from SHACL. Run TARQL on CSVs to produce RDF/Turtle. Load RDF into a triple store.
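The CSV-to-RDF step can be sketched in plain Python to show what the transformation has to do: skip NULL cells, turn foreign keys into IRIs, and cast remaining values to typed literals. This is a stand-in for illustration, not TARQL and not Shaclify's generated output. The `BASE` namespace, the function names, and the convention that an empty cell or the string `NULL` means SQL NULL are all assumptions.

```python
import csv
import io

BASE = "http://example.org/hr/"  # hypothetical namespace
XSD = "http://www.w3.org/2001/XMLSchema#"


def row_to_triples(row, table, key, datatypes, foreign_keys):
    """Convert one CSV row (a dict) into a list of triple lines."""
    subject = f"<{BASE}{table}/{row[key]}>"
    triples = [f"{subject} a <{BASE}{table}> ."]
    for column, value in row.items():
        # Assumed NULL convention: missing, empty, or the literal text NULL.
        if value is None or value in ("", "NULL"):
            continue  # a NULL cell produces no triple
        predicate = f"<{BASE}{column}>"
        if column in foreign_keys:
            # Foreign key: point at the IRI of the referenced row.
            obj = f"<{BASE}{foreign_keys[column]}/{value}>"
        else:
            # Datatype casting: default to xsd:string when unspecified.
            xsd_type = datatypes.get(column, "string")
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"^^<{XSD}{xsd_type}>'
        triples.append(f"{subject} {predicate} {obj} .")
    return triples


def csv_to_turtle(csv_text, table, key, datatypes, foreign_keys):
    """Convert the CSV export of one table into triple lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.extend(row_to_triples(row, table, key, datatypes, foreign_keys))
    return "\n".join(lines)
```

In this sketch a row with a NULL `managerId` simply has no `managerId` triple, which is why nullable columns need no minimum cardinality in the corresponding shape. Values containing line breaks are not escaped here and would need extra handling.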
In practice
- Use Shaclify for SQL to knowledge graph migration.
- Employ TARQL for CSV to RDF conversion.
- Generate SPARQL queries from SHACL with LLMs.
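As an illustration of the last point, a shape's target class and property paths carry enough information to build a basic SPARQL query. The sketch below does this with a deterministic template rather than an LLM, so it is a simplified stand-in for the approach the article describes. The function name, the `ex:` prefix, and the default namespace are assumptions.

```python
def shape_to_select(target_class, properties,
                    namespace="http://example.org/hr/", limit=10):
    """Build a SELECT query listing instances of a class and its properties.

    Each property is wrapped in OPTIONAL so that instances with missing
    (originally NULL) values are still returned.
    """
    variables = " ".join(f"?{p}" for p in properties)
    patterns = "\n".join(
        f"  OPTIONAL {{ ?s ex:{p} ?{p} }}" for p in properties
    )
    return (
        f"PREFIX ex: <{namespace}>\n"
        f"SELECT ?s {variables}\n"
        "WHERE {\n"
        f"  ?s a ex:{target_class} .\n"
        f"{patterns}\n"
        "}\n"
        f"LIMIT {limit}"
    )
```

For example, `shape_to_select("Employee", ["name", "departmentId"])` returns a query that selects `?s ?name ?departmentId` for every `ex:Employee`, which could then be sent to the triple store's SPARQL endpoint.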
Topics
- SQL to Knowledge Graph Migration
- SHACL (Shapes Constraint Language)
- Large Language Models
- RDF Data Transformation
- Data Integration Automation
Best for: AI Engineer, Data Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by The Ontologist.