From Database to Triple Store with SHACL

· Source: The Ontologist · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, medium

Summary

The article introduces Shaclify, an open-source, AI-driven project that uses SHACL (Shapes Constraint Language) as a bridge for migrating data from SQL databases, specifically SQL Server, into a knowledge graph hosted in a triple store such as Jena Fuseki. The process starts by exporting the database's Data Definition Language (DDL) and its tables as CSV files. An LLM, such as Claude, then generates a SHACL 1.2 schema from the DDL, mapping data types, constraints, and foreign-key associations to IRIs. The SHACL file is in turn used to generate TARQL transformations, which convert the CSV data into RDF triples in Turtle format. In a test, an HR database with 10 tables and 210 records was converted into 2,221 triples with 99.1% SHACL compliance, demonstrating that the pipeline handles NULL values, foreign keys, and datatype casting.
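To make the DDL-to-SHACL step concrete, here is a minimal Python sketch of the kind of output involved. It is not Shaclify's method (the article uses an LLM for this step); it is a deterministic stand-in under stated assumptions: the `ex:` namespace, the `SQL_TO_XSD` table, the column dictionaries, and the function names are all hypothetical, and only long-standing SHACL core terms (`sh:datatype`, `sh:class`, `sh:nodeKind`, `sh:minCount`, `sh:maxCount`) are used rather than anything specific to SHACL 1.2.

```python
# Prefix declarations to prepend to the generated Turtle (ex: is a made-up namespace).
PREFIXES = (
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/hr/> .\n"
)

# Hypothetical SQL Server type -> XSD datatype mapping (illustrative, not exhaustive).
SQL_TO_XSD = {
    "int": "xsd:integer",
    "nvarchar": "xsd:string",
    "date": "xsd:date",
    "decimal": "xsd:decimal",
    "bit": "xsd:boolean",
}


def property_shape(name, sql_type, nullable, fk_table=None):
    """Render one sh:property block for a single column."""
    lines = [f"        sh:path ex:{name} ;"]
    if fk_table:
        # A foreign key becomes a link to another node (an IRI), not a literal.
        lines.append("        sh:nodeKind sh:IRI ;")
        lines.append(f"        sh:class ex:{fk_table} ;")
    else:
        lines.append(f"        sh:datatype {SQL_TO_XSD[sql_type]} ;")
    if not nullable:
        # NOT NULL in the DDL maps to a minimum cardinality of 1.
        lines.append("        sh:minCount 1 ;")
    lines.append("        sh:maxCount 1 ;")
    return "    sh:property [\n" + "\n".join(lines) + "\n    ] ;"


def node_shape(table, columns):
    """Render a sh:NodeShape for one table from its column descriptions."""
    header = f"ex:{table}Shape a sh:NodeShape ;\n    sh:targetClass ex:{table} ;"
    body = "\n".join(property_shape(**column) for column in columns)
    # Replace the final ';' with '.' to terminate the Turtle statement.
    return header + "\n" + body.rstrip(" ;") + " .\n"


columns = [
    {"name": "employee_id", "sql_type": "int", "nullable": False},
    {"name": "department", "sql_type": "int", "nullable": True, "fk_table": "Department"},
]
print(PREFIXES + "\n" + node_shape("Employee", columns))
```

In this sketch a foreign-key column is constrained with `sh:class` and `sh:nodeKind sh:IRI` instead of `sh:datatype`, which is one plausible reading of "mapping foreign key associations to IRIs".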

Key takeaway

AI engineers and data scientists tasked with migrating relational database data to a knowledge graph should consider the SHACL-driven approach demonstrated by Shaclify. It reduces the manual effort and time typically spent on schema mapping and data transformation by using an LLM for the initial SHACL generation and TARQL for scalable CSV-to-RDF conversion. The generated SHACL can then be used to create SPARQL queries against the new knowledge graph.

Key insights

SHACL serves as an effective schema abstraction layer for AI-driven data migration from SQL to knowledge graphs.

Method

Export SQL DDL and data as CSV. Use an LLM to generate SHACL from DDL. Generate TARQL transformations from SHACL. Run TARQL on CSVs to produce RDF/Turtle. Load RDF into a triple store.
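The CSV-to-RDF step can be illustrated with a small Python sketch. This is a stand-in for what a TARQL transformation does, not TARQL itself and not Shaclify's generated output. Everything named here is an assumption for illustration: the `BASE` namespace, the `PLAN` column table, the sample rows, and the convention that an empty cell or the literal text `NULL` marks a missing value.

```python
import csv
import io

BASE = "http://example.org/hr/"  # hypothetical namespace
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical column plan: column -> ("literal", XSD type) or ("iri", target class).
PLAN = {
    "name": ("literal", "string"),
    "salary": ("literal", "decimal"),
    "hire_date": ("literal", "date"),
    "department_id": ("iri", "Department"),
}


def escape(value):
    """Escape characters that are not allowed raw inside a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def row_to_turtle(row, table="Employee", key="employee_id"):
    """Turn one CSV row into a list of Turtle triple statements."""
    subject = f"<{BASE}{table}/{row[key]}>"
    lines = [f"{subject} a <{BASE}{table}> ."]
    for column, (kind, target) in PLAN.items():
        value = (row.get(column) or "").strip()
        if value == "" or value.upper() == "NULL":
            continue  # missing value: emit no triple at all
        predicate = f"<{BASE}{column}>"
        if kind == "iri":
            # Foreign key: point at the referenced row's IRI.
            obj = f"<{BASE}{target}/{value}>"
        else:
            # Plain column: typed literal (datatype casting).
            obj = f'"{escape(value)}"^^<{XSD}{target}>'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines


def csv_to_turtle(text):
    """Convert CSV text (with a header row) into Turtle triple statements."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        triples.extend(row_to_turtle(row))
    return triples


SAMPLE = """employee_id,name,salary,hire_date,department_id
1,Ada Lovelace,5200.00,2021-03-15,10
2,Alan Turing,NULL,2020-01-06,
"""

triples = csv_to_turtle(SAMPLE)
print("\n".join(triples))
```

The three behaviors the summary highlights each appear once: missing values produce no triple, the foreign key becomes an IRI pointing at the referenced row, and the remaining columns are cast to typed literals.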

Best for: AI Engineer, Data Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by The Ontologist.