A Pythonic Functional Approach for Semantic Data Harmonisation in the ILIAD Project
Summary
The ILIAD project, focused on creating interoperable Digital Twins of the Ocean, requires semantic data harmonisation of heterogeneous environmental data into the Ocean Information Model (OIM). Traditional approaches like RML and OTTR proved too complex for ILIAD data scientists due to their reliance on specialized syntaxes, tooling, and deep knowledge of Semantic Web standards. To address this, a Pythonic functional approach was developed, enabling data scientists to produce correct RDF through simple Python function calls. These functions are organized into low-level (exposing OWL/RDF syntax), mid-level (encapsulating ontology design patterns), and high-level (orchestrating domain-specific tasks) abstractions. A Jinja-based code generation pipeline automates the creation of thousands of low-level functions for QUDT units and quantity kinds, significantly reducing manual effort and ensuring consistency. This approach has been successfully applied in the ILIAD Aquaculture pilot to integrate and analyze environmental datasets for salmon-lice exposure detection.
Key takeaway
For AI scientists working on semantic data integration in Python-centric environments, consider a functional, template-based approach that abstracts away complex Semantic Web standards. Defining ontology design patterns as composable Python functions reduces the learning curve and fits into existing workflows. Automated code generation for repetitive low-level functions helps the library stay scalable and consistent as vocabularies evolve.
Key insights
A Pythonic functional approach simplifies semantic data harmonisation by abstracting complex Semantic Web standards into familiar Python functions.
Principles
- Encode ontology templates as executable Python functions.
- Organize functions into abstraction layers for different users.
- Automate low-level function generation from authoritative vocabularies.
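As a rough illustration of the first two principles, the sketch below encodes one ontology pattern as a Python function built on thin low-level helpers. It uses only the standard library and represents triples as plain tuples of N-Triples terms. The function names, the example IRIs, and the choice of a SOSA observation with a QUDT quantity value are assumptions made for illustration, not the ILIAD library's actual API or the OIM's actual pattern.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOSA = "http://www.w3.org/ns/sosa/"
QUDT = "http://qudt.org/schema/qudt/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


# Low level (hypothetical helpers): thin wrappers that expose RDF syntax.
def iri(value: str) -> str:
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def typed_literal(value: float, datatype: str) -> str:
    """Format a typed literal as an N-Triples term."""
    return f'"{value}"^^<{datatype}>'


# Mid level (hypothetical): one ontology design pattern as one function.
def observation(obs_iri: str, property_iri: str,
                value: float, unit_iri: str) -> Iterator[Triple]:
    """Yield the triples for an observation with a numeric result and unit."""
    obs = iri(obs_iri)
    result = iri(obs_iri + "/result")
    yield (obs, iri(RDF_TYPE), iri(SOSA + "Observation"))
    yield (obs, iri(SOSA + "observedProperty"), iri(property_iri))
    yield (obs, iri(SOSA + "hasResult"), result)
    yield (result, iri(RDF_TYPE), iri(QUDT + "QuantityValue"))
    yield (result, iri(QUDT + "numericValue"), typed_literal(value, XSD_DOUBLE))
    yield (result, iri(QUDT + "unit"), iri(unit_iri))


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialise triples as N-Triples text, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Example call with made-up example.org identifiers.
triples = list(observation(
    "http://example.org/obs/1",
    "http://example.org/property/seaTemperature",
    8.5,
    "http://qudt.org/vocab/unit/DEG_C",
))
print(to_ntriples(triples))
```

The caller supplies only domain values; the shape of the graph is fixed inside the mid-level function, which is what lets a data scientist produce well-formed RDF without writing any RDF syntax.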
Method
Develop Python libraries with low-level (RDF/OWL syntax), mid-level (ontology patterns), and high-level (domain-specific orchestration) functions. Use Jinja templates and SPARQL queries to auto-generate repetitive low-level functions.
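The generation step can be sketched roughly as follows. To stay dependency-free, this example swaps in the standard library's `string.Template` where the project uses Jinja, and a hard-coded list stands in for the rows a SPARQL query over the QUDT vocabulary would return. The `unit_` naming scheme and the helper names are assumptions for illustration only.

```python
from string import Template

# Stand-in for SPARQL query results: (unit label, unit IRI) pairs.
UNIT_ROWS = [
    ("DEG_C", "http://qudt.org/vocab/unit/DEG_C"),
    ("M", "http://qudt.org/vocab/unit/M"),
    ("M-PER-SEC", "http://qudt.org/vocab/unit/M-PER-SEC"),
]

# Template for one generated low-level function (a Jinja template would
# play the same role in the pipeline described above).
FUNCTION_TEMPLATE = Template('''
def unit_$name():
    """Return the IRI of the QUDT unit $label."""
    return "$iri"
''')


def python_name(label: str) -> str:
    """Turn a unit label such as 'M-PER-SEC' into a valid identifier part."""
    return label.lower().replace("-", "_")


def generate_module(rows) -> str:
    """Render one function definition per unit and join them into source code."""
    return "".join(
        FUNCTION_TEMPLATE.substitute(name=python_name(label), label=label, iri=unit_iri)
        for label, unit_iri in rows
    )


source = generate_module(UNIT_ROWS)

# Execute the generated source to show that it defines callable functions.
# A real pipeline would more likely write it to a .py file in the library.
namespace = {}
exec(source, namespace)
print(namespace["unit_m_per_sec"]())
```

Because the functions are derived from the vocabulary rather than written by hand, regenerating them after a vocabulary release keeps the library in step with its source.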
In practice
- Integrate semantic harmonisation directly into Python workflows.
- Use auto-generated functions for QUDT units to reduce manual coding.
- Separate concerns: ontology engineers build the low- and mid-level functions; data scientists build the high-level ones.
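The division of labour in the points above might look like the following in a data-processing script. This is a minimal sketch under stated assumptions: `temperature_observation` is a hypothetical stand-in for a mid-level function supplied by an ontology engineer, the CSV column names and example.org IRIs are invented, and triples are kept as plain string tuples rather than objects from an RDF library.

```python
import csv
import io
from typing import List, Tuple

Triple = Tuple[str, str, str]

SOSA = "http://www.w3.org/ns/sosa/"


# Hypothetical mid-level function, of the kind an ontology engineer supplies.
def temperature_observation(station: str, time: str, celsius: float) -> List[Triple]:
    """Return the triples describing one temperature reading at a station."""
    obs = f"http://example.org/obs/{station}/{time}"
    return [
        (obs, SOSA + "hasSimpleResult", str(celsius)),
        (obs, SOSA + "resultTime", time),
        (obs, SOSA + "hasFeatureOfInterest", f"http://example.org/station/{station}"),
    ]


# Hypothetical high-level function, of the kind a data scientist writes:
# it reads tabular data and delegates all graph construction downwards.
def harmonise_temperature_csv(text: str) -> List[Triple]:
    """Convert CSV text with station, time and temp_c columns into triples."""
    triples: List[Triple] = []
    for row in csv.DictReader(io.StringIO(text)):
        triples.extend(
            temperature_observation(row["station"], row["time"], float(row["temp_c"]))
        )
    return triples


SAMPLE = (
    "station,time,temp_c\n"
    "A1,2023-06-01T12:00:00Z,8.5\n"
    "A1,2023-06-01T13:00:00Z,8.7\n"
)

result = harmonise_temperature_csv(SAMPLE)
for statement in result:
    print(statement)
```

The high-level function contains no ontology knowledge beyond the name of the function it calls, so a change to the modelling pattern stays confined to the mid-level layer.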
Topics
- Semantic Data Harmonisation
- ILIAD Project
- Ocean Information Model
- Pythonic Functional Programming
- Ontology Design Patterns
Code references
- SINTEF/iliad-pythonic-harmonisation-demo
- qudt/qudt-public-repo
- RMLio/rmlmapper-java
- ILIAD-ocean-twin/OIM
- RDFLib/rdflib
Best for: AI Scientist, Data Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.