BLINKG: A Benchmark for LLM-Integrated Knowledge Graph Generation

2021-01-07 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

BLINKG is a new benchmark for evaluating Large Language Models (LLMs) on Knowledge Graph (KG) generation, focusing on their ability to map heterogeneous data sources to ontology terms. It addresses the lack of a standardized framework for assessing how effective LLMs are at KG construction, a task that can traditionally require six person-months of manual effort. BLINKG includes three progressively complex scenarios (Basic, Schema-aligned, Schema-distant) based on real-world use cases and supports CSV, JSON, and XML inputs. An extensive evaluation of six state-of-the-art LLMs (DeepSeek-R1, Gemini 2.5 Pro, GPT-4o, OpenAI o3, LLaMa 3.3 70B Instruct, Mixtral 8x22B Instruct) shows promising results in the simple scenarios but limited performance in the complex ones, particularly for join conditions and transformation functions. The benchmark also defines requirements for (semi)automated LLM-driven KG construction.

Key takeaway

If you are a knowledge engineer aiming to automate Knowledge Graph construction, current LLMs offer promising solutions for basic data-to-ontology mapping tasks. For complex scenarios involving schema-distant data or intricate join conditions, however, expect significant limitations. You will likely need to integrate LLMs into a human-in-the-loop workflow, combining their capabilities with symbolic reasoning or expert validation to ensure robust and semantically sound KGs.

Key insights

BLINKG benchmarks LLM capabilities in mapping diverse data to ontologies for Knowledge Graph generation.

Principles

LLMs perform well on simple, schema-aligned mapping tasks.
Complex tasks like join conditions challenge LLM reasoning.
Structured prompting improves LLM output consistency.
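
As an illustration of the structured-prompting principle, here is a minimal sketch that asks for a fixed JSON shape and rejects replies that deviate from it. It is not taken from BLINKG: the prompt wording, the key names source_field and ontology_term, and both function names are assumptions made for this example.

```python
import json

# Hypothetical key names for this sketch; BLINKG may use a different format.
REQUIRED_KEYS = {"source_field", "ontology_term"}


def build_prompt(columns: list[str], ontology_terms: list[str]) -> str:
    """Build a prompt that constrains the reply to a JSON array of mappings."""
    return (
        "Map each source field to exactly one ontology term.\n"
        f"Source fields: {', '.join(columns)}\n"
        f"Ontology terms: {', '.join(ontology_terms)}\n"
        "Reply with a JSON array only, where each item has the keys "
        '"source_field" and "ontology_term".'
    )


def parse_reply(reply: str, ontology_terms: list[str]) -> list[dict[str, str]]:
    """Parse the model reply and reject anything outside the requested shape."""
    items = json.loads(reply)  # raises ValueError on malformed JSON
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    allowed = set(ontology_terms)
    for item in items:
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError(f"unexpected item shape: {item!r}")
        if item["ontology_term"] not in allowed:
            raise ValueError(f"unknown ontology term: {item['ontology_term']!r}")
    return items
```

Rejecting off-schema replies early keeps malformed mappings out of later scoring steps; a failed parse can trigger a retry or a manual review.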

Method

BLINKG evaluates LLMs across three scenarios (Basic, Schema-aligned, Schema-distant) using Precision, Recall, and F-score, enhanced by Levenshtein and cosine similarity checks against gold standards.
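
The sketch below shows one way Precision, Recall, and F-score can be computed with a Levenshtein-based tolerance. It is an illustration under stated assumptions, not BLINKG's actual scorer: the 0.9 threshold, the greedy one-to-one matching, and all function names are assumptions, and the cosine-similarity check is omitted.

```python
from dataclasses import dataclass


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score(predicted: list[str], gold: list[str], threshold: float = 0.9) -> Scores:
    """Greedy matching: each gold item can be matched by at most one prediction."""
    unmatched_gold = list(gold)
    true_positives = 0
    for p in predicted:
        best = max(unmatched_gold, key=lambda g: similarity(p, g), default=None)
        if best is not None and similarity(p, best) >= threshold:
            true_positives += 1
            unmatched_gold.remove(best)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Scores(precision, recall, f1)
```

For example, score(["ex:name", "ex:age"], ["ex:name", "ex:birthDate"]) counts one true positive out of two predictions and two gold items, so precision, recall, and F-score are all 0.5. Lowering the threshold makes the scorer more forgiving of small spelling differences in generated terms.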

In practice

Use BLINKG to compare LLM mapping solutions.
Apply post-processing to LLM outputs for better scores.
Consider hybrid LLM+symbolic reasoning for complex KG construction.
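
The post-processing advice above can be pictured with a small cleanup step applied before scoring. This is a sketch under assumptions: the source does not say which post-processing BLINKG applies, so the two steps here (removing a wrapping Markdown code fence, collapsing whitespace) and the name clean_llm_output are hypothetical.

```python
import re

TICKS = "`" * 3  # Markdown fence marker, built here to keep the listing readable

# Matches a reply that is entirely wrapped in one Markdown code fence.
FENCE = re.compile(rf"^{TICKS}[\w-]*[ \t]*\n(.*?)\n{TICKS}\s*$", re.DOTALL)


def clean_llm_output(raw: str) -> str:
    """Strip a wrapping code fence, collapse whitespace, drop empty lines."""
    text = raw.strip()
    match = FENCE.match(text)
    if match:
        text = match.group(1)
    lines = [re.sub(r"\s+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Normalizing like this avoids penalizing a model for formatting noise that does not change the mapping itself.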

Topics

Knowledge Graph Generation
LLM Benchmarking
Ontology Mapping
Data-to-Knowledge Conversion
Semantic Alignment
Data Transformation

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.