AILS-NTUA at SemEval-2026 Task 12: Graph-Based Retrieval and Reflective Prompting for Abductive Event Reasoning

2026-03-04 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Advanced, quick

Summary

AILS-NTUA's three-stage system ranked first in SemEval-2026 Task 12 (Abductive Event Reasoning) with an accuracy of 0.95. The system combines graph-based retrieval, LLM-driven abductive reasoning enhanced by reflective prompt evolution, and post-hoc consistency enforcement. Beyond the winning system, the authors ran a cross-model error analysis across 14 models from 7 families, which revealed three shared inductive biases: causal chain incompleteness, proximate cause preference, and salience bias. That these biases converge across families, marked by a 51% cause-count reduction, points to systematic rather than model-specific failure modes in multi-label causal reasoning.

Key takeaway

A three-stage system combining graph-based retrieval, LLM-driven abductive reasoning with reflective prompting, and post-hoc consistency enforcement reached 0.95 accuracy and ranked first in SemEval-2026 Task 12 (Abductive Event Reasoning). Error analysis across 14 models found three inductive biases shared across model families (causal chain incompleteness, proximate cause preference, and salience bias), with a 51% cause-count reduction marking their convergence. These are shared failure modes in multi-label causal reasoning, and addressing them is critical for developing robust abductive AI.

Topics

Abductive Event Reasoning
Large Language Models
Prompt Engineering
Graph-based Retrieval
Causal Reasoning Biases

Best for: Research Scientist, AI Researcher, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.