FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction

2026-04-20 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

FregeLogic is a hybrid neuro-symbolic system developed for SemEval-2026 Task 11 (Subtask 1) that predicts syllogistic validity while minimizing content effects. It combines an ensemble of five Large Language Model (LLM) classifiers, including Llama 4 Maverick, Llama 4 Scout, and Qwen3-32B, with the Z3 SMT solver. The ensemble members use varied prompting strategies, and Z3 serves as a formal-logic tiebreaker when they disagree, because disagreement can indicate content-biased errors. Evaluated with nested 5-fold cross-validation on a dataset of 960 instances, FregeLogic achieved 94.3% accuracy, a content effect of 2.85, and a combined score of 41.88. Compared with a pure LLM ensemble, this is a 2.76-point improvement in combined score and a 16% reduction in content effect.

Key takeaway

For research scientists building robust logical reasoning systems, FregeLogic shows that integrating formal methods such as SMT solvers can reduce content effects (by 16% relative to a pure LLM ensemble in this evaluation) and raise the combined score. Consider targeted neuro-symbolic designs that defer to formal verification when an LLM ensemble disagrees, to make models more reliable on tasks that require logical judgment.

Key insights

Hybrid neuro-symbolic systems can enhance syllogistic validity prediction by mitigating content bias.

Principles

LLM disagreement can signal content-biased errors.
Formal methods improve accuracy where ensemble consensus is low.
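The two principles amount to a routing rule: trust the ensemble when it agrees and hand the case to the solver when it does not. The sketch below is a hypothetical illustration of that rule, not FregeLogic's code. It assumes that any non-unanimous vote counts as disagreement, that the solver is passed in as a callable returning None when a syllogism cannot be formalized, and that a strict majority vote is used as a fallback in that case; the function name `hybrid_predict` and the fallback policy are illustrative assumptions.

```python
from typing import Callable, Optional, Sequence


def hybrid_predict(
    llm_votes: Sequence[bool],
    solver: Callable[[], Optional[bool]],
) -> tuple[bool, str]:
    """Return (is_valid, source) for one syllogism.

    Assumed rule: a unanimous ensemble is trusted as-is; any split is
    treated as a possible content-bias signal and the formal solver decides.
    `solver` is a callable so it only runs on disagreement; it returns None
    if the syllogism could not be formalized (an assumed failure mode).
    """
    if not llm_votes:
        raise ValueError("need at least one LLM vote")
    if all(v == llm_votes[0] for v in llm_votes):
        return llm_votes[0], "ensemble"
    verdict = solver()
    if verdict is not None:
        return verdict, "solver"
    # Hypothetical fallback: strict majority vote (ties count as invalid).
    return sum(llm_votes) * 2 > len(llm_votes), "majority-fallback"
```

For example, `hybrid_predict([True, True, False, True, True], solver=lambda: False)` returns `(False, "solver")`: the 4-to-1 split is enough to override the majority, which matches the idea that disagreement flags a potentially content-biased case. Passing the solver lazily keeps the expensive formal check off the path for unanimous cases.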

Method

FregeLogic combines an LLM ensemble with a Z3 SMT solver that resolves the cases where the LLMs disagree. The solver is integrated through structured-output API calls and checks validity formally, using an Aristotelian encoding with existence axioms.
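To make "Aristotelian encoding with existence axioms" concrete, the sketch below decides validity for three-term categorical syllogisms. It deliberately swaps Z3 for brute-force enumeration of Venn-diagram regions: with only three monadic predicates, a model is fully characterized by which of the 8 regions are non-empty, so checking all 2^8 patterns is exact for this fragment. The existence axioms are modeled here as "every term denotes at least one object", which is the usual reading of Aristotelian existential import; the paper's exact Z3 encoding is not reproduced, and the names `holds` and `is_valid` are hypothetical.

```python
from itertools import product

TERMS = ("S", "M", "P")
# Each region of a 3-set Venn diagram is a membership pattern, e.g.
# (True, False, True) means "in S, not in M, in P".
REGIONS = list(product((False, True), repeat=3))


def holds(prop, occupied):
    """Evaluate a categorical proposition on a list of non-empty regions.

    prop is (form, x, y) with form in "AEIO" and x, y term names.
    """
    form, x, y = prop
    i, j = TERMS.index(x), TERMS.index(y)
    if form == "A":  # All x are y: no occupied region inside x but outside y
        return not any(r[i] and not r[j] for r in occupied)
    if form == "E":  # No x are y: no occupied region inside both
        return not any(r[i] and r[j] for r in occupied)
    if form == "I":  # Some x are y
        return any(r[i] and r[j] for r in occupied)
    if form == "O":  # Some x are not y
        return any(r[i] and not r[j] for r in occupied)
    raise ValueError(f"unknown form {form!r}")


def is_valid(premises, conclusion, existential_import=True):
    """Check validity by enumerating every set of non-empty regions.

    existential_import=True adds the assumed existence axioms: each of
    S, M and P is non-empty (the Aristotelian reading).
    """
    for mask in range(1 << len(REGIONS)):
        occupied = [r for k, r in enumerate(REGIONS) if mask >> k & 1]
        if existential_import and not all(
            any(r[i] for r in occupied) for i in range(len(TERMS))
        ):
            continue  # model violates an existence axiom
        if all(holds(p, occupied) for p in premises) and not holds(conclusion, occupied):
            return False  # counter-model found
    return True
```

The existence axioms matter: Darapti ("All M are P; all M are S; therefore some S are P") is valid with `existential_import=True` but invalid without it, because an empty M satisfies both premises vacuously. An SMT encoding expresses the same check as "premises plus negated conclusion is unsatisfiable"; enumeration is simply easier to show without a solver dependency.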

In practice

Use the Z3 SMT solver for formal verification.
Employ structured-output API calls for solver integration.
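The summary does not give FregeLogic's output format, so the following is a hypothetical sketch of how structured-output calls could feed a solver. It defines an illustrative JSON Schema (field names `premises`, `conclusion`, `form`, `subject`, `predicate` are assumptions) that would be passed to an LLM provider's structured-output option, then re-validates the returned JSON before anything reaches the solver. The provider call itself is omitted because its API varies by vendor.

```python
import json
from dataclasses import dataclass

# Hypothetical JSON Schema for a provider's structured-output mode;
# the field names are illustrative, not taken from the paper.
PROP = {
    "type": "object",
    "properties": {
        "form": {"type": "string", "enum": ["A", "E", "I", "O"]},
        "subject": {"type": "string"},
        "predicate": {"type": "string"},
    },
    "required": ["form", "subject", "predicate"],
}
SYLLOGISM_SCHEMA = {
    "type": "object",
    "properties": {
        "premises": {"type": "array", "items": PROP, "minItems": 2, "maxItems": 2},
        "conclusion": PROP,
    },
    "required": ["premises", "conclusion"],
}


@dataclass(frozen=True)
class Proposition:
    form: str       # "A", "E", "I" or "O"
    subject: str    # normalized term text
    predicate: str


def _to_prop(d: dict) -> Proposition:
    if d["form"] not in {"A", "E", "I", "O"}:
        raise ValueError(f"unknown form {d['form']!r}")
    # Strip and lower-case so "Dogs" and "dogs " count as the same term.
    return Proposition(d["form"], d["subject"].strip().lower(), d["predicate"].strip().lower())


def parse_reply(raw: str) -> tuple[list[Proposition], Proposition]:
    """Re-validate a structured-output reply before it reaches the solver.

    Raises ValueError on malformed JSON, missing or mistyped fields, or a
    reply that is not a two-premise, three-term syllogism.
    """
    try:
        data = json.loads(raw)  # json.JSONDecodeError subclasses ValueError
        premises = [_to_prop(p) for p in data["premises"]]
        conclusion = _to_prop(data["conclusion"])
    except (KeyError, TypeError, AttributeError) as exc:
        raise ValueError(f"malformed reply: {exc!r}") from exc
    terms = {t for p in [*premises, conclusion] for t in (p.subject, p.predicate)}
    if len(premises) != 2 or len(terms) != 3:
        raise ValueError("not a two-premise, three-term syllogism")
    return premises, conclusion
```

Even when the provider enforces the schema, the defensive re-check is cheap and catches replies that are well-formed JSON but not a usable syllogism (for example, four distinct terms). A ValueError here is a natural point to skip the solver and fall back to the ensemble's answer.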

Topics

FregeLogic
Neuro-Symbolic Architecture
Syllogistic Validity Prediction
SemEval-2026 Task 11
Large Language Models

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.