FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction
Summary
FregeLogic is a hybrid neuro-symbolic system developed for SemEval-2026 Task 11 (Subtask 1), which asks systems to predict syllogistic validity while minimizing content effects. It pairs an ensemble of five Large Language Model (LLM) classifiers, including Llama 4 Maverick, Llama 4 Scout, and Qwen3-32B, with a Z3 SMT solver. The ensemble members use varied prompting strategies. When they disagree, which may indicate a content-biased error, the Z3 solver acts as a formal-logic tiebreaker. Evaluated with nested 5-fold cross-validation on a dataset of 960 instances, FregeLogic achieved 94.3% accuracy, a content effect of 2.85, and a combined score of 41.88. Compared with a pure LLM ensemble, this is a 2.76-point improvement in combined score and a 16% reduction in content effect.
Key takeaway
For research scientists developing robust logical reasoning systems, FregeLogic demonstrates that integrating formal methods such as SMT solvers can reduce content effects (by 16% here) and raise the combined score (by 2.76 points) relative to a pure LLM ensemble. Consider targeted neuro-symbolic approaches, in particular deferring to formal verification when an LLM ensemble disagrees, to make your models more reliable on tasks that require logical judgment.
Key insights
Hybrid neuro-symbolic systems can enhance syllogistic validity prediction by mitigating content bias.
Principles
- LLM disagreement can signal content-biased errors.
- Formal methods improve accuracy where ensemble consensus is low.
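The two principles above can be read as a routing rule: trust the ensemble when it agrees, and consult the formal check only when it does not. The following is a minimal sketch of that idea, not the authors' implementation. The function name `predict_validity`, the reading of "disagreement" as any non-unanimous vote, and the majority-vote fallback when the solver returns no verdict are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def predict_validity(
    votes: Sequence[bool],
    solver_verdict: Callable[[], Optional[bool]],
) -> bool:
    """Return a validity label from ensemble votes, deferring to a solver on disagreement.

    `votes` holds one True/False (valid/invalid) label per LLM classifier.
    `solver_verdict` is a hypothetical callback that runs the formal check and
    returns True, False, or None when no verdict could be produced.
    """
    if not votes:
        raise ValueError("at least one vote is required")

    counts = Counter(votes)
    if len(counts) == 1:
        # Unanimous ensemble: accept its label without calling the solver.
        return votes[0]

    # Assumption: any non-unanimous vote counts as disagreement.
    verdict = solver_verdict()
    if verdict is not None:
        return verdict

    # Assumption: fall back to the majority label if the solver gives no answer.
    return counts[True] > counts[False]
```

Passing the solver as a callback keeps the formal check lazy, so it runs only on the disputed instances.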
Method
FregeLogic combines an LLM ensemble with a Z3 SMT solver. The solver resolves cases where the LLMs disagree. Formal verification relies on structured-output API calls and uses an Aristotelian encoding with existence axioms.
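To make "Aristotelian encoding and existence axioms" concrete, here is a hedged sketch of the underlying semantics. It deliberately swaps Z3 for brute-force enumeration over the eight Venn regions of three terms, so that it runs with only the standard library. In an SMT setting the same question would be posed as an unsatisfiability check on the premises plus the negated conclusion. The term names `S`, `M`, `P`, the tuple format, and the choice to assert existence for every term are assumptions, not details confirmed by the source.

```python
from itertools import product

TERMS = ("S", "M", "P")
# A region is a membership triple (in_S, in_M, in_P); three terms give 8 regions.
REGIONS = list(product((False, True), repeat=3))


def holds(form, subj, pred, inhabited):
    """Evaluate one categorical statement over the set of inhabited regions."""
    i, j = TERMS.index(subj), TERMS.index(pred)
    if form == "A":  # All subj are pred
        return all(r[j] for r in inhabited if r[i])
    if form == "E":  # No subj are pred
        return not any(r[i] and r[j] for r in inhabited)
    if form == "I":  # Some subj are pred
        return any(r[i] and r[j] for r in inhabited)
    if form == "O":  # Some subj are not pred
        return any(r[i] and not r[j] for r in inhabited)
    raise ValueError(f"unknown form: {form!r}")


def is_valid(premises, conclusion, existential_import=True):
    """Return True if no model satisfies the premises while falsifying the conclusion."""
    for mask in product((False, True), repeat=len(REGIONS)):
        inhabited = [r for r, keep in zip(REGIONS, mask) if keep]
        if not inhabited:
            continue  # skip the empty domain
        if existential_import and not all(
            any(r[k] for r in inhabited) for k in range(len(TERMS))
        ):
            continue  # existence axiom: every term must have a member
        if all(holds(*p, inhabited) for p in premises) and not holds(
            *conclusion, inhabited
        ):
            return False  # counter-model found
    return True


# Barbara (AAA-1) is valid with or without existence axioms.
barbara = ([("A", "M", "P"), ("A", "S", "M")], ("A", "S", "P"))
# Darapti (AAI-3) is valid only when the terms are assumed non-empty.
darapti = ([("A", "M", "P"), ("A", "M", "S")], ("I", "S", "P"))
```

Darapti shows why the existence axioms matter: without them, a model in which `M` is empty satisfies both premises vacuously while the conclusion fails, so the form is judged invalid.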
In practice
- Use Z3 SMT solver for formal verification.
- Employ structured-output API calls for solver integration.
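One way to read the structured-output advice is that the LLM returns the syllogism as JSON in a fixed shape, which is validated before anything reaches the solver. The sketch below illustrates that validation step only. The JSON field names (`premises`, `conclusion`, `form`, `subject`, `predicate`), the two-premise and three-term checks, and the helper names are hypothetical, and no real API call is shown.

```python
import json
from dataclasses import dataclass

FORMS = {"A", "E", "I", "O"}  # the four categorical statement forms


@dataclass(frozen=True)
class Statement:
    form: str
    subject: str
    predicate: str


def parse_statement(obj: dict) -> Statement:
    """Build a Statement from one JSON object, rejecting unknown forms."""
    stmt = Statement(str(obj["form"]).upper(), str(obj["subject"]), str(obj["predicate"]))
    if stmt.form not in FORMS:
        raise ValueError(f"unknown statement form: {stmt.form!r}")
    return stmt


def parse_syllogism(raw: str):
    """Parse and sanity-check a structured LLM response (hypothetical schema)."""
    data = json.loads(raw)
    premises = [parse_statement(p) for p in data["premises"]]
    conclusion = parse_statement(data["conclusion"])
    if len(premises) != 2:
        raise ValueError("expected exactly two premises")
    terms = {t for s in premises + [conclusion] for t in (s.subject, s.predicate)}
    if len(terms) != 3:
        raise ValueError("expected exactly three distinct terms")
    return premises, conclusion


example = json.dumps({
    "premises": [
        {"form": "A", "subject": "M", "predicate": "P"},
        {"form": "A", "subject": "S", "predicate": "M"},
    ],
    "conclusion": {"form": "A", "subject": "S", "predicate": "P"},
})
premises, conclusion = parse_syllogism(example)
```

Rejecting malformed responses at this boundary means the formal check never runs on a statement the model failed to parse cleanly.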
Topics
- FregeLogic
- Neuro-Symbolic Architecture
- Syllogistic Validity Prediction
- SemEval-2026 Task 11
- Large Language Models
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.