Ontology-constrained multi-LLM scoring of hypothesis support in the predictive processing literature

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Science & Research — Health & Medical Research, Research Methodology & Innovation, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

This multi-LLM pipeline for ontology-constrained literature synthesis in predictive coding neuroscience uses a council of ten local language models to score 31 studies against a 36-concept glossary. The glossary defines three hypotheses (predictive suppression, feedforward error propagation, and ubiquity), each evaluated in local and global oddball contexts. The pipeline extracts evidence from each paper, adds figure descriptions generated by a vision-language model, and validates the model outputs. In local oddball contexts, agreement is higher for predictive suppression (mean = 0.46) and feedforward error propagation (mean = 0.51) and weaker for ubiquity (mean = 0.23); scores are generally lower in global oddball contexts. A new "hypothesis-space temperature" metric quantifies dispersion and shows greater variability in global oddball contexts (0.00348) than in local oddball contexts (0.00114). The framework makes disagreement measurable and auditable by mapping a heterogeneous literature into a quantitative evidence space.

Key takeaway

Research scientists synthesizing fragmented interdisciplinary literature should consider running a local multi-LLM council constrained by an expert-guided ontology. The framework gives an auditable, quantitative map of hypothesis support that exposes structured agreement and disagreement across studies. It also lets you measure inter-model variability and track how a literature shifts over time, which supports cumulative theory-building where traditional meta-analysis falls short.

Key insights

A multi-LLM council with an expert-guided ontology can quantitatively map scientific literature's support for hypotheses, revealing structured agreement and disagreement.

Principles

Multi-LLM councils quantify disagreement as an analytical tool.
Expert-guided ontologies and instruction layers constrain LLM outputs.
Hypothesis-space temperature measures theoretical consensus or volatility.
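
This summary does not give the formula for hypothesis-space temperature, so the following is only a minimal sketch of one plausible reading: a variance-style dispersion, computed as the mean squared distance of each study's score vector from the centroid of all studies in the three-hypothesis space. The function name `dispersion`, the vector layout, and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import fmean

def dispersion(points):
    """Mean squared Euclidean distance of points from their centroid.

    ASSUMPTION: a stand-in for "hypothesis-space temperature"; the paper's
    actual definition is not stated in this summary.
    """
    if not points:
        raise ValueError("need at least one score vector")
    dims = len(points[0])
    centroid = [fmean(p[d] for p in points) for d in range(dims)]
    return fmean(
        sum((p[d] - centroid[d]) ** 2 for d in range(dims)) for p in points
    )

# Hypothetical score vectors, ordered as
# (predictive suppression, feedforward error propagation, ubiquity).
tight = [(0.5, 0.5, 0.2), (0.5, 0.5, 0.2)]
spread = [(0.4, 0.5, 0.2), (0.6, 0.5, 0.2)]

print(dispersion(tight), dispersion(spread))
```

Under this reading, a lower value means the studies cluster around a shared position in hypothesis space, and a higher value means they are scattered.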

Method

The pipeline reads papers, extracts evidence including figure descriptions, assembles ontology-constrained prompts, and validates outputs against an expert glossary using a council of local LLMs.
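
The prompt-assembly and validation steps might look roughly like the sketch below. Everything in it is an assumption made for illustration: the hypothesis and context identifiers, the JSON reply contract, the [0, 1] score range, and the function names `build_prompt` and `validate`. None of it is drawn from the HNXJ/mllm-public repository.

```python
import json

# Hypothetical identifiers for the three hypotheses and two contexts.
HYPOTHESES = (
    "predictive_suppression",
    "feedforward_error_propagation",
    "ubiquity",
)
CONTEXTS = ("local_oddball", "global_oddball")

def build_prompt(glossary, evidence, context):
    """Assemble a prompt that restricts the model to glossary terms."""
    if context not in CONTEXTS:
        raise ValueError(f"unknown context: {context}")
    terms = "\n".join(
        f"- {name}: {definition}" for name, definition in sorted(glossary.items())
    )
    keys = ", ".join(HYPOTHESES)
    return (
        "Use only the concepts defined below.\n"
        f"{terms}\n\n"
        f"Context: {context}\n"
        f"Evidence:\n{evidence}\n\n"
        f"Reply with a JSON object whose keys are exactly: {keys}. "
        "Each value is a support score between 0 and 1."
    )

def validate(raw):
    """Return the parsed scores, or None if the reply breaks the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != set(HYPOTHESES):
        return None
    for value in data.values():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
        if not 0.0 <= value <= 1.0:
            return None
    return {key: float(value) for key, value in data.items()}
```

Rejecting a malformed reply outright, rather than repairing it, keeps each recorded score traceable to a reply that satisfied the contract.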

In practice

Use a multi-LLM council to measure inter-model variability.
Define a precise glossary for LLM-assisted literature review.
Map literature into a quantitative hypothesis space.
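
One simple way to act on the first and third points is to reduce the council's replies for a single study to a per-hypothesis mean and an inter-model spread. The sketch below assumes each model returns one numeric score per hypothesis; the function name `summarize_council`, the model names, and the numbers are hypothetical, and the population standard deviation is just one possible choice of spread measure.

```python
from statistics import fmean, pstdev

def summarize_council(replies):
    """Map {model: {hypothesis: score}} to {hypothesis: (mean, spread)}.

    The spread is the population standard deviation across models.
    """
    if not replies:
        raise ValueError("need at least one model reply")
    hypotheses = sorted(next(iter(replies.values())))
    summary = {}
    for hypothesis in hypotheses:
        scores = [reply[hypothesis] for reply in replies.values()]
        summary[hypothesis] = (fmean(scores), pstdev(scores))
    return summary

# Hypothetical replies from a two-model council for one study.
replies = {
    "model_a": {"predictive_suppression": 0.25, "ubiquity": 0.5},
    "model_b": {"predictive_suppression": 0.75, "ubiquity": 0.5},
}

for hypothesis, (mean, spread) in summarize_council(replies).items():
    print(f"{hypothesis}: mean={mean:.2f} spread={spread:.2f}")
```

A hypothesis with a large spread is one where the models read the same evidence differently, which flags that study for expert review.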

Topics

Large Language Models
Literature Synthesis
Predictive Coding Neuroscience
Ontology Engineering
Multi-Agent Systems
Computational Neuroscience

Code references

HNXJ/mllm-public

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.