Presupposition and Reasoning in Conditionals: A Theory-Based Study of Humans and LLMs

2026-05-18 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

A study compared human judgments and Large Language Model (LLM) predictions on presupposition projection in conditional sentences, a key area in theories of meaning and pragmatics. Researchers collected likelihood ratings from 120 human participants and four LLMs using a normed dataset designed to control the relationship between the antecedent and the projected presupposition. The results indicate that humans integrate both probabilistic and pragmatic cues in their judgments, while LLMs exhibit varying degrees of alignment with these human patterns. Further evaluation using a linguistically motivated checklist within an "LLM-as-a-Judge" framework revealed that models best matching human ratings often lacked coherent pragmatic reasoning, whereas models demonstrating stronger reasoning produced less human-like judgments. These findings suggest that LLM performance on such tasks might stem from surface pattern matching rather than genuine pragmatic competence.

Key takeaway

Research scientists developing or evaluating LLMs for complex linguistic tasks should prioritize benchmarks grounded in linguistic theory. Evaluations must go beyond simple accuracy metrics and probe the underlying reasoning, because a model that matches human judgments on surface patterns may still lack pragmatic competence. Probing reasoning directly helps distinguish genuine understanding from pattern matching.

Key insights

LLMs often match human linguistic judgments via surface patterns, not deep pragmatic reasoning.

Principles

Human judgment integrates probabilistic and pragmatic cues.
Benchmarks grounded in linguistic theory are crucial.

Method

A parallel behavioral study compared human and LLM likelihood ratings on normed conditional sentences, followed by an "LLM-as-a-Judge" evaluation using a linguistic checklist to assess reasoning.
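
One way to picture the first half of this method is as a rank correlation between per-item human mean ratings and a model's ratings. The sketch below is illustrative only: the summary does not say which alignment statistic the study used, so Spearman correlation is an assumption, and the item names, the 1-7 scale, and every rating value are made up for the example.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(xs) != len(ys):
        raise ValueError("rating lists must have the same length")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: three participants per item, one rating per item
# from a single model. None of these numbers come from the study.
human_ratings = {
    "item_1": [6, 7, 5],
    "item_2": [2, 3, 2],
    "item_3": [4, 4, 5],
    "item_4": [1, 2, 1],
}
model_ratings = {"item_1": 6, "item_2": 3, "item_3": 5, "item_4": 2}

items = sorted(human_ratings)
human_means = [mean(human_ratings[item]) for item in items]
model_scores = [model_ratings[item] for item in items]

alignment = spearman(human_means, model_scores)
print(f"human-model rank alignment: {alignment:.2f}")
```

A score like this measures only whether the model orders items the way people do. It says nothing about how the model arrived at its ratings, which is the gap the checklist-based judging step is meant to address.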

In practice

Design benchmarks with linguistic theory.
Evaluate LLMs beyond surface-level accuracy.
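
As a rough sketch of what evaluating beyond surface-level accuracy could look like, the snippet below reports a rating-alignment score next to a checklist pass rate and flags models that look human-like on ratings but weak on reasoning. Everything named here is an assumption made for illustration: the checklist item names, the thresholds, the model names, and all values are hypothetical, and the summary does not describe the study's actual checklist or scoring.

```python
from dataclasses import dataclass


@dataclass
class ModelReport:
    name: str
    alignment: float            # agreement with human ratings, e.g. a rank correlation
    checklist: dict[str, bool]  # judge verdict for each checklist item

    def reasoning_score(self) -> float:
        """Fraction of checklist items the judge marked as satisfied."""
        if not self.checklist:
            raise ValueError("checklist must contain at least one item")
        return sum(self.checklist.values()) / len(self.checklist)


def flag_dissociation(report, min_alignment=0.7, min_reasoning=0.5):
    """True when ratings look human-like but the reasoning score is low.

    Both thresholds are arbitrary placeholders, not values from the study.
    """
    return (
        report.alignment >= min_alignment
        and report.reasoning_score() < min_reasoning
    )


# Hypothetical reports; item names and verdicts are invented for the example.
reports = [
    ModelReport(
        name="model_a",
        alignment=0.85,
        checklist={
            "identifies_presupposition_trigger": True,
            "relates_antecedent_to_presupposition": False,
            "justifies_rating_consistently": False,
        },
    ),
    ModelReport(
        name="model_b",
        alignment=0.55,
        checklist={
            "identifies_presupposition_trigger": True,
            "relates_antecedent_to_presupposition": True,
            "justifies_rating_consistently": True,
        },
    ),
]

for report in reports:
    print(
        f"{report.name}: alignment={report.alignment:.2f}, "
        f"reasoning={report.reasoning_score():.2f}, "
        f"flagged={flag_dissociation(report)}"
    )
```

Keeping the two numbers separate, rather than folding them into one score, makes the pattern described in the summary visible: a model can rank high on one axis and low on the other.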

Topics

Presupposition Projection
Conditional Sentences
Large Language Models
Pragmatic Reasoning
Human-LLM Comparison

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.