Presupposition and Reasoning in Conditionals: A Theory-Based Study of Humans and LLMs
Summary
A study compared human judgments with Large Language Model (LLM) predictions on presupposition projection in conditional sentences, a central phenomenon in theories of meaning and pragmatics. The researchers collected likelihood ratings from 120 human participants and four LLMs on a normed dataset designed to control the relationship between the antecedent and the projected presupposition. The results indicate that humans integrate both probabilistic and pragmatic cues in their judgments, while the LLMs align with these human patterns to varying degrees. A further evaluation, which applied a linguistically motivated checklist within an "LLM-as-a-Judge" framework, showed that the models best matching human ratings often lacked coherent pragmatic reasoning, whereas the models with stronger reasoning produced less human-like judgments. These findings suggest that LLM performance on such tasks may stem from surface pattern matching rather than genuine pragmatic competence.
Key takeaway
If you develop or evaluate LLMs for complex linguistic tasks, prioritize benchmarks grounded in linguistic theory. Evaluations should go beyond simple accuracy metrics and probe the underlying reasoning, because a model that matches human judgments on surface patterns may still lack pragmatic competence. This approach helps distinguish genuine understanding from pattern matching.
Key insights
LLMs can match human linguistic judgments through surface patterns rather than through pragmatic reasoning.
Principles
- Human judgment integrates probabilistic and pragmatic cues.
- Linguistic theory-grounded benchmarks are crucial.
Method
A parallel behavioral study collected likelihood ratings from humans and LLMs on the same normed conditional sentences; an "LLM-as-a-Judge" evaluation then used a linguistically motivated checklist to assess the models' reasoning.
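One way to quantify how closely a model's likelihood ratings track human ratings is to average the human ratings for each item and compute a rank correlation with the model's ratings for the same items. The summary does not say which alignment measure the study used; this sketch assumes Spearman correlation, a common choice for ordinal rating data. The data format, item ids, and function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def human_llm_alignment(human_ratings, model_ratings):
    """Correlate per-item human mean ratings with one model's ratings.

    human_ratings: item id -> list of participant ratings (hypothetical format)
    model_ratings: item id -> the model's rating for that item
    Only items present in both mappings are compared.
    """
    items = sorted(human_ratings.keys() & model_ratings.keys())
    human_means = [mean(human_ratings[i]) for i in items]
    model_scores = [model_ratings[i] for i in items]
    return spearman(human_means, model_scores)


# Hypothetical toy data: four conditional items rated on a 1-7 scale.
human = {"c1": [6, 7, 5], "c2": [2, 3, 2], "c3": [4, 4, 5], "c4": [1, 2, 1]}
model = {"c1": 6.5, "c2": 3.0, "c3": 4.0, "c4": 1.5}
print(human_llm_alignment(human, model))
```

Rank correlation ignores differences in scale between human and model ratings and only asks whether the model orders the items the way people do, which suits comparisons where the two sources use different rating ranges.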
In practice
- Ground benchmark design in linguistic theory.
- Evaluate LLMs beyond surface-level accuracy.
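Going beyond surface-level accuracy can be as simple as reporting a reasoning score next to the alignment score, so that a model which tracks human ratings while reasoning poorly stands out. This is a minimal sketch, assuming a judge model has already returned a True/False verdict for each checklist item on each explanation; the judge call itself is omitted, and the checklist items, thresholds, and names below are hypothetical, not the study's checklist.

```python
from dataclasses import dataclass

# Hypothetical checklist items for judging one model explanation; these are
# illustrative and are not the checklist used in the study.
CHECKLIST = (
    "identifies_presupposition_trigger",
    "relates_antecedent_to_presupposition",
    "conclusion_follows_from_reasoning",
)


@dataclass
class ModelReport:
    name: str
    alignment: float        # e.g. rank correlation with mean human ratings
    checklist_score: float  # fraction of checklist items the judge marked as passed


def checklist_score(verdicts):
    """Pool a judge's True/False verdicts over all explanations from one model.

    verdicts: list of dicts mapping checklist item -> bool. An item the judge
    did not return is counted as failed.
    """
    if not verdicts:
        raise ValueError("need at least one judged explanation")
    passed = sum(bool(v.get(item, False)) for v in verdicts for item in CHECKLIST)
    return passed / (len(verdicts) * len(CHECKLIST))


def flag_surface_matchers(reports, align_min=0.6, reasoning_max=0.5):
    """Names of models that track human ratings but score low on reasoning.

    The default thresholds are arbitrary placeholders.
    """
    return [
        r.name
        for r in reports
        if r.alignment >= align_min and r.checklist_score <= reasoning_max
    ]


# Hypothetical judge output for two explanations produced by one model.
verdicts = [
    {"identifies_presupposition_trigger": True,
     "relates_antecedent_to_presupposition": False,
     "conclusion_follows_from_reasoning": False},
    {"identifies_presupposition_trigger": True,
     "relates_antecedent_to_presupposition": False,
     "conclusion_follows_from_reasoning": True},
]
reports = [
    ModelReport("model_a", alignment=0.8, checklist_score=checklist_score(verdicts)),
    ModelReport("model_b", alignment=0.4, checklist_score=0.9),
]
print(flag_surface_matchers(reports))  # prints ['model_a']
```

Keeping the two scores separate, rather than folding them into one number, preserves the pattern the study reports: high human alignment and strong reasoning need not go together.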
Topics
- Presupposition Projection
- Conditional Sentences
- Large Language Models
- Pragmatic Reasoning
- Human-LLM Comparison
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published in Computation and Language.