Can AI Refute Economic Theory? Evidence from Beyond the Knowledge Cutoff
Summary
Experiments investigated whether artificial intelligence (AI) can refute economic theory by testing several AI models, including Gemini, Refine, Claude, and ChatGPT, on their ability to identify errors in four published economic theory papers. ChatGPT Pro demonstrated the strongest performance, occasionally generating counterexamples and corrected proofs. Despite this, no AI model successfully located a genuine error without substantial human guidance, and the presence of data contamination complicates the interpretation of results. The author concludes that while a competent human collaborating with a frontier AI model could potentially outperform existing peer review processes, AI models are not yet capable of independently refuting economic theory.
Key takeaway
If you are a research scientist evaluating AI for critical analysis or peer review, recognize that current frontier models such as ChatGPT Pro need substantial human guidance to identify genuine errors in complex theoretical work. Use AI to construct counterexamples or refine proofs rather than expecting it to detect errors on its own. Treat it as an assistant within a human-led review process, and account for data contamination when interpreting its results.
Key insights
AI models cannot independently refute economic theory, but human-AI collaboration shows promise for peer review.
Principles
- AI models struggle to locate genuine errors without human guidance.
- Data contamination complicates AI performance evaluation.
- A competent human working with a frontier model may outperform existing peer review.
Method
AI models (Gemini, Refine, Claude, ChatGPT) were tasked with checking four published economic theory papers for errors, with performance assessed on counterexample generation and proof correction.
In practice
- Pair frontier AI with human experts for complex review.
- Check whether an AI's apparent success reflects data contamination rather than genuine reasoning.
- Focus AI on generating counterexamples, not initial error finding.
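One rough way to act on the contamination point above is to separate test papers by whether they became public before or after a model's knowledge cutoff. The following is a minimal Python sketch under stated assumptions: the `Paper` class, the `split_by_cutoff` helper, the paper titles, and the cutoff date are all hypothetical placeholders, not details taken from the study.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Paper:
    title: str
    published: date  # date the paper became publicly available


def split_by_cutoff(papers, cutoff):
    """Partition papers into (possibly_seen, unseen) relative to a knowledge cutoff."""
    possibly_seen = [p for p in papers if p.published <= cutoff]
    unseen = [p for p in papers if p.published > cutoff]
    return possibly_seen, unseen


# Hypothetical placeholders, NOT the papers or cutoff used in the study.
papers = [
    Paper("Paper A", date(2019, 5, 1)),
    Paper("Paper B", date(2025, 3, 1)),
]
assumed_cutoff = date(2024, 6, 30)

possibly_seen, unseen = split_by_cutoff(papers, assumed_cutoff)
print("Possibly in training data:", [p.title for p in possibly_seen])
print("Published after cutoff:", [p.title for p in unseen])
```

A date filter is only a proxy. Working papers often circulate before formal publication, so a paper dated after the cutoff may still have been seen by the model.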
Topics
- Artificial Intelligence
- Economic Theory
- Peer Review
- ChatGPT Pro
- Model Evaluation
- Data Contamination
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.