The Self-Correction Illusion: LLMs Correct Others but Not Themselves

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

A study reveals that Large Language Models (LLMs) exhibit a "Self-Correction Illusion," where they struggle to correct errors in their own reasoning but show significantly higher correction rates for identical claims presented under external chat-template roles. This asymmetry is identified as a chat-template artifact, not a cognitive deficit. Experiments across 13 model-domain cells, involving seven model families and three domains, demonstrated that relabeling a byte-identical erroneous claim from the agent's own assistant role to an external role (e.g., user, tool, or system) boosted explicit-correction rates by 23 to 93 percentage points, with 10 of 13 cells achieving p<0.001. The proposed "source-conditioned role relabeling" is a prompt-structure-only intervention requiring no training or model modification. Its effectiveness varies by domain, with system roles dominating math tasks and user messages excelling in logical deduction. While the effect is asymmetric, preventing easy error injection, this safety can be overridden by specific trust-framing instructions.

Key takeaway

For Machine Learning Engineers deploying LLM agents, you should integrate source-conditioned role relabeling into your prompt structures to significantly boost self-correction. By re-presenting an agent's internal erroneous claim as an external message, such as from a "system" or "user" role, you can achieve 23-93 percentage point increases in error detection without model retraining. However, be aware that a single trust-framing instruction can override this safety, making careful prompt design crucial.

Key insights

LLMs' self-correction failure is a chat-template artifact, not a cognitive deficit, due to addressability.

Principles

Method

Source-conditioned role relabeling appends a byte-identical erroneous claim under an external chat-template role (user, tool, system) with an audit instruction, without altering the claim's content.

In practice

Topics

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Prompt Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.