The Crutch or the Ceiling? How Different Generations of LLMs Shape EFL Student Writings

2026-04-20 · Source: cs.AI updates on arXiv.org · Field: Education & Learning — Educational Technology (EdTech), Language Learning & Cultural Education · Depth: Expert, quick

Summary

A study by Susanto et al. investigates how different generations of Large Language Models (LLMs) influence the writing of secondary-level English as a Foreign Language (EFL) students. The research compares student compositions written with LLM assistance before and after ChatGPT's release, combining expert qualitative scoring with quantitative measures such as readability tests, Pearson's correlation coefficient, and the Measure of Textual Lexical Diversity (MTLD). Findings indicate that advanced LLMs raise assessment scores and lexical diversity, particularly for lower-proficiency learners, but that this improvement may mask the learners' actual writing ability. Crucially, greater LLM assistance correlated negatively with human expert ratings, suggesting that LLMs improve surface fluency without fostering deep coherence or genuine learning. The authors advocate a pedagogical shift from evaluating output quality to verifying the learning process.

Key takeaway

AI scientists developing educational tools should prioritize features that verify the learning process rather than optimizing solely for output quality. Focus on designing LLMs that provide ideational scaffolding, helping students develop their own ideas and critical thinking, rather than merely generating text. This approach lets AI act as a genuine learning scaffold instead of a compensatory crutch that masks true student ability and hinders deep learning.

Key insights

Advanced LLMs boost surface fluency in EFL writing but may hinder deep learning and mask true student ability.

Principles

LLM assistance correlates negatively with expert human ratings.
Pedagogy must verify the learning process, not just output quality.
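
The negative correlation named in the first principle is measured with Pearson's correlation coefficient. A minimal sketch of that calculation follows, assuming one assistance score and one expert rating per composition. The function name `pearson_r` and the sample numbers are hypothetical and invented for illustration; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, invented for illustration only (NOT from the study).
assistance = [0.1, 0.3, 0.5, 0.7, 0.9]      # assumed share of LLM-assisted text
expert_rating = [4.5, 4.0, 3.8, 3.1, 2.9]   # assumed expert score per composition

r = pearson_r(assistance, expert_rating)
print(f"r = {r:.2f}")
```

A value of r below zero means that compositions with more assistance tended to receive lower expert ratings in the sample; it says nothing on its own about causation.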

Method

The study analyzed EFL student compositions assisted by LLMs pre- and post-ChatGPT release, using expert qualitative scoring and quantitative metrics like readability and lexical diversity tests.
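
To make the lexical diversity measure concrete, here is a minimal sketch of MTLD, assuming the commonly used type-token ratio threshold of 0.72 and naive whitespace tokenization. The function names, the tokenization, and the handling of texts that never reach the threshold are assumptions made for illustration; the study's exact settings are not given in this summary.

```python
def _mtld_one_direction(tokens, threshold=0.72):
    """Tokens per 'factor' when reading the tokens in the given order."""
    factors = 0.0
    seen = set()
    count = 0
    for token in tokens:
        count += 1
        seen.add(token)
        # Running type-token ratio of the current segment.
        if len(seen) / count < threshold:
            factors += 1.0   # segment is "used up": count one full factor
            seen = set()
            count = 0
    if count > 0:
        # Credit the leftover segment as a partial factor.
        remainder_ttr = len(seen) / count
        factors += (1.0 - remainder_ttr) / (1.0 - threshold)
    if factors == 0:
        # Assumption: no repetition at all, so report the token count itself.
        return float(len(tokens))
    return len(tokens) / factors


def mtld(text, threshold=0.72):
    """Average of the forward and backward passes over the text."""
    tokens = text.lower().split()   # assumed tokenizer: lowercase + whitespace
    if not tokens:
        return 0.0
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(tokens[::-1], threshold)
    return (forward + backward) / 2.0


repetitive = "a a a a a a a a a a"
varied = "the cat sat on the mat the cat sat on the mat"
print(mtld(repetitive), mtld(varied))
```

Higher values indicate that the text sustains a varied vocabulary for longer before words start to repeat, which is why the measure is less sensitive to text length than a plain type-token ratio.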

In practice

Differentiate LLM functions: ideational scaffolding vs. textual production.
Align AI use with a learner's Zone of Proximal Development.

Topics

Large Language Models
EFL Student Writing
ChatGPT
Writing Assessment
Pedagogical Strategies

Best for: AI Scientist, Research Scientist, AI Ethicist, Domain Expert

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.