What Makes Words Hard? Sakura at BEA 2026 Shared Task on Vocabulary Difficulty Prediction

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

Researchers from RIKEN, The University of Osaka, and other institutions describe two models for vocabulary difficulty prediction, developed for the BEA 2026 Shared Task. Their high-accuracy black-box model, a Large Language Model (LLM) fine-tuned with a soft-target loss function, achieved a Pearson's correlation coefficient (PCC) of $r>0.91$ and secured the top result in the open track. Their explainable model, built on features such as L1 (first-language) similarity and spelling difficulty, also performed strongly with $r>0.77$, outperforming a fine-tuned encoder baseline in the closed track. The study further analyzes the British Council's Knowledge-based Vocabulary Lists (KVL) data and finds that test item construction, including spelling difficulty and the choice of L1 equivalents, significantly influences reported difficulty scores beyond the inherent difficulty of producing a word. The code for these models is publicly available.

Key takeaway

For research scientists developing educational NLP tools, this work indicates that fine-tuning LLMs and masked language models (MLMs) with a soft-target cross-entropy loss outperforms discretization and MSE loss on continuous-value prediction tasks such as vocabulary difficulty. Consider the soft-target approach when your targets are continuous and fine-grained. Also note that factors outside word knowledge itself, such as spelling difficulty and test item design, can skew measured vocabulary difficulty, so careful feature engineering and dataset scrutiny are needed.

Key insights

Fine-tuning with a soft-target loss improves LLM and MLM performance on continuous-value prediction tasks.

Principles

Method

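Fine-tune LLMs/MLMs using cross-entropy loss with soft targets. Each continuous target value is represented as a probability distribution over its nearest discrete points, with weights chosen so that the probability-weighted sum of those points equals the original value. This gives better precision than hard discretization or MSE loss.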
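The soft-target idea can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the grid of discrete points, the function names (`soft_target`, `soft_cross_entropy`, `expected_value`), and the expected-value decoding step are all assumptions made for illustration. The sketch spreads a continuous value over its two nearest grid points, scores model logits against that distribution with cross-entropy, and decodes a continuous prediction back from the logits.

```python
import math
from bisect import bisect_right


def soft_target(value, grid):
    """Spread a continuous value over its two nearest grid points.

    `grid` is a sorted list of at least two discrete points (hypothetical).
    The weights are chosen so that sum(p_i * grid_i) equals `value`.
    """
    if not grid[0] <= value <= grid[-1]:
        raise ValueError("value lies outside the grid")
    probs = [0.0] * len(grid)
    # Index of the right-hand neighbour, clamped so it is always valid.
    hi = min(bisect_right(grid, value), len(grid) - 1)
    lo = hi - 1
    w_hi = (value - grid[lo]) / (grid[hi] - grid[lo])
    probs[lo] = 1.0 - w_hi
    probs[hi] = w_hi
    return probs


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_cross_entropy(logits, target_probs):
    """Cross-entropy between a soft target distribution and model logits."""
    return -sum(p * lp for p, lp in zip(target_probs, log_softmax(logits)))


def expected_value(logits, grid):
    """Decode a continuous prediction as the expectation over the grid.

    This decoding rule is an assumption; the source does not specify one.
    """
    return sum(math.exp(lp) * g for lp, g in zip(log_softmax(logits), grid))


grid = [0.0, 1.0, 2.0, 3.0, 4.0]       # hypothetical difficulty bins
target = soft_target(2.3, grid)        # mass split between bins 2.0 and 3.0
loss = soft_cross_entropy([0.0] * 5, target)
prediction = expected_value([0.0] * 5, grid)
```

Unlike hard discretization, the target for 2.3 differs from the target for 2.0, so the loss still carries the within-bin information. In a real fine-tuning setup the logits would come from the model's output layer and the loss would be computed by the training framework.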

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.