What Makes Words Hard? Sakura at BEA 2026 Shared Task on Vocabulary Difficulty Prediction

2025-12-11 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

Researchers from RIKEN, The University of Osaka, and other institutions describe two models for vocabulary difficulty prediction, developed for the BEA 2026 Shared Task. Their high-accuracy black-box model, a large language model (LLM) fine-tuned with a soft-target loss function, achieved a Pearson correlation coefficient (PCC) of $r > 0.91$ and the top result in the open track. Their explainable model, built on features such as L1 similarity and spelling difficulty, also performed strongly at $r > 0.77$, outperforming a fine-tuned encoder baseline in the closed track. The study also analyzes the British Council's Knowledge-based Vocabulary Lists (KVL) data and finds that test item construction, including spelling difficulty and the choice of L1 equivalents, significantly influences reported difficulty scores beyond the inherent difficulty of producing a word. The code for both models is publicly available.

Key takeaway

For research scientists developing educational NLP tools, this work shows that fine-tuning LLMs and masked language models (MLMs) with a soft-target cross-entropy loss improves on discretization and MSE loss for continuous-value prediction tasks such as vocabulary difficulty. Consider adopting the soft-target approach when your labels are continuous. Also note that factors such as spelling and test item design can skew measured vocabulary difficulty, so scrutinize the dataset and engineer features accordingly.

Key insights

Soft-target loss fine-tuning significantly improves LLM and MLM performance in continuous value prediction tasks.

Principles

Model size, not architecture, primarily determines performance in vocabulary difficulty prediction.
Vocabulary difficulty scores are influenced by spelling and test item construction, not just word knowledge.

Method

Fine-tune LLMs/MLMs with a cross-entropy loss over soft targets: each continuous value is expressed as a probability-weighted sum of its nearest discrete points, which improves precision over hard discretization or MSE loss.
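
The soft-target idea can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the integer grid, the function names (`soft_target`, `soft_cross_entropy`, `decode`), and the choice to decode by taking the expected value are all assumptions made for illustration. The sketch splits a continuous label between its two neighbouring grid points so that the probability-weighted sum reproduces the label, then scores model logits against that distribution with cross-entropy.

```python
import math

def soft_target(value, grid):
    """Spread `value` over its two neighbouring grid points so that the
    probability-weighted sum of grid points equals `value`."""
    if not grid[0] <= value <= grid[-1]:
        raise ValueError("value lies outside the grid")
    target = [0.0] * len(grid)
    for i in range(len(grid) - 1):
        lo, hi = grid[i], grid[i + 1]
        if lo <= value <= hi:
            w_hi = (value - lo) / (hi - lo)  # share given to the upper point
            target[i] = 1.0 - w_hi
            target[i + 1] = w_hi
            break
    return target

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)

def decode(logits, grid):
    """Recover a continuous prediction as the expected grid value."""
    return sum(p * g for p, g in zip(softmax(logits), grid))

# Hypothetical difficulty scale with six discrete points (an assumption).
grid = [0, 1, 2, 3, 4, 5]
target = soft_target(2.3, grid)  # mass split between grid points 2 and 3
uniform_loss = soft_cross_entropy([0.0] * len(grid), target)
```

Unlike hard discretization, which would round 2.3 to the bin for 2, the soft target keeps the fractional part of the label, so a model that matches the target distribution decodes back to the original value.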

In practice

Use soft-target loss for LLM/MLM fine-tuning in continuous prediction tasks.
Consider spelling difficulty and L1 context when designing vocabulary assessments.
Employ SHAP for feature importance analysis in explainable models.
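
To make the SHAP recommendation concrete, the sketch below computes exact Shapley values by brute force for a toy model. These are the quantities that SHAP estimates. It does not use the `shap` library or the authors' model. The linear predictor, its weights, and the three feature names are hypothetical, chosen only so that the result can be checked by hand.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input, with features outside a coalition
    replaced by their baseline value. Cost grows exponentially in len(x)."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical features: [l1_similarity, spelling_difficulty, log_frequency]
def predict(f):
    # Toy linear difficulty model; the weights are illustrative assumptions.
    return 2.0 - 1.5 * f[0] + 0.8 * f[1] - 0.5 * f[2]

x = [0.9, 0.4, 1.0]         # one word's feature values (made up)
baseline = [0.5, 0.5, 0.5]  # reference point, e.g. dataset means (assumed)
phi = shapley_values(predict, x, baseline)
```

For a linear model each attribution reduces to the weight times the feature's offset from the baseline, and the attributions sum to the gap between the prediction and the baseline prediction, which gives a quick sanity check before applying the same analysis to a real feature-based model.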

Topics

Vocabulary Difficulty Prediction
Large Language Models
Soft-Target Loss Function
Explainable AI
British Council KVL

Code references

adno/vocabulary-difficulty

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.