What Makes Words Hard? Sakura at BEA 2026 Shared Task on Vocabulary Difficulty Prediction
Summary
Researchers from RIKEN, The University of Osaka, and other institutions describe two models for vocabulary difficulty prediction, developed for the BEA 2026 Shared Task. Their high-accuracy black-box model, a large language model (LLM) fine-tuned with a soft-target loss function, achieved a Pearson correlation coefficient (PCC) of $r > 0.91$ and secured the top result in the open track. Their explainable model, built on features such as L1 similarity and spelling difficulty, also performed strongly with $r > 0.77$, outperforming a fine-tuned encoder baseline in the closed track. The study also analyzes the British Council's Knowledge-based Vocabulary Lists (KVL) data and finds that test item construction, including spelling difficulty and the choice of L1 equivalent, significantly influences reported difficulty scores, over and above how hard a word inherently is to produce. The code for these models is publicly available.
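Both results above are reported as Pearson correlation between predicted and gold difficulty scores. As a reminder of what that metric measures, here is a minimal sketch in plain Python; the function name `pearson_r` and the sample numbers are illustrative assumptions, not taken from the shared task's evaluation code.

```python
import math


def pearson_r(predicted, gold):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(predicted)
    if n != len(gold) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n

    # Covariance and variances, all left unnormalized: the 1/n factors cancel.
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)

    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_g)


# Made-up scores, purely for illustration.
example_r = pearson_r([0.2, 0.5, 0.9, 1.4], [0.1, 0.6, 1.0, 1.2])
```

Because the metric is invariant to shifting and rescaling the predictions, a high $r$ shows that a model ranks and spaces words consistently with the gold scores, not that its absolute values match them.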
Key takeaway
For research scientists developing educational NLP tools, this work shows that fine-tuning LLMs and masked language models (MLMs) with a soft-target cross-entropy loss outperforms hard discretization or MSE loss on continuous value prediction tasks such as vocabulary difficulty. Consider adopting this soft-target approach when your targets are continuous and fine-grained. Also keep in mind that factors outside word knowledge, such as spelling and test item design, can significantly skew measured vocabulary difficulty, so careful feature engineering and dataset scrutiny are needed.
Key insights
Soft-target loss fine-tuning significantly improves LLM and MLM performance in continuous value prediction tasks.
Principles
- Model size, not architecture, primarily determines performance in vocabulary difficulty prediction.
- Vocabulary difficulty scores are influenced by spelling and test item construction, not just word knowledge.
Method
Fine-tune LLMs/MLMs with a cross-entropy loss on soft targets: each continuous value is expressed as a probability-weighted sum of its nearest discrete points, so the target distribution preserves the exact value. The approach is reported to be more precise than hard discretization or MSE loss.
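The soft-target idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid of discrete points, the choice of exactly two neighboring points, and all function names (`soft_target`, `soft_cross_entropy`, `expected_value`) are hypothetical.

```python
import math


def soft_target(value, grid):
    """Spread a continuous value over its two nearest grid points.

    The two weights are chosen so that their weighted sum reproduces `value`.
    `grid` must be sorted ascending with at least two points; values outside
    the grid are clamped to its ends.
    """
    value = min(max(value, grid[0]), grid[-1])
    target = [0.0] * len(grid)
    for i in range(len(grid) - 1):
        lo, hi = grid[i], grid[i + 1]
        if lo <= value <= hi:
            w_hi = (value - lo) / (hi - lo)
            target[i] = 1.0 - w_hi
            target[i + 1] = w_hi
            break
    return target


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target distribution and model logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


def expected_value(logits, grid):
    """Decode a continuous prediction as the expectation over grid points."""
    return sum(p * g for p, g in zip(softmax(logits), grid))


# Hypothetical difficulty grid and gold score, for illustration only.
grid = [1.0, 2.0, 3.0, 4.0, 5.0]
target = soft_target(2.3, grid)          # mass split between 2.0 and 3.0
loss = soft_cross_entropy([0.0] * 5, target)
```

With hard discretization, 2.3 would collapse to the single bin 2.0 and the residual 0.3 would be lost; with soft targets, the weights on 2.0 and 3.0 still sum back to 2.3. In a real fine-tuning setup the logits would come from the model's output over the tokens or classes standing for the grid points, which this sketch leaves out.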
In practice
- Use soft-target loss for LLM/MLM fine-tuning in continuous prediction tasks.
- Consider spelling difficulty and L1 context when designing vocabulary assessments.
- Employ SHAP for feature importance analysis in explainable models.
Topics
- Vocabulary Difficulty Prediction
- Large Language Models
- Soft-Target Loss Function
- Explainable AI
- British Council KVL
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.