The BD-LSC Dataset: Facilitating the Benchmarking of Models for Lexical Semantic Change Detection in Slang and Standard Usage

2026-06-15 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

Two datasets, BD-LSC and ST-WSD, are introduced to improve benchmarking for lexical semantic change (LSC) detection, particularly for bi-directional shifts and for words that carry both slang and standard meanings. The Bi-Directional Lexical Semantic Change (BD-LSC) dataset tracks sense gain, loss, and stability across three time periods, enabling the study of complex semantic trajectories. Its complement, the SlangTrack Word Sense Disambiguation (ST-WSD) dataset, provides fine-grained, instance-level sense annotations for words with both slang and standard usages. Evaluations spanning unsupervised clustering, supervised machine learning, transformer-based models, and large language models found that few-shot GPT-4o achieved the strongest aggregate performance on Exact Sense Match (ESM) and multi-label accuracy. However, Macro-F1 scores near 0.5 across all systems indicate that rare slang senses remain a significant open challenge.

Key takeaway

For NLP engineers building semantic change detection models, BD-LSC and ST-WSD provide benchmarks for bi-directional and slang-inclusive LSC. Macro-F1 is the metric to watch: it sits near 0.5 for every evaluated system, including few-shot GPT-4o, which points to rare slang senses as the main weakness. Adding these datasets to your evaluation pipeline can expose where a model fails on complex semantic shifts.

Key insights

New benchmarks address bi-directional lexical semantic change and slang usage, revealing challenges with rare slang senses.

Principles

LSC detection must capture sense gain, loss, and stability.
Slang and standard word usages complicate LSC analysis.
Instance-level sense annotations are crucial for robust benchmarking.
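
To make the first principle concrete, here is a minimal sketch of how sense gain, loss, and stability could be derived from per-period sense inventories. It assumes each period is represented as a set of sense labels for one word; the function names, the sense labels, and the toy data are hypothetical and are not taken from the BD-LSC dataset.

```python
from typing import Dict, List, Set


def sense_changes(earlier: Set[str], later: Set[str]) -> Dict[str, Set[str]]:
    """Compare the sense inventories of one word in two adjacent periods."""
    return {
        "gained": later - earlier,  # senses attested only in the later period
        "lost": earlier - later,    # senses attested only in the earlier period
        "stable": earlier & later,  # senses attested in both periods
    }


def trajectory(periods: List[Set[str]]) -> List[Dict[str, Set[str]]]:
    """Apply sense_changes to every pair of consecutive periods."""
    return [sense_changes(a, b) for a, b in zip(periods, periods[1:])]


# Invented example: a word gains a slang sense, then loses its standard one.
toy_periods = [
    {"standard_a"},
    {"standard_a", "slang_b"},
    {"slang_b"},
]

steps = trajectory(toy_periods)
for index, step in enumerate(steps, start=1):
    print(f"period {index} -> {index + 1}: {step}")
```

With three periods there are two transitions, so a single word can show a gain in the first and a loss in the second, which is the kind of bi-directional trajectory a one-way change score would miss.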

Method

The BD-LSC dataset captures bi-directional semantic change, while ST-WSD provides instance-level sense annotations for slang and standard usage. Models are evaluated across four methodological families: unsupervised clustering, supervised machine learning, transformer-based models, and large language models.
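
The summary names Exact Sense Match (ESM) and multi-label accuracy as evaluation metrics but does not define them. The sketch below shows one plausible reading, under two assumptions: ESM is taken to be the share of instances whose predicted sense set equals the gold set exactly, and multi-label accuracy is taken to be the mean per-instance Jaccard overlap. Both definitions, the function names, and the toy labels are assumptions for illustration; the original paper may define the metrics differently.

```python
from typing import List, Set


def exact_sense_match(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Assumed ESM: fraction of instances where the predicted set equals the gold set."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def jaccard_accuracy(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Assumed multi-label accuracy: mean of |gold & pred| / |gold | pred| per instance."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    scores = []
    for g, p in zip(gold, pred):
        union = g | p
        # Two empty sets agree completely, so score them as 1.0.
        scores.append(len(g & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Invented instance-level annotations: each set holds the senses for one usage.
gold = [{"slang"}, {"standard"}, {"slang", "standard"}, {"standard"}]
pred = [{"slang"}, {"standard"}, {"standard"}, {"slang"}]

print(exact_sense_match(gold, pred))  # 0.5
print(jaccard_accuracy(gold, pred))   # 0.625
```

Under these assumed definitions the exact-match score is the stricter of the two: the third instance earns partial credit from the overlap measure but none from exact match.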

In practice

Use BD-LSC to evaluate models on complex semantic trajectories.
Use ST-WSD for fine-grained slang/standard WSD benchmarking.
Prioritize improving Macro-F1 scores for rare slang sense detection.
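
As an illustration of why Macro-F1 is the metric to prioritize for rare senses, the following sketch compares accuracy with Macro-F1 on an imbalanced toy label set. The data are invented (18 standard usages, 2 slang usages) and the predictor is a deliberately naive one that always outputs the majority sense; nothing here reproduces the paper's systems or results.

```python
from collections import Counter
from typing import Dict, List


def per_class_f1(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """F1 for each label, computed from true/false positives and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1, so a rare class counts as much as a common one."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def accuracy(gold: List[str], pred: List[str]) -> float:
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Invented, imbalanced data: the slang sense is rare.
gold = ["standard"] * 18 + ["slang"] * 2
pred = ["standard"] * 20  # naive predictor that never outputs the slang sense

print(accuracy(gold, pred))            # 0.9
print(round(macro_f1(gold, pred), 3))  # 0.474
print(per_class_f1(gold, pred))        # slang F1 is 0.0
```

Because Macro-F1 averages over senses rather than instances, a model that ignores a rare sense is penalized heavily even when its accuracy looks strong. Reporting the per-sense breakdown alongside the aggregate makes it easier to see which senses drive the gap.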

Topics

Lexical Semantic Change
Slang Detection
Word Sense Disambiguation
Benchmarking Datasets
Large Language Models
GPT-4o

Best for: AI Engineer, Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.