The BD-LSC Dataset: Facilitating the Benchmarking of Models for Lexical Semantic Change Detection in Slang and Standard Usage
This article introduces two datasets for benchmarking models that detect lexical semantic change in slang and standard usage: the Bi-Directional Lexical Semantic Change (BD-LSC) dataset and the SlangTrack Word Sense Disambiguation (ST-WSD) dataset. BD-LSC tracks sense gain, loss, and stability across three time periods, while ST-WSD provides detailed annotations for words with mixed slang and standard usage. Together they support evaluation of a range of methods, including unsupervised clustering and transformer-based models. Few-shot GPT-4o achieved the strongest results on Exact Sense Match and multi-label accuracy, but the results also highlight the ongoing challenge of accurately detecting rare slang senses.
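To make the terms concrete, the following is a minimal Python sketch of how sense gain, loss, and stability between consecutive time periods could be derived from per-period sense inventories, and of one plausible reading of Exact Sense Match (the predicted sense set must equal the gold sense set exactly). The function names, the sense labels, and the metric definition are assumptions made for illustration; they are not taken from the BD-LSC or ST-WSD release, and the datasets' actual schema and scoring may differ.

```python
from typing import Dict, List, Set


def sense_changes(senses_by_period: List[Set[str]]) -> List[Dict[str, Set[str]]]:
    """Compare each consecutive pair of periods.

    Hypothetical helper: for every (earlier, later) pair it reports which
    senses were gained, which were lost, and which stayed stable.
    """
    changes = []
    for earlier, later in zip(senses_by_period, senses_by_period[1:]):
        changes.append(
            {
                "gained": later - earlier,   # present only in the later period
                "lost": earlier - later,     # present only in the earlier period
                "stable": earlier & later,   # present in both periods
            }
        )
    return changes


def exact_sense_match(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of items whose predicted sense set equals the gold set.

    Assumed definition of "Exact Sense Match"; the source does not spell it out.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Invented example: a word whose slang sense appears in the second period
# and whose standard sense disappears in the third.
periods = [{"standard"}, {"standard", "slang"}, {"slang"}]
changes = sense_changes(periods)

# Invented gold and predicted sense sets for three usages.
gold = [{"standard"}, {"slang"}, {"standard", "slang"}]
pred = [{"standard"}, {"standard"}, {"standard", "slang"}]
score = exact_sense_match(gold, pred)
```

With three periods, `changes` holds two entries, one per transition. Because a partially correct prediction counts as a miss under this strict set-equality reading, a separate multi-label accuracy score is a useful complement.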