A Sociolinguistic Analysis of Automatic Speech Recognition Bias in Newcastle English

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, quick

Summary

This study investigated Automatic Speech Recognition (ASR) bias in Newcastle English, a regional dialect of North-East England, using spontaneous speech from the Diachronic Electronic Corpus of Tyneside English (DECTE). The researchers evaluated a commercial ASR system and analyzed more than 3,000 transcription errors, classifying each by linguistic domain and by social variables such as gender, age, and socioeconomic status. An acoustic case study examined how phonetic variation contributes to misrecognition. Phonological variation accounted for most errors, with recurrent failures on dialect-specific features such as vowel quality and glottalisation; local vocabulary and non-standard grammatical forms were further sources of error. Error rates were also higher for men and for speakers at the extremes of the age spectrum, which indicates that ASR errors are socially patterned.

Key takeaway

For NLP engineers developing or evaluating ASR systems, the central point is that recognition errors are socially patterned and tied to dialectal variation. Bring sociolinguistic expertise into system development and evaluation, and make sure training data explicitly covers regional accents and community-based speech, so that the resulting ASR technologies are more equitable and accurate.
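
One way to act on this during evaluation is to report word error rate (WER) per social group rather than as a single aggregate. The sketch below is a minimal illustration, not the study's own procedure: the field names (`reference`, `hypothesis`, `gender`), the helper names, and the toy utterances are all assumptions made up for the example.

```python
from collections import defaultdict


def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_by_group(utterances, group_key):
    """Pool edit distance and reference length per group, then divide."""
    errors = defaultdict(int)
    ref_words = defaultdict(int)
    for utt in utterances:
        ref = utt["reference"].lower().split()
        hyp = utt["hypothesis"].lower().split()
        group = utt[group_key]
        errors[group] += word_edit_distance(ref, hyp)
        ref_words[group] += len(ref)
    return {g: errors[g] / ref_words[g] for g in errors if ref_words[g] > 0}


# Toy data invented for illustration; not taken from DECTE or the study.
utterances = [
    {"reference": "i am going home", "hypothesis": "i am going home", "gender": "F"},
    {"reference": "we went to the town", "hypothesis": "we went to town", "gender": "M"},
    {"reference": "that is a good book", "hypothesis": "that is good look", "gender": "M"},
]

wer_gender = wer_by_group(utterances, "gender")
```

Pooling errors and reference words within each group before dividing keeps short utterances from dominating the group score. The same function can be called with an age or socioeconomic field as `group_key`, provided the metadata carries one.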

Key insights

ASR errors are not random but socially patterned, driven by dialectal variation and demographic factors.

Method

Evaluated a commercial ASR system on spontaneous dialectal speech, classifying over 3,000 transcription errors by linguistic domain and social variables, complemented by an acoustic case study.
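
The classification step described here can be sketched as a simple tally over annotated error records. This is a hedged illustration only: the domain labels (`phonological`, `lexical`, `grammatical`), the record fields, and the four toy records are assumptions for the example and do not reproduce the study's coding scheme or counts.

```python
from collections import Counter

# Hypothetical annotation records, one per transcription error.
# Labels and values are invented for illustration.
error_records = [
    {"domain": "phonological", "gender": "M", "age_band": "older"},
    {"domain": "phonological", "gender": "F", "age_band": "younger"},
    {"domain": "lexical", "gender": "M", "age_band": "younger"},
    {"domain": "grammatical", "gender": "M", "age_band": "older"},
]


def tally(records, *keys):
    """Count errors for each combination of the given fields (keys are tuples)."""
    return Counter(tuple(rec[k] for k in keys) for rec in records)


def shares(counts):
    """Convert raw counts into proportions of all counted errors."""
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


by_domain = tally(error_records, "domain")
by_domain_and_gender = tally(error_records, "domain", "gender")
domain_shares = shares(by_domain)
```

Raw counts like these only show where errors cluster. Comparing groups fairly would also require normalising by how much speech each group contributed, which this sketch leaves out.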

Best for: NLP Engineer, Research Scientist, AI Researcher, AI Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.