Modeling semantic association in self-paced reading with language model embeddings

· Source: cs.CL updates on arXiv.org · Field: Science & Research — Social Sciences & Behavioral Studies, Research Methodology & Innovation · Depth: Expert, extended

Summary

This study investigated how language model (LM) embeddings can quantify semantic association in self-paced reading, and how the resulting measure relates to N400 brain potentials and reading times. The researchers used a corpus of joint electroencephalography (EEG) and self-paced reading data from 56 participants reading natural Dutch texts from the Tilburg corpus (TiNT). They tested ten implementations of semantic association, varying the embedding model (uncontextualized "wikipedia2vec_nlwiki_20180420_300d" word embeddings and contextualized "e5-large-trm-nl" sentence embeddings) and the context length. Bayesian hierarchical models and Bayes factors showed that the choice of embedding model substantially changes the estimated effect of semantic association. Specifically, sentence embeddings showed reliable effects on both neural and behavioral measures, whereas word embeddings did not, which underlines how much the result depends on methodological choices.

Key takeaway

If you model language processing, choose your embedding model carefully when quantifying semantic association. This study indicates that contextualized sentence embeddings, such as "e5-large-trm-nl", predict N400 amplitudes and reading times in naturalistic text more reliably than uncontextualized word embeddings. Prioritize sentence embeddings and compare several context window definitions to capture semantic effects beyond word predictability.

Key insights

Methodological choices, especially embedding model type, critically impact semantic association quantification in reading comprehension.

Principles

Method

Semantic association is quantified as cosine similarity between a critical word's embedding and its context's embedding, varying embedding models and context lengths.
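
A minimal sketch of this measure, under stated assumptions: the summary does not say how the context embedding is built, so the code below simply averages the vectors of the preceding words, which is one common choice for uncontextualized embeddings. The function names `cosine_similarity` and `semantic_association` and the `window` parameter are hypothetical, and toy 2-dimensional vectors stand in for real embeddings.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0
    return float(np.dot(u, v) / denom)


def semantic_association(word_vec, context_vecs, window=None):
    """Cosine similarity between a critical word and its averaged context.

    context_vecs: vectors of the words preceding the critical word, in order.
    window: if given, keep only the last `window` context words
            (illustrative way to vary context length; must be >= 1).
    """
    context = np.asarray(context_vecs, dtype=float)
    if window is not None:
        if window < 1:
            raise ValueError("window must be at least 1")
        context = context[-window:]
    if context.shape[0] == 0:
        # No preceding context (e.g. first word of a text): undefined.
        return float("nan")
    # ASSUMPTION: the context embedding is the mean of its word vectors.
    context_vec = context.mean(axis=0)
    return cosine_similarity(word_vec, context_vec)


# Toy example: the critical word matches the first context word only.
critical = [1.0, 0.0]
preceding = [[1.0, 0.0], [0.0, 1.0]]

full_context = semantic_association(critical, preceding)
last_word_only = semantic_association(critical, preceding, window=1)
print(full_context, last_word_only)
```

With a sentence-embedding model, the averaging step would be replaced by encoding the context string directly; the cosine comparison stays the same. Computing the measure at several `window` values is one way to compare context length definitions.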

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.