A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Emerging Technologies & Innovation) · Depth: Advanced, extended

Summary

Researchers from the University at Buffalo, Nanyang Technological University, and The University of Texas at Austin developed a scalable computational tool that classifies verbs as manner or result verbs in sentence context, a distinction crucial for developmental language research. To address the lack of large annotated resources, they used GPT-4o with linguistically informed prompts to generate sentence-level annotations from the MASC and InterCorp datasets, expanding coverage from 151 to 436 VerbNet classes. A RoBERTa-based classifier was then trained on these annotations, achieving an average accuracy of up to 89.6% across three held-out gold-standard datasets, including a new expert-annotated set. The study finds that the semantic properties of verb roots matter more for this classification than sentence structure, and it positions the tool as a resource for analyzing verb semantics in large language corpora.

Key takeaway

For NLP engineers working on fine-grained semantic analysis in developmental language research, this work provides a method for classifying manner and result verbs in sentence context, with an average accuracy of up to 89.6% on held-out gold-standard data. Consider adopting LLM-driven annotation pipelines to generate training data for similar tasks where expert-annotated resources are limited. This approach can enable richer, corpus-based analyses of early verb learning and language development, moving beyond coarse lexical measures.

Key insights

A RoBERTa classifier, trained on LLM-generated annotations, accurately distinguishes manner and result verbs at scale.

Method

GPT-4o is given linguistically informed prompts (covering semantic properties and sentence structure) to annotate manner and result verbs in sentences from the MASC and InterCorp datasets. A RoBERTa-based classifier is then trained on these annotations.
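The annotation step of this method can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the prompt wording, the `<v> ... </v>` verb markers, and the names `build_prompt`, `parse_label`, `mark_verb`, `annotate`, and `fake_llm` are all hypothetical, and the LLM call is replaced by a stub so the example runs without network access. The example verbs ("wipe" as manner, "clean" as result) are standard illustrations from the linguistics literature, not items taken from the paper.

```python
from dataclasses import dataclass

LABELS = ("manner", "result")  # label index 0 = manner, 1 = result


@dataclass
class Example:
    sentence: str
    verb: str  # surface form of the target verb as it appears in the sentence


def build_prompt(sentence: str, verb: str) -> str:
    """Compose a hypothetical annotation prompt (not the authors' wording)."""
    return (
        "Manner verbs describe how an action is carried out (e.g. 'wipe'); "
        "result verbs describe the state that the action brings about "
        "(e.g. 'clean').\n"
        f"Sentence: {sentence}\n"
        f"Target verb: {verb}\n"
        "Answer with exactly one word: manner or result."
    )


def parse_label(response: str):
    """Map a free-text reply to a label, or None if it is ambiguous."""
    text = response.strip().lower()
    found = [label for label in LABELS if label in text]
    return found[0] if len(found) == 1 else None


def mark_verb(sentence: str, verb: str) -> str:
    """Wrap the first occurrence of the verb string in marker tokens.

    Simple substring matching; a real pipeline would use token offsets.
    """
    return sentence.replace(verb, f"<v> {verb} </v>", 1)


def annotate(examples, ask_llm):
    """Turn examples into classifier training records.

    ask_llm is any callable mapping a prompt string to a reply string.
    Replies that cannot be parsed into exactly one label are dropped.
    """
    records = []
    for ex in examples:
        label = parse_label(ask_llm(build_prompt(ex.sentence, ex.verb)))
        if label is not None:
            records.append(
                {"text": mark_verb(ex.sentence, ex.verb),
                 "label": LABELS.index(label)}
            )
    return records


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call, for illustration only."""
    return "Manner" if "Target verb: wiped" in prompt else "result."


records = annotate(
    [Example("She wiped the table.", "wiped"),
     Example("He broke the vase.", "broke")],
    fake_llm,
)
```

Records of this shape (marked text plus an integer label) are one plausible input format for fine-tuning a sentence-level classifier such as a RoBERTa-based model. Dropping unparseable replies rather than guessing keeps noisy labels out of the training set, at the cost of some coverage.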

Topics

Best for: NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.