Enhancing Science Classroom Discourse Analysis through Joint Multi-Task Learning for Reasoning-Component Classification

2026-04-24 · Source: cs.CL updates on arXiv.org · Field: Education & Learning — Educational Technology (EdTech), Academic Research & Higher Education, Educational Psychology & Learning Sciences · Depth: Expert, extended

Summary

An automated discourse analysis system (ADAS) has been developed to classify teacher and student utterances in science classrooms along two dimensions: Utterance Type (UT) and Reasoning Component (RC). The system employs a dual-probe head RoBERTa-base classifier and addresses severe label imbalance through stratified corpus re-splitting and LLM-based synthetic data augmentation. The RC coding scheme was revised from six to four classes (ER, SR-D, SR-I, O) to improve learnability and reduce ambiguity. LLM augmentation improved UT minority-class recognition, reaching a macro-F1 of 0.635. On RC classification, however, a TF-IDF + Logistic Regression baseline outperformed the transformer models (macro-F1 of 0.574), which suggests that the RC classes are largely lexically separable. Discourse pattern analyses revealed that teacher "Feedback-with-Question" (Fq) moves consistently precede student inferential reasoning (SR-I), and that a "Prompt" (P) followed by another "Prompt" (P) yields the lowest reasoning engagement. The study also identified a positional annotation bias in the human-labeled data regarding end-of-lesson cognitive complexity.
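Both headline scores above are macro-F1 values, which average per-class F1 without weighting by class frequency, so rare classes count as much as common ones. The following is a minimal pure-Python sketch of that metric, not the study's evaluation code; the toy labels are made up for illustration and only reuse the four RC class names.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of exact matches, shown only for contrast with macro-F1."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up, heavily imbalanced toy data: a classifier that always predicts "O".
y_true = ["O"] * 8 + ["ER", "SR-I"]
y_pred = ["O"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

On this toy data the majority-class predictor looks strong on accuracy but weak on macro-F1, which is why macro-F1 is the more informative number when minority classes matter.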

Key takeaway

NLP engineers developing educational analytics tools should note that LLM-based augmentation boosts performance on complex, context-dependent classification such as Utterance Type, while simpler lexical models may suffice or even excel on tasks with strong lexical signals, such as Reasoning Component classification. Prioritize expanding labeled corpora so that transformer models can fully leverage their capabilities on RC tasks, and account for potential human annotation biases when interpreting temporal discourse patterns.

Key insights

Automated discourse analysis using LLM augmentation improves utterance classification and reveals effective teaching patterns.

Principles

Lexical features can suffice for certain classification tasks.
Teacher "Feedback-with-Question" (Fq) moves consistently precede student inferential reasoning.
Positional bias can influence human annotation outcomes.
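The Fq and Prompt findings are statements about adjacent utterances, so they can be checked by counting which RC label follows each teacher move. The sketch below is one hedged way to do that. The tuple layout `(speaker, ut, rc)`, the speaker codes `"T"` and `"S"`, and the toy lesson are all assumptions for illustration, not the study's data format or results.

```python
from collections import Counter, defaultdict


def reasoning_after_move(utterances):
    """Count the RC label of each student turn that directly follows a teacher move.

    `utterances` is one lesson in order, as (speaker, ut, rc) tuples.
    Pairs are only formed inside a single lesson, never across lessons.
    """
    counts = defaultdict(Counter)
    for (spk_a, ut_a, _), (spk_b, _, rc_b) in zip(utterances, utterances[1:]):
        if spk_a == "T" and spk_b == "S":
            counts[ut_a][rc_b] += 1
    return counts


def rate(counts, move, rc):
    """Share of student turns after `move` that carry the label `rc`."""
    total = sum(counts[move].values())
    return counts[move][rc] / total if total else 0.0


# Made-up toy lesson; teacher rows carry no RC and student rows no UT here.
lesson = [
    ("T", "Fq", None), ("S", None, "SR-I"),
    ("T", "P", None), ("S", None, "O"),
    ("T", "Fq", None), ("S", None, "SR-D"),
    ("T", "P", None), ("T", "P", None), ("S", None, "O"),
]

counts = reasoning_after_move(lesson)
for move in sorted(counts):
    print(move, dict(counts[move]), f"SR-I rate = {rate(counts, move, 'SR-I'):.2f}")
```

Counts like these describe what follows a move, not what the move causes, and they inherit any bias in the human labels they are computed from.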

Method

A dual-probe head RoBERTa-base classifier is trained with LLM-generated synthetic data and focal loss on a revised 4-class reasoning component scheme, using session-level splitting to prevent data leakage.
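Two of the training details named above can be sketched in a few lines of plain Python: focal loss, which down-weights examples the model already classifies confidently, and session-level splitting, which keeps every utterance from one session on the same side of the split. This is an illustrative sketch under assumptions, not the authors' implementation. The `gamma=2.0` default, the `test_fraction` value, and the session IDs are hypothetical, and the split shown is not stratified.

```python
import math
import random


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    `probs` is a predicted class distribution and `target` the true class index.
    gamma=2.0 is an assumed value; gamma=0 reduces to cross-entropy.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def split_by_session(session_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no session appears in both sets."""
    sessions = sorted(set(session_ids))
    random.Random(seed).shuffle(sessions)
    n_test = max(1, round(len(sessions) * test_fraction))
    test_sessions = set(sessions[:n_test])
    train_idx = [i for i, s in enumerate(session_ids) if s not in test_sessions]
    test_idx = [i for i, s in enumerate(session_ids) if s in test_sessions]
    return train_idx, test_idx


# A confident correct prediction contributes far less loss than an uncertain one.
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.3, 0.7], target=0)
print(f"easy = {easy:.4f}, hard = {hard:.4f}")

# Hypothetical session IDs, one per utterance.
session_ids = ["s1", "s1", "s2", "s3", "s3", "s4", "s5", "s5"]
train_idx, test_idx = split_by_session(session_ids)
print("train sessions:", sorted({session_ids[i] for i in train_idx}))
print("test sessions:", sorted({session_ids[i] for i in test_idx}))
```

Splitting on utterances instead of sessions would let neighboring turns from the same lesson land in both train and test, which is the leakage that session-level splitting is meant to prevent.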

In practice

Use LLM-based augmentation to offset minority-class label imbalance.
Prioritize "Feedback-with-Question" (Fq) in teaching.
Avoid repetitive "Prompt" (P) moves in instruction.

Topics

Automated Discourse Analysis
Multi-Task Learning
Reasoning Component Classification
LLM Data Augmentation
RoBERTa-base Model

Best for: NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.