Interpretable Difficulty-Aware Knowledge Tracing in Tutor-Student Dialogues

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, long

Summary

Researchers from the University of Massachusetts Amherst propose an interpretable, difficulty-aware conversational knowledge tracing (KT) framework for AI-powered tutoring systems. Built on large language models (LLMs), the framework explicitly models both student ability and the difficulty of the tutor-posed task at each dialogue turn. It uses Item Response Theory (IRT) to map LLM outputs onto student ability and question difficulty parameters, so its predictions are interpretable and grounded in cognitive learning theory. Two LLM-based components supply these parameters: a knowledge estimator that extracts the student's knowledge state and a difficulty estimator that computes task difficulty. Evaluated on two tutor-student dialogue datasets, QATD2k and MathDial, using Llama-3.1-8B-Instruct, the framework outperforms existing KT baselines (LLMKT, DKT, DKVMN, SAINT, AKT, and simpleKT) both quantitatively and qualitatively, and its outputs are consistent with cognitive theory.

Key takeaway

For AI scientists and NLP engineers building intelligent tutoring systems, this framework offers a way to improve both the accuracy and the interpretability of student performance predictions. Modeling task difficulty explicitly alongside student ability through IRT makes each prediction traceable to two readable quantities, which supports more personalized and transparent pedagogical feedback. Consider this difficulty-aware, IRT-based approach if trust and effectiveness matter in your LLM-powered tutoring application.

Key insights

Integrating Item Response Theory with LLMs enhances knowledge tracing interpretability and accuracy in tutoring dialogues.

Method

The framework uses LLMs to estimate student knowledge and task difficulty, then feeds these estimates into an IRT-based predictor of response correctness. The LLM is fine-tuned with a binary cross-entropy loss on the observed correctness labels.
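
To make the prediction step concrete, here is a minimal sketch of an IRT-style predictor and its binary cross-entropy loss. It assumes the simplest IRT form (the one-parameter Rasch model, where the probability of a correct response is the sigmoid of ability minus difficulty). The summary does not state which IRT variant the paper uses, so this form is an assumption. The function names and the per-turn values are hypothetical, and the ability and difficulty numbers stand in for what the LLM-based estimators would produce.

```python
import math


def irt_correct_probability(ability: float, difficulty: float) -> float:
    """Rasch (1PL) IRT, assumed here: P(correct) = sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-12) -> float:
    """BCE for one prediction; prob is clamped to avoid log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def mean_bce(turns: list[dict]) -> float:
    """Average BCE over dialogue turns with observed correctness labels."""
    losses = [
        binary_cross_entropy(
            irt_correct_probability(t["ability"], t["difficulty"]), t["correct"]
        )
        for t in turns
    ]
    return sum(losses) / len(losses)


# Hypothetical per-turn estimates; in the framework these would come from
# the LLM-based knowledge and difficulty estimators, not be hand-written.
turns = [
    {"ability": 0.5, "difficulty": -0.5, "correct": 1},  # easy task, answered correctly
    {"ability": 0.5, "difficulty": 1.5, "correct": 0},   # hard task, answered incorrectly
    {"ability": 1.0, "difficulty": 1.0, "correct": 1},   # ability matches difficulty
]

if __name__ == "__main__":
    for t in turns:
        p = irt_correct_probability(t["ability"], t["difficulty"])
        print(f"ability={t['ability']:+.1f} difficulty={t['difficulty']:+.1f} P(correct)={p:.3f}")
    print(f"mean BCE = {mean_bce(turns):.4f}")
```

Under this assumed form, a student whose ability equals the task difficulty has a 50% chance of answering correctly, and raising the difficulty at fixed ability lowers the predicted probability. Because both quantities are directly readable, each prediction can be inspected. In a full training setup the loss would be backpropagated into the LLM that produces the two estimates; this sketch only evaluates it.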

Best for: AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.