Interpretable Difficulty-Aware Knowledge Tracing in Tutor-Student Dialogues

2026-03-18 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, long

Summary

Researchers from the University of Massachusetts Amherst propose an interpretable, difficulty-aware conversational Knowledge Tracing (KT) framework for AI-powered tutoring systems. Built on large language models (LLMs), the framework explicitly models both the student's ability and the difficulty of the tutor-posed task at each dialogue turn. An LLM-based knowledge estimator extracts the student's knowledge state, and an LLM-based difficulty estimator computes task difficulty. Item Response Theory (IRT) then maps these LLM outputs to student ability and question difficulty parameters, so that predictions are interpretable and grounded in cognitive learning theory. Evaluated on two tutor-student dialogue datasets, QATD2k and MathDial, with Llama-3.1-8B-Instruct as the backbone, the framework outperforms existing KT baselines (LLMKT, DKT, DKVMN, SAINT, AKT, and simpleKT) in both quantitative and qualitative comparisons, while producing interpretable outputs consistent with cognitive theory.

Key takeaway

For AI scientists and NLP engineers building intelligent tutoring systems, this framework shows how to improve both the accuracy and the interpretability of student performance predictions. Modeling task difficulty explicitly alongside student ability through IRT yields predictions that can be read as an ability-versus-difficulty comparison, which supports more personalized and transparent pedagogical support. Consider this difficulty-aware, IRT-based approach if trust and effectiveness matter in your LLM-powered tutoring applications.

Key insights

Integrating Item Response Theory with LLMs enhances knowledge tracing interpretability and accuracy in tutoring dialogues.

Principles

Student performance depends on both student knowledge and task difficulty.
Opaque LLM representations hinder interpretability.
IRT provides a cognitive theory-aligned prediction layer.
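
The third principle can be made concrete with the standard one-parameter logistic (1PL, or Rasch) form of IRT. The symbols below (ability \theta, difficulty b, correctness y) are chosen here for illustration; the paper's own notation, and any per-turn or per-skill indexing it uses, may differ.

```latex
P(y = 1 \mid \theta, b) = \sigma(\theta - b) = \frac{1}{1 + e^{-(\theta - b)}}
```

Under this form, a student whose ability equals the task difficulty is predicted correct with probability 0.5, and the prediction rises as ability exceeds difficulty. For example, \theta = 1.0 and b = 0.5 give \sigma(0.5) \approx 0.62.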

Method

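The framework uses LLMs to estimate student knowledge and task difficulty, then feeds both estimates into an IRT-based predictor that outputs the probability of a correct response. The LLM is fine-tuned with a binary cross-entropy loss between these predicted probabilities and the observed correctness labels.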
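
The training objective described above can be sketched in a few lines. This is a minimal illustration that assumes the estimators have already produced one scalar ability and one scalar difficulty per dialogue turn and that the predictor is a plain 1PL model; the function names are hypothetical, and the real system backpropagates this loss through the LLM rather than evaluating it on fixed numbers.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def irt_bce_loss(abilities, difficulties, labels, eps=1e-12):
    """Mean binary cross-entropy of 1PL predictions against 0/1 labels.

    abilities, difficulties: per-turn scalars (assumed estimator outputs).
    labels: 1 if the student's response was correct, else 0.
    """
    if not labels or not (len(abilities) == len(difficulties) == len(labels)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for theta, b, y in zip(abilities, difficulties, labels):
        p = sigmoid(theta - b)            # 1PL probability of a correct response
        p = min(max(p, eps), 1.0 - eps)   # clamp so log() never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Illustrative values only, not data from the paper.
loss = irt_bce_loss([1.2, -0.3], [0.4, 0.9], [1, 0])
print(f"mean BCE loss: {loss:.4f}")
```

The loss is small when high ability relative to difficulty coincides with a correct response, which is the signal that pushes the estimators toward interpretable ability and difficulty values.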

In practice

Use LLM logits to derive scalar ability and difficulty estimates.
Fine-tune LLMs on dialogue data for educational domain adaptation.
Employ a 1PL IRT model for interpretable correctness prediction.
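
The first and third steps above might look like the following. The source does not specify how logits are reduced to a scalar, so the mapping here (the difference between the logits of two contrasting answer tokens) is an assumption made purely for illustration, as are all function names and numeric values.

```python
import math


def logits_to_scalar(logit_pos: float, logit_neg: float) -> float:
    """Reduce two token logits to one scalar via their difference.

    Assumption: logit_pos and logit_neg are the LLM's logits for two
    hypothetical contrasting tokens (e.g. "knows" vs. "does not know"
    for ability, or "hard" vs. "easy" for difficulty).
    """
    return logit_pos - logit_neg


def predict_correct(ability: float, difficulty: float) -> float:
    """1PL IRT: probability that the student answers correctly."""
    x = ability - difficulty
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # stable branch for negative inputs
    return z / (1.0 + z)


# Hypothetical logits for one dialogue turn.
ability = logits_to_scalar(2.1, 0.4)
difficulty = logits_to_scalar(0.3, 1.0)
p_correct = predict_correct(ability, difficulty)
print(f"ability={ability:.2f} difficulty={difficulty:.2f} p_correct={p_correct:.3f}")
```

Because the prediction depends only on the gap between the two scalars, each estimate can be inspected on its own: a drop in predicted correctness can be attributed either to lower estimated ability or to a harder task.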

Topics

Knowledge Tracing
Large Language Models
Item Response Theory
Tutor-Student Dialogues
Student Ability Modeling

Best for: AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.