Analytics for Quality Assurance for Item Pools (AQuAP): Monitoring and Maintaining Item Bank Health in AI-Driven Assessment Systems

Source: cs.SE updates on arXiv.org · Field: Education & Learning (Educational Technology (EdTech), Academic Research & Higher Education, Psychometrics & Assessment) · Depth: Advanced, extended

Summary

Analytics for Quality Assurance for Item Pools (AQuAP) is a dashboard environment developed by Duolingo to continuously monitor the quality and health of item banks in large-scale, AI-driven educational assessment systems such as the Duolingo English Test (DET). It integrates with the Item Factory framework, which automates item generation and human-in-the-loop review. The dashboards consolidate psychometric, operational, and editorial metrics. The central metric is Effective Bank Size (EBS), which quantifies how many unique test sessions the bank can support before content repeats. AQuAP supplements EBS with Adjusted Effective Bank Size (AEBS) and Effective Bank Use (EBU), which account for uneven item utilization, and with maximum exposure, maximum conditional exposure, and the rarely-administered fraction. AQuAP is currently hosted in Sigma Computing and is being migrated to Dash for richer visualization and alerting, with the goal of becoming a predictive decision-support system for assessment quality management.
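
The exposure-based metrics named in the summary can be illustrated with a minimal Python sketch. It is not AQuAP's implementation: the log format (pairs of session ID and item ID), the ability-band lookup, the function names, and the 0.1 threshold for "rarely administered" are all assumptions made for illustration. EBS, AEBS, and EBU are left out because this summary does not give their formulas.

```python
from collections import defaultdict


def exposure_rates(administrations, bank_items):
    """Map each bank item to the share of sessions in which it appeared.

    administrations: iterable of (session_id, item_id) pairs (assumed format).
    bank_items: every item ID in the bank, including never-used items.
    """
    rows = list(administrations)
    sessions = {session_id for session_id, _ in rows}
    seen_in = defaultdict(set)
    for session_id, item_id in rows:
        seen_in[item_id].add(session_id)
    total = len(sessions)
    return {
        item: (len(seen_in[item]) / total if total else 0.0)
        for item in bank_items
    }


def max_exposure(rates):
    """Highest exposure rate across the bank."""
    return max(rates.values(), default=0.0)


def rarely_administered_fraction(rates, threshold=0.1):
    """Share of bank items whose exposure rate is below the threshold.

    The threshold is a hypothetical cut-off, not a value from the source.
    """
    if not rates:
        return 0.0
    return sum(1 for rate in rates.values() if rate < threshold) / len(rates)


def max_conditional_exposure(administrations, session_band, bank_items):
    """Highest exposure rate within any single ability band.

    session_band: maps session_id to an ability band label (assumed input).
    """
    by_band = defaultdict(list)
    for session_id, item_id in administrations:
        by_band[session_band[session_id]].append((session_id, item_id))
    return max(
        (max_exposure(exposure_rates(rows, bank_items)) for rows in by_band.values()),
        default=0.0,
    )


# Toy data: four sessions drawing on a four-item bank.
log = [("s1", "a"), ("s1", "b"), ("s2", "a"), ("s3", "a"), ("s3", "c"), ("s4", "b")]
bank = ["a", "b", "c", "d"]
bands = {"s1": "low", "s2": "low", "s3": "high", "s4": "high"}

rates = exposure_rates(log, bank)
overall_max = max_exposure(rates)
rare_share = rarely_administered_fraction(rates)
conditional_max = max_conditional_exposure(log, bands, bank)
```

In the toy data, item "a" appears in three of four sessions overall but in every low-band session, so the conditional maximum exceeds the overall maximum. That gap is the reason to track both figures.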

Key takeaway

For MLOps engineers and psychometricians managing high-stakes, AI-driven assessment systems, continuous item bank monitoring is critical. Adopt a system like AQuAP to track metrics such as Effective Bank Size (EBS) and Adjusted Effective Bank Size (AEBS) so that content stays secure and diverse. Monitoring these metrics proactively surfaces item depletion, overexposure, and underutilization early enough to generate new content or adjust algorithms before test validity and fairness suffer.

Key insights

AQuAP continuously monitors item bank health in AI-driven assessment systems by combining psychometric and operational analytics in one place.

Method

AQuAP continuously ingests item performance, exposure, and reviewer activity data and presents it in dashboards. Experts review the anomalies that surface there and make strategic decisions, supported by simulations used to keep the metrics under control.
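
The hand-off from dashboard to expert review could be sketched as a simple rule check over a metrics snapshot. This is an assumption-laden illustration, not the alerting logic of AQuAP, Sigma Computing, or Dash: the `Rule` type, the metric keys, and every threshold below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical alert rule for one metric."""
    metric: str
    limit: float
    direction: str  # "above" flags values > limit; "below" flags values < limit


def flag_anomalies(snapshot, rules):
    """Return human-readable alerts for metrics that break a rule.

    snapshot: dict of metric name to latest value (assumed format).
    A metric missing from the snapshot is reported rather than ignored,
    so that an ingestion gap does not look like a healthy bank.
    """
    alerts = []
    for rule in rules:
        value = snapshot.get(rule.metric)
        if value is None:
            alerts.append(f"{rule.metric}: missing from snapshot")
        elif rule.direction == "above" and value > rule.limit:
            alerts.append(f"{rule.metric}: {value} is above {rule.limit}")
        elif rule.direction == "below" and value < rule.limit:
            alerts.append(f"{rule.metric}: {value} is below {rule.limit}")
    return alerts


# Illustrative thresholds only; real limits would come from the experts.
RULES = [
    Rule("max_exposure", 0.2, "above"),
    Rule("rarely_administered_fraction", 0.3, "above"),
    Rule("effective_bank_size", 1000, "below"),
]

snapshot = {"max_exposure": 0.35, "rarely_administered_fraction": 0.1}
alerts = flag_anomalies(snapshot, RULES)
```

The check only flags candidates for review. Consistent with the method described above, the decision about what to do with a flagged metric stays with the experts.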

Best for: AI Scientist, Research Scientist, Data Scientist, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.