Survey Statistics: sampling to assess data quality

· Source: Statistical Modeling, Causal Inference, and Social Science · Field: Science & Research — Mathematics & Computational Sciences, Health & Medical Research, Research Methodology & Innovation · Depth: Intermediate, quick

Summary

In 2010, a project with the Malawi Ministry of Health, covering 270,000 patients at 377 HIV antiretroviral therapy clinics, tackled the quality of data recorded on paper-based patient treatment cards and clinic registers. Traditionally, a full census was conducted every quarter to find and correct data errors. Hedt-Gauthier et al. (2012) proposed using Lot Quality Assurance Sampling (LQAS) instead, to decide whether a full census was needed at all. Under this approach, 76 treatment cards were sampled per clinic and a decision rule was applied: if 3 or more disagreements were found (D >= 3), a full census was performed. Across 19 clinic visits, the sampling-based rule took less than half of the 28.5 hours required for a full census and still caught over half of the errors, making data quality assurance considerably more efficient.
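To see how the "76 cards, census if D >= 3" rule behaves, one can compute the probability that it triggers a full census as a function of a clinic's true error rate p. The sketch below assumes each sampled card is independently in error with probability p (a binomial model that ignores the finite number of cards per clinic); the function name prob_full_census is hypothetical and not taken from the study.

```python
from math import comb


def prob_full_census(p: float, n: int = 76, threshold: int = 3) -> float:
    """Probability that D >= threshold when D ~ Binomial(n, p).

    Assumption: each of the n sampled cards is in error independently
    with probability p (a binomial approximation to sampling a clinic).
    """
    # P(D < threshold) is the sum of binomial probabilities for k = 0 .. threshold - 1.
    p_below = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(threshold))
    return 1.0 - p_below


if __name__ == "__main__":
    # Evaluate the rule at a few illustrative error rates.
    for p in (0.01, 0.02, 0.05, 0.10):
        print(f"error rate {p:.2f}: P(full census) = {prob_full_census(p):.3f}")
```

Tracing this probability across values of p gives the rule's operating characteristic curve: clinics with very low error rates rarely trigger a census, while clinics with high error rates almost always do.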

Key takeaway

For AI Scientists developing data quality protocols in resource-constrained settings, consider Lot Quality Assurance Sampling (LQAS) as a primary method. Compared with a full census, it can substantially reduce the time and cost of data verification while still flagging units whose error rates are unacceptably high. Weigh the costs of Type I and Type II errors (here, ordering an unnecessary census versus missing a unit that needed one) to tailor the LQAS decision rule to your application.
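One way to tailor the rule, sketched here under stated assumptions, is to fix the sample size n, choose an error rate p_low that is acceptable and a rate p_high that should almost always trigger a census, and search for the smallest threshold d whose error probabilities stay within chosen limits. In this sketch, alpha is the chance of ordering a census when the true rate is p_low, and beta is the chance of skipping one when it is p_high. All numeric inputs and function names below are hypothetical illustrations, not values or methods from Hedt-Gauthier et al. (2012), and the binomial model is again an assumption.

```python
from math import comb
from typing import Optional


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(D <= k) for D ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def choose_threshold(n: int, p_low: float, p_high: float,
                     alpha_max: float, beta_max: float) -> Optional[int]:
    """Smallest d such that the rule 'census if D >= d' meets both limits.

    alpha = P(D >= d | p_low):  census ordered for an acceptable unit.
    beta  = P(D <  d | p_high): census skipped for a problematic unit.
    Returns None if no threshold satisfies both limits for this n.
    """
    for d in range(1, n + 1):
        alpha = 1.0 - binom_cdf(d - 1, n, p_low)
        beta = binom_cdf(d - 1, n, p_high)
        if alpha <= alpha_max and beta <= beta_max:
            return d
    return None


if __name__ == "__main__":
    # Hypothetical design targets, chosen only for illustration.
    print(choose_threshold(n=76, p_low=0.01, p_high=0.10,
                           alpha_max=0.05, beta_max=0.10))
```

Raising d lowers alpha but raises beta, so when no threshold meets both limits the usual remedy is a larger n. With the made-up targets above the search returns 3; that is a property of these illustrative inputs, not a reconstruction of how the study chose its rule.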

Key insights

LQAS offers an efficient, cost-effective alternative to full censuses for data quality assurance.

Principles

Method

Sample n records, count the number of disagreements D, and apply a decision rule (e.g., if D >= 3, conduct a full census) to decide whether further action is needed.
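The method can be sketched as follows. The record format (a pair of values, assumed here to come from a treatment card and the clinic register for the same entry) and the function name lqas_decision are hypothetical; the sketch only illustrates sampling without replacement, counting disagreements D, and applying a threshold rule.

```python
import random
from typing import Optional, Sequence, Tuple

# Hypothetical record: (value on treatment card, value in clinic register).
Record = Tuple[str, str]


def lqas_decision(records: Sequence[Record], n: int = 76, threshold: int = 3,
                  seed: Optional[int] = None) -> Tuple[int, bool]:
    """Return (D, needs_full_census) for one clinic under an LQAS-style rule."""
    rng = random.Random(seed)
    # Sample without replacement; if the clinic has fewer than n records, check them all.
    sample = rng.sample(list(records), min(n, len(records)))
    # Assumption: a disagreement is any record whose two copies do not match.
    d = sum(1 for card, register in sample if card != register)
    return d, d >= threshold


if __name__ == "__main__":
    # Synthetic clinic: 500 records, 20 of which disagree (illustrative only).
    clinic = [("ok", "ok")] * 480 + [("a", "b")] * 20
    d, census = lqas_decision(clinic, seed=1)
    print(f"D = {d}, full census needed: {census}")
```

Fixing the seed makes a given draw reproducible for auditing; in routine use the seed would normally be left unset so each visit draws a fresh sample.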

Best for: AI Scientist, Data Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Statistical Modeling, Causal Inference, and Social Science.