Survey Statistics: sampling to assess data quality
Summary
In 2010, a project with the Malawi Ministry of Health, covering 270,000 patients across 377 HIV antiretroviral clinics, tackled data quality in paper-based patient treatment cards and clinic registers. The standard practice was a full census every quarter to identify and correct data errors. Hedt-Gauthier et al. (2012) proposed using Lot Quality Assurance Sampling (LQAS) to decide whether a full census was necessary: sample 76 treatment cards per clinic, count the disagreements D, and perform a full census only if D >= 3. In 19 clinic visits, the sampling-based rule took less than half the 28.5 hours required for a full census and caught over half the errors, demonstrating a more efficient approach to data quality assurance.
Key takeaway
For AI Scientists developing data quality protocols in resource-constrained environments, consider using Lot Quality Assurance Sampling (LQAS) as the first-line check. A small sample can take far less time and money than a full census while still flagging the sites whose error counts are high enough to justify one. Weigh the costs of Type I and Type II errors (an unnecessary census versus a skipped census at a site with poor data) when choosing the sample size and decision threshold for your application.
Key insights
LQAS offers an efficient, cost-effective way to decide when a full census is actually needed for data quality assurance.
Principles
- Decision rules should balance the cost of an unnecessary census against the cost of leaving errors uncorrected.
- Sampling can be more efficient than full censuses.
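One way to make the cost-benefit principle concrete is to score each candidate threshold by its expected avoidable cost. The sketch below is illustrative only: the error rates for "good" and "bad" clinics, the share of bad clinics, and both cost figures are hypothetical inputs, not values from Hedt-Gauthier et al. (2012), and it assumes the disagreement count is approximately binomial.

```python
from math import comb


def prob_trigger(error_rate: float, n: int, threshold: int) -> float:
    """P(D >= threshold) assuming D ~ Binomial(n, error_rate)."""
    p_below = sum(
        comb(n, k) * error_rate**k * (1 - error_rate) ** (n - k)
        for k in range(threshold)
    )
    return max(0.0, 1.0 - p_below)  # clamp tiny negative rounding error


def expected_cost(threshold, n, p_good, p_bad, share_bad, cost_census, cost_miss):
    """Expected avoidable cost per clinic visit (all inputs hypothetical).

    False alarm: a good clinic triggers a census it did not need.
    Miss: a bad clinic stays under the threshold and is not corrected.
    """
    false_alarm = (1 - share_bad) * prob_trigger(p_good, n, threshold) * cost_census
    miss = share_bad * (1 - prob_trigger(p_bad, n, threshold)) * cost_miss
    return false_alarm + miss


def best_threshold(n, p_good, p_bad, share_bad, cost_census, cost_miss):
    """Threshold in 0..n with the lowest expected avoidable cost."""
    return min(
        range(n + 1),
        key=lambda d: expected_cost(
            d, n, p_good, p_bad, share_bad, cost_census, cost_miss
        ),
    )


if __name__ == "__main__":
    # Hypothetical scenario: 1% vs 10% error rates, 30% of clinics are "bad".
    d = best_threshold(
        n=76, p_good=0.01, p_bad=0.10, share_bad=0.3, cost_census=10.0, cost_miss=40.0
    )
    print("lowest-cost threshold under these assumptions:", d)
```

Raising the cost of a miss relative to the cost of a census pushes the preferred threshold down, and the reverse pushes it up, which is the trade-off the first principle describes.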
Method
Sample 'n' records, count disagreements 'D', and apply a decision rule (e.g., if D >= 3, conduct a full census) to determine if further action is needed.
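The rule above can be sketched in a few lines, together with the chance that a clinic with a given underlying error rate would trigger a census. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the example error rates are illustrative, and the binomial model is an assumption (for a small clinic sampled without replacement, a hypergeometric model would be more exact).

```python
from math import comb


def lqas_decision(disagreements: int, threshold: int = 3) -> str:
    """Apply the rule 'conduct a full census if D >= threshold'."""
    return "full census" if disagreements >= threshold else "no census"


def prob_census(error_rate: float, n: int = 76, threshold: int = 3) -> float:
    """P(D >= threshold) assuming D ~ Binomial(n, error_rate)."""
    p_below = sum(
        comb(n, k) * error_rate**k * (1 - error_rate) ** (n - k)
        for k in range(threshold)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    print(lqas_decision(2), "/", lqas_decision(3))
    # Illustrative error rates only; they are not taken from the study.
    for rate in (0.01, 0.05, 0.10):
        print(f"error rate {rate:.0%}: P(census) = {prob_census(rate):.3f}")
```

Tabulating `prob_census` over a range of error rates gives the operating characteristic of the rule, which shows how often good clinics are sent to an unnecessary census and how often poor ones slip through.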
In practice
- Implement LQAS for routine data quality checks.
- Use sampling to reduce audit time and resource use.
Topics
- Lot Quality Assurance Sampling
- Public Health Informatics
- Data Quality Management
- Statistical Decision Making
- Cost-Benefit Analysis
Best for: AI Scientist, Data Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Statistical Modeling, Causal Inference, and Social Science.