OrderDP: A Theoretically Guaranteed Lossless Dynamic Data Pruning Framework

Source: Machine Learning · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Expert, quick

Summary

OrderDP is a plug-and-play dynamic data pruning framework that addresses two weaknesses of existing pruning strategies: biased gradient estimation and an unclear effect on final performance. It aims for stable, unbiased, and near-lossless training acceleration backed by theoretical guarantees. OrderDP first randomly selects a subset of the data and then keeps the top-$q$ samples from that subset, a procedure that is unbiased with respect to a surrogate loss function. The accompanying convergence and generalization analyses describe how pruning affects the optimal performance, so acceleration can be controlled while the final performance remains guaranteed. Empirical evaluations on CIFAR-10, CIFAR-100, and ImageNet-1K show competitive accuracy, stable convergence, and precise control, with training costs reduced by over 40%, a simpler design, and a faster runtime.

Key takeaway

For machine learning engineers and AI scientists with heavy training workloads, OrderDP offers a theoretically guaranteed way to accelerate model training. The reported reduction in training costs exceeds 40% on datasets such as ImageNet-1K, with competitive accuracy and stable convergence. Consider integrating this plug-and-play framework into your deep learning pipelines for faster iteration cycles and more efficient resource use at near-lossless performance.

Key insights

OrderDP provides theoretically guaranteed, unbiased, and near-lossless data pruning for accelerated model training.

Principles

Method

OrderDP works in two stages: it first draws a random subset of the data, then selects the top-$q$ samples within that subset. This selection is unbiased with respect to a surrogate objective, which is the basis for the framework's theoretical guarantees.
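The two-stage selection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the summary does not say which quantity the top-$q$ ranking uses, so the sketch assumes a per-sample score such as the current loss, and the function name `random_then_top_q` and the parameter `subset_size` are hypothetical.

```python
import numpy as np


def random_then_top_q(scores, subset_size, q, rng):
    """Draw a uniform random subset, then keep its q highest-scoring samples.

    ASSUMPTION: `scores` holds one ranking value per sample (for example the
    current per-sample loss). The source only says "top-q" and does not name
    the ranking criterion.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    if not 1 <= q <= subset_size <= n:
        raise ValueError("need 1 <= q <= subset_size <= len(scores)")

    # Stage 1: uniform random subset of the dataset, without replacement.
    subset = rng.choice(n, size=subset_size, replace=False)

    # Stage 2: positions of the q largest scores inside the subset.
    # argpartition avoids a full sort; the q kept items are not ordered.
    local = np.argpartition(scores[subset], -q)[-q:]
    kept = subset[local]
    return subset, kept


rng = np.random.default_rng(0)
scores = rng.random(1000)  # stand-in values for per-sample losses
subset, kept = random_then_top_q(scores, subset_size=256, q=64, rng=rng)
print(len(subset), len(kept))
```

In a training loop the scores would presumably be refreshed as the model changes, which is what makes the pruning dynamic; only the `kept` indices would then feed the gradient step. How OrderDP schedules that refresh is not stated in this summary.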

Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.