Selecting Samples on Graphs: A Unified Dataset Pruning Framework for Lossless Training Acceleration

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

A new unified graph-based dataset pruning framework addresses the computational cost of training on large datasets by modeling the dataset as a weighted graph. Existing pruning criteria typically rely on either intrinsic signals (the value of each sample taken independently) or extrinsic signals (pairwise relations that capture diversity). The framework unifies the two by encoding intrinsic value in node weights and extrinsic value in edge weights. Sample selection is then cast as a Maximum Weight Clique Problem (MWCP), which is NP-hard but admits a principled greedy solution with a formal approximation guarantee under mild conditions. The method outperforms existing dataset pruning techniques and reduces training time by over 40% on ImageNet-1k with ResNet-50 without sacrificing accuracy.

Key takeaway

For machine learning engineers optimizing training workflows on large datasets, this unified graph-based pruning framework offers a way to cut training time by over 40% on ImageNet-1k with ResNet-50 without sacrificing model accuracy. Consider integrating it when compute is constrained and faster model development matters.

Key insights

A unified graph-based framework prunes datasets by modeling samples as nodes and relations as edges, solving a Maximum Weight Clique Problem.

Principles

Method

Model the dataset as a weighted graph in which node weights encode each sample's intrinsic value and edge weights encode the extrinsic (pairwise) value between samples. Then solve the resulting Maximum Weight Clique Problem with a greedy algorithm.
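The greedy step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a complete graph (so any subset of samples is a clique), a fixed budget of k samples, and an objective equal to the sum of selected node weights plus the sum of edge weights between selected pairs. The names greedy_prune and subset_score, and the toy weights, are hypothetical.

```python
import numpy as np


def greedy_prune(node_w, edge_w, k):
    """Greedily select k sample indices (illustrative sketch, not the paper's code).

    node_w: shape (n,), assumed intrinsic value of each sample.
    edge_w: shape (n, n), assumed symmetric pairwise value with a zero diagonal.
    """
    node_w = np.asarray(node_w, dtype=float)
    edge_w = np.asarray(edge_w, dtype=float)
    n = node_w.shape[0]
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and the number of samples")

    # Marginal gain of adding each sample to the current selection.
    gain = node_w.copy()
    available = np.ones(n, dtype=bool)
    selected = []
    for _ in range(k):
        # Already selected samples are masked out before taking the argmax.
        i = int(np.argmax(np.where(available, gain, -np.inf)))
        selected.append(i)
        available[i] = False
        # Every remaining sample now also gains its edge to the new member.
        gain += edge_w[:, i]
    return selected


def subset_score(node_w, edge_w, subset):
    """Assumed objective: node weights plus edge weights inside the subset."""
    node_w = np.asarray(node_w, dtype=float)
    edge_w = np.asarray(edge_w, dtype=float)
    idx = np.asarray(subset, dtype=int)
    inner = edge_w[np.ix_(idx, idx)]
    # Count each unordered pair once by summing the strict upper triangle.
    return float(node_w[idx].sum() + np.triu(inner, k=1).sum())


# Toy data (hypothetical): samples 0 and 1 are near-duplicates, so their edge is 0.
toy_node_w = np.array([1.0, 0.9, 0.5, 0.4])
toy_edge_w = np.array([
    [0.0, 0.0, 0.8, 0.7],
    [0.0, 0.0, 0.8, 0.7],
    [0.8, 0.8, 0.0, 0.1],
    [0.7, 0.7, 0.1, 0.0],
])
kept = greedy_prune(toy_node_w, toy_edge_w, k=2)
kept_score = subset_score(toy_node_w, toy_edge_w, kept)
```

On this toy input the sketch keeps samples 0 and 2 rather than the two highest-valued samples 0 and 1, because the zero edge between the near-duplicates adds nothing to the objective. How the node and edge weights are actually computed, and the conditions under which the greedy guarantee holds, are defined in the original paper and not reproduced here.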

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.