TimeLAVA: Learning-Agnostic Data Valuation for Time Series

· Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

TimeLAVA is a learning-agnostic framework for valuing time series data. It targets the need for principled data curation, quality control, and robust learning in domains such as healthcare, finance, and industrial monitoring. Existing approaches either depend on a trained model or assume i.i.d. data. TimeLAVA instead quantifies the intrinsic quality of temporal segments by their marginal contribution to minimizing the distributional discrepancy between the evaluated data and reference data. Its core is a Selective Wavelet-based Wasserstein discrepancy, which combines multi-scale wavelet transforms (for temporal localization) with unbalanced optimal transport (for robustness). Segment values are computed efficiently via sensitivity analysis, without training any model, and are then aggregated into point-wise scores. The authors report that, across anomaly detection, data pruning, and label noise detection on diverse real-world datasets, TimeLAVA's value scores are significantly more informative than those of existing methods.
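
To illustrate what "multi-scale wavelet transforms for temporal localization" can look like, the sketch below applies an orthonormal Haar wavelet transform to fixed-length windows. It then describes each window by the energy in each scale band. The Haar wavelet, non-overlapping windows, the energy features, and the `selected_bands` knob are all assumptions made for this sketch. In particular, `selected_bands` is a hypothetical stand-in: the summary does not say how the "Selective" part of the discrepancy chooses scales.

```python
import math
from typing import List, Optional, Sequence


def haar_dwt(signal: Sequence[float], levels: int) -> List[List[float]]:
    """Multi-level orthonormal Haar wavelet transform.

    Returns [detail_1, ..., detail_L, approx_L], where detail_1 holds the
    finest-scale coefficients. len(signal) must be a positive multiple of 2**levels.
    """
    if levels < 1 or not signal or len(signal) % (2 ** levels):
        raise ValueError("len(signal) must be a positive multiple of 2**levels")
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    approx = list(signal)
    bands: List[List[float]] = []
    for _ in range(levels):
        evens, odds = approx[0::2], approx[1::2]
        # High-pass: local changes localized in time at this scale.
        bands.append([(a - b) * inv_sqrt2 for a, b in zip(evens, odds)])
        # Low-pass: local averages passed on to the next, coarser level.
        approx = [(a + b) * inv_sqrt2 for a, b in zip(evens, odds)]
    bands.append(approx)
    return bands


def segment_band_energies(
    series: Sequence[float],
    window: int,
    levels: int,
    selected_bands: Optional[Sequence[int]] = None,
) -> List[List[float]]:
    """Cut the series into non-overlapping windows and describe each window
    by the energy (sum of squared coefficients) of its wavelet bands.

    selected_bands is a hypothetical option for keeping only some scales;
    it is not the paper's selection mechanism.
    """
    features = []
    for start in range(0, len(series) - window + 1, window):
        bands = haar_dwt(series[start:start + window], levels)
        chosen = range(len(bands)) if selected_bands is None else selected_bands
        features.append([sum(c * c for c in bands[k]) for k in chosen])
    return features
```

Because the Haar transform is orthonormal, the band energies of a window sum to the window's total energy. For example, `[0, 4, 0, 4]` with two levels gives energies `[16, 0, 16]`: all of its variation sits at the finest scale. The window length must be divisible by `2**levels`.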

Key takeaway

For data scientists and ML engineers working with high-stakes time series, TimeLAVA offers a way to assess data quality without training a model. If model-dependent valuation methods or i.i.d. assumptions are a poor fit for your data, consider using TimeLAVA to identify valuable segments, prune noisy data, or detect label errors. Because valuation requires no model training, it avoids the repeated retraining that model-based valuation methods rely on, and training on data curated this way may improve your models' generalization.
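
Once segment or point-wise values are available, pruning and error flagging reduce to ranking and thresholding. The sketch below shows one simple way to do both. The function names, the pruning fraction, and the median/MAD rule with `k=3` are illustrative assumptions, not choices taken from the paper.

```python
import statistics
from typing import List, Sequence, Tuple


def prune_lowest(values: Sequence[float], fraction: float) -> Tuple[List[int], List[int]]:
    """Return (kept, pruned) indices, pruning the lowest-valued `fraction` of items."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    order = sorted(range(len(values)), key=lambda i: values[i])  # ascending value
    n_prune = int(fraction * len(values))  # round down
    return sorted(order[n_prune:]), sorted(order[:n_prune])


def flag_low_values(values: Sequence[float], k: float = 3.0) -> List[int]:
    """Flag items whose value lies more than k median absolute deviations below the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0.0:
        return []  # no spread to judge against
    return [i for i, v in enumerate(values) if v < med - k * mad]
```

With values `[0.5, 0.4, 0.6, -2.0, 0.55]`, pruning 40% drops indices 1 and 3, and the MAD rule flags only index 3. For labeled segments, flagged items are candidates for manual label review rather than automatic deletion.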

Key insights

TimeLAVA provides a learning-agnostic, robust framework for valuing time series data by quantifying distributional discrepancy.
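
As a rough illustration of quantifying distributional discrepancy robustly, the sketch below computes an entropic unbalanced optimal transport discrepancy between two weighted sets of feature vectors, such as per-segment wavelet features. It uses the standard Sinkhorn-style scaling updates for KL-relaxed marginals. The squared Euclidean cost, the parameters `eps` (entropic regularization) and `rho` (marginal relaxation), and the function names are assumptions. This is not the paper's Selective Wavelet-based Wasserstein discrepancy.

```python
import math
from typing import Sequence

Point = Sequence[float]


def sq_dist(x: Point, y: Point) -> float:
    """Squared Euclidean ground cost between two feature vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def gen_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Generalized KL divergence sum(p*log(p/q) - p + q), assuming q > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / qi)
        total += qi - pi
    return total


def unbalanced_ot_discrepancy(
    xs: Sequence[Point], a: Sequence[float],
    ys: Sequence[Point], b: Sequence[float],
    eps: float = 0.1, rho: float = 1.0, iters: int = 500,
) -> float:
    """Entropic unbalanced OT via Sinkhorn-style scaling with KL-relaxed marginals.

    Returns <P, C> + rho*KL(P 1 | a) + rho*KL(P^T 1 | b) for the computed plan P;
    the entropy term is left out of the reported value. Not log-stabilized,
    so keep cost/eps moderate to avoid underflow.
    """
    n, m = len(xs), len(ys)
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    kern = [[math.exp(-c / eps) for c in row] for row in cost]
    power = rho / (rho + eps)  # exponent of the KL-relaxed scaling update
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        kv = [sum(kern[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / kv[i]) ** power for i in range(n)]
        ktu = [sum(kern[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / ktu[j]) ** power for j in range(m)]
    plan = [[u[i] * kern[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    row_mass = [sum(plan[i]) for i in range(n)]
    col_mass = [sum(plan[i][j] for i in range(n)) for j in range(m)]
    return transport + rho * gen_kl(row_mass, a) + rho * gen_kl(col_mass, b)
```

The marginals are only softly enforced. As a result, mass far from every reference point can be left untransported, at a KL penalty of roughly `rho` times that mass, instead of being forced across a large distance. This is the usual sense in which unbalanced transport is robust to outliers.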

Principles

Method

TimeLAVA values temporal segments by their marginal contribution to minimizing the distributional discrepancy between evaluated and reference data, measured with a Selective Wavelet-based Wasserstein discrepancy. Segment values are computed via sensitivity analysis, with no model training, and then aggregated into point-wise scores.
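
To make the segment-to-point pipeline concrete, here is a deliberately simplified sketch that swaps in two stand-ins that are not TimeLAVA's. First, the exact one-dimensional Wasserstein-1 distance on a single scalar feature per segment replaces the Selective Wavelet-based Wasserstein discrepancy. Second, brute-force leave-one-out recomputation replaces the sensitivity analysis, which the summary credits with avoiding exactly this kind of recomputation. Averaging the values of overlapping windows is also an assumed aggregation rule, and all function names are hypothetical.

```python
import bisect
from typing import List, Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Exact W1 between two uniform empirical distributions on the real line,
    computed as the integral of |F_x(t) - F_y(t)| over t."""
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    grid = sorted(set(xs_sorted) | set(ys_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        fx = bisect.bisect_right(xs_sorted, left) / len(xs_sorted)
        fy = bisect.bisect_right(ys_sorted, left) / len(ys_sorted)
        total += abs(fx - fy) * (right - left)
    return total


def leave_one_out_values(segment_feats: Sequence[float],
                         reference_feats: Sequence[float]) -> List[float]:
    """Value of segment i = D(all but i, ref) - D(all, ref).

    Positive: removing i increases the discrepancy, so i helps match the reference.
    """
    if len(segment_feats) < 2:
        raise ValueError("need at least two segments")
    feats = list(segment_feats)
    full = wasserstein_1d(feats, reference_feats)
    return [wasserstein_1d(feats[:i] + feats[i + 1:], reference_feats) - full
            for i in range(len(feats))]


def pointwise_scores(values: Sequence[float], starts: Sequence[int],
                     length: int, n_points: int) -> List[float]:
    """Average the values of all segments covering each time step (NaN if none)."""
    sums, counts = [0.0] * n_points, [0] * n_points
    for value, start in zip(values, starts):
        for t in range(max(start, 0), min(start + length, n_points)):
            sums[t] += value
            counts[t] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```

Consider segment features `[0.0, 1.0, 2.0, 10.0]` and reference features `[0.0, 1.0, 2.0]`. The outlier `10.0` receives the lowest value, `-2.25`, because dropping it makes the two distributions identical. Two windows of length 4 starting at 0 and 2, with values 1.0 and 3.0, give point-wise scores `[1.0, 1.0, 2.0, 2.0, 3.0, 3.0]` over six steps.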

In practice


Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist


Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.