TimeLAVA: Learning-Agnostic Data Valuation for Time Series
Summary
TimeLAVA is a learning-agnostic framework for data valuation in time series, aimed at principled data curation, quality control, and robust learning in domains such as healthcare, finance, and industrial monitoring. Unlike existing approaches that depend on a specific model or assume i.i.d. data, TimeLAVA scores the intrinsic quality of each temporal segment by its marginal contribution to minimizing the distributional discrepancy between the evaluated data and the reference data. Its core is a Selective Wavelet-based Wasserstein discrepancy, which combines multi-scale wavelet transforms (for temporal localization) with unbalanced optimal transport (for robustness). Segment values are computed efficiently via sensitivity analysis, with no model training, and are then aggregated into point-wise scores. In the reported experiments, TimeLAVA produces significantly more informative value scores than existing methods on anomaly detection, data pruning, and label noise detection across diverse real-world datasets.
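The idea of comparing two series scale by scale with a Wasserstein distance can be illustrated with a small sketch. This is not the paper's Selective Wavelet-based Wasserstein discrepancy: it assumes a plain Haar transform, equal-length inputs, and the balanced one-dimensional Wasserstein-1 distance in place of unbalanced optimal transport, and it performs no scale selection. The function names `haar_details`, `wasserstein_1d`, and `wavelet_discrepancy` are hypothetical.

```python
import numpy as np

def haar_details(x, levels):
    """Return Haar detail coefficients per level, finest scale first."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size < 2:
            break
        if approx.size % 2:              # drop the last sample so pairs line up
            approx = approx[:-1]
        pairs = approx.reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0))
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    return details

def wasserstein_1d(a, b):
    """Balanced W1 distance between two equal-size 1-D empirical samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def wavelet_discrepancy(x, y, levels=3):
    """Sum of per-scale W1 distances between Haar detail coefficients.

    Simplifying assumption: x and y have the same length, so the
    coefficient arrays at each level have the same size.
    """
    return sum(
        wasserstein_1d(dx, dy)
        for dx, dy in zip(haar_details(x, levels), haar_details(y, levels))
    )

# Toy data: a clean sine wave and a copy with one injected spike.
t = np.linspace(0.0, 4.0 * np.pi, 64)
reference = np.sin(t)
corrupted = reference.copy()
corrupted[10] += 5.0

print(wavelet_discrepancy(reference, reference))  # identical signals, so 0.0
print(wavelet_discrepancy(reference, corrupted))  # strictly positive
```

Working on detail coefficients rather than raw values is what makes the comparison sensitive to where in time, and at which scale, two series differ.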
Key takeaway
For data scientists and ML engineers working with high-stakes time series data, TimeLAVA offers a way to assess data quality without committing to a particular model. If your current valuation method depends on a trained model or assumes i.i.d. samples, consider TimeLAVA for identifying valuable data segments, pruning noisy data, or detecting label errors. Because the values are computed without training a model, they are not tied to one architecture and do not have to be recomputed by retraining for each candidate model.
Key insights
TimeLAVA provides a learning-agnostic, robust framework for valuing time series data by quantifying distributional discrepancy.
Principles
- Effective time series data valuation must capture temporal dependencies and non-stationary dynamics.
- Learning-agnostic valuation improves generalizability across models.
- Data valuation can be theoretically linked to model-agnostic generalization.
Method
TimeLAVA values each temporal segment by its marginal contribution to minimizing distributional discrepancy, measured with a Selective Wavelet-based Wasserstein discrepancy. Segment values are computed via sensitivity analysis, without model training, and are then aggregated into point-wise scores.
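The final aggregation step can be sketched as follows. The summary does not say how segment values are combined, so this example assumes each point's score is the mean value of the sliding-window segments that cover it. The function `pointwise_scores` and the toy segment values are hypothetical.

```python
import numpy as np

def pointwise_scores(segment_values, starts, window, length):
    """Average the values of all segments covering each time point.

    Assumption: every segment has the same length `window` and begins at
    the matching entry of `starts`. Uncovered points are left as NaN.
    """
    totals = np.zeros(length)
    counts = np.zeros(length)
    for value, start in zip(segment_values, starts):
        totals[start:start + window] += value
        counts[start:start + window] += 1
    scores = np.full(length, np.nan)
    covered = counts > 0
    scores[covered] = totals[covered] / counts[covered]
    return scores

# Three overlapping windows of length 4 over a series of length 8.
values = [1.0, 3.0, 5.0]   # hypothetical segment values
starts = [0, 2, 4]
scores = pointwise_scores(values, starts, window=4, length=8)
print(scores)  # [1. 1. 2. 2. 4. 4. 5. 5.]
```

Points covered by two windows receive the mean of both segment values, so the score changes gradually across window boundaries.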
In practice
- Apply TimeLAVA to enhance anomaly detection in sequential data.
- Utilize TimeLAVA for efficient data pruning in large time series datasets.
- Employ TimeLAVA for identifying and mitigating label noise in time series.
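Once point-wise scores are available, the uses listed above mostly reduce to ranking. The sketch below assumes that a higher score means a more valuable point, a sign convention that should be checked against the paper, and it drops the lowest-scored fraction of points. The helper `prune_lowest` and the toy scores are hypothetical.

```python
import numpy as np

def prune_lowest(scores, fraction):
    """Return sorted indices of points kept after dropping the lowest-scored fraction."""
    scores = np.asarray(scores, dtype=float)
    n_drop = int(np.floor(fraction * scores.size))
    order = np.argsort(scores, kind="stable")   # ascending: lowest value first
    return np.sort(order[n_drop:])

# Toy point-wise scores; indices 2 and 5 have clearly low value.
scores = np.array([0.9, 0.8, -2.0, 0.7, 0.85, -1.5, 0.95, 0.75])
kept = prune_lowest(scores, fraction=0.25)
flagged = np.setdiff1d(np.arange(scores.size), kept)

print(kept)     # indices retained for training
print(flagged)  # candidates for anomaly or label-noise review
```

The same ranking supports both pruning (train on `kept`) and review (inspect `flagged`). Only the cutoff fraction changes with the task.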
Topics
- TimeLAVA
- Data Valuation
- Time Series Analysis
- Wavelet Transforms
- Wasserstein Distance
- Anomaly Detection
- Data Pruning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.