Scaling-Aware Data Selection for End-to-End Autonomous Driving Systems

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A new data selection framework, Mixture Optimization via Scaling-Aware Iterative Collection (MOSAIC), improves training data efficiency for large-scale deep learning models in physical AI applications. It targets a specific difficulty: the effect of an individual data point on each of several evaluation metrics is often ambiguous. MOSAIC partitions the dataset into domains, fits neural scaling laws relating the data from each domain to the evaluation metrics, and iteratively optimizes the data mixture. Applied to autonomous driving (AD), MOSAIC was used to train an End-to-End (E2E) planner model, which was evaluated with the Extended Predictive Driver Model Score (EPDMS), an aggregate of driving rule compliance metrics. The framework outperformed a diverse set of baselines on EPDMS while using up to 80% less data.

Key takeaway

Research scientists developing large-scale deep learning models for physical AI can use the MOSAIC framework to improve data efficiency. Selecting data by its effect on specific evaluation metrics can yield better model performance, such as improved driving rule compliance in autonomous driving, while substantially reducing the required training data volume (up to 80% less in the reported experiments). Consider integrating MOSAIC's iterative optimization to streamline your data collection and model training processes.

Key insights

MOSAIC optimizes data selection for physical AI by iteratively adding data from the domains expected to improve the evaluation metrics the most.

Principles

Method

MOSAIC partitions the data into domains, fits neural scaling laws relating each domain's data to the evaluation metrics, and then iteratively adds data from the domains expected to yield the largest metric improvement, optimizing the data mixture step by step.
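
The loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single metric expressed as an error (lower is better), a pure power-law form error = a * n^(-b) per domain, and independent, additive domain contributions. The function names, domain names, and parameter values are all hypothetical.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ~= a * sizes**(-b) by least squares in log-log space.

    Assumed functional form; the source does not specify one.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope  # (a, b)


def greedy_mixture(laws, pool_sizes, budget, step, seed):
    """Greedily allocate a sample budget across domains.

    laws:       {domain: (a, b)} fitted power-law parameters
    pool_sizes: {domain: number of samples available in that domain}
    budget:     total number of samples to select
    step:       samples added per iteration
    seed:       initial samples taken from every domain (must be > 0)
    """
    if seed <= 0 or seed * len(laws) > budget:
        raise ValueError("seed must be positive and fit within the budget")
    counts = {domain: seed for domain in laws}
    remaining = budget - seed * len(laws)
    while remaining >= step:
        best_domain, best_gain = None, 0.0
        for domain, (a, b) in laws.items():
            n = counts[domain]
            if n + step > pool_sizes[domain]:
                continue  # this domain's pool is exhausted
            # Predicted error reduction from adding `step` more samples.
            gain = a * n ** (-b) - a * (n + step) ** (-b)
            if gain > best_gain:
                best_domain, best_gain = domain, gain
        if best_domain is None:
            break  # no domain offers a further predicted improvement
        counts[best_domain] += step
        remaining -= step
    return counts


if __name__ == "__main__":
    # Hypothetical scaling-law parameters for two made-up domains.
    laws = {"urban": (5.0, 0.5), "highway": (1.0, 0.1)}
    pools = {"urban": 10_000, "highway": 10_000}
    print(greedy_mixture(laws, pools, budget=1000, step=100, seed=100))
```

Under these assumptions, each iteration spends the next batch on the domain whose fitted curve predicts the largest error reduction, so domains with steep, unsaturated curves receive data first. A real system would need a richer scaling-law form, multiple metrics, and periodic refitting as new data arrives.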

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.