GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

Source: Artificial Intelligence · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

GRACE is a gradient-aligned data curation method for efficient post-training of language models on reasoning data. Existing pipelines score entire samples; GRACE instead evaluates the individual steps within a reasoning trace. Each step is scored on two criteria: its alignment with the answer-oriented gradient direction, and its consistency with the preceding reasoning trajectory. The step-level scores are then aggregated into a sample-level value used for subset selection. The method relies only on the model's internal optimization signals and needs no external reward model or step annotations. For scalability, GRACE uses a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass.

When post-training Qwen3-VL-2B-Instruct on MMathCoT-1M, GRACE reached 108.8% of full-data performance using 20% of the data and 100.2% using 5%. The method is also reported to transfer effectively across different model backbones.
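To illustrate why a representation-level signal can be obtained from a forward pass alone, the sketch below uses a deliberately simplified setting that is an assumption, not the paper's formulation: a linear output head with cross-entropy loss, where the gradient with respect to each token's hidden state has the closed form `(softmax(logits) - onehot(target)) @ W`. The function names, the mean-pooling of token signals into step directions, and the use of cosine similarity are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def upstream_signals(hidden, W, targets):
    """Per-token gradient of cross-entropy w.r.t. the hidden state.

    Assumes logits = hidden @ W.T (a linear head; illustrative only).
    hidden: (T, d), W: (V, d), targets: (T,) -> returns (T, d).
    """
    probs = softmax(hidden @ W.T)                    # (T, V)
    probs[np.arange(len(targets)), targets] -= 1.0   # p - onehot(target)
    return probs @ W                                 # (T, d), no backward pass

def cosine(a, b, eps=1e-12):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def step_alignment(signals, step_spans, answer_span):
    """Cosine between each step's mean signal and the answer's mean signal.

    step_spans: list of (start, end) token ranges, one per reasoning step.
    answer_span: (start, end) token range of the final answer.
    Mean-pooling is an assumption made for this sketch.
    """
    a0, a1 = answer_span
    answer_dir = signals[a0:a1].mean(axis=0)
    return [cosine(signals[s:e].mean(axis=0), answer_dir) for s, e in step_spans]

# Toy usage with random data: two reasoning steps followed by an answer span.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 4))
W = rng.normal(size=(5, 4))
targets = rng.integers(0, 5, size=6)
signals = upstream_signals(hidden, W, targets)
alignments = step_alignment(signals, [(0, 2), (2, 4)], (4, 6))
```

In a real model the hidden states and output probabilities would come from the network's forward pass; how GRACE actually defines and pools its upstream signals is not specified in this summary.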

Key takeaway

For AI engineers and research scientists post-training language models, GRACE can substantially reduce the amount of reasoning data required. In the reported experiment, 5% to 20% of the original dataset matched or exceeded full-data performance, which translates into lower compute cost and faster iteration cycles. Consider evaluating GRACE in your own data curation pipeline, keeping in mind that the reported figures come from one model and dataset pairing.

Key insights

GRACE curates reasoning data at the level of individual steps, scoring each step by its gradient alignment and its consistency with the preceding reasoning trajectory, so that post-training can use a small, high-value subset.

Principles

Method

GRACE scores each reasoning step on gradient alignment and trajectory consistency. A representation-level gradient proxy estimates the step-level alignment in a single forward pass. The step scores are then aggregated into a sample-level value, and samples are ranked by that value for subset selection.
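The aggregation and selection stages described above can be sketched as follows. This is a minimal illustration under stated assumptions: the weighted sum of alignment and consistency, the mixing weight `lam`, the mean aggregation, and the top-fraction selection rule are hypothetical choices, since this summary does not give the paper's exact formulas.

```python
import numpy as np

def step_scores(alignment, consistency, lam=0.5):
    """Combine per-step alignment and consistency (weighted sum is an assumption)."""
    alignment = np.asarray(alignment, dtype=float)
    consistency = np.asarray(consistency, dtype=float)
    return lam * alignment + (1.0 - lam) * consistency

def sample_value(alignment, consistency, lam=0.5):
    """Aggregate step scores into one sample-level value (mean is an assumption)."""
    return float(step_scores(alignment, consistency, lam).mean())

def select_subset(values, fraction):
    """Return sorted indices of the top `fraction` of samples by value."""
    values = np.asarray(values, dtype=float)
    k = max(1, int(round(fraction * len(values))))
    order = np.argsort(-values, kind="stable")  # highest value first
    return sorted(order[:k].tolist())

# Toy usage: three traces, each with per-step (alignment, consistency) scores.
traces = [
    ([0.9, 0.8, 0.7], [0.9, 0.9, 0.8]),   # steps point toward the answer
    ([0.1, -0.2],     [0.3, 0.1]),        # weakly aligned steps
    ([0.5, 0.6],      [0.4, 0.7]),
]
values = [sample_value(a, c) for a, c in traces]
keep = select_subset(values, fraction=1 / 3)  # keeps the single best trace
```

With a 20% or 5% budget, `fraction` would be set to 0.2 or 0.05 and the retained indices passed to the post-training data loader.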

Topics

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.