GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, medium

Summary

GRACE (Gradient-aligned Reasoning dAta Curation for Efficient post-training) is a method for curating reasoning data that scores individual steps within a reasoning trace rather than whole samples. Existing methods treat all intermediate steps as equally valuable, which wastes training on low-value steps. GRACE instead views each reasoning trace as a sequence of optimization events and scores each step on two criteria: its alignment with the answer-oriented gradient direction and its consistency with the preceding reasoning trajectory. The step-level scores are aggregated into a sample-level utility score used for subset selection. For scalability, GRACE uses a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass. When Qwen3-VL-2B-Instruct is post-trained on MMathCoT-1M, GRACE-selected subsets reach 108.8% of full-data performance with 20% of the data and 100.2% with only 5%. The method is also reported to transfer effectively across model backbones.

Key takeaway

For AI engineers and research scientists working with large reasoning datasets, GRACE offers a way to cut post-training cost by training on a small, high-value subset. Consider using it to select that subset: in the reported experiments, 5-20% of the original dataset matched or exceeded full-data performance, so the savings in compute and training time did not come at the expense of model quality.
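As a rough illustration of the subset-selection step, the sketch below keeps the highest-utility fraction of a dataset once each sample has a utility score. The function name `select_top_fraction` and the simple top-k rule are assumptions made for illustration; the source does not specify how GRACE turns sample-level scores into a subset.

```python
import numpy as np

def select_top_fraction(utilities, fraction):
    """Return indices of the highest-utility samples (hypothetical helper).

    utilities: one precomputed utility score per sample.
    fraction:  share of the dataset to keep, e.g. 0.05 or 0.20.
    """
    utilities = np.asarray(utilities, dtype=float)
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one sample even for very small datasets.
    k = max(1, int(round(fraction * len(utilities))))
    # Stable sort on negated scores: descending order, ties keep input order.
    order = np.argsort(-utilities, kind="stable")
    return order[:k]

# Toy usage: keep 40% of five samples.
keep = select_top_fraction([0.1, 0.9, 0.5, 0.7, 0.3], fraction=0.4)
```

With a 1M-sample dataset, `fraction=0.05` would keep 50,000 samples; the training loop then runs only on those indices.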

Key insights

GRACE curates reasoning data by scoring individual steps based on gradient alignment and trajectory consistency for efficient post-training.

Method

GRACE assigns a utility score to each reasoning step using answer-oriented gradient alignment and trajectory consistency, then aggregates the step scores into a sample-level score for subset selection. A representation-level gradient proxy keeps the scoring scalable.
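The scoring idea can be sketched as follows, under loudly stated assumptions: each step is represented by a proxy gradient vector, alignment and consistency are both measured with cosine similarity, the two terms are mixed with a weight `beta`, and step scores are averaged into a sample score. None of these specific choices (cosine similarity, `beta`, mean aggregation, the function names) come from the source; they are stand-ins to make the structure concrete.

```python
import numpy as np

def cosine(a, b, eps=1e-8):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def score_steps(step_grads, answer_grad, beta=0.5):
    """Score each reasoning step (illustrative, not the paper's formula).

    step_grads:  one proxy gradient vector per reasoning step (assumed given).
    answer_grad: the answer-oriented gradient direction (assumed given).
    beta:        hypothetical weight on the trajectory-consistency term.
    """
    answer_grad = np.asarray(answer_grad, dtype=float)
    running = np.zeros_like(answer_grad)  # sum of earlier steps' vectors
    scores = []
    for i, g in enumerate(step_grads):
        g = np.asarray(g, dtype=float)
        align = cosine(g, answer_grad)                 # alignment with the answer direction
        consist = cosine(g, running) if i > 0 else 0.0  # agreement with the prior trajectory
        scores.append(align + beta * consist)
        running = running + g
    return np.array(scores)

def sample_utility(step_scores):
    """Aggregate step scores into one sample-level score (mean is an assumption)."""
    return float(np.mean(step_scores))

# Toy trace: one helpful step, one neutral step, one step that works against the answer.
scores = score_steps([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], answer_grad=[1.0, 0.0])
utility = sample_utility(scores)
```

In this toy trace the first step scores highest and the last lowest, so a trace dominated by steps like the third would receive a low sample-level utility.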

Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.