GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training
Summary
GRACE (Gradient-aligned Reasoning dAta Curation for Efficient post-training) is a method for curating reasoning data that scores individual steps within a reasoning trace rather than whole samples. Existing methods treat all intermediate steps as equally valuable, so training effort is spent on low-value steps. GRACE instead views each reasoning trace as a sequence of optimization events and scores each step by its alignment with the answer-oriented gradient direction and its consistency with the preceding reasoning trajectory. These step-level scores are aggregated into a sample-level utility score used for subset selection. To keep the method scalable, GRACE uses a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass. When post-training Qwen3-VL-2B-Instruct on MMathCoT-1M, GRACE reached 108.8% of full-data performance with 20% of the data and 100.2% with only 5%. The approach is also reported to transfer effectively across model backbones.
Key takeaway
For AI engineers and research scientists working with large reasoning datasets, GRACE offers a way to make post-training more data-efficient. Consider using it to select high-value subsets: in the reported experiment, a 5% subset matched full-data performance (100.2%) and a 20% subset exceeded it (108.8%). Training on a smaller subset lowers compute cost and training time without a loss in the reported performance.
Key insights
GRACE curates reasoning data by scoring individual steps based on gradient alignment and trajectory consistency for efficient post-training.
Principles
- Reasoning steps contribute unevenly to optimization.
- Step utility depends on alignment with the target objective.
- Trajectory consistency is crucial for reasoning coherence.
Method
GRACE assigns a utility score to each reasoning step based on answer-oriented gradient alignment and trajectory consistency. It then aggregates the step scores into a sample-level score for subset selection. A representation-level gradient proxy keeps the scoring scalable.
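The scoring and aggregation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each step is already represented by a vector (standing in for the gradient proxy), measures both alignment and consistency with cosine similarity, mixes them with a hypothetical weight `lam`, and aggregates by a plain mean. The names `step_scores`, `sample_utility`, `step_vecs`, and `answer_vec` are illustrative, and the summary does not specify any of these choices.

```python
import numpy as np


def cosine(a, b, eps=1e-12):
    """Cosine similarity between two 1-D vectors (eps avoids division by zero)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def step_scores(step_vecs, answer_vec, lam=0.5):
    """Score every step of one reasoning trace.

    step_vecs  : (num_steps, dim) array, one hypothetical proxy vector per step
    answer_vec : (dim,) array, hypothetical answer-oriented direction
    lam        : hypothetical weight mixing alignment and consistency
    """
    step_vecs = np.asarray(step_vecs, dtype=float)
    answer_vec = np.asarray(answer_vec, dtype=float)
    scores = []
    for t, vec in enumerate(step_vecs):
        # Alignment: does this step point toward the answer direction?
        alignment = cosine(vec, answer_vec)
        # Consistency: does this step agree with the mean of the preceding steps?
        # Assumption: the first step has no history, so its consistency is 0.
        consistency = 0.0 if t == 0 else cosine(vec, step_vecs[:t].mean(axis=0))
        scores.append(lam * alignment + (1.0 - lam) * consistency)
    return np.array(scores)


def sample_utility(step_vecs, answer_vec, lam=0.5):
    """Aggregate step scores into one sample-level score (assumed: mean)."""
    return float(step_scores(step_vecs, answer_vec, lam).mean())


if __name__ == "__main__":
    answer = np.array([1.0, 0.0])
    coherent = np.array([[1.0, 0.0], [1.0, 0.0]])   # both steps point at the answer
    wandering = np.array([[1.0, 0.0], [0.0, 1.0]])  # second step drifts away
    print(sample_utility(coherent, answer), sample_utility(wandering, answer))
```

Under these assumptions, a trace whose steps stay pointed at the answer direction receives a higher utility than one whose later steps drift away from it.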
In practice
- Use GRACE for efficient reasoning data curation.
- Apply step-level scoring to optimize training data.
- Leverage gradient proxies for scalable data valuation.
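As a rough illustration of the curation step in practice, the sketch below keeps the highest-scoring fraction of a dataset given one utility score per sample. Ranking by score and keeping the top fraction is an assumption here, since the summary only says the scores are used for subset selection. The function name `select_top_fraction` and its parameters are hypothetical.

```python
import numpy as np


def select_top_fraction(utilities, fraction):
    """Return the indices of the top `fraction` of samples by utility.

    utilities : 1-D sequence of sample-level scores (higher is better)
    fraction  : share of the dataset to keep, in (0, 1]
    """
    utilities = np.asarray(utilities, dtype=float)
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one sample even for very small fractions.
    k = max(1, int(round(fraction * len(utilities))))
    # Stable sort on negated scores gives a descending ranking.
    order = np.argsort(-utilities, kind="stable")
    # Return the kept indices in their original dataset order.
    return np.sort(order[:k])


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.5, 0.7, 0.3]
    print(select_top_fraction(scores, 0.4))
```

The returned indices can then be used to subset the training set before post-training, for instance with a 5% or 20% budget as in the reported experiment.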
Topics
- Reasoning Data Curation
- Gradient Alignment
- Post-training Efficiency
- Large Language Models
- Vision-Language Models
Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.