GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training
Summary
GRACE is a gradient-aligned data curation method for efficient post-training of language models on reasoning data. Unlike existing pipelines that score entire samples, GRACE evaluates the individual steps within a reasoning trace. It scores each step by its alignment with the answer-oriented gradient direction and its consistency with the preceding reasoning trajectory. These step-level scores are then aggregated into a sample-level value for subset selection, using only the model's internal optimization signals, with no external reward models or step annotations. To scale, GRACE employs a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass. When post-training Qwen3-VL-2B-Instruct on MMathCoT-1M, GRACE reached 108.8% of full-data performance using only 20% of the data and 100.2% using just 5%. The method is also reported to transfer effectively across different model backbones.
Key takeaway
For AI engineers and research scientists post-training language models, GRACE can substantially reduce the amount of reasoning data required. In the reported experiments, 5% of the original dataset matched full-data performance (100.2%) and 20% exceeded it (108.8%), which translates directly into lower compute costs and faster iteration cycles. Consider integrating GRACE into your data curation pipeline to improve efficiency and scalability.
Key insights
GRACE curates reasoning data by scoring individual steps based on gradient alignment and consistency, enabling efficient post-training.
Principles
- Reasoning steps contribute unevenly to overall trace value.
- Internal model signals can guide data curation effectively.
Method
GRACE scores each reasoning step by gradient alignment and trajectory consistency, estimating step-level alignment with a representation-level gradient proxy in a single forward pass. It then aggregates the step scores into a sample-level value used for subset selection.
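The scoring idea can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not give the exact formulas, so the cosine-similarity alignment term, the running-sum consistency term, the weight `lam`, and the mean aggregation are all assumptions chosen for illustration. Each step is represented by a hypothetical proxy vector, and `answer_vec` stands in for the answer-oriented gradient direction.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def step_scores(step_vecs: List[Vector], answer_vec: Vector,
                lam: float = 0.5) -> List[float]:
    """Score each step as alignment + lam * consistency (assumed form).

    alignment:   cosine between the step vector and the answer direction.
    consistency: cosine between the step vector and the sum of all
                 preceding step vectors (same direction as their mean).
    """
    scores = []
    running_sum = [0.0] * len(answer_vec)
    for i, vec in enumerate(step_vecs):
        alignment = cosine(vec, answer_vec)
        # The first step has no predecessors, so its consistency is 0.
        consistency = cosine(vec, running_sum) if i > 0 else 0.0
        scores.append(alignment + lam * consistency)
        running_sum = [s + x for s, x in zip(running_sum, vec)]
    return scores


def sample_score(step_vecs: List[Vector], answer_vec: Vector,
                 lam: float = 0.5) -> float:
    """Aggregate step scores into one sample-level value (mean, assumed)."""
    scores = step_scores(step_vecs, answer_vec, lam)
    return sum(scores) / len(scores) if scores else 0.0
```

In this toy form, a trace whose steps point along the answer direction and agree with what came before scores higher than one containing off-direction steps, which is the intuition behind scoring steps rather than whole samples.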
In practice
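- Use GRACE to curate reasoning data when no external reward model or step annotations are available.
- Apply GRACE to cut post-training data to 5-20% of the original set while matching or exceeding full-data performance, as reported for Qwen3-VL-2B-Instruct on MMathCoT-1M.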
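Once every sample has a sample-level value, subset selection under a data budget could look like the sketch below. The summary does not say how GRACE picks the subset, so keeping the highest-scoring fraction is an assumption, and `select_top_fraction` and the sample IDs are hypothetical names used only for illustration.

```python
import math
from typing import Dict, List


def select_top_fraction(scores: Dict[str, float], fraction: float) -> List[str]:
    """Return the IDs of the highest-scoring samples within a data budget.

    scores:   maps a sample ID to its sample-level value.
    fraction: share of the dataset to keep, e.g. 0.05 or 0.20.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not scores:
        return []
    # Keep at least one sample, rounding the budget up.
    k = max(1, math.ceil(fraction * len(scores)))
    # Sort by descending score; break ties by ID for reproducibility.
    ranked = sorted(scores, key=lambda sid: (-scores[sid], sid))
    return ranked[:k]
```

For example, with five scored samples and `fraction=0.2`, the function keeps only the single top-ranked sample; the retained IDs then define the post-training subset.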
Topics
- GRACE Method
- Reasoning Data Curation
- Gradient Alignment
- Step-level Scoring
- Representation-level Gradient Proxy
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.