GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training
Summary
GRACE (Gradient-aligned Reasoning Data Curation for Efficient Post-training) is a method for curating reasoning data that scores the individual steps within a reasoning trace rather than whole samples. Each step is evaluated on two criteria: its alignment with the answer-oriented gradient direction, and its consistency with the preceding reasoning trajectory. The step-level scores are then aggregated into a sample-level value used for subset selection. The method relies solely on the model's internal optimization signals and needs no external reward model or step annotations. To scale, GRACE uses a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass. When post-training Qwen3-VL-2B-Instruct on MMathCoT-1M, GRACE reached 108.8% of full-data performance with only 20% of the data and 100.2% with just 5%. The method is also reported to transfer effectively across different model backbones.
Key takeaway
For AI engineers optimizing post-training of language and vision-language models, GRACE offers a data-efficient way to improve reasoning capabilities. By selecting data on step-level gradient alignment, the reported results match or exceed full-data performance with only 5-20% of the original data. Consider gradient-aligned curation to reduce compute costs and shorten model development cycles, keeping in mind that the reported numbers are for Qwen3-VL-2B-Instruct post-trained on MMathCoT-1M.
Key insights
GRACE curates reasoning data by scoring individual steps based on gradient alignment and trajectory consistency.
Principles
- Individual reasoning steps have uneven value.
- Internal optimization signals suffice for data curation.
Method
GRACE scores each reasoning step by its alignment with the answer-oriented gradient and consistency with the preceding trajectory, aggregating these into a sample-level score for data subset selection.
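The scoring and aggregation flow described above can be sketched as follows. This is a minimal illustration and not the paper's formulation: the summary does not give the scoring formulas, so the cosine-similarity alignment, the running-sum consistency term, the weight `lam`, the mean aggregation, and the function names are all assumptions made for this example. Gradients are represented as plain lists of floats.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def score_steps(step_grads, answer_grad, lam=0.5):
    """Assumed rule: alignment with the answer gradient, plus lam times
    consistency with the summed gradients of all preceding steps."""
    scores = []
    running = [0.0] * len(answer_grad)  # sum of earlier step gradients
    for i, g in enumerate(step_grads):
        align = cosine(g, answer_grad)
        consist = cosine(g, running) if i > 0 else 0.0  # first step has no history
        scores.append(align + lam * consist)
        running = [r + x for r, x in zip(running, g)]
    return scores

def sample_score(step_scores):
    """Assumed aggregation: mean of the step-level scores."""
    return sum(step_scores) / len(step_scores) if step_scores else 0.0

def select_subset(sample_scores, fraction):
    """Return the indices of the top-scoring fraction of samples, in index order."""
    k = max(1, round(fraction * len(sample_scores)))
    ranked = sorted(range(len(sample_scores)),
                    key=lambda i: sample_scores[i], reverse=True)
    return sorted(ranked[:k])
```

For example, `select_subset(scores, 0.2)` would keep the highest-scoring 20% of samples, mirroring the 20% budget reported in the summary.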
In practice
- Use gradient-aligned scoring for data selection.
- Estimate step-level alignment via a gradient proxy.
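The summary does not specify the form of the gradient proxy, so the sketch below only illustrates why a representation-level estimate can be cheap. It assumes a single linear layer, where the weight gradient contributed by one token is the outer product of its upstream signal and its input representation. Under that assumption, the inner product of two such gradients factorizes into two small dot products, so step-to-answer alignment can be computed without materializing any parameter gradient. The function names, the span convention (half-open token index ranges), and the availability of per-token upstream signals are assumptions for illustration.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def outer_grad_inner(h_a, d_a, h_b, d_b):
    """Frobenius inner product of the rank-one gradients d_a h_a^T and d_b h_b^T,
    computed as (h_a . h_b) * (d_a . d_b) without forming either matrix."""
    return dot(h_a, h_b) * dot(d_a, d_b)

def step_alignment_proxy(hidden, upstream, step_span, answer_span):
    """Unnormalized alignment between a step's summed gradient and the answer's
    summed gradient for one linear layer (assumed setting).

    hidden[t]   : input representation of token t
    upstream[t] : upstream signal (gradient w.r.t. the layer output) at token t
    spans       : half-open (start, end) token index ranges
    """
    start, end = step_span
    a_start, a_end = answer_span
    total = 0.0
    for i in range(start, end):
        for j in range(a_start, a_end):
            total += outer_grad_inner(hidden[i], upstream[i],
                                      hidden[j], upstream[j])
    return total
```

The cost scales with the number of token pairs and the hidden size rather than with the number of parameters, which is the kind of saving a representation-level proxy is meant to provide.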
Topics
- GRACE Method
- Gradient-aligned Data Curation
- Reasoning Data Selection
- Post-training Efficiency
- Large Language Models
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.