GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

GRACE (Gradient-aligned Reasoning Data Curation for Efficient Post-training) curates reasoning data by scoring individual steps within a reasoning trace rather than whole samples. Each step is evaluated on two criteria: its alignment with the answer-oriented gradient direction and its consistency with the preceding reasoning trajectory. The step-level scores are then aggregated into a sample-level value used for subset selection. The method relies solely on the model's internal optimization signals and needs no external reward model or step annotations. For scalability, GRACE uses a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass. When Qwen3-VL-2B-Instruct is post-trained on MMathCoT-1M, GRACE reaches 108.8% of full-data performance with only 20% of the data and 100.2% with just 5%. It is also reported to transfer effectively across different model backbones.
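The two step-level criteria and their aggregation can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not give the scoring formula, so the use of cosine similarity, the mean of preceding step vectors as the "trajectory", the equal weighting of the two terms, and mean aggregation are all assumptions made for illustration. The names `step_scores`, `sample_score`, and `weight` are hypothetical, and the per-step vectors stand in for whatever the gradient proxy would produce.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def step_scores(step_vecs: Sequence[Vector], answer_dir: Vector,
                weight: float = 0.5) -> List[float]:
    """Score each step by alignment and trajectory consistency (assumed form)."""
    scores = []
    for t, g in enumerate(step_vecs):
        # Alignment with the answer-oriented direction.
        alignment = cosine(g, answer_dir)
        # Consistency with the preceding steps; assumption: the first step
        # has no trajectory yet, so it gets a neutral consistency of 0.0.
        consistency = cosine(g, mean_vector(step_vecs[:t])) if t > 0 else 0.0
        scores.append(weight * alignment + (1.0 - weight) * consistency)
    return scores


def sample_score(step_vecs: Sequence[Vector], answer_dir: Vector,
                 weight: float = 0.5) -> float:
    """Aggregate step scores into one sample-level value (assumed: mean)."""
    scores = step_scores(step_vecs, answer_dir, weight)
    return sum(scores) / len(scores) if scores else 0.0


# Toy 2-D example: one trace that keeps pointing toward the answer
# direction, and one whose second step reverses course.
answer_direction = [1.0, 0.0]
consistent_trace = [[1.0, 0.0], [1.0, 0.0]]
reversing_trace = [[1.0, 0.0], [-1.0, 0.0]]

print(sample_score(consistent_trace, answer_direction))  # 0.75
print(sample_score(reversing_trace, answer_direction))   # -0.25
```

Under these assumptions, a trace whose steps stay aligned with the answer direction and with each other scores higher than one that reverses midway, which is the behavior the step-level criteria are meant to capture.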

Key takeaway

For AI engineers optimizing large language model post-training, GRACE offers a data-efficient way to improve reasoning capabilities. By scoring data at the level of individual reasoning steps using gradient alignment, it matched or exceeded full-data performance with 5-20% of the original data in the reported experiments on Qwen3-VL-2B-Instruct. Consider gradient-aligned curation as a way to reduce post-training compute and shorten model development cycles.

Key insights

GRACE curates reasoning data by scoring individual steps based on gradient alignment and trajectory consistency.

Principles

Method

GRACE scores each reasoning step by its alignment with the answer-oriented gradient and consistency with the preceding trajectory, aggregating these into a sample-level score for data subset selection.
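Once every sample has a sample-level score, subset selection under a data budget can be sketched as below. This is an assumption-laden illustration: the summary says only that the aggregated scores drive subset selection, so keeping the top-ranked samples is one plausible rule, not a confirmed detail of GRACE. The function name `select_top_fraction`, the sample IDs, and the scores are hypothetical.

```python
from typing import Dict, List


def select_top_fraction(sample_scores: Dict[str, float],
                        fraction: float) -> List[str]:
    """Return the IDs of the highest-scoring samples within a data budget.

    Assumption: selection is a plain top-k ranking by sample-level score.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not sample_scores:
        return []
    # Budget rounded to the nearest whole sample, keeping at least one.
    k = max(1, round(fraction * len(sample_scores)))
    # Sort by descending score; ties are broken by ID so the result is stable.
    ranked = sorted(sample_scores, key=lambda sid: (-sample_scores[sid], sid))
    return ranked[:k]


# Hypothetical sample-level scores for five reasoning traces.
scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7, "e": 0.3}

print(select_top_fraction(scores, 0.2))  # ['a']
print(select_top_fraction(scores, 0.4))  # ['a', 'd']
```

The `fraction` argument corresponds to the data budgets quoted in the summary (0.2 and 0.05); the selected IDs would then define the post-training subset.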

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.