GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

2026-05-13 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

GRACE (Gradient-aligned Reasoning Data Curation for Efficient Post-training) is a method for curating reasoning data that scores the individual steps within a reasoning trace rather than whole samples. Each step is evaluated on two criteria: its alignment with the answer-oriented gradient direction and its consistency with the preceding reasoning trajectory. The step-level scores are then aggregated into a sample-level value used for subset selection. The method relies solely on the model's internal optimization signals and needs no external reward models or step annotations. To scale, GRACE uses a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass. When post-training Qwen3-VL-2B-Instruct on MMathCoT-1M, GRACE reached 108.8% of full-data performance with 20% of the data and 100.2% with 5%. The selection is also reported to transfer effectively across different model backbones.

Key takeaway

For AI engineers optimizing large language model post-training, GRACE offers a data-efficient way to improve reasoning capabilities. In the reported Qwen3-VL-2B-Instruct experiments, selecting data by step-level gradient alignment matched or exceeded full-data performance with 5-20% of the original data (100.2% at 5%, 108.8% at 20%). Consider gradient-aligned curation when you need to reduce post-training compute and shorten model development cycles.
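Once each sample has a curation score, selecting a 5% or 20% subset comes down to ranking and truncating. The sketch below is only an illustration of that step: the function name `select_top_fraction`, the dictionary of sample IDs to scores, and the rounding rule are assumptions, not details taken from the GRACE paper or its code.

```python
from typing import Dict, List


def select_top_fraction(sample_scores: Dict[str, float], fraction: float) -> List[str]:
    """Return the IDs of the highest-scoring samples for a given data budget.

    `sample_scores` maps a hypothetical sample ID to its sample-level score.
    `fraction` is the share of the dataset to keep, e.g. 0.05 or 0.20.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not sample_scores:
        return []
    # Keep at least one sample; rounding to the nearest count is an assumption.
    budget = max(1, round(fraction * len(sample_scores)))
    # Rank IDs from highest to lowest score; ties keep insertion order.
    ranked = sorted(sample_scores, key=lambda sid: sample_scores[sid], reverse=True)
    return ranked[:budget]


# Toy scores for five samples (made-up values for illustration only).
toy_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": -0.2, "e": 0.7}
top_40_percent = select_top_fraction(toy_scores, 0.4)
top_20_percent = select_top_fraction(toy_scores, 0.2)
```

With the toy scores above, a 40% budget keeps the two highest-scoring samples and a 20% budget keeps one.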

Key insights

GRACE curates reasoning data by scoring individual steps based on gradient alignment and trajectory consistency.

Principles

Individual reasoning steps have uneven value.
Internal optimization signals suffice for data curation.

Method

GRACE scores each reasoning step by its alignment with the answer-oriented gradient and consistency with the preceding trajectory, aggregating these into a sample-level score for data subset selection.
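The scoring described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's formulation: it assumes each step and the answer are already represented as plain gradient-like vectors, uses cosine similarity for both alignment and consistency, measures consistency against the sum of the preceding step vectors, combines the two terms with a hypothetical `consistency_weight`, and aggregates by a simple mean. The summary does not specify any of these choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def step_scores(step_grads: List[Vector], answer_grad: Vector,
                consistency_weight: float = 0.5) -> List[float]:
    """Score each step by answer alignment plus trajectory consistency."""
    scores = []
    trajectory = [0.0] * len(answer_grad)  # running sum of earlier step vectors
    for index, grad in enumerate(step_grads):
        alignment = cosine(grad, answer_grad)
        # The first step has no preceding trajectory; treat it as neutral (assumption).
        consistency = cosine(grad, trajectory) if index > 0 else 0.0
        scores.append(alignment + consistency_weight * consistency)
        trajectory = [t + g for t, g in zip(trajectory, grad)]
    return scores


def sample_score(step_grads: List[Vector], answer_grad: Vector,
                 consistency_weight: float = 0.5) -> float:
    """Aggregate step scores into one sample-level value (mean is an assumption)."""
    scores = step_scores(step_grads, answer_grad, consistency_weight)
    return sum(scores) / len(scores) if scores else 0.0


# Toy trace: two steps point toward the answer direction, the third points away.
toy_steps = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
toy_answer = [1.0, 0.0]
toy_step_scores = step_scores(toy_steps, toy_answer)
toy_sample_score = sample_score(toy_steps, toy_answer)
```

In the toy trace, the third step is penalized twice: it opposes the answer direction and it contradicts the trajectory built by the first two steps.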

In practice

Use gradient-aligned scoring for data selection.
Estimate step-level alignment via a gradient proxy.
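One way to picture the proxy is as pooling: token-level signals are grouped by reasoning step and compared against an answer direction. The sketch below assumes the token-level upstream signals have already been captured as plain vectors and that step boundaries are known as half-open token spans. The names `pool_steps` and `proxy_alignment`, the mean pooling, and the cosine comparison are all illustrative assumptions, since the summary does not describe how the proxy is computed.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def pool_steps(token_signals: List[Vector],
               step_spans: List[Tuple[int, int]]) -> List[List[float]]:
    """Mean-pool token-level signal vectors into one vector per step.

    `step_spans` holds half-open (start, end) token index ranges, one per step.
    """
    step_vectors = []
    for start, end in step_spans:
        if not 0 <= start < end <= len(token_signals):
            raise ValueError(f"invalid step span: ({start}, {end})")
        span = token_signals[start:end]
        dim = len(span[0])
        step_vectors.append([sum(tok[d] for tok in span) / len(span) for d in range(dim)])
    return step_vectors


def proxy_alignment(token_signals: List[Vector],
                    step_spans: List[Tuple[int, int]],
                    answer_direction: Vector) -> List[float]:
    """Cosine similarity between each pooled step vector and the answer direction."""
    answer_norm = math.sqrt(sum(a * a for a in answer_direction))
    alignments = []
    for vec in pool_steps(token_signals, step_spans):
        vec_norm = math.sqrt(sum(x * x for x in vec))
        if vec_norm == 0.0 or answer_norm == 0.0:
            alignments.append(0.0)  # undefined direction: treat as unaligned
            continue
        dot = sum(x * a for x, a in zip(vec, answer_direction))
        alignments.append(dot / (vec_norm * answer_norm))
    return alignments


# Toy input: three tokens, two steps (tokens 0-1 and token 2), made-up values.
toy_tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
toy_spans = [(0, 2), (2, 3)]
toy_pooled = pool_steps(toy_tokens, toy_spans)
toy_alignment = proxy_alignment(toy_tokens, toy_spans, [1.0, 0.0])
```

Working at the step-vector level keeps the cost proportional to the number of tokens, which is the kind of saving a proxy is meant to provide over computing a full parameter gradient per step.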

Topics

GRACE Method
Gradient-aligned Data Curation
Reasoning Data Selection
Post-training Efficiency
Large Language Models

Code references

StigLidu/GradAlign

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.