Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

This research investigates how compressed reasoning data affects large language model (LLM) post-training, addressing the trade-off between performance and token cost. It introduces a taxonomy of Chain-of-Thought (CoT) reasoning: Explicit CoT (every operation written out), Composed CoT (several operations combined into one step), and Implicit CoT (intermediate steps omitted). In experiments on a synthetic compositional reasoning task across various models, coarser CoT required more supervised fine-tuning (SFT) data. Composed and Implicit CoT benefit more from data scaling than Explicit CoT. Composed CoT also gains from data repetition, whereas Implicit CoT risks memorization. Notably, subsequent reinforcement learning with verifiable rewards (RLVR) can decompose compressed steps learned during SFT. Furthermore, unidirectional CoT ordering improves generalization on longer sequential tasks, offering guidance for CoT design under data constraints.

Key takeaway

For machine learning engineers optimizing LLM post-training with chain-of-thought data: choose the level of CoT compression deliberately. Coarser CoT requires more SFT data, which matters most when data is limited. Prioritize Composed CoT, which benefits from both data scaling and data repetition, and use Implicit CoT cautiously because of its memorization risk. Consider running RLVR after SFT to refine and decompose the compressed reasoning steps learned during SFT, especially for complex tasks.
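To make the "verifiable rewards" part of RLVR concrete, here is a minimal sketch of a reward function that checks only the final answer of a completion. The `answer: N` output format, the function name `verifiable_reward`, and the binary 0/1 reward are assumptions made for illustration; the source does not describe the paper's actual reward design.

```python
import re

# Hypothetical output convention: the completion ends with "answer: <integer>".
ANSWER_RE = re.compile(r"answer:\s*(-?\d+)\s*$")


def verifiable_reward(completion: str, gold: int) -> float:
    """Return 1.0 if the final answer matches the gold value, else 0.0.

    Only the last "answer: N" line is inspected; intermediate steps are
    ignored, so the reward does not constrain how finely the reasoning
    is written out.
    """
    match = ANSWER_RE.search(completion.strip())
    if match is None:
        return 0.0  # no parseable final answer
    return 1.0 if int(match.group(1)) == gold else 0.0


# A compressed trace and a step-by-step trace with the same final answer
# receive the same reward under this check.
compressed = "((3 + 4) * 2) = 14\nanswer: 14"
stepwise = "3 + 4 = 7\n7 * 2 = 14\nanswer: 14"
print(verifiable_reward(compressed, 14), verifiable_reward(stepwise, 14))
```

Because this check depends only on the final answer, it can be computed programmatically without a learned reward model, which is the sense in which the reward is "verifiable" in this sketch.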

Key insights

The study clarifies how different CoT compression types affect LLM post-training and data efficiency.

Method

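The authors propose a taxonomy of CoT: Explicit (every operation written out), Composed (several operations combined into one step), and Implicit (intermediate steps omitted). A synthetic compositional reasoning task lets them vary both task difficulty and the level of compression.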
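The three levels of the taxonomy can be illustrated on a toy task. The sketch below assumes a hypothetical task (apply a chain of integer `+` and `*` steps left to right) and a hypothetical trace format; it is not the synthetic task used in the paper, only an illustration of how the same problem yields traces of different granularity.

```python
# Hypothetical toy task: apply (operator, operand) steps left to right.
OPS = {
    "+": lambda x, k: x + k,
    "*": lambda x, k: x * k,
}


def apply_steps(x, steps):
    """Apply every (op, operand) pair in order and return the result."""
    for op, k in steps:
        x = OPS[op](x, k)
    return x


def explicit_cot(x, steps):
    """Explicit CoT: one line per primitive operation."""
    lines = []
    for op, k in steps:
        nxt = OPS[op](x, k)
        lines.append(f"{x} {op} {k} = {nxt}")
        x = nxt
    lines.append(f"answer: {x}")
    return lines


def composed_cot(x, steps, group=2):
    """Composed CoT: one line per group of `group` consecutive operations."""
    lines = []
    for i in range(0, len(steps), group):
        chunk = steps[i:i + group]
        expr = str(x)
        for op, k in chunk:
            expr = f"({expr} {op} {k})"  # parenthesize to keep left-to-right order
        x = apply_steps(x, chunk)
        lines.append(f"{expr} = {x}")
    lines.append(f"answer: {x}")
    return lines


def implicit_cot(x, steps):
    """Implicit CoT: intermediate steps omitted, only the final answer."""
    return [f"answer: {apply_steps(x, steps)}"]


EXAMPLE_STEPS = [("+", 4), ("*", 2), ("+", 5), ("*", 3)]
for build in (explicit_cot, composed_cot, implicit_cot):
    trace = build(3, EXAMPLE_STEPS)
    print(f"{build.__name__}: {len(trace)} lines")
    print("\n".join(trace))
```

For this example the explicit trace has five lines, the composed trace three, and the implicit trace one, which shows the token-cost side of the trade-off: coarser traces are shorter, but each line asks the model to carry out more computation without written intermediate results.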

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.