ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, extended

Summary

ReasoningFlow is a framework that represents the non-linear discourse structure of Large Reasoning Model (LRM) traces as fine-grained directed acyclic graphs (DAGs). Its schema defines 8 node types and 14 edge types and was developed through manual annotation of 31 traces (2.1k steps) with high inter-annotator agreement (Krippendorff's α > 0.8). The schema was then scaled to automatic annotation of 1,260 traces (247.7k steps) from five models (Qwen2.5-32B-Inst, QwQ-32B, DeepSeek-V3, DeepSeek-R1, GPT-oss-120B) across math, science, and argumentation tasks. The analysis yields four findings: (1) LRMs produce structurally similar traces despite different training; (2) the schema exposes diverse fine-grained behaviors such as local verification and self-reflection; (3) most erroneous steps do not contribute to the final answer; and (4) mechanistic causal dependencies do not align with language-level discourse structures.

Key takeaway

For MLOps engineers and AI scientists evaluating LRM performance or building robust LLM applications, stepwise error detection alone is insufficient. Integrate discourse structure analysis, such as ReasoningFlow, to see how errors actually propagate or get corrected. The analysis shows that most erroneous steps in LRM traces go unused or are neglected on the way to the final answer, which supports more accurate faithfulness monitoring and targeted improvements to reasoning processes, particularly local verification mechanisms.
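One way to picture the difference between flagging an erroneous step and checking whether it matters is a reachability test over the trace graph. The sketch below is illustrative only and is not taken from the paper: it assumes a trace is already available as a list of directed edges `(src, dst)` meaning "step `dst` builds on step `src`", and the function names `ancestors` and `split_errors` are hypothetical.

```python
from collections import defaultdict


def ancestors(edges, target):
    """Return every step that has a directed path to `target`."""
    # Assumption: an edge (src, dst) means step dst depends on step src.
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)

    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def split_errors(edges, final_step, erroneous_steps):
    """Separate erroneous steps that feed the final answer from those that do not."""
    used = ancestors(edges, final_step)
    contributing = sorted(s for s in erroneous_steps if s in used)
    unused = sorted(s for s in erroneous_steps if s not in used)
    return contributing, unused


# Toy trace: step 2 is a wrong branch that dead-ends at step 4,
# while step 3 is a wrong step that the final step 5 builds on.
edges = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 5)]
contributing, unused = split_errors(edges, final_step=5, erroneous_steps={2, 3})
print("contributing:", contributing)
print("unused:", unused)
```

A stepwise detector would count both steps 2 and 3 as errors; the graph view separates the one that can influence the answer from the one that was abandoned.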

Key insights

ReasoningFlow maps LRM reasoning traces into fine-grained DAGs, uncovering diverse behaviors and the actual impact of errors.

Method

Develop an annotation schema with 8 node types and 14 edge types, validate it through manual annotation, then apply an LLM-powered pipeline that performs node segmentation, node classification, and edge detection and classification.
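The pipeline stages above could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the class names, the `annotate` function, and the three injected callables are hypothetical, the labels `NODE_TYPE_A` and `EDGE_TYPE_A` are placeholders rather than actual schema types, and the toy stand-ins replace what would be LLM calls. Edge detection and edge classification are merged into one callable for brevity.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Node:
    idx: int
    text: str
    label: str  # one of the schema's node types (placeholder values here)


@dataclass
class Edge:
    src: int
    dst: int
    label: str  # one of the schema's edge types (placeholder values here)


@dataclass
class TraceGraph:
    nodes: list[Node] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)

    def is_acyclic(self) -> bool:
        """Check the DAG property with a topological sort."""
        preds = {n.idx: set() for n in self.nodes}
        for e in self.edges:
            preds[e.dst].add(e.src)
        try:
            list(TopologicalSorter(preds).static_order())
            return True
        except CycleError:
            return False


def annotate(trace, segment, classify_node, propose_edges):
    """Run segmentation, node classification, then edge detection/classification."""
    steps = segment(trace)
    nodes = [Node(i, s, classify_node(s)) for i, s in enumerate(steps)]
    edges = [Edge(src, dst, label) for src, dst, label in propose_edges(nodes)]
    return TraceGraph(nodes, edges)


# Toy stand-ins for the LLM-backed stages (hypothetical, for illustration only).
def toy_segment(trace):
    return [line.strip() for line in trace.splitlines() if line.strip()]


def toy_classify(step):
    return "NODE_TYPE_A"


def toy_edges(nodes):
    # Link each step to the one before it.
    return [(a.idx, b.idx, "EDGE_TYPE_A") for a, b in zip(nodes, nodes[1:])]


graph = annotate(
    "Let x = 2.\nThen x + 1 = 3.\nSo the answer is 3.",
    toy_segment,
    toy_classify,
    toy_edges,
)
print(len(graph.nodes), len(graph.edges), graph.is_acyclic())
```

Passing the stages in as callables keeps the graph structure independent of whichever model or prompt performs each stage, and the acyclicity check gives a cheap sanity test on automatically produced annotations.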

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.