ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces

2026-06-03 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

ReasoningFlow is a framework for capturing the non-linear discourse structures in Large Reasoning Model (LRM) traces, such as backtracking and self-correction, by converting each trace into a fine-grained directed acyclic graph (DAG). The researchers developed an annotation schema and validated it by manually annotating 31 traces (2.1k steps), achieving high inter-annotator agreement. They then scaled the schema up to automatically annotate 1,260 traces (247.7k steps) across three tasks (math, science, and argumentation) and five models: Qwen2.5-32B-Inst, QwQ-32B, DeepSeek-V3, DeepSeek-R1, and GPT-oss-120B. Analysis of the resulting ReasoningFlow graphs revealed that LRMs produce structurally similar traces despite differences in training, that they exhibit diverse fine-grained reasoning behaviors (e.g., local verification, self-reflection), and that most erroneous steps do not contribute to final answers. The study also found that mechanistic causal dependencies do not reflect language-level discourse structure.

Key takeaway

If you are an NLP engineer evaluating Large Reasoning Models, understanding the internal reasoning process is crucial. Consider using discourse structure analysis, such as ReasoningFlow, to gain fine-grained insight into model behaviors like self-correction and local verification. This approach helps you monitor reasoning traces more effectively and identify whether erroneous steps actually influence final outputs, which in turn can guide your model refinement strategies.

Key insights

ReasoningFlow maps LRM traces to DAGs, revealing structural similarities and diverse fine-grained reasoning behaviors.

Principles

LRMs show structurally similar reasoning traces.
Most erroneous steps do not contribute to the final LRM answer.
Mechanistic causal dependencies do not reflect language-level discourse structure.

Method

ReasoningFlow captures LRM discourse structures as fine-grained directed acyclic graphs (DAGs) via manual schema validation and subsequent automatic annotation for large-scale analysis.
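
To make the graph representation concrete, here is a minimal Python sketch of a reasoning trace stored as a DAG. It is an illustration only, not the ReasoningFlow schema: the example steps, the edge labels ("premise", "verified-by", "corrects"), and the function name `is_dag` are all assumptions introduced here. Each step is a node, each labeled edge points from an earlier step to a later step that depends on it, and Kahn's algorithm confirms that the result contains no cycle.

```python
from collections import defaultdict, deque

# Hypothetical trace: each list entry is one reasoning step.
steps = [
    "Let x be the unknown quantity.",         # 0
    "Then 2x + 3 = 11, so x = 5.",            # 1 (contains an arithmetic slip)
    "Wait, 2*5 + 3 = 13, which is not 11.",   # 2 (verification catches the slip)
    "So x = 4.",                              # 3 (corrected answer)
]

# Hypothetical labeled edges (source step, target step, label).
# The labels are illustrative and are NOT taken from the paper's schema.
edges = [
    (0, 1, "premise"),
    (1, 2, "verified-by"),
    (2, 3, "corrects"),
    (0, 3, "premise"),
]

def is_dag(num_steps, edge_list):
    """Return True if the edges over steps 0..num_steps-1 form no cycle (Kahn's algorithm)."""
    adjacency = defaultdict(list)
    indegree = [0] * num_steps
    for src, dst, _label in edge_list:
        adjacency[src].append(dst)
        indegree[dst] += 1
    # Start from steps that depend on nothing.
    queue = deque(i for i in range(num_steps) if indegree[i] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in adjacency[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Every step is visited exactly when there is no cycle.
    return visited == num_steps

print(is_dag(len(steps), edges))  # True
```

Because a later step can depend on several earlier ones (step 3 here depends on both the setup and the correction), a graph captures backtracking and self-correction that a flat list of steps would hide.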

In practice

Monitor LRM reasoning traces for specific behaviors.
Check whether erroneous steps actually contribute to final answers.
Analyze LRM reasoning across diverse models.
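
One way to check whether an erroneous step feeds into the final answer is a reachability test on the trace graph. The sketch below assumes a trace is already available as a list of (source, target) dependency edges plus a set of step indices flagged as erroneous; the function names `contributing_steps` and `unused_errors` and the example data are hypothetical and are not part of the ReasoningFlow codebase.

```python
def contributing_steps(edges, final_step):
    """Return every step from which final_step is reachable, including final_step itself."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen = {final_step}
    stack = [final_step]
    # Walk backwards from the final answer along dependency edges.
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def unused_errors(edges, final_step, erroneous):
    """Return the erroneous steps that the final answer does not depend on."""
    return sorted(set(erroneous) - contributing_steps(edges, final_step))

# Hypothetical trace: step 2 is an abandoned (erroneous) branch,
# while the final answer at step 4 is derived through steps 0 and 3.
example_edges = [(0, 1), (1, 2), (0, 3), (3, 4)]

print(sorted(contributing_steps(example_edges, final_step=4)))   # [0, 3, 4]
print(unused_errors(example_edges, final_step=4, erroneous={2}))  # [2]
```

In this toy trace the erroneous step 2 sits on a dead-end branch, so it is reported as not contributing to the answer. Note that this measures dependence in the annotated discourse graph only; per the summary above, that is not the same thing as mechanistic causal dependence inside the model.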

Topics

Large Reasoning Models
Reasoning Traces
Discourse Structures
Directed Acyclic Graphs
LLM Evaluation
Model Monitoring

Code references

jinulee-v/reasoningflow

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.