TRN-R1-Zero: Text-rich Network Reasoning via LLMs with Reinforcement Learning Only

2026-04-22 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

TRN-R1-Zero is a post-training framework for zero-shot reasoning on text-rich networks (TRNs), where a model must integrate textual semantics with relational structure without task-specific supervision. Existing graph neural networks and LLM-based approaches often rely on fixed label spaces, distillation from larger models, or supervised fine-tuning. TRN-R1-Zero instead trains base LLMs solely via reinforcement learning. It uses a Neighbour-aware Group Relative Policy Optimisation objective that dynamically adjusts rewards according to a margin gain metric, which quantifies how informative the neighboring signals are. This steers the model toward relational reasoning, and the authors report superior and robust performance across citation, hyperlink, social, and co-purchase TRN benchmarks. Notably, although TRN-R1-Zero is trained only on node-level tasks, it performs zero-shot inference on edge- and graph-level tasks, a form of generalization that goes beyond cross-domain transfer.

Key takeaway

For AI engineers and research scientists building LLM-based solutions for graph data, TRN-R1-Zero offers an alternative to supervised fine-tuning and distillation. Its reinforcement-learning-only approach, in particular the neighbor-aware policy optimization, is worth considering when robust zero-shot reasoning on text-rich networks is required. The method can reduce reliance on extensive labeled data and on external reasoning models, which may improve efficiency, and the reported results point to good generalization across diverse graph tasks and domains.

Key insights

Reinforcement learning with neighbor-aware rewards enables LLMs to perform zero-shot reasoning on text-rich networks without supervision.

Principles

Reinforcement learning can intrinsically activate LLM reasoning.
Dynamic reward adjustment based on neighbor influence improves policy optimization.
Node-level training can generalize to edge and graph-level tasks.

Method

TRN-R1-Zero uses a Neighbour-aware Group Relative Policy Optimisation objective, scaling rewards by a margin gain metric that measures the impact of neighborhood information on classification decisions, thereby emphasizing structurally informative samples during policy updates.
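The summary does not give the exact formula for the margin gain or for where it enters the objective, so the Python sketch below is only one plausible reading under stated assumptions: (a) a label's margin is the gold label's probability minus the strongest competing label's probability, (b) the margin gain is the margin with neighbors in the prompt minus the margin without them, and (c) a positive gain up-weights the sample. The function names and the `alpha` parameter are hypothetical. The weight is applied to GRPO-style group-normalized advantages rather than to raw rewards, because multiplying every reward in a group by the same constant would be cancelled by the standard-deviation normalization.

```python
from statistics import mean, pstdev
from typing import Dict, List


def margin(label_probs: Dict[str, float], gold: str) -> float:
    """Gold-label probability minus the strongest competing label's probability."""
    competitor = max(p for label, p in label_probs.items() if label != gold)
    return label_probs[gold] - competitor


def margin_gain(with_nbrs: Dict[str, float], without_nbrs: Dict[str, float], gold: str) -> float:
    """Hypothetical margin gain: how much neighbor text widens (>0) or narrows (<0) the margin."""
    return margin(with_nbrs, gold) - margin(without_nbrs, gold)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantages: rewards normalized within one prompt's group of rollouts."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def neighbour_aware_advantages(rewards: List[float], gain: float, alpha: float = 1.0) -> List[float]:
    """Scale group advantages so structurally informative samples weigh more in the update."""
    # Hypothetical rule: only a positive gain (neighbors helped) increases the weight.
    weight = 1.0 + alpha * max(gain, 0.0)
    return [weight * a for a in group_advantages(rewards)]
```

For example, with class probabilities {"AI": 0.7, "DB": 0.2, "OS": 0.1} given the neighbors and {"AI": 0.4, "DB": 0.5, "OS": 0.1} without them, the gain for gold label "AI" is about 0.6, so a group with rewards [1.0, 0.0, 1.0, 1.0] receives advantages 1.6 times larger than plain GRPO would assign. Clipping negative gains to zero leaves samples whose neighbors are uninformative at ordinary GRPO weight rather than penalizing them.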

In practice

Apply RL-only post-training for zero-shot TRN classification.
Design prompts that combine the target node's text, its neighbors' text, and the candidate labels.
Implement neighbor-aware reward shaping for stable optimization.
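The prompt and reward items above can be sketched in Python as follows. This is a minimal illustration assuming a simple prompt layout and a `<think>`/`<answer>` output format in the style often used for R1-Zero-type training; neither the template nor the tag names come from the paper, and `build_prompt`, `extract_answer`, `correctness_reward`, and `max_neighbours` are hypothetical. The binary correctness score is the kind of verifiable base reward an RL-only pipeline can use in place of supervised reasoning traces, and that neighbor-aware shaping would then rescale.

```python
import re
from typing import List, Optional


def build_prompt(node_text: str, neighbour_texts: List[str], labels: List[str],
                 max_neighbours: int = 5) -> str:
    """Assemble a hypothetical classification prompt from node, neighbor, and label text."""
    lines = ["Target node:", node_text, "", "Neighboring nodes:"]
    # Truncate the neighbor list so the prompt stays within the context window.
    for i, text in enumerate(neighbour_texts[:max_neighbours], start=1):
        lines.append(f"[{i}] {text}")
    lines += [
        "",
        "Candidate labels: " + ", ".join(labels),
        "Think step by step inside <think></think>, then give exactly one label "
        "inside <answer></answer>.",
    ]
    return "\n".join(lines)


# Assumed answer format: the final label wrapped in <answer>...</answer> tags.
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside the first <answer> tag, or None if absent."""
    match = ANSWER_RE.search(completion)
    return match.group(1) if match else None


def correctness_reward(completion: str, gold: str) -> float:
    """Binary verifiable reward: 1.0 if the extracted label matches the gold label."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer.strip().lower() == gold.lower() else 0.0
```

Capping the neighbor list keeps prompts short; which neighbors to keep (for example, by hop distance or text similarity) is a design choice this summary does not specify.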

Topics

TRN-R1-Zero
Text-rich Networks
Reinforcement Learning
Zero-shot Reasoning
Large Language Models

Code references

superallen13/TRN-R1-Zero

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.