Graphs of Research: Citation Evolution Graphs as Supervision for Research Idea Generation

2026-05-14 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, quick

Summary

Graphs of Research (GoR) is a supervised fine-tuning (SFT) method that improves the ability of large language models (LLMs) to generate research ideas by using citation evolution graphs as supervision. Unlike existing methods that rely on static literature retrieval or complex prompt engineering, GoR extracts the 2-hop reference neighborhood of a seed paper, derives relations among the references from citation position, citation frequency, predecessor links, and publication time, and organizes the result into a paper-evolution directed acyclic graph (DAG). An automated pipeline collects data from five major ML/NLP venues, yielding a dataset of 498 training, 50 validation, and 50 test seed papers with approximately 7,600 cited references. Qwen2.5-7B-Instruct-1M was fine-tuned on a structured-text prompt that encodes the citation graph and reference information. The resulting model, GoR-SFT, achieved state-of-the-art performance in head-to-head LLM-judge tournaments against gpt-4o-driven baselines.

Key takeaway

For AI scientists and research scientists working on automated scientific discovery, GoR shows that citation evolution graphs significantly improve LLM-based research idea generation. Consider integrating structural citation data (citation position, frequency, and temporal links) into your LLM training or prompting strategies to improve the relevance and novelty of generated research ideas. This approach can reduce the manual effort of identifying novel research directions.

Key insights

Citation evolution graphs provide effective supervision for LLM-based research idea generation.

Principles

Structural relations among references are crucial.
Citation context signals enhance idea generation.

Method

GoR extracts 2-hop reference neighborhoods, derives relations from citation position, frequency, predecessor links, and publication time, then organizes them into a paper-evolution DAG for LLM fine-tuning.
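
The graph construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `references` and `year` dictionaries, the function names, and the rule of keeping only links whose cited paper is strictly older than the citing paper are all assumptions made for the example. The sketch covers only the 2-hop neighborhood and publication-time ordering; citation position and frequency are left out.

```python
def two_hop_neighborhood(seed, references):
    """Papers reachable from `seed` within two citation hops (seed excluded)."""
    first_hop = set(references.get(seed, []))
    second_hop = set()
    for paper in first_hop:
        second_hop.update(references.get(paper, []))
    return (first_hop | second_hop) - {seed}


def build_evolution_dag(seed, references, year):
    """Build edges oriented older -> newer among the seed and its 2-hop neighborhood.

    Assumption for this sketch: a citation link is kept only when the cited
    paper is strictly older than the citing paper, which rules out cycles.
    """
    nodes = two_hop_neighborhood(seed, references) | {seed}
    edges = set()
    for citing in nodes:
        for cited in references.get(citing, []):
            if cited in nodes and year[cited] < year[citing]:
                edges.add((cited, citing))
    return nodes, edges


# Hypothetical toy data: paper id -> list of papers it cites.
references = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": ["C", "D"],
    "C": ["E"],  # E is three hops from the seed, so it is excluded
}
year = {"seed": 2024, "A": 2022, "B": 2021, "C": 2019, "D": 2020, "E": 2015}

nodes, edges = build_evolution_dag("seed", references, year)
for older, newer in sorted(edges):
    print(f"{older} -> {newer}")
```

Orienting edges from older to newer papers is one simple way to make the graph read as an evolution of ideas; a real pipeline would also need to handle same-year citations and missing publication dates.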

In practice

Use 2-hop reference neighborhoods.
Incorporate citation position and frequency.
Fine-tune LLMs with structured graph data.
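
As a rough illustration of the points above, the sketch below counts how often and in which sections each reference is cited, then serializes those features together with the graph edges into a structured-text prompt. The prompt layout, the function names, and the `(ref_id, section)` input format are hypothetical; the source does not specify the exact template used for fine-tuning.

```python
from collections import Counter, defaultdict


def citation_features(mentions):
    """Summarize in-text citations given as (ref_id, section) pairs.

    Returns, per reference, its citation count and the distinct sections
    where it is cited, in order of first appearance.
    """
    counts = Counter(ref for ref, _ in mentions)
    sections = defaultdict(list)
    for ref, section in mentions:
        if section not in sections[ref]:
            sections[ref].append(section)
    return {ref: {"count": counts[ref], "sections": sections[ref]} for ref in counts}


def format_graph_prompt(seed_title, titles, years, features, edges):
    """Render references (oldest first) and evolution edges as plain text."""
    lines = [f"Seed paper: {seed_title}", "References (oldest first):"]
    for ref in sorted(features, key=lambda r: (years[r], r)):
        feat = features[ref]
        where = ", ".join(feat["sections"])
        lines.append(
            f"[{ref}] {titles[ref]} ({years[ref]}); cited {feat['count']}x in: {where}"
        )
    lines.append("Evolution edges (older -> newer):")
    lines.extend(f"{src} -> {dst}" for src, dst in sorted(edges))
    return "\n".join(lines)


# Hypothetical toy input.
mentions = [("A", "Introduction"), ("A", "Method"), ("B", "Related Work"), ("A", "Method")]
features = citation_features(mentions)
prompt = format_graph_prompt(
    "Seed",
    titles={"A": "Paper A", "B": "Paper B"},
    years={"A": 2022, "B": 2021},
    features=features,
    edges={("B", "A")},
)
print(prompt)
```

A text block like this could serve as the input side of an SFT example, with the seed paper's actual idea as the target; that pairing is an assumption about how such data might be used, not a description of the authors' setup.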

Topics

Graphs of Research
Citation Evolution Graphs
Research Idea Generation
Large Language Models
Supervised Fine-tuning

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.