Graphs of Research: Citation Evolution Graphs as Supervision for Research Idea Generation
Summary
Graphs of Research (GoR) is a supervised fine-tuning (SFT) method that improves the ability of large language models (LLMs) to generate research ideas by using citation evolution graphs as supervision. Unlike existing methods that rely on static literature retrieval or complex prompt engineering, GoR extracts the 2-hop reference neighborhood of a seed paper, derives relations between papers from citation position, citation frequency, predecessor links, and publication time, and organizes the result into a paper-evolution directed acyclic graph (DAG). An automated pipeline collects data from five major ML/NLP venues, yielding a dataset of 498 train, 50 validation, and 50 test seed papers with approximately 7,600 cited references. Qwen2.5-7B-Instruct-1M was fine-tuned on a structured-text prompt that incorporates the citation graph and reference information. The resulting model, GoR-SFT, achieved state-of-the-art performance in head-to-head LLM-judge tournaments against gpt-4o-driven baselines.
Key takeaway
For AI Scientists and Research Scientists focused on automating scientific discovery, GoR indicates that supervision from citation evolution graphs improves LLM-based research idea generation, as measured in LLM-judge tournaments against gpt-4o-driven baselines. Consider integrating structural citation data (citation position, citation frequency, predecessor links, and publication time) into your LLM training or prompting strategies to improve the relevance and novelty of generated research concepts. This approach can reduce the manual effort of identifying novel research directions.
Key insights
Citation evolution graphs provide effective supervision for LLM-based research idea generation.
Principles
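- Structural relations among references (predecessor links and publication order) carry supervision signal beyond the references' content alone.
- Citation context signals, such as where and how often a reference is cited, enhance idea generation.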
Method
GoR extracts 2-hop reference neighborhoods, derives relations from citation position, frequency, predecessor links, and publication time, then organizes them into a paper-evolution DAG for LLM fine-tuning.
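The graph-construction step can be sketched roughly as follows. This is a minimal illustration, not the paper's pipeline: the toy `PAPERS` corpus, its `year` and `refs` fields, and both helper functions are hypothetical, and the sketch uses only reference links and publication year (it does not model citation position or frequency). Keeping an edge only when the cited paper is strictly older is one simple way to guarantee the result is acyclic.

```python
from collections import deque

# Hypothetical toy corpus: paper id -> publication year and cited paper ids.
PAPERS = {
    "seed": {"year": 2024, "refs": ["a", "b"]},
    "a": {"year": 2022, "refs": ["c"]},
    "b": {"year": 2021, "refs": ["c", "d"]},
    "c": {"year": 2019, "refs": ["e"]},
    "d": {"year": 2018, "refs": []},
    "e": {"year": 2015, "refs": []},
}


def two_hop_neighborhood(papers, seed, max_hops=2):
    """Return {paper_id: hop distance} for papers within max_hops of the seed."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        pid = queue.popleft()
        if dist[pid] == max_hops:
            continue  # do not expand beyond the hop limit
        for ref in papers[pid]["refs"]:
            if ref in papers and ref not in dist:
                dist[ref] = dist[pid] + 1
                queue.append(ref)
    return dist


def evolution_edges(papers, nodes):
    """Directed edges (older -> newer) among the selected nodes.

    An edge is kept only when the cited paper is strictly older than the
    citing paper, so every edge moves forward in time and no cycle can form.
    """
    edges = []
    for pid in nodes:
        for ref in papers[pid]["refs"]:
            if ref in nodes and papers[ref]["year"] < papers[pid]["year"]:
                edges.append((ref, pid))
    return sorted(edges)


nodes = two_hop_neighborhood(PAPERS, "seed")
edges = evolution_edges(PAPERS, nodes)
```

In this toy corpus, paper `e` sits three hops from the seed and is therefore left out of the neighborhood.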
In practice
- Use 2-hop reference neighborhoods.
- Incorporate citation position and frequency.
- Fine-tune LLMs with structured graph data.
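As a rough illustration of the points above, the snippet below renders references and graph edges as structured text that could serve as a fine-tuning or prompting input. The prompt layout, the field names (`section` for citation position, `count` for citation frequency), and the `build_prompt` helper are all assumptions made for this sketch; the summary does not specify the exact format used by GoR.

```python
def build_prompt(references, edges):
    """Render references and evolution edges as structured text (hypothetical format)."""
    lines = ["References:"]
    # List the most frequently cited references first; break ties by id.
    for ref in sorted(references, key=lambda r: (-r["count"], r["id"])):
        lines.append(
            f"- [{ref['id']}] {ref['title']} ({ref['year']}); "
            f"cited in: {ref['section']}; times cited: {ref['count']}"
        )
    lines.append("Evolution edges (older -> newer):")
    for src, dst in edges:
        lines.append(f"- {src} -> {dst}")
    lines.append("Task: propose a follow-up research idea grounded in these references.")
    return "\n".join(lines)


# Made-up example data, for illustration only.
references = [
    {"id": "a", "title": "Paper A", "year": 2022, "section": "Method", "count": 3},
    {"id": "b", "title": "Paper B", "year": 2021, "section": "Related Work", "count": 1},
]
edges = [("b", "a")]
prompt = build_prompt(references, edges)
print(prompt)
```

Pairs of such a prompt with the seed paper's actual idea could then be used as SFT examples, though the exact target format here is likewise an assumption.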
Topics
- Graphs of Research
- Citation Evolution Graphs
- Research Idea Generation
- Large Language Models
- Supervised Fine-tuning
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.