LLM Features Can Hurt GNNs: Concatenation Interference on Homophilous Graph Benchmarks

2026-06-16 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A recent study finds that directly concatenating LLM-generated node features onto the inputs of graph neural networks (GNNs) can lower accuracy on homophilous graph benchmarks, contrary to widespread reports of improvement. With an MLP backbone and SBERT-encoded GPT-4o-mini TAPE features on the Planetoid public split, concatenation changed test accuracy by -17.0 +/- 0.3 pp on PubMed and by -4.3 +/- 0.6 pp on Cora. The degradation shrinks with GNN backbones (GCN, GCNII, GAT), random splits, or smaller encoders, and reverses on medium-homophily datasets such as WikiCS (+4.4 pp) and ogbn-arxiv (+11.7 pp). The study introduces "Delta_sig", a measure of LLM-alone discriminability, which across nine datasets correlates more strongly with concatenation cost (r^2 = 0.38) than homophily does (r^2 = 0.06). A power law, |Delta_concat| proportional to (sqrt(d_l/n))^1.31 with r^2 = 0.97, further describes the observed performance drops.
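As a rough illustration of how a power law of this form can be fitted, the sketch below runs a least-squares line fit in log-log space. The helper name `fit_power_law`, the prefactor, and all numeric inputs are assumptions made for illustration; the data is synthetic and noiseless, not the study's measurements, and the summary does not state how the authors performed their fit.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = c * x**k by least squares in log-log space.

    Returns (k, c, r2), where r2 is computed on the log-transformed data.
    """
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    # polyfit returns coefficients highest degree first: [slope, intercept]
    k, log_c = np.polyfit(log_x, log_y, 1)
    pred = k * log_x + log_c
    ss_res = np.sum((log_y - pred) ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    return k, np.exp(log_c), 1.0 - ss_res / ss_tot

# Synthetic illustration only: hypothetical d_l and n values, chosen arbitrarily.
d_l = np.array([64.0, 128.0, 256.0, 384.0, 768.0, 1024.0])
n = 100.0
ratio = np.sqrt(d_l / n)

# Generate noiseless drops from a power law with exponent 1.31 and an
# arbitrary prefactor of 2.0, then check that the fit recovers both.
drop = 2.0 * ratio ** 1.31
k, c, r2 = fit_power_law(ratio, drop)
print(f"exponent k = {k:.2f}, prefactor c = {c:.2f}, r^2 = {r2:.3f}")
```

With real measurements, `drop` would hold the observed |Delta_concat| values and the recovered exponent and r^2 would be compared against the reported 1.31 and 0.97.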

Key takeaway

Machine learning engineers integrating LLM features into graph neural networks should critically evaluate the effect of simple input concatenation. On highly homophilous graphs, or when the LLM features have high "Delta_sig" discriminability, direct concatenation may degrade accuracy rather than improve it. Consider alternative integration strategies such as joint training or distillation, or measure "Delta_sig" for your specific LLM features and dataset before deployment to avoid performance regressions.

Key insights

Concatenating LLM features to GNNs can degrade accuracy on homophilous graphs, especially with high LLM-alone discriminability.

Principles

LLM feature concatenation isn't universally beneficial for GNNs.
High LLM-alone discriminability ("Delta_sig") predicts performance drops.
Performance degradation follows a power law in the feature-dimension term sqrt(d_l/n), with exponent 1.31.

Method

The study proposes "Delta_sig", a measure of LLM-alone discriminability, to predict whether concatenating LLM features will help or hurt GNN performance on a given dataset.
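To make the idea of comparing predictors concrete, here is a minimal sketch that scores two candidate predictors of concatenation cost by r^2, assuming r^2 denotes the squared Pearson correlation. The summary does not give the formula for "Delta_sig", so the arrays below are placeholder numbers, not values from the study, and the function name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

# Placeholder per-dataset values (hypothetical, for illustration only).
concat_cost = [1.0, 2.0, 3.0, 4.0, 5.0]       # e.g. accuracy drop in pp
predictor_a = [0.10, 0.25, 0.30, 0.45, 0.50]  # stands in for a Delta_sig-like score
predictor_b = [0.90, 0.20, 0.70, 0.30, 0.80]  # stands in for a homophily-like score

r2_a = r_squared(predictor_a, concat_cost)
r2_b = r_squared(predictor_b, concat_cost)
print(f"r^2 (predictor A) = {r2_a:.2f}, r^2 (predictor B) = {r2_b:.2f}")
```

The predictor with the higher r^2 explains more of the variation in concatenation cost across datasets, which is the sense in which the study ranks "Delta_sig" above homophily.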

In practice

Evaluate "Delta_sig" before concatenating LLM features to GNNs.
Avoid simple concatenation on highly homophilous graphs.
Consider alternative integration methods beyond pure concatenation.
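One way to act on these recommendations is to measure the concatenation delta directly on a held-out split before committing to it. The sketch below is an assumption-laden stand-in: it swaps the study's MLP and GNN backbones for a closed-form ridge classifier so that it runs with NumPy alone, and it uses synthetic features. The names `ridge_accuracy` and `concat_delta` are hypothetical helpers, not part of any library.

```python
import numpy as np

def ridge_accuracy(X_train, y_train, X_test, y_test, n_classes, lam=1.0):
    """Fit a one-vs-rest ridge classifier in closed form; return test accuracy in %."""
    Y = np.eye(n_classes)[y_train]  # one-hot targets
    A = X_train.T @ X_train + lam * np.eye(X_train.shape[1])
    W = np.linalg.solve(A, X_train.T @ Y)
    pred = (X_test @ W).argmax(axis=1)
    return 100.0 * (pred == y_test).mean()

def concat_delta(X_base, X_llm, y, train_idx, test_idx, n_classes):
    """Accuracy with [base | LLM] features minus accuracy with base features, in pp."""
    X_cat = np.hstack([X_base, X_llm])
    acc_base = ridge_accuracy(X_base[train_idx], y[train_idx],
                              X_base[test_idx], y[test_idx], n_classes)
    acc_cat = ridge_accuracy(X_cat[train_idx], y[train_idx],
                             X_cat[test_idx], y[test_idx], n_classes)
    return acc_cat - acc_base

# Synthetic data (assumption): informative base features, uninformative extra block.
rng = np.random.default_rng(0)
n_nodes, n_classes, d_base, d_llm = 300, 3, 16, 64
y = rng.integers(0, n_classes, size=n_nodes)
centers = rng.normal(size=(n_classes, d_base))
X_base = centers[y] + rng.normal(size=(n_nodes, d_base))
X_llm = rng.normal(size=(n_nodes, d_llm))  # stands in for LLM-derived features

# Small training set, larger test set.
perm = rng.permutation(n_nodes)
train_idx, test_idx = perm[:60], perm[60:]

delta = concat_delta(X_base, X_llm, y, train_idx, test_idx, n_classes)
print(f"Delta_concat = {delta:+.1f} pp")
```

A negative value means concatenation hurt on this split. In a real pipeline, the ridge classifier would be replaced by the actual backbone and the comparison repeated over several seeds and splits, since the summary reports that the effect varies with both.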

Topics

LLM Features
Graph Neural Networks
Feature Concatenation
Homophilous Graphs
Delta_sig Metric
GPT-4o-mini

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.