GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction
Summary
GHGbench is a new open dataset and benchmark designed for predicting company- and building-level greenhouse gas (GHG) emissions, addressing fragmentation in existing resources. The company track includes over 32,000 company-year records from 12,000+ firms, detailing Scope 1+2 and Scope 3 emissions alongside financial and sectoral data. The building track harmonizes 491,591 building-year records from 13 open sources across 26 metropolitan areas, incorporating climate covariates and multimodal remote-sensing embeddings. GHGbench establishes canonical splits for in-distribution and cross-region/city transfer tasks, with temporal hold-out and short-horizon forecasting as supplementary tasks. Baselines include gradient-boosted trees, a tabular foundation model, MLP, FT-Transformer, and multimodal fusion, evaluated using multi-seed paired-bootstrap tests. Key findings indicate building emissions are harder to predict than company emissions, the in-distribution to out-of-distribution gap is significant, and multimodal remote-sensing embeddings improve prediction where tabular generalization fails.
Key takeaway
For AI Scientists and Machine Learning Engineers developing carbon emission prediction models, GHGbench offers a standardized benchmark for evaluating model performance under realistic conditions. Focus on improving generalization across entities and regions, since accuracy drops significantly from in-distribution to out-of-distribution evaluation. Consider integrating multimodal remote-sensing embeddings, especially for building emissions, where purely tabular approaches generalize poorly.
Key insights
GHGbench brings company-level and building-level emission prediction under one benchmark, with shared datasets, canonical splits, and baselines for both tracks.
Principles
- Building emissions are structurally harder to predict.
- Out-of-distribution generalization is a major challenge.
- Multimodal remote-sensing embeddings improve prediction where tabular generalization fails.
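To make the last principle concrete, here is a minimal sketch of one common way to combine tabular features with precomputed remote-sensing embeddings: standardize each block and concatenate them before fitting any downstream model. The summary does not say how GHGbench's multimodal fusion baseline is built, so the function name `early_fusion`, the z-scoring step, and the array shapes are all assumptions for illustration.

```python
import numpy as np

def early_fusion(tabular, embeddings):
    """Concatenate z-scored tabular features with z-scored embeddings.

    Hypothetical helper: this is plain early fusion, not necessarily
    the fusion method used by the benchmark's baseline.
    """
    tabular = np.asarray(tabular, dtype=float)
    embeddings = np.asarray(embeddings, dtype=float)
    if tabular.shape[0] != embeddings.shape[0]:
        raise ValueError("both inputs need one row per building")

    def zscore(block):
        mean = block.mean(axis=0)
        std = block.std(axis=0)
        std[std == 0] = 1.0  # constant columns become zeros instead of NaN
        return (block - mean) / std

    return np.concatenate([zscore(tabular), zscore(embeddings)], axis=1)
```

In a real pipeline the means and standard deviations would be estimated on the training split only and reused on the test split, so that no test statistics leak into training.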
Method
GHGbench defines canonical data splits for in-distribution and cross-region/city transfer tasks, evaluating baselines with multi-seed paired-bootstrap tests.
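As an illustration of the evaluation idea, the sketch below runs a paired bootstrap on per-example errors from two models scored on the same test set. It is a generic version of the technique, not the benchmark's own implementation: the function name `paired_bootstrap`, the 95% percentile interval, and the default of 2,000 resamples are assumptions. For a multi-seed setup, one plausible choice (also an assumption) is to average each model's per-example errors over seeds before calling it.

```python
import numpy as np

def paired_bootstrap(err_a, err_b, n_boot=2000, seed=0):
    """Paired bootstrap over per-example errors of two models.

    err_a[i] and err_b[i] must refer to the same test example.
    Negative differences mean model A has the lower error.
    """
    err_a = np.asarray(err_a, dtype=float)
    err_b = np.asarray(err_b, dtype=float)
    if err_a.shape != err_b.shape or err_a.ndim != 1:
        raise ValueError("expected two 1-D arrays of equal length")

    rng = np.random.default_rng(seed)
    n = err_a.shape[0]
    diff = err_a - err_b

    # Resample example indices with replacement; using the same indices
    # for both models is what keeps the comparison paired.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = diff[idx].mean(axis=1)

    ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
    return {
        "mean_diff": float(diff.mean()),
        "ci_low": float(ci_low),
        "ci_high": float(ci_high),
        "frac_a_better": float((boot_means < 0).mean()),
    }
```

If the interval between `ci_low` and `ci_high` excludes zero, the difference between the two models is unlikely to be an artifact of which test examples happened to be sampled.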
In practice
- Add multimodal remote-sensing embeddings when predicting building emissions.
- Prioritize models robust to out-of-distribution data.
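One way to check robustness before relying on a model is to hold out whole cities or regions and compare the resulting error with the in-distribution error. The sketch below assumes each record carries a group label such as a city name; `group_holdout_split` and `ood_gap` are hypothetical helpers and do not reproduce GHGbench's canonical splits.

```python
import numpy as np

def group_holdout_split(groups, held_out):
    """Return (train_idx, test_idx) with every held-out group in the test set.

    groups: one label per record (for example, a city name).
    held_out: the labels to exclude from training entirely.
    """
    groups = np.asarray(groups)
    test_mask = np.isin(groups, list(held_out))
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def ood_gap(id_error, ood_error):
    """Difference between out-of-distribution and in-distribution error.

    A positive value means the model does worse on unseen groups.
    """
    return ood_error - id_error
```

For example, with records from cities `["A", "A", "B", "C", "B"]` and `held_out={"B"}`, records 2 and 4 form the transfer test set and the remaining three are used for training.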
Topics
- GHGbench
- Carbon Emission Prediction
- Multi-Entity Benchmark
- Remote Sensing Embeddings
- Tabular Foundation Models
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.