Regression Language Models for Code
Summary
Regression Language Models (RLMs) offer a unified approach to code-to-metric regression, predicting numeric outcomes directly from code text, bypassing complex feature engineering. A 300M parameter RLM, initialized from T5Gemma, demonstrates strong performance, achieving over 0.9 Spearman-rank on APPS competitive programming for memory footprint prediction in Python and C++. This single model also attains over 0.5 average Spearman-rank across 17 CodeNet languages and a 0.46 average Kendall-Tau on five Neural Architecture Search (NAS) design spaces, outperforming graph neural networks. RLMs can simultaneously predict Triton GPU kernel latency and neural network accuracy/speed from ONNX representations. Key findings also highlight benefits from language and regression pretraining, the superiority of decoder-based numeric outputs over MSE heads, and the impact of encoder size and custom tokenization.
Key takeaway
For AI Scientists and Machine Learning Engineers optimizing code performance or designing neural architectures, Regression Language Models (RLMs) provide a compelling, unified solution. You can leverage RLMs to predict metrics like memory, latency, and accuracy directly from raw code or ONNX graphs, eliminating tedious feature engineering. Consider integrating RLMs into your development pipeline to accelerate performance prediction and guide optimization decisions across diverse programming languages and hardware platforms.
Key insights
A unified Regression Language Model (RLM) can predict diverse code metrics directly from text, simplifying computational graph regression.
Principles
- Text-to-text regression simplifies complex feature engineering.
- Pretraining on language data and synthetic metrics accelerates convergence.
- Decoder-based numeric outputs are superior and normalization-free.
Method
The RLM method treats regression as a next-token prediction problem using an encoder-decoder architecture. It employs explicit digit-by-digit numeric tokenization for y-values, like `<+><-><1><7><2><5>`, and constrained decoding to ensure valid numerical outputs without normalization.
In practice
- Employ RLMs for memory and latency prediction in Python and C++.
- Use RLMs to predict neural network performance from ONNX graphs.
- Pretrain RLMs on synthetic metrics to boost real-task convergence.
Topics
- Regression Language Models
- Code Performance Prediction
- Neural Architecture Search
- ONNX
- T5Gemma
- Multi-task Learning
Code references
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.