LLMs on Tabular Data with Limited Semantics: Evidence from Industrial Car Retrofit Prediction
Summary
This study investigates how well Large Language Models (LLMs) handle structured industrial data, using car retrofit prediction at BMW Group as the case study. The researchers analyzed a dataset of 284,271 prototype vehicles and 48,716 retrofit visits in which categorical values had been hashed, removing their semantic cues. They compared strong tabular machine learning baselines with three LLM strategies: LLM embeddings as features (Amazon Titan Embed v2), direct prompted classification (Claude Sonnet 4), and an ML+LLM stacking approach. Classical tree ensembles remained the strongest standalone models, but LLM embeddings still carried useful signal (binary AUC = 0.982). Direct prompting, by contrast, collapsed to chance-level performance (binary AUC = 0.500) because the hashed inputs offered no semantic signal to reason over. The hybrid stacking model achieved the best multiclass performance among the manually built models (weighted F1 = 0.626), suggesting that LLMs are more effective as complementary components than as standalone replacements for robust tabular baselines.
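The two headline metrics can be reproduced from first principles. The sketch below is a minimal, dependency-free illustration, not the study's evaluation code, and the function names `binary_auc` and `weighted_f1` are hypothetical. `binary_auc` computes ROC AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count as half), and `weighted_f1` averages per-class F1 with weights proportional to each class's support. It also shows why 0.500 marks chance level: a scorer that assigns the same value to every row ties on every positive-negative pair.

```python
from collections import Counter
from typing import Hashable, Sequence


def binary_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total
    return score


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    print(binary_auc(labels, [0.5, 0.5, 0.5, 0.5]))  # constant scores -> 0.5
    print(binary_auc(labels, [0.1, 0.2, 0.8, 0.9]))  # perfect ranking -> 1.0
    print(weighted_f1(["a", "a", "b", "c"], ["a", "b", "b", "c"]))
```

The pairwise loop is quadratic and only suitable for small illustrations; library implementations compute the same quantity from sorted ranks.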
Key takeaway
If you are a machine learning engineer building predictive systems on privacy-constrained industrial tabular data, prioritize robust classical models such as gradient-boosted trees. Direct LLM prompting is ineffective when the data carries no semantic content, but LLM embeddings or outputs can still be integrated into a hybrid stacking architecture. In this study, that approach supplied complementary signal and produced the best multiclass result among the manually built models (weighted F1 = 0.626), without incurring the cost and latency of replacing the tabular pipeline with an LLM.
Key insights
LLMs complement, but do not replace, strong tabular models on privacy-constrained industrial data lacking semantic cues.
Principles
- Hashed categorical data severely degrades direct LLM prompting.
- LLM embeddings can capture useful structural patterns in tabular data.
- Hybrid ML+LLM stacking can exploit complementary error patterns.
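The first two principles can be illustrated with a toy hashing step. This summary does not describe the study's actual hashing scheme, so the following is only an assumed illustration: the hypothetical helper `hash_category` maps each value to a truncated SHA-256 digest, salted with a hypothetical secret and the column name. The same value always yields the same token, so co-occurrence structure survives for tree models or embeddings, while the token itself carries nothing an LLM could interpret. The column names in the usage lines are made up.

```python
import hashlib

# Hypothetical salt; a real deployment would keep this secret.
SALT = "example-salt"


def hash_category(column: str, value: str, salt: str = SALT, length: int = 12) -> str:
    """Map a categorical value to a deterministic, opaque hex token."""
    # Including the column name gives equal strings in different columns different tokens.
    payload = f"{salt}|{column}|{value}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:length]


def hash_row(row: dict, salt: str = SALT) -> dict:
    """Hash every value of a row while keeping the column names readable."""
    return {col: hash_category(col, str(val), salt) for col, val in row.items()}


if __name__ == "__main__":
    a = hash_row({"color": "red", "region": "north"})  # hypothetical columns
    b = hash_row({"color": "red", "region": "south"})
    print(a["color"] == b["color"])    # same value -> same token: True
    print(a["region"] == b["region"])  # different value -> different token: False
```

Determinism is the design point: a gradient-boosted tree can still learn that a given token predicts a retrofit, even though the token's meaning is hidden from any model that relies on language.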
Method
Serialize structured rows into key-value text, then either embed that text with an LLM or pass it to an LLM for direct prompted classification; finally, combine the LLM outputs with classical models via stacking.
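A minimal sketch of the serialization step, assuming a simple `key: value` template. The study's exact template, field order, and prompt wording are not given here, so `serialize_row`, `build_prompt`, the prompt text, and the labels in the usage lines are all hypothetical. The resulting string is what would be sent either to an embedding model or to a chat model for direct classification; the model calls themselves are deliberately omitted rather than guessed.

```python
from typing import Any, Dict, Sequence


def serialize_row(row: Dict[str, Any], sep: str = "; ") -> str:
    """Render a row as 'key: value' pairs in sorted key order, skipping missing values."""
    parts = [f"{key}: {row[key]}" for key in sorted(row) if row[key] is not None]
    return sep.join(parts)


def build_prompt(row: Dict[str, Any], labels: Sequence[str]) -> str:
    """Hypothetical zero-shot classification prompt wrapped around the serialized row."""
    return (
        "Classify the following vehicle record.\n"
        f"Record: {serialize_row(row)}\n"
        f"Answer with exactly one label from: {', '.join(labels)}."
    )


if __name__ == "__main__":
    # Toy values that look like hashed categories; none of these fields come from the study.
    record = {"feature_b": "9f2c1a", "feature_a": "03be7d", "feature_c": None}
    print(serialize_row(record))  # feature_a: 03be7d; feature_b: 9f2c1a
    print(build_prompt(record, ["retrofit", "no_retrofit"]))
```

Sorting the keys makes identical rows produce identical text, which helps reproducibility and caching of embeddings. With hashed values, the prompt contains almost nothing the model can interpret, which is consistent with the chance-level prompting result reported above.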
In practice
- Use LLM embeddings as features for classical models.
- Integrate LLM outputs as meta-features in stacking ensembles.
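The two practices above can be sketched as the data-preparation half of a stacked model. This is a minimal, dependency-free illustration under stated assumptions, not the study's pipeline. `concat_features` appends embedding dimensions to each row so a classical model can use them. `out_of_fold` scores every training row with a model fitted only on the other folds, so the second-level learner never sees leaked in-sample predictions. `meta_features` places those scores next to the LLM outputs. All names, the contiguous fold split, and the toy 1-nearest-neighbor base learner are hypothetical stand-ins; in practice the base learner would be something like a gradient-boosted tree model, and the meta-feature matrix would feed a simple second-level classifier such as logistic regression.

```python
from typing import Callable, List, Sequence

Row = List[float]
# A base learner: fit on (train_X, train_y), then return one score per test row.
FitPredict = Callable[[List[Row], List[int], List[Row]], List[float]]


def concat_features(X: Sequence[Row], embeddings: Sequence[Sequence[float]]) -> List[Row]:
    """Append each row's embedding vector to its tabular features."""
    return [list(x) + list(e) for x, e in zip(X, embeddings)]


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold(fit_predict: FitPredict, X: List[Row], y: List[int], k: int = 5) -> List[float]:
    """Score each row with a model trained only on the other folds."""
    oof = [0.0] * len(X)
    for held in kfold_indices(len(X), k):
        held_set = set(held)
        train = [i for i in range(len(X)) if i not in held_set]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in held])
        for i, p in zip(held, preds):
            oof[i] = p
    return oof


def meta_features(base_oof: Sequence[Sequence[float]], llm_scores: Sequence[float]) -> List[Row]:
    """One meta-row per sample: every base model's out-of-fold score, then the LLM score."""
    return [[col[i] for col in base_oof] + [llm_scores[i]] for i in range(len(llm_scores))]


def one_nn(train_X: List[Row], train_y: List[int], test_X: List[Row]) -> List[float]:
    """Toy base learner: label of the nearest training row by squared Euclidean distance."""
    preds = []
    for x in test_X:
        best = min(range(len(train_X)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(train_X[j], x)))
        preds.append(float(train_y[best]))
    return preds


if __name__ == "__main__":
    X = [[0.0], [0.1], [1.0], [1.1]]
    y = [0, 0, 1, 1]
    oof = out_of_fold(one_nn, X, y, k=4)  # leave-one-out on this tiny set
    print(oof)  # [0.0, 0.0, 1.0, 1.0]
    llm = [0.2, 0.1, 0.9, 0.7]  # hypothetical zero-shot LLM probabilities
    print(meta_features([oof], llm))
```

Zero-shot LLM scores can enter the meta-features directly because the LLM never sees the training labels; any model trained on those labels, including a classifier built on LLM embeddings, should contribute out-of-fold predictions instead.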
Topics
- Large Language Models
- Tabular Data
- Industrial Prediction
- Retrofit Planning
- Hybrid ML Systems
- Time Series Forecasting
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.