Customer Churn Prediction on Structured Data Using FT-Transformer and Stacking Ensembles
Summary
A new hybrid architecture for customer churn prediction on tabular data combines a feature-tokenized transformer (FT-Transformer) with gradient-boosted trees (XGBoost) through calibration-aware stacking. The framework handles class imbalance with weighted loss functions and uses out-of-fold stacking with a logistic regression meta-learner to recalibrate the base model outputs. On a public bank churn dataset of 10,000 customers with a 20% churn rate, the model achieved an F1 of 62.10%, an AUC-ROC of 0.861, and a PR-AUC of 0.647. It outperformed the Multi-Layer Perceptron (MLP) baseline by 3.37 F1 points (p < 0.001) and 0.027 AUC. Ablation studies confirmed that both the transformer component and the stacking strategy contribute materially to performance.
Key takeaway
Data scientists building churn prediction models should consider this hybrid FT-Transformer and XGBoost stacking ensemble. In the reported experiments it improved F1 by 3.37 points and AUC by 0.027 over an MLP baseline while producing calibrated probabilities, which matter for cost-sensitive interventions. Use class-weighted loss and out-of-fold stacking when working with imbalanced tabular datasets. The approach aims to balance accuracy and interpretability for retention strategies.
Key insights
Hybrid models that combine transformers with tree ensembles can improve both the accuracy and the calibration of churn predictions on tabular data.
Principles
- Diverse inductive biases reduce ensemble variance.
- Stacking meta-learners improve probability calibration.
- Class-weighted loss functions counter class imbalance by penalizing minority-class errors more heavily.
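To make the class-weighting principle concrete, here is a minimal sketch of a weighted binary cross-entropy. The summary does not state which weighting scheme the paper uses, so the "balanced" inverse-frequency heuristic and the function names `balanced_class_weights` and `weighted_bce` are assumptions made for illustration. The class counts follow from the 10,000 customers and 20% churn rate reported above.

```python
import math

def balanced_class_weights(n_negative, n_positive):
    """Inverse-frequency weights (assumed heuristic, not confirmed by the source)."""
    n_total = n_negative + n_positive
    n_classes = 2
    w_neg = n_total / (n_classes * n_negative)
    w_pos = n_total / (n_classes * n_positive)
    return w_neg, w_pos

def weighted_bce(y_true, p_pred, w_neg, w_pos, eps=1e-12):
    """Mean binary cross-entropy where each example is scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1.0 - p)
    return total / len(y_true)

# 10,000 customers at a 20% churn rate: 8,000 stayers, 2,000 churners
w_neg, w_pos = balanced_class_weights(n_negative=8000, n_positive=2000)
print(w_neg, w_pos)  # 0.625 2.5
```

With these weights a misclassified churner costs four times as much as a misclassified non-churner, which offsets the 4:1 class ratio. For gradient-boosted trees, XGBoost exposes a comparable knob in its `scale_pos_weight` parameter.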
Method
The method has four steps: preprocess the data, train the FT-Transformer and XGBoost base models with class-weighted loss, generate out-of-fold predictions from each base model, and fit a logistic regression meta-learner on those predictions.
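The out-of-fold stacking flow can be sketched with scikit-learn. This is an illustration under stated assumptions, not the paper's implementation: `MLPClassifier` stands in for the FT-Transformer, `GradientBoostingClassifier` stands in for XGBoost, the data is synthetic with roughly 20% positives, and class weighting is omitted to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data (assumption): about 20% positives
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Stand-in base models (assumption): a neural model and a tree ensemble
base_models = [
    make_pipeline(StandardScaler(),
                  MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
    GradientBoostingClassifier(random_state=0),
]

# Out-of-fold predictions: each training row is scored by a model
# that never saw that row during fitting
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
    for m in base_models
])

# Meta-learner: logistic regression on the base models' out-of-fold probabilities
meta = LogisticRegression()
meta.fit(oof, y_train)

# At test time, refit each base model on all training data, then stack
test_features = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])
stacked = meta.predict_proba(test_features)[:, 1]
auc = roc_auc_score(y_test, stacked)
print(f"stacked AUC-ROC on held-out data: {auc:.3f}")
```

Fitting the meta-learner on out-of-fold predictions rather than in-sample predictions matters because in-sample base model outputs are overconfident, and a meta-learner trained on them would learn to trust the base models too much.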
In practice
- Use FT-Transformer for complex feature interactions.
- Employ XGBoost for discrete decision boundaries.
- Apply logistic regression as a simple, effective meta-learner.
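One way to check whether a meta-learner actually improves calibration is to compare a calibration metric before and after stacking. The summary does not say which metric the paper uses, so the expected calibration error (ECE) below, with equal-width bins and the hypothetical function name `expected_calibration_error`, is an assumption chosen for illustration.

```python
def expected_calibration_error(y_true, p_pred, n_bins=10):
    """Weighted average gap between predicted and observed positive rates per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))

    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(observed - predicted)
    return ece

# Toy labels with a 20% positive rate
labels = [1, 0, 0, 0, 0] * 20
calibrated = expected_calibration_error(labels, [0.2] * 100)
overconfident = expected_calibration_error(labels, [0.9] * 100)
print(calibrated, overconfident)
```

A model that always predicts 0.2 on data with a 20% positive rate scores an ECE near zero, while one that always predicts 0.9 scores about 0.7. Computing this on held-out predictions from each base model and from the stacked output gives a direct comparison.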
Topics
- Customer Churn Prediction
- FT-Transformer
- XGBoost
- Stacking Ensembles
- Tabular Data
- Probability Calibration
Best for: AI Engineer, Research Scientist, Machine Learning Engineer, Data Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.