A time-series classification framework for individual-level absenteeism prediction under severe class imbalance
Summary
A new Time Series Classification (TSC) framework targets individual-level absenteeism prediction, a problem that carries significant operational costs in sectors such as healthcare. Existing methods are limited because they map features to labels from the same time step and discard sequential attendance history. This framework instead predicts future absences by separating historical attendance sequences from future absence labels. Because no public longitudinal dataset exists, the researchers constructed a reproducible simulated dataset calibrated to the UCI dataset. They evaluated Binary Focal Loss (BFL) and Geometric Mean (G-Mean) loss under severe class imbalance: BFL achieved a specificity of 0.813 and a balanced accuracy of 0.888, comparable to G-Mean, which adapts automatically without parameter calibration. Among deep learning architectures, the hybrid LSTM-Fully Convolutional Network (LSTM-FCN) delivered strong precision and specificity. Performance was stable, at approximately 80% balanced accuracy on held-out test data, with batch sizes of 64 or more and window sizes of 40-80 days.
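The reported metrics (specificity, balanced accuracy, G-Mean) all derive from the confusion matrix. The sketch below shows the standard binary definitions; the function name `imbalance_metrics`, the label coding (1 = absent, 0 = present), and the toy labels are assumptions for illustration and do not come from the original study.

```python
import math


def imbalance_metrics(y_true, y_pred):
    """Return sensitivity, specificity, balanced accuracy, and G-Mean.

    Assumed label coding: 1 = absent (minority class), 0 = present.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Toy labels (illustrative only): 8 present days, 2 absent days.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
metrics = imbalance_metrics(y_true, y_pred)
```

Plain accuracy on these toy labels is 0.7, while balanced accuracy is 0.625, which illustrates why class-aware metrics are preferred when absences are rare.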
Key takeaway
For workforce planners and ML engineers working on absenteeism prediction in high-demand environments, this Time Series Classification framework is worth considering. A TSC model, specifically the LSTM-FCN architecture, uses historical attendance sequences to forecast absences before they occur. G-Mean loss can simplify handling severe class imbalance because it adapts automatically, without parameter calibration. Batch sizes of 64 or more and window sizes of 40-80 days gave stable balanced accuracy in the reported experiments, which were run on simulated data.
Key insights
A Time Series Classification framework predicts individual absenteeism ahead of time by analyzing historical attendance sequences, which traditional same-time feature-to-label methods discard.
Principles
- Separate historical sequences from future labels for proactive prediction.
- G-Mean loss adapts automatically to severe class imbalance.
- LSTM-FCN delivers strong precision and specificity for TSC.
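The first principle, separating history from future labels, can be sketched as a sliding window over one employee's attendance record. This is a minimal illustration under assumptions: the function name `make_windows`, the `horizon` parameter, the univariate 0/1 encoding, and the "any absence in the horizon" labeling rule are all hypothetical and not taken from the original study. Only the default window of 60 days is chosen to sit inside the 40-80 day range reported in the summary.

```python
def make_windows(attendance, window=60, horizon=1):
    """Pair each `window`-day history with a label from the days that follow.

    attendance: list of 0/1 flags for one employee, one per day (1 = absent).
    The label is 1 if any absence occurs in the `horizon` days immediately
    after the window, so label days never overlap the input history.
    """
    samples = []
    last_start = len(attendance) - window - horizon
    for start in range(last_start + 1):
        history = attendance[start:start + window]
        future = attendance[start + window:start + window + horizon]
        samples.append((history, int(any(future))))
    return samples


# Toy series: 10 days with a single absence at day index 6.
series = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
pairs = make_windows(series, window=4, horizon=2)
labels = [label for _, label in pairs]
```

Each pair holds only past days as input and only later days as the target, which is what distinguishes this setup from mapping features to a label at the same time step.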
Method
The researchers construct a simulated dataset, compare BFL and G-Mean loss under severe class imbalance, and evaluate deep learning architectures (LSTM, CNN, LSTM-FCN).
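To make the difference between the two losses concrete, here is a rough sketch in plain Python. The focal loss follows the commonly used binary form with `gamma` and `alpha` parameters. The G-Mean loss shown is one possible differentiable surrogate (one minus the geometric mean of soft true-positive and true-negative rates); the exact formulation used in the original study is not given in this summary, so treat `soft_g_mean_loss` and its details as assumptions.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss; gamma and alpha are the values that need tuning."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)           # clip to keep log() finite
        p_t = p if y == 1 else 1 - p            # probability of the true class
        alpha_t = alpha if y == 1 else 1 - alpha
        total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


def soft_g_mean_loss(y_true, p_pred, eps=1e-7):
    """Assumed surrogate: 1 - sqrt(soft TPR * soft TNR).

    Predicted probabilities stand in for hard counts, and each class is
    normalized by its own size, so no class-weight parameter is needed.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    soft_tpr = sum(p for y, p in zip(y_true, p_pred) if y == 1) / (pos + eps)
    soft_tnr = sum(1 - p for y, p in zip(y_true, p_pred) if y == 0) / (neg + eps)
    return 1 - math.sqrt(soft_tpr * soft_tnr)


# Toy batch (illustrative): 9 present days, 1 absent day.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
always_present = [0.01] * 10          # ignores the minority class entirely
informative = [0.1] * 9 + [0.9]       # separates the two classes

g_bad = soft_g_mean_loss(y_true, always_present)
g_good = soft_g_mean_loss(y_true, informative)
f_bad = binary_focal_loss(y_true, always_present)
f_good = binary_focal_loss(y_true, informative)
```

In this sketch a model that always predicts "present" gets a G-Mean loss near 0.9 despite being right on 9 of 10 days, because the per-class normalization penalizes a collapsed minority class without any tuned weights. The focal loss reaches a similar ordering here, but its behavior depends on the chosen `gamma` and `alpha`.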
In practice
- Use LSTM-FCN for robust time-series classification tasks.
- Consider G-Mean loss for imbalanced datasets; it requires no parameter calibration.
- Use batch sizes of 64 or more and window sizes of 40-80 days.
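The batch-size and window guidance above can be encoded as a small search grid. In this sketch only the bounds (batch size of at least 64, window of 40-80 days) come from the summary; the specific grid values and the names `BATCH_SIZES`, `WINDOW_DAYS`, and `candidate_configs` are hypothetical.

```python
from itertools import product

# Hypothetical grid values; only the bounds are taken from the summary.
BATCH_SIZES = [64, 128, 256]
WINDOW_DAYS = [40, 60, 80]


def candidate_configs(batch_sizes=BATCH_SIZES, window_days=WINDOW_DAYS):
    """Return every (batch size, window) pair inside the reported stable region."""
    return [
        {"batch_size": b, "window_days": w}
        for b, w in product(batch_sizes, window_days)
        if b >= 64 and 40 <= w <= 80
    ]


configs = candidate_configs()
```

Values outside the reported region, such as a batch size of 32 or a 20-day window, are filtered out rather than raising an error, which keeps the helper usable with broader exploratory grids.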
Topics
- Time Series Classification
- Absenteeism Prediction
- Class Imbalance
- LSTM-FCN
- Binary Focal Loss
- Geometric Mean Loss
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.