A time-series classification framework for individual-level absenteeism prediction under severe class imbalance

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A new Time Series Classification (TSC) framework addresses individual-level absenteeism prediction. Absenteeism carries significant operational costs in sectors such as healthcare. Existing methods are limited because they map features to labels from the same time point and discard the sequential attendance history. This framework predicts future absences proactively by separating historical attendance sequences (the inputs) from future absence labels (the targets). Because no public longitudinal data exists, the researchers constructed a reproducible simulated dataset calibrated to the UCI absenteeism dataset. The analysis compared Binary Focal Loss (BFL) and Geometric Mean (G-Mean) loss under severe class imbalance. BFL achieved a specificity of 0.813 and a balanced accuracy of 0.888, comparable to G-Mean, which has the advantage of adapting automatically without parameter calibration. Among the deep learning architectures tested, the hybrid LSTM-Fully Convolutional Network (LSTM-FCN) delivered strong precision and specificity. Stable performance, with approximately 80% balanced accuracy on held-out test data, was achieved with batch sizes of at least 64 and window sizes of 40 to 80 days.
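The core framing, pairing a window of past attendance with a label that lies strictly in the future, can be sketched as a sliding-window split. This is a minimal sketch, assuming a daily 0/1 absence record per employee (1 = absent) and a label defined as the absence indicator `horizon` days after the window ends; `make_windows`, `window`, and `horizon` are hypothetical names, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(
    absences: Sequence[int], window: int, horizon: int = 1
) -> Tuple[List[List[int]], List[int]]:
    """Split one employee's daily absence record into (history, future label) pairs.

    Hypothetical helper: absences[t] is 1 if the employee was absent on day t.
    Each sample uses days [t - window, t) as input and day t + horizon - 1 as
    the label, so the label always lies strictly after the history.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must both be >= 1")
    histories: List[List[int]] = []
    labels: List[int] = []
    # The last valid t keeps the label index t + horizon - 1 inside the record.
    for t in range(window, len(absences) - horizon + 1):
        histories.append(list(absences[t - window:t]))
        labels.append(absences[t + horizon - 1])
    return histories, labels


if __name__ == "__main__":
    record = [0, 0, 1, 0, 0, 1]
    X, y = make_windows(record, window=3)
    for history, label in zip(X, y):
        print(history, "->", label)
```

With the six-day record above and `window=3`, this yields three samples: `[0, 0, 1] -> 0`, `[0, 1, 0] -> 0`, and `[1, 0, 0] -> 1`. In the setting the summary describes, the window would span roughly 40 to 80 days.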

Key takeaway

For workforce planners and ML engineers working to improve absenteeism prediction in high-demand environments, this Time Series Classification framework offers a robust approach. Consider implementing a TSC model, specifically the LSTM-FCN architecture, that learns from historical attendance sequences to produce genuinely proactive forecasts. G-Mean loss can simplify handling of severe class imbalance because it adapts automatically, without the parameter calibration that BFL requires. Train with batch sizes of at least 64 and window sizes of 40 to 80 days to reach stable balanced accuracy (approximately 80% on held-out test data) and support proactive workforce planning.
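The summary does not give the exact G-Mean loss formulation. A common differentiable surrogate, assumed here, replaces hard counts with predicted probabilities so that soft sensitivity and soft specificity are each normalized by their own class size; that per-class normalization is why such a loss needs no class-weight tuning. `g_mean_loss` is a hypothetical name, and a real training loop would express the same arithmetic in an autodiff framework rather than plain Python.

```python
import math
from typing import Sequence


def g_mean_loss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-7) -> float:
    """Assumed soft G-Mean loss: 1 - sqrt(soft_sensitivity * soft_specificity).

    y_true holds 0/1 labels (1 = absent); p_pred holds predicted P(absent).
    """
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    # Soft true positives / true negatives: probability mass on the correct class.
    soft_tp = sum(p * y for y, p in zip(y_true, p_pred))
    soft_tn = sum((1.0 - p) * (1 - y) for y, p in zip(y_true, p_pred))
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Each rate is divided by its own class count, so the rare class is not swamped.
    soft_sensitivity = soft_tp / (n_pos + eps)
    soft_specificity = soft_tn / (n_neg + eps)
    return 1.0 - math.sqrt(soft_sensitivity * soft_specificity)
```

A model that never predicts an absence scores the worst possible value: `g_mean_loss([1] + [0] * 99, [0.0] * 100)` returns `1.0`, even though its plain accuracy would be 99%. Under this surrogate, a batch that contains no absences also has zero soft sensitivity, so very small batches give a noisy training signal when absences are rare.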

Key insights

A Time Series Classification framework proactively predicts individual absenteeism by analyzing historical attendance sequences, outperforming traditional methods.
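Under severe class imbalance, such comparisons rest on imbalance-aware metrics rather than plain accuracy. The helper below, a sketch with a hypothetical name, computes the metrics the summary reports (precision, specificity, balanced accuracy) together with sensitivity and G-Mean from hard 0/1 predictions, using the standard definitions: balanced accuracy is the arithmetic mean of sensitivity and specificity, and G-Mean is their geometric mean.

```python
import math
from typing import Dict, Sequence


def imbalance_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for 0/1 labels, where 1 = absent (minority class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard each ratio so an empty class yields 0.0 instead of a ZeroDivisionError.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
    }
```

For example, a predictor that always outputs "present" on 95 present and 5 absent days has 95% accuracy but a balanced accuracy of 0.5 and a G-Mean of 0.0, which is why these metrics are preferred here.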

Principles

Method

The researchers construct a simulated dataset, compare BFL and G-Mean loss under severe class imbalance, and evaluate deep learning architectures (LSTM, CNN, LSTM-FCN) to identify the best-performing configuration.
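Binary Focal Loss, one of the two losses compared, scales cross-entropy by a modulating factor that down-weights well-classified examples. The sketch below uses the standard focal-loss form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with a class weight `alpha` and a focusing parameter `gamma`; these are the parameters that need calibration, in contrast to G-Mean loss. The default values are illustrative assumptions, not the paper's settings, and `binary_focal_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def binary_focal_loss(
    y_true: Sequence[int],
    p_pred: Sequence[float],
    alpha: float = 0.25,
    gamma: float = 2.0,
    eps: float = 1e-7,
) -> float:
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    y_true holds 0/1 labels (1 = absent); p_pred holds predicted P(absent).
    alpha weights the positive class and 1 - alpha weights the negative class.
    """
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)**gamma shrinks the loss of confident, correct predictions.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)
```

Setting `gamma=0` and `alpha=0.5` reduces the expression to half the ordinary binary cross-entropy, which is a quick sanity check; raising `gamma` shifts the training signal toward hard, often minority-class, examples.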

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.