Evaluation of AutoML Frameworks for IDS under Imbalanced Data Conditions of the NSL-KDD Dataset
Summary
This work investigates how severe class imbalance affects automated machine learning (AutoML) frameworks for multiclass network intrusion detection, using the NSL-KDD dataset. Unlike prior studies, it preserves the original five-class distribution, including the highly underrepresented R2L and U2R attacks, so that the evaluation reflects realistic conditions. Nine open-source AutoML frameworks were analyzed under a unified protocol, considering architectural design, ensemble strategies, and imbalance-handling mechanisms. Results indicate that frameworks incorporating ensemble learning and imbalance-aware optimization achieve superior minority-class discrimination. PyCaret obtained the best overall performance with 66% macro-F1, followed by AutoGluon at 55%, while frameworks lacking native balancing support showed significant degradation. The analysis concludes that accuracy-oriented optimization alone is insufficient for highly imbalanced IDS scenarios and highlights the need for native integration of imbalance-aware optimization, resampling, and stratified evaluation strategies.
Key takeaway
For Machine Learning Engineers developing network intrusion detection systems, the choice of AutoML framework should account for its native imbalance-handling capabilities. Prioritize frameworks that integrate ensemble learning and imbalance-aware optimization, such as PyCaret, to improve detection of rare attack categories. Optimizing for accuracy alone tends to generalize poorly on critical minority classes, so evaluate with class-sensitive metrics such as macro-F1.
Key insights
AutoML frameworks require native imbalance-aware optimization and stratified evaluation for reliable intrusion detection on severely imbalanced datasets.
Principles
- Ensemble learning improves minority-class discrimination.
- Accuracy-oriented optimization alone is insufficient for imbalanced IDS data.
- Preserve original class distribution for realistic evaluation.
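The gap between accuracy and macro-F1 on skewed data can be illustrated with a small sketch. The label counts below are made up for illustration and are not the NSL-KDD distribution; the helper names `accuracy` and `macro_f1` are hypothetical and written from the standard definitions rather than taken from the article.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, taken over the classes present in y_true."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (assumption): 90% majority class, two rare attack classes.
y_true = ["normal"] * 90 + ["dos"] * 8 + ["u2r"] * 2
# A degenerate model that always predicts the majority class.
y_pred = ["normal"] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}  macro-F1={mf1:.3f}")
```

Here the degenerate model scores high on accuracy while its macro-F1 is low, because every class counts equally in the macro average and two of the three classes are never detected.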
Method
Nine open-source AutoML frameworks were analyzed under a unified protocol on the NSL-KDD dataset, preserving its original five-class imbalance, to evaluate performance in multiclass network intrusion detection.
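The summary does not describe how data was partitioned, so the following is only a minimal sketch of one common way to keep class proportions intact when holding out data: a stratified split. The function name `stratified_split`, the class counts, and the 20% hold-out fraction are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly its original share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        if len(indices) > 1:
            # Hold out at least one sample so rare classes appear in the test set.
            n_test = max(1, round(len(indices) * test_fraction))
        else:
            n_test = 0  # a single sample stays in training
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Illustrative five-class counts (assumption, not the real NSL-KDD figures).
labels = (["normal"] * 500 + ["dos"] * 300 + ["probe"] * 150
          + ["r2l"] * 40 + ["u2r"] * 10)

train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
test_counts = {c: sum(1 for i in test_idx if labels[i] == c) for c in set(labels)}
print(test_counts)
```

Splitting per class rather than over the pooled data means the rarest classes cannot vanish from the held-out set by chance, which matters when a class has only a handful of samples.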
In practice
- Prioritize AutoML frameworks with ensemble learning.
- Ensure AutoML supports imbalance-aware optimization.
- Use macro-F1 for imbalanced IDS evaluation.
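As one concrete example of what imbalance-aware optimization can mean, a learner's loss can be reweighted so that errors on rare classes cost more. The sketch below computes inverse-frequency weights with the formula n_samples / (n_classes * class_count), the same heuristic scikit-learn applies for `class_weight="balanced"`. Whether any of the evaluated frameworks uses this exact scheme is not stated in the summary; the function name and class counts are illustrative assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of the class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Illustrative five-class counts (assumption, not the real NSL-KDD figures).
labels = (["normal"] * 500 + ["dos"] * 300 + ["probe"] * 150
          + ["r2l"] * 40 + ["u2r"] * 10)

weights = balanced_class_weights(labels)
for label, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:>6}: {weight:.2f}")
```

With these toy counts the rarest class receives a weight fifty times that of the majority class, so a model trained with such weights is penalized far more for missing a rare attack than for misclassifying normal traffic.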
Topics
- AutoML Frameworks
- Network Intrusion Detection
- Class Imbalance
- NSL-KDD Dataset
- PyCaret
- AutoGluon
- Ensemble Learning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.