I Built an AI-Powered Network Intrusion Detection System for My Final Year Project — Here’s Exactly…
Summary
IntrusionIQ is a final-year project: an AI-powered Network Intrusion Detection System (NIDS) built from scratch on an isolated three-VM lab network. The system combines a complete ML pipeline, a live dashboard, and a real attack simulation environment. It is trained on the CICIDS2017 dataset, which contains 2.8 million network flow records with 78 features, and the project had to deal with significant class imbalance (83% benign traffic) and feature redundancy. The ML pipeline uses feature selection to reduce the 78 features to 20, an 80/20 stratified train/test split, StandardScaler normalization, and SMOTE to oversample minority classes. A Random Forest classifier achieved a True Positive Rate of roughly 98% or higher. The architecture integrates data preprocessing, model training, a simulation engine with a 20 ms delay, and a Streamlit dashboard for real-time alerts. Because of CICFlowMeter compatibility issues, the live demonstration used a hybrid approach, which successfully detected Port Scan, DoS Flood, SSH Brute Force, FTP Brute Force, and SQL Injection attacks.
Key takeaway
For ML Engineers developing real-time security systems, prioritize robust data pipeline design and a demonstrable live environment over chasing incremental accuracy in isolation. Your ability to showcase a functional, end-to-end system that handles real-world constraints, like data imbalance and tool compatibility, will be more impactful than theoretical performance metrics alone. Documenting design decisions and trade-offs throughout the build process will also significantly streamline debugging and communication.
Key insights
Building a robust NIDS requires careful data preprocessing, model selection, and a demonstrable real-time architecture.
Principles
- Split data before scaling or balancing.
- Prioritize recall in NIDS for critical attack detection.
- A working demo outweighs marginal accuracy gains.
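The first two principles can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for CICIDS2017 flows; the sample size, class weights, and hyperparameters are illustrative choices, not values from the original project. The point is the ordering: split first, fit the scaler on the training set only, and report recall on the attack class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for flow features (about 83% class 0 = "benign").
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    weights=[0.83, 0.17], random_state=42,
)

# 1. Split first, stratified so both sets keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 2. Fit the scaler on the training set only, then reuse it on the test set.
#    Fitting on all data would leak test-set statistics into training.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# 3. Any oversampling (e.g. SMOTE) would go here, on X_train_s / y_train only,
#    so the test set keeps its real class distribution.

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train_s, y_train)

# Recall on the attack class = TP / (TP + FN), i.e. the True Positive Rate.
attack_recall = recall_score(y_test, clf.predict(X_test_s), pos_label=1)
print(f"attack recall: {attack_recall:.3f}")
```

Recall is the metric to watch here because a missed attack (a false negative) is usually costlier for a NIDS than a false alarm.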
Method
The IntrusionIQ method involves feature selection, stratified data splitting, StandardScaler normalization, SMOTE oversampling, and Random Forest classification, integrated into a four-phase architecture with a real-time simulation engine and Streamlit dashboard.
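The real-time simulation engine could look roughly like the sketch below. This is not the project's code: `replay_flows`, `toy_classifier`, and the flow fields are hypothetical names, and the sketch assumes the 20 ms delay is a pause between replayed records. A dashboard such as Streamlit would consume the yielded alerts.

```python
import time
from typing import Callable, Iterable, Iterator


def replay_flows(
    flows: Iterable[dict],
    classify: Callable[[dict], str],
    delay_s: float = 0.02,  # assumed: 20 ms pause between records
) -> Iterator[dict]:
    """Replay flows one by one and yield an alert for each non-benign flow."""
    for index, flow in enumerate(flows):
        label = classify(flow)
        if label != "BENIGN":
            yield {"index": index, "label": label, "flow": flow}
        time.sleep(delay_s)


# Hypothetical stand-in for the trained model: flags flows touching many ports.
def toy_classifier(flow: dict) -> str:
    return "PortScan" if flow["distinct_ports"] > 100 else "BENIGN"


sample_flows = [
    {"src": "10.0.0.5", "distinct_ports": 3},
    {"src": "10.0.0.9", "distinct_ports": 850},
    {"src": "10.0.0.5", "distinct_ports": 1},
]

alerts = list(replay_flows(sample_flows, toy_classifier))
for alert in alerts:
    print(alert["index"], alert["label"], alert["flow"]["src"])
# prints: 1 PortScan 10.0.0.9
```

Writing the engine as a generator keeps the replay loop separate from the model and the UI, so the same loop can feed a console, a log file, or a dashboard.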
In practice
- Use `feature_importances_` for feature reduction.
- Apply `.strip()` to column names so that invisible leading or trailing whitespace does not cause a `KeyError`.
- Document trade-offs and limitations transparently.
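The first two tips can be sketched together, assuming pandas and scikit-learn. The inline CSV, its column names, and the choice of keeping two features are made up for illustration (the project reduced 78 features to 20); the headers carry leading spaces to reproduce the kind of whitespace problem the tip describes.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny inline CSV whose headers carry leading spaces (illustrative data only).
raw_csv = io.StringIO(
    " Flow Duration, Total Fwd Packets, Noise, Label\n"
    "10,1,5,BENIGN\n"
    "12,2,3,BENIGN\n"
    "11,1,4,BENIGN\n"
    "13,2,5,BENIGN\n"
    "900,40,4,DoS\n"
    "950,45,3,DoS\n"
    "920,42,5,DoS\n"
    "980,48,4,DoS\n"
)
df = pd.read_csv(raw_csv)

# Without this line, df["Label"] raises KeyError: the real name is " Label".
df.columns = df.columns.str.strip()

X = df.drop(columns="Label")
y = df["Label"]

# Fit a forest, then rank features by impurity-based importance.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)

# Keep the k most important features (k=2 here, purely for the toy data).
top_k = importances.nlargest(2).index.tolist()
X_reduced = X[top_k]
print(importances.sort_values(ascending=False))
```

In a real pipeline the importances would be computed on the training split only, so that feature selection does not see the test data.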
Topics
- IntrusionIQ
- Network Intrusion Detection System
- CICIDS2017 Dataset
- Machine Learning Pipeline
- Random Forest Classifier
Best for: AI Student, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Data Science on Medium.