From Logistic Regression to GPT-2: Building a Complete Spam Detection & Sentiment Analysis Pipeline

2026-03-24 · Source: NLP on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, long

Summary

This article details a two-phase pipeline for spam detection and sentiment analysis. It benchmarks eight models spanning classical ML, deep learning, and transformers on the UCI SMS Spam Collection dataset. Phase 1 evaluates Logistic Regression, SVM, Random Forest, XGBoost, LSTM, BiLSTM, BERT, and GPT-2, and shows that accuracy is misleading on this imbalanced dataset (87% ham, 13% spam). The evaluation therefore relies on F1-score, Precision-Recall AUC, and ROC-AUC. BERT performs best, with 11 total errors and a ROC-AUC of 1.00. Phase 2 enriches the dataset with sentiment labels, using BiLSTM for classification and VADER for sentiment scoring, and finds that 72.2% of spam messages carry positive sentiment, compared with 41.7% of ham.
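The Phase 2 cross-signal figure (share of positive messages per class) reduces to a simple grouped count once each message has a VADER compound score. The sketch below assumes the commonly used VADER cutoff of compound >= 0.05 for "positive"; the article does not state which cutoff it used, and the (label, score) pairs are made up for illustration. In practice the scores would come from something like NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, which requires the `vader_lexicon` resource to be downloaded first.

```python
from collections import defaultdict

# Assumption: the conventional VADER threshold for labelling a text "positive"
POSITIVE_CUTOFF = 0.05


def positive_share_by_label(rows):
    """rows: iterable of (label, compound_score). Returns {label: fraction positive}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for label, compound in rows:
        totals[label] += 1
        if compound >= POSITIVE_CUTOFF:
            positives[label] += 1
    return {label: positives[label] / totals[label] for label in totals}


# Hypothetical (label, compound) pairs, not taken from the article's data
sample = [
    ("spam", 0.83), ("spam", 0.62), ("spam", -0.10), ("spam", 0.40),
    ("ham", 0.00), ("ham", 0.55), ("ham", -0.30), ("ham", 0.02),
]
shares = positive_share_by_label(sample)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.1%} positive")
```

Applied to the full labelled dataset, the same grouping yields per-class percentages of the kind the article reports.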

Key takeaway

Machine learning engineers building text classifiers on imbalanced datasets should prioritize F1-score and Precision-Recall AUC over raw accuracy. Use confusion matrices to understand specific failure modes, especially false negatives (spam that slips through), and consider transformer models such as BERT for the best performance, despite their higher resource cost. GPT-2's well-calibrated probability estimates also make it easy to adjust the decision threshold when recall matters most.
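Threshold flexibility means choosing the cutoff on predicted spam probabilities to meet a recall target instead of defaulting to 0.5. The sketch below scans every distinct score from highest to lowest and returns the first (highest) threshold whose spam recall meets the target, along with the precision paid for it. The function name and the toy scores are hypothetical; with scikit-learn the same trade-off can be read off `precision_recall_curve`.

```python
def threshold_for_recall(y_true, scores, target_recall):
    """Return (threshold, precision, recall) for the highest threshold whose
    recall on the positive class (1 = spam) is >= target_recall."""
    n_pos = sum(y_true)
    # Candidate thresholds: every distinct score, highest first
    for t in sorted(set(scores), reverse=True):
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
        recall = tp / n_pos
        if recall >= target_recall:
            # At least the item scoring exactly t is predicted positive, so tp + fp > 0
            return t, tp / (tp + fp), recall
    return None


# Hypothetical labels (1 = spam) and predicted spam probabilities
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.60, 0.35, 0.70, 0.40, 0.20, 0.10, 0.05, 0.02]

for target in (0.75, 1.0):
    t, precision, recall = threshold_for_recall(y_true, scores, target)
    print(f"target={target:.2f} threshold={t:.2f} precision={precision:.2f} recall={recall:.2f}")
# target=0.75 threshold=0.60 precision=0.75 recall=0.75
# target=1.00 threshold=0.35 precision=0.67 recall=1.00
```

Demanding full recall here lowers the threshold to 0.35 and costs precision, which is the kind of trade-off calibrated probabilities let you make deliberately.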

Key insights

For imbalanced text classification, prioritize F1-score and Precision-Recall AUC over accuracy and ROC-AUC.
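A small stdlib-only sketch makes this concrete. A classifier that always predicts "ham" scores 87% accuracy on an 87/13 split yet has an F1 of 0 for spam. Separately, on a hypothetical ranking where one ham message outranks every spam, ROC-AUC stays near 0.89 while average precision (a common PR-AUC estimate) drops to 0.45, because PR metrics weight errors at the top of the ranking. The function names and data are illustrative assumptions, and the average-precision helper assumes scores have no ties.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy plus F1 for the positive class (1 = spam)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    accuracy = sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def roc_auc(y_true, scores):
    """Probability a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of precision@k taken at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / sum(y_true)


# Majority-class baseline on an 87/13 split
y_base = [0] * 87 + [1] * 13
acc, f1 = accuracy_and_f1(y_base, [0] * 100)
print(f"always-ham: accuracy={acc:.2f} f1={f1:.2f}")

# Hypothetical ranking: one ham on top, spam at ranks 2 and 5, 18 ham in total
y_rank = [0, 1, 0, 0, 1] + [0] * 15
s_rank = [0.95, 0.90, 0.85, 0.80, 0.75] + [0.70 - 0.04 * i for i in range(15)]
print(f"roc_auc={roc_auc(y_rank, s_rank):.2f} ap={average_precision(y_rank, s_rank):.2f}")
```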

Principles

Accuracy is deceptive for imbalanced classification.
Confusion matrices reveal model failure modes.
Word clouds give quick early insight into which words distinguish the classes.
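Reading a confusion matrix for failure modes amounts to counting the four (actual, predicted) cells and then inspecting the messages in the costly cells. The sketch below does both, treating 1 as spam. The helper names and the example messages are hypothetical, not from the article's code.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Counts keyed by cell name for binary labels (1 = spam, 0 = ham)."""
    cells = Counter(zip(y_true, y_pred))
    return {
        "TN": cells[(0, 0)], "FP": cells[(0, 1)],
        "FN": cells[(1, 0)], "TP": cells[(1, 1)],
    }


def misclassified(texts, y_true, y_pred, kind):
    """Return the messages in one error cell: 'FN' (missed spam) or 'FP' (flagged ham)."""
    wanted = {"FN": (1, 0), "FP": (0, 1)}[kind]
    return [t for t, y, p in zip(texts, y_true, y_pred) if (y, p) == wanted]


# Hypothetical messages, gold labels, and model predictions
texts = [
    "WIN a free prize now",
    "see you at lunch",
    "call me later",
    "Claim your reward, reply YES",
    "u up?",
]
y_true = [1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
print("missed spam:", misclassified(texts, y_true, y_pred, "FN"))
print("flagged ham:", misclassified(texts, y_true, y_pred, "FP"))
```

Reading the false negatives directly often shows what kind of spam a model misses, which a single aggregate score cannot.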

Method

A two-phase pipeline benchmarks eight models on an 80/20 train-test split, then enriches the dataset with sentiment labels using the best classifier and VADER for cross-signal analysis.
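With only 13% spam, an 80/20 split is usually stratified so both halves keep the same class ratio. The summary does not say whether the original split was stratified, so treat this as an assumption. The stdlib sketch below shuffles each class separately with a fixed seed and carves off 20% of each; with scikit-learn the equivalent is `train_test_split(..., test_size=0.2, stratify=labels)`. The function name and seed are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) so each class keeps its proportion in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_size)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical 1,000-message corpus with the dataset's 87/13 ratio
labels = ["ham"] * 870 + ["spam"] * 130
train_idx, test_idx = stratified_split(labels)
spam_in_test = sum(labels[i] == "spam" for i in test_idx)
print(len(train_idx), len(test_idx), spam_in_test / len(test_idx))
```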

In practice

Use F1-score for imbalanced classification.
Analyze confusion matrices to understand error types.
Retain stopwords if they carry discriminative signal.
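One way to check whether stopwords carry signal before dropping them is to compare per-class token frequencies with stopwords kept. These frequency tables are also what a word cloud visualizes; for example, the `wordcloud` package can render one from a frequency dict via `WordCloud().generate_from_frequencies(...)`. The sketch below computes an add-one-smoothed spam/ham frequency ratio per token. The tokenizer regex, helper names, and toy messages are assumptions for illustration.

```python
import re
from collections import Counter


def class_token_counts(texts, labels):
    """Lowercased token counts per class; stopwords are deliberately kept."""
    counts = {}
    for text, label in zip(texts, labels):
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        counts.setdefault(label, Counter()).update(tokens)
    return counts


def spam_ham_ratio(counts, token, alpha=1.0):
    """Smoothed P(token | spam) / P(token | ham); values > 1 lean towards spam."""
    spam, ham = counts["spam"], counts["ham"]
    vocab_size = len(set(spam) | set(ham))
    p_spam = (spam[token] + alpha) / (sum(spam.values()) + alpha * vocab_size)
    p_ham = (ham[token] + alpha) / (sum(ham.values()) + alpha * vocab_size)
    return p_spam / p_ham


# Hypothetical messages; "your" and "you" are both common stopwords
texts = [
    "Your free prize is waiting, call now",
    "You have won, reply now to claim your reward",
    "I will call you later",
    "Are we still on for dinner",
]
labels = ["spam", "spam", "ham", "ham"]

counts = class_token_counts(texts, labels)
for token in ("your", "you", "now"):
    print(token, round(spam_ham_ratio(counts, token), 2))
```

A stopword whose ratio sits far from 1 is a candidate to keep rather than filter out.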

Topics

Spam Detection
Text Classification
Class Imbalance
Transformer Models
Sentiment Analysis

Code references

Hafsa06rd/spam-detection-sentiment-analysis

Best for: Machine Learning Engineer, Data Scientist, AI Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by NLP on Medium.