How Far Can Classical NLP Go? From Bag-of-Words to Stacking on Spooky Author Identification

2026-06-24 · Source: Naturallanguageprocessing on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, long

Summary

A classical NLP experiment on Kaggle's Spooky Author Identification task shows how far traditional methods can go on stylistic text classification. The goal was to tell Edgar Allan Poe, Mary Shelley, and H. P. Lovecraft apart from single sentences, and the project progressed from a Vowpal Wabbit (VW) word baseline to a tuned stacked ensemble. Adding punctuation and character n-grams raised VW holdout accuracy from 0.8332 to 0.8553. A TF-IDF ensemble then improved probability quality, and the final stacked model reached 0.8687 accuracy and 0.3504 log loss on a 70/30 holdout split. The final Kaggle submission scored 0.30414 private and 0.33621 public log loss. Along the way, sparse count-based features outperformed averaged dense embeddings on this short-text stylistic task.

Key takeaway

For Machine Learning Engineers tackling stylistic text classification: try a robust classical NLP pipeline before defaulting to complex deep learning models. Detailed feature engineering, including punctuation and character n-grams, combined with ensemble methods such as stacking, can be highly competitive; here it reached a 0.30414 private log loss on Kaggle's Spooky Author Identification. Validate carefully and track probability-quality metrics such as log loss alongside accuracy, because they often reveal gains that accuracy alone understates.
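To illustrate why log loss can separate models that accuracy treats as equal, here is a minimal sketch using only the Python standard library. The labels and probability rows are invented for illustration and are not taken from the article; the function names are likewise hypothetical.

```python
import math

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a model assigns zero probability.
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)

def accuracy(y_true, probs):
    """Fraction of rows whose highest-probability class is the true class."""
    hits = 0
    for label, row in zip(y_true, probs):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += int(predicted == label)
    return hits / len(y_true)

# Made-up example: 0 = Poe, 1 = Shelley, 2 = Lovecraft.
y_true = [0, 1, 2, 0]

# Both models get the last sentence wrong, so accuracy is identical.
confident = [[0.90, 0.05, 0.05], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.05, 0.90, 0.05]]
cautious = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]

for name, probs in [("confident", confident), ("cautious", cautious)]:
    print(name, accuracy(y_true, probs), round(multiclass_log_loss(y_true, probs), 4))
```

In this toy case the overconfident model is penalized heavily for its one wrong answer, so the cautious model has the lower log loss even though both score the same accuracy.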

Key insights

With careful feature engineering and stacking, classical NLP performs strongly on stylistic authorship attribution, reaching 0.8687 holdout accuracy and 0.3504 log loss on this task.

Principles

Stylistic tasks benefit from sparse n-gram and character features.
Punctuation and character n-grams capture writing style.
Stacking improves probability estimates.
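The feature ideas above can be sketched in a few lines of standard-library Python. This is an illustrative assumption about how such features might be built, not the article's actual pipeline: the n-gram ranges, the punctuation set, the namespace names (`w`, `c`, `p`), and the escaping scheme are all hypothetical choices. The output line follows Vowpal Wabbit's text input format, where feature names must not contain whitespace, `:` or `|`.

```python
import re
from collections import Counter

# Hypothetical punctuation set; extend it to match your corpus.
PUNCT = set(",.;:!?'\"-()")

def word_counts(text):
    """Lowercased word counts (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def char_ngrams(text, n_min=2, n_max=4):
    """Counts of all character n-grams with n_min <= n <= n_max."""
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def punctuation_counts(text):
    """Counts of each punctuation mark in PUNCT."""
    return Counter(ch for ch in text if ch in PUNCT)

def vw_token(token):
    """Escape characters that are reserved in VW feature names."""
    token = re.sub(r"\s", "_", token)
    return token.replace(":", "COLON").replace("|", "PIPE")

def to_vw_line(label, text):
    """Format one example as `label |w ... |c ... |p ...` with counts as values."""
    def namespace(name, counts):
        feats = " ".join(f"{vw_token(k)}:{v}" for k, v in sorted(counts.items()))
        return f"|{name} {feats}"
    return " ".join([
        str(label),
        namespace("w", word_counts(text)),
        namespace("c", char_ngrams(text, 3, 3)),
        namespace("p", punctuation_counts(text)),
    ])

# Made-up sentence; label 1 is a hypothetical class id (VW multiclass labels start at 1).
print(to_vw_line(1, "It was, alas; too late!"))
```

Keeping words, character n-grams, and punctuation in separate namespaces makes it easy to switch each feature group on or off when comparing baselines.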

Method

The project built a sequence of classical models: Vowpal Wabbit baselines, a tuned TF-IDF ensemble, and a stacked sparse-text ensemble using out-of-fold predictions, with careful hyperparameter tuning and evaluation.
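The out-of-fold step is the part of stacking that is easiest to get wrong, so here is a minimal standard-library sketch of it. Everything in it is an assumption for illustration: `TinyNaiveBayes` is a toy stand-in for the article's base models, the sentences are invented, and the fold count is arbitrary. The key property is that each row of `oof` comes from a model that never saw that row during training.

```python
import math
from collections import Counter

def kfold_indices(n, k):
    """Yield (train_idx, valid_idx) for k contiguous folds; shuffle the data first in real use."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, valid
        start += size

class TinyNaiveBayes:
    """Toy multinomial naive Bayes over whitespace tokens (hypothetical base model)."""
    def __init__(self, n_classes, alpha=1.0):
        self.n_classes = n_classes
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = [Counter() for _ in range(self.n_classes)]
        for text, y in zip(texts, labels):
            self.word_counts[y].update(text.lower().split())
        self.vocab = set().union(*self.word_counts)
        return self

    def predict_proba(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab) + 1  # +1 reserves mass for unseen tokens
        scores = []
        for c in range(self.n_classes):
            total = sum(self.word_counts[c].values())
            s = math.log((self.class_counts[c] + self.alpha)
                         / (n_docs + self.alpha * self.n_classes))
            for tok in text.lower().split():
                s += math.log((self.word_counts[c][tok] + self.alpha)
                              / (total + self.alpha * v))
            scores.append(s)
        m = max(scores)  # subtract the max before exponentiating for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

def out_of_fold_predictions(make_model, texts, labels, k=3):
    """Predict each row with a model trained only on the other folds."""
    oof = [None] * len(texts)
    for train_idx, valid_idx in kfold_indices(len(texts), k):
        model = make_model().fit([texts[i] for i in train_idx],
                                 [labels[i] for i in train_idx])
        for i in valid_idx:
            oof[i] = model.predict_proba(texts[i])
    return oof

# Invented sentences; 0, 1, 2 are hypothetical author ids.
texts = ["the raven cried", "my creature wept", "the eldritch deep",
         "a tomb so cold", "the wretched creature", "eldritch horror rose"]
labels = [0, 1, 2, 0, 1, 2]
oof = out_of_fold_predictions(lambda: TinyNaiveBayes(n_classes=3), texts, labels, k=3)
```

With several base models, the `oof` matrices would be concatenated column-wise and used as the training features for a meta-learner, while test-time features come from base models refit on the full training set.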

In practice

Use Vowpal Wabbit for fast linear text models.
Implement NB-SVM-style Logistic Regression for text classification.
Combine base model predictions via stacking for better log loss.
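For the NB-SVM-style item, the distinctive ingredient is the naive Bayes log-count ratio used to rescale features before a linear classifier is fit. The sketch below computes that ratio in standard-library Python for one class against the rest, using binarized token presence and add-alpha smoothing. It is an assumption about the general technique, not the article's code; the toy documents, labels, and function names are hypothetical, and the logistic regression step itself is left out.

```python
import math

def log_count_ratio(docs, labels, positive, alpha=1.0):
    """Return r[token] = log((p / |p|_1) / (q / |q|_1)) for one-vs-rest.

    p counts documents of the positive class containing the token,
    q counts documents of all other classes, both smoothed by alpha.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    p = {tok: alpha for tok in vocab}
    q = {tok: alpha for tok in vocab}
    for doc, y in zip(docs, labels):
        target = p if y == positive else q
        for tok in set(doc):  # binarized: presence, not frequency
            target[tok] += 1
    p_norm = sum(p.values())
    q_norm = sum(q.values())
    return {tok: math.log((p[tok] / p_norm) / (q[tok] / q_norm)) for tok in vocab}

def nb_scaled_features(doc, r):
    """Binarized features multiplied elementwise by the log-count ratio."""
    return {tok: r[tok] for tok in set(doc) if tok in r}

# Invented tokenized documents and hypothetical author labels.
docs = [["the", "raven", "nevermore"],
        ["the", "creature", "wretched"],
        ["the", "raven", "tomb"],
        ["the", "eldritch", "horror"]]
labels = ["Poe", "Shelley", "Poe", "Lovecraft"]

r = log_count_ratio(docs, labels, positive="Poe")
print(nb_scaled_features(["the", "raven", "horror"], r))
```

Tokens that favor the positive class get positive weights, tokens that favor the other classes get negative weights, and tokens spread evenly end up near zero. For three authors, one ratio vector per author would be computed and a separate linear model trained on each set of scaled features.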

Topics

Classical NLP
Authorship Attribution
Text Classification
Stacked Ensemble
TF-IDF
Feature Engineering

Best for: Machine Learning Engineer, Data Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Naturallanguageprocessing on Medium.