David Dodson: spaCy in the News: Quartz’s NLP pipeline

Source: Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, long

Summary

Quartz developed Scio2, a living natural language processing pipeline built on spaCy, to analyze global business news with domain-specific context. Rather than relying on general-purpose AI, the pipeline targets time-sensitive news analysis. Scio2 draws on a corpus of 70,000 articles, totaling 101.4 million text blocks, and its base model is trained on 85,000 labeled sentences. A core component is the language graph, which stores analyzed content, tracks linguistic changes over time (e.g., the emergence of "5G"), and serves as a dynamic resource for extracting new training data to retrain spaCy models. The system performs rich analysis, including custom entity recognition (such as a "construct" entity type for evolving terms) and classification of stylistic elements such as active versus passive voice, in line with Quartz's editorial style guide. Together, these pieces enable real-time content analysis and models that evolve with the language they analyze.

Key takeaway

NLP engineers building systems for rapidly evolving content should prioritize domain-specific pipelines over general-purpose AI approaches. Implement a dynamic language graph to store analyzed content and track linguistic shifts, so that models can be retrained continuously on relevant, time-sensitive data. This keeps models accurate and contextually aware as new terminology and narratives emerge.

Key insights

Domain-specific NLP pipelines, like Scio2, excel in time-sensitive news analysis by dynamically adapting to evolving language and context.

Method

Content is analyzed by spaCy, then added to a language graph. This graph stores relationships, tracks linguistic evolution, and serves as a dynamic source for extracting training data to retrain spaCy models.
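
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Quartz's implementation: the `LanguageGraph` class, its method names, the sample sentences, and the uppercase `CONSTRUCT` label are all hypothetical. The terms passed to `add` stand in for spans that a spaCy pipeline would extract (for example from `doc.ents`), so the sketch runs without spaCy installed. The training examples use the character-offset `{"entities": [(start, end, label)]}` shape that spaCy accepts for named entity annotations.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from itertools import combinations


@dataclass
class LanguageGraph:
    """Hypothetical toy graph: terms are nodes, sentence co-occurrence is an edge."""

    # term -> earliest publication date on which the term was seen
    first_seen: dict = field(default_factory=dict)
    # (term_a, term_b) in sorted order -> number of sentences containing both
    edges: dict = field(default_factory=lambda: defaultdict(int))
    # term -> sentences in which the term appeared
    sentences: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, sentence, terms, published):
        """Store one analyzed sentence; `terms` stands in for spaCy entity spans."""
        unique = sorted(set(terms))
        for term in unique:
            if term not in self.first_seen or published < self.first_seen[term]:
                self.first_seen[term] = published
            self.sentences[term].append(sentence)
        for a, b in combinations(unique, 2):
            self.edges[(a, b)] += 1

    def emerging_terms(self, since):
        """Return terms whose first appearance is on or after `since`."""
        return sorted(t for t, d in self.first_seen.items() if d >= since)

    def training_examples(self, term, label):
        """Build (text, annotations) pairs with character offsets for `term`."""
        examples = []
        for text in self.sentences[term]:
            start = text.find(term)
            if start != -1:
                span = (start, start + len(term), label)
                examples.append((text, {"entities": [span]}))
        return examples


# Made-up sentences and dates, for illustration only.
graph = LanguageGraph()
graph.add("Carriers are testing LTE upgrades.", ["LTE"], date(2016, 3, 1))
graph.add("Carriers plan to replace LTE with 5G.", ["LTE", "5G"], date(2018, 6, 1))

print(graph.emerging_terms(date(2018, 1, 1)))  # ['5G']
print(graph.training_examples("5G", "CONSTRUCT"))
```

In a real pipeline, the pairs returned by `training_examples` would be reviewed and then fed back into spaCy training, closing the loop between the graph and the model. Storing a first-seen date per term is one simple way to surface newly emerging vocabulary such as "5G".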

Best for: AI Engineer, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai.