Introducing custom pipelines and extensions for spaCy v2.0
Summary
spaCy v2.0, which is approaching its release candidate, adds an extensibility system with two parts: custom pipeline components, and extensions registered directly on the core "Doc", "Span", and "Token" objects. Developers can attach their own attributes and methods to these objects, so specialized processing steps and custom data handling fit into the same NLP workflow. An example extension package, "spacymoji", shows the system in practice by adding emoji processing.
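The pipeline idea can be illustrated with a small plain-Python sketch. It is a toy model, not spaCy's implementation: "ToyDoc", "tokenize", "count_tokens", and "run_pipeline" are hypothetical names chosen for illustration. The only point it makes is that a component is a callable that receives a document, optionally modifies it, and returns it. In spaCy v2.0 itself, components are registered on the pipeline with nlp.add_pipe.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyDoc:
    """Stand-in for a processed document: tokens plus a bag of custom data."""
    text: str
    tokens: List[str] = field(default_factory=list)
    custom: Dict[str, object] = field(default_factory=dict)


# A component is any callable that takes a document and returns it.
Component = Callable[[ToyDoc], ToyDoc]


def tokenize(doc: ToyDoc) -> ToyDoc:
    """Split on whitespace (far simpler than a real tokenizer)."""
    doc.tokens = doc.text.split()
    return doc


def count_tokens(doc: ToyDoc) -> ToyDoc:
    """Custom step: store a token count as custom data on the document."""
    doc.custom["token_count"] = len(doc.tokens)
    return doc


def run_pipeline(text: str, components: List[Component]) -> ToyDoc:
    """Pass the document through each component in order."""
    doc = ToyDoc(text)
    for component in components:
        doc = component(doc)
    return doc


doc = run_pipeline("Custom pipelines are composable", [tokenize, count_tokens])
print(doc.tokens, doc.custom["token_count"])
```

Because every step shares the same call signature, a custom stage can be placed anywhere in the list, as long as the data it reads has already been produced by an earlier stage.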
Key takeaway
For NLP engineers building custom applications or extending spaCy's core capabilities, the main change in spaCy v2.0 is its extensibility system. You can add custom pipeline components and attach attributes to "Doc", "Span", and "Token" objects directly, which simplifies workflows that need specialized text analysis or custom metadata. If your project depends on this kind of customization, consider upgrading to v2.0.
Key insights
spaCy v2.0 enhances extensibility via new pipeline components and "Doc"/"Span"/"Token" object extensions.
Principles
- Extend core NLP objects for custom data.
- Integrate specialized processing steps.
- Streamline custom attribute attachment.
In practice
- Develop custom NLP pipeline stages.
- Add emoji processing with "spacymoji".
- Attach custom attributes to "Doc" objects.
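The attribute-attachment item above can be sketched in plain Python. This is a toy model loosely based on how spaCy v2.0 exposes extensions (registered with Doc.set_extension and read through doc._), not spaCy's own code. "MiniDoc", "_Underscore", and the small "EMOJI" set are hypothetical, and the emoji check is a stand-in rather than what "spacymoji" does internally.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class _Underscore:
    """Proxy that resolves registered attribute names for one document."""

    def __init__(self, doc: "MiniDoc") -> None:
        # Bypass our own __setattr__ while storing the back-reference.
        object.__setattr__(self, "_doc", doc)

    def __getattr__(self, name: str) -> Any:
        if name not in MiniDoc._extensions:
            raise AttributeError(f"No extension registered as {name!r}")
        default, getter = MiniDoc._extensions[name]
        if getter is not None:
            return getter(self._doc)  # computed each time it is read
        return self._doc._values.get(name, default)

    def __setattr__(self, name: str, value: Any) -> None:
        if name not in MiniDoc._extensions:
            raise AttributeError(f"No extension registered as {name!r}")
        self._doc._values[name] = value


class MiniDoc:
    """Stand-in document whose custom attributes live under `doc._`."""

    # Class-level registry: name -> (default value, optional getter).
    _extensions: Dict[str, Tuple[Any, Optional[Callable]]] = {}

    def __init__(self, words: List[str]) -> None:
        self.words = words
        self._values: Dict[str, Any] = {}  # per-document overrides
        self._ = _Underscore(self)

    @classmethod
    def set_extension(cls, name: str, default: Any = None,
                      getter: Optional[Callable] = None) -> None:
        """Register a custom attribute for every MiniDoc instance."""
        cls._extensions[name] = (default, getter)


EMOJI = {"😀", "🎉", "🚀"}  # tiny hypothetical lookup table

# One writable attribute with a default, one computed by a getter.
MiniDoc.set_extension("source", default="unknown")
MiniDoc.set_extension(
    "has_emoji", getter=lambda doc: any(w in EMOJI for w in doc.words)
)

doc = MiniDoc(["Release", "day", "🎉"])
doc._.source = "changelog"
print(doc._.source, doc._.has_emoji)
```

Keeping custom attributes in a separate namespace means they cannot collide with the object's built-in attributes, and an unregistered name fails loudly instead of silently creating a new field.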
Topics
- spaCy v2.0
- NLP Pipelines
- Custom Components
- Doc Object Extensions
- Span Object Extensions
- Token Object Extensions
- spacymoji
Best for: NLP Engineer, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion (Explosion.ai).