Fast transformer inference with Metal Performance Shaders

2022-11-24 · Source: Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, quick

Summary

Thinc's PyTorch layers now support Metal Performance Shaders (MPS), which enables GPU-accelerated inference for spaCy transformer-based pipelines on Apple Silicon Macs. Users in the Apple ecosystem can run demanding natural language processing workloads on the Mac's integrated GPU, with reported speedups of up to 4.7x compared with running without MPS support. This makes local development and deployment of transformer models on Apple Silicon faster and more responsive.

Key takeaway

NLP engineers developing or deploying spaCy transformer pipelines on Apple Silicon Macs should upgrade to a Thinc release whose PyTorch layers support Metal Performance Shaders. The integration can speed up inference by up to 4.7x, which shortens local development and testing cycles. Confirm that your environment is configured to use the native GPU support.
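One way to check the environment before enabling GPU inference is sketched below. It assumes PyTorch is the library that exposes MPS (through `torch.backends.mps`, available in recent PyTorch releases). The helper names `is_apple_silicon`, `mps_available`, and `pick_device` are illustrative and are not part of spaCy or Thinc.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True when running natively on an Apple Silicon Mac.

    Note: a Python interpreter running under Rosetta reports "x86_64",
    so this check returns False in that case.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


def mps_available():
    """Return True if an installed PyTorch reports a usable MPS backend."""
    try:
        import torch
    except ImportError:
        return False
    # Older PyTorch builds have no torch.backends.mps attribute at all.
    backend = getattr(torch.backends, "mps", None)
    return bool(backend is not None and backend.is_available())


def pick_device():
    """Hypothetical helper: name the device a PyTorch model could run on."""
    return "mps" if is_apple_silicon() and mps_available() else "cpu"
```

If `pick_device()` returns `"cpu"` on an Apple Silicon Mac, the likely causes are a PyTorch build without MPS support or an x86_64 Python running under Rosetta.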

Key insights

Metal Performance Shaders in Thinc PyTorch layers accelerate spaCy transformer inference on Apple Silicon Macs by up to 4.7x.
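A figure such as "up to 4.7x" is a ratio of throughputs, and it depends on the specific Mac and pipeline. The sketch below shows one simple way to measure such a ratio on your own machine. It is not the benchmark used in the original article; `words_per_second` and `speedup` are hypothetical helpers, and words are counted by splitting on whitespace rather than with spaCy's tokenizer.

```python
import time


def words_per_second(process, texts, repeats=3):
    """Best-of-N throughput of process(texts), in whitespace-delimited words/s."""
    n_words = sum(len(text.split()) for text in texts)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        process(texts)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time on very fast calls.
    return n_words / max(best, 1e-9)


def speedup(baseline_wps, accelerated_wps):
    """Ratio of accelerated to baseline throughput (greater than 1 means faster)."""
    return accelerated_wps / baseline_wps
```

With a loaded spaCy pipeline `nlp`, `process` could be `lambda texts: list(nlp.pipe(texts))`, measured once with the GPU enabled and once in a fresh process without it.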

Principles

Hardware-specific optimization boosts performance.
GPU acceleration is critical for transformer inference.

In practice

Run spaCy transformers on Apple Silicon GPUs.
Achieve up to 4.7x faster inference locally.
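As a minimal sketch of the first point, the snippet below asks spaCy to use a GPU if one is available and then loads a transformer pipeline. It assumes spaCy, a Thinc version with MPS support, and PyTorch are installed, and that the `en_core_web_trf` pipeline has been downloaded. `load_pipeline` is an illustrative wrapper, not a spaCy function.

```python
def load_pipeline(model_name="en_core_web_trf"):
    """Return (nlp, used_gpu); nlp is None if spaCy or the model is missing."""
    try:
        import spacy
    except ImportError:
        return None, False
    # prefer_gpu() must be called before the pipeline is loaded. It returns
    # True if a GPU was activated and False if spaCy stays on the CPU.
    used_gpu = spacy.prefer_gpu()
    try:
        nlp = spacy.load(model_name)
    except OSError:
        # spaCy raises OSError when the named pipeline is not installed.
        return None, used_gpu
    return nlp, used_gpu


# Example usage (left commented out because it needs the model installed):
# nlp, used_gpu = load_pipeline()
# if nlp is not None:
#     doc = nlp("Thinc PyTorch layers can run on Apple Silicon GPUs.")
#     print(used_gpu, [(ent.text, ent.label_) for ent in doc.ents])
```

`spacy.prefer_gpu()` falls back to the CPU silently, so checking its return value is a quick way to see whether the GPU is actually in use. `spacy.require_gpu()` raises an error instead, which may be preferable when the GPU is expected.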

Topics

Metal Performance Shaders
Thinc PyTorch
spaCy
Transformer Models
Apple Silicon
GPU Acceleration
Inference Optimization

Best for: Machine Learning Engineer, NLP Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai.