Introducing Storage Buckets
Summary
Hugging Face has introduced Storage Buckets, a new type of repository on the Hugging Face Hub designed as a drop-in replacement for Amazon S3 buckets. Pricing is a transparent per-terabyte rate of $8 to $12 per month, described as more affordable and predictable than S3. That price includes a built-in CDN and deduplication with Xet, and egress carries no extra fee up to an 8:1 ratio to total storage. Unlike the Hub's traditional Git-based repositories, Storage Buckets have no Git constraints and are optimized for machine learning workflows: when a large binary file such as a model checkpoint or dataset changes, only the changed data chunks are uploaded. Buckets are accessible through the Hugging Face CLI, a graphical user interface, and Python, which makes them suitable for AI agents that need persistent storage for memory or training runs.
Key takeaway
For AI Architects and ML Engineers managing large binary assets such as model checkpoints or datasets, Hugging Face Storage Buckets offer a compelling alternative to S3. Predictable per-terabyte pricing and the included CDN can lower storage costs, and Xet-based deduplication can speed up transfers, especially when large files are updated frequently. Consider migrating existing S3 workflows to Hugging Face Buckets to streamline your ML storage infrastructure.
Key insights
Hugging Face Storage Buckets provide affordable, deduplicated, and CDN-enabled storage for AI workflows.
Principles
- Deduplication with Xet uploads only the changed chunks of a binary file.
- Built-in CDN improves global file access.
- Predictable pricing enhances cost management.
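To make the pricing principle concrete, here is a rough back-of-the-envelope sketch using only the figures quoted in the summary ($8 to $12 per terabyte per month, egress included up to an 8:1 ratio). The function names are hypothetical, and the reading of the ratio as "included egress is at most eight times the stored volume" is an assumption, not a confirmed billing rule. Which rate applies to a given account is also not stated here, so both ends of the range are shown.

```python
# Rough cost estimate from the figures quoted in the summary.
# Assumption: the 8:1 ratio means up to 8 TB of egress is included
# per 1 TB stored. The actual billing terms may differ.

LOW_RATE_USD_PER_TB = 8.0    # lower end of the quoted monthly range
HIGH_RATE_USD_PER_TB = 12.0  # upper end of the quoted monthly range
EGRESS_RATIO = 8.0           # included egress per unit of storage (assumed reading)


def monthly_cost_range(stored_tb: float) -> tuple[float, float]:
    """Return the (low, high) monthly storage cost in USD."""
    return stored_tb * LOW_RATE_USD_PER_TB, stored_tb * HIGH_RATE_USD_PER_TB


def included_egress_tb(stored_tb: float) -> float:
    """Return the egress volume covered at no extra fee, under the assumed reading."""
    return stored_tb * EGRESS_RATIO


low, high = monthly_cost_range(50)   # 50 TB of checkpoints and datasets
egress = included_egress_tb(50)
print(f"50 TB stored: ${low:.0f} to ${high:.0f} per month, {egress:.0f} TB egress included")
```

Under these assumptions, 50 TB of stored artifacts would cost between $400 and $600 per month.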
Method
Create buckets via the CLI, the GUI, or Python. Sync files using `hf sync` to take advantage of Xet-based deduplication; a bucket can be treated like a local file system path, so it integrates with existing scripts.
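The benefit of chunk-level deduplication during a sync can be illustrated with a small self-contained sketch. This is not the Hub's implementation: it uses fixed-size chunks and SHA-256 hashes for simplicity, an in-memory dictionary as a stand-in for the remote store, and hypothetical function names. It only shows the general idea that a second upload of a mostly unchanged file transfers just the chunks that differ.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def upload(data: bytes, store: dict[str, bytes], chunk_size: int = CHUNK_SIZE) -> int:
    """Store only chunks the store has not seen; return the bytes transferred."""
    sent = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()  # content address of the chunk
        if key not in store:                     # skip chunks already present
            store[key] = chunk
            sent += len(chunk)
    return sent


store: dict[str, bytes] = {}          # stand-in for remote chunk storage
checkpoint_v1 = b"AAAABBBBCCCCDDDD"   # first version of a "checkpoint"
checkpoint_v2 = b"AAAABBBBXXXXDDDD"   # same file with one chunk modified

first = upload(checkpoint_v1, store)
second = upload(checkpoint_v2, store)
print(first, second)  # 16 4
```

The first upload transfers all 16 bytes; the second transfers only the 4 bytes of the modified chunk. Fixed-size chunking breaks down when bytes are inserted or removed mid-file, which is why production systems generally use content-defined chunk boundaries instead.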
In practice
- Store ML checkpoints and datasets.
- Provide persistent memory for AI agents.
- Replace Amazon S3 for ML artifacts.
Topics
- Hugging Face Storage Buckets
- Deduplication with Xet
- AI Workflows
- Model Checkpoints
- Persistent Storage
Best for: AI Architect, CTO, VP of Engineering/Data, Machine Learning Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by HuggingFace.