I Found 487 GB Of Data Nobody Had Looked At In Three Years. Deleting It Felt Like Archaeology.
Summary
A storage audit of an analytics database turned up 487 GB of data that nothing was using. The audit began after the CFO asked why the company was paying for 2.1 TB of storage when only 800 GB was reported as "active." The largest unreferenced table, `user_activity_log`, accounted for nearly 25% of total storage. It tracked user clicks for a discontinued product analytics experiment, had not been updated since February 14, 2023, and was not queried by any active process. After its inactivity was verified and backups were created, the table was deleted, cutting the monthly hosting bill by $340.
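The figures in the summary can be sanity-checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 TB = 1000 GB) and, for the per-GB rate, that the entire $340 saving comes from the 487 GB that was removed, neither of which the article states.

```python
# Figures quoted in the summary. Decimal units (1 TB = 1000 GB) are an assumption.
TOTAL_GB = 2.1 * 1000
DORMANT_GB = 487
MONTHLY_SAVINGS_USD = 340

# Fraction of total storage held by the dormant table.
share_of_total = DORMANT_GB / TOTAL_GB

# Savings over twelve months if the bill otherwise stays flat.
annual_savings = MONTHLY_SAVINGS_USD * 12

# Implied price per GB per month, assuming all savings come from the deleted data.
implied_rate = MONTHLY_SAVINGS_USD / DORMANT_GB

print(f"share of total storage: {share_of_total:.1%}")
print(f"annualized savings: ${annual_savings:,}")
print(f"implied rate: ${implied_rate:.2f} per GB-month")
```

Under those assumptions the table held about 23% of total storage, which matches the "nearly 25%" in the summary, and the saving annualizes to $4,080.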
Key takeaway
For data engineers managing cloud database costs, regularly auditing storage for dormant data pays off directly. Identifying and safely removing unused tables, like the 487 GB `user_activity_log` in this case, cut the monthly bill by $340. Set up a routine that queries table sizes and last access dates so dormant data does not keep accruing charges.
Key insights
Regular storage audits can identify dormant data, reducing costs and improving database efficiency.
Principles
- Unused data incurs ongoing storage costs.
- Data lifecycle management prevents data sprawl.
Method
Audit storage by querying table sizes, identify unreferenced or inactive tables, verify each candidate's last update and query activity, then back it up and delete it.
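One way to sketch the filtering step of this method is shown below. It is an illustration, not the article's code: the `TableStats` record, the `dormant_candidates` function, the one-year idle threshold, the audit date, and every row except the `user_activity_log` size and last-update date are hypothetical. It assumes the size, last-write, and query-count figures have already been collected from the database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TableStats:
    name: str
    size_gb: float
    last_write: Optional[datetime]  # None when the last write time is unknown
    queries_last_90d: int           # hypothetical metric from query logs


def dormant_candidates(tables, now, max_idle=timedelta(days=365)):
    """Return tables with no recent writes and no recent queries, largest first."""
    cutoff = now - max_idle
    idle = [
        t for t in tables
        # Tables with an unknown last write are skipped, not assumed dormant.
        if t.last_write is not None
        and t.last_write < cutoff
        and t.queries_last_90d == 0
    ]
    return sorted(idle, key=lambda t: t.size_gb, reverse=True)


inventory = [
    # Size and date from the summary; the query count is illustrative.
    TableStats("user_activity_log", 487.0, datetime(2023, 2, 14), 0),
    # The rows below are made up for the example.
    TableStats("orders", 310.0, datetime(2026, 1, 5), 1842),
    TableStats("old_export_tmp", 12.5, datetime(2022, 6, 1), 0),
]

# The audit date is an assumption chosen for the example.
candidates = dormant_candidates(inventory, now=datetime(2026, 3, 1))
for t in candidates:
    print(f"{t.name}: {t.size_gb} GB, last write {t.last_write:%Y-%m-%d}")
```

The function only produces a review list. Backing up and deleting stay as separate, deliberate steps, in line with the verification the method calls for.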
In practice
- Query `information_schema.tables` for disk usage.
- Check the latest `created_at` value to find when a row was last inserted.
- Verify that no active queries reference the table before deleting it.
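The `created_at` check above can be sketched as follows. An in-memory SQLite database stands in for the real analytics database, which the article does not name. The helper names `last_insert` and `is_stale`, the one-year threshold, the audit date, and the sample rows (including their times of day) are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_activity_log (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO user_activity_log (created_at) VALUES (?)",
    [("2023-01-30 09:15:00",), ("2023-02-14 17:42:00",)],  # made-up rows
)


def last_insert(conn, table, column="created_at"):
    """Return the newest timestamp in `column`, or None for an empty table."""
    # Identifiers cannot be bound as parameters, so pass only trusted names.
    row = conn.execute(f'SELECT MAX("{column}") FROM "{table}"').fetchone()
    return datetime.fromisoformat(row[0]) if row[0] else None


def is_stale(conn, table, now, max_idle=timedelta(days=365)):
    """True when the newest row is older than `max_idle`; False if the table is empty."""
    newest = last_insert(conn, table)
    return newest is not None and now - newest > max_idle


newest = last_insert(conn, "user_activity_log")
stale = is_stale(conn, "user_activity_log", now=datetime(2026, 3, 1))
print(newest, stale)
```

A stale `created_at` shows only that inserts have stopped. It says nothing about reads, so the check for active queries remains a separate step.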
Topics
- Storage Audit
- Database Management
- Data Lifecycle Management
- Cost Optimization
- Unused Data
Best for: Data Engineer, DevOps Engineer, IT Professional
Editorial summary, takeaway, and curation by AIssential. Original article published by Data Science on Medium.