I Found 487 GB Of Data Nobody Had Looked At In Three Years. Deleting It Felt Like Archaeology.

· Source: Data Science on Medium · Field: Technology & Digital — Data Science & Analytics, Cloud Computing & IT Infrastructure · Depth: Intermediate, quick

Summary

A storage audit of an analytics database found 487 GB of data that nothing had touched in three years. The audit began after the CFO asked why the company was paying for 2.1 TB of storage when only 800 GB was reported as "active." The largest unreferenced table, "user_activity_log," accounted for nearly 25% of total storage. It held user clicks from a discontinued product analytics experiment, had not been updated since February 14, 2023, and was not queried by any active process. After verifying that the table was inactive and creating backups, the author deleted it, which cut the monthly hosting bill by $340.
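The figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 TB = 1000 GB) and that the $340 reduction holds steady every month; neither assumption is stated in the original article.

```python
# Figures quoted in the summary (decimal units assumed: 1 TB = 1000 GB).
total_gb = 2100            # 2.1 TB of billed storage
active_gb = 800            # storage reported as "active"
dormant_table_gb = 487     # size of user_activity_log
monthly_saving_usd = 340   # reduction in the monthly hosting bill

# Storage that was billed but not reported as active.
inactive_gb = total_gb - active_gb

# The dormant table's share of all storage and of the inactive portion.
share_of_total = dormant_table_gb / total_gb
share_of_inactive = dormant_table_gb / inactive_gb

# Annualized saving, assuming the monthly reduction stays constant.
annual_saving_usd = monthly_saving_usd * 12

print(f"inactive storage:   {inactive_gb} GB")
print(f"share of total:     {share_of_total:.1%}")
print(f"share of inactive:  {share_of_inactive:.1%}")
print(f"annualized saving:  ${annual_saving_usd}")
```

Under these assumptions the table is about 23% of total storage, roughly in line with the "nearly 25%" quoted, and a little over a third of the 1,300 GB that was not reported as active.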

Key takeaway

For data engineers managing cloud database costs, regular storage audits for dormant data pay for themselves. Identifying and safely removing unused tables, like the 487 GB "user_activity_log" in this case, produces direct savings, here a $340 reduction in the monthly bill. Set up a routine that queries table sizes and last access dates so that dormant tables are caught before they accumulate cost.
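A routine of this kind might look like the sketch below. The summary does not name the database engine, so the code assumes the per-table statistics (size, last update, recent query count) have already been collected from the engine's catalog and query logs. The `TableStats` type, the `find_dormant` function, its thresholds, the audit date, and every inventory row except the size and date of `user_activity_log` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TableStats:
    name: str
    size_gb: float
    last_updated: Optional[date]  # None when the engine has no record
    queries_last_90d: int         # reads seen in query logs (assumed available)


def find_dormant(
    tables: List[TableStats],
    today: date,
    min_idle_days: int = 365,
    min_size_gb: float = 1.0,
) -> List[TableStats]:
    """Return large tables that are idle and unqueried, biggest first."""
    dormant = []
    for t in tables:
        if t.last_updated is None:
            continue  # unknown age: leave for manual review, never auto-flag
        idle_days = (today - t.last_updated).days
        if (
            idle_days >= min_idle_days
            and t.queries_last_90d == 0
            and t.size_gb >= min_size_gb
        ):
            dormant.append(t)
    return sorted(dormant, key=lambda t: t.size_gb, reverse=True)


# Illustrative inventory: only user_activity_log's size and last-update
# date come from the summary; the other rows are made up.
inventory = [
    TableStats("user_activity_log", 487.0, date(2023, 2, 14), 0),
    TableStats("orders", 120.0, date(2026, 1, 10), 5400),  # hypothetical, active
    TableStats("tmp_export", 0.2, date(2022, 6, 1), 0),    # hypothetical, too small
]

candidates = find_dormant(inventory, today=date(2026, 2, 14))
for t in candidates:
    print(f"{t.name}: {t.size_gb} GB, last updated {t.last_updated}")
```

The function only nominates candidates. Tables with no recorded update date are skipped rather than flagged, so a person still has to confirm each result before anything is deleted.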

Key insights

Regular storage audits surface dormant data that still incurs storage charges. In this case a single unused table accounted for 487 GB and $340 of the monthly bill.

Principles

Method

Audit storage by querying table sizes, identify unreferenced or inactive tables, verify each table's last update and query activity, then back it up before deleting it.
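The last step, deleting only after a verified backup exists, can be sketched as follows. The example uses Python's built-in `sqlite3` as a stand-in, since the summary does not say which database was involved. The `backup_then_drop` helper, the CSV backup format, and the column names are assumptions made for illustration, not the author's procedure.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def backup_then_drop(conn: sqlite3.Connection, table: str, backup_path: Path) -> int:
    """Export `table` to CSV, verify the row count, and only then drop it."""
    # Identifiers cannot be bound as SQL parameters, so quote the name manually.
    quoted = '"' + table.replace('"', '""') + '"'

    cur = conn.execute(f"SELECT * FROM {quoted}")
    header = [col[0] for col in cur.description]
    written = 0
    with open(backup_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cur:
            writer.writerow(row)
            written += 1
    cur.close()

    # Refuse to drop unless the backup holds every row.
    expected = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    if written != expected:
        raise RuntimeError(f"backup has {written} rows, table has {expected}")

    conn.execute(f"DROP TABLE {quoted}")
    conn.commit()
    return written


# Demo on an in-memory database; the schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_activity_log (user_id INTEGER, clicked_at TEXT)")
conn.executemany(
    "INSERT INTO user_activity_log VALUES (?, ?)",
    [(1, "2023-02-13"), (2, "2023-02-14")],
)
conn.commit()

backup_file = Path(tempfile.mkdtemp()) / "user_activity_log.csv"
rows_saved = backup_then_drop(conn, "user_activity_log", backup_file)
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(rows_saved, remaining)
```

On a production system the backup would more likely be a native dump or a snapshot than a CSV file, but the ordering is the point: the drop runs only after the backup has been written and its row count checked.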

Best for: Data Engineer, DevOps Engineer, IT Professional

Editorial summary, takeaway, and curation by AIssential. Original article published by Data Science on Medium.