Article: Time-Series Storage: Design Choices That Shape Cost and Performance
Summary
Storage design for time-series data often affects cost and query performance more than the choice of database itself. This article, published on May 12, 2026, by Nirmesh Khandelwal, a Senior Software Engineer at Amazon Aurora – AWS, explores fundamental design choices using tools like PostgreSQL and Apache Parquet. Key strategies include normalizing series identity into a separate metadata table, which reduced storage by approximately 42% in a PostgreSQL 16 experiment with 2.8M rows. The article also covers managing high-cardinality fields, designing for schema evolution with JSONB, and using columnar storage formats like Parquet on object storage (e.g., S3) for substantial compression gains (up to 434x). Additionally, it covers wide vs. narrow schemas for multi-metric rows, two-dimensional partitioning (by time and by series identity) to mitigate write hotspots, and downsampling strategies to manage data resolution and retention over time.
Key takeaway
For data engineers designing or optimizing time-series pipelines: normalize series identity and partition along two dimensions (time and series identity) to reduce storage costs and improve query performance. The choice of schema (wide vs. narrow) and the use of columnar formats like Parquet on object storage further determine efficiency and scalability, helping avoid surprise bills and slow dashboards. Prioritize downsampling and caching for dashboard queries.
Key insights
Efficient time-series storage depends on schema design, partitioning, and data lifecycle management.
Principles
- Normalize series identity for storage efficiency.
- Avoid high-cardinality fields in series identity.
- Partition data by time and series identity.
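The first principle can be illustrated with a minimal in-memory sketch. It is not the article's PostgreSQL schema: the `SeriesRegistry` class, the label names, and the sample values are all hypothetical. The idea is that each distinct label set is stored once and given a small integer ID, so the high-volume point rows carry only that ID instead of repeating every label.

```python
from dataclasses import dataclass, field


@dataclass
class SeriesRegistry:
    """Hypothetical stand-in for a series metadata table:
    maps each distinct label set to a small integer series ID."""
    _ids: dict = field(default_factory=dict)

    def series_id(self, labels: dict) -> int:
        # Sort the items so that label order does not create duplicate series.
        key = tuple(sorted(labels.items()))
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


registry = SeriesRegistry()
points = []  # narrow fact rows: (series_id, timestamp, value)

# Assumed sample input: labels repeated on every incoming point.
raw = [
    ({"host": "web-1", "metric": "cpu"}, 1700000000, 0.42),
    ({"metric": "cpu", "host": "web-1"}, 1700000060, 0.40),
    ({"host": "web-2", "metric": "cpu"}, 1700000000, 0.91),
]
for labels, ts, value in raw:
    points.append((registry.series_id(labels), ts, value))
```

Here three incoming points collapse to two series entries. The same shape also shows why high-cardinality fields are a problem in series identity: a label such as a request ID would create a new registry entry for nearly every point.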
Method
Store time-series data by normalizing dimensions, using flexible JSON for schema evolution, employing columnar formats like Parquet, and partitioning by time and series identity, with downsampling for retention.
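As a rough illustration of partitioning by time and series identity, the sketch below derives a two-part partition key: a UTC day plus a hash bucket of the series ID. The function name, the bucket count of 8, and the choice of CRC32 are assumptions made for this example, not details from the article.

```python
import zlib
from datetime import datetime, timezone

NUM_SERIES_BUCKETS = 8  # assumed bucket count; tune to the write volume


def partition_key(series_id: int, ts: datetime,
                  buckets: int = NUM_SERIES_BUCKETS) -> tuple[str, int]:
    """Return (UTC day, series hash bucket) for one data point."""
    # Time dimension: one partition per UTC day.
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    # Series dimension: CRC32 is deterministic across processes,
    # so the same series always maps to the same bucket.
    bucket = zlib.crc32(str(series_id).encode("ascii")) % buckets
    return day, bucket


key = partition_key(42, datetime(2026, 5, 12, 23, 30, tzinfo=timezone.utc))
```

With time-only partitioning, every current write lands in the newest partition. Adding the series bucket spreads writes for the same day across several partitions, while a query for one series over a time range still touches a single bucket per day.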
In practice
- Use PostgreSQL jsonb for flexible dimension storage.
- Deploy Parquet files on S3 for cost-effective long-term retention.
- Implement a resolution ladder for downsampling older data.
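A resolution ladder can be sketched as a table that maps data age to a bucket width, plus a function that averages points into those buckets. The age thresholds and widths below are illustrative assumptions, not values from the article, and averaging is only one possible aggregate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ladder: (maximum age in seconds, bucket width in seconds).
LADDER = [
    (7 * 86400, 0),         # up to 7 days old: keep raw points
    (90 * 86400, 300),      # up to 90 days old: 5-minute averages
    (float("inf"), 3600),   # older: 1-hour averages
]


def bucket_width(age_seconds: float) -> int:
    """Pick the bucket width for data of the given age."""
    for max_age, width in LADDER:
        if age_seconds <= max_age:
            return width
    raise ValueError("ladder must end with an unbounded rung")


def downsample(points, width):
    """Average (timestamp, value) pairs into fixed-width buckets."""
    if width == 0:
        return list(points)  # raw resolution: nothing to aggregate
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % width].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


coarse = downsample([(0, 1.0), (60, 3.0), (300, 5.0)], 300)
```

Averaging alone discards spikes, so a real rollup would likely also keep the minimum, maximum, and count for each bucket.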
Topics
- Time-Series Storage Design
- Data Normalization
- High Cardinality Data
- Schema Evolution
- Columnar Storage
Best for: Data Engineer, Software Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.