Article: Time-Series Storage: Design Choices That Shape Cost and Performance

2026-05-12 · Source: InfoQ · Field: Technology & Digital — Software Development & Engineering, Cloud Computing & IT Infrastructure, Artificial Intelligence & Machine Learning · Depth: Intermediate, long

Summary

Time-series storage design significantly affects cost and query performance, often more than the choice of database itself. This article, published on May 12, 2026, by Nirmesh Khandelwal, a Senior Software Engineer at Amazon Aurora – AWS, explores fundamental design choices using tools like PostgreSQL and Apache Parquet. Key strategies include normalizing series identity into a separate metadata table, which reduced storage by approximately forty-two percent in a PostgreSQL 16 experiment with 2.8M rows. The article also covers managing high-cardinality fields, designing for schema evolution with JSONB, and using columnar storage formats like Parquet on object storage (e.g., S3) for substantial compression gains (up to 434 times). It further discusses wide vs. narrow schemas for multi-metric rows, two-dimensional partitioning (time and series identity) to mitigate write hotspots, and downsampling strategies to manage data resolution and retention over time.

Key takeaway

For data engineers designing or optimizing time-series data pipelines: normalize series identity and partition on two dimensions (time and series identity) to reduce storage costs and improve query performance. The choice of schema (wide vs. narrow) and the use of columnar formats like Parquet on object storage further determine efficiency and scalability, and help avoid surprise bills and slow dashboards. Prioritize downsampling and caching for dashboard queries.

Key insights

Optimized time-series storage relies on smart schema design, partitioning, and data lifecycle management.

Principles

Normalize series identity for storage efficiency.
Avoid high-cardinality fields in series identity.
Partition data by time and series identity.
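
As an illustration of the first principle, here is a minimal Python sketch of series-identity normalization. It is not the article's implementation: `SeriesRegistry`, `series_key`, and the sample metric and dimension names are hypothetical, and an in-memory dict stands in for the separate metadata table. The idea is that each distinct (metric, dimensions) combination is stored once and data points carry only a small integer id.

```python
import json
from dataclasses import dataclass, field


def series_key(metric: str, dimensions: dict[str, str]) -> str:
    """Canonical string for a series identity: metric name plus sorted dimensions."""
    return json.dumps([metric, sorted(dimensions.items())], separators=(",", ":"))


@dataclass
class SeriesRegistry:
    """Hypothetical in-memory stand-in for a series metadata table."""
    ids: dict[str, int] = field(default_factory=dict)

    def get_or_create(self, metric: str, dimensions: dict[str, str]) -> int:
        key = series_key(metric, dimensions)
        # Assign the next integer id the first time a series is seen.
        return self.ids.setdefault(key, len(self.ids) + 1)


registry = SeriesRegistry()
points: list[tuple[int, int, float]] = []  # (series_id, unix_ts, value)

# Sample data is made up for illustration; note the second row lists the
# same dimensions as the first in a different order.
samples = [
    ("cpu_usage", {"host": "web-1", "region": "us-east-1"}, 1_700_000_000, 0.41),
    ("cpu_usage", {"region": "us-east-1", "host": "web-1"}, 1_700_000_060, 0.44),
    ("cpu_usage", {"host": "web-2", "region": "us-east-1"}, 1_700_000_000, 0.12),
]
for metric, dims, ts, value in samples:
    points.append((registry.get_or_create(metric, dims), ts, value))
```

Because dimensions are sorted before the key is built, the first two samples resolve to the same series id, so the repeated dimension strings are stored once rather than on every row.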

Method

Store time-series data by normalizing dimensions, using flexible JSON for schema evolution, employing columnar formats like Parquet, and partitioning by time and series identity, with downsampling for retention.
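
One way to picture partitioning by time and series identity is a two-part partition key, sketched below in Python. The daily time bucket, the CRC32 hash, and the bucket count of 8 are assumptions chosen for illustration, not values from the article; `partition_key` is a hypothetical helper.

```python
import zlib
from datetime import datetime, timezone

NUM_SERIES_BUCKETS = 8  # assumed bucket count; tune per workload


def partition_key(series_id: int, unix_ts: int) -> tuple[str, int]:
    """Return (UTC day, series hash bucket) for one data point."""
    day = datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%Y-%m-%d")
    # A stable hash spreads series across buckets, so writes for the
    # current day do not all land in a single partition.
    bucket = zlib.crc32(str(series_id).encode()) % NUM_SERIES_BUCKETS
    return day, bucket
```

All points for a given series on a given day share one partition, while different series arriving at the same moment are spread across several, which is the property that helps with write hotspots.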

In practice

Use PostgreSQL jsonb for flexible dimension storage.
Deploy Parquet files on S3 for cost-effective long-term retention.
Implement a resolution ladder for downsampling older data.
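
A resolution ladder can be sketched roughly as follows in Python. The age thresholds, bucket widths, and the use of a mean as the aggregate are all assumptions for illustration; the article's actual ladder and aggregation choices may differ, and `LADDER`, `resolution_for`, and `downsample` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean

# Assumed ladder: (minimum age in seconds, bucket width in seconds).
LADDER = [
    (30 * 86400, 3600),  # older than 30 days: hourly averages
    (7 * 86400, 300),    # older than 7 days: 5-minute averages
    (0, 0),              # recent data: keep raw points
]


def resolution_for(age_seconds: int) -> int:
    """Pick the bucket width for data of a given age (0 means raw)."""
    for min_age, width in LADDER:
        if age_seconds >= min_age:
            return width
    return 0


def downsample(points: list[tuple[int, float]], width: int) -> list[tuple[int, float]]:
    """Average (unix_ts, value) points into fixed-width time buckets."""
    if width == 0:
        return sorted(points)
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % width].append(value)  # align to bucket start
    return [(start, mean(values)) for start, values in sorted(buckets.items())]
```

For example, `downsample(raw_points, resolution_for(age))` would leave recent data untouched and collapse older data into coarser averages. A mean alone hides spikes, so real pipelines often keep min, max, and count per bucket as well.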

Topics

Time-Series Storage Design
Data Normalization
High Cardinality Data
Schema Evolution
Columnar Storage

Code references

prometheus/prometheus

Best for: Data Engineer, Software Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.