Sitar-agent: Building a reliable dynamic configuration sidecar at scale

· Source: The Airbnb Tech Blog - Medium · Field: Technology & Digital — Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Intermediate, long

Summary

Airbnb's sitar-agent is a lightweight Kubernetes sidecar that delivers dynamic configurations reliably and quickly to thousands of service instances without redeployment. Rewritten in Java in 2024, the agent synchronizes configurations from a backend. On pod startup it preloads compressed snapshots from AWS S3, which speeds up restarts and keeps the pod resilient when the Sitar Service is unavailable; it then polls the Sitar Service every 10 seconds for updates. Key design decisions included keeping the sidecar model for isolation and multi-language support, retaining a pull-based update model with server-side optimizations such as a 10-second TTL cache, and replacing the legacy Sparkey-backed local datastore with SQLite. Airbnb chose SQLite over RocksDB for SQLite's stronger multi-language support, native write-ahead logging (WAL) for concurrent access, and simpler operational footprint, despite RocksDB's better raw performance. The migration to SQLite was carried out safely using shadow reads and a gradual rollout gated by a feature flag.
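The shadow-read migration mentioned in the summary can be illustrated with a minimal sketch. This is not Airbnb's implementation (the agent itself is written in Java); Python is used here only because its standard library ships SQLite. The `ShadowReader` class, the `config` table layout, the in-memory dict standing in for the legacy Sparkey store, and the percentage-based flag keyed on a hypothetical `instance_id` are all assumptions made for illustration.

```python
import hashlib
import sqlite3
from typing import Dict, List, Optional


class ShadowReader:
    """Reads every key from both stores, records mismatches, and serves
    from the new store only when the rollout flag selects this instance.

    All names here are hypothetical; the source only states that shadow
    reads and a feature-flag-gated gradual rollout were used.
    """

    def __init__(
        self,
        legacy: Dict[str, str],       # stand-in for the legacy datastore
        conn: sqlite3.Connection,     # new SQLite-backed datastore
        rollout_percent: int,         # 0..100, assumed flag value
        instance_id: str,             # assumed stable per-pod identifier
    ) -> None:
        self.legacy = legacy
        self.conn = conn
        self.rollout_percent = rollout_percent
        self.instance_id = instance_id
        self.mismatches: List[str] = []

    def _serves_from_new(self) -> bool:
        # Hash the instance id into a stable bucket in [0, 100) so the
        # same pod stays on the same side as the percentage grows.
        digest = hashlib.sha256(self.instance_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < self.rollout_percent

    def _read_new(self, key: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT value FROM config WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def get(self, key: str) -> Optional[str]:
        old = self.legacy.get(key)
        new = self._read_new(key)
        if old != new:
            # Shadow read: note the divergence without failing the caller.
            self.mismatches.append(key)
        return new if self._serves_from_new() else old
```

With `rollout_percent` at 0 every read is still served from the legacy store while mismatches accumulate for inspection; raising it gradually shifts instances to SQLite once the mismatch list stays empty.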

Key takeaway

For DevOps engineers designing or scaling dynamic configuration systems: prioritize a sidecar architecture for isolation and multi-language support, even if it incurs higher resource costs. Implement snapshot preloading from object storage such as S3 to ensure rapid restarts and resilience. When selecting a local datastore, favor operational simplicity and broad language support (as SQLite offers) over raw performance if your workload fits its envelope; the result is safer, more maintainable deployments.

Key insights

Airbnb's sitar-agent reliably delivers dynamic configurations at scale using a Kubernetes sidecar and optimized pull model.

Method

The sitar-agent sidecar preloads S3 snapshots, then polls the Sitar Service every 10 seconds for updates, storing them locally in SQLite for application access.
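The preload-then-poll flow described above can be sketched as follows. This is an illustrative approximation, not the agent's actual code: the snapshot format (gzip-compressed JSON of key/value pairs), the `config` table, and the `fetch_updates` callable standing in for a Sitar Service request are all assumptions, and the S3 download is left out so the example needs only the Python standard library. The 10-second interval and WAL mode come from the summary.

```python
import gzip
import json
import sqlite3
import time
from typing import Callable, Dict, Optional


def open_store(path: str) -> sqlite3.Connection:
    """Open the local SQLite store; WAL lets readers proceed during writes."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS config "
        "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def apply_configs(conn: sqlite3.Connection, configs: Dict[str, str]) -> None:
    """Upsert a batch of configs in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
            list(configs.items()),
        )


def preload_snapshot(conn: sqlite3.Connection, compressed: bytes) -> None:
    """Load a compressed snapshot (assumed here to be gzip-compressed JSON)."""
    configs = json.loads(gzip.decompress(compressed).decode("utf-8"))
    apply_configs(conn, configs)


def poll_once(
    conn: sqlite3.Connection, fetch_updates: Callable[[], Dict[str, str]]
) -> bool:
    """Fetch and apply updates; on failure keep serving the last known values."""
    try:
        updates = fetch_updates()
    except Exception:
        return False
    apply_configs(conn, updates)
    return True


def run_agent(
    conn: sqlite3.Connection,
    snapshot_bytes: bytes,
    fetch_updates: Callable[[], Dict[str, str]],
    interval_s: float = 10.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Preload the snapshot first, then poll on a fixed interval."""
    preload_snapshot(conn, snapshot_bytes)
    polls = 0
    while max_polls is None or polls < max_polls:
        poll_once(conn, fetch_updates)
        polls += 1
        sleep(interval_s)


def read_config(conn: sqlite3.Connection, key: str) -> Optional[str]:
    """What an application-side reader would call against the local store."""
    row = conn.execute(
        "SELECT value FROM config WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None
```

Because the snapshot is applied before the first poll, the store already holds usable values even if every subsequent call to the backend fails, which is the resilience property the summary attributes to preloading.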

Best for: MLOps Engineer, Software Engineer, DevOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by The Airbnb Tech Blog - Medium.