Skipper: Building Airbnb’s embedded workflow engine

Source: The Airbnb Tech Blog - Medium · Field: Technology & Digital (Software Development & Engineering, Cloud Computing & IT Infrastructure) · Depth: Advanced, long

Summary

Airbnb developed Skipper, an embedded workflow engine designed to provide durable execution for critical "Tier 0" services without the operational overhead of external orchestration clusters or cloud-managed solutions. It addresses the problem of fragmented domain logic and bespoke retry systems by offering a shared library that integrates directly into existing services. Skipper reuses current infrastructure, persisting state in databases such as MySQL or Airbnb's Unified Data Store, and offers a simple Java/Kotlin programming model with annotation-based contracts. It drives workflows to completion through replay: each action's result is checkpointed so that it survives crashes and restarts, and compensation methods handle failures. The engine has been in production for over a year, powering more than 15 use cases across insurance, payments, and media processing, and has scaled to 10,000 workflows per second on Amazon DynamoDB.

Key takeaway

For software engineers building durable distributed systems: if minimizing external dependencies and operational overhead is paramount, consider an embedded workflow engine. In this model your service manages its own workflow processing, using existing databases and reducing single points of failure. Development stays in a familiar programming model, but workflows must be deterministic and actions idempotent, because replay can run an action more than once (at-least-once execution).
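The idempotency requirement can be illustrated with a minimal Java sketch. It is not Skipper's API: `PaymentGateway`, `chargeAction`, and the key format are hypothetical names chosen for illustration. The sketch assumes the downstream service deduplicates on an idempotency key, so an action that is re-run after a lost checkpoint does not charge twice.

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotentActionSketch {
    /** Hypothetical downstream service that deduplicates requests by idempotency key. */
    static final class PaymentGateway {
        private final Map<String, String> chargesByKey = new HashMap<>();
        int chargesCreated = 0;

        String charge(String idempotencyKey, long amountCents) {
            // computeIfAbsent creates the charge only the first time a key is seen.
            return chargesByKey.computeIfAbsent(idempotencyKey, k -> {
                chargesCreated++;
                return "charge-" + chargesCreated + "-" + amountCents;
            });
        }
    }

    /** Action body: derives a stable key from identifiers that do not change across replays. */
    static String chargeAction(PaymentGateway gateway, String workflowId, long amountCents) {
        String key = workflowId + ":charge"; // assumed key format, for illustration only
        return gateway.charge(key, amountCents);
    }

    public static void main(String[] args) {
        PaymentGateway gateway = new PaymentGateway();
        String first = chargeAction(gateway, "wf-42", 12_500L);
        // Simulate the engine re-running the action because its checkpoint write was lost.
        String second = chargeAction(gateway, "wf-42", 12_500L);
        System.out.println(first.equals(second) + " " + gateway.chargesCreated); // prints "true 1"
    }
}
```

The key is built only from values that are identical on every replay; a key derived from a timestamp or a random value would defeat the deduplication.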

Key insights

Embedded workflow engines offer durable execution with minimal overhead by utilizing existing service infrastructure.

Method

Skipper defines Workflows for orchestration logic and Actions for individual operations. Actions are checkpointed, and durability is achieved via replay, where previously executed actions return checkpointed results instantly. Compensation methods undo effects of failed actions.
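The replay and compensation behavior described above can be sketched roughly as follows. This is an illustrative toy, not Skipper's implementation or API: `CheckpointStore`, `WorkflowContext`, `action`, `compensate`, and `bookingWorkflow` are hypothetical names, the store is an in-memory map standing in for a database table, and the sketch assumes the common saga convention of undoing completed actions in reverse order when the workflow gives up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ReplaySketch {
    /** Hypothetical stand-in for the checkpoint table; a real engine would persist this. */
    static final class CheckpointStore {
        private final Map<String, String> results = new HashMap<>();
        boolean has(String key) { return results.containsKey(key); }
        String get(String key) { return results.get(key); }
        void put(String key, String value) { results.put(key, value); }
    }

    /** One execution attempt; a replay creates a fresh context over the same store. */
    static final class WorkflowContext {
        private final String workflowId;
        private final CheckpointStore store;
        private final Deque<Runnable> compensations = new ArrayDeque<>();
        private int nextStep = 0;
        int executed = 0; // actions whose body actually ran in this attempt

        WorkflowContext(String workflowId, CheckpointStore store) {
            this.workflowId = workflowId;
            this.store = store;
        }

        /** Runs the action body once, or returns its checkpointed result on replay. */
        String action(String name, Supplier<String> body, Runnable compensation) {
            // The key depends on call order, which is why workflow code must be deterministic.
            String key = workflowId + "/" + (nextStep++) + "/" + name;
            if (!store.has(key)) {
                String result = body.get(); // a crash before put() means the body runs again
                executed++;
                store.put(key, result);
            }
            compensations.push(compensation);
            return store.get(key);
        }

        /** Undoes the actions completed so far, most recent first. */
        void compensate() {
            while (!compensations.isEmpty()) {
                compensations.pop().run();
            }
        }
    }

    static final StringBuilder log = new StringBuilder();

    static String bookingWorkflow(WorkflowContext ctx, boolean crash, boolean reject) {
        String reservation = ctx.action("reserve", () -> "reservation-1", () -> log.append("release;"));
        String charge = ctx.action("charge", () -> "charge-for-" + reservation, () -> log.append("refund;"));
        if (crash) throw new IllegalStateException("simulated process crash");
        if (reject) {
            ctx.compensate();
            return "rejected";
        }
        return ctx.action("confirm", () -> "confirmed:" + charge, () -> log.append("unconfirm;"));
    }

    public static void main(String[] args) {
        CheckpointStore store = new CheckpointStore();
        WorkflowContext first = new WorkflowContext("wf-1", store);
        try {
            bookingWorkflow(first, true, false);
        } catch (IllegalStateException e) {
            System.out.println("attempt 1 crashed after " + first.executed + " actions");
        }
        // Replay: "reserve" and "charge" return checkpointed results; only "confirm" runs.
        WorkflowContext replay = new WorkflowContext("wf-1", store);
        System.out.println(bookingWorkflow(replay, false, false) + ", actions run on replay: " + replay.executed);

        WorkflowContext rejected = new WorkflowContext("wf-2", store);
        System.out.println(bookingWorkflow(rejected, false, true) + " -> " + log);
    }
}
```

In this sketch the second attempt re-enters the workflow from the top, yet executes only the one action that had no checkpoint. The window between running an action body and writing its checkpoint is what makes execution at-least-once rather than exactly-once.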

Best for: Software Engineer, DevOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by The Airbnb Tech Blog - Medium.