My team made seven architecture decisions in three years. Five of them were wrong.
Summary
Over three years, a backend team found that five of its seven architecture decisions were "wrong," leading to significant operational costs and migrations. The author, an engineer on a fast-growing 12-engineer payment platform team, details specific missteps such as choosing Kafka for a job queue, splitting a monolith into seven services, adopting DynamoDB for transactional data, and building a custom caching layer. In each case, the team selected a more complex, "future-proof" technology to solve an imagined problem rather than the actual, simpler problem at hand. The two correct decisions, using PostgreSQL and a single regional deployment, came from explicitly defining constraints. From this the author developed a four-step framework: define the current problem with numbers, assess reversal cost, calculate 18-month carrying cost, and identify the simplest solution. The emphasis is on identifying constraints rather than on technology preference.
Key takeaway
For software engineers making critical architecture choices: define explicit, current constraints with numbers before discussing technology. Focus on the simplest solution that addresses the actual problem, and weigh reversal cost and 18-month operational overhead. Avoid complex, "future-proof" options chosen for imagined scale, as these often lead to significant, unbudgeted operational debt and migrations. This keeps decisions aligned with immediate needs and reduces the risk of costly reversals.
Key insights
Architecture decisions should prioritize current, explicit constraints over imagined future needs to avoid costly operational overhead.
Principles
- Simpler options are often correct for the relevant time horizon.
- High-reversal-cost decisions demand conservative defaults.
- Operational cost compounds against actual runway.
Method
Before any architecture decision, define the current problem with numbers, assess reversal cost, calculate 18-month carrying cost, and identify the simplest solution.
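The four steps above can be sketched as a small comparison script. This is only an illustration under stated assumptions: the `Option` fields, the hourly rate, and every number below are hypothetical and do not come from the original article, and "simplest" is approximated here as the lowest 18-month carrying cost among options that meet the measured need, with reversal cost as the tie-breaker.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One candidate solution. All fields are estimates the team supplies."""
    name: str
    meets_current_need: bool     # step 1: satisfies the problem as measured today
    reversal_cost_hours: float   # step 2: engineer-hours to migrate away later
    monthly_ops_hours: float     # step 3 input: engineer-hours per month to run it
    monthly_infra_cost: float    # step 3 input: infrastructure spend per month


def carrying_cost(option: Option, hourly_rate: float, months: int = 18) -> float:
    """Step 3: total cost of running the option over the horizon."""
    monthly = option.monthly_ops_hours * hourly_rate + option.monthly_infra_cost
    return months * monthly


def simplest_viable(options: list[Option], hourly_rate: float, months: int = 18) -> Option:
    """Step 4 (assumed proxy): cheapest-to-carry option that meets today's need."""
    viable = [o for o in options if o.meets_current_need]
    if not viable:
        raise ValueError("no option meets the current, measured need")
    return min(
        viable,
        key=lambda o: (carrying_cost(o, hourly_rate, months), o.reversal_cost_hours),
    )


# Hypothetical numbers, for illustration only.
HOURLY_RATE = 100.0
db_queue = Option("database-backed job queue", True, 40, 4, 50)
kafka = Option("Kafka cluster", True, 400, 30, 600)
choice = simplest_viable([db_queue, kafka], HOURLY_RATE)
```

With these made-up inputs the database-backed queue carries 8,100 over 18 months against 64,800 for the Kafka cluster, so `choice` is the queue. The point of the sketch is the order of operations: the numbers are written down before any technology is argued for.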
In practice
- Default to PostgreSQL for relational workloads until measured load shows it is insufficient.
- Implement Redis with TTL for most caching needs.
- Start with a monolith until team size exceeds 25 engineers.
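As a rough illustration of the "Redis with TTL" item, the following is a minimal cache-aside sketch, not the author's implementation. The helper `get_or_load` and the `InMemoryTTLStore` stand-in are hypothetical names introduced here. The stand-in mimics only the two redis-py calls the helper relies on, `get(name)` and `set(name, value, ex=seconds)`, so the example runs without a server; a real `redis.Redis` client could be passed instead, keeping in mind that it returns bytes unless created with `decode_responses=True`.

```python
import time
from typing import Callable, Optional


class InMemoryTTLStore:
    """Hypothetical stand-in for the subset of a Redis client used below."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: dict[str, tuple[str, float]] = {}
        self._clock = clock

    def get(self, name: str) -> Optional[str]:
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry has outlived its TTL: drop it and report a miss.
            del self._data[name]
            return None
        return value

    def set(self, name: str, value: str, ex: float) -> None:
        # Store the value together with its absolute expiry time.
        self._data[name] = (value, self._clock() + ex)


def get_or_load(client, key: str, ttl_seconds: int, loader: Callable[[], str]) -> str:
    """Cache-aside: return the cached value, or load, store with a TTL, and return."""
    cached = client.get(key)
    if cached is not None:
        return cached
    value = loader()
    client.set(key, value, ex=ttl_seconds)
    return value
```

Expiry is the only invalidation mechanism in this sketch, which is what keeps it simple: stale data lives at most `ttl_seconds`, and there is no custom invalidation logic to maintain.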
Topics
- Architecture Decisions
- Technical Debt
- System Design
- Operational Cost
- Microservices
- Database Selection
Best for: Software Engineer, DevOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Data Science on Medium.