TL;DR notes from two articles I read today.
Evolving systems for new products
- A common anti-pattern to avoid is prematurely optimizing systems for the future while you are still trying to establish product-market fit; it slows iteration between product experiments.
- You can expedite development by reusing existing systems and tooling.
- Evaluate the business logic performed at read time to identify what data is shared with an application, which enables better data modeling and quick, small product changes.
- Be alert to scaling challenges and set new goals accordingly. For example, nightly load tests might catch issues and prompt you to reduce a system's complexity so the backend can be developed quickly in response to new feature requests.
- For a design that lasts, you might relate data that used to live in disparate stores so that a single request suffices instead of a string of orchestrated calls from a read service. You may need to modify your ETL pipeline to consolidate data for sharing and passing downstream, which can be complex and risky. And while you migrate, you may want to use a dual-write pattern to the old and new databases (a minimal sketch follows this list), which reduces dependencies and makes it easier to triage issues.
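A minimal sketch of that dual-write idea, assuming hypothetical names (`InMemoryStore`, `DualWriteRepository`, `save`/`read`) that do not come from the post:

```python
import logging

logger = logging.getLogger("migration")

class InMemoryStore:
    """Stand-in for a real database client (hypothetical)."""
    def __init__(self):
        self._data = {}
    def put(self, key, record):
        self._data[key] = record
    def get(self, key):
        return self._data.get(key)

class DualWriteRepository:
    """Writes go to the legacy store first (still the source of truth),
    then best-effort to the new store so both stay in sync during migration."""
    def __init__(self, legacy_store, new_store):
        self.legacy_store = legacy_store
        self.new_store = new_store

    def save(self, key, record):
        self.legacy_store.put(key, record)   # authoritative write
        try:
            self.new_store.put(key, record)  # keep the new store in sync
        except Exception:
            # Don't fail the request if the secondary write fails;
            # log it so a backfill/reconciliation job can repair the gap.
            logger.exception("dual write to new store failed for key=%s", key)

    def read(self, key):
        # Reads stay on the legacy store until the migration is verified and cut over.
        return self.legacy_store.get(key)

repo = DualWriteRepository(InMemoryStore(), InMemoryStore())
repo.save("order:42", {"status": "shipped"})
print(repo.read("order:42"))  # -> {'status': 'shipped'}
```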
Full post here, 8 mins read
Improving resiliency and stability of a large-scale monolithic API service
Lessons from the API layer service used by LinkedIn:
- They chose a cross-platform design (all platforms use the same API and the same endpoints for the same features) and an all-encompassing design (one API service calls into all product verticals) to allow for high code reuse.
- Reusing data-schema definitions and endpoints made it easier for engineers to collaborate, but it caused issues at scale when extended to the deployment architecture. This was addressed by microclustering rather than breaking the monolith into microservices: the service's endpoints were partitioned without breaking up the codebase, and traffic for each partition was routed to a dedicated cluster of servers (a rough sketch follows this list). Data from monitoring systems was used to identify which verticals had enough traffic to justify a partition.
- For each vertical, the build system was modified to create an additional deployable named after the vertical, with configuration inherited from the shared service and extended. Traffic from the vertical’s endpoints was examined to estimate the number of servers needed in the new cluster.
- During deployment, capacity testing was carried out: once there was enough traffic to overload at least three servers, servers were slowly taken out of rotation to observe latencies and error rates, revealing how many queries per second each server could process without incident. This information was used for capacity planning and to fine-tune resource allocation.
- The results of microclustering include limiting downstream failures and bugs to a single vertical, and each cluster can now be tuned independently of the others for better capacity planning, monitoring, and more granular control over deployment.
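A rough, illustrative sketch of the two ideas above: routing endpoint partitions to per-vertical clusters, and sizing a cluster from the per-server QPS found during the drain test. The route table, vertical names, and headroom figure are assumptions for illustration, not details of LinkedIn's actual implementation:

```python
import math

# Hypothetical mapping of endpoint prefixes to dedicated clusters;
# anything unmatched stays on the shared (default) cluster.
ROUTE_TABLE = {
    "/feed": "feed-cluster",
    "/jobs": "jobs-cluster",
    "/messaging": "messaging-cluster",
}

def cluster_for(endpoint: str) -> str:
    """Route an endpoint to its vertical's cluster; fall back to the shared cluster."""
    for prefix, cluster in ROUTE_TABLE.items():
        if endpoint.startswith(prefix):
            return cluster
    return "default-cluster"

def servers_needed(peak_qps: float, qps_per_server: float, headroom: float = 0.5) -> int:
    """Estimate cluster size from the per-server QPS observed in the drain test,
    keeping `headroom` (e.g. 50%) spare capacity for failover and spikes."""
    usable_qps = qps_per_server * (1 - headroom)
    return math.ceil(peak_qps / usable_qps)

# Example: a vertical peaking at 12,000 QPS whose drain test showed ~800 QPS per server.
print(cluster_for("/jobs/recommended"))  # -> "jobs-cluster"
print(servers_needed(12_000, 800))       # -> 30 servers with 50% headroom
```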
Full post here, 5 mins read
Get these notes directly in your inbox every weekday by signing up for my newsletter, in.snippets().