TL;DR notes from articles I read today.
Tips for architecting fast data applications
- Understand requirements in detail: how large each message is, how many messages are expected per minute, whether the frequency may change sharply, whether records can be batch-processed, whether time relationships and ordering need to be preserved, and how ‘dirty’ the data may be and whether the dirt needs to be cleaned, reported, or ignored.
- Implement an efficient messaging backbone for reliable, secure data exchange with low latency. Apache Kafka is a good option for this.
- Leverage your SQL knowledge: the same relational algebra applies to data streams when you treat them as time-varying relations.
- Deploy cluster managers or cluster management solutions for greater scalability, agility, and resilience.
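
To make the point about SQL on streams concrete, here is a minimal sketch of a tumbling-window aggregation, which is roughly what a windowed `GROUP BY` over a time-varying relation computes. The `Event` fields, the function name, and the 60-second window are assumptions chosen for illustration, not something taken from the article or from any particular streaming engine.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    ts: float     # event time in seconds (hypothetical field)
    key: str      # grouping key, e.g. a sensor id (hypothetical field)
    value: float  # measurement to aggregate (hypothetical field)


def tumbling_window_sum(events: Iterable[Event], window_s: int = 60) -> dict:
    """Sum values per (key, window start), like a GROUP BY over fixed time windows."""
    totals = defaultdict(float)
    for e in events:
        # Assign each event to the fixed-size window that contains its timestamp.
        window_start = int(e.ts // window_s) * window_s
        totals[(e.key, window_start)] += e.value
    return dict(totals)


events = [
    Event(5, "a", 1.0),
    Event(59, "a", 2.0),
    Event(61, "a", 4.0),
    Event(30, "b", 10.0),
]
result = tumbling_window_sum(events)
```

A real streaming engine would also have to handle late and out-of-order events, which this sketch ignores.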
Full post here, 7 mins read
How to optimize the API response package
- Paginate responses into batches of content that are easy to browse because they are segmented into fixed sizes (10 per page, 20 per page, etc.), limited (say, only the first 1,000 entries are paginated), and standardized (using ‘next’, ‘last’, etc. for cursor navigation).
- Offer filtering of results according to parameters specified by the requester. This reduces the number of calls made and results returned, and limits the resources sent to the user, which yields tangible optimization and a better user experience. Keep in mind that overly complex filtering can work against optimization.
- Use ranges to restrict results based on a user-specified structure, so that only specific elements within the range are considered applicable for the request to execute. This lets you offload data processing from the client-side to the server.
- Avoid over-fetching and under-fetching, which can result from poorly formed requests or badly implemented scaling techniques.
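
The pagination, filtering, and range ideas above can be sketched together in a few lines. This is only an illustration under stated assumptions: the `paginate` helper, its parameter names, and the response keys (`data`, `page`, `next`, `last`) are hypothetical and do not come from the article or from any specific API framework.

```python
from typing import Any, Callable, Optional


def paginate(
    items: list,
    page: int = 1,
    per_page: int = 10,
    max_items: int = 1000,
    predicate: Optional[Callable[[Any], bool]] = None,
) -> dict:
    """Filter, cap, and slice a result set into one page with navigation hints."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    # Filter on the server so the client does not have to.
    matched = [i for i in items if predicate is None or predicate(i)]
    # Limit: only the first max_items entries are ever paginated.
    matched = matched[:max_items]
    last_page = max(1, -(-len(matched) // per_page))  # ceiling division
    start = (page - 1) * per_page
    return {
        "data": matched[start:start + per_page],
        "page": page,
        "next": page + 1 if page < last_page else None,
        "last": last_page,
    }


records = [{"id": n, "price": n * 5} for n in range(1, 51)]
# Range-style filter: keep only records priced between 50 and 150 inclusive.
response = paginate(records, page=2, per_page=10,
                    predicate=lambda r: 50 <= r["price"] <= 150)
```

Here 21 records fall inside the price range, so the second page holds ids 20 through 29 and `next` points at page 3, which is also the last page.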
Full post here, 12 mins read
Cold start/warm start with AWS Lambda
- The programming language can affect the duration of a cold start in Lambda: Java and C# are typically slower to initialize than Go, Python, or Node, but they perform better on warm calls.
- Adding a framework to structure the code deployed in Lambda increases execution time on cold calls; this can be minimized by using a serverless-oriented framework rather than a web framework. Frameworks typically don’t affect warm calls.
- In serverless applications, one way to avoid cold starts is to keep Lambda warm beyond its fixed 5-minute life by preventing it from being unloaded. You can do this by setting up a cron job to invoke the function at regular intervals. However, AWS Lambda will still reset every 4 hours, and autoscaling must be taken into account.
- To avoid cold starts when concurrent calls trigger autoscaling, keep a pool of Lambda instances warm as above, but you will need to determine an optimal pool size to avoid wasting resources.
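
As a rough sketch of the keep-warm approach, the handler below short-circuits when it receives a scheduled ping and reports whether the call landed on a freshly loaded container. The `warmup` event key and the response fields are assumptions made for this example; the scheduled ping itself would be configured separately and is not shown.

```python
# Module-level code runs once per container, i.e. only on a cold start.
_COLD = True


def handler(event, context):
    """Lambda-style entry point; 'warmup' is a hypothetical key set by the scheduled ping."""
    global _COLD
    # Remember whether this call was the first one in this container.
    was_cold, _COLD = _COLD, False

    if isinstance(event, dict) and event.get("warmup"):
        # Keep-warm ping: return immediately, skipping the real work.
        return {"warmed": True, "cold_start": was_cold}

    # Real work would go here.
    return {"statusCode": 200, "cold_start": was_cold}


# Local simulation: the first call is cold, later calls reuse the warm module state.
first = handler({"warmup": True}, None)
second = handler({"path": "/orders"}, None)
```

Returning early on the ping keeps the warm-up invocations cheap. Keeping a pool of instances warm would require several such pings to run concurrently, and the right number depends on your traffic.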
Full post here, 11 mins read
Get these notes directly in your inbox every weekday by signing up for my newsletter, in.snippets().