Some Lesser-Known Scalability Issues in Data-Intensive Applications

Having worked with many big data teams over the years, I've noticed that once they reach a certain scale, they all run into a range of unexpected, lesser-known scalability issues. These problems aren't obvious at the beginning but become pain points as systems grow. From hotspots in distributed databases and the thundering herd problem to data skew in search engines and write amplification in logging systems, these issues can seriously affect performance and reliability. Each of these challenges requires specific strategies to manage and mitigate, ensuring that your systems can handle increasing loads without degrading. Let's explore these lesser-known issues, understand their impacts with real-world examples I have seen or read about, and look at how to address them effectively.

TL;DR

Scaling applications isn’t just about throwing more servers at the problem. There are some tricky, lesser-known issues that can sneak up on you and cause big headaches. In this post, we’ll look at hotspots, the thundering herd problem, data skew, write amplification, latency amplification, eventual consistency pitfalls, and cold start latency. I’ll share what these are, real-world examples, and how to fix them.

1. Hotspots in Distributed Systems

What Are Hotspots?

Hotspots happen when certain parts of your system get way more traffic than others, creating bottlenecks.

Example

Let’s use MongoDB, a popular NoSQL database, as an example. Suppose you have a collection of user profiles. If certain user profiles are accessed way more than others (say, those of influencers with millions of followers), the MongoDB shards containing these profiles can get overwhelmed. This uneven load can cause performance issues, as those particular shards experience much higher traffic than others.

How to Mitigate

  • Sharding: Spread data evenly across nodes using a shard key that ensures even distribution. For MongoDB, choose a shard key that avoids creating hotspots.
  • Caching: Implement a caching layer like Redis in front of MongoDB to handle frequent reads of hot data (see the sketch after this list).
  • Load Balancing: Use load balancers to distribute traffic evenly across multiple MongoDB nodes, ensuring no single node becomes a bottleneck.
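
To make the caching idea concrete, here is a minimal read-through cache sketch in Python. It assumes a local Redis and MongoDB, a `users` collection keyed by `_id`, and the `redis` and `pymongo` client libraries; the connection details and TTL are placeholders you would tune for your deployment.

```python
import json

import redis
from pymongo import MongoClient

# Hypothetical connection details; adjust for your deployment.
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
db = MongoClient("mongodb://localhost:27017")["app"]

CACHE_TTL_SECONDS = 60  # short TTL keeps hot profiles reasonably fresh


def get_user_profile(user_id: str) -> dict | None:
    """Read-through cache: serve hot profiles from Redis, fall back to MongoDB."""
    cache_key = f"user:{user_id}"
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: the shard never sees this read

    profile = db.users.find_one({"_id": user_id}, {"_id": 0})
    if profile is not None:
        cache.setex(cache_key, CACHE_TTL_SECONDS, json.dumps(profile))
    return profile
```

Because hot profiles are served from Redis after the first read, the MongoDB shard holding an influencer's document sees roughly one read per TTL window instead of one per request.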

2. Thundering Herd Problem

What Is It?

This issue occurs when many processes or threads wake up at once and compete for the same resource, overwhelming the system.

Example

Picture a popular e-commerce site that uses a cache to speed up product searches. When a popular cache entry expires, all incoming requests for that item suddenly bypass the cache and hit the backend database simultaneously. This surge can overwhelm the database, leading to slow responses or even outages.

How to Mitigate

  • Request Coalescing: Combine multiple requests for the same resource into a single request. For instance, only allow one request to refresh the cache while the others wait for the result (sketched after this list).
  • Rate Limiting: Implement rate limiting to control the flow of requests to the backend database, preventing it from being overwhelmed.
  • Staggered Expiry: Configure cache expiration times to be staggered rather than simultaneous, reducing the likelihood of a thundering herd.
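
Here is a minimal request-coalescing sketch in Python using per-key locks. The `load_from_db` callable is a hypothetical stand-in for your real backend query; in a multi-process deployment you would use a distributed lock (for example, in Redis) instead of `threading.Lock`.

```python
import threading

_locks_guard = threading.Lock()
_key_locks: dict[str, threading.Lock] = {}
_cache: dict[str, object] = {}


def _lock_for(key: str) -> threading.Lock:
    # One lock per cache key, created on demand.
    with _locks_guard:
        return _key_locks.setdefault(key, threading.Lock())


def get_with_coalescing(key: str, load_from_db):
    """Only one thread refreshes a missing key; the rest wait and reuse its result."""
    value = _cache.get(key)
    if value is not None:
        return value

    with _lock_for(key):
        # Re-check after acquiring the lock: another thread may have
        # refreshed the key while we were waiting.
        value = _cache.get(key)
        if value is None:
            value = load_from_db(key)  # exactly one caller hits the database
            _cache[key] = value
    return value
```

Staggered expiry is the same idea applied at write time: add random jitter to each TTL (for example, `300 + random.uniform(0, 60)` seconds) so a batch of keys written together doesn't expire in the same instant.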

3. Data Skew in Distributed Processing

What Does This Mean?

Data skew happens when data isn’t evenly distributed across nodes, making some nodes work much harder than others. I believe this is quite common: most of us spin up systems expecting data to be accessed evenly at scale. Let's look at an example.

Example

Consider Elasticsearch or Solr, which are commonly used for search functionality. These systems distribute data across multiple shards. If one shard ends up with a lot more data than others, maybe because certain keywords or products are much more popular, that shard will have to handle more queries. This can slow down search responses and put a heavier load on specific nodes.

Imagine you're running an e-commerce site with Elasticsearch. If most users search for a few popular products, the shards containing these products get hit harder. The nodes managing these shards can become bottlenecks, affecting your entire search performance.

How to Mitigate

  • Partitioning Strategies: Use strategies like hash partitioning to distribute data evenly across shards. Elasticsearch routes each document to a shard by hashing its routing value (the document ID by default), so avoid custom routing on a skewed field.
  • Replica Shards: Add replica shards to distribute the read load more evenly. Elasticsearch allows replicas to share the load of search queries (see the sketch after this list).
  • Adaptive Load Balancing: Implement dynamic load balancing to adjust the distribution of queries based on current loads. Elasticsearch provides tools to monitor shard load and re-balance as needed.
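
As a concrete sketch, here is how you might create an index with several primary shards and replicas using the elasticsearch-py 8.x client. The cluster URL, index name, and shard counts are placeholder assumptions; the right numbers depend on your data volume and node count.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

# Several primary shards spread the data; replicas let reads for hot
# products be served by more than one node.
es.indices.create(
    index="products",
    settings={"number_of_shards": 6, "number_of_replicas": 2},
)

# Documents are routed to shards by a hash of the ID by default, which
# spreads popular and unpopular products alike across all six shards.
es.index(
    index="products",
    id="sku-123",
    document={"name": "Laptop", "category": "electronics"},
)
```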

4. Write Amplification

What Is It?

Write amplification occurs when a single write operation causes multiple writes throughout the system.

Example

In a typical logging setup using Elasticsearch, writing a log entry might involve writing to a log file, updating an Elasticsearch index, and sending notifications to monitoring systems. This single log entry can lead to multiple writes, increasing the load on your system.

How to Mitigate

  • Batching: Combine multiple write operations into a single batch to reduce the number of writes. Elasticsearch supports bulk indexing, which can significantly reduce write amplification (see the sketch after this list).
  • Efficient Data Structures: Use data structures that minimize the number of writes required. Elasticsearch’s underlying storage engine (Lucene) writes immutable segments and merges them in the background; using it effectively (like lengthening the refresh interval on write-heavy indices) can further reduce amplification.
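
A minimal bulk-indexing sketch with the elasticsearch-py 8.x client is shown below. The `logs` index, the log entry shape, and the 30-second refresh interval are illustrative assumptions.

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

log_entries = [
    {"level": "INFO", "service": "checkout", "message": f"request {i} handled"}
    for i in range(10_000)
]

# One bulk request instead of 10,000 individual index calls: fewer network
# round trips and fewer tiny Lucene segments to flush and merge.
helpers.bulk(es, ({"_index": "logs", "_source": entry} for entry in log_entries))

# Refresh less often on write-heavy indices so new searchable segments are
# created every 30 seconds instead of the default 1 second.
es.indices.put_settings(index="logs", settings={"refresh_interval": "30s"})
```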

5. Latency Amplification

What Does This Mean?

Small delays in one part of your system can snowball, causing significant overall latency.

Example

In a microservices architecture, a single slow microservice can delay the entire request chain. Imagine a web application where an API call to fetch user details fans out to multiple microservices. If each downstream call adds a 100ms delay and ten such calls happen sequentially, the request already takes a full second before any real work is done, and retries or timeouts stack on top of that, degrading user experience.

How to Mitigate

  • Async Processing: Decouple operations with asynchronous processing using message queues like RabbitMQ or Kafka. This can help avoid blocking calls and reduce the cumulative latency.
  • Optimized Querying: Speed up database queries with indexing and optimization techniques. In our example, ensure that each microservice query is optimized to return results quickly.
  • Circuit Breakers: Implement circuit breakers to prevent slow or failing microservices from dragging down the entire request chain (a minimal sketch follows this list).
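
Here is a minimal circuit-breaker sketch in Python. The failure threshold and reset timeout are arbitrary illustrative values; in production you would typically reach for a battle-tested library rather than hand-rolling this.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: stop calling a failing service for a cooldown period."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # set when the breaker trips open

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast instead of waiting")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

Wrap each downstream call (for example, `breaker.call(fetch_user_details, user_id)`, where `fetch_user_details` is your hypothetical client function) so that once a dependency starts timing out, callers fail fast instead of queuing behind it.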

6. Eventual Consistency Pitfalls

What Does This Mean?

In distributed systems, achieving immediate consistency is often impractical, so eventual consistency is used. However, this can lead to issues if not managed correctly.

Example

An e-commerce site might show inconsistent stock levels because updates to the inventory database are only eventually consistent. This could lead to situations where customers are shown inaccurate stock information, potentially resulting in overselling or customer dissatisfaction.

How to Mitigate

  • Conflict Resolution: Implement strategies to handle conflicts that arise from eventual consistency. Using CRDTs (Conflict-free Replicated Data Types) can help resolve conflicts automatically (see the sketch after this list).
  • Consistency Guarantees: Clearly define and communicate the consistency guarantees provided by your system to manage user expectations. For example, explain to users that stock levels might take a few seconds to update.
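
To show the CRDT idea, here is a minimal grow-only counter (G-Counter) sketch in Python. A real inventory count also needs decrements (a PN-Counter, built from two G-Counters), but the convergence property is the same; the replica names are hypothetical.

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot,
    and merging takes the per-replica maximum, so replicas converge to the
    same value regardless of the order in which they exchange state."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


# Two replicas accept writes independently and still converge after merging.
a, b = GCounter("dc-east"), GCounter("dc-west")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```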

7. Cold Start Latency

What Is It?

Cold start latency is the delay incurred while an application or function initializes before it can serve its first request.

Example

In serverless architectures like AWS Lambda, functions that haven't been used in a while need to be re-initialized. This can cause a noticeable delay in response time, which is particularly problematic for time-sensitive applications.

How to Mitigate

  • Warm-Up Strategies: Keep functions warm by invoking them periodically. AWS Lambda offers provisioned concurrency to keep functions warm and ready to handle requests immediately.
  • Optimized Initialization: Reduce the initialization time of your functions by optimizing the startup processes and minimizing dependencies loaded during startup. This can involve reducing the size of the deployment package or lazy-loading certain dependencies only when needed (both ideas are sketched after this list).
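
Below is a sketch of an AWS Lambda handler in Python that combines both mitigations: a cheap warm-up path for periodic pings and lazy initialization of a heavy client. The `warmup` event shape and the `orders` DynamoDB table are hypothetical; provisioned concurrency, by contrast, is a real Lambda feature you configure rather than code.

```python
import json

# Module scope runs once per cold start; keep it light. The heavy client is
# created lazily on first use so it doesn't inflate initialization time.
_db_client = None


def _get_db_client():
    global _db_client
    if _db_client is None:
        import boto3  # deferred import: only paid for when actually needed
        _db_client = boto3.resource("dynamodb")
    return _db_client


def handler(event, context):
    # Hypothetical scheduled "keep warm" ping (e.g., from an EventBridge rule)
    # returns immediately without touching any heavy dependencies.
    if event.get("warmup"):
        return {"statusCode": 200, "body": "warm"}

    table = _get_db_client().Table("orders")  # hypothetical table name
    item = table.get_item(Key={"id": event["id"]}).get("Item")
    return {"statusCode": 200, "body": json.dumps(item, default=str)}
```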

Something I have noticed about all these issues is that they often stem from the fundamental challenges of distributing and managing data across systems. Whether it’s the uneven load of hotspots and data skew, the cascading delays of latency amplification, or the operational overhead of write amplification and cold starts, these problems highlight the importance of thoughtful architecture and proactive monitoring. Addressing them effectively requires a combination of good design practices, efficient use of technology, and continuous performance tuning.
