Guide to nonfunctional requirements for System Design Interviews

Fahim ul Haq - Aug 28 - Dev Community

Having focused on building distributed systems for much of my professional career, I have always been fascinated by the unique challenges of designing enterprise-scale applications. This passion led me to fulfill my dream of working in Big Tech, including helping to build Microsoft Azure and working on distributed storage at Facebook. During my tenure there, I led hundreds of candidates through System Design Interviews.

When I moved on from FAANG to launch Educative, I wanted to make System Design skills training a focus for two reasons:

  1. I wanted to help give engineers a leg up in their career journey
  2. I wanted to stay connected to the discipline of System Design

To continue sharing System Design best practices and supporting the developer community, I contributed to the design and development of Educative's now very popular flagship System Design course.

From interviewing candidates at Meta and Microsoft to building tech skills courses for software developers, I have seen one constant: even the best engineers often struggle to understand and design systems that meet nonfunctional requirements (NFRs). I get it: it can be difficult to manage critical trade-offs around nonfunctional requirements like scalability, availability, performance, and security, especially in a stressful interview setting.

Take this key System Design Interview question for example:

  • How can you design a scalable and performant e-commerce website that can handle millions of requests per second?

While most engineers will be able to design a system that meets all the functional requirements, making the design scalable while still achieving low latency on requests remains a challenge.

Today, through this blog, I’ll share a few essential strategies for how to meet nonfunctional requirements in your designs. These strategies will prepare you to confidently navigate System Design interviews at top tech companies.

Let's get started!

Please note that identifying and achieving an NFR are two different things. In this blog, our focus will be on achieving, not identifying, nonfunctional requirements. To learn how to identify NFRs and distinguish them from functional requirements for various System Designs, I recommend exploring our comprehensive Grokking the Modern System Design Interview course, where we discuss the NFRs of various design problems in detail.

Common nonfunctional requirements

Let's discuss common nonfunctional requirements that interviewers focus on and learn how to meet them effectively during System Design interviews. The common nonfunctional requirements that I will address in this blog are:

  1. Performance
  2. Availability
  3. Scalability

1) Performance

Performance is a well-known NFR that describes the system's ability to respond to user requests and process data efficiently. For example, when designing a messaging service, the interviewer might ask: How would you deliver messages with low latency? To achieve low latency, candidates need to choose an efficient two-way communication protocol, such as WebSocket. This is just one example of achieving performance; we will see more in the approaches below.

Let's look at different approaches to achieve performance.

Approaches to achieve performance

Caching: Implementing a good caching mechanism is one way to achieve performance. A cache stores frequently accessed data and reduces the need for repeated computation, which minimizes user-perceived latency.

Let's assume an X (formerly Twitter)-like system with a service dedicated to generating the timeline; let's call it the timeline service. A question that interviewers commonly ask about the timeline is: does the timeline service generate a timeline for each of a celebrity's followers when the celebrity posts something? As celebrities have millions of followers, creating timelines for all of them impacts the performance of the system.

To address this question, we need to analyze the followers first. Not every follower uses X all the time, so as a first step, we can divide followers into active and inactive users. For inactive users, the timeline service will not generate the timeline instantly. For active users, we will introduce a cache; let's call it the feed cache. This cache pre-populates the timeline for active users. When an active user requests a timeline, the timeline service immediately retrieves it from the feed cache, appends the celebrity post, and returns it to the client with minimal latency.

Additionally, a cache mechanism is usually implemented in each system layer to ensure decoupling and low latency.
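
The feed-cache approach above can be sketched in a few lines. This is a minimal illustration, not a real X/Twitter API: the class and method names are invented, and a production feed cache would live in a distributed store like Redis rather than an in-process dict.

```python
# Sketch of the feed-cache idea: timelines for active users are pre-populated,
# and a celebrity post is merged in at read time instead of being fanned out
# to millions of follower timelines at write time.

class TimelineService:
    def __init__(self):
        self.feed_cache = {}              # user_id -> precomputed list of post ids
        self.recent_celebrity_posts = []  # merged lazily at read time

    def precompute(self, user_id, posts):
        """Pre-populate the feed cache for an active user."""
        self.feed_cache[user_id] = list(posts)

    def on_celebrity_post(self, post_id):
        """Do NOT fan out to every follower; just record the post."""
        self.recent_celebrity_posts.append(post_id)

    def get_timeline(self, user_id):
        """Serve from the cache and append celebrity posts at read time."""
        cached = self.feed_cache.get(user_id, [])
        return cached + self.recent_celebrity_posts

svc = TimelineService()
svc.precompute("alice", ["p1", "p2"])
svc.on_celebrity_post("celeb_post_9")
print(svc.get_timeline("alice"))  # ['p1', 'p2', 'celeb_post_9']
```

The key trade-off shown here is moving work from write time (fan-out) to read time (merge), which keeps a celebrity post from triggering millions of timeline writes.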

Algorithm/data structure selection: Choosing efficient algorithms and data structures is another approach to increasing the performance of a system, since they minimize processing time. For example, I once asked a candidate which data structure would be suitable for efficiently and frequently (assume every 4 seconds) storing a driver's position in a ride-hailing system.

The candidate replied that a quadtree was a suitable option to ensure performance: the system receives data from the driver every 4 seconds and relocates the driver within the quadtree based on the new location.

But my next follow-up question to the candidate was: if we update the quadtree every 4 seconds, the computational overhead increases and ultimately adds latency. So is using a quadtree to store the driver's position every 4 seconds the right choice?

So here's my advice for tackling such a situation: think through and compare different ways of designing the system; often, combining several approaches will give you an optimized solution.
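
One way such a combination could look (a hedged sketch, not the canonical Uber design): keep every 4-second ping in a cheap hash map, and touch the more expensive spatial index only when the driver actually moves into a different cell. The cell size, names, and the dict standing in for a real quadtree node are all illustrative assumptions.

```python
# Hybrid sketch: O(1) hash-map writes for every ping; the (simulated) quadtree
# is updated only when the driver crosses a cell boundary.

CELL = 0.01  # assumed cell width in degrees (illustrative)

def cell_of(lat, lng):
    """Map a coordinate to a coarse grid cell."""
    return (int(lat / CELL), int(lng / CELL))

class DriverIndex:
    def __init__(self):
        self.latest = {}         # driver_id -> (lat, lng), updated every ping
        self.quadtree_cell = {}  # driver_id -> cell; stand-in for a quadtree node

    def ping(self, driver_id, lat, lng):
        self.latest[driver_id] = (lat, lng)  # cheap write on every ping
        new_cell = cell_of(lat, lng)
        if self.quadtree_cell.get(driver_id) != new_cell:
            # Only now would we pay for a real quadtree relocation.
            self.quadtree_cell[driver_id] = new_cell
            return "quadtree updated"
        return "hash map only"

idx = DriverIndex()
print(idx.ping("driver_7", 10.0, 20.0))      # first ping: quadtree updated
print(idx.ping("driver_7", 10.001, 20.001))  # same cell: hash map only
```

This keeps nearby-driver queries served by the spatial index reasonably fresh while avoiding a full index update every 4 seconds.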

You may encounter more such challenging questions in the interview regarding the performance of a ride-hailing system. For more details, explore our Uber System Design lesson.

Load balancing: Distributing incoming traffic evenly among different servers (load balancing) is another strategy for achieving high performance. For example, with millions of users on an e-commerce website, many requests can arrive every second. Load balancers distribute user requests across multiple servers so that each server receives only the load it can handle, preventing bottlenecks and degraded server performance.
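
The simplest distribution policy is round robin, where requests rotate through the server pool. A minimal sketch (server names are made up; real load balancers also track health and load):

```python
import itertools

# Round-robin load balancer sketch: each request goes to the next server
# in rotation, spreading load evenly across the pool.
class RoundRobinBalancer:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, request):
        return next(self._cycle)  # pick the next server in rotation

lb = RoundRobinBalancer(["s1", "s2", "s3"])
print([lb.route(f"req{i}") for i in range(4)])  # ['s1', 's2', 's3', 's1']
```

In an interview, it is worth mentioning alternatives such as least-connections or weighted round robin when servers have unequal capacity.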

2) Availability

Availability is another nonfunctional requirement; it describes how effectively a system maintains uptime and remains accessible to users. Generally, a system with 99.999% uptime ("five nines") is considered to have good availability. This is equivalent to less than six minutes of downtime per year. Achieving this level of availability is very challenging, but it helps retain a large number of users.
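
As a quick sanity check on that figure, the allowed downtime follows directly from the uptime percentage:

```python
# Five nines (99.999% uptime) translated into allowed downtime per year.
minutes_per_year = 365 * 24 * 60             # 525600 minutes in a year
downtime = minutes_per_year * (1 - 0.99999)  # fraction of the year allowed down
print(round(downtime, 2))                    # ~5.26 minutes per year
```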

For example, when designing an online shopping website, availability is critical since customers frequently use the site to browse products, make purchases, and track orders. Any downtime might lead to lower sales and disappointed customers.

Let's discuss some general approaches to achieve availability.

Approaches to achieve availability

Redundancy: One way to meet availability requirements is by replicating key components and data across multiple servers and data centers. This ensures that if one server fails or traffic is high, the load balancer can automatically reroute requests to a backup server. Additionally, implementing redundant components across multiple layers (servers, databases, and networks) prevents any single point of failure.

Fault tolerance: Suppose that during a discount sale on a shopping website, a key database node in a specific region suffers a hardware breakdown. This node handles a considerable amount of user activity in the region. In such scenarios, our system must be fault-tolerant, meaning it continues to work even if one or more components fail. We can achieve this by using redundant components and failover methods that automatically switch traffic from the failed component to a backup.
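
A failover path can be sketched as "try the primary, fall back to the replica". This is a toy illustration (the callable nodes and error type are stand-ins for real database connections):

```python
# Failover sketch: route each query to the first healthy replica.
class FailoverClient:
    def __init__(self, primary, backup):
        self.nodes = [primary, backup]  # ordered by preference

    def query(self, request):
        for node in self.nodes:          # try primary first, then backup
            try:
                return node(request)
            except ConnectionError:
                continue                 # node is down: fall through to backup
        raise RuntimeError("all replicas down")

def dead(req):                           # simulates the failed primary
    raise ConnectionError

def alive(req):                          # simulates the healthy backup
    return f"handled:{req}"

client = FailoverClient(dead, alive)
print(client.query("order_42"))  # handled:order_42
```

Real systems add health checks and heartbeats so failed nodes are detected proactively rather than on every request.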

Rate limiting: Another approach to achieving availability is rate limiting. A rate limiter restricts the number of requests a service will handle, preventing system overload. For example, on a social media platform, users might like posts, play videos, and follow others at a much higher rate than usual. Without rate limiting, this sudden increase in activity can overwhelm the system, leading to failure.
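
A common way to implement this is the token-bucket algorithm: tokens refill at a steady rate, each request spends one token, and requests are rejected once the bucket is empty. A minimal sketch with illustrative parameters:

```python
import time

# Token-bucket rate limiter sketch: capacity bounds the burst size, and
# rate controls the sustained requests-per-second allowance.
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1      # spend a token for this request
            return True
        return False              # bucket empty: reject the request

bucket = TokenBucket(rate=0.5, capacity=2)
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```

Per-user buckets (e.g. keyed by user ID in a shared store) turn this into the "limit requests per user" policy described above.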

CDNs: CDNs are cache servers distributed across different regions. They not only improve performance but also increase system availability by putting less load on origin servers. Deploying servers in multiple geographic locations ensures that regional outages do not affect overall system availability, and it also reduces latency for users in different locations.

Stress testing and monitoring: Another way to ensure availability is stress testing, which determines how the system behaves under peak load, allowing us to identify breaking points and confirm the system can handle sudden traffic spikes. Fixing the issues that testing surfaces prepares the system to stay available under load. Additionally, monitoring allows us to track system performance and detect anomalies in real time.

3) Scalability

System scalability describes how a system expands to handle increasing numbers of users while maintaining performance. For example, an interviewer might ask questions about how to design a service like YouTube that can accommodate millions of users uploading and watching videos simultaneously — or designing a URL shortening service capable of handling billions of queries every day.

To address these questions about scalability in interviews, let's look at different approaches.

Approaches to achieve scalability

Manual scaling: One approach to scaling applications is manual scaling, which involves either upgrading hardware on existing machines (vertical scaling) or adding more machines (horizontal scaling).

  • Vertical scaling (hardware upgrades): Add more resources (RAM, CPU, storage) to existing machines for smaller demands. It’s easier to manage since we aren’t adding to the total number of machines.

  • Horizontal scaling (adding machines): Increase the number of machines to distribute the workload for larger demands. In general, horizontal scaling is the preferred option for large-scale applications because, compared to vertical scaling, it avoids a single point of failure and supports load balancing.

  • Automatic scaling: Dynamically adjust resources (storage, processing power) based on demand to handle traffic spikes. This can be achieved using a cloud computing technique called Auto Scaling.

  • Sharding: Another approach to achieving scalability is dividing the database into shards to distribute the data load across multiple servers. Key-range and hash-based sharding are common techniques for sharding databases.

  • Modular design: Break down the system into smaller, independent components so that each service can scale independently according to demands without affecting the performance of other services.
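
The hash-based sharding mentioned above can be sketched in a few lines: hash the key, then take the result modulo the shard count. The shard count and key format are illustrative (and note that resharding with plain modulo is expensive, which is why production systems often use consistent hashing instead).

```python
import hashlib

# Hash-based sharding sketch: deterministically map a key to one of N shards.
NUM_SHARDS = 4  # assumed shard count (illustrative)

def shard_for(key: str) -> int:
    # md5 is used here only for a stable, well-distributed hash,
    # not for security.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user:123"))  # a stable shard id in [0, NUM_SHARDS)
```

The same key always lands on the same shard, so reads and writes for one user stay on one server while the overall load spreads across all shards.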

Caches and CDNs: Caches store frequently accessed data in memory to reduce response time and database load. CDNs, on the other hand, distribute static content to users rather than retrieving that data from the origin, further reducing the load on origin servers. By using caching and CDNs, the system can manage a high number of user requests without slower performance.

If you’re interested in exploring other nonfunctional requirements like reliability, maintainability, and security in greater depth, check out our Grokking the Modern System Design Interview course.

Now that we’ve understood different approaches to achieve NFRs including performance, availability and scalability, let's practice them by taking a deeper look at Google Maps and YouTube System Design.

Acing NFRs: Google Maps and YouTube

Let’s explore nonfunctional requirements for Google Maps and YouTube System Design problems.

Design Google Maps

Designing a navigation system like Google Maps involves allowing users to identify their current location, find optimal routes based on specified destinations, and provide detailed turn-by-turn directions for seamless navigation.

Considering the following nonfunctional requirements of Google Maps, let’s describe the strategies to achieve them:

  • High availability: The design of Google Maps includes a large road network graph. If we hosted this entire graph on a single server, it would be overwhelmed by its size and by high user demand. To ensure availability, we divide the graph into smaller subgraphs, or segments, and host them on different servers. By replicating these servers, we eliminate single points of failure, and we use a load balancer to distribute incoming user requests across the segment servers.

  • Scalability: To scale our Google Maps system, we use a distributed system where we host each segment on a different server to serve user requests for different routes from different segment servers. Thus, we can serve millions of user requests. As we are using a modular design here, we can easily add more segments to handle more data.

Let's summarize the strategies we used to achieve Google Maps' NFRs:

Are these the only nonfunctional requirements for Google Maps? What about the performance? How does our design ensure minimal response times? Think of a solution, and then explore the nonfunctional requirements of Google Maps in detail to deepen your understanding.

Design YouTube

Designing a video streaming platform like YouTube involves enabling users to stream videos, upload videos, search for videos by their titles, and like/dislike videos.

Considering the following nonfunctional requirements of YouTube, let’s describe the strategies to achieve them:

  • Minimal response times: To ensure the performance of YouTube's design, we use different caching servers at the ISP and CDN levels to serve the most viewed content with the fastest response times. At the same time, choosing an appropriate storage system for each type of data, such as Bigtable for thumbnails and Blob storage for videos, reduces latency. We prefer a Lighttpd-based web server for video uploads, as it processes such content faster and provides a smoother user experience.

  • Reliability: To make the system highly reliable, we use data sharding to ensure that if one type of data is unavailable, it will not affect the others. We replicate critical components to achieve system fault tolerance and eliminate faulty servers by monitoring their health using heartbeat messages.

The next question that comes to mind is how YouTube manages an increasing number of users and data storage. Which strategies would make our YouTube design scalable and available? Think about the solution and then explore how to meet YouTube's nonfunctional requirements in detail to enhance your understanding.

Quick tips for NFR interview questions

Proactively ask questions to clarify nonfunctional requirements during the interview. For example:

  • Expected user traffic
  • Expected data load
  • Expected downtime tolerance

Evaluate trade-offs between different techniques, such as system complexity, cost, and maintainability.

Prepare a list of commonly asked questions with their solutions. For example:

  • For reliable transaction processing—choose ACID-compliant relational databases.
  • For large data applications—choose NoSQL databases like MongoDB or Cassandra to achieve scalability.
  • For real-time data processing and analytics—choose platforms like Apache Kafka or Amazon Kinesis.

Remember, there is no one-size-fits-all solution. Every design decision involves trade-offs. As a designer of scalable systems, your ability to weigh these trade-offs is critical. Ask the interviewer clarifying questions, consider the NFRs carefully, and make informed choices to create a robust system design.

What’s next

In this blog, I have attempted to demonstrate the importance of nonfunctional requirements and how to address them in System Design interviews. By understanding common NFRs and practical strategies for solving them, you will be better prepared to address NFR-related questions during your interview.

I highly recommend the following courses for hands-on practice achieving nonfunctional requirements and preparing for a challenging interview at FAANG/MAANG companies.

These are the resources I would have had back when I was interviewing at Big Tech.

Happy learning!
