Step 1: Functional requirements, non-functional requirements, and problem considerations
Functional Requirements
- Allow users to chat over the internet.
- Provide support for one-on-one and group chats.
- Messages need to be stored so that chat history can be viewed later.
- Messages need to be encrypted for security purposes.
- Should have image, video, and file-sharing capabilities.
- Should indicate the last-seen time of users.
- Should indicate read receipts for messages.
What are some of the common problems that can be encountered?
- What would happen to a message if it is sent without an internet connection?
- Will encrypting and decrypting increase the latency?
- How are the messages sent and notified to the device?
Possible Tips for consideration:
- Split the database schema into multiple tables such as a user table, chat table, message table, etc.
- Make use of web sockets for bi-directional communication between the device and the server.
- Make use of push notifications for notifying the members even if they are offline.
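The first tip above can be sketched with an in-memory SQLite database. This is only an illustration: every table and column name here (users, chats, chat_members, messages, last_seen, and so on) is an assumption, not a schema taken from these notes.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    last_seen TEXT                          -- timestamp of last activity
);
CREATE TABLE chats (
    chat_id  INTEGER PRIMARY KEY,
    is_group INTEGER NOT NULL DEFAULT 0     -- 0 = one-on-one, 1 = group
);
CREATE TABLE chat_members (
    chat_id INTEGER NOT NULL REFERENCES chats(chat_id),
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (chat_id, user_id)
);
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    chat_id    INTEGER NOT NULL REFERENCES chats(chat_id),
    sender_id  INTEGER NOT NULL REFERENCES users(user_id),
    body       TEXT NOT NULL,
    sent_at    TEXT NOT NULL
);
"""


def create_schema(conn):
    """Create the user, chat, membership, and message tables."""
    conn.executescript(SCHEMA)


def table_names(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
print(table_names(conn))
```

Keeping membership in its own table lets the same chat and message tables serve both one-on-one and group chats.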
Non-Functional Requirements
- Should have very low latency
- Should be highly available
- There should be no noticeable lag
- Should be highly scalable
Step 2: Propose high-level design and get buy-in
To develop a high-quality design, we should have a basic knowledge of how clients and servers communicate. In a chat system, clients can be either mobile applications or web applications. Clients do not communicate directly with each other. Instead, each client connects to a chat service, which supports all the features mentioned above. Let us focus on fundamental operations. The chat service must support the following functions:
- Receive messages from other clients.
- Find the right recipients for each message and relay the message to the recipients.
- If a recipient is not online, hold the messages for that recipient on the server until she is online.
When the sender sends a message to the receiver via the chat service, it uses the time-tested HTTP protocol, which is the most common web protocol. In this scenario, the client opens an HTTP connection with the chat service and sends the message, informing the service to send the message to the receiver. Keep-alive is efficient here because the keep-alive header lets a client maintain a persistent connection with the chat service, which also reduces the number of TCP handshakes. HTTP is a fine option on the sender side, and many popular chat applications such as Facebook [1] used HTTP initially to send messages.
Polling
Long Polling
WebSocket
WebSocket is the most common solution for sending asynchronous updates from server to client.
A WebSocket connection is initiated by the client. It is bi-directional and persistent. It starts its life as an HTTP connection and can be “upgraded” via a well-defined handshake to a WebSocket connection. Through this persistent connection, a server can send updates to a client. WebSocket connections generally work even if a firewall is in place, because they use port 80 or 443, which are also used by HTTP/HTTPS connections.
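To make the "upgrade" handshake concrete, here is a minimal sketch of the server side of it as defined in RFC 6455: the server appends a fixed GUID to the client's Sec-WebSocket-Key, hashes the result with SHA-1, and returns the base64 digest in Sec-WebSocket-Accept. The function names are illustrative, and a real server would also validate the request headers, which is omitted here.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(sec_websocket_key: str) -> str:
    """Build the HTTP 101 response that switches the connection to WebSocket."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        "Sec-WebSocket-Accept: " + websocket_accept(sec_websocket_key),
        "",
        "",
    ]
    return "\r\n".join(lines)


# Sample key taken from the example in RFC 6455.
print(upgrade_response("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this response is sent, the same TCP connection carries WebSocket frames in both directions instead of HTTP requests and responses.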
High-level design
We mentioned above that WebSocket was chosen as the main communication protocol between the client and server because it is bidirectional. It is important to note, however, that everything else does not have to use WebSocket. In fact, most features (sign up, login, user profile, etc.) of a chat application could use the traditional request/response method over HTTP.
Capacity Estimation
- 10 billion messages are sent per day by 1 billion users.
- At peak traffic, there are 700,000 active users per second.
- At peak traffic, there are 40 million messages per second.
- On average, each message has 160 characters. At roughly 1 byte per character, this results in 1.6 TB (10 billion * 160 bytes) of data per day.
- The system is expected to be in service for 10 years, requiring approximately 6 PB (10 * 1.6 TB * 365) of storage.
- The application will consist of multiple microservices, each performing a specific task. Let's assume that the latency for sending a message is 20 milliseconds, and that each server can handle 100 concurrent connections. Based on these assumptions, we would need a fleet of 8000 (40M * 20ms / 100) servers to support the chat service.
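The arithmetic in the estimates above can be reproduced with a short script. It takes the figures from the list as given and adds two assumptions of its own: 1 byte per character, and decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes). The variable names are illustrative.

```python
# Inputs taken from the estimates above.
MESSAGES_PER_DAY = 10_000_000_000
CHARS_PER_MESSAGE = 160
BYTES_PER_CHAR = 1                 # assumption: 1 byte per character
SERVICE_YEARS = 10
PEAK_MESSAGES_PER_SEC = 40_000_000
LATENCY_MS = 20
CONNECTIONS_PER_SERVER = 100

# Storage: bytes per day, then total over the service lifetime.
daily_bytes = MESSAGES_PER_DAY * CHARS_PER_MESSAGE * BYTES_PER_CHAR
daily_tb = daily_bytes / 10**12
total_pb = daily_bytes * 365 * SERVICE_YEARS / 10**15

# Servers: messages in flight = arrival rate * time each message takes,
# then divide by the connections one server can hold (rounded up).
in_flight = PEAK_MESSAGES_PER_SEC * LATENCY_MS // 1000
servers = -(-in_flight // CONNECTIONS_PER_SERVER)

print(f"daily storage: {daily_tb} TB")
print(f"10-year storage: {total_pb} PB")
print(f"messages in flight at peak: {in_flight}")
print(f"chat servers needed: {servers}")
```

The 10-year figure comes out to 5.84 PB, which the list rounds to approximately 6 PB.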
Storage
Two types of data exist in a typical chat system. The first is generic data, such as user profiles, settings, and friends lists. These data are stored in robust and reliable relational databases. Replication and sharding are common techniques to satisfy availability and scalability requirements. The second is chat data, which goes into a NoSQL store.
- User profiles, settings, friends lists: relational DB
- Chats: NoSQL
Data models:
1 on 1 chat flow
1. User A sends a chat message to Chat server 1.
2. Chat server 1 obtains a message ID from the ID generator.
3. Chat server 1 sends the message to the message sync queue.
4. The message is stored in a key-value store.
5.a. If User B is online, the message is forwarded to Chat server 2 where User B is connected.
5.b. If User B is offline, a push notification is sent from push notification (PN) servers.
6. Chat server 2 forwards the message to User B. There is a persistent WebSocket connection between User B and Chat server 2.
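The flow above can be sketched as a small in-memory simulation. Everything here is a stand-in and an assumption for illustration: a counter plays the ID generator, a deque plays the message sync queue, a dict plays the key-value store, and a per-user list plays the WebSocket connection. Class and method names are hypothetical.

```python
from collections import deque
from itertools import count


class ChatServer:
    """Holds the users connected to it; a list stands in for each WebSocket."""

    def __init__(self, name):
        self.name = name
        self.connected = {}  # user -> messages delivered over the "socket"

    def deliver(self, user, message):
        self.connected[user].append(message)


class ChatSystem:
    def __init__(self):
        self._ids = count(1)          # stand-in for the ID generator
        self.sync_queue = deque()     # stand-in for the message sync queue
        self.kv_store = {}            # stand-in for the key-value store
        self.push_notifications = []  # (user, message_id) sent via PN servers
        self.sessions = {}            # online user -> ChatServer

    def connect(self, user, server):
        server.connected[user] = []
        self.sessions[user] = server

    def send(self, sender, recipient, text):
        # The sender's chat server gets an ID and enqueues the message.
        message = {"id": next(self._ids), "from": sender, "to": recipient, "text": text}
        self.sync_queue.append(message)
        return message["id"]

    def process_queue(self):
        while self.sync_queue:
            message = self.sync_queue.popleft()
            self.kv_store[message["id"]] = message  # persist first
            server = self.sessions.get(message["to"])
            if server is not None:
                server.deliver(message["to"], message)  # recipient online
            else:
                self.push_notifications.append((message["to"], message["id"]))


system = ChatSystem()
server1, server2 = ChatServer("chat-1"), ChatServer("chat-2")
system.connect("A", server1)
system.connect("B", server2)
system.send("A", "B", "hello B")   # B is online: delivered via chat-2
system.send("A", "C", "hello C")   # C is offline: push notification
system.process_queue()
print(server2.connected["B"], system.push_notifications)
```

Storing the message before routing it means an offline recipient can still fetch it from the store after the push notification arrives.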
Assets and Images:
• When a user uploads an asset, the image or video is compressed first, and a hash of the content is then computed.
• The Assets service uploads the data to S3.
• When the user uploads the same image again, the service checks whether that image is already present on S3.
• If the same hash is present, it will return the same
• Multiple levels of hashing are used to get the correct hash and avoid collisions.
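A minimal sketch of hash-based deduplication follows. It rests on several assumptions: a dict stands in for S3, zlib stands in for image/video compression, a single SHA-256 over the compressed bytes is used as the storage key (the multi-level hashing mentioned above is not implemented), and returning the existing key on a duplicate upload is this sketch's own choice. The AssetService class and its methods are hypothetical.

```python
import hashlib
import zlib


class AssetService:
    """Stores each distinct asset once, keyed by a hash of its content."""

    def __init__(self):
        self.blob_store = {}   # stand-in for S3: key -> compressed bytes
        self.stored_count = 0  # number of objects actually written

    def upload(self, data: bytes) -> str:
        compressed = zlib.compress(data)               # compress first
        key = hashlib.sha256(compressed).hexdigest()   # then hash the content
        if key not in self.blob_store:                 # skip the write on a duplicate
            self.blob_store[key] = compressed
            self.stored_count += 1
        return key

    def download(self, key: str) -> bytes:
        return zlib.decompress(self.blob_store[key])


assets = AssetService()
first = assets.upload(b"cat picture bytes")
second = assets.upload(b"cat picture bytes")   # same content, uploaded again
other = assets.upload(b"dog picture bytes")
print(first == second, assets.stored_count)
```

Because the key is derived from the content, a repeated upload costs one hash computation and a lookup rather than a second copy in storage.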
Analytics:
• We have analytic services to handle all analytic information.
• We have Apache Spark in place.
• We have these consumers on top of Kafka.
• We use Kafka and Cassandra.
Article: https://bytebytego.com/courses/system-design-interview/design-a-chat-system