Design an E-commerce Website from a High-level Perspective

ChunTing Wu - Apr 18 '22 - Dev Community

Two weeks ago, at a gathering of architects, we talked about how e-commerce websites actually follow some regular patterns. So if you had to quickly produce a high-level design, how would you do it?

This article is a record of that discussion. I will list our ideas as I remember them and then present how the architects approached this classic design problem.

Define Basic Domain Services

Usually, for a system at the startup stage, I would recommend starting with a monolith. After building the MVP (minimum viable product), the architecture evolves according to the functional and non-functional requirements, and only then do we decide whether it should be decomposed into microservices or into the Service-oriented architecture that was popular a few years ago.

In any case, a monolith serves as the basis for quickly launching and validating the idea.

However, for an e-commerce site, there is no need for quick validation because the success of an e-commerce site depends on the products and exposure strategy rather than the features. Therefore, for e-commerce sites, it is more important to be able to solve the problems that will be encountered in the foreseeable future.

Here are a few practical examples.

As the user base grows rapidly, the demand for user management increases significantly, so user-related functions should become a separate service that can iterate quickly on its own development cycle.

Next is inventory. As the number of users on the site rises, product-related features drive most of the traffic: customers are constantly searching for the products they want and checking whether they are in stock. To handle this volume, inventory must be able to scale horizontally on its own, so it also becomes a standalone service.

Another core function of an e-commerce website is purchasing and order management. When a customer places an order, external services such as payment providers and logistics must be integrated behind the scenes, and new integrations, such as additional payment gateways, must be added continuously. This development cycle is clearly different from that of user-related functions, and so are its testing and integration needs; therefore, order management must also become its own service.
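To make the point about continuous payment integrations concrete, here is a minimal sketch, assuming the order service hides each payment provider behind a common adapter interface. `PaymentGateway`, `FakeGateway` and `OrderService` are hypothetical names for illustration, not part of any real SDK; a real adapter would call the provider's API inside `charge`.

```python
from dataclasses import dataclass
from typing import Protocol


class PaymentGateway(Protocol):
    """Anything the order service can charge through (hypothetical interface)."""

    name: str

    def charge(self, order_id: str, amount_cents: int) -> str:
        """Charge the customer and return a provider reference."""
        ...


@dataclass
class FakeGateway:
    """Stand-in adapter; a real one would call the provider's API here."""

    name: str

    def charge(self, order_id: str, amount_cents: int) -> str:
        return f"{self.name}-{order_id}-{amount_cents}"


class OrderService:
    def __init__(self) -> None:
        # Registered adapters, keyed by the payment method the customer picks.
        self._gateways: dict[str, PaymentGateway] = {}

    def register_gateway(self, gateway: PaymentGateway) -> None:
        self._gateways[gateway.name] = gateway

    def pay(self, order_id: str, method: str, amount_cents: int) -> str:
        gateway = self._gateways.get(method)
        if gateway is None:
            raise ValueError(f"unsupported payment method: {method!r}")
        return gateway.charge(order_id, amount_cents)
```

With this shape, supporting a new provider means writing, testing and registering one more adapter inside the order service, a release cadence the user service does not need to share.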

Putting this together, we define the basic domain services as follows.
[Figure: the basic domain services: User, Inventory and Order]

Define Databases

In the previous section we defined three independent services: User, Order and Inventory. Next, we had to decide which database each of them should use.

For a system that is just starting up, I always recommend using the most mature technology that the team is most familiar with, while taking into account the various use cases.

For this reason, I believe MySQL is the best choice for services that require strong consistency, such as the Order Service and the Inventory Service. Although MySQL has long been criticized for its lack of horizontal scalability, there have been many breakthroughs in recent years, and more and more distributed SQL databases are appearing. If an e-commerce site's traffic grows until MySQL becomes the bottleneck, migrating to a distributed SQL database will not be too painful, since the applications barely need to be modified.
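As an illustration of why strong consistency matters for inventory, the sketch below deducts stock with a single conditional `UPDATE`, so two concurrent buyers can never both take the last unit. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere; the same statement works on InnoDB, although MySQL drivers typically use `%s` placeholders instead of `?`. The table layout and function names are assumptions for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical inventory table; the CHECK constraint is a last line of defense.
    conn.execute(
        "CREATE TABLE inventory ("
        " sku TEXT PRIMARY KEY,"
        " stock INTEGER NOT NULL CHECK (stock >= 0))"
    )
    conn.execute("INSERT INTO inventory (sku, stock) VALUES ('sku-1', 3)")
    conn.commit()


def deduct_stock(conn: sqlite3.Connection, sku: str, qty: int) -> bool:
    """Atomically deduct qty; returns False when there is not enough stock."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            # The stock check and the write happen in one statement, so no
            # other transaction can sneak in between them.
            "UPDATE inventory SET stock = stock - ? WHERE sku = ? AND stock >= ?",
            (qty, sku, qty),
        )
        return cur.rowcount == 1
```

Checking `rowcount` tells the caller whether the deduction happened, instead of reading the stock first and writing it back later, which is exactly the race that causes overselling.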

The User Service is more interesting. User-related management functions usually do not have strong consistency requirements, and user profiles and related data come in many varied shapes. As a document-based database, MongoDB can represent all kinds of rich data and offers a lot of management convenience. Of course, you can also use MySQL for the User Service, but you need to put a little more effort into data normalization.

Moreover, if the User Service is built on MongoDB and later needs strong consistency (for example, to issue discount coupons to customers), MongoDB can still provide sufficient guarantees, since it also supports transactions. That said, MongoDB transactions are not easy to code, which increases the development effort.
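To show what the extra normalization effort looks like, here is a hypothetical user document of the kind MongoDB stores as-is, together with a sketch of flattening it into rows for three relational tables. The field names, table names and the key/value layout for preferences are assumptions; a real schema would depend on the actual user data.

```python
from typing import Any

# Hypothetical user document: nested and irregular, stored as-is in MongoDB.
user_doc: dict[str, Any] = {
    "_id": "u1",
    "name": "Alice",
    "addresses": [
        {"label": "home", "city": "Taipei"},
        {"label": "office", "city": "Hsinchu"},
    ],
    "preferences": {"newsletter": True, "language": "en"},
}


def normalize(doc: dict[str, Any]) -> dict[str, list[tuple]]:
    """Split one document into rows for users, user_addresses, user_preferences."""
    user_id = doc["_id"]
    users = [(user_id, doc["name"])]
    # One row per nested address; users without addresses produce no rows.
    addresses = [
        (user_id, addr["label"], addr["city"]) for addr in doc.get("addresses", [])
    ]
    # Free-form preferences become key/value rows, with values stored as text.
    preferences = [
        (user_id, key, str(value)) for key, value in doc.get("preferences", {}).items()
    ]
    return {
        "users": users,
        "user_addresses": addresses,
        "user_preferences": preferences,
    }
```

Every new nested field in the document means another table or column, plus the joins to reassemble it, which is the effort the document model avoids.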

Define Communications

Now we have three services, each in charge of its own domain, but how do they communicate? For instance, when a customer selects a product and the order then goes through billing and shipping, the three services must communicate with each other to complete the scenario.

From my point of view, these three services should communicate asynchronously. Although ensuring consistency across the overall process then dramatically increases the implementation effort, only asynchronous communication can provide high availability and high scalability, and both are mandatory non-functional requirements for running an e-commerce site.

For example, during a Black Friday promotion, a large number of customers rush to the website and place a large number of orders. If the three services communicated synchronously, all three would have to scale at the same rate, which is clearly unreasonable.

In addition, if an order must be served synchronously by all three services, the whole order fails whenever any one of them fails temporarily or the network is briefly down. In other words, only through asynchronous communication can we achieve the availability and scalability an e-commerce site requires.
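The availability argument can be sketched with a toy simulation, assuming each service either handles a request or raises when it is down. A synchronous call chain fails the whole order as soon as one service is unavailable, while the asynchronous version only needs to enqueue the request, and the downstream service drains the queue once it recovers. `Service`, `place_order_sync`, `place_order_async` and `drain` are hypothetical names, and a `deque` stands in for a real broker.

```python
from collections import deque


class Service:
    """Toy service that fails while it is down (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.handled: list[str] = []

    def handle(self, order_id: str) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        self.handled.append(order_id)


def place_order_sync(order_id: str, chain: list[Service]) -> bool:
    """Synchronous: every service must answer now, or the order fails."""
    try:
        for service in chain:
            service.handle(order_id)
        return True
    except ConnectionError:
        return False


def place_order_async(order_id: str, queue: deque) -> bool:
    """Asynchronous: the order is accepted once the event is enqueued."""
    queue.append(order_id)
    return True


def drain(queue: deque, service: Service) -> None:
    """Consumer loop: process queued events only while the service is up."""
    while queue and service.up:
        service.handle(queue.popleft())
```

Note that in the synchronous run the services earlier in the chain have already done their work when the last one fails, which is the consistency problem the rest of this article has to deal with in any case.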

At this point, we had a prototype of our system architecture.
[Figure: prototype system architecture]

However, one component remains undecided: which queue system to use for asynchronous communication. In my opinion, we should use Kafka. Why not a pure message queue like RabbitMQ? One important reason is that Kafka offers higher throughput when handling the heavy traffic of a promotion event.

Moreover, we all know that ordering and payment are sequential. Kafka guarantees message order within a partition, so if all events of the same order are published with the same key (for example, the order ID), they land in the same partition. Adding consumers to a consumer group then spreads the partitions across them, so messages are handled faster while each order's events stay in sequence.
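The ordering argument rests on how Kafka assigns keyed messages to partitions. The following is a simplified simulation, not Kafka client code: it hashes the key with `zlib.crc32` as a stand-in for the murmur2 hash used by Kafka's default partitioner, and it uses a naive round-robin assignment of partitions to consumers. The function names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's default partitioner, which hashes keys with murmur2;
    # any stable hash demonstrates the same property.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events: list[tuple[str, str]], num_partitions: int) -> dict[int, list[tuple[str, str]]]:
    """Append each (key, event) to its partition, preserving publish order."""
    partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for key, event in events:
        partitions[partition_for(key, num_partitions)].append((key, event))
    return partitions


def assign(consumers: list[str], num_partitions: int) -> dict[str, list[int]]:
    """Naive round-robin: each partition is owned by exactly one group member."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment
```

Because every event for one order ID lands in the same partition, and each partition is read by only one consumer in the group, adding consumers increases throughput without reordering any single order's events.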

Sketch Workflow

Once we have a simple system architecture, we begin to sketch out the entire order placement process.
[Figure: order placement workflow]

First, the customer places an order with the inventory service, which handles only inventory-related tasks, e.g., pre-deducting inventory. The purpose of reducing inventory before payment is to avoid contention when a large number of customers are buying at the same time; if inventory is not actually reduced until the last stage, there is a high risk that a customer pays for an item that is already out of stock.

When the inventory service has finished, it notifies the order service, which takes over the rest of the transaction and informs the customer that the order has been placed. At this point, the order exists but has not been paid; the frontend must redirect the customer to payment so that they can interact with the order service to complete billing.

When the payment is completed, the order service notifies the user service. The user service can then carry out user-related tasks, such as issuing coupons or upgrading the user's level. Finally, the inventory service and the order service are informed that the transaction has been completed successfully.

If an error occurs in the order service or the user service, a message must be sent to inform the other participating services, and each service that receives the notification handles it accordingly; for example, the inventory service must add back the pre-deducted inventory.
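Pre-deduction can be sketched as a reservation that is either confirmed after payment or released if the transaction fails. This is a minimal in-memory sketch: `InventoryService`, `reserve`, `confirm`, `release` and `available` are hypothetical names, and the `threading.Lock` stands in for whatever atomic operation the real inventory database would provide.

```python
import threading


class InventoryService:
    """Hypothetical in-memory inventory that pre-deducts stock per order."""

    def __init__(self, stock: dict[str, int]) -> None:
        self._stock = dict(stock)
        self._reserved: dict[str, tuple[str, int]] = {}  # order_id -> (sku, qty)
        self._lock = threading.Lock()  # serializes concurrent buyers

    def reserve(self, order_id: str, sku: str, qty: int) -> bool:
        """Pre-deduct stock when the order is placed; False if not enough left."""
        with self._lock:
            if self._stock.get(sku, 0) < qty:
                return False
            self._stock[sku] -= qty
            self._reserved[order_id] = (sku, qty)
            return True

    def confirm(self, order_id: str) -> None:
        """Transaction completed: the deduction becomes final."""
        with self._lock:
            self._reserved.pop(order_id, None)

    def release(self, order_id: str) -> None:
        """Transaction failed: add the pre-deducted stock back (idempotent)."""
        with self._lock:
            reservation = self._reserved.pop(order_id, None)
            if reservation is not None:
                sku, qty = reservation
                self._stock[sku] += qty

    def available(self, sku: str) -> int:
        with self._lock:
            return self._stock.get(sku, 0)
```

Making `release` idempotent matters in an event-driven system, because the same failure notification may be delivered more than once.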
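The happy path and the failure path above can be simulated with an in-memory event bus. This is a sketch under several assumptions: the topic names are invented for illustration, `EventBus` delivers events in FIFO order within one process instead of going through Kafka, the payment outcome is a flag rather than a real frontend redirect, and the user service's coupon and level-upgrade work is elided.

```python
from collections import defaultdict, deque


class EventBus:
    """In-memory stand-in for the message broker: FIFO delivery to subscribers."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)
        self._pending = deque()
        self.delivered: list[str] = []

    def subscribe(self, topic, handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, order_id: str) -> None:
        self._pending.append((topic, order_id))

    def run(self) -> None:
        while self._pending:
            topic, order_id = self._pending.popleft()
            self.delivered.append(topic)
            for handler in self._handlers[topic]:
                handler(order_id)


def build_workflow(bus: EventBus, stock: int, payment_succeeds: bool):
    inventory = {"stock": stock}
    orders: dict[str, str] = {}

    # Inventory service: pre-deduct stock, and add it back on failure.
    def reserve(order_id):
        if inventory["stock"] < 1:
            bus.publish("order.rejected", order_id)
            return
        inventory["stock"] -= 1
        bus.publish("inventory.reserved", order_id)

    def restock(order_id):
        inventory["stock"] += 1

    # Order service: create the order, then report the payment outcome.
    def create_order(order_id):
        orders[order_id] = "created"
        bus.publish("order.paid" if payment_succeeds else "transaction.failed", order_id)

    def mark(status):
        def handler(order_id):
            orders[order_id] = status
        return handler

    # User service: coupons and level upgrades would happen here (elided).
    def user_tasks(order_id):
        bus.publish("transaction.completed", order_id)

    bus.subscribe("order.placed", reserve)
    bus.subscribe("inventory.reserved", create_order)
    bus.subscribe("order.paid", user_tasks)
    bus.subscribe("transaction.completed", mark("completed"))
    bus.subscribe("transaction.failed", restock)
    bus.subscribe("transaction.failed", mark("cancelled"))
    return inventory, orders
```

No service calls another directly; each one only reacts to events and publishes new ones, and the compensation (`restock`) is just another subscriber to the failure topic.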

But, Not Enough

Readers who follow my articles will recognize this as a typical distributed transaction, and the proper approach to distributed transactions is always to be resilient to errors.

As I described in my previous article, handling distributed transactions in an event-driven architecture calls for an arbiter (for example, a crontab job). It periodically detects which events have not been handled correctly and takes corresponding actions to repair them.

I won't repeat the details in this article, but the takeaway is that, in addition to the three domain services, there is another service that monitors the entire workflow.
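A rough sketch of what such an arbiter might check is below, assuming each transaction's latest status and update time are recorded somewhere it can read. It is not necessarily the design described in the previous article: `TransactionRecord`, the status names, the 15-minute timeout and the repair policy (re-publish paid orders, compensate everything else) are all hypothetical. A crontab entry such as `*/5 * * * *` could trigger `reconcile` every five minutes.

```python
from dataclasses import dataclass

TERMINAL_STATUSES = {"completed", "cancelled"}


@dataclass
class TransactionRecord:
    order_id: str
    status: str        # hypothetical values: "reserved", "created", "paid", ...
    updated_at: float  # seconds since the epoch


def find_stuck(records, now: float, timeout_s: float) -> list[TransactionRecord]:
    """Transactions that are not finished and have not moved for too long."""
    return [
        r for r in records
        if r.status not in TERMINAL_STATUSES and now - r.updated_at > timeout_s
    ]


def repair_action(record: TransactionRecord) -> tuple[str, str, str]:
    # Hypothetical policy: a paid order is pushed forward by re-publishing its
    # event; anything earlier is rolled back through compensation.
    if record.status == "paid":
        return ("republish", "order.paid", record.order_id)
    return ("compensate", "transaction.failed", record.order_id)


def reconcile(records, now: float, timeout_s: float = 900) -> list[tuple[str, str, str]]:
    """One arbiter run: list the repair actions for every stuck transaction."""
    return [repair_action(r) for r in find_stuck(records, now, timeout_s)]
```

Because the arbiter may re-publish events, the consumers it targets should be idempotent.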

But, Still Not Enough

Because the entire transaction process is asynchronous, the customer must be able to find out exactly what the status of the order is, what stage it is at, and so on. Therefore, we also need a service that subscribes to the messages sent by each service and has its own database to record the status of all transactions.

This approach is also known as CQRS (Command Query Responsibility Segregation). The entire workflow is observed through a standalone cross-domain service, but unlike the arbiter, this observer does not participate in the workflow, nor does it make any changes to it.

The observer exists only to allow the customer to get a proper overview of the workflow.
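A minimal sketch of such an observer is a read model that folds incoming events into a per-order history and answers status queries. The topic names, the customer-facing stage labels and `OrderStatusView` are hypothetical; in production the view would consume from the broker and persist to its own database rather than an in-memory dict.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from event topics to customer-facing stages.
STAGE_BY_TOPIC = {
    "order.placed": "Order received",
    "inventory.reserved": "Items reserved",
    "order.paid": "Payment confirmed",
    "transaction.completed": "Completed",
    "transaction.failed": "Cancelled",
}


@dataclass
class OrderStatusView:
    """Read model: consumes events from every service, never publishes any."""

    stages: dict[str, list[str]] = field(default_factory=dict)

    def apply(self, topic: str, order_id: str) -> None:
        stage = STAGE_BY_TOPIC.get(topic)
        if stage is None:
            return  # ignore events this view does not care about
        history = self.stages.setdefault(order_id, [])
        if not history or history[-1] != stage:
            history.append(stage)  # skip an immediate redelivery of the same stage

    def current(self, order_id: str) -> Optional[str]:
        history = self.stages.get(order_id)
        return history[-1] if history else None
```

The view only reads events and answers queries, so it can be rebuilt from the event stream or scaled independently without affecting the workflow itself.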

Conclusion

E-commerce websites can be considered the textbook case for backend architecture. Every kind of problem backend engineers deal with shows up on an e-commerce site, but the books currently on the market often present only a monolithic solution and focus on how to implement the functional requirements.

But as backend experts, we all know that running an e-commerce site on a monolith is ultimately a dead end. Unless the products simply do not sell, the site will sooner or later face challenges of traffic and availability.

Although this article records a casual conversation among architects, it actually examines the challenges we encounter in our daily work from a different perspective. Why does a startup e-commerce site need five services (three domain services, an arbiter and an observer)?

  • To add new features more quickly and to have independent integration and testing processes, the system must be split into services by domain.
  • To have high availability and scalability, the coupling between domains must be reduced.
  • For better fault tolerance, a resilient error recovery mechanism must be established.
  • For a better user experience, a comprehensive presentation of the whole picture must be implemented.

In the past, I have often written about the drawbacks of microservices, e.g., in Original Sin of Microservices, Part 1 and Original Sin of Microservices, Part 2.

But this does not mean I reject microservices entirely. Rather, I hope we can all approach microservices with the right attitude, treating them neither as the holy grail nor as the devil.

Making the right trade-off is always the core concept of software architecture.
