Shift from Monolith to CQRS

ChunTing Wu - Jan 14 '22 - Dev Community

Software design is an evolving process. Every large system starts as a small one. When the existing architecture runs into a problem it cannot solve, the system has to evolve. Every evolution comes with technical choices: what problem should be solved, and at what cost? An architect or senior engineer must find a reasonable way to evolve, and a solution is only feasible if it fits the development schedule, the technical stack, and the skill level of the team.

This article introduces the spirit of CQRS (Command Query Responsibility Segregation) and the problems it solves. We will start from a small monolith and evolve it the way every software system evolves, explaining the reasons and approach behind each step.

Traditional Monolith

[Figure: traditional monolith, with the write path on the left and the read path on the right]

This is the most common system design. There is an API server, usually exposing a RESTful API, and a database. The client agrees on the transmission format with the backend in advance. Both reads and writes go through a DTO (data transfer object). When the backend processes business logic, however, it converts the DTO into a domain object that carries domain knowledge, and uses the domain object as the storage unit in the database.

To achieve Read/Write Splitting, the two paths are kept separate. On the write path on the left, the client sends a DTO to the backend, which performs a CUD (create/update/delete) operation on the database and then responds with an Ack for success or a Nak for failure. In a RESTful API, 2xx usually represents success and 4xx or 5xx represents failure. The read path on the right simply obtains the corresponding DTO through a read request.
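The two paths described above can be sketched as follows. This is only a minimal illustration, not the author's implementation: the `UserDTO` and `User` fields, the in-memory `DB` dictionary, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UserDTO:
    """Wire format agreed between client and backend (hypothetical fields)."""
    account: str
    name: str


@dataclass
class User:
    """Domain object that carries domain rules and is the storage unit."""
    account: str
    name: str

    def validate(self) -> None:
        if not self.name.strip():
            raise ValueError("name must not be empty")


# In-memory stand-in for the database (assumption for this sketch).
DB: Dict[str, User] = {}


def write_user(dto: UserDTO) -> int:
    """Write path: DTO -> domain object -> database, then Ack or Nak."""
    user = User(account=dto.account, name=dto.name)
    try:
        user.validate()
    except ValueError:
        return 400  # Nak
    DB[user.account] = user
    return 200  # Ack


def read_user(account: str) -> Optional[UserDTO]:
    """Read path: database -> DTO."""
    user = DB.get(account)
    if user is None:
        return None
    return UserDTO(account=user.account, name=user.name)
```

Even in this tiny form, the write function and the read function share nothing but the storage, which is what makes it possible to optimize them separately later.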

Let me explain further what a DTO means for the client. A DTO on the client usually contains all the data needed to render a screen. For example, when you look at your profile on a social media site, it includes your name, account, and other personal information, as well as your own recent activity and even the activity of accounts you follow. The DTO contains all the information that needs to be presented on this page.

Why emphasize Read/Write Splitting? Can't we use the same procedure on both paths? The reason is that we want to be able to optimize the system later, and the write path and the read path each have their own optimization methods. Take caching: cache-aside (read-aside) caching can be used on the read path to reduce response time, while the write path can be improved with write-through caching. Writes may also be performed asynchronously: all DTOs are put into a message queue and processed by workers in order to handle a huge volume of written data. Moreover, reading and writing can each use the database best suited to it.

Therefore, Read/Write Splitting is essential and should be considered in the early stages of system design. The write path concentrates on data persistence, while the read path concentrates on data queries.
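As a rough sketch of the two caching strategies mentioned above, the class below keeps a cache in front of a database. Both stores are plain dictionaries and the `CachedStore` name and `db_reads` counter are assumptions added only to make the behavior observable.

```python
from typing import Dict, Optional


class CachedStore:
    """A database with a cache in front of it (both are in-memory stand-ins)."""

    def __init__(self) -> None:
        self.database: Dict[str, str] = {}
        self.cache: Dict[str, str] = {}
        self.db_reads = 0  # counts how often the read path hits the database

    def read(self, key: str) -> Optional[str]:
        """Read-aside: try the cache first, fall back to the database on a miss."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value

    def write(self, key: str, value: str) -> None:
        """Write-through: update the database and the cache in the same call."""
        self.database[key] = value
        self.cache[key] = value
```

With this arrangement a repeated read is served from the cache, and a read that follows a write does not need to touch the database at all.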

Nevertheless, there are two main problems in this system design model.

  1. Anemic model. This is also known as the CRUD model. When the backend focuses on data conversion, there is little room left to handle business logic, so business logic ends up scattered everywhere. Domain knowledge also disappears: on an e-commerce website, for example, we should say "purchase" rather than "insert an order record".
  2. Insufficient scalability. From the perspective of system architecture, the database can easily become the bottleneck of the entire system, because both reads and writes must go through it. The problem is even more serious with an RDBMS, which is hard to scale horizontally.

Task-based Monolith

To solve the problems of the traditional monolith above, we introduce the concept of domains.

[Figure: task-based monolith, with messages replacing DTOs on the write path]

This diagram is basically the same as the previous one. The only difference is that messages replace DTOs on the write path. A message contains an action as well as data, not just the data itself as a DTO does. Thus, we can carry domain-specific actions in the message, which makes it easier for the backend to recognize each action and provide a corresponding domain implementation.

At this stage, the C in CQRS has appeared: a message is a kind of command. However, the scalability problem is still unresolved.

In addition, although we have replaced DTOs with messages on the write path, we still need DTOs on the read path. Take the social media site as an example again. When modifying the nickname, the message may look like {"rename": "LazyDr"}. But when rendering the profile, we still need additional information such as activities. This information gap means a lot of processing is needed on the read path to assemble the DTO.
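To make the write side concrete, here is a minimal sketch of how a backend might route a message such as {"rename": "LazyDr"} to a domain method. The `Profile` class, the `handle` function, and the boolean Ack/Nak result are illustrative assumptions, not part of the original design.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Domain object: the behavior lives next to the data it changes."""
    nickname: str

    def rename(self, new_name: str) -> None:
        if not new_name:
            raise ValueError("nickname must not be empty")
        self.nickname = new_name


def handle(profile: Profile, message: dict) -> bool:
    """Dispatch each action in the message to its domain method.

    Returns True for Ack and False for Nak.
    """
    handlers = {"rename": profile.rename}
    for action, data in message.items():
        handler = handlers.get(action)
        if handler is None:
            return False  # unknown action
        try:
            handler(data)
        except ValueError:
            return False  # the domain rule rejected the command
    return True
```

The message names the intent (rename) rather than the row to update, so the rule about empty nicknames stays inside the domain object instead of being scattered across the backend.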

CQS (Command Query Separation)

CQS emerged to solve the pain points of Read/Write Splitting described above.

When reading, the client needs a DTO, so the backend can apply optimizations dedicated to the read path, such as pre-generating the DTO from the original domain object and storing it in a database dedicated to reads.

[Figure: read path in CQS]

In this way, the application service on the read path becomes simpler. It can become a thin read layer that is only responsible for paging, sorting, and so on. When the client sends a request, the DTO can be retrieved directly from the database.

So the question is, who is going to generate those pre-built DTOs? It is the responsibility of the write path.

[Figure: write path in CQS]

Although the diagram looks similar to the earlier examples, the application service must now persist the DTO in addition to the domain object. In other words, most of the business logic is pushed onto the write path, which also has to prepare the various read views.

At this stage, we have solved most of the problems related to the domain, but scaling is still unsolved. Let us define scaling more precisely. It has two different aspects.
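A minimal sketch of this arrangement follows: the write path stores the domain object and also rebuilds the pre-built DTO, while the read path is a single lookup. The two dictionaries stand in for the write and read databases, and the profile fields are assumptions chosen for the example.

```python
from typing import Dict, Optional

domain_db: Dict[str, dict] = {}      # write-side store of domain objects
profile_views: Dict[str, dict] = {}  # read-side store of pre-built DTOs


def post_activity(account: str, text: str) -> None:
    """Write path: persist the domain object, then persist the read view."""
    user = domain_db.setdefault(account, {"account": account, "activities": []})
    user["activities"].append(text)
    # The write path also has to prepare the DTO for the profile page.
    profile_views[account] = {
        "account": account,
        "activity_count": len(user["activities"]),
        "recent": user["activities"][-3:][::-1],  # newest first, at most three
    }


def get_profile(account: str) -> Optional[dict]:
    """Read path: a thin layer that returns the pre-built DTO."""
    return profile_views.get(account)
```

Every new read view would add more work to `post_activity`, which is the extension problem discussed next.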

  1. Traffic: the write volume increases.
  2. Extension: functional requirements increase, such as the need for a variety of read views. Continuing with the social media example, the profile needs one presentation, while the timeline may need another.

CQRS

Why is the write path responsible for preparing the read view? Writing should focus on persistence, and those various read views should not be processed on the write path. But the read path only reads, so who should prepare those read views?

Therefore, the complete solution is as follows.

[Figure: CQRS architecture, with the write path on the left and the read path on the right]

The write path on the left and the read path on the right were introduced in the CQS section. The only difference is the added eventual consistency block, which is responsible for converting the data in the write-side database into the data in the read-side database. Once data synchronization is involved, data consistency issues are possible. Here are several approaches to implementing eventual consistency, ordered from the shortest delay to the longest:

  1. Background thread: The typical example is Redis. After data is written to the primary, Redis asynchronously sends it to the replicas in the background.
  2. Message queue plus workers: This is a common practice for asynchronous data replication. When writing to the database, an event is published to the message queue and processed by the workers.
  3. Extract-Transform-Load (ETL): This has the longest delay, ranging from a few minutes to a few hours. MapReduce or other batch methods are used to write the results to the read side.
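The second approach, a message queue plus workers, can be sketched with the standard library alone. Here `queue.Queue` stands in for a real message broker and a thread stands in for a worker process; the event shape and all names are assumptions for illustration.

```python
import queue
import threading

events: "queue.Queue" = queue.Queue()  # stand-in for the message queue
write_db = {}                          # write-side database
read_db = {}                           # read-side database


def write(account: str, nickname: str) -> None:
    """Write path: persist the data, then publish an event."""
    write_db[account] = nickname
    events.put({"type": "renamed", "account": account, "nickname": nickname})


def worker() -> None:
    """Worker: turn each event into an updated read view."""
    while True:
        event = events.get()
        if event is None:  # sentinel that tells the worker to stop
            events.task_done()
            break
        read_db[event["account"]] = {
            "account": event["account"],
            "display_name": event["nickname"],
        }
        events.task_done()


def start_worker() -> threading.Thread:
    """Start the worker in the background and return its thread."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Between the call to `write` and the moment the worker finishes, `read_db` is stale. That window is the consistency cost of this approach, and calling `events.join()` waits until the read side has caught up.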

Whichever approach is used, a single source of truth is mandatory. That is, if a failure occurs during conversion, the system must be able to recover the unfinished jobs. Therefore, the source data has to be unique and reliable.

Data usually comes in two types:

  1. State: what you see at the moment, such as the balance printed in a bank passbook.
  2. Event: an action that modifies the state, such as each transaction record in a bank passbook.

Actually, we already have messages, and they can be stored as events. For the write path, storing messages in order is very efficient. From the stored messages, you can build different read views according to your needs. This approach is called event sourcing.

But events alone are difficult to use efficiently. To obtain the final result, every conversion must replay the events from beginning to end to rebuild the read view. A hybrid method is therefore ideal: the write path keeps both the state and the events, and the conversion process chooses the data source based on the actual situation.

To sum up, the whole life cycle of data in CQRS is as follows.
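Using the bank passbook example, the hybrid method might look like the sketch below. The write side keeps the current balance (state) together with the ordered transactions (events); the `Passbook` class and the helper functions are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Passbook:
    balance: int = 0                                  # state: what you see now
    events: List[int] = field(default_factory=list)  # events: signed amounts

    def record(self, amount: int) -> None:
        """Append the event and update the state in the same step."""
        if self.balance + amount < 0:
            raise ValueError("insufficient funds")
        self.events.append(amount)
        self.balance += amount


def replay_balance(events: List[int]) -> int:
    """Events only: rebuild the state by replaying from the beginning."""
    balance = 0
    for amount in events:
        balance += amount
    return balance


def total_deposits(events: List[int]) -> int:
    """A different read view that can only be derived from the events."""
    return sum(amount for amount in events if amount > 0)
```

A conversion that only needs the current balance can read `balance` directly instead of replaying, while a view such as `total_deposits` still has the full event history available.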

[Figure: life cycle of data in CQRS]

The data starts at the client and enters the backend in the form of a command. According to the business logic, it is converted into a domain object and stored in the database. These domain objects are converted into various read views and stored in different read-optimized databases according to requirements. Finally, the client retrieves these read views in the form of DTOs.

Conclusion

There are many books and articles describing DDD and CQRS through many patterns. From my point of view, patterns such as Entity, Value Object, and Aggregate limit how people imagine DDD. As a result, most developers feel that DDD is remote from them, hard to grasp, and hard to implement. In fact, the concept of DDD is not so complicated: DDD was proposed to encapsulate business logic and thereby make it easier to extend functional requirements.

CQRS is even simpler. In this article, we started from the process of system evolution, walked through the design process and the problems to be solved, and arrived at CQRS naturally.

There is no silver bullet in system design. Each evolution solves some specific problems, but it may introduce new ones. Take the design process in this article as an example: CQRS seems to resolve all the problems mentioned, the anemic model and insufficient scalability, but it also brings new problems, such as data consistency. Every technical choice has its trade-offs. As long as you understand the risks behind each option, you can choose an approach that is acceptable for your situation.

Even if you choose CQRS, in practice there are still three choices for implementing eventual consistency. System design is the result of continuous selection.

The purpose of this article is to tell you that DDD is not that scary and CQRS is not that complicated. It is just a decision.
