Before explaining how HTAP was introduced at Xiaohongshu, let's briefly introduce Xiaohongshu itself.
Xiaohongshu is often called the Chinese version of Instagram, but in addition to the usual social features such as video and image sharing, it also runs an e-commerce platform. According to Xiaohongshu's official website, its active users exceeded 100 million as of 2019.
As both a social media and an e-commerce site, it generates billions of records every day. To support data-driven decision making, this massive volume of data must go through multiple layers of cleansing, transformation, and aggregation; more importantly, the real-time requirements of decision making must also be addressed.
In addition, Xiaohongshu's application scenarios and user base keep growing, so the scalability of the entire data infrastructure is also a major challenge.
In the end, Xiaohongshu adopted TiDB as its real-time streaming storage.
Why Did Xiaohongshu Choose TiDB?
In fact, Xiaohongshu had already adopted TiDB for some projects in 2017, but it was not widely used until 2018.
The three main reasons are as follows.
- Data-driven decision making generates all kinds of ad hoc OLAP queries, which a traditional OLTP database cannot support.
- Many OLAP databases handle queries efficiently but have poor write performance, whereas TiDB also delivers great write performance.
- When the data cluster scales horizontally, new instances must become available as quickly as possible. TiDB keeps data consistent across all instances through its Raft majority mechanism.
Now, TiDB is widely used across various domains at Xiaohongshu, including:
- Data warehouse applications
- Report analysis
- Real-time dashboard for sales promotion
- Anti-fraud detection
- etc.
Past Pain Points
In general, the entire data architecture is as follows.
In such an architecture, there are three typical use cases that perform poorly.
First, ad hoc queries are run against the production database. To avoid adding load to production, a complete replica of the production database must be maintained, and all ad hoc queries run against that replica.
However, the production MySQL is sharded, which makes operations highly complex. Beyond the operational difficulty, implementing aggregations such as group by and join on top of the sharding middleware also adds implementation complexity, and transactions across the sharded MySQL instances are another big challenge.
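To make this concrete, here is a minimal Python sketch; the shard connections, query text, and FakeShard stand-in are illustrative assumptions rather than Xiaohongshu's actual middleware. A group by over sharded MySQL has to be scattered to every shard and the partial results merged by hand, whereas on a single database the same aggregation is one SQL statement.

```python
# Hypothetical sketch of why "group by" is painful on sharded MySQL:
# the middleware must scatter the query to every shard and merge the
# partial results itself. FakeShard and the query text are assumptions.
from collections import defaultdict

def sharded_group_by_sum(shards, sql):
    """Run the same aggregate on every shard and re-aggregate the partial sums."""
    merged = defaultdict(int)
    for shard in shards:                      # scatter
        for key, partial_sum in shard.query(sql):
            merged[key] += partial_sum        # gather / re-aggregate
    return dict(merged)

class FakeShard:
    """Stand-in for one MySQL shard connection, for demonstration only."""
    def __init__(self, rows):
        self.rows = rows

    def query(self, sql):
        return self.rows

shards = [FakeShard([("cn", 10), ("us", 5)]), FakeShard([("cn", 7)])]
print(sharded_group_by_sum(shards, "SELECT region, SUM(gmv) FROM orders GROUP BY region"))
# {'cn': 17, 'us': 5} -- on a single TiDB cluster this would be one SQL statement
```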
Second, to make report queries more efficient, report analysis results are pre-aggregated and stored in a standalone MySQL instance, avoiding the latency of querying Hadoop directly. However, because the data is pre-aggregated, it cannot respond effectively to changing requirements. Moreover, a standalone MySQL instance without sharding has limited scalability.
Finally, anti-fraud detection in various application scenarios has to rely on data stored in the data lake, which is not real-time but arrives through T + 1 ingestion. This makes it impossible for anti-fraud detection to act in time; even when fraud is detected, the damage is already done.
Solution
The answer to these three cases is TiDB.
For the ad hoc query use case, two pain points must be addressed: the difficulty of operation and the complexity of implementation. The solution is to put all the data into TiDB and maintain a single TiDB cluster. Because TiDB is fully compatible with MySQL 5.7, the data can be synchronized from the MySQL binlog.
Of course, it is not quite that simple. To move the sharded databases into TiDB, the data has to be processed properly, i.e. merged into one big wide table, which means dealing with merge logic and conflicting auto-increment primary keys in the streaming pipeline. Eventually, all production databases are synchronized into TiDB, and even at Xiaohongshu's data volume the synchronization delay stays within 1 second.
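As a rough illustration of the key problem, not Xiaohongshu's actual pipeline, the following Python sketch merges insert events from sharded tables into one wide table by encoding the shard id into the high bits of the primary key, so ids from different shards never collide; the event format and the apply_to_tidb callback are assumptions made for the example.

```python
# Hypothetical sketch: merge binlog INSERT events from several MySQL shards
# into one merged TiDB table without auto-increment primary-key collisions.
# The event format and apply_to_tidb() are illustrative assumptions.

def remap_primary_key(shard_id: int, local_id: int, shard_bits: int = 10) -> int:
    """Encode the shard id into the high bits so ids from different shards never collide."""
    return (shard_id << (63 - shard_bits)) | local_id

def merge_binlog_event(event: dict) -> dict:
    """Rewrite one row event from a shard so it fits the merged wide table."""
    row = dict(event["row"])
    row["id"] = remap_primary_key(event["shard_id"], row["id"])
    row["source_shard"] = event["shard_id"]   # keep provenance for debugging
    return row

def sync(events, apply_to_tidb):
    """Consume a stream of binlog events and apply them to the merged TiDB table."""
    for event in events:
        if event["type"] == "insert":
            apply_to_tidb("orders_merged", merge_binlog_event(event))

if __name__ == "__main__":
    sample = [{"type": "insert", "shard_id": 3, "row": {"id": 42, "amount": 99}}]
    sync(sample, lambda table, row: print(table, row))
```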
The same practice can then be applied to report analysis and anti-fraud detection.
With real-time streaming, anti-fraud detection no longer needs T + 1 to respond, and the stream can be processed directly with Flink SQL. Report analysis, on the other hand, becomes easier because all the data is available in a single database.
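As an example of what such a streaming job could look like, here is a minimal PyFlink sketch; the Kafka topic, connection settings, schema, and the fraud_stats table are all assumptions for illustration, and the real anti-fraud logic is certainly more involved. It counts order events per user per minute with Flink SQL and writes the aggregates into TiDB through its MySQL-compatible JDBC endpoint.

```python
# Minimal PyFlink sketch (assumed topology): count orders per user per minute
# and upsert the aggregates into TiDB, which speaks the MySQL protocol.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: order events from Kafka (topic and broker address are assumptions).
t_env.execute_sql("""
    CREATE TABLE orders (
        user_id BIGINT,
        amount  DOUBLE,
        ts      TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'kafka:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# Sink: an aggregate table in TiDB, reached via the JDBC/MySQL protocol.
t_env.execute_sql("""
    CREATE TABLE fraud_stats (
        user_id BIGINT,
        window_start TIMESTAMP(3),
        order_cnt BIGINT,
        PRIMARY KEY (user_id, window_start) NOT ENFORCED
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:mysql://tidb.example.internal:4000/risk',
        'table-name' = 'fraud_stats',
        'username' = 'risk_writer',
        'password' = '***'
    )
""")

# One-minute tumbling-window counts per user; downstream rules can flag users
# whose counts look abnormal.
t_env.execute_sql("""
    INSERT INTO fraud_stats
    SELECT user_id, TUMBLE_START(ts, INTERVAL '1' MINUTE), COUNT(*)
    FROM orders
    GROUP BY user_id, TUMBLE(ts, INTERVAL '1' MINUTE)
""").wait()  # block so the streaming job keeps running
```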
New Problems Emerged
When TiDB was first introduced, the version used at Xiaohongshu was 3.x. TiDB 3.x still uses a row-based storage engine; although it can achieve decent OLAP performance, it is not good enough for large data volumes. Furthermore, all scenarios shared the same TiDB cluster, so ad hoc queries and production applications could not be isolated from each other.
But these problems were solved after upgrading to TiDB 4.x.
The reason is that TiDB 4.x introduces an HTAP architecture in which a row storage engine and a column storage engine coexist. The new column storage engine, TiFlash, is independent of the original row storage engine, TiKV, so the two do not interfere with each other.
When an application writes data to TiKV, the data is quickly replicated to TiFlash through the Raft learner mechanism, keeping the row store and the column store consistent.
In addition, TiDB's internal cost-based optimizer (CBO) determines, as soon as a query arrives, whether it should use the row or the column engine and routes it automatically, which greatly reduces application complexity.
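For instance, here is a hedged sketch of how this looks from the client side, with host, database, and table names assumed for illustration: a TiFlash replica is added for a table, and EXPLAIN then shows whether the optimizer sends an analytical scan to TiKV or TiFlash.

```python
# Hedged illustration of TiDB 4.x HTAP (connection details are assumptions):
# add one TiFlash (columnar) replica, then let the cost-based optimizer decide
# whether a query is served from TiKV (row store) or TiFlash (column store).
import pymysql

conn = pymysql.connect(host="tidb.example.internal", port=4000,
                       user="root", database="analytics")
with conn.cursor() as cur:
    # Ask TiDB to maintain one columnar replica of the table; data is kept in
    # sync from TiKV to TiFlash through the Raft learner mechanism above.
    cur.execute("ALTER TABLE orders SET TIFLASH REPLICA 1")

    # The optimizer routes this analytical scan automatically; the EXPLAIN
    # output shows whether the cop tasks run on tikv or tiflash.
    cur.execute("EXPLAIN SELECT region, SUM(amount) FROM orders GROUP BY region")
    for row in cur.fetchall():
        print(row)
conn.close()
```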
TiDB 4.x also has a new mechanism that helps improve the performance of the entire data architecture: pessimistic locking. TiDB 3.x only supported optimistic locking, which made inter-transaction conflicts likely under a huge volume of continuous writes.
Once a transaction conflict occurs, it can only be retried on the client side, which significantly increases the complexity of application development. Pessimistic locking effectively solves this problem and makes the application's error handling much simpler.
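A hedged sketch of what this change means for client code follows; the SQL, error handling, and retry policy are illustrative assumptions, while BEGIN OPTIMISTIC and BEGIN PESSIMISTIC are TiDB statements that select the transaction mode explicitly. With optimistic locking the conflict only surfaces at commit time and the application must retry the whole transaction; with pessimistic locking the rows are locked up front, so no retry loop is needed.

```python
# Sketch: client-side handling under TiDB's two transaction modes.
# Error classes and retry policy are assumptions for illustration.
import pymysql

def transfer_optimistic(conn, max_retries=3):
    """Optimistic mode (TiDB 3.x default): a write conflict shows up at COMMIT,
    so the application must retry the whole transaction itself."""
    for _ in range(max_retries):
        try:
            with conn.cursor() as cur:
                cur.execute("BEGIN OPTIMISTIC")
                cur.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
                cur.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")
            conn.commit()              # conflict is only detected here
            return
        except pymysql.err.OperationalError:
            conn.rollback()            # conflicting transaction: retry from scratch
    raise RuntimeError("transaction gave up after retries")

def transfer_pessimistic(conn):
    """Pessimistic mode (TiDB 4.x): conflicting rows are locked up front,
    so the commit does not fail with a write conflict and no retry loop is needed."""
    with conn.cursor() as cur:
        cur.execute("BEGIN PESSIMISTIC")
        cur.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
        cur.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")
    conn.commit()
```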
Conclusion
In fact, Xiaohongshu also evaluated ClickHouse during technology selection, and ClickHouse's performance is somewhat better than TiDB's. However, ClickHouse is relatively difficult to operate and maintain.
In addition, ClickHouse is a column storage engine, and its performance in data-update scenarios is poor. Rewriting updates as appends would require significant modification effort on one hand and extra de-duplication logic on the other.
Therefore, TiDB is the final choice.
From my point of view, HTAP is the future: by combining a row storage engine and a column storage engine, it can cover a wide range of use cases. Using a single database to handle various big-data needs is also cost effective.
In the past, to support a variety of data products, we had to keep choosing among many databases and carefully weigh each use case, but with HTAP these complexities no longer exist.
Personally, I am looking forward to seeing the gradual simplification of data engineering.