Name: Implementing Kafka to Run on S3 with a Hundred Lines of Code
Rating: 4.4 (6828 reviews)
Author: automq

TL;DR
Yes, you read that correctly. AutoMQ[1] can now be built entirely on object storage such as S3. You can refer to the quick start guide[3] to get started immediately. By extending its top-level WAL abstraction with a small amount of code, AutoMQ's existing stream storage engine delivers a capability that competing products treat as a headline feature: running the entire stream system on object storage like S3. Notably, this part of the source code is fully open, so developers can use the S3Stream[2] stream storage engine to deploy a Kafka service entirely on object storage in their own environment, with extremely low storage costs and operational complexity.

AutoMQ's core stream storage engine gains this capability so easily because of its top-level abstraction around WAL and its shared storage architecture. The S3Stream[2] stream storage engine is built on exactly this abstraction. In this article, we share the design details of AutoMQ's shared stream storage engine, the underlying considerations, and how it evolved. After reading this article, you will understand why we say that only about a hundred lines of code are needed to run Kafka on S3.

Starting from the Shared Storage Architecture
Kafka emerged over a decade ago, in an era when Internet Data Centers (IDCs) were the primary deployment environment. At that time, compute and storage resources were typically tightly coupled, forming an integrated Share-Nothing architecture. This architecture was highly effective in the physical data centers of that period, but as public cloud technology matured, its limitations became apparent. Because compute and storage are tightly coupled, the storage layer cannot be fully decoupled, and capabilities such as durability and high availability cannot be offloaded to cloud storage services. As a result, the Share-Nothing architecture cannot benefit from the technical and cost advantages of scalable cloud storage. The integrated compute-storage design also makes Kafka inelastic and difficult to scale: adjusting the capacity of a Kafka cluster involves substantial data replication, which slows down the adjustment and affects normal read and write requests while it is in progress.

AutoMQ is committed to fully leveraging the advantages of the cloud, adhering to a Cloud-First philosophy. Through a shared storage architecture, AutoMQ decouples data durability and offloads it to mature cloud storage services like S3 and EBS, thereby fully exploiting the potential of these cloud storage services. Problems such as lack of elasticity, high costs, and complex operations associated with Kafka due to the Share-Nothing architecture are resolved under AutoMQ's new shared storage architecture.

Stream Storage Top-Level Abstraction: Shared WAL + Shared Object
The core of AutoMQ's shared storage architecture is two abstractions: Shared WAL and Shared Object. Each abstraction can have multiple implementations. Because Shared WAL is an abstraction, its implementation can be moved to any shared storage medium to take advantage of that medium's strengths. Every software design involves trade-offs, and each shared storage medium has its own benefits and drawbacks; the Shared WAL abstraction lets AutoMQ adapt as those trade-offs change, switching the WAL implementation to any shared storage service or even combining several. Shared WAL can be backed by low-latency storage such as EBS and S3 Express One Zone (S3E1Z), providing users with low-latency stream services. Shared Object is primarily built on mature cloud object storage services, so it benefits from their extremely low storage costs and scalability. Because the S3 API has become the de facto standard for object storage, Shared Object can also adapt to a wide range of object storage services, offering users multi-cloud storage solutions.
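
To make the idea of a swappable WAL concrete, here is a minimal sketch in Python. It is not the S3Stream interface (S3Stream is written in Java): the names SharedWAL, BlockDeviceWAL, ObjectStoreWAL, and write_and_trim are hypothetical, and both backends are in-memory stand-ins. It only shows that engine logic written against the abstraction does not change when the backend does.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple


class SharedWAL(ABC):
    """Hypothetical WAL contract; not the actual S3Stream interface."""

    @abstractmethod
    def append(self, payload: bytes) -> int:
        """Store payload and return the offset assigned to it."""

    @abstractmethod
    def recover(self) -> Iterator[Tuple[int, bytes]]:
        """Yield (offset, payload) for every record that has not been trimmed."""

    @abstractmethod
    def trim(self, offset: int) -> None:
        """Discard all records whose offset is <= offset."""


class BlockDeviceWAL(SharedWAL):
    """In-memory stand-in for a low-latency block device such as an EBS volume."""

    def __init__(self) -> None:
        self._records: List[Tuple[int, bytes]] = []
        self._next_offset = 0

    def append(self, payload: bytes) -> int:
        offset = self._next_offset
        self._records.append((offset, payload))
        self._next_offset += 1
        return offset

    def recover(self) -> Iterator[Tuple[int, bytes]]:
        return iter(list(self._records))

    def trim(self, offset: int) -> None:
        self._records = [(o, p) for o, p in self._records if o > offset]


class ObjectStoreWAL(SharedWAL):
    """In-memory stand-in for an object store, with one object per record."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._next_offset = 0

    def append(self, payload: bytes) -> int:
        offset = self._next_offset
        # Zero-padded keys keep lexicographic order equal to offset order.
        self._objects[f"wal/{offset:020d}"] = payload
        self._next_offset += 1
        return offset

    def recover(self) -> Iterator[Tuple[int, bytes]]:
        for key in sorted(self._objects):
            yield int(key.split("/")[1]), self._objects[key]

    def trim(self, offset: int) -> None:
        for key in [k for k in self._objects if int(k.split("/")[1]) <= offset]:
            del self._objects[key]


def write_and_trim(wal: SharedWAL) -> List[bytes]:
    """Engine-side logic that depends only on the abstraction."""
    for payload in (b"a", b"b", b"c"):
        wal.append(payload)
    wal.trim(0)  # pretend record 0 has already reached object storage
    return [payload for _, payload in wal.recover()]
```

Calling write_and_trim(BlockDeviceWAL()) and write_and_trim(ObjectStoreWAL()) runs the same engine code against both backends; only the class passed in differs.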

[Image]

Best Shared WAL Implementation in the Cloud: EBS WAL
WAL (write-ahead logging) was originally used in relational databases to guarantee atomicity and durability. With mature cloud storage services like S3 and EBS, writing the WAL to low-latency storage while asynchronously writing data to low-cost storage like S3 balances latency and cost. AutoMQ is the first in the stream domain to use a WAL based on a shared storage architecture, fully harnessing the advantages of different cloud storage services. We believe EBS WAL is the best implementation for cloud stream storage engines because it combines the low latency and high durability of EBS with the low cost of object storage. Through careful design, it also mitigates the high cost of EBS.

The following diagram illustrates the core implementation process of EBS WAL:

The Producer writes data to EBS WAL through the S3Stream stream storage engine. As soon as the data is persisted to disk, a success response is returned to the client, fully leveraging the low latency and high durability of EBS.
Consumers read newly written data directly from the cache.
Cached data is written to S3 asynchronously, in batches and in parallel; once it has been written, the cached copy is invalidated.
Consumers that need historical data read it directly from object storage.
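
The four steps above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: StreamStoreSketch and its methods are hypothetical names, the WAL, cache, and object store are plain in-memory containers, and the upload is a synchronous call where a real engine would upload asynchronously and in parallel.

```python
from typing import Dict, List, Tuple


class StreamStoreSketch:
    """Toy model of the EBS WAL path; all names here are hypothetical."""

    def __init__(self) -> None:
        self.wal: List[Tuple[int, bytes]] = []    # stands in for the EBS volume
        self.cache: Dict[int, bytes] = {}         # recently written records
        self.object_store: Dict[int, bytes] = {}  # stands in for S3
        self._next_offset = 0

    def produce(self, record: bytes) -> int:
        """Step 1: persist to the WAL, then acknowledge by returning the offset."""
        offset = self._next_offset
        self.wal.append((offset, record))
        self.cache[offset] = record
        self._next_offset += 1
        return offset

    def upload(self) -> int:
        """Step 3: batch-write cached records to object storage.

        A real engine would do this asynchronously and in parallel; here it is
        a plain synchronous call so the example stays deterministic.
        """
        batch = dict(self.cache)
        self.object_store.update(batch)
        self.cache.clear()  # uploaded data no longer needs to be cached
        self.wal.clear()    # assumption: WAL space is reclaimed after upload
        return len(batch)

    def fetch(self, offset: int) -> Tuple[str, bytes]:
        """Steps 2 and 4: serve from the cache if possible, else from object storage."""
        if offset in self.cache:
            return "cache", self.cache[offset]
        return "object_store", self.object_store[offset]


store = StreamStoreSketch()
first = store.produce(b"hello")
before_upload = store.fetch(first)  # served from the cache
store.upload()
after_upload = store.fetch(first)   # served from object storage
```

The point of the sketch is the ordering: the acknowledgement depends only on the WAL write, while the object storage write happens later and off the produce path.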

A common misconception is to confuse the Shared WAL built on EBS with Kafka's tiered storage. The simplest way to tell them apart is to check whether the broker (the compute node) is entirely stateless. In the tiered storage implementations from Confluent and Aiven, brokers are still stateful: Kafka's tiered storage requires the last log segment of each partition to reside on local disk, so local storage remains tightly coupled to the compute-layer brokers. AutoMQ's EBS WAL implementation does not have this limitation. When a broker node crashes, another healthy broker can take over its EBS volume within milliseconds via Multi Attach, write the small, fixed-size WAL data (usually 500MB) to S3, and then delete the volume.
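
The takeover sequence described above (attach the volume, drain the WAL to S3, delete the volume) can be sketched as follows. This is an illustrative model only: Volume and take_over are hypothetical names, a dict stands in for S3, and no real EBS or S3 API is called.

```python
from typing import Dict, List, Optional, Tuple


class Volume:
    """Hypothetical stand-in for an EBS volume holding a failed broker's WAL."""

    def __init__(self, wal_records: List[Tuple[int, bytes]]) -> None:
        self.wal_records = wal_records
        self.attached_to: Optional[str] = None
        self.deleted = False


def take_over(volume: Volume, healthy_broker: str,
              object_store: Dict[int, bytes]) -> int:
    """Drain a failed broker's WAL into object storage, then drop the volume."""
    volume.attached_to = healthy_broker         # models the Multi Attach step
    for offset, payload in volume.wal_records:  # records not yet uploaded
        object_store[offset] = payload
    recovered = len(volume.wal_records)
    volume.wal_records = []
    volume.deleted = True                       # nothing stays tied to the failed broker
    return recovered


s3 = {0: b"already-uploaded"}
failed_volume = Volume([(1, b"pending-1"), (2, b"pending-2")])
recovered_count = take_over(failed_volume, "broker-2", s3)
```

After take_over returns, every record lives in object storage and the volume is gone, which is what lets the failed broker be treated as stateless.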

The Natural Evolution of Shared WAL: S3 WAL

S3 WAL is the natural evolution of the Shared WAL storage architecture. AutoMQ currently supports building the entire storage layer on S3, which is a specific implementation of Shared WAL. This WAL implementation built directly on S3 is what we refer to as S3 WAL. Thanks to the top-level abstraction of Shared WAL and the foundational implementation of EBS WAL, the core processes of S3 WAL are identical to those of EBS WAL. Therefore, the AutoMQ Team was able to support the implementation of S3 WAL within just a few weeks.

Implementing S3 WAL helps AutoMQ expand the boundaries of what it can do. When using S3 WAL, all user data is written to object storage, which increases latency somewhat compared with EBS WAL. In exchange, the overall architecture becomes simpler because it depends on fewer services. On cloud providers such as AWS that do not offer cross-AZ EBS, and in private IDC scenarios that use self-hosted object storage such as MinIO, the S3 WAL architecture provides stronger cross-AZ availability guarantees and more flexibility.

S3 WAL Benchmark
AutoMQ has optimized the performance of S3 WAL significantly, especially its latency. In our test scenarios, the average latency for S3 WAL Append is 168ms, with P99 at 296ms.

Kafka Produce request processing latency averages 170ms, with P99 at 346ms.

Average send latency is 230ms, with P99 at 489ms.

How AutoMQ Achieves S3 WAL with Around a Hundred Lines of Code
In AutoMQ's GitHub repository, you can find the core stream storage library, S3Stream[2]. com.automq.stream.s3.wal.WriteAheadLog defines the top-level abstraction for WAL, while the implementation class ObjectWALService contains just over 100 lines of implementation code for S3 WAL. In this sense, just over 100 lines of implementation code, combined with the existing EBS WAL infrastructure, were enough to build AutoMQ entirely on S3.

Of course, this does not mean that writing a hundred-odd lines of code is all it takes to run Kafka on S3. The line count is only the surface. The key is to thoroughly understand AutoMQ's WAL-based shared storage architecture. Within this framework, the approach is the same whether you are building fully S3-based shared storage today or targeting another shared storage medium in the future. Shared WAL is one of the core components of AutoMQ's architecture, and because the code is organized around this top-level abstraction, the WAL implementation can be moved to any other shared storage medium. The bulk of the work and complexity has already been absorbed by the underlying architecture, so a new WAL implementation only needs to focus on writing and reading WAL data efficiently on the target medium. Because AutoMQ's stream storage engine has already paved the way, once you fully understand the Shared WAL concept and the S3Stream stream storage engine, implementing a fully S3-based S3 WAL is as simple as writing about 100 lines of code.
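
To give a feel for how small a WAL backend can be once the surrounding engine exists, here is a sketch of an object-storage WAL in Python. It is not ObjectWALService: the class name ObjectWALSketch, the key layout, and the batch_size parameter are all assumptions, and a dict stands in for an S3 bucket. The sketch batches records into objects, replays them in offset order, and deletes fully trimmed objects.

```python
from typing import Dict, Iterator, List, Tuple


class ObjectWALSketch:
    """Hypothetical object-storage WAL; a dict stands in for an S3 bucket."""

    def __init__(self, bucket: Dict[str, List[bytes]], batch_size: int = 2) -> None:
        self.bucket = bucket
        self.batch_size = batch_size
        self._pending: List[bytes] = []
        self._pending_start = 0  # offset of the first pending record
        self._next_offset = 0

    def append(self, payload: bytes) -> int:
        """Buffer a record and upload one object per full batch.

        Simplification: the offset is returned immediately. A real WAL would
        acknowledge the append only after the object upload has succeeded.
        """
        offset = self._next_offset
        self._pending.append(payload)
        self._next_offset += 1
        if len(self._pending) >= self.batch_size:
            self.flush()
        return offset

    def flush(self) -> None:
        """Write pending records as a single object keyed by its start offset."""
        if not self._pending:
            return
        # Zero-padded keys keep lexicographic order equal to offset order.
        self.bucket[f"wal/{self._pending_start:020d}"] = list(self._pending)
        self._pending_start = self._next_offset
        self._pending = []

    def recover(self) -> Iterator[Tuple[int, bytes]]:
        """Replay uploaded records in offset order, for example after a restart."""
        for key in sorted(self.bucket):
            start = int(key.rsplit("/", 1)[1])
            for i, payload in enumerate(self.bucket[key]):
                yield start + i, payload

    def trim(self, offset: int) -> None:
        """Delete objects whose records all have offsets <= offset."""
        for key in list(self.bucket):
            start = int(key.rsplit("/", 1)[1])
            if start + len(self.bucket[key]) - 1 <= offset:
                del self.bucket[key]


bucket: Dict[str, List[bytes]] = {}
wal = ObjectWALSketch(bucket, batch_size=2)
offsets = [wal.append(p) for p in (b"a", b"b", b"c")]
wal.flush()
replayed = list(wal.recover())
```

Batching is the main design choice here: grouping several records into one object reduces the number of PUT requests, at the price of holding records in memory until the batch is uploaded.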

Summary
This article has explained the core concept behind AutoMQ's storage architecture, a shared storage architecture based on Shared WAL, by walking through the reasoning behind it and how it evolved. AutoMQ will continue to build on this abstraction to strengthen the stream storage engine and deliver a more powerful Kafka stream service for everyone. S3E1Z WAL will also be officially released in the near future, so please stay tuned.

References
[1] AutoMQ: https://github.com/AutoMQ/automq
[2] S3Stream: https://github.com/AutoMQ/automq/tree/main/s3stream
[3] Direct S3 Cluster Deployment: https://docs.automq.com/automq/getting-started/deploy-direct-s3-cluster