Introduction
When it comes to Event Driven Architectures, people tend to think of the Fan-Out pattern, or maybe Event Sourcing. But many other patterns work well with events.
When you run multiple microservices, each with its own database, you may face data propagation challenges.
Data propagation challenge in microservices
Let's take a simple example with a blog:
- User microservice: handles everything user-related and stores basic information about registered users in its own users database.
- Article microservice: is responsible for CRUD operations on published articles.
But here is the thing: an article is written and edited by a user. So if a user updates some of their information, for example their username, every article has to be updated to show the new name instead of the old one.
This is where data propagation comes in. There are other solutions to tackle this problem, but data propagation is a popular one.
The purpose of data propagation is to detect data changes and notify the interested microservices, so that each one can handle the change in its own way. In our example, we simply want to update the username in the article microservice whenever it is updated in the user microservice, as follows.
1 - The user creates an article
2 - The user changes their username by calling only the user microservice
3 - The username update must be propagated to the article microservice
Common approaches
A pretty common but suboptimal approach I often see is to use the Fan-out pattern: fire events from the business logic or data layer of the microservices and publish them to a topic. The problem with this approach is that you have to write code to fire an event whenever a data change occurs, which adds complexity to those layers.
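To make that cost concrete, here is a minimal Python sketch of the fan-out approach. Everything in it is hypothetical and only for illustration: `Topic` is an in-memory stand-in for a real topic, and `UserService`, `rename` and the event field names are assumptions, not part of any real library.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Topic:
    """Hypothetical in-memory stand-in for a real topic."""
    subscribers: list[Callable[[dict], None]] = field(default_factory=list)

    def publish(self, event: dict) -> None:
        # Fan-out: every subscriber receives its own copy of the event.
        for handler in self.subscribers:
            handler(event)


class UserService:
    def __init__(self, topic: Topic) -> None:
        self.users: dict[str, str] = {}  # user_id -> username
        self.topic = topic

    def rename(self, user_id: str, new_username: str) -> None:
        old_username = self.users[user_id]
        self.users[user_id] = new_username
        # The business logic itself has to remember to fire the event.
        # Every other write path needs the same extra code.
        self.topic.publish({
            "type": "UsernameChanged",
            "userId": user_id,
            "oldUsername": old_username,
            "newUsername": new_username,
        })


topic = Topic()
received: list[dict] = []
topic.subscribers.append(received.append)

service = UserService(topic)
service.users["u1"] = "alice"
service.rename("u1", "alice_w")
```

Note how `rename` mixes two concerns: changing the data and announcing the change. If the write succeeds and the publish fails (or the other way around), the two services silently drift apart.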
So, how does the CDC pattern help us do this in a better way?
The concept of Change Data Capture
CDC consists in determining what changed in a database and reacting to that change.
In Event Driven Architectures, CDC reacts to changes by emitting events through topics, queues or streams.
A good example of a CDC-oriented database is AWS DynamoDB. With the DynamoDB Streams option activated, every change to every item of the table is sent through a stream as a stream record (records are grouped into shards). Each record contains the nature of the change and, depending on the configured stream view type, the previous and new state of the item.
You can react to these records natively with AWS Lambda functions, for example, but also from any of your applications using the AWS SDK for the programming language you're using.
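As an illustration, a Lambda function reacting to such a stream could look like the sketch below. It assumes the table stores the name in a `username` attribute and that the stream view type is `NEW_AND_OLD_IMAGES`; the function names and the returned dictionary are my own choices for the example.

```python
def extract_username_changes(event: dict) -> list[dict]:
    """Collect username changes from a DynamoDB Streams Lambda event."""
    changes = []
    for record in event.get("Records", []):
        # eventName is INSERT, MODIFY or REMOVE.
        if record.get("eventName") != "MODIFY":
            continue
        images = record["dynamodb"]
        # OldImage and NewImage are both present only with the
        # NEW_AND_OLD_IMAGES view type. Values use the DynamoDB
        # attribute format, e.g. {"S": "alice"} for a string.
        # "username" is an assumed attribute name.
        old = images.get("OldImage", {}).get("username", {}).get("S")
        new = images.get("NewImage", {}).get("username", {}).get("S")
        if old is not None and new is not None and old != new:
            changes.append({"oldUsername": old, "newUsername": new})
    return changes


def handler(event: dict, context: object) -> dict:
    """Lambda entry point (the handler name is configurable)."""
    changes = extract_username_changes(event)
    # A real function would now update the articles or forward the change.
    return {"usernameChanges": changes}


sample_event = {
    "Records": [
        {
            "eventName": "MODIFY",
            "dynamodb": {
                "Keys": {"userId": {"S": "u1"}},
                "OldImage": {"userId": {"S": "u1"}, "username": {"S": "alice"}},
                "NewImage": {"userId": {"S": "u1"}, "username": {"S": "alice_w"}},
            },
        },
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"userId": {"S": "u2"}},
                "NewImage": {"userId": {"S": "u2"}, "username": {"S": "bob"}},
            },
        },
    ]
}
result = handler(sample_event, None)
```

The user microservice code contains nothing about events here: the database emits the change, and the handler only has to decide what to do with it.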
The outbox pattern alternative
Also, if you are in a relational context and work with an SQL database, you should be aware of the Outbox pattern. The Transactional Outbox pattern consists in recording each event in an outbox table within the same RDBMS transaction as the data change, and sending it only once that transaction is committed, so that data does not get out of sync across the distributed system.
A well-known technology for implementing this pattern is Debezium. It watches the changes committed in an RDBMS by reading its transaction log, and sends events to a Kafka topic whenever the data changes.
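Leaving Debezium aside for a moment, the core idea of the outbox can be sketched with nothing more than Python's built-in `sqlite3` module. The table layout, the `rename_user` and `relay_outbox` functions and the event fields are assumptions made up for this example. The relay here simply polls the table, whereas a log-based tool would pick up the committed rows for you.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id TEXT PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO users (id, username) VALUES ('u1', 'alice');
""")


def rename_user(user_id: str, new_username: str) -> None:
    # `with conn` commits both writes together, or rolls both back on error.
    with conn:
        (old_username,) = conn.execute(
            "SELECT username FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        conn.execute(
            "UPDATE users SET username = ? WHERE id = ?",
            (new_username, user_id),
        )
        payload = json.dumps({
            "userId": user_id,
            "oldUsername": old_username,
            "newUsername": new_username,
        })
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("UsernameChanged", payload),
        )


def relay_outbox(publish) -> int:
    """Publish pending outbox rows, then mark them as published."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute(
                "UPDATE outbox SET published = 1 WHERE id = ?", (row_id,)
            )
    return len(rows)


published: list[dict] = []
rename_user("u1", "alice_w")
relayed = relay_outbox(published.append)
```

Because the event row and the username update share one transaction, there is never a committed change without its event, or an event without its change. If the relay crashes between publishing and marking a row, that event is sent again on the next run, so consumers should tolerate duplicates.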
Using Outbox pattern with Debezium in our context
In our example, we have an SQL database, so Debezium should be a good fit to achieve our data propagation.
- The username is updated in the users database
- The change is detected by Debezium
- Debezium fires an event into a topic with the previous userName and the new userName
- The article microservice is subscribed to the topic and reacts to the event by updating all the articles written by the previous userName, setting their authorName to the new userName.
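The last step of the list above can be sketched as follows. The message mimics the general shape of a Debezium change event (an `op` code plus `before` and `after` row states), but the `userName` column, the in-memory `articles` list and the `handle_user_change` function are assumptions for illustration. A real service would read the message from Kafka and run an UPDATE against its own database.

```python
import json

# Hypothetical in-memory article store standing in for the articles database.
articles = [
    {"id": 1, "title": "Hello CDC", "authorName": "alice"},
    {"id": 2, "title": "Outbox 101", "authorName": "bob"},
    {"id": 3, "title": "Streams", "authorName": "alice"},
]


def handle_user_change(message: str) -> int:
    """Apply a change event and return the number of articles updated."""
    event = json.loads(message)
    # Depending on converter settings, the envelope may be wrapped in "payload".
    payload = event.get("payload", event)
    if payload.get("op") != "u":  # "u" marks an update
        return 0
    before = payload.get("before") or {}
    after = payload.get("after") or {}
    old_name, new_name = before.get("userName"), after.get("userName")
    if not old_name or not new_name or old_name == new_name:
        return 0
    updated = 0
    for article in articles:
        if article["authorName"] == old_name:
            article["authorName"] = new_name
            updated += 1
    return updated


message = json.dumps({
    "payload": {
        "op": "u",
        "before": {"id": "u1", "userName": "alice"},
        "after": {"id": "u1", "userName": "alice_w"},
    }
})
updated_count = handle_user_change(message)
```

Handling the same message twice is harmless in this sketch: after the first pass no article carries the old name anymore, so the second pass updates nothing.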
Conclusion
We saw that there are several ways to achieve data propagation in a distributed system. Fan-out can bring extra code and complexity that could instead be handled by technologies built around CDC or the Outbox pattern, like DynamoDB Streams, Debezium, Redis pub/sub...
As always, these methods should be used when you feel they solve the problem you're facing, not applied everywhere without consideration. But it's good to know they exist, to avoid introducing unnecessary complexity into the code.