Intro
GraphQL is a powerful and flexible query language for APIs that enables clients to request exactly the data they need, eliminating over-fetching and under-fetching of information. However, as GraphQL queries become more complex and involve multiple data sources, it can be challenging to efficiently retrieve and serve data to clients. This is where GraphQL data loaders come into play.
GraphQL data loaders are a critical component in optimizing GraphQL APIs. They are designed to tackle the notorious N+1 query problem, which occurs when a GraphQL server issues one query for a list of items and then a separate query for the related data of each item. Data loaders streamline the process of fetching data from various sources, such as databases, APIs, or even local caches, by batching and caching requests. By doing so, they reduce the number of calls made to those sources and improve the performance of GraphQL queries.
In this tutorial we will take a deep dive into the batching feature and explore how it works by looking at the Java implementation of the data loader.
Batching
Batching is the process of collecting multiple individual data retrieval requests into a single batch request, thus reducing the number of calls made to data sources. This is especially crucial when dealing with relationships in GraphQL queries.
Consider a typical scenario where a GraphQL query requests a list of items and, for each item, additional related data such as user information. Without batching, this would result in a separate database query or API request for each item, leading to the N+1 query problem. With batching, these individual requests can be combined into a single request, drastically reducing the number of round-trips to the data source.
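To make the difference concrete, here is a minimal sketch that counts requests against a hypothetical in-memory backend. The class name, the `fetchOne` and `fetchMany` helpers, and the sample data are assumptions made for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory "backend" that counts how many requests it receives.
class BatchingSketch {
    static final Map<Long, String> NAMES = Map.of(2L, "Jane", 3L, "Bob", 4L, "Alice");
    static int requests = 0;

    // One request per id: this is the pattern behind the N+1 problem.
    static String fetchOne(long id) {
        requests++;
        return NAMES.get(id);
    }

    // One request for the whole list of ids: the batched pattern.
    static List<String> fetchMany(List<Long> ids) {
        requests++;
        List<String> result = new ArrayList<>();
        for (long id : ids) {
            result.add(NAMES.get(id));
        }
        return result;
    }

    // Number of backend requests when every id is fetched on its own.
    static int countUnbatched(List<Long> ids) {
        requests = 0;
        for (long id : ids) {
            fetchOne(id);
        }
        return requests;
    }

    // Number of backend requests when all ids are fetched together.
    static int countBatched(List<Long> ids) {
        requests = 0;
        fetchMany(ids);
        return requests;
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(2L, 3L, 4L);
        System.out.println("unbatched requests = " + countUnbatched(ids)); // 3
        System.out.println("batched requests = " + countBatched(ids));     // 1
    }
}
```

The unbatched count grows with the number of items, while the batched count stays at one per level of the query.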
Java Data Loader Batching
Let’s say we have a GraphQL query like the one below:
{
  user {
    name
    friends {
      name
    }
  }
}
It generates the following query result
{
  "user": {
    "name": "John",
    "friends": [
      {
        "name": "Jane"
      },
      {
        "name": "Bob"
      },
      {
        "name": "Alice"
      }
    ]
  }
}
A naive implementation would perform a call to retrieve a user object for every user in the query response, i.e. 4 calls, one for the root object and one for each friend in the list.
However, the DataLoader does not perform the remote calls immediately. It just enqueues each call and returns a promise (a CompletableFuture) to deliver a user object. Once we have enqueued all the calls that build our query result, we must ask the DataLoader to start executing them. This is where the magic happens: the DataLoader extracts the user id from each enqueued call and puts it into a list, which is then used to query the backend we configured and retrieve all those users with just one request.
Batching usually happens by levels. In this case we have 2 levels: the root user and his friends. With DataLoader batching, this response requires just 2 calls.
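The deferral described above relies on how CompletableFuture behaves: a callback can be attached before any value exists, and it only runs once the future is completed. The following is a minimal sketch of that mechanism using only the JDK. The class name and the event strings are made up for illustration, and no data loader is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class DeferredSketch {
    // Records the order of events when a callback is attached to a future
    // before its value is available.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        // "Enqueue": hand out a promise without doing any work yet.
        CompletableFuture<String> promise = new CompletableFuture<>();
        promise.thenAccept(name -> events.add("callback received " + name));
        events.add("callback registered, nothing loaded yet");
        // "Dispatch": the value arrives later and the callback fires.
        promise.complete("John");
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The callback line is always recorded second, because nothing runs until the promise is completed.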
The code
Let’s look at some code to show how it is used.
The first thing we need is a BatchLoader. It loads users from the user backend in batches, thus reducing the number of API calls to that backend.
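import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Function;
import java.util.stream.Collectors;
import org.dataloader.BatchLoader;

// Simulated backend call. It returns one user per requested id, in the same order as userIds,
// because a batch function is expected to return values aligned with its keys.
List<User> loadUsersById(List<Long> userIds) {
    System.out.println("Api call to load users = " + userIds);
    Map<Long, User> usersById = users.stream().collect(Collectors.toMap(User::id, Function.identity()));
    return userIds.stream().map(usersById::get).toList();
}
// Wraps the backend call in the asynchronous BatchLoader interface.
BatchLoader<Long, User> userBatchLoader = new BatchLoader<>() {
    @Override
    public CompletionStage<List<User>> load(List<Long> userIds) {
        return CompletableFuture.supplyAsync(() -> loadUsersById(userIds));
    }
};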
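The snippets in this section rely on a User type and a users collection that are not shown. The sketch below fills them in with a hypothetical record and sample data chosen to match the output shown later, so treat the field names and values as assumptions. It also shows a lookup that returns results in the order of the requested ids, which is what java-dataloader expects from a batch function.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class UserBackend {
    // Hypothetical User type; the article does not show its definition.
    record User(long id, String name, List<Long> friends) { }

    // Hypothetical sample data.
    static final List<User> USERS = List.of(
            new User(1L, "John", List.of(2L, 3L, 4L)),
            new User(2L, "Jane", List.of()),
            new User(3L, "Bob", List.of()),
            new User(4L, "Alice", List.of()));

    // Returns one entry per requested id, in the same order as userIds.
    // In this sketch an unknown id yields null at that position.
    static List<User> loadUsersById(List<Long> userIds) {
        Map<Long, User> byId = USERS.stream()
                .collect(Collectors.toMap(User::id, Function.identity()));
        return userIds.stream().map(byId::get).toList();
    }

    public static void main(String[] args) {
        // Ids requested out of storage order still come back aligned with the request.
        System.out.println(loadUsersById(List.of(4L, 2L)));
    }
}
```

Filtering the stored list with `userIds.contains(...)` would instead return users in storage order, which only lines up with the keys when they happen to be requested in that same order.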
Then we need to create a DataLoader that uses the previous BatchLoader to load the whole user tree.
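// DataLoaderFactory comes from org.dataloader (java-dataloader).
var userLoader = DataLoaderFactory.newDataLoader(userBatchLoader);
var userDTO = new UserDTO();
// Level 1: enqueue the root user. Nothing is fetched yet.
userLoader.load(1L).thenAccept(user -> {
    userDTO.id = user.id();
    userDTO.name = user.name();
    // Level 2: runs once the root user is available and enqueues every friend.
    user.friends().forEach(friendId -> {
        userLoader.load(friendId).thenAccept(friend -> {
            userDTO.friends.add(new FriendDTO(friend.id(), friend.name()));
        });
    });
});
// Dispatch the queued loads and block until no queued loads remain.
userLoader.dispatchAndJoin();
System.out.println(userDTO);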
It will produce the following debug output
Api call to load users = [1]
Api call to load users = [2, 3, 4]
UserDTO{id=1, name='John', friends=[FriendDTO[id=2, name=Jane], FriendDTO[id=3, name=Bob], FriendDTO[id=4, name=Alice]]}
If you are curious about how this works internally, here is a custom implementation of the user DataLoader. It is not the real one, just a simplified version to help you get the whole picture.
static class UserLoader {
    BatchLoader<Long, User> userBatchLoader;

    // A pending lookup: the requested id and the future handed back to the caller.
    record QueueEntry(long id, CompletableFuture<User> value) { }

    List<QueueEntry> loaderQueue = new ArrayList<>();

    UserLoader(BatchLoader<Long, User> userBatchLoader) {
        this.userBatchLoader = userBatchLoader;
    }

    // Enqueues the lookup and returns a future; no backend call happens here.
    CompletableFuture<User> load(long userId) {
        var future = new CompletableFuture<User>();
        loaderQueue.add(new QueueEntry(userId, future));
        return future;
    }

    // Dispatches batches until no lookups remain in the queue, one batch per level.
    List<User> dispatchAndJoin() {
        List<User> joinedResults = dispatch().join();
        List<User> results = new ArrayList<>(joinedResults);
        while (loaderQueue.size() > 0) {
            joinedResults = dispatch().join();
            results.addAll(joinedResults);
        }
        return results;
    }

    // Sends every queued id to the BatchLoader in a single call.
    CompletableFuture<List<User>> dispatch() {
        var userIds = new ArrayList<Long>();
        final List<CompletableFuture<User>> queuedFutures = new ArrayList<>();
        loaderQueue.forEach(qe -> {
            userIds.add(qe.id());
            queuedFutures.add(qe.value());
        });
        // Clear the queue first, so callbacks triggered below enqueue the next level.
        loaderQueue.clear();
        var userFutures = userBatchLoader.load(userIds).toCompletableFuture();
        return userFutures.thenApply(users -> {
            // Results are matched to futures by position, so they must be in key order.
            for (int i = 0; i < queuedFutures.size(); i++) {
                queuedFutures.get(i).complete(users.get(i));
            }
            return users;
        });
    }
}
First, look at CompletableFuture<User> load(long userId). It does not perform any userId lookup; it just:
- Enqueues the lookup.
- Produces a CompletableFuture that lets you chain further lookups on the result of the one you requested. The lookups are therefore deferred until we explicitly request their execution with dispatchAndJoin().
Now, look at List<User> dispatchAndJoin(). It is called once we are ready to retrieve the user list. It will:
- Call CompletableFuture<List<User>> dispatch(), which performs the following actions:
  - Groups all queued userIds into one list and sends it to the underlying BatchLoader, which performs the actual API call to the backend.
  - Completes each CompletableFuture that was handed out when the lookup was registered (when we called CompletableFuture<User> load(long userId)). Completing a future runs the callbacks chained to it, which may add more elements to loaderQueue. At this point the userId lookups for the next level have been enqueued.
- Repeat the process while there are elements remaining in loaderQueue.
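The steps above can be traced with a condensed, dependency-free variant of the simplified loader. This is a sketch under stated assumptions: `BatchFn` is a stand-in for the library's BatchLoader, the User record and sample data are hypothetical, and the dispatch loop swaps the queue for a fresh list instead of clearing it. It records the id list of every batch sent to the backend.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

class SimplifiedLoaderDemo {
    // Stand-in for the library's BatchLoader, so the sketch needs no dependency.
    interface BatchFn<K, V> {
        CompletionStage<List<V>> load(List<K> keys);
    }

    // Hypothetical User type and sample data.
    record User(long id, String name, List<Long> friends) { }

    static final Map<Long, User> USERS = Map.of(
            1L, new User(1L, "John", List.of(2L, 3L, 4L)),
            2L, new User(2L, "Jane", List.of()),
            3L, new User(3L, "Bob", List.of()),
            4L, new User(4L, "Alice", List.of()));

    static class Loader {
        record QueueEntry(long id, CompletableFuture<User> value) { }

        final BatchFn<Long, User> batchFn;
        List<QueueEntry> queue = new ArrayList<>();

        Loader(BatchFn<Long, User> batchFn) {
            this.batchFn = batchFn;
        }

        // Enqueue only; the backend is not touched here.
        CompletableFuture<User> load(long id) {
            var future = new CompletableFuture<User>();
            queue.add(new QueueEntry(id, future));
            return future;
        }

        // One batch per level: completing futures may enqueue the next level.
        void dispatchAndJoin() {
            while (!queue.isEmpty()) {
                var pending = queue;
                queue = new ArrayList<>();
                var ids = pending.stream().map(QueueEntry::id).toList();
                var users = batchFn.load(ids).toCompletableFuture().join();
                for (int i = 0; i < pending.size(); i++) {
                    pending.get(i).value().complete(users.get(i));
                }
            }
        }
    }

    // Loads the root user and its friends, returning the ids of every batch sent.
    static List<List<Long>> run() {
        List<List<Long>> batches = new ArrayList<>();
        var loader = new Loader(ids -> {
            batches.add(ids);
            return CompletableFuture.completedFuture(ids.stream().map(USERS::get).toList());
        });
        List<String> friendNames = new ArrayList<>();
        loader.load(1L).thenAccept(user ->
                user.friends().forEach(id ->
                        loader.load(id).thenAccept(friend -> friendNames.add(friend.name()))));
        loader.dispatchAndJoin();
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [[1], [2, 3, 4]]
    }
}
```

The first batch contains only the root user. Completing its future runs the chained callback, which enqueues the three friends, and the loop then sends them as the second batch.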
References
https://www.graphql-java.com/documentation/batching/
https://github.com/graphql-java/java-dataloader