Rails N + 2 queries

Bruno Vezoli - Jun 6 '20 - - Dev Community

We usually try to solve performance problems by using #includes to get rid of N + 1 queries but this doesn't
always fix the issue; in fact, it can even create more queries under some circumstances. This post
shows a few examples of when that happens and how to deal with it.

Your typical N + 1 query problem

We learn how to deal with ActiveRecord performance issues with very plain examples, most blog posts show models that
look something like this:

class Post < ApplicationRecord
  has_many :comments
end

class Comment < ApplicationRecord
  belongs_to :post
end
Enter fullscreen mode Exit fullscreen mode

Followed by your standard irb example:

irb> posts = Post.all
irb> all_comments = posts.map { |p| p.comments }
Enter fullscreen mode Exit fullscreen mode

Which generates the following queries, where we can clearly see the N + 1 query problem: Rails is doing one query
to retrieve the posts and N additional queries to retrieve the comments, where N is the number of posts.

Post Load (0.5ms)  SELECT  "posts".* FROM "posts"
Comment Load (0.8ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1  [["post_id", 1]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1  [["post_id", 2]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1  [["post_id", 3]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1  [["post_id", 4]]
Comment Load (0.6ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1  [["post_id", 5]]
Enter fullscreen mode Exit fullscreen mode

And that the solution is to preload the data by calling the #includes method over the posts collection:

irb> posts = Post.includes(:comments)
irb> all_comments = posts.map { |p| p.comments }
Enter fullscreen mode Exit fullscreen mode

Which in turn produces the following optimized queries:

Post Load (0.5ms)  SELECT  "posts".* FROM "posts"
Comment Load (1.4ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4, $5)  [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
Enter fullscreen mode Exit fullscreen mode

And so we start using #includes on all our ActiveRecord queries and move on.

A little ways down the road: changes to queries invalidate data preloading

Time passes and requirements change, it's only normal. Now we need to only show posts that are uncensored and we change
our query to reflect that:

irb> posts = Post.includes(:comments)
irb> all_comments = posts.map { |p| p.comments.where(censored: false) }
Enter fullscreen mode Exit fullscreen mode

Can you guess the amount of queries this is going to generate? I'll give you a hint: it's on the name of the post.

Post Load (0.5ms)  SELECT  "posts".* FROM "posts"
Comment Load (1.4ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4, $5)  [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
Comment Load (6.4ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 1], ["censored", false]]
Comment Load (0.5ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 2], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 3], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 4], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 5], ["censored", false]]
Enter fullscreen mode Exit fullscreen mode

Now we have the N + 1 queries we had in the beginning and also an additional query to preload data we are not going to use
(which will increase our application's memory footprint but that's a story for another day). You may wonder why doesn't Rails
just iterate the comments collection and select posts that are not censored. Well, in this simple example, it could. But if we start doing
some more complex queries and adding raw SQL fragments it's going to get a lot more difficult.

Adding Bullet to log N + 1 queries

I agree that Bullet is a must have in any project, no matter how big or small. It's
very difficult to catch every N + 1 in every query we build and it's even harder to detect cases like this where we should no longer
preload the data.

After installing bullet what we get is the following warning:

AVOID eager loading detected
  Post => [:comments]
  Remove from your query: .includes([:comments])
Enter fullscreen mode Exit fullscreen mode

Awesome! It knows about our unused preload and it tells us to remove it; let's do that.

irb> posts = Post.all
irb> all_comments = posts.map { |p| p.comments.where(censored: false) }
Enter fullscreen mode Exit fullscreen mode

And let's also check our logs for the generated queries and watch out for warnings from bullet.

Post Load (0.5ms)  SELECT  "posts".* FROM "posts"
Comment Load (6.4ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 1], ["censored", false]]
Comment Load (0.5ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 2], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 3], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 4], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 5], ["censored", false]]
Enter fullscreen mode Exit fullscreen mode

So we are back to square one but this time we have no bullet warnings, so what should we do?

What are the solutions to preloading custom queries?

There are two solutions (at least that I know of) to this performance problem. The first one is to use Rails' preloader
but as you may guess from the :nodoc: directive that's a private class not meant to be used outside the framework. I'm not going
to even talk about how to preload using that class but if you are curious here's a nice post
on how to deal with N + 1 queries on GraphQL using Rails' preloader.

The second solution, and the one I'm going to explain here, is using a scoped association and preloading it instead of the
comments association. This requires us to add one more line to our Posts model:

class Post < ApplicationRecord
  has_many :comments
  # we need to specify a new name, a lambda to filter the comments and the model class name
  has_many :uncensored_comments, -> { where(censored: false) }, class_name: 'Comment'
end
Enter fullscreen mode Exit fullscreen mode

And change our code to get the comments using the association:

irb> posts = Post.all
irb> all_comments = posts.map { |p| p.uncensored_comments }
Enter fullscreen mode Exit fullscreen mode

Bingo! We get the following warning from bullet:

GET /posts
USE eager loading detected
  Post => [:uncensored_comments]
  Add to your query: .includes([:uncensored_comments])
Enter fullscreen mode Exit fullscreen mode

And sure enough if we add that preload

irb> posts = Post.includes(:uncensored_comments)
irb> all_comments = posts.map { |p| p.uncensored_comments }
Enter fullscreen mode Exit fullscreen mode

We no longer get the warning and our queries are optimized:

Post Load (0.5ms)  SELECT  "posts".* FROM "posts"
Comment Load (0.9ms)  SELECT "comments".* FROM "comments" WHERE "comments"."censored" = $1 AND "comments"."post_id" IN ($2, $3, $4, $5, $6)  [["censored", false], ["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
Enter fullscreen mode Exit fullscreen mode

Caveats: Evaluate performance optimizations

As with most performance optimizations you should really measure and evaluate the changes you are about to make. It doesn't
really make sense to add an association to your models every time you want to preload, sometimes it is better to have a small
performance penalty rather than a model full of associations.

But if your queries are taking too long I really encourage you to add the corresponding associations and preload the data you need.

Bonus: how to preload belongs_to associations

This method also works for cases when you need to just fetch one record. Let's use the blog example and add a use case where
we need the most liked comment from each Post:

irb> posts = Post.includes(:comments)
irb> most_liked_comments = posts.map { |p| p.comments.order(likes: :desc).first }
Enter fullscreen mode Exit fullscreen mode

Once again we have an N + 1 and no warning from bullet

Post Load (0.4ms)  SELECT "posts".* FROM "posts"
Comment Load (0.9ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2  [["post_id", 1], ["LIMIT", 1]]
Comment Load (0.7ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2  [["post_id", 2], ["LIMIT", 1]]
Comment Load (0.6ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2  [["post_id", 3], ["LIMIT", 1]]
Comment Load (0.3ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2  [["post_id", 4], ["LIMIT", 1]]
Comment Load (0.4ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2  [["post_id", 5], ["LIMIT", 1]]
Enter fullscreen mode Exit fullscreen mode

And here is where has_one comes to the rescue, let's modify our Posts model one last time

class Post < ApplicationRecord
  has_many :comments
  has_many :uncensored_comments, -> { where(censored: false) }, class_name: 'Comment'
  # rails will automatically limit the number of records for us
  has_one :most_liked_comment, -> { order(likes: :desc) }, class_name: 'Comment'
end
Enter fullscreen mode Exit fullscreen mode

Bullet now complains about data not being preload us and gives us the solution to our problems

irb> posts = Post.includes(:most_liked_comment)
irb> most_liked_comments = posts.map { |p| p.most_liked_comment }
Enter fullscreen mode Exit fullscreen mode

And we get neat SQL queries once more

Post Load (0.4ms)  SELECT "posts".* FROM "posts"
Comment Load (0.4ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4, $5) ORDER BY "comments"."likes" DESC  [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
Enter fullscreen mode Exit fullscreen mode

Notice though that in this case we don't get the LIMIT clause on the SQL query and so Rails loads all these comments on memory and then
loads them on each post, that's also something to consider: memory usage vs SQL query time. As in most cases there's not a clear
answer and you should really measure to see your specific case.

If you like this post be sure to check out the original post and checkout other related goodies.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Terabox Video Player