Preventing Useless Database Hits

Molly Struve (she/her) - Dec 16 '18 - - Dev Community

After reading the title of this post many of you are probably thinking...

DUH, I already know how to do that. But let’s just hold up a minute, because it might not be as obvious as you think. For example, how many of you have written code like this? I know I have!

User.where(:id => user_ids).each do |user|
  # Lots of user processing 
end
Enter fullscreen mode Exit fullscreen mode

This code looks pretty good, right? If there are no user_ids this block will skip all of the user processing. That seems like a pretty good deal, so it must be fine. Unfortunately, that assumption is not quite right. Let me explain why.

When you execute that where clause even with an empty array it is actually going to hit MySQL.

(pry)> User.where(:id => [])
User Load (1.0ms)  SELECT `users`.* FROM `users` WHERE 1=0
=> []
Enter fullscreen mode Exit fullscreen mode

Notice the where 1=0 statement at the end of the SQL. That is how ActiveRecord ensures no records are returned. Sure, it's a quick 1ms query, but if you are executing this chunk of code millions and millions of times that quick query can easily overwhelm your database and slow you down. So how do you update this code to make it more performant?

You have two options. The first is by not running the MySQL lookup unless you absolutely have to. You can do this by doing an easy peasy array check using Ruby before you execute the block.

return unless user_ids.any?
User.where(:id => user_ids).each do |user|
  # Lots of user processing 
end
Enter fullscreen mode Exit fullscreen mode

By doing this you can save yourself a worthless database hit and ensure that your database is not being overwhelmed by useless calls. In addition, this will also speed up your code. Say you were running this code 10k times. It is going to take you over half a second to run that MySQL lookup 10k times.

(pry)> Benchmark.realtime do                                                           
>   10_000.times { User.where(:id => []) }                                                    
> end                                                                                          
=> 0.5508159045130014
Enter fullscreen mode Exit fullscreen mode

If instead, you skip that MySQL query by checking to see if any user_ids are present first, then running a similar code block 10k times takes less than a hundredth of a second!

(pry)> Benchmark.realtime do                                                           
>   10_000.times do                                                                            
>       next unless ids.any?                                                                     
>       User.where(:id => [])                                                                   
>   end                                                                                        
> end                                                                                          
=> 0.0006368421018123627
Enter fullscreen mode Exit fullscreen mode

As you can see, there's a significant time difference between hitting MySQL unnecessarily 10k times and executing plain old Ruby 10k times. This difference can have a significant impact on the performance of your application. A lot of people will look at this chunk of code

User.where(:id => user_ids).each do |user|
  # Lots of user processing 
end
Enter fullscreen mode Exit fullscreen mode

and the first thing they say is well “Ruby is slow.” But that could not be further from the truth since we just saw that the pure Ruby code is hundreds of times faster! In this case, Ruby is not slow, hitting the database is slow! Keep an eye out for situations like this in your code where you might be making database calls you don't expect.

Now some folks might be looking at this code and thinking I’m not exactly writing code like that. Actually, I chained a bunch of scopes to my where clause

users = User.where(:id => user_ids).active.short.single
Enter fullscreen mode Exit fullscreen mode

so I NEED to pass that array of empty user_ids otherwise the scope chain breaks. Thankfully, even though ActiveRecord doesn't handle empty arrays well, it does give you an option for handling empty scopes and that is the none scope.

none is an ActiveRecord query method that allows you to return a chainable relation with zero records without querying the database. Let’s see this in action. From earlier we know that a where clause with an empty set of IDs is going to hit the database.

(pry)> User.where(:id => []).active.tall.single
  User Load (0.7ms)  SELECT `users`.* FROM `users` WHERE 1=0 AND `users`.`active` = 1 AND `users`.`short` = 0 AND `users`.`single` = 1
=> []
Enter fullscreen mode Exit fullscreen mode

But, if we replace that where clause with the none scope you can see no database call is made and we can still chain our scopes together.

(pry)> User.none.active.tall.single
=> []
Enter fullscreen mode Exit fullscreen mode

Be on the lookout for tools like this in your frameworks that will allow you to work smarter with empty datasets. Even more importantly, NEVER ever assume your framework or gem is not making a database call when asked to process an empty dataset. For more tips on finding calls like these, checkout my blog post on logging.

Real Life Application

This concept of using Ruby to prevent database hits is not just limited to MySQL; it can apply to any type of database! Where we found it useful at Kenna was when it came to building what we call reports. Every night we build these colorful PDF reports for our clients.

These reports start with a reporting object, which holds all the information needed to build that report. Then, to build that pretty report page every night we have to make over 20 Elasticsearch requests along with multiple requests to Redis and MySQL.

We did a lot of work to make sure all these requests were super fast, but it was still taking us hours to build the reports. Eventually, the number of saved reports grew so much that we couldn’t finish them all over night. When my team and I started trying to figure out how to solve this problem, the first thing we did was jump in a console to take a close look at what data our existing reports contained. After some digging, we found that out of the 25k reports in our system over a third of them were blank!!!

(pry)> Report.blank_reports.count
=> 10805
Enter fullscreen mode Exit fullscreen mode

This means they contained no data, and if the reports contained no data, then what was the point of making all of those Elasticsearch, MySQL, and Redis requests when we knew they were going to return nothing?

LIGHTBULB! Don’t hit the databases if the report is empty! By skipping the reports that had no data we took our processing time from over 10 hrs down to 3. By adding a simple line of Ruby

def build(report)
  return if report.blank?
  # Processing
end
Enter fullscreen mode Exit fullscreen mode

we were able to prevent a bunch of worthless database hits, which in turn, sped up our processing tremendously. This strategy of using Ruby to shield your databases from requests, I like to refer to it as using database guards. In practice it’s simple, but I think its one of the easiest things to overlook when you are writing code. Every database hit is using resources so make them all count!

If you are interested in other ways to prevent database hits using Ruby checkout my Cache Is King speech from RubyConf which is what inspired this post.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Terabox Video Player