I know, how dare I suggest running a script in production? I am a Site Reliability Engineer; I should never condone such craziness. But the truth is, there will likely come a time when you need to run a script in production to update or clean up some data. In this post, I am going to give you some tips on how to write and execute a script in production as safely as possible.

1) Track Your Progress

Nothing is worse than writing a giant block of code, pasting it into a console, then hitting enter and watching it sit there. You have no idea where the code is in the script or what it is doing and that, at least for me, is terrifying.

For this reason, you always want to output some sort of progress meter from your scripts. This allows you to follow along and know where you are in your process. If you are using Ruby, consider some well-placed puts statements. Below is a script that we recently used at DEV to clean up some incorrectly cached data. Notice the puts statements throughout the script that allow us to follow along as it does its work.

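# tag_ids is assumed to already hold the ids of the affected tags
invalid_articles = []
Tag.where(id: tag_ids).find_each do |tag|
  # Print how many taggings this tag has so we know how much work is ahead
  puts "Tag #{tag.id}: #{tag.taggings.count} taggings"

  tag.taggings.each_with_index do |tagging, index|
    # Progress meter: print the index every 100 taggings
    puts index if (index % 100).zero?
    article = tagging.taggable
    next unless article

    result = article.update(cached_tag_list: article.tags.pluck(:name).join(", "))
    if result
      puts "Article update success #{article.id}"
    else
      puts "Article update failure #{article.id}"
      # Keep the failures so they can be inspected after the run
      invalid_articles << article
    end
  end
end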

Also notice that we are keeping track of any invalid articles that we might find while running this script. Especially when you are cleaning up bad data, always assume you might stumble across more of it and prepare for that in your script. Here we use an if/else statement to catch any invalid articles. You could also use a begin/rescue block.
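As a rough sketch of the begin/rescue variant, the snippet below swaps `update` for `update!`, which in Rails raises an exception when a record fails validation instead of returning false. `FlakyArticle` and `refresh_tag_lists` are hypothetical names, and the Struct is only a stand-in so the example runs outside of a Rails console. It is an illustration under those assumptions, not the script used at DEV.

```ruby
# Hypothetical stand-in for an ActiveRecord model so the sketch runs without Rails.
FlakyArticle = Struct.new(:id, :cached_tag_list, :broken) do
  # Mimics update!: raises on bad data instead of returning false.
  def update!(cached_tag_list:)
    raise ArgumentError, "invalid record" if broken

    self.cached_tag_list = cached_tag_list
    true
  end
end

# Updates each article, collecting failures instead of aborting the whole run.
def refresh_tag_lists(articles, new_tag_list)
  failed_articles = []
  articles.each do |article|
    begin
      article.update!(cached_tag_list: new_tag_list)
      puts "Article update success #{article.id}"
    rescue StandardError => error
      # One bad record should not stop the rest of the script
      puts "Article update failure #{article.id}: #{error.message}"
      failed_articles << article
    end
  end
  failed_articles
end

articles = [FlakyArticle.new(1, "old", false), FlakyArticle.new(2, "old", true)]
failed = refresh_tag_lists(articles, "ruby, rails")
puts "Failed article ids: #{failed.map(&:id).inspect}"
```

The rescue is scoped to a single record on purpose: the loop keeps going, and the failures are still available for inspection once the run finishes.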

2) Record Before and After States

When you are updating records, there is always a chance something will go off the rails. To be able to "roll back", record the before state as you make the updates. If we update our script above to do this, here is what it would look like.

invalid_articles = []
# Maps article id => the cached tag list it had before this script touched it
before_update_tag_lists = {}
Tag.where(id: tag_ids).find_each do |tag|
  tag.taggings.each_with_index do |tagging, index|
    puts index if (index % 100).zero?
    article = tagging.taggable
    next unless article
    # Record the current cached tag list for every article
    before_update_tag_lists[article.id] = article.cached_tag_list

    result = article.update(cached_tag_list: article.tags.pluck(:name).join(", "))
    if result
      puts "Article update success #{article.id}"
    else
      puts "Article update failure #{article.id}"
      invalid_articles << article
    end
  end
end

If anything goes wrong while this script is running, the before_update_tag_lists hash has all of our original data in it. Using this original data, we can loop back through the articles and restore the old lists if necessary.
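The rollback loop itself could look something like the sketch below. `FakeArticle`, `roll_back_tag_lists`, and the `articles_by_id` lookup are hypothetical and exist only so the example runs without Rails; in a real console you would presumably look each record up with something like `Article.find_by(id: article_id)` instead.

```ruby
# Hypothetical stand-in for an ActiveRecord model so the sketch runs without Rails.
FakeArticle = Struct.new(:id, :cached_tag_list) do
  # Mimics update: sets the attribute and reports success.
  def update(cached_tag_list:)
    self.cached_tag_list = cached_tag_list
    true
  end
end

# Restores every article to the value recorded before the script ran.
# Returns the ids that could not be rolled back (missing record or failed update).
def roll_back_tag_lists(before_update_tag_lists, articles_by_id)
  failed_rollback_ids = []
  before_update_tag_lists.each do |article_id, original_tag_list|
    article = articles_by_id[article_id]
    if article && article.update(cached_tag_list: original_tag_list)
      puts "Article rollback success #{article_id}"
    else
      puts "Article rollback failure #{article_id}"
      failed_rollback_ids << article_id
    end
  end
  failed_rollback_ids
end

articles_by_id = {
  1 => FakeArticle.new(1, "new, list"),
  2 => FakeArticle.new(2, "other, new")
}
# Article 3 was recorded but no longer exists in the lookup
before_update_tag_lists = { 1 => "ruby", 2 => "rails, sre", 3 => "deleted" }
failed_ids = roll_back_tag_lists(before_update_tag_lists, articles_by_id)
puts "Could not roll back: #{failed_ids.inspect}"
```

The rollback follows the same habits as the original script: it prints progress for every record and collects the ones it could not restore.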

3) Write Production Quality Code

When you are writing a script, it can be tempting to use as little syntax as possible. Usually, this means throwing in single-letter variables everywhere. You probably won't ever use this code again, so why waste time making it look pretty and readable? Because a pretty, readable script is easier to understand and follow, and a script that is easy to understand will help you avoid writing bugs.

In my script examples above, I clearly name each object I am working with. This allows nearly anyone to look at the script and understand what it is doing. This leads me to my next script writing tip.
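To make the contrast concrete, here is a small, made-up example of the same logic written both ways. The data and both method names are hypothetical and only meant to illustrate the naming point.

```ruby
# Hypothetical data: article id => tag names.
TAG_NAMES_BY_ARTICLE_ID = { 1 => ["ruby", "rails"], 2 => ["sre"] }.freeze

# Terse version: it works, but a reviewer has to decode h, k, v, and r.
def terse(h)
  r = {}
  h.each { |k, v| r[k] = v.join(", ") }
  r
end

# Descriptive version: same behavior, readable at a glance.
def build_cached_tag_lists(tag_names_by_article_id)
  cached_tag_lists = {}
  tag_names_by_article_id.each do |article_id, tag_names|
    cached_tag_lists[article_id] = tag_names.join(", ")
  end
  cached_tag_lists
end

puts build_cached_tag_lists(TAG_NAMES_BY_ARTICLE_ID).inspect
```

Both methods return the same hash. The only difference is how quickly someone else can confirm that.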

4) Have Your Script Reviewed

The same way you never want to push code out to production without a code review, you shouldn't run a script in production without a code review. This is another reason to make sure your script is understandable and readable: you want someone else to be able to figure out what it is doing.

We all know the value a second set of eyes on our code brings. Even if you find yourself in a situation where time is tight and you need to run a script ASAP, try as hard as you can to get a second set of eyes on it. I can't tell you the number of times a fresh set of eyes has kept me from botching a script update.

5) Use Screen or Tmux for Long Running Scripts

Tmux and Screen allow you to start a shell session on the remote host and keep that session alive even through network disruptions. This ensures that if you lose your SSH connection while your script is running, the script run will not be interrupted. Thanks @kinduff for the reminder!

Alejandro AR • Jun 17 '20

If the task takes a good amount of time, you risk being disconnected from the SSH session, your internet provider, etc.

To avoid this, I recommend running these scripts (although, I do not recommend running scripts like this at all) using screen. It's super easy to use:

SSH into the desired instance
Start a new screen session using screen
Run the long running script
Press Ctrl + a followed by d to detach the session
You can now close everything, even the SSH session
You can reattach to the screen session using screen -r

Screen has a lot of awesome features, so make sure to check out the man page.

Running a script in production is never ideal, but if you use these tips when you do it, it can make the experience much less daunting.

Happy scripting!
