Originally posted on indeliblebluepen.com.
The old adage holds true – when you have a hammer, everything looks like a nail. The past ten+ years of the programming industry have demonstrated that in a most unfortunate way. We dynamically allocate everything.
One disclaimer before I begin: Dynamic allocation is an earth-shatteringly vital tool, without which most feats of programming would be at best impractical, and at worst impossible. Do not throw the baby out with the bathwater. If you're writing any sort of serious software, you will have to dynamically allocate something at some point. The key is knowing, not if, but when to use dynamic allocation.
The problem, then, is exactly as my opening proverb suggests: we've stopped discerning when we should use this tremendously powerful tool, and we've started just using it for everything. It's a little like the Vicks Vapo-Rub of the programming world – some people seriously believe that it cures everything, but misuse causes considerable pain, tears, and misery.
That's a pretty bold statement, as I basically just threw water on the programming philosophy of a number of major software firms, several high-level programming languages, and a good portion of C++'s standard library. If you weren't standing there, arms crossed, waiting for a solid case, twice encased in steel and asbestos, I'd be very disappointed.
In order to give you just that, I need to start out with an explanation of what dynamic allocation is. Another disclaimer – I really can't take full credit for this analogy. I first picked it up from Robert Nystrom, in his surprisingly humorous book Game Programming Patterns – a book I strongly believe should be owned and read by every single programmer with a pulse.
In explaining the whole problem of cache misses, Nystrom paints the following picture.
“Imagine you're an accountant in a tiny office. Your job is to request a box of papers and then do some accountant-y stuff with them…you can finish an entire box in, say, a minute. There's a little problem, though. All of those boxes are stored in a warehouse in a separate building. To get a box, you have to ask the warehouse guy to bring it to you. He goes and gets a forklift and drives around the aisles until he finds the box you want. It takes him, seriously, an entire day to do this.” – Robert Nystrom, Game Programming Patterns, pg. 270
That little snippet hardly does the two-page analogy justice, so you'll just have to go read it yourself. The author made the entire book free to read online, though I highly recommend buying a copy for your bookshelf.
If you've known me for any length of time, you know I am an absolute sucker for good analogies, and this one immediately made my list. I wound up repurposing the idea of a disgruntled warehouse guy to explain many issues of RAM management, not least of which being dynamic allocation.
That warehouse, of course, is RAM, and there are a lot of people that need to store stuff in there. In order to keep things sane, Warehouse Guy has numbered all of the shelves. Every person that needs space in the warehouse is assigned a few shelves to use however they want. That's the stack.
Often, however, someone needs to store more than will fit on those shelves. Maybe this Office Person doesn't even know how much space they'll need at the start of the workday. The trouble is, Warehouse Guy needs to know exactly how many shelves to set aside for you; you can't give him some open-ended number. The massive shared area of the warehouse where these extra requests go is the heap, and a lot of people need to use it.
So, you begin working, and you realize that you need space for three pallets. You summon Warehouse Guy and request three shelves. Warehouse Guy gets on his forklift, drives around, and finds three consecutive open shelves. Then, he gives you the shelf numbers. The world is right again.
You keep working, and decide that you need five more shelves next to those three. You request those from Warehouse Guy, but when he goes back to the warehouse, he finds that there is something else stored next to your three shelves. So, he finds eight consecutive open shelves and moves your pallets for you. Then, he gives you the new shelf numbers. Great, problem solved.
Within ten minutes, you decide you need another four shelves in that batch. Back to the warehouse, poor Warehouse Guy goes, getting a little irritated by your constant demands.
Two more shelves. Oops, don't need four of those again. Now you need seven more. Six more. Nine more. Two more. Just one more! Hmm, maybe five more. Twelve more? Sixteen more? Nine more? Four more?
By this point, Warehouse Guy wants to tell you to take a permanent vacation to the warmer land that no one speaks of, but he's too out of breath to talk. You've not only wasted his time, but he's been slower in responding to everyone else's requests, too. After all, there's only him and three other guys to take care of the warehouse! (If you fellow propeller heads haven't figured it out yet, this is a quad-core machine.) Most of those other people are just as annoyingly demanding as you, and now his boss (read: the computer user) is screaming at him – “WHY ARE YOU SO SLOW?”
Absurd scenario? Not at all – I have just described the normal behavior of many standard library data structures, among other things. Of course, to equate my story with a real computer, we have to multiply those numbers by a few million, but the principle remains the same. Dynamic allocation really does take quite a few CPU cycles (a cycle is roughly one instruction).
Warehouse Guy is now regretting several career choices, and wondering “Why can't they just ask for a hundred shelves up front and be done with it?”
His suggestion, in fact, is a very smart design choice, and one that is advocated by many expert programmers, including Jason Gregory, a developer at Sony's Naughty Dog Studios, and author of Game Engine Architecture. In its extreme form, this strategy necessitates custom memory allocators, object pools, and other brilliant design patterns that scare off the less serious programmers.
But in its more common form, Warehouse Guy's suggestion is simple to execute. In C++, if we need to store between 15 and 200 integers, with a rare chance of more, the common habit is to think, “Hmm, well, I don't want to create an array of 200 objects. I could run out of space, or I might be wasting space. I'll just use std::vector.” And that is where the mistake is made. We've skipped the whole process of thinking about how much space we're likely to use, and instead we are abusing Warehouse Guy as if he doesn't have anything better to do.
In many cases, the wiser approach is to dynamically allocate an array of, say, 50 integers. Worst case, we store only 15 integers and leave 35 slots unused (140 bytes, assuming 4-byte integers) for however long that array exists. Even on a system with 256 MB (256 million bytes) of RAM, that doesn't really equate to much. (This isn't an excuse for laziness in memory usage; that, however, is a topic for another article altogether.)
If we wind up needing to store more than those 50 integers, we ask Warehouse Guy for another 50 shelves. Okay, yes, he has to move 50 pallets over, but he isn't complaining – had we asked for more shelf space every single time, he would have moved at least one pallet to a new shelf 50 times by now. We saved him a lot of work by timing our requests this way, and now we can store 50 more integers before we need more space. Even if we go nuts and store 3,000 integers by the time we're done with our array, we've only bothered Warehouse Guy 60 times, as opposed to 3,000 times.
Yes, this design pattern takes extra time and effort on our part. We have to anticipate the most common number of elements we need, and the likelihood that we'll need more beyond that. For all that extra work, however, we are actually making our program run much faster.
How much of an impact does this really have on performance, especially on our modern machines? “I have a 3.75 GHz quad core processor!” you crow. “It doesn't matter how many times I dynamically allocate!”
Oh, really? I guarantee that you have been annoyed at how slow your computer is at least once this week. Something lagged, took too long to load, or froze up for some length of time. If you had pulled up your system monitor, you would have seen that one or more CPU cores were maxed out.
Let's put that into English – 3.75 GHz means your CPU is processing 3.75 BILLION INSTRUCTIONS PER SECOND! If a single program is causing the CPU to go from 5% to 100% for a mere five seconds, that means it has used (and/or wasted) some 3.5625 billion instructions every second, nearly 18 billion over those five seconds! If the program is set up to use all four cores, you can multiply that number by four again.
Given the fact that we really aren't doing all that much significantly new from what we did ten years ago (graphics being a massive exception), something is seriously wrong with this picture. Not so long ago, we were getting by with processor speeds in the MHz (mere millions of cycles per second). Yipes.
Have you ever wondered why it is that Windows 98 could boot to usability (you can click on something and have it respond) in under a minute, while Windows 8.1 takes ten minutes? It isn't all bloatware – my own machine exhibited that problem, and it only had three common startup applications and no malware. A large majority of Windows machines (between XP and 8.1) that I've repaired show the same symptoms.
I fail to see why, when we compare two versions of the same software with largely the same features and design, we should find the old one is fast on a ten-year-old personal computer, and the new one lags on a three-day-old computer with the latest operating system.
The problem is, far too many programmers, even some in the big offices, have chucked good design out the window in favor of saving themselves a few neuron sparks of effort. Yes, dynamic allocation is only one piece of that puzzle, but I assert that it is probably a very large piece. We have yet to reap the benefits of Moore's Law because we are collectively hobbling our computers!
Now, I must also digress. I am not claiming that we should dismiss standard library data structures and tools. They have many critical uses, and I use them on a regular basis, albeit judiciously. They are ideal for those situations (among others) where you need to store a bunch of large objects, anywhere between 1 and infinity, every few minutes.
I am also not claiming that we should hang up our high-level languages, in which nearly everything is dynamically allocated. I love Python, and habitually try to find reasons to use it. Writing code in a low-level language like C++ that can be just as efficiently and elegantly built in a high-level language is, in general, a waste of time.
We should, however, take a step back and re-evaluate how many of our coding habits are good design, and how many are just because we're too plum lazy to find a better way.
My software company runs on a very simple design principle: if our software can run responsively and efficiently on a run-of-the-mill Windows 98 machine, it will blow the doors off of any modern software on the market. It's actually not such a novel idea.
So, it is here that I shall make my petition. Please, for the love of all things digital, don't annoy the warehouse guy! He works hard, and he has a lot to do.
Editorial Note: Before anyone protests, yes, I'm aware that some standard data structures don't allocate one element at a time. This is an argument for knowing your abstractions. You should understand how the data structure you're using allocates memory; many really do behave as badly as I describe.
Cover photo credit: "40+216 Faces" by bark | Licensed under CC-BY 2.0.