Name: In Defense of Defensive Programming
Rating: 3.6 (8274 reviews)
Author: bytebodger

[NOTE: In this article I reference a validation library that I wrote called allow. It's now in an NPM package that can be found here: https://www.npmjs.com/package/@toolz/allow]

My regular readers (both of them) know that I've written a lot about the integrity of values that are passed between different parts of an application. Sometimes, we add manual validations. Sometimes, these values aren't checked at all. Sometimes, we check them at compile time, but we assume they'll be correct at runtime (I'm looking dead at you, TypeScript).

Whatever the approach, I've only recently become aware that the term "defensive programming" is generally used as a pejorative by many programmers. My impression is that "defensive programming" is often interpreted as "jumping through a ridiculous number of hoops to validate data - data that probably doesn't really need to be validated at all." And I don't entirely disagree with this assessment. But I fear some may have become so averse to the idea of defensive programming that they don't recognize the other loopholes they're incorporating into their own code.

Basic Assumptions

Let's ensure that we're all on "the same page" here. I'm sure there are multiple definitions for defensive programming. So, for the sake of this article, this is the definition I'll be using:

Defensive Programming: The practice of treating all inputs to a program as "unknown" - hostile, even. This practice guards such inputs from the main application flow until they've been validated as conforming to the "expected" type/value/format.

I'm focusing on inputs. It would be possible to validate data within the same code block where it was defined. And such a practice would certainly be defensive. But it would also be extreme. And silly.

But inputs represent the strongest case for defensive programming. Because inputs come from... somewhere else. And you don't want this program to be aware of the inner workings of another program for it to do its business. You want this program to be a standalone unit. But if this program stands alone, then it must also assume that any input to the program is potentially hostile.

Validation Hell

This is where "defensive programming" becomes a dirty word. When we talk about validating all of our inputs, we fear it will lead to something like this:

const calculatePassAttemptsPerGame = (passAttempts = 0, gamesPlayed = 0) => {
  if (isNaN(passAttempts)) {
    console.log('passAttempts must be a number.');
    return;
  }
  if (isNaN(gamesPlayed)) {
    console.log('gamesPlayed must be a number.');
    return;
  }
  if (gamesPlayed === 0) {
    console.log('Cannot calculate attempts-per-game before a single game has been played.');
    return;
  } 
  return passAttempts / gamesPlayed;
}

The function has inputs. And the function shouldn't be aware of where those inputs originated. Therefore, from the perspective of the function, the inputs are all potentially dangerous.

That's why this function already has some significant baggage attached to it. We can't necessarily trust that passAttempts or gamesPlayed are numbers. Because passAttempts and gamesPlayed are inputs to this program. And if we feel the need to program "defensively", we end up stuffing extra validations inside our program.

Honestly, the validations shown above aren't even adequate, as far as I'm concerned. Because, while we're ensuring that the inputs are numbers, we're not validating that they're the right kind of numbers.

Think about this: If we're logging the pass attempts per game, does it make sense that either value could be negative? Would it make sense if either of them were fractional?? I can't remember the last time a player threw 19.32 passes in a single game. I can't remember the last time a player played in -4 games. And if we want to ensure that our function is truly equipped to always provide the most logical returns, we should also ensure that it is always given the most logical inputs. So if we really wanted to go all-in on defensive programming techniques, we'd add even more validations to ensure that the inputs are non-negative integers.
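For illustration, here's roughly what that all-in manual version might look like. It's only a sketch: it leans on the built-in Number.isInteger(), and the log-and-return-undefined behavior simply mirrors the earlier example rather than being a recommendation.

```javascript
// A sketch of the "all-in" manual version: both inputs must be non-negative integers.
const calculatePassAttemptsPerGame = (passAttempts = 0, gamesPlayed = 0) => {
  // Number.isInteger() rejects strings, Booleans, NaN, Infinity, and fractions.
  if (!Number.isInteger(passAttempts)) {
    console.log('passAttempts must be an integer.');
    return;
  }
  if (passAttempts < 0) {
    console.log('passAttempts cannot be negative.');
    return;
  }
  if (!Number.isInteger(gamesPlayed)) {
    console.log('gamesPlayed must be an integer.');
    return;
  }
  if (gamesPlayed < 1) {
    console.log('Cannot calculate attempts-per-game before a single game has been played.');
    return;
  }
  return passAttempts / gamesPlayed;
};

console.log(calculatePassAttemptsPerGame(312, 16));  // 19.5
console.log(calculatePassAttemptsPerGame(19.32, 1)); // logs a message, then undefined
console.log(calculatePassAttemptsPerGame(312, -4));  // logs a message, then undefined
```

That's four checks and a dozen extra lines wrapped around a single division.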

But who really wants to do all of that?? All we wanted was a simple function that returns the result of passAttempts divided by gamesPlayed, and we ended up with a bloated mess of code. Writing all of those defensive validations feels laborious and pointless.

So how do we avoid the nuisances of defensive programming? Well, here are the approaches (excuses) that I most frequently encounter.

Missing The Forest For The Trees

Picture a photograph of a wooded hillside. Is it a bunch of trees? Or is it a single forest? Of course, depending upon your frame of reference, it may be either (or both). But it can be dangerous to assume that the photograph shows no "trees" and only a single "forest".

Similarly, what do you see when you look at code like this?

const calculatePassAttemptsPerGame = (passAttempts = 0, gamesPlayed = 0) => {
    //...
}

const calculateYardsPerAttempt = (totalYards = 0, passAttempts = 0) => {
    //...
}

const getPlayerName = (playerId = '') => {
    //...
}

const getTeamName = (teamId = '') => {
  //...
}

Is this one program (a "forest")? Or is it a bunch of individual programs ("trees")??

On one hand, they're presented in a single code example. And they all seem related to some kind of central player/team/sport app. And it's entirely possible that these functions will only ever be invoked in a single runtime. So... they're all part of a single program (a "forest"), right??

Well, if we think beyond our overly-simplistic example, the simple fact is that we should always be trying to write our functions as "universally" as possible.

This means that the function might only ever be used in the context of this particular example. But the function also might be referenced dozens of different times across the app. In fact, some functions prove to be so utilitarian that we end up using them across multiple applications.

This is why the best functions operate as standalone, atomic units. They are their own "thing". And as such, they should be able to operate irrespective of the broader app from which they're called. For this reason, I believe, religiously, that:

Every single function is: a program.

Of course, not everyone agrees with me on that front. They argue that each function is a tree. And they only need to worry about the inputs that are provided to their overall program (the forest).

This gives devs a convenient way to avoid the headaches of acid-testing their code. They look at the example above and they say things like, "No one will ever pass a Boolean into getPlayerName() because getPlayerName() is only ever called from within my program and I know that I'll never pass something stupid into it - like a Boolean." Or they say, "No one will ever pass a negative number into calculateYardsPerAttempt() because calculateYardsPerAttempt() is only ever called from within my program and I know that I'll never pass something stupid into it - like a negative number."

If you're familiar with logical fallacies, these counterarguments basically fall under Appeal to Authority. These devs treat the program as the "authority". And they simply assume that, as long as the input is provided from somewhere else within the same program, there will never be any problems. In other words, they say, "The inputs to this function will be fine because 'the program' says they're fine."

And that is fine - as long as your app is minuscule. But as soon as your app grows to the point that it's a "real", robust app, this appeal falls flat. I don't know how many times I've had to troubleshoot code (often... my code), when I realized that something was failing because the wrong "kind" of data was passed into a function - even though the data came from somewhere else inside the same program.

If there are (or will ever be) two-or-more devs on the project, this "logic" is woefully insufficient. Because it relies on the silly idea that anyone else who works on the project will never ever call a function in the "wrong" way.

If the project is (or will ever be) large enough that it's impractical to expect a single developer to have the entire program in their head, this "logic" is, again, woefully insufficient. If an end-user can put ridiculous values in a form field, then it's equally true that another programmer can try to call your function in a ridiculous way. And if the logic inside your function is so brittle that it blows up whenever it receives bad data - then your function sucks.
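Here's a contrived sketch of how that can happen inside a single program. The helpers addGameYards() and readYardsField() are hypothetical names invented for this illustration. The only "mistake" is that one internal caller forgets that form values arrive as strings.

```javascript
// Hypothetical internal helpers - every call below originates inside the same program.
const addGameYards = (totalYards = 0, gameYards = 0) => totalYards + gameYards;

const calculateYardsPerAttempt = (totalYards = 0, passAttempts = 0) => {
  return totalYards / passAttempts;
};

// Stand-in for a value read from a form field, where values arrive as strings.
const readYardsField = () => '250';

// Another dev wires the field straight into the helper and forgets to convert it.
const totalYards = addGameYards(readYardsField(), 30);

console.log(totalYards);                               // '25030' (concatenation, not 280)
console.log(calculateYardsPerAttempt(totalYards, 10)); // 2503 - no error, just a wrong answer
```

Nothing throws and nothing gets logged. The bad value just flows downstream until someone notices a quarterback averaging 2,503 yards per attempt.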

So before we move on, I want to make this crystal clear: If your excuse for not validating your function inputs is simply to lean on the fact that you know all the ways the function will be called by you in your app, then we really never need to be on the same dev team. Because you don't code in a way that is conducive to team development.

The Testing Shell Game

I've found that many devs don't try to solve the problem of brittle inputs by writing a bunch of defensive code. They "solve" it by writing a metric crap-ton (technical term) of tests.

They'll write something like this:

const calculatePassAttemptsPerGame = (passAttempts = 0, gamesPlayed = 0) => {
  return passAttempts / gamesPlayed;
}

And then they shrug off the brittle nature of this function by pointing to the incredible pile of integration tests they wrote to ensure that this function is only ever called in the "right" way.

To be clear, this approach isn't necessarily wrong. But it only shunts the real work of ensuring proper application function to a set of tests that don't exist at runtime.

For example, maybe calculatePassAttemptsPerGame() is only ever called from the PlayerProfile component. Therefore, we could try to craft a whole series of integration tests that ensure this function is never actually invoked with anything other than the "right" data.

But this approach is tragically limited.

First, as I've already pointed out, tests don't exist at runtime. They're typically only run/checked prior to a deployment. As such, they are still subject to developer oversight.

And speaking of developer oversight... trying to acid-test this function through integration tests implies that we can think of all the possible ways/places where the function can be called. This is prone to short-sightedness.

It's much simpler (in the code) to include the validations at the point where the data needs to be validated. This means that there are usually fewer oversights when we include the validations directly in-or-after the function signature. So let me spell this out simply:

Tests are great. But they are never a one-for-one replacement for data validation.

Obviously, I'm not telling you to eschew unit/integration tests. But if you're writing a pile of tests just to ensure proper functionality when a function's inputs are "bad", then you're just playing a shell game with your validation logic. You're trying to keep your application "clean" - by shoveling all of the validation into the tests. And as your application grows in complexity (meaning that: there are more conceivable ways for each function to be called), your tests must keep pace - or you end up with glaring blind spots in your testing strategy.
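To make that blind spot concrete, here's a minimal sketch. The expectEqual() helper is a hypothetical stand-in for whatever assertion your test runner provides. The "suite" passes, and yet the unvalidated function still misbehaves on calls that no test anticipated.

```javascript
const calculatePassAttemptsPerGame = (passAttempts = 0, gamesPlayed = 0) => {
  return passAttempts / gamesPlayed;
};

// A tiny stand-in for a test runner's assertion (hypothetical helper).
const expectEqual = (actual, expected) => {
  if (actual !== expected) {
    throw new Error(`Expected ${expected} but received ${actual}`);
  }
};

// The "test suite": it passes, because it only covers the calls we thought of.
expectEqual(calculatePassAttemptsPerGame(30, 2), 15);
expectEqual(calculatePassAttemptsPerGame(0, 4), 0);

// Runtime: calls that the tests never anticipated. Nothing throws.
console.log(calculatePassAttemptsPerGame(30, 0));   // Infinity
console.log(calculatePassAttemptsPerGame());        // NaN (0 / 0)
console.log(calculatePassAttemptsPerGame('30', 2)); // 15 - coerced, so the bad call stays hidden
```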

The TypeScript Delusion

There's a large subset of Dev.to readers who would read this with a cocky smirk and think, "Well, obviously - this is why you use TypeScript!" And for those cocky devs I'd say, "Yeah, ummm... sorta."

My regular readers (both of them) know that I've had some real "adventures" over the last half-year-or-so with TS. And I'm not against TS. But I'm also wary of the over-the-top promises made by TS acolytes. Before you label me as a Grade-A TypeScript Haterrr, lemme be clear about where TS shines.

When you are passing data within your own app, TS is incredibly helpful. So for example, when you have a helper function that's only ever utilized within a given app, and you know that the data (its arguments) only ever emanate from within the app, TS is incredible. You pretty much catch all of the critical bugs that might occur throughout the app whenever that helper function is called.

The utility of this is pretty obvious. If the helper function requires an input of type number and, at any point in the rest of the app, you try to call that function with an argument of type string, TS will immediately complain. If you're using any kind of modern IDE, that also means that your coding environment will immediately complain. So you'll probably know, immediately, when you're trying to write something that just doesn't "work".

Pretty cool, right???

Except... when that data emanates from outside the app. If you're dealing with API data, you can write all the comforting TS type definitions that you want - but it can still blow up at runtime if the wrong data is received. Ditto if you're dealing with user input. Ditto if you're dealing with some types of database inputs. In those cases, you're still resigned to either A) writing brittle functions, or B) adding additional runtime validations inside your function.
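Here's a small sketch of that failure mode. To keep the example in plain JS, the types are written as JSDoc annotations (which TypeScript can check in JS files). The response body, the PlayerStats shape, and the attemptsAfterNextGame() helper are all hypothetical. The annotation claims passAttempts is a number, but nothing enforces that claim once real data arrives.

```javascript
/**
 * @typedef {Object} PlayerStats
 * @property {number} passAttempts
 * @property {number} gamesPlayed
 */

// Hypothetical API response body: passAttempts arrives as a string.
const responseBody = '{"passAttempts":"312","gamesPlayed":16}';

// JSON.parse() returns an untyped value, so the annotation below is accepted on faith.
/** @type {PlayerStats} */
const stats = JSON.parse(responseBody);

/**
 * @param {PlayerStats} playerStats
 * @param {number} expectedAttempts
 * @returns {number}
 */
const attemptsAfterNextGame = (playerStats, expectedAttempts) => {
  return playerStats.passAttempts + expectedAttempts;
};

console.log(typeof stats.passAttempts);        // 'string', despite the declared type
console.log(attemptsAfterNextGame(stats, 30)); // '31230' - a string, not the promised number
```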

This isn't some knock on TS. Even strongly-typed OO languages like Java or C# are susceptible to runtime failures if they don't include the proper error handling.

The problem I'm noticing is that far-too-many TS devs write their data "definitions" inside the function signature - or inside their interfaces - and then... they're done. That's it. They feel like they've "done the work" - even though those gorgeous type definitions don't even exist at runtime.

TS definitions are also (severely) limited by the basic data types available in JS itself. For example, in the code shown above, there is no native TS data type that says passAttempts must be a non-negative integer. You can denote passAttempts as a number, but that's a weak validation - one which is still vulnerable to the function being called the "wrong" way. So if you really want to ensure that passAttempts is the "right" kind of data, you'll still end up writing additional, manual validations.

The Try-Catch Hail Mary

There is one more avenue we could explore to avoid defensive programming: the try-catch.

Try-catch obviously has its place in JS/TS programming. But it's quite limited as a tool for defensive programming when it comes to validating inputs. That's because try-catch is only meaningful when an error is actually thrown. And when we're dealing with aberrant inputs, there are frequently use-cases where the "bad" data doesn't result in an outright error. It just provides some kind of unexpected/undesired output.

Consider the following example:

const calculatePassAttemptsPerGame = (passAttempts = 0, gamesPlayed = 0) => {
  try {
    return passAttempts / gamesPlayed;
  } catch (error) {
    console.log('something went wrong:', error);
  }
}

const attemptsPerGame = calculatePassAttemptsPerGame(true, 48);
console.log(attemptsPerGame); // 0.0208333333

The try-catch is never triggered, because true / 48 doesn't throw an error. JS "helpfully" interprets true as 1 and the function returns the result of 1 / 48.
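By contrast, a try-catch only earns its keep once something actually throws. As a rough sketch (the error messages and the choice to throw a plain Error are my own assumptions for illustration), explicit checks inside the function give the caller's try-catch something to catch:

```javascript
const calculatePassAttemptsPerGame = (passAttempts = 0, gamesPlayed = 0) => {
  // Explicit checks turn silently-coerced input into a real, catchable error.
  if (!Number.isInteger(passAttempts) || passAttempts < 0) {
    throw new Error('passAttempts must be a non-negative integer.');
  }
  if (!Number.isInteger(gamesPlayed) || gamesPlayed < 1) {
    throw new Error('gamesPlayed must be a positive integer.');
  }
  return passAttempts / gamesPlayed;
};

let attemptsPerGame;
try {
  attemptsPerGame = calculatePassAttemptsPerGame(true, 48);
} catch (error) {
  // This branch now runs, because the function refused the Boolean.
  console.log('something went wrong:', error.message);
}
console.log(attemptsPerGame); // undefined - the bogus result never got computed
```

In other words, the try-catch isn't doing the validating here. The checks are.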

It's Not That Hard

At this point, for those still reading, you're probably thinking, "Well then... there's no good answer to this. Defensive programming is cumbersome and slow. Other techniques are prone to oversights and failures. So... what's to be done???"

My answer is that defensive programming doesn't need to be so hard. Some people read "defensive programming" as "validate ALL inputs" - and they jump to the conclusion that validating ALL inputs must, by definition, be a nightmare. But that's not the case.

I've written before about how I do runtime validation on ALL of my functions that accept inputs. And for me, it's easy. (If you'd like to read about that, the article is here: https://dev.to/bytebodger/better-typescript-with-javascript-4ke5)

The key is to make the inline validations fast, easy, and concise. No one wants to clutter every one of their functions with 30 additional LoC of validations. But - you don't have to.

To give you a tangible example of my approach, consider the following:

import allow from 'allow';

const calculatePassAttemptsPerGame = (passAttempts = 0, gamesPlayed = 0) => {
  allow.anInteger(passAttempts, 0).anInteger(gamesPlayed, 1);
  return passAttempts / gamesPlayed;
}

The entire runtime validation for this function is handled in a single line:

passAttempts must be an integer, with a minimum value of 0.
gamesPlayed must also be an integer, with a minimum value of 1.

That's it. No TS needed. No fancy libraries. No spaghetti code crammed into every function to manually validate all of the arguments. Just a single call to allow, that can be chained if there are two-or-more arguments expected in the function.

To be absolutely clear, this is not some kind of (long-winded) advertisement for my silly, little, homegrown validation library. I couldn't care less which library you use - or whether you roll your own. The point is that runtime validation doesn't need to be that hard. It doesn't need to be verbose. And it can provide much greater overall security to your app than any kind of compile-time-only tool.
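If you'd rather roll your own, the core idea fits in a few lines. The sketch below is a hypothetical helper named check. It is not the source code or the API of my library, and the method names and error messages are assumptions made purely for illustration. Each method throws on bad input and returns the object itself, which is what allows the calls to be chained.

```javascript
// A hypothetical roll-your-own validator - NOT the @toolz/allow source or API.
const check = {
  anInteger(value, minValue = Number.MIN_SAFE_INTEGER, maxValue = Number.MAX_SAFE_INTEGER) {
    if (!Number.isInteger(value) || value < minValue || value > maxValue) {
      throw new Error(`Expected an integer from ${minValue} to ${maxValue}, received: ${String(value)}`);
    }
    return this; // returning the object is what makes chaining possible
  },
  aString(value, minLength = 0) {
    if (typeof value !== 'string' || value.length < minLength) {
      throw new Error(`Expected a string of at least ${minLength} characters, received: ${String(value)}`);
    }
    return this;
  },
};

const calculatePassAttemptsPerGame = (passAttempts = 0, gamesPlayed = 0) => {
  check.anInteger(passAttempts, 0).anInteger(gamesPlayed, 1);
  return passAttempts / gamesPlayed;
};

console.log(calculatePassAttemptsPerGame(312, 16)); // 19.5

try {
  calculatePassAttemptsPerGame(true, 48);
} catch (error) {
  console.log(error.message); // the Boolean is rejected instead of being coerced to 1
}
```

Write the helper once, and every function that uses it pays only a single line for its validation.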

The Arrogance of the Entrenched

So should you reconsider any aversions you have to "defensive programming"?? Well, umm... probably not.

I understand that you probably already have a job where you're paid to program. And in that job, you probably already work with other programmers who set all of their coding ideas in stone years ago. They've already allowed those programming bromides to sink deep into their soul. And if you question any of that, you'll probably be shot down - and quietly scorned.

Don't believe me? Just take a look at the article that I linked to above. There was some nice feedback in the comments. But one, umm... "gentleman" decided to respond with nothing but: "Yuck..."

That's it. No constructive feedback. No rational logic. Just: "Yuck..."

And that is basically what soooo much of programming comes down to these days. You could develop a way to do nuclear fusion merely by writing JavaScript code. But someone will come along, with no additional explanation, and just say, "Yuck..."

So... I get it. I really do. Keep writing your TS. And your copious tests. And keep refusing to validate your function inputs. Because that would be "defensive programming". And defensive programming is bad, mmmmkay????

And I'll keep writing applications that are more fault-tolerant, with fewer lines of code.