Random Can "Break" Your App

Adam Nathaniel Davis - Jul 6 '20 - Dev Community

If you're writing a lot of business applications, you may not have much need for randomization. After all, when a customer checks out in your shopping cart, you don't want to charge them a random price. Or add a random amount of sales tax. Or send them a random product.

But there are definitely some times when "random" is a critical feature. And... this is where things get tricky. Because many devs underestimate how difficult it can be to represent "randomness" in an application. They also underestimate the public's general ignorance about randomness and probabilities.



Random(ish)

Most languages make it pretty simple to create virtual "randomness". For example, in JavaScript we can do this:

const dieRoll = Math.floor(Math.random() * 6) + 1;

This line of code rolls a virtual six-sided die. If you've done any reading about the inner plumbing of computer science, you may already know that this line of code doesn't provide true randomness. To put it another way, the "random" result of this line of code is actually a predictable outcome if we were to peer under the covers and track the seed that's being used to generate this so-called "random" number. This is often referred to as pseudo-randomness.

Another way to think of pseudo-randomness is that it's random to you. In theory, if you were tracking, in real time, all the inputs that the algorithm is using to generate the "random" number, it wouldn't be random at all. You could predict, with 100% certainty, what every subsequent "random" number would be, every time this line of code ran.
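JavaScript's `Math.random()` doesn't let you choose or read its seed, so it can't directly show that predictability. The sketch below swaps in mulberry32, a small, widely shared 32-bit PRNG (an illustrative stand-in, not necessarily what your JavaScript engine uses for `Math.random()`). The names `mulberry32`, `rollDie`, and `rollSequence` are hypothetical helpers. Given the same seed, it produces the exact same "random" die rolls every time.

```javascript
// mulberry32: a tiny 32-bit pseudo-random number generator.
// Used here only because, unlike Math.random(), it accepts a seed.
function mulberry32(seed) {
  let state = seed >>> 0; // keep the state as an unsigned 32-bit integer
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    // Scale the 32-bit result into [0, 1), like Math.random().
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same formula as the die roll above, but with an injectable generator.
function rollDie(rng) {
  return Math.floor(rng() * 6) + 1;
}

// Roll `count` dice from a generator started with `seed`.
function rollSequence(seed, count) {
  const rng = mulberry32(seed);
  return Array.from({ length: count }, () => rollDie(rng));
}

const first = rollSequence(2020, 10);
const second = rollSequence(2020, 10);
console.log(first.join(", "));
console.log(first.join(",") === second.join(",")); // true: same seed, same "random" rolls
```

Change the seed and the sequence changes. Keep it and the sequence repeats exactly. That's the sense in which pseudo-random numbers are only random "to you".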

But you're probably not staring at the guts of your microprocessor. You probably have no idea what exact seed was used the last time this code was run. So, for all practical purposes, the number is random - to you. And for most applications that require "randomness", this lower-level pseudo-randomness is just fine.

This article actually is not a deep-dive into the surprisingly difficult pursuit of true randomness. For the rest of this article, I'm only going to deal with pseudo-randomness. Because the deeper problem that affects many applications has nothing to do with the academic pursuit of true randomness. The deeper problem is that most people don't even recognize randomness when they see it. And when they misunderstand the nature of randomness, they tend to blame the application that's generating a supposedly random sequence.



Random Occurrences vs. Random Sets

In my experience, most people have a very limited grasp of probabilities. (And as a poker player, I have a fair amount of experience with this.) They can usually give you a reasonable estimate on the probability that a single event might occur. But when you ask them how likely it is that a given set of events will occur over a specific period, the accuracy of their predictions quickly falls apart.

For example, if I ask people:

What are the odds of rolling a 1 on a single throw of a six-sided die?


The vast majority of people I know will (accurately) say that the chance is 1-in-6 (about 16.7%). But if I ask those same people:

What are the odds of rolling a 1 at least one time if I throw a six-sided die six times??


Too often, people consider this scenario and respond that the answer is: 100%. Their (flawed) reasoning goes like this:

If the odds of rolling a 1 are 1-in-6, then the odds of rolling at least one 1, over six throws, is 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 6/6 = 100%.


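(If you're unsure of the answer yourself, the chance of rolling a 1, at least one time, over the course of six rolls of a six-sided die is about 66.5%. The trick is to calculate the chance of not rolling a 1 on any of the six throws, (5/6)^6, or roughly 33.5%, and subtract that from 100%.)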
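Here's a sketch comparing the flawed "just add the odds" sum with the correct complement calculation, plus a quick simulation using `Math.random()`. The helper names `naiveSum`, `atLeastOnce`, and `simulate` are hypothetical. The exact answer is 31,031/46,656.

```javascript
const pOne = 1 / 6; // chance of a 1 on a single throw of a fair die

// Flawed reasoning: add the single-throw odds once per throw.
function naiveSum(p, throws) {
  return p * throws; // passes 100% at seven throws, which is impossible
}

// Correct: 1 minus the chance that NONE of the throws shows a 1.
function atLeastOnce(p, throws) {
  return 1 - Math.pow(1 - p, throws);
}

// Monte Carlo check: throw six dice many times, count trials with a 1.
function simulate(trials, throws = 6) {
  let hits = 0;
  for (let i = 0; i < trials; i++) {
    for (let t = 0; t < throws; t++) {
      if (Math.floor(Math.random() * 6) + 1 === 1) {
        hits++;
        break; // one 1 is enough for this trial
      }
    }
  }
  return hits / trials;
}

console.log((naiveSum(pOne, 6) * 100).toFixed(1) + "%");    // "100.0%"
console.log((atLeastOnce(pOne, 6) * 100).toFixed(1) + "%"); // "66.5%"
console.log(simulate(100000)); // varies by run; lands close to 0.665
```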

In general, people also perform poorly when they're asked to assess the distribution of an entire random set. For example:

Let's imagine that we have a single, six-sided die. And we're gonna roll that die six times. But before we make those die rolls, we're gonna ask people to predict how many times each number will occur. Most people would write down a prediction that would look something like this:

Number of rolls that will result in `1`: 1
Number of rolls that will result in `2`: 1
Number of rolls that will result in `3`: 1
Number of rolls that will result in `4`: 1
Number of rolls that will result in `5`: 1
Number of rolls that will result in `6`: 1
                                        --
Total rolls that will occur              6

So here's the critical question:

What are the odds that the above prediction would be correct??


The answer would surprise a lot of people.

There is only about a 1.5% chance that each of the six numbers will occur exactly once over the course of six rolls.


In other words, there's a 98.5% chance that those six rolls will not result in every number occurring once (and only once).
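The 1.5% figure comes from counting: there are 6^6 = 46,656 equally likely sequences of six rolls, and only 6! = 720 of them use each face exactly once. Here's a sketch that computes that ratio and double-checks it by simulation (`factorial` and `simulateOneOfEach` are hypothetical helper names).

```javascript
// n! = 1 * 2 * ... * n
function factorial(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}

// Exact: orderings with each face once (6!) over all possible sequences (6^6).
const exact = factorial(6) / Math.pow(6, 6); // 720 / 46656
console.log((exact * 100).toFixed(2) + "%"); // "1.54%"

// Monte Carlo check: roll six dice many times and count "one of each face".
function simulateOneOfEach(trials) {
  let hits = 0;
  for (let i = 0; i < trials; i++) {
    const faces = new Set();
    for (let r = 0; r < 6; r++) {
      faces.add(Math.floor(Math.random() * 6) + 1);
    }
    if (faces.size === 6) hits++; // six rolls, six distinct faces
  }
  return hits / trials;
}

console.log(simulateOneOfEach(100000)); // varies by run; lands near 0.015
```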



Phantom Patterns

Just as we can fail to understand the likelihood of random occurrences, we can also "perceive" non-random events that occur in the middle of otherwise-random noise. The human brain is, essentially, an analog pattern-matching machine. This trait evolved over millions of years - and we wouldn't be here today if it hadn't.

You can't wait to react until a lion is leaping at you. You must be able to discern the pattern of its face - even when it's mostly obscured by the bush.

You can't wait to pay your respects to the chieftain until he's standing right in front of you. You must be able to discern the pattern of his appearance - even when he's some ways off down the street.

In other words, pattern-matching is generally a good thing. We want to identify patterns as early and as often as possible. But this ingrained ability can often work against us - because we sometimes perceive patterns where they don't exist. (BTW, the name for this is apophenia; pareidolia, such as seeing faces in clouds, is its best-known visual form.) And when we become convinced that a pattern has emerged, we also become convinced that the so-called "random" generator has failed.
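You can watch "patterns" emerge from pure noise. This sketch rolls a fair die 100 times with `Math.random()` and checks whether the same face shows up three or more times in a row. The helpers `longestRun`, `rollDice`, and `fractionWithRun` are hypothetical, and the 100-roll length and 10,000 trials are arbitrary choices. A large majority of the sequences contain such a streak, even though every roll is independent of the one before it.

```javascript
// Length of the longest run of identical consecutive values.
function longestRun(values) {
  let best = 0;
  let current = 0;
  for (let i = 0; i < values.length; i++) {
    current = i > 0 && values[i] === values[i - 1] ? current + 1 : 1;
    best = Math.max(best, current);
  }
  return best;
}

// Roll a fair six-sided die `count` times.
function rollDice(count) {
  return Array.from({ length: count }, () => Math.floor(Math.random() * 6) + 1);
}

// Fraction of sequences containing at least `minRun` identical faces in a row.
function fractionWithRun(minRun, trials = 10000, rollsPerSequence = 100) {
  let hits = 0;
  for (let t = 0; t < trials; t++) {
    if (longestRun(rollDice(rollsPerSequence)) >= minRun) hits++;
  }
  return hits / trials;
}

console.log(rollDice(20).join(" ")); // look for "patterns" in a single sequence
console.log(fractionWithRun(3));     // varies by run; most sequences have a streak
```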

We assume that patterns don't exist in random noise. And therefore, if we perceive a pattern in the random noise, we jump to the conclusion that this "noise" is not actually random at all. To see how this plays out in real life, let's consider a scenario with some playing cards.

Imagine that I have a standard deck of 52 cards. We'll assume that it's a "fair" deck (no magician's props here) and that I've given it an extensive shuffling using thorough and "accepted" techniques. Once the deck has been thoroughly randomized, I pull the top card off the deck, and it's:

The ace of spades


Would that result surprise you? I hope not. Because, assuming that the deck is "fair" and my shuffling is thorough, the ace of spades has the same odds (1-in-52) of ending up on the top of the deck as any other card.
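In software, "thorough and accepted" shuffling usually means an algorithm rather than hands. The sketch below assumes the Fisher-Yates (Knuth) shuffle, a standard choice that isn't named in this article. `makeDeck`, `shuffle`, and `topCardCounts` are hypothetical helpers, and cards are short strings such as `"AS"` for the ace of spades. Over many shuffles, every card, the ace of spades included, lands on top roughly 1 time in 52.

```javascript
// A standard 52-card deck as short strings: "AS" = ace of spades, "10H" = ten of hearts.
function makeDeck() {
  const ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"];
  const suits = ["S", "H", "D", "C"];
  const deck = [];
  for (const suit of suits) {
    for (const rank of ranks) deck.push(rank + suit);
  }
  return deck;
}

// Fisher-Yates shuffle: returns a new array. If rng() were perfectly uniform
// on [0, 1), every ordering of the deck would be equally likely.
function shuffle(cards, rng = Math.random) {
  const result = cards.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1)); // pick from the not-yet-placed cards
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Shuffle many times and count which card ends up on top.
function topCardCounts(trials) {
  const deck = makeDeck();
  const counts = new Map(deck.map((card) => [card, 0]));
  for (let t = 0; t < trials; t++) {
    const top = shuffle(deck)[0];
    counts.set(top, counts.get(top) + 1);
  }
  return counts;
}

const counts = topCardCounts(52000);
console.log(counts.get("AS")); // varies by run; close to 1,000 (52,000 / 52)
```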

So now I put the ace of spades back into the deck. And I again conduct a thorough-and-extended shuffling of all 52 cards. Once I'm done, I pull the top card off the deck, and it's:

The ace of spades(!)


Would that result surprise you? Maybe. If nothing else, it certainly feels like an odd coincidence, no? But I imagine that even the most hardcore conspiracy theorist would admit that it's possible for the exact same card to be shuffled to the top of the deck twice in a row.

So now I put the ace of spades back into the deck. And I again conduct a thorough-and-extended shuffling of all 52 cards. Once I'm done, I pull the top card off the deck, and it's:

The ace of spades!!!!!


OK. I can almost hear you thinking right now. You're saying, "C'monnn... The ace of spades? Three times in a row?? This must be rigged!" But here's my question to you:

How many times must the ace of spades come off the top of the deck before we can prove that the deck and/or the shuffling technique and/or the person doing the shuffling are rigged??


The answer is very simple. As long as we are assessing nothing but the observable results, it is impossible to ever conclude, definitively, that any part of the process is "rigged". This is because, with no deeper analysis of the processes that surround the ever-repeating ace of spades, it's impossible to definitively state that this is not, simply, an incredible sequence of events.
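It helps to put a number on it. Assuming each shuffle is fair and independent, the chance that one specific card named in advance (such as the ace of spades) comes off the top n times in a row is 1 in 52^n. If you only start counting once the first ace shows up, it's 1 in 52^(n-1). The sketch below uses `BigInt` so the denominator stays exact; `oddsAgainstRepeat` is a hypothetical helper name. The odds get astronomically long, but they never reach zero.

```javascript
// "1 in N" odds that a specific, pre-named card tops the deck n shuffles in a row,
// assuming each shuffle is fair (1-in-deckSize) and independent of the others.
// BigInt keeps N exact even when it is far beyond Number.MAX_SAFE_INTEGER.
function oddsAgainstRepeat(n, deckSize = 52) {
  return BigInt(deckSize) ** BigInt(n);
}

for (const n of [1, 2, 3, 5, 10]) {
  console.log(`${n} in a row: 1 in ${oddsAgainstRepeat(n)}`);
}
// Prints "3 in a row: 1 in 140608" among other lines. N is always finite,
// so the probability 1/N shrinks but is never exactly zero.
```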

To be clear, I understand that, on a practical level, at a certain point the incredible nature of the sequence becomes soooo improbable, and soooo mind-blowing, as to throw the integrity of the whole exercise into question. To put this another way, you can reach a point where "statistical improbability" becomes indistinguishable from "impossibility".

But I'm pointing out these phantom patterns because your users will be far quicker to claim "impossibility" than you will.



Who Cares??

This article will be a two-parter. If I try to cram this into a single blog post, no one will ever read it. Part two will explain, in some detail, why programmers can't ignore these issues.

It may feel like the "problems" I've outlined are just cognitive biases that have nothing to do with your code. But in part two, I'm gonna outline how these mental traps are not simply the users' problem. Even if your code is "perfect" and your randomization is mathematically flawless, that won't do you much good if the users don't trust your process.

Specifically, I'm going to outline some real-life use cases from Spotify where they've alienated some of their own subscribers because they failed to account for all the ways in which people can't comprehend randomness. I'm also going to illustrate how ignoring the issue can turn off your own customers - but trying too hard to "fix" it can also make the problem worse.
