Make Invalid States Unrepresentable

Andrew (he/him) - Feb 2 - Dev Community

Suppose you have a Person class in your program, and that a Person has an age. What type should the age be?

age as a String

case class Person(age: String)

"Of course it shouldn't be a String," you might think. But why? Because a String age lets us write code like this:

val person = Person("Jeff")

If we ever wanted to do anything with an age: String, we would need to validate it everywhere it is used:

import scala.util.{Failure, Success, Try}

def isOldEnoughToSmoke(person: Person): Boolean = {
  Try(person.age.toInt) match {
    case Failure(_) => throw new Exception(s"cannot parse age '${person.age}' as numeric")
    case Success(value) => value >= 18
  }
}

def isOldEnoughToDrink(person: Person): Boolean = {
  Try(person.age.toInt) match {
    case Failure(_) => throw new Exception(s"cannot parse age '${person.age}' as numeric")
    case Success(value) => value >= 21
  }
}

// etc.

This is cumbersome for the programmer writing the code, and it makes the code harder to follow for anyone reading it.

We could move this validation to a separate method:

def parseAge(age: String): Int = {
  Try(age.toInt) match {
    case Failure(_) => throw new Exception(s"cannot parse age '$age' as numeric")
    case Success(value) => value
  }
}

def isOldEnoughToSmoke(person: Person): Boolean =
  parseAge(person.age) >= 18

def isOldEnoughToDrink(person: Person): Boolean =
  parseAge(person.age) >= 21

...but this is still not ideal. The code is a bit cleaner, but we still need to parse a String into an Int every time we want to do anything numeric (comparison, arithmetic, etc.) with the age. This is often called "stringly-typed" data.

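Worse, an invalid age is only detected when some method tries to parse it. That may happen long after the Person was constructed, and at that point the program fails by throwing an Exception. If we're going to fail anyway, we should fail fast, at the moment the invalid value is created. We can do better.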
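One way to fail fast with stringly-typed input is to convert the String exactly once, at the edge of the program, and work with the typed result from then on. The following is a minimal sketch. It assumes Scala 2.13 or later (for toIntOption), and it anticipates the Int-typed Person from the next section. The function name parsePerson is hypothetical:

```scala
case class Person(age: Int)

// Hypothetical boundary function: the raw String is parsed once, when it enters the program,
// and a failure is returned as a value instead of being thrown later
def parsePerson(rawAge: String): Either[String, Person] =
  rawAge.trim.toIntOption match {
    case Some(age) => Right(Person(age))
    case None      => Left(s"cannot parse age '$rawAge' as numeric")
  }

// Downstream code never sees the raw String again
def isOldEnoughToDrink(person: Person): Boolean = person.age >= 21
```

With this approach, the parsing failure surfaces where the bad input arrives, not deep inside isOldEnoughToDrink.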

age as an Int

Your first instinct might have been to make age an Int, rather than a String:

case class Person(age: Int)

If so, you have good instincts. An age: Int is much nicer to work with:

def isOldEnoughToSmoke(person: Person): Boolean =
  person.age >= 18

def isOldEnoughToDrink(person: Person): Boolean =
  person.age >= 21

This

  • is easier to write
  • is easier to read
  • fails fast

You can no longer construct a Person with a String age. Person("Jeff") is an invalid state, and we have made it unrepresentable using the type system: the compiler rejects any program that tries to construct it.

Problem solved, right?

val person = Person(-1)

This is clearly an invalid state as well. A person cannot have a negative age.

val person = Person(90210)

This is also invalid -- it looks like someone accidentally entered their ZIP code instead of their age.

So how can we constrain this type even further? How can we make even more invalid states unrepresentable?

age as an Int with constraints

at runtime

Runtime constraints can be enforced in practically any language, statically typed or not. In Scala, for example:

case class Person(age: Int) {
  assert(age >= 0 && age < 150)
}

In Scala, assert will throw a java.lang.AssertionError if the assertion fails.
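Scala's Predef also provides require, which is conventionally used for checking arguments. It throws java.lang.IllegalArgumentException instead of AssertionError. Here is a sketch of the same constraint written with require, showing what the failure looks like to a caller (the error message text is our own choice):

```scala
import scala.util.Try

case class Person(age: Int) {
  // require signals that the caller passed a bad argument
  require(age >= 0 && age < 150, s"age must be in [0, 150), got $age")
}

// Constructing an invalid Person throws immediately, at the point of creation
val attempt = Try(Person(90210))
```

require prefixes the message with "requirement failed: ", so the exception captured in attempt reads "requirement failed: age must be in [0, 150), got 90210".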

Now we can be sure that the age for any Person will always be within the range [0, 150). Both

val person = Person(-1)

and

val person = Person(90210)

will now fail. But they will fail at runtime, halting the execution of our program.

This is similar to what we saw in "age as a String", above. This is still not ideal. Is there a better way?

at compile time

Many languages allow compile-time constraints as well. In Scala, this is usually accomplished through macros, which inspect the source code during a compilation phase. Types constrained in this way are often referred to as refined types.

Scala has quite good support for refined types across multiple libraries. A solution using the refined library might look something like this:

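import eu.timepit.refined.api.Refined
import eu.timepit.refined.boolean.And
import eu.timepit.refined.numeric.{GreaterEqual, Less}

// All infix type operators share one precedence level, so the combined predicate needs parentheses
case class Person(age: Int Refined (GreaterEqual[0] And Less[150]))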

A limitation of this approach is that the field(s) to be constrained at compile-time must be literal, hard-coded values. Compile-time constraints cannot be enforced on, for example, values provided by a user. By that point, the program has already been compiled. In this case, we can always fall back to runtime constraints, which is often what these libraries do.
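When a value only arrives at runtime, a common alternative to throwing is a smart constructor. The idea is to make the constructor private and expose a factory that returns an Either, so that callers must handle the invalid case explicitly. The following is a minimal hand-written sketch, not the refined library's API. The names Age.from and ageFromUserInput are illustrative, and toIntOption assumes Scala 2.13 or later:

```scala
// The constructor is private, so an Age can only be obtained through Age.from
final class Age private (val years: Int) {
  override def toString: String = s"Age($years)"
}

object Age {
  // Hypothetical smart constructor: validation failures are returned, not thrown
  def from(years: Int): Either[String, Age] =
    if (years >= 0 && years < 150) Right(new Age(years))
    else Left(s"$years is not a valid age")
}

// Hypothetical entry point for user input: parse first, then validate, without throwing
def ageFromUserInput(raw: String): Either[String, Age] =
  raw.trim.toIntOption.toRight(s"'$raw' is not a number").flatMap(Age.from)
```

The program still validates at runtime, but an invalid value becomes an ordinary Left that the caller can report, instead of an error that halts execution.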

For now, we'll continue with runtime constraints only, since often that's the best we can do.

age as an Age with constraints

From simplest to most complex implementation, we moved left to right in the diagram below:

String => Int => Int with constraints

This increase in complexity directly correlates with the accuracy with which we're modelling this data.

"The problems tackled have inherent complexity, and it takes some effort to model them appropriately." [source]

The move from left to right above should be driven by the requirements of your system. You should not implement compile-time refinements, for example, unless you have lots of hard-coded values to validate at compile time; otherwise, you aren't gonna need it. Every line of code has a cost to implement and maintain. Avoiding premature specification is just as important as avoiding premature generalization. Still, it is always easier to move from a more specific type to a less specific one than the reverse, so when in doubt, prefer specificity over generalization.

Every bit of data has a context, as well. There is no such thing as a "pure" Int value floating around in the universe. An age can be modelled as an Int, but it's different from a weight, which could also be modelled as an Int. The labels we attach to these raw values are the context:

case class Person(age: Int, weight: Int) {
  assert(age >= 0 && age < 150)
  assert(weight >= 0 && weight < 500)
}

There is one more problem for us to solve here. Suppose I'm 81kg and 33 years old

val me = Person(81, 33)

That compiles, but... it shouldn't. I swapped my weight and age!

An easy way to avoid this confusion is to define some more types, in this case newtypes:

case class Age(years: Int) {
  assert(years >= 0 && years < 150)
}

case class Weight(kgs: Int) {
  assert(kgs >= 0 && kgs < 500)
}

case class Person(age: Age, weight: Weight)

The name newtype for this pattern comes from Haskell. It is a simple way to ensure that we don't accidentally swap values with the same underlying type. The following, for example, will not compile:

val age = Age(33)
val weight = Weight(81)

val me = Person(weight, age) // does not compile!

We could also use tagged types. In Scala, the simplest possible version looks something like this:

trait AgeTag
type Age = Int with AgeTag

object Age {
  def apply(years: Int): Age = {
    assert(years >= 0 && years < 150)
    years.asInstanceOf[Age]
  }
}

trait WeightTag
type Weight = Int with WeightTag

object Weight {
  def apply(kgs: Int): Weight = {
    assert(kgs >= 0 && kgs < 500)
    kgs.asInstanceOf[Weight]
  }
}

case class Person(age: Age, weight: Weight)

val p0 = Person(42, 42) // does not compile -- an Int is not an Age
val p1 = Person(Age(42), 42) // does not compile -- an Int is not a Weight  
val p2 = Person(Age(42), Weight(42)) // compiles!
val p3 = Person(Weight(42), Weight(42)) // does not compile -- a Weight is not an Age

This relies on the fact that, in Scala, applying an object as if it were a function is syntactic sugar for calling its apply() method. So f(x) is equivalent to f.apply(x), and Age(42) is equivalent to Age.apply(42).

This approach allows us to model the idea that an Age / a Weight is an Int, but an Int is not an Age / a Weight. This means we can treat an Age / a Weight as an Int and add, subtract, or do whatever other Int-like things we want to do.

Mixing these two approaches in one example shows the difference between newtypes and tagged types. You must extract the "raw value" from a newtype, but you do not need to do this with a tagged type:

// `Age` is a tagged type
trait AgeTag
type Age = Int with AgeTag

object Age {
  def apply(years: Int): Age = {
    assert(years >= 0 && years < 150)
    years.asInstanceOf[Age]
  }
}

// `Weight` is a newtype
case class Weight(kgs: Int) {
  assert(kgs >= 0 && kgs < 500)
}

// `Age`s can be treated as `Int`s, because they _are_ `Int`s
assert(40 == Age(10) + Age(30))

// `Weight`s are not `Int`s, they _contain_ `Int`s
Weight(10) + Weight(30) // does not compile

// To add `Weight`s, we must "unwrap" them
Weight(10).kgs + Weight(30).kgs

In some languages, the "unwrapping" of newtypes can be done automatically, which can make newtypes as ergonomic as tagged types. In Scala, for example, this could be done with an implicit conversion:

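// Without this import, the compiler emits a feature warning for defining an implicit conversion
import scala.language.implicitConversions

implicit def weightAsInt(weight: Weight): Int = weight.kgs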

// `Weight`s are not `Int`s, but they can be _converted_ to `Int`s
Weight(10) + Weight(30) // this now compiles

Further refinements

The important point of the above discussion is that, as much as possible, we want to make invalid states unrepresentable.

  • "Jeff" is an invalid age. An age isn't a string; it's a number.
  • -1 is an invalid age. An age cannot be negative: it should be 0 or greater, and probably less than about 150.
  • My age is not 81. An age should be easily distinguishable from other integral values, like weight.

Everything discussed above implemented these refinements on the concept of "age", one at a time.

We can make further refinements if there is a need for those refinements.

For example, suppose we want to send a "Happy Birthday!" email to a Person on their birthday. Rather than an Age, we now need a date of birth.

case class Date(year: Year, month: Month, day: Day)

case class Year(value: Int, currentYear: Int) {
  assert(value >= 1900 && value <= currentYear)
}

case class Month(value: Int) {
  assert(value >= 1 && value <= 12)
}

case class Day(value: Int) {
  assert(value >= 1 && value <= 31)
}

case class Person(dateOfBirth: Date, weight: Weight) {
  def age(currentDate: Date): Age = {
    ??? // TODO calculate Age from dateOfBirth
  }
}

The amount of information provided by dateOfBirth is strictly greater than the amount of information provided by Age. We can calculate someone's age from their date of birth, but we cannot do the opposite.
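As an aside, the calculation left as a TODO in Person.age could be filled in along the following lines. This sketch uses java.time.LocalDate instead of the Date type defined above, and the function name ageInYears is hypothetical:

```scala
import java.time.{LocalDate, Period}

// Hypothetical helper: assumes the date of birth has already been converted to a java.time.LocalDate
def ageInYears(dateOfBirth: LocalDate, currentDate: LocalDate): Int = {
  require(!dateOfBirth.isAfter(currentDate), "date of birth cannot be in the future")
  // Period.between splits the gap into years, months and days; getYears keeps only the completed years
  Period.between(dateOfBirth, currentDate).getYears
}
```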

The above implementation leaves much to be desired, though -- there are lots of invalid states. A better way to implement this would be for Month to be an enum, and for Day validity to depend on the Month (February never has 30 days, for example):

case class Year(value: Int, currentYear: Int) {
  assert(value >= 1900 && value <= currentYear)
}

sealed trait Month

case object January extends Month
case object February extends Month
case object March extends Month
case object April extends Month
case object May extends Month
case object June extends Month
case object July extends Month
case object August extends Month
case object September extends Month
case object October extends Month
case object November extends Month
case object December extends Month

case class Day(value: Int, month: Month) {
  month match {
    case February => assert(value >= 1 && value <= 28)
    case April | June | September | November => assert(value >= 1 && value <= 30)
    case _ => assert(value >= 1 && value <= 31)
  }
}

case class Date(year: Year, month: Month, day: Day)

Always prefer low-cardinality types to high-cardinality types, when possible. It limits the number of possible invalid states. In most languages, enums are the way to go here (in Scala 2, an enum can be modelled using a sealed trait, as shown above). But there are still invalid states hiding above. Can you find them?
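Spoiler for the question above. Date stores the month twice, once directly and once inside Day, so the two can disagree: a Date whose month is February can hold Day(31, January). Day also rejects February 29 even in leap years, which forbids a valid state. One possible fix stores the month once and validates the day against both the month and the year in a single place. It is sketched here under simplifying assumptions: the year is a plain Int, and Date.of is a hypothetical factory name:

```scala
sealed trait Month
case object January extends Month
case object February extends Month
case object March extends Month
case object April extends Month
case object May extends Month
case object June extends Month
case object July extends Month
case object August extends Month
case object September extends Month
case object October extends Month
case object November extends Month
case object December extends Month

// Gregorian leap-year rule: divisible by 4, except centuries not divisible by 400
def isLeapYear(year: Int): Boolean =
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0

def daysIn(month: Month, year: Int): Int = month match {
  case February => if (isLeapYear(year)) 29 else 28
  case April | June | September | November => 30
  case _ => 31
}

// The month is stored once, and the private constructor forces every Date through Date.of
final class Date private (val year: Int, val month: Month, val day: Int) {
  override def toString: String = s"$year-$month-$day"
}

object Date {
  def of(year: Int, month: Month, day: Int): Either[String, Date] =
    if (day >= 1 && day <= daysIn(month, year)) Right(new Date(year, month, day))
    else Left(s"$month $day, $year is not a valid date")
}
```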

In some cases, stringly-typed data validated using regular expressions can be replaced entirely by enums. Could you model Canadian postal codes such that it's impossible to construct an invalid one?

Use the above knowledge to go forth and make invalid states unrepresentable.
