Name: Java Daily Coding Problem #007
Rating: 1.1 (6727 reviews)
Author: awwsmm

Daily Coding Problem is a website which will send you a programming challenge to your inbox every day. I want to show beginners how to solve some of these problems using Java, so this will be an ongoing series of my solutions. Feel free to pick them apart in the comments!

Problem

Given the mapping a = 1, b = 2, ... z = 26, and an encoded message, count the number of ways it can be decoded.

For example, the message '111' would give 3, since it could be decoded as 'aaa', 'ka', and 'ak'.

You can assume that the messages are decodable. For example, '001' is not allowed.

Strategy

Spoilers! Don't look below unless you want to see my solution!

This problem involves us building a binary tree of all possible solutions, then counting the leaves of that tree. For instance, it we're given the message 11212, our tree might look like:

start ------------  1 --------  1 ------  2 ----  1 -- 2
      \               \           \         \
       \               \           \         `-- 12
        \               \           \
         \               \           `-- 21 ----  2
          \               \
           \               `-- 12 ------  1 ----  2
            \                     \
             \                     `---- 12
              \
               `-- 11 --------  2 ------  1 ----  2
                      \           \
                       \           `---- 12
                        \
                         `---- 21 ------  2

I say a binary tree because there are only 26 letters in the alphabet, so we are only interested in a subset of the 1- and 2-digit integers. This means that -- at any given point within a message -- either the next character and/or the next two characters are a legal encoding of a letter. So each node of the tree can have at most two children.

In our example above, you can count the eight nodes with no children (leaves), which indicate that the encoded message 11212 has eight legal decodings. As it turns out, this is the maximum possible number of legal encodings for a message of length 5, because at any point in the message followed by at least two digits, the next 2-digit number is a legal encoding of a letter, as well as the next 1-digit number.

But the prompt stated that the messages must be decodable, so why even mention this?

Well, it's possible that the message is decodable without an arbitrary two digit number within the message being decodable. For instance, take the message

dangitbobby

Encoded, this is

411479202152225

Written out like that, it's obvious that there are some 2-digit sequences that are not valid letters. Like 47 or 92 or 02. The presence of these "undecodeable subsequences" means that this message will have fewer than the maximum possible number of decodings for an arbitrary 15-digit message. So we have an upper bound on the number of possible decodings, as a function of the length of the message, in digits.

For a 3-digit message, it's 3, as shown in the prompt. For a 5-digit message, it's 8, as shown above. Trivially, a 1-digit message only has 1 legal decoding and a 2-digit message has at most 2 legal decodings. By "pruning" our binary tree, above, after the first digit, we can see that a 4-digit message has at most 5 legal encodings:

start ------------  1 ----  /  ----  1 ------  2 ----  1 -- 2
                      \    /           \         \
                       \  /             \         `-- 12
                         /   \           \
                        /     \           `-- 21 ----  2
                       /       \
                      /         `-- 12 ------  1 ----  2
                     /                 \
                    /                   `---- 12

This sequence, 1, 2, 3, 5, 8, ... might look familiar to you. Let's add one more number to the beginning of our 5-digit message (say, 2, so the resulting message is 211212) and look at the maximum possible number of legal decodings for a 6-digit message:

start ----------  2 ------------  1 --------  1 ------  2 ----  1 -- 2
      \             \               \           \         \
       \             \               \           \         `-- 12
        \             \               \           \
         \             \               \           `-- 21 ----  2
          \             \               \
           \             \               `-- 12 ------  1 ----  2
           |              \                     \
           |               \                     `---- 12
           |                \
           |                 `-- 11 --------  2 ------  1 ----  2
           |                        \           \
           |                         \           `---- 12
           |                          \
           |                           `---- 21 ------  2
            \
             `-- 21 ------------  1 --------  2 ------  1 ----  2 
                    \               \           \
                     \               \           `---- 12
                      \               \
                       \               `---- 21 ------  2
                        \
                         `------ 12 --------  1 ------  2
                                    \
                                     `------ 12

When we have a 6-digit message, the maximum number of possible decodings is 13. We see now that this is the Fibonacci sequence: 1, 2, 3, 5, 8, 13, ... The maximum possible number of encodings for an N digit sequence is F(N), where F(N) is the N-th Fibonacci number, F(1) = 1, and F(2) = 2. This gives us a sanity check on our solution, which we should get back to, actually.

So how do we go about this? The most direct way is to simply step through the tree, one digit at a time, and check if the next two digits are a valid encoding of a character (the next single digit will almost always be a valid encoding, unless it's 0). If they are, create a branch in our solution and continue. The easiest way to code this would be through a recursive algorithm. Maybe we can write this in a clever way so the compiler can do some tail-call optimisation?

Code

My gut is telling me that we don't want to be doing lots of String operations, so let's first take the given message and convert it to an int array. Something like:

jshell> Stream.of("1234567".split("")).mapToInt(s -> Integer.parseInt(s)).toArray()
$15 ==> int[7] { 1, 2, 3, 4, 5, 6, 7 }

Let's wrap this up in a function:

public class Message {

  public static int[] digitise (String message) {
    return Stream.of(message.split("")).mapToInt(s -> Integer.parseInt(s)).toArray();
  }

}

Next, we need a function to look at the next two digits and determine whether or not they're a valid character encoding (if they form a number less than 27):

  public static boolean startsWithValidTwoDigitSeq (int[] message) {
    if (message.length < 2) return false;
    if (message[0] > 2 || message[0] < 1) return false;
    if (message[0] == 2 && message[1] > 6) return false;
    return true;
  }

Now, given a particular message, we can check if the next two digits encode a character. If they do, we fork the search, creating two new searches with two new message strings -- one where the first digit has been removed and one where the first two digits have been removed.

It is possible that the first digit could be a 0, in which case neither the first one nor the first two digits encode a valid character. We abort these searches.

Whenever we get to the end of a valid message (meaning we're at the final one or two digits and those digits are a valid encoding of a character), we increment the number of total valid decodings.

This naïve process is expressed below:


  public static int sum = 0;

  public static int nValidDecodings (int[] digits) {
    sum = 0;
    sumDecodings(digits);
    return sum;
  }

  private static void sumDecodings (int[] digits) {

    // (1) if there are no digits left...
    if (digits.length < 1) return;

    // (2) if there's only 1 digit left...
    if (digits.length < 2) {

      // (2a) ...and it's not a valid encoding of a character
      if (digits[0] == 0) return;

      // (2b) ...and it's a valid encoding of a character
      sum += 1;
      return;
    }

    // (3) if there are only 2 digits left...
    if (digits.length < 3) {

      // (3a) ...and it's a valid encoding of a character
      if (digits[0] == 1 || (digits[0] == 2 && digits[1] < 7)) sum += 1;

      // (3b) ...fork by removing first digit
      sumDecodings(Arrays.copyOfRange(digits, 1, digits.length));
      return;
    }

    // (4) if there are 3+ digits left...

    // (4a) ...and the first digit is not a valid encoding of a character:
    if (digits[0] == 0) return;

    // (4b) ...and the first two digits are a valid encoding of a character
    if (digits[0] == 1 || (digits[0] == 2 && digits[1] < 7)) {

      // ...fork with 1-digit number and 2-digit number removed
      sumDecodings(Arrays.copyOfRange(digits, 1, digits.length));
      sumDecodings(Arrays.copyOfRange(digits, 2, digits.length));
      return;

    }

    // (4c) ...and the first digit is a valid encoding of a character:
    sumDecodings(Arrays.copyOfRange(digits, 1, digits.length));
    return;

  }

Like most of these puzzle solutions, we start from the edge cases and work our way backward. The easiest case occurs when we pass a message with no digits:

    // (1) if there are no digits left...
    if (digits.length < 1) return;

...we don't increment the class variable sum and simply abort the process (or this particular fork of the process). The next case occurs when the message has only a single digit:

    // (2) if there's only 1 digit left...
    if (digits.length < 2) {

      // (2a) ...and it's not a valid encoding of a character
      if (digits[0] == 0) return;

      // (2b) ...and it's a valid encoding of a character
      sum += 1;
      return;
    }

...if the single digit is a 0, this message is invalid and we abort without incrementing sum. Otherwise, increment sum and then quit. Note that this could be rearranged, significantly reducing the line count:

    // (2) if there's only 1 valid digit left...
    if (digits.length < 2 && digits[0] != 0) sum += 1;
    return;

...but I think the former is a bit clearer. Our third case occurs when the message has exactly two digits. Here we need to be concerned about forking the process:

    // (3) if there are only 2 digits left...
    if (digits.length < 3) {

      // (3a) ...and it's a valid encoding of a character
      if (digits[0] == 1 || (digits[0] == 2 && digits[1] < 7)) sum += 1;

      // (3b) ...fork by removing first digit
      sumDecodings(Arrays.copyOfRange(digits, 1, digits.length));
      return;
    }

If there are only two digits in the message and those digits are a valid encoding of a character, we increment the sum total of valid decodings. Whether or not that two-digit number is a valid encoding, we consider the possibility that the next two single-digit numbers are both valid encodings by removing the first digit of the array (using the Arrays API) and passing the resulting array back to sumDecodings().

The final -- and most complex -- case occurs when we have more than two digits in our message:

    // (4) if there are 3+ digits left...

    // (4a) ...and the first digit is not a valid encoding of a character:
    if (digits[0] == 0) return;

    // (4b) ...and the first two digits are a valid encoding of a character
    if (digits[0] == 1 || (digits[0] == 2 && digits[1] < 7)) {

      // ...fork with 1-digit number and 2-digit number removed
      sumDecodings(Arrays.copyOfRange(digits, 1, digits.length));
      sumDecodings(Arrays.copyOfRange(digits, 2, digits.length));
      return;

    }

    // (4c) ...and the first digit is a valid encoding of a letter:
    sumDecodings(Arrays.copyOfRange(digits, 1, digits.length));
    return;

Here, there are three sub-cases. First (4a), the initial digit could be a zero, in which case there is no valid decoding of this message and we exit early. Second (4b), it could be that the first two digits of the message are a valid encoding of a character, in which case we fork the process twice, once with the first digit removed and once with the first two digits removed from the message. Finally (4c), if the first two digits don't compose a valid encoding of a character, then at least the first one does, in which case we simply continue the process by removing only the first digit.

Again, this last step can be rearranged a bit for brevity:

    if (digits[0] != 0) {

      if (digits[0] == 1 || (digits[0] == 2 && digits[1] < 7)) {
        sumDecodings(Arrays.copyOfRange(digits, 2, digits.length));
      }

      sumDecodings(Arrays.copyOfRange(digits, 1, digits.length));
    }

    return;

...but again I think that the way I wrote it initially is a bit clearer. Finally, note that wherever we have:

  if (digits[0] == 1 || (digits[0] == 2 && digits[1] < 7))

...we can replace this with our helper function that we wrote earlier:

  if (startsWithValidTwoDigitSeq(digits))

The solution in its entirety can be found on my GitHub.

...but does this solution work? It seems to work on all of the examples we've seen so far:

jshell> /open Message.java

jshell> Message.nValidDecodings(Message.digitise("111"))
$80 ==> 3

jshell> Message.nValidDecodings(Message.digitise("11212"))
$81 ==> 8

jshell> Message.nValidDecodings(Message.digitise("1212"))
$82 ==> 5

jshell> Message.nValidDecodings(Message.digitise("211212"))
$83 ==> 13

...giving us the Fibonacci numbers we expect. It also works when we throw in digits that result in invalid character encodings:

jshell> Message.nValidDecodings(Message.digitise("131"))
$85 ==> 2

jshell> Message.nValidDecodings(Message.digitise("333"))
$86 ==> 1

jshell> Message.nValidDecodings(Message.digitise("11111"))
$87 ==> 8

jshell> Message.nValidDecodings(Message.digitise("11191"))
$88 ==> 5

jshell> Message.nValidDecodings(Message.digitise("11911"))
$89 ==> 6

jshell> Message.nValidDecodings(Message.digitise("19111"))
$90 ==> 6

jshell> Message.nValidDecodings(Message.digitise("91111"))
$91 ==> 5

Discussion

So that's it! We needed to do some mechanical things at the beginning, converting Strings to int arrays and so on, but after that initial hurdle, we just needed to think carefully through the edge cases.

While trying to solve this problem, I was sure that there was some analytical solution which didn't involve brute-force walking the tree. But I couldn't find it. If anyone can help out, I would appreciate it!

All the code for my Daily Coding Problems solutions is available at github.com/awwsmm/daily.

Suggestions? Let me know in the comments.

If you enjoyed this post, please consider supporting my work by buying me a coffee!