Daily Coding Problem is a website which will send you a programming challenge to your inbox every day. I want to show beginners how to solve some of these problems using Java, so this will be an ongoing series of my solutions. Feel free to pick them apart in the comments!
Problem
Given the root to a binary tree, implement
serialize(root)
, which serializes the tree into a string, anddeserialize(s)
, which deserializes the string back into the tree.For example, given the following Node class
class Node: def __init__(self, val, left=None, right=None): self.val = val self.left = left self.right = right
The following test should pass:
node = Node('root', Node('left', Node('left.left')), Node('right')) assert deserialize(serialize(node)).left.left.val == 'left.left'
Strategy
Spoilers! Don't look below unless you want to see my solution!
Okay, so this clearly wasn't written with Java in mind, as the example code and test are given in Python. The first thing we should do is rewrite the code above in Java.
__init()__
in Python acts like -- but is not a direct analog of -- an object constructor in Java. So we should define a Java Node
class with a constructor that takes a val
(value) and two Node
s as children.
In the Python example, the child nodes default to None
, which is again sort-of, kind-of, not really like a null
value in Java. So we should allow left
and/or right
to be null
.
Python passes an explicit self
to any method defined for an object, unlike most other programming languages. This is why there's a self
in the __init__()
method (the first of four arguments), but there are only three arguments passed to create node
in the test. In Java, this
is passed implicitly to the methods of any object, so we can drop it in the Java version.
The crux of this problem is asking for a String Node::toString()
method on our Java Node
class which converts the Node
to a serialized (String
) representation of the object. That representation should then be able to be converted directly into a Node
with a Node Node::fromString()
method or something similar. These two methods are the serialization and de-serialization methods, respectively.
Code
Building the Node
class
Let's start by building this bare-bones Node
class as defined by the prompt:
public class Node {
private int val;
private Node left;
private Node right;
public Node (int val, Node left, Node right) {
this.val = val;
this.left = left;
this.right = right;
}
}
This can be instantiated very easily in the jshell
:
jshell> /open Node.java
jshell> Node n = new Node(3, null, null)
n ==> Node@f6c48ac
This is -- I think -- the simplest possible implementation of this class which satisfies the requirements of the prompt. Note that val
must have a type of some kind, so I made it an int
above. We can make the node generic with:
public class Node<T> {
private T val;
private Node<T> left;
private Node<T> right;
public Node (T val, Node<T> left, Node<T> right) {
this.val = val;
this.left = left;
this.right = right;
}
}
Though now we have to explicitly state the type of data held within the Node
:
jshell> /open Node.java
jshell> Node<Integer> n = new Node<>(3, null, null)
n ==> Node@14bf9759
Since every Java class is a subclass of Object
, we can declare all Node
s as Node<Object>
s if this is too much of a burden. (But if we're going to do that, we might as well just make the type of val
Object
and forgo the generics.)
Anyway, back to the task at hand. Let's create a serialization method which we'll call toString()
(the Java standard). By default, toString()
called on an object returns the type of that object and its hash:
jshell> n.toString()
$13 ==> "Node@5f341870"
We want our serialization method to include a full description of the Node
so it can be reconstructed from the returned String
, if necessary. Here we get into a bit of trouble.
Unless we restrict ourselves to some subset of objects (say numbers, characters, and strings) and write some code which can easily parse these objects from their String
representations, it's going to be very difficult to serialize our Node
s in this manner.
Although it's not explicit in the prompt, only strings are used as data in the test. I think it's fair if we restrict ourselves to using only String val
s. In that case, we can again rewrite our Node
:
public class Node {
private String val;
private Node left;
private Node right;
public Node (String val, Node left, Node right) {
this.val = val;
this.left = left;
this.right = right;
}
}
In the example, we can easily access the left
(or right
) children of a Node
by calling left()
or right()
methods. We can also get the val
with a val()
method. Let's add those:
public class Node {
private String val;
private Node left;
private Node right;
public Node (String val, Node left, Node right) {
this.val = val;
this.left = left;
this.right = right;
}
public Node left() { return this.left; }
public Node right() { return this.right; }
public String val() { return this.val; }
}
This works as expected in the jshell
...
jshell> Node n = new Node("3", null, new Node("right", null, null))
n ==> Node@10bdf5e5
jshell> n.left()
$19 ==> null
jshell> n.right()
$20 ==> Node@617faa95
jshell> n.val()
$21 ==> "3"
...but we still need those serialization / deserialization methods. One last thing to do before we add them: although Python is more flexible than Java in that you can provide the arguments to a method in any order using named parameters, in the test above, one of the nodes is created with only a single (left
) child by ommitting the right
child. Another is created with no children nodes at all. Let's add some alternative constructors with one and zero child Node
s to emulate this behaviour:
public class Node {
private String val;
private Node left;
private Node right;
public Node (String val, Node left, Node right) {
this.val = val;
this.left = left;
this.right = right;
}
public Node (String val, Node left) {
this.val = val;
this.left = left;
this.right = null;
}
public Node (String val) {
this.val = val;
this.left = null;
this.right = null;
}
public Node left() { return this.left; }
public Node right() { return this.right; }
public String val() { return this.val; }
}
To avoid repeating ourselves, we could also do:
public class Node {
private String val;
private Node left;
private Node right;
public Node (String val, Node left, Node right) {
this.val = val;
this.left = left;
this.right = right;
}
public Node (String val, Node left) {
this(val, left, null);
}
public Node (String val) {
this(val, null, null);
}
public Node left() { return this.left; }
public Node right() { return this.right; }
public String val() { return this.val; }
}
Now we can create the same node as in the example, using almost the exact same syntax:
jshell> /open Node.java
jshell> Node node = new Node("root", new Node("left", new Node("left.left")), new Node("right"))
node ==> Node@4e1d422d
Serialization
Finally, we can create our serialization and de-serialization methods. Let's start with serialization. We need to write a method which encodes within it all of the information about a given Node
, its val
, and all of its children and their val
s. This sounds quite complex.
In the simplest case, we have a Node
with no children (where left
and right
are both null
). Let's tackle that first. We could do something like:
public String toString() {
StringBuilder sb = new StringBuilder("Node(");
if (val == null) {
sb.append("null");
} else {
sb.append("\"");
sb.append(val);
sb.append("\"");
}
if (this.left == null) {
sb.append(", null");
}
if (this.right == null) {
sb.append(", null");
}
sb.append(")");
return sb.toString();
}
Note that I use StringBuilder
rather than String
, as it's more performant when we're concatenating lots of small String
s together.
When an object is created in jshell
, it uses the object's toString()
method to print it to the terminal, so let's see how our method works for a childless Node
:
jshell> /open Node.java
jshell> Node node = new Node("test", null, null);
node ==> Node("test", null, null)
Looking good! Now, when a Node
has children, we need to perform these steps recursively. In place of one of the null
s above, there should be another Node(...)
printed. This is an easy fix -- if left
or right
is non-null
, we just call left.toString()
or right.toString()
within the toString()
method:
public String toString() {
StringBuilder sb = new StringBuilder("Node(");
if (val == null) {
sb.append("null");
} else {
sb.append("\"");
sb.append(val);
sb.append("\"");
}
if (this.left == null) {
sb.append(", null");
} else {
sb.append(", ");
sb.append(this.left.toString());
}
if (this.right == null) {
sb.append(", null");
} else {
sb.append(", ");
sb.append(this.right.toString());
}
sb.append(")");
return sb.toString();
}
}
How does this work?
jshell> /open Node.java
jshell> Node node = new Node("test", null, new Node("right", null, null));
node ==> Node("test", null, Node("right", null, null))
jshell> Node node = new Node("root", new Node("left", new Node("left.left")), new Node("right"))
node ==> Node("root", Node("left", Node("left.left", null, ... Node("right", null, null))
...just fine! So we've successfully implemented our serialization method, toString()
. What about deserialization? This is a bit more complicated.
To deserialize, there are a few things we need to make note of:
- all
val
s are eithernull
orString
s surrounded by quotes - all
Node
s are eithernull
or begin withNode(
-
Node
s can have 0, 1, or 2 children
This is complex. The serialization method was designed to make the output human-readable and not much else. Let's see if we can tweak it to make it more easily parseable by the de-serialization method:
public String toString() {
StringBuilder sb = new StringBuilder("Node: ");
if (val == null) {
sb.append("null");
} else {
sb.append("\"");
sb.append(val);
sb.append("\"");
}
if (this.left == null) {
sb.append("\n null");
// } else {
// sb.append(", ");
// sb.append(this.left.toString());
}
if (this.right == null) {
sb.append("\n null");
// } else {
// sb.append(", ");
// sb.append(this.right.toString());
}
sb.append("\n");
return sb.toString();
}
Now, the output looks like this:
jshell> /open Node.java
jshell> Node node = new Node("test");
node ==> Node: "test"
null
null
jshell> System.out.print(node.toString())
Node: "test"
null
null
In this revised version, the val
is printed after a Node:
and the left
and right
children are printed beneath the val
. What does it look like when the Node
has some children? Does a straightforward approach work?
public String toString() {
StringBuilder sb = new StringBuilder("Node: ");
if (val == null) {
sb.append("null");
} else {
sb.append("\"");
sb.append(val);
sb.append("\"");
}
if (this.left == null) {
sb.append("\n null");
} else {
sb.append("\n ");
sb.append(this.left.toString());
}
if (this.right == null) {
sb.append("\n null");
} else {
sb.append("\n ");
sb.append(this.right.toString());
}
sb.append("\n");
return sb.toString();
}
jshell> /open Node.java
jshell> Node node = new Node("test", null, new Node("right"));
node ==> Node: "test"
null
Node: "right"
null
null
jshell> System.out.print(node.toString())
Node: "test"
null
Node: "right"
null
null
...ooh, not quite. We've got too many blank lines at the end, and right
's children aren't indented as expected. We can create a second toString()
method that takes a parameter indent
. This can be the number of levels the child Node
should be indented in the output.
After a bit of tweaking, I end up with something like this:
public String toString() {
return toString(0);
}
public String toString (int indent) {
String spacer = " ";
String bump = String.join("", Collections.nCopies(indent, spacer));
StringBuilder sb = new StringBuilder(bump);
sb.append("Node: ");
bump = bump + spacer;
if (val == null) {
sb.append("null");
} else {
sb.append("\"");
sb.append(val);
sb.append("\"");
}
if (this.left == null) {
sb.append("\n");
sb.append(bump);
sb.append("null");
} else {
sb.append("\n");
sb.append(this.left.toString(indent+1));
}
if (this.right == null) {
sb.append("\n");
sb.append(bump);
sb.append("null");
} else {
sb.append("\n");
sb.append(this.right.toString(indent+1));
}
return sb.toString();
}
...which looks something like this:
jshell> Node node = new Node("test", new Node("left"), new Node(null));
node ==> Node: "test"
Node: "left"
null
null
Node: null
null
null
jshell> System.out.print(node.toString())
Node: "test"
Node: "left"
null
null
Node: null
null
null
This is a slightly different serialization method which will hopefully make deserialization a bit easier. Now, to deserialize, we work our way from left to right. The left-most Node
will always be the root Node
. If we move two spaces to the right and look up and down that column, we'll find (on the top) the left
child Node
, which is either null
, or a Node
itself. On the bottom, we find the right
child Node
. The val
held by a node always comes after the Node:
marker and is always a String
surrounded by quotes, unless it is null
.
Deserialization
To deserialize the above output, we need to parse the nodetree from left to right. The val
held by a Node
appears after the Node:
marker and its children (left
and right
) are found beneath it. Let's again begin with the simplest example, a Node
with no children:
jshell> Node node = new Node("test");
node ==> Node: "test"
null
null
jshell> System.out.print(node.toString())
Node: "test"
null
null
To start, we find the val
by simply looking at everything after "Node:
". If the resulting substring is surrounded by quotes, we remove them. Otherwise, the substring must be null
:
public static Node fromString (String serialized) {
String marker = "Node: ";
int valStart = serialized.indexOf(marker) + marker.length();
int valEnd = serialized.indexOf("\n");
String val = serialized.substring(valStart, valEnd);
if (val.charAt(0) == '"')
val = val.substring(1, val.length()-1);
else
val = null;
System.out.println("val:");
System.out.println(val);
return null;
}
This will print:
jshell> /open Node.java
jshell> Node node = new Node("test", new Node("left"), new Node(null));
node ==> Node: "test"
Node: "left"
null
null
Node: null
null
null
jshell> node.toString()
$84 ==> "Node: \"test\"\n Node: \"left\"\n null\n null\n Node: null\n null\n null"
jshell> Node.fromString(node.toString())
val:
test
$85 ==> null
So val
was parsed correctly! Next is just parsing the children. If they're null
, we add a null
child. If they're not, we recursively call the fromString()
method on them! But they could be pretty far down the tree (left
might have 17 generations of children and grandchildren, for instance). So how do we find right
?
We know that it's indented exactly two spaces in our serialized representation, so we can split the String
into lines and find the only two lines which should be indented exactly two spaces before a null
or a Node:
appears.
Or, since all lines after the first begin with a \n
followed by some number of spaces, we can replace all instances of \n
plus two spaces with just \n
, then remove the first line, then look for the two lines which don't begin with any spaces.
public static Node fromString (String serialized) {
String marker = "Node: ";
int valStart = serialized.indexOf(marker) + marker.length();
int valEnd = serialized.indexOf("\n");
String val = serialized.substring(valStart, valEnd);
if (val.charAt(0) == '"')
val = val.substring(1, val.length()-1);
else val = null;
String modified = serialized.replaceAll("\n ", "\n");
modified = modified.substring(valEnd+1);
System.out.println(modified);
return null;
}
The result of the above is "de-denting" the entire serialized output two spaces:
jshell> Node node = new Node("test", new Node("left"), new Node(null));
node ==> Node: "test"
Node: "left"
null
null
Node: null
null
null
jshell> System.out.print(node.toString())
Node: "test"
Node: "left"
null
null
Node: null
null
null
jshell> Node.fromString(node.toString())
Node: "left"
null
null
Node: null
null
null
$102 ==> null
Now, from the first line down, we have the left
Node
, and from the first non-indented line after the first, we have the right
Node
. We need to handle null
nodes, as the code we've written now would throw an error for them:
public static Node fromString (String serialized) {
String marker = "Node: ";
int valStart = serialized.indexOf(marker);
if (valStart < 0) return null;
valStart += marker.length();
int valEnd = serialized.indexOf("\n");
String val = serialized.substring(valStart, valEnd);
if (val.charAt(0) == '"')
val = val.substring(1, val.length()-1);
else val = null;
String modified = serialized.replaceAll("\n ", "\n");
modified = modified.substring(valEnd+1);
System.out.println(modified);
return null;
}
...and then we need to find the two non-indented lines so we can run this process again on the left
and right
child Node
s. A non-indented line will be either null
or a Node
, so it will be the only instance where the \n
line break character is immediately followed by an n
or an N
:
jshell> /open Node.java
jshell> Node node = new Node("test", new Node("left"), new Node(null));
node ==> Node: "test"
Node: "left"
null
null
Node: null
null
null
jshell> System.out.print(node.toString())
Node: "test"
Node: "left"
null
null
Node: null
null
null
jshell> Node.fromString(node.toString())
Node: "left"
null
null
Node: null
null
null
27
$110 ==> null
...so right
starts at character index 27
. Finally, we can split the children into right
and left
nodes and run this process on them again, recursively (comments added, as well):
public static Node fromString (String serialized) {
// is this a non-null Node?
String marker = "Node: ";
int valStart = serialized.indexOf(marker);
// if null, return a null Node
if (valStart < 0) return null;
// otherwise, get the `val` of the Node
valStart += marker.length();
int valEnd = serialized.indexOf("\n");
String val = serialized.substring(valStart, valEnd);
// is `val` null?
if (val.charAt(0) == '"')
val = val.substring(1, val.length()-1);
else val = null;
// de-dent the serialized representation and look for children
String modified = serialized.replaceAll("\n ", "\n");
modified = modified.substring(valEnd+1);
// at what character does the `right` child start?
int rightStart = Math.max(
modified.indexOf("\nN"),
modified.indexOf("\nn")
) + 1;
// child node `left`
Node left = null;
// if `left` is not `null`
if (modified.substring(0, 4) != "null")
left = fromString(modified.substring(0, rightStart));
// child node `right`
Node right = null;
// if `right` is not `null`
if (modified.substring(rightStart, rightStart+4) != "null")
right = fromString(modified.substring(rightStart));
return new Node(val, left, right);
}
And here it is running in the jshell
:
jshell> /open Node.java
jshell> Node node = new Node("test", new Node("left"), new Node(null));
node ==> Node: "test"
Node: "left"
null
null
Node: null
null
null
jshell> System.out.print(node.toString())
Node: "test"
Node: "left"
null
null
Node: null
null
null
jshell> Node copy = Node.fromString(node.toString())
copy ==> Node: "test"
Node: "left"
null
null
Node: null
null
null
jshell> System.out.print(copy.toString())
Node: "test"
Node: "left"
null
null
Node: null
null
null
Let's check the test example given in the prompt:
jshell> Node node = new Node("root", new Node("left", new Node("left.left")), new Node("right"))
node ==> Node: "root"
Node: "left"
Node: "left.left" ... "right"
null
null
jshell> Node.fromString(node.toString()).left().left().val()
$129 ==> "left.left"
jshell> Node.fromString(node.toString()).left().left().val().equals("left.left")
$130 ==> true
It works as expected!
Discussion
This took much longer than I thought it would initially. My first attempt at serialization led to -- what I thought would be -- a very complex deserialization. So I refactored to a less elegant-looking serialization which had a much easier deserialization.
Both of my serialization and deserialization methods rely on recursion to get the job done, which I think is the best way to go about this. (You don't know ahead of time how deep the nodetree is going to be.)
The deserialization method might not be optimal, as its written in such a way that lots of data needs to be held on the stack. It's not tail call optimizable as written. I think, ideally, we would want to find the most deeply-nested Node
s first, create them, and move up the tree, rather than moving from the top-down. It's not immediately obvious to me how we would go about doing that.
I've written nodetrees in the past with much nicer-looking toString()
implementations, though they're not serialized, as all of the data is not contained within the String
representation. This is because these nodetrees often allow data which is not strictly String
in nature.
One last small thing to note is -- because I'm using \n
to separate the nodes, if \n
appears in a val
anywhere, there could be problems. Input should be sanitised or a special exception should be added so that a \n
within a val
doesn't break the deserialization.
All the code for my Daily Coding Problems solutions is available at github.com/awwsmm/daily.
Suggestions? Let me know in the comments.