Breaking Away from JSON: A New Approach to Data Transport in Web Development

OftenSometimes - Oct 11 - Dev Community

If you have been coding in the web development industry, you are most likely pretty familiar with JSON. It is the all-encompassing de facto standard that is never challenged. It is used everywhere, and you have become accustomed to it. All your REST calls transfer data via JSON. You know the format's limitations, and you accept them.

Or do you have to?

(Note: all links to packages and code are in the Links section of the article)

Brief history

My background is heavily in the Java and JavaScript/TypeScript world, so I have learned how to deal with their quirks. Many years ago, I began a hobby web project (TypeScript/Node) that had a problem which JSON could not solve well.

I wanted to break free from the RESTful mindset to a more relaxed, message-based transport between browser and server. And for that, I really wanted to utilize the JavaScript type system to differentiate messages from each other. You know, one would have classes like AddDocument, GetUsers, GiveMeAllYourMoney, etc. And instead of having many HTTP endpoints, I would have just one, and the messages would flow from browser to server and back in a more ad hoc manner.

But I did not really have an elegant solution for my needs, because JSON destroys all type information during serialization. Of course, I could use some dedicated property to transfer types, but that would require custom processing, and I just felt that it was not the route I wanted to go. I just wanted a protocol that would take my object as is and serialize it in such a way that it would be exactly the same when deserialized. It would retain all type information, and that's it. So I needed an alternative.
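To make the problem concrete, here is a minimal sketch of what happens to a typed message on a JSON round trip. The classes `AddDocument` and `GetUsers` and the helper `describeMessage` are hypothetical, named after the examples above; they are not part of Cbot.

```typescript
// Hypothetical message classes, named after the examples in the text.
class AddDocument {
  constructor(public title: string, public body: string) {}
}

class GetUsers {
  constructor(public limit: number) {}
}

// A single handler that tells messages apart by their class.
function describeMessage(message: unknown): string {
  if (message instanceof AddDocument) return `add document "${message.title}"`;
  if (message instanceof GetUsers) return `get ${message.limit} users`;
  return "unknown message";
}

const original = new AddDocument("Notes", "Hello");

// Round trip through JSON: the data survives, the class does not.
const roundTripped: unknown = JSON.parse(JSON.stringify(original));

console.log(describeMessage(original));           // add document "Notes"
console.log(describeMessage(roundTripped));       // unknown message
console.log(roundTripped instanceof AddDocument); // false
```

The parsed value is a plain object with the same properties, so any dispatch based on `instanceof` silently stops working unless the type is carried in some extra property and restored by hand.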

One could argue that there are plenty of alternatives to JSON, like Protocol Buffers or MessagePack. But these alternatives are all binary protocols. And when I searched for information on where they are used, web development was almost entirely absent from the scene. I personally didn't feel that any of them would satisfy my requirements.

Thus, I began the challenge of creating a better JSON for myself.

Introducing the Cbot (Character Based Object Transport) protocol

About five years ago, I began this journey by creating the first version of the protocol. It didn't have a name then; I just referred to it as EJS (Enhanced JSON).

Through gradual improvements, I developed the second iteration. And now, with the third evolution, which I have named Cbot, I finally feel it is mature enough to introduce to others who might be interested.

What are the main features of this protocol and why are they important?

As I mentioned before, the original spark for the project was the capability to retain types during the serialization process. But I quickly realized that I could then also embed other information that JSON cannot carry.

JSON has a bad habit of not guaranteeing anything. You can put whatever you like, and you normally have to just trust that your name property actually contains a string and not an array of booleans. Or you check everything at runtime to make sure. Of course, there are libraries to check the model. But again, I asked myself, why can't the actual protocol implementation already do it so that I could trust that what was deserialized was the thing I wanted?

Also, the native types of JSON are quite limited. For instance, JSON has an array for collections. But JavaScript already has sets and maps. Could I add them too? And what about dates? There have been numerous times when I have struggled with date formats. Maybe you have a date with a timezone, or maybe you don't. Was it even intended to have one? You never know, because JSON does not really tell you anything.
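A short sketch of how standard `JSON.stringify` and `JSON.parse` treat these types. The property names and values are made up for illustration.

```typescript
// Illustrative object using a Set, a Map and a Date.
const value = {
  tags: new Set(["a", "b"]),
  scores: new Map([["alice", 1]]),
  created: new Date("2024-09-16T09:13:00.000Z"),
};

const text = JSON.stringify(value);
console.log(text);
// {"tags":{},"scores":{},"created":"2024-09-16T09:13:00.000Z"}

const parsed = JSON.parse(text);

// The Set and Map contents are gone, and the date comes back as a string.
console.log(Object.keys(parsed.tags).length); // 0
console.log(typeof parsed.created);           // string
```

The set and the map are written as empty objects, and nothing in the output says that `created` was ever a date, so the receiver has to know that out of band.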

So in essence, I wanted to rectify these kinds of shortcomings in some way.

Why not a binary protocol?

This is a good question. The first reason is that there is already a plethora of binary protocols out there. So why create another one? The second reason is a question in itself: if there are better alternatives, why haven't they replaced JSON years ago? There must be good reasons for it.

My guess is simply that working with binary data is not that easy in JavaScript. It is easier to work with strings. JSON is also easy for humans to read and understand. And of course, browsers have native JSON support.

Because Cbot is targeted at the browser environment, it made more sense to create a character-based protocol instead.

What does it look like?

This article is not meant to be a tutorial for Cbot because such a tutorial already exists. However, because you are most likely a developer/engineer, you need at least some sort of understanding of what is happening. So I formulated an example for this purpose. In the example, I am using Cbot as a simple JSON replacement. Using more advanced features requires the use of a metamodel, which is also discussed in the actual tutorial.

But anyway, here is the object:

{
  name: "John Smith",
  age: 41,
  address: {
    street: "Second Avenue",
    postalCode: "1356-A",
    city: "Yorkistan"
  },
  isNiceGuy: true,
  hobbies: [
    "Playing cards",
    "Shopping",
    "Asking odd questions"
  ],
  favouritePoem: {
    title: "Digital Dreams",
    created: new Date("2024-09-16T12:13:00"),
    content: "In the code, we drift and weave,\n"
      + "A dance of data we perceive.\n"
      + "With each keypress, a world unfolds,\n"
      + "Infinite stories, yet untold."
  }
}

When this object is converted to a Cbot-message, it looks like this:

112345abb
E
A  name
B  JKJohn Smith
A !age
B !Id41
A "address
B "E
A #street
B #JKSecond Avenue
A $postalCode
B $JK1356-A
A %city
B %JKYorkistan
F
A &isNiceGuy
B &Iet
A 'hobbies
B 'C
JKPlaying cards
JKShopping
JKAsking odd questions
D
A (favouritePoem
B (E
A )title
B )JKDigital Dreams
A *created
B *Ih2024-09-16T12:13:00.000+03:00
A +content
B +JL
OIn the code, we drift and weave,
OA dance of data we perceive.
OWith each keypress, a world unfolds,
NInfinite stories, yet untold.
M
F
F

The Cbot format is designed to be primarily machine-readable. It has a predictable and straightforward syntax, and it could be seen as a kind of small assembly language. Each command is separated by a newline, and each line begins with an opcode that explains how objects are to be constructed.
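As a rough illustration of that line-oriented layout, the sketch below splits a message into commands by taking the first character of each line as the opcode and the rest as its payload. The `RawCommand` type and `splitCommands` function are hypothetical, and the one-character opcode is an assumption read off the example message above, not something taken from the Cbot specification.

```typescript
// Hypothetical shape for one line of a message.
interface RawCommand {
  opcode: string;  // assumed: the first character of the line
  payload: string; // everything after the opcode, left uninterpreted
}

// Splits a message into commands, one per non-empty line.
function splitCommands(message: string): RawCommand[] {
  return message
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => ({ opcode: line.charAt(0), payload: line.slice(1) }));
}

// The first lines of the example message, plus a closing line.
const sample = ["112345abb", "E", "A  name", "B  JKJohn Smith", "F"].join("\n");

for (const command of splitCommands(sample)) {
  console.log(JSON.stringify(command));
}
```

A real reader would go on to interpret each payload according to its opcode; this sketch only shows the first step of walking the message line by line.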

Because this format is meant to be read programmatically, it does not really make sense to read it as is. However, it can be visualized in a disassembly format, which explains the content much better:

MCSM 12345abb
OBJB (plain)
  DEFN 0 name
  ASGV 0 (name) STRN SSTR John Smith
  DEFN 1 age
  ASGV 1 (age) NATV FLOAT64 41
  DEFN 2 address
  ASGV 2 (address) OBJB (plain)
    DEFN 3 street
    ASGV 3 (street) STRN SSTR Second Avenue
    DEFN 4 postalCode
    ASGV 4 (postalCode) STRN SSTR 1356-A
    DEFN 5 city
    ASGV 5 (city) STRN SSTR Yorkistan
  OBJE
  DEFN 6 isNiceGuy
  ASGV 6 (isNiceGuy) NATV BOOLEAN TRUE
  DEFN 7 hobbies
  ASGV 7 (hobbies) ARRB
    STRN SSTR Playing cards
    STRN SSTR Shopping
    STRN SSTR Asking odd questions
  ARRE
  DEFN 8 favouritePoem
  ASGV 8 (favouritePoem) OBJB (plain)
    DEFN 9 title
    ASGV 9 (title) STRN SSTR Digital Dreams
    DEFN 10 created
    ASGV 10 (created) NATV ZONED_DATETIME 2024-09-16T12:13:00.000+03:00
    DEFN 11 content
    ASGV 11 (content) STRN STBG
      STNL In the code, we drift and weave,
      STNL A dance of data we perceive.
      STNL With each keypress, a world unfolds,
      STPA Infinite stories, yet untold.
    STEN
  OBJE
OBJE

In the disassembly, one can see a number of commands, some explanations, and data. Here is a brief summary of the opcodes:

  • MCSM is a Model Checksum, which is used as a sanity check that the message is understood by both parties.
  • OBJB / OBJE denote the beginning and the end of an object
  • DEFN / ASGV pair means that first an index is assigned to a property name, and then ASGV uses that index to assign a value to an object. Therefore, if the same property name is encountered again within the message, it does not have to be repeated.
  • STRN SSTR denotes a simple ordinary string
  • NATV FLOAT64 denotes a native value for a 64-bit float
  • NATV BOOLEAN TRUE, well you guessed it already
  • ARRB / ARRE denote the beginning and the end of an array
  • NATV ZONED_DATETIME denotes a zoned datetime, which is the default for a JavaScript Date
  • STRN, STBG, STNL, STPA, and STEN are a set of instructions that define a string builder. Because strings may contain newlines and they can be indefinitely long, a string builder pattern is used to split the string into more manageable pieces.
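Two of the ideas in this list can be sketched in a few lines: giving each property name an index once (DEFN / ASGV) and splitting a multi-line string into builder pieces. The names `NameTable`, `buildString` and `readString` are hypothetical, the output uses the disassembly mnemonics rather than the real wire format, and the reading of STNL as "piece followed by a newline" and STPA as "final piece" is inferred from the example above, not from the specification.

```typescript
// Hypothetical helper: hands out one index per distinct property name.
class NameTable {
  private readonly indexes = new Map<string, number>();

  // Returns a DEFN line the first time a name is seen and no lines
  // afterwards, so a repeated name costs only its index.
  define(name: string): { index: number; lines: string[] } {
    const known = this.indexes.get(name);
    if (known !== undefined) return { index: known, lines: [] };
    const index = this.indexes.size;
    this.indexes.set(name, index);
    return { index, lines: [`DEFN ${index} ${name}`] };
  }
}

// Splits a string into builder pieces: STNL for a piece that is followed
// by a newline, STPA for the final piece, wrapped in STBG ... STEN.
function buildString(text: string): string[] {
  const parts = text.split("\n");
  const last = parts.pop() ?? "";
  return ["STBG", ...parts.map((part) => `STNL ${part}`), `STPA ${last}`, "STEN"];
}

// Reverses buildString by concatenating the pieces again.
function readString(lines: string[]): string {
  let result = "";
  for (const line of lines) {
    if (line.startsWith("STNL ")) result += line.slice(5) + "\n";
    else if (line.startsWith("STPA ")) result += line.slice(5);
  }
  return result;
}

const names = new NameTable();
console.log(names.define("title").lines); // first use: one DEFN line
console.log(names.define("title").lines); // repeated use: no lines

const pieces = buildString("first line\nsecond line");
console.log(pieces.join("\n"));
console.log(readString(pieces) === "first line\nsecond line"); // true
```

Because no piece ever contains a newline, every command still fits on a single line, which keeps the line-per-command structure intact for strings of any shape.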

Is this a TypeScript-only thing?

No, it is not.

Due to my background and the use case, the implementation naturally started from the JavaScript side. But because Cbot is language-agnostic, it can be extended to other languages as well. In fact, there is already a working Java implementation that supports basically everything that the protocol is capable of doing.

Is there a specification somewhere?

Kind of. I found that creating a proper specification was actually really hard to do. I did try to use some sort of EBNF format to create one, but my first problem was that there is no single specification for such a format (the irony), just a bunch of interpretations of it. Also, even if I had used one of the versions, I wouldn't have had any means to actually validate the specification's correctness.

So instead, I decided to create a TypeScript file that contains the validation logic as types and classes. And I used that spec file to validate my tests. Thus, it became the validating specification. That spec file is then the master specification that other implementations must use as the source of truth.

What is the status of the project right now?

At the time of writing, I feel that it is basically feature-complete for most use cases. Some functionality still needs more research, for instance enums, binary-type support, and non-nullable-property support in the metamodel.

However, what I actually need is feedback. I get it that for some, making a JSON replacement is utter nonsense and TypeScript smells like fart. But from those who feel that Cbot may solve a use case, I would like to know how it fares and what support is considered important.

In essence, the next step is just to get some constructive feedback to make sure that the protocol can be stabilized to a first actual version.

Links

Repositories

Documentation
