The MessagePack Type System

Kashyap edited this page Jul 28, 2015 · 3 revisions

MessagePack is a compact, schema-less binary protocol. Its formal specification is the MessagePack spec (spec.md in the msgpack/msgpack repository on GitHub).

Composite Types

MessagePack, like JSON, supports two composite types: maps and arrays.

Maps

A map of length N is defined as N pairs of objects of any type, where the first object in each pair is the key and the second object is the value. Maps are marked by a "header" that denotes the size of the map and takes up between 1 and 5 bytes on the wire. (Although maps with non-str keys are a perfectly legal construct in MessagePack, many implementations of the encoding do not support them, because the behavior of hash maps keyed by non-string types varies from language to language and may be undefined.)
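As a minimal sketch of how the header size grows with the map, the hypothetical helper below picks among the three map header formats named in the MessagePack spec (fixmap, map 16, map 32). It assumes big-endian length fields, as the spec requires, and uses only Python's standard `struct` module. Note that the header counts key/value pairs, not individual objects.

```python
import struct

def map_header(n: int) -> bytes:
    """Hypothetical helper: encode the header for a map holding n key/value pairs."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("map length out of range")
    if n < 16:
        return bytes([0x80 | n])                # fixmap: 1 byte, length in low 4 bits
    if n <= 0xFFFF:
        return b"\xde" + struct.pack(">H", n)   # map 16: marker + uint16 = 3 bytes
    return b"\xdf" + struct.pack(">I", n)       # map 32: marker + uint32 = 5 bytes

print(map_header(2).hex())      # 82
print(map_header(100).hex())    # de0064
print(map_header(70000).hex())  # df00011170
```

The 2N key and value objects follow the header directly, with no separators.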

Arrays

An array of length N is defined as N sequential objects (of any type). Arrays are prefixed by a header that denotes their size and takes up between 1 and 5 bytes on the wire.
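The sketch below mirrors the map case with the spec's array formats (fixarray, array 16, array 32) and then appends the elements back to back. To stay short, the hypothetical `encode_small_uint_array` helper only accepts integers from 0 to 127, which fit in a single-byte positive fixint; it is an illustration, not a general encoder.

```python
import struct

def array_header(n: int) -> bytes:
    """Hypothetical helper: encode the header for an array of n elements."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("array length out of range")
    if n < 16:
        return bytes([0x90 | n])                # fixarray: 1 byte
    if n <= 0xFFFF:
        return b"\xdc" + struct.pack(">H", n)   # array 16: 3 bytes
    return b"\xdd" + struct.pack(">I", n)       # array 32: 5 bytes

def encode_small_uint_array(values: list[int]) -> bytes:
    """Header followed by each element; sketch limited to positive fixints (0..127)."""
    out = bytearray(array_header(len(values)))
    for v in values:
        if not 0 <= v <= 0x7F:
            raise ValueError("this sketch only handles integers 0..127")
        out.append(v)                           # positive fixint: the byte is the value
    return bytes(out)

print(encode_small_uint_array([1, 2, 3]).hex())  # 93010203
```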

Base Types

MessagePack supports the following base types:

Int (signed integer)

A MessagePack int is a signed integer between -(1<<63) and (1<<63)-1. An int takes up 1 to 9 bytes on the wire.
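The 1-to-9-byte range comes from the spec's signed formats: a 1-byte positive or negative fixint for small values, then int 8, int 16, int 32, and int 64 (a marker byte plus 1, 2, 4, or 8 big-endian two's-complement bytes). A minimal sketch, assuming the encoder always picks the smallest signed format that fits; `encode_int` is a hypothetical name.

```python
import struct

def encode_int(v: int) -> bytes:
    """Hypothetical helper: encode v using the smallest signed MessagePack format."""
    if not -(1 << 63) <= v <= (1 << 63) - 1:
        raise OverflowError("value outside the int64 range")
    if 0 <= v <= 0x7F:
        return bytes([v])                         # positive fixint: 1 byte
    if -32 <= v < 0:
        return struct.pack(">b", v)               # negative fixint (0xe0..0xff): 1 byte
    if -(1 << 7) <= v < (1 << 7):
        return b"\xd0" + struct.pack(">b", v)     # int 8: 2 bytes
    if -(1 << 15) <= v < (1 << 15):
        return b"\xd1" + struct.pack(">h", v)     # int 16: 3 bytes
    if -(1 << 31) <= v < (1 << 31):
        return b"\xd2" + struct.pack(">i", v)     # int 32: 5 bytes
    return b"\xd3" + struct.pack(">q", v)         # int 64: 9 bytes

print(encode_int(-1).hex())   # ff
print(encode_int(-33).hex())  # d0df
print(encode_int(200).hex())  # d100c8
```

An encoder is free to choose a wider format than necessary; both are valid MessagePack for the same value.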

Uint (unsigned integer)

A MessagePack uint is an unsigned integer from 0 to (1<<64)-1. Like an int, a uint takes up 1 to 9 bytes on the wire.
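The unsigned side uses the positive fixint for 0 to 127 and then the spec's uint 8, uint 16, uint 32, and uint 64 formats. A minimal sketch under the same smallest-format assumption as above; `encode_uint` is a hypothetical name.

```python
import struct

def encode_uint(v: int) -> bytes:
    """Hypothetical helper: encode v using the smallest unsigned MessagePack format."""
    if not 0 <= v <= (1 << 64) - 1:
        raise OverflowError("value outside the uint64 range")
    if v <= 0x7F:
        return bytes([v])                         # positive fixint: 1 byte
    if v <= 0xFF:
        return b"\xcc" + struct.pack(">B", v)     # uint 8: 2 bytes
    if v <= 0xFFFF:
        return b"\xcd" + struct.pack(">H", v)     # uint 16: 3 bytes
    if v <= 0xFFFFFFFF:
        return b"\xce" + struct.pack(">I", v)     # uint 32: 5 bytes
    return b"\xcf" + struct.pack(">Q", v)         # uint 64: 9 bytes

print(encode_uint(200).hex())      # ccc8
print(encode_uint(1 << 32).hex())  # cf0000000100000000
```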

Bool (boolean)

A MessagePack bool is a simple boolean. It always takes up exactly 1 byte.

Null (null)

Null is analogous to JSON's null; it is meant to signify the absence of an object. It always takes up exactly 1 byte.
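Bool and null are each a single marker byte with no payload: the spec defines 0xc2 for false, 0xc3 for true, and 0xc0 for null (which the spec calls nil). A minimal sketch; `encode_scalar` is a hypothetical helper.

```python
NIL = b"\xc0"
FALSE = b"\xc2"
TRUE = b"\xc3"

def encode_scalar(v: object) -> bytes:
    """Hypothetical helper: encode None, True, or False as a 1-byte marker."""
    if v is None:
        return NIL
    if v is True:
        return TRUE
    if v is False:
        return FALSE
    raise TypeError(f"not a bool or None: {v!r}")

print(encode_scalar(True).hex())  # c3
```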

Float (floating point number)

Float is a floating point number, encoded as a 32- or 64-bit IEEE 754 float. 32-bit floats take up 5 bytes, and 64-bit floats take up 9 bytes.
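The 5- and 9-byte sizes are a marker byte (0xca for float 32, 0xcb for float 64) followed by the big-endian IEEE 754 bits. A minimal sketch with hypothetical helper names, using `struct`'s `>f` and `>d` formats:

```python
import struct

def encode_float32(x: float) -> bytes:
    return b"\xca" + struct.pack(">f", x)  # marker + 4 bytes = 5 bytes

def encode_float64(x: float) -> bytes:
    return b"\xcb" + struct.pack(">d", x)  # marker + 8 bytes = 9 bytes

print(encode_float32(1.5).hex())  # ca3fc00000
print(encode_float64(1.5).hex())  # cb3ff8000000000000
```

`struct.pack(">f", x)` raises `OverflowError` for finite values too large for a float32, so an encoder has to decide up front which width a given value needs.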

Bin (binary)

MessagePack bin is between 0 and (1<<32)-1 bytes of arbitrary data. The encoded object requires between 2 and 5 extra bytes beyond the size of the binary.
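Unlike str, bin has no single-byte "fix" form, which is why the overhead starts at 2 bytes: a marker (0xc4, 0xc5, or 0xc6 for bin 8, bin 16, bin 32) plus a 1-, 2-, or 4-byte length. A minimal sketch; `encode_bin` is a hypothetical helper.

```python
import struct

def encode_bin(data: bytes) -> bytes:
    """Hypothetical helper: prefix raw bytes with the smallest bin header."""
    n = len(data)
    if n <= 0xFF:
        header = b"\xc4" + struct.pack(">B", n)   # bin 8: 2 extra bytes
    elif n <= 0xFFFF:
        header = b"\xc5" + struct.pack(">H", n)   # bin 16: 3 extra bytes
    elif n <= 0xFFFFFFFF:
        header = b"\xc6" + struct.pack(">I", n)   # bin 32: 5 extra bytes
    else:
        raise ValueError("bin longer than (1<<32)-1 bytes")
    return header + data

print(encode_bin(b"").hex())          # c400
print(encode_bin(b"\x01\x02").hex())  # c4020102
```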

Str (string)

MessagePack str is a UTF-8-encoded string between 0 and (1<<32)-1 bytes long. The encoded object requires between 1 and 5 extra bytes beyond the size of the string.
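The 1-byte minimum comes from fixstr, which packs lengths up to 31 bytes into the marker itself; longer strings use str 8, str 16, or str 32. The length is counted in UTF-8 bytes, not characters. A minimal sketch; `encode_str` is a hypothetical helper.

```python
import struct

def encode_str(s: str) -> bytes:
    """Hypothetical helper: UTF-8 encode s and prefix the smallest str header."""
    data = s.encode("utf-8")
    n = len(data)                                 # byte length, not character count
    if n < 32:
        header = bytes([0xA0 | n])                # fixstr: 1 extra byte
    elif n <= 0xFF:
        header = b"\xd9" + struct.pack(">B", n)   # str 8: 2 extra bytes
    elif n <= 0xFFFF:
        header = b"\xda" + struct.pack(">H", n)   # str 16: 3 extra bytes
    elif n <= 0xFFFFFFFF:
        header = b"\xdb" + struct.pack(">I", n)   # str 32: 5 extra bytes
    else:
        raise ValueError("string longer than (1<<32)-1 bytes")
    return header + data

print(encode_str("hi").hex())  # a26869
print(encode_str("é").hex())   # a2c3a9 (one character, two bytes)
```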

Ext (extension)

A MessagePack ext object is a tuple of a signed 8-bit integer (the type code) and arbitrary binary data up to (1<<32)-1 bytes. Users can use the ext type to extend the MessagePack type system. An ext takes up between 2 and 6 extra bytes beyond the size of the data.
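The 2-to-6-byte overhead follows from the spec's ext formats: fixext 1, 2, 4, 8, and 16 need only a marker and the type byte, while ext 8, ext 16, and ext 32 add a 1-, 2-, or 4-byte length. The specification reserves negative type codes for its own use, leaving 0 to 127 for applications. A minimal sketch; `encode_ext` is a hypothetical helper.

```python
import struct

# Marker bytes for the fixed-size ext formats, keyed by payload length.
_FIXEXT = {1: 0xD4, 2: 0xD5, 4: 0xD6, 8: 0xD7, 16: 0xD8}

def encode_ext(type_code: int, data: bytes) -> bytes:
    """Hypothetical helper: encode an (int8 type code, payload) ext object."""
    if not -128 <= type_code <= 127:
        raise ValueError("type code must fit in a signed byte")
    t = struct.pack(">b", type_code)
    n = len(data)
    if n in _FIXEXT:
        return bytes([_FIXEXT[n]]) + t + data                # fixext: 2 extra bytes
    if n <= 0xFF:
        return b"\xc7" + struct.pack(">B", n) + t + data     # ext 8: 3 extra bytes
    if n <= 0xFFFF:
        return b"\xc8" + struct.pack(">H", n) + t + data     # ext 16: 4 extra bytes
    if n <= 0xFFFFFFFF:
        return b"\xc9" + struct.pack(">I", n) + t + data     # ext 32: 6 extra bytes
    raise ValueError("ext payload longer than (1<<32)-1 bytes")

print(encode_ext(5, b"\x00\x01\x02\x03").hex())  # d60500010203
```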

Comparison with JSON

In short, JSON's type system is a subset of the MessagePack type system.

The two most important differences between the encodings, beyond the fact that one is binary and one is plaintext, are that JSON maps must always be keyed by strings, and that JSON has no notion of floats, integers, or unsigned integers; those are all simply a "number" of arbitrary precision.

Consequently, it is possible to translate arbitrary valid MessagePack to valid JSON deterministically, provided the following can be guaranteed:

  1. All MessagePack map types are keyed by string-able fields (e.g. str or safe bin).
  2. All MessagePack ext types have a valid, application-defined JSON representation.

The rest of the translation process is fairly straightforward: str fields need to be properly escaped, and bin fields should be converted to quoted base-64 strings.
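A minimal sketch of these rules, assuming MessagePack has already been decoded into Python values (dict, list, str, bytes for bin, int, float, bool, None) plus a hypothetical `Ext` class for ext objects. The application supplies `ext_hook` to provide the JSON representation required by condition 2. Interpreting "safe bin" keys as bin that is valid UTF-8 is an assumption of this sketch, not something the encoding defines.

```python
import base64
import json
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Ext:
    """Hypothetical stand-in for a decoded MessagePack ext object."""
    type_code: int
    data: bytes

def to_json_value(v: Any, ext_hook: Callable[[Ext], Any]) -> Any:
    if isinstance(v, dict):
        out = {}
        for k, val in v.items():
            if isinstance(k, str):
                key = k
            elif isinstance(k, bytes):
                key = k.decode("utf-8")  # assumption: "safe" bin means valid UTF-8
            else:
                raise TypeError(f"map key {k!r} has no JSON string form")
            out[key] = to_json_value(val, ext_hook)
        return out
    if isinstance(v, list):
        return [to_json_value(x, ext_hook) for x in v]
    if isinstance(v, bytes):
        return base64.b64encode(v).decode("ascii")  # bin -> base-64 string
    if isinstance(v, Ext):
        return to_json_value(ext_hook(v), ext_hook)  # application-defined mapping
    return v  # str, int, float, bool, None pass through

def msgpack_value_to_json(v: Any, ext_hook: Callable[[Ext], Any]) -> str:
    # json.dumps escapes strings; allow_nan=False rejects NaN/Infinity floats,
    # which have no valid JSON representation.
    return json.dumps(to_json_value(v, ext_hook), allow_nan=False)

doc = {"name": "x", b"raw": b"\x00\xff", "e": Ext(1, b"\x01")}
print(msgpack_value_to_json(doc, lambda e: {"ext": e.type_code, "data": e.data}))
```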

It is also possible to convert JSON to MessagePack; however, there are multiple valid MessagePack objects that can represent the same JSON object. (Consider that in {"number":1}, the "number" field could be encoded as a float32, float64, uint, or int in MessagePack.) Consequently, translation in this direction requires that the translator have some a priori knowledge about the objects being translated in order to produce deterministic results.
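To make the {"number":1} example concrete, the sketch below lists several distinct byte sequences, one per format family from the spec, that all decode to the number 1. A translator without a schema has no principled way to choose among them. The dictionary name `candidates` is illustrative only.

```python
import struct

# Distinct, valid MessagePack encodings of the JSON number 1.
candidates = {
    "positive fixint": b"\x01",
    "uint 8": b"\xcc" + struct.pack(">B", 1),
    "int 8": b"\xd0" + struct.pack(">b", 1),
    "float 32": b"\xca" + struct.pack(">f", 1.0),
    "float 64": b"\xcb" + struct.pack(">d", 1.0),
}

for name, encoded in candidates.items():
    print(f"{name:16} {encoded.hex()}")
```

A schema-aware translator would resolve this by consulting, for each field, the type the consuming application expects (for example, "number is always a uint").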