3.2 Structured Data Compression
3.2.2 Concise Binary Object Representation
Concise Binary Object Representation (CBOR [BH13]) is a compact data format based on the JSON data model. CBOR is optimized for simplicity, processing speed, minimum resource usage and implementation compactness.
Although CBOR is designed to be used on its own, its specification describes a JSON mapping that can be used to transform data directly between the JSON and CBOR formats. JSON objects are converted to CBOR maps in which the JSON property names are used as the map keys.
CBOR follows a straightforward approach to encoding data items. The first byte of each data item describes the data type of the item, so CBOR carries the type information in-line, in the coded stream itself. This data type byte is divided into two distinct fields: the major type (high-order 3 bits) and the additional information (remaining low-order 5 bits).
The additional information field provides further type-specific information that is used to decode the data item’s value. For instance, if the additional information value is less than 24, that value is used directly as a small unsigned integer (i.e. an integer between 0 and 23), whereas a value between 24 and 27 indicates that the actual value is encoded in the following 1, 2, 4 or 8 bytes, respectively.
The additional information field is interpreted according to the semantics of the major type. For instance, if the major type indicates an array, the additional information field provides the length of the array.
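The split of the data type byte described above can be illustrated with a minimal Python sketch. The function name `decode_head` and its return shape are illustrative choices, not part of the CBOR specification, and the sketch only covers the additional information values discussed in the text (below 24, and 24 to 27).

```python
def decode_head(data: bytes, offset: int = 0):
    """Return (major_type, value, next_offset) for the item starting at offset.

    Hypothetical helper: it only handles additional information values
    0..27 and rejects everything else.
    """
    initial = data[offset]
    major_type = initial >> 5      # high-order 3 bits
    info = initial & 0x1F          # low-order 5 bits
    offset += 1
    if info < 24:
        # The value is carried directly in the additional information field.
        return major_type, info, offset
    if info <= 27:
        # 24, 25, 26, 27 -> the value follows in 1, 2, 4 or 8 bytes.
        length = 1 << (info - 24)
        if offset + length > len(data):
            raise ValueError("truncated data item")
        value = int.from_bytes(data[offset:offset + length], "big")
        return major_type, value, offset + length
    raise ValueError("additional information value not covered by this sketch")


if __name__ == "__main__":
    print(decode_head(bytes([0x0A])))              # (0, 10, 1)
    print(decode_head(bytes([0x18, 0x64])))        # (0, 100, 2)
    print(decode_head(bytes([0x83, 1, 2, 3])))     # (4, 3, 1)
```

In the last call the major type is 4 (array), so the returned value 3 is the number of items in the array rather than an integer payload. For major type 7 the bytes returned by this sketch would still need to be interpreted according to that type's own rules.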
3.2.2.1 CBOR Data Type Codification
The following list summarizes the major types as well as the type-specific additional information and any complementary fields.
• Major Type ‘0’: the major type ‘0’ is used to represent an unsigned integer. The additional information field encodes the integer value either directly (if less than 24) or in the following bytes (if between 24 and 27).
• Major Type ‘1’: this type represents a negative integer. The additional information field is encoded in the same way as for unsigned integers (major type ‘0’), and the represented value is −1 minus the encoded unsigned integer.
• Major Type ‘2’: major type ‘2’ is used to represent byte strings. The additional information field is interpreted as an unsigned integer and specifies the length of the string in bytes. The actual byte string follows the data type encoding.
• Major Type ‘3’: this type follows the same encoding specification as byte strings (major type ‘2’) but it specifically represents a text string encoded as UTF-8.
• Major Type ‘4’: this type is used to represent arrays of data items. The additional information field is interpreted as an unsigned integer and specifies the number of items in the array.
• Major Type ‘5’: the major type ‘5’ represents a map. A map is composed of key/value data item pairs that are concatenated together to form the map. The first data item of each pair is the key, followed by the value. The additional information field is interpreted as an unsigned integer and specifies the number of data item pairs contained in the map. CBOR applications need to agree on what type(s) of keys are used.
• Major Type ‘6’: major type ‘6’ is a special type used to semantically tag data items. A tag data item gives semantic meaning to the data item that follows it. The CBOR specification includes predefined values for the tag data item’s additional information field, for example for dates and big numbers (bignums).
• Major Type ‘7’: this type is used to encode floating-point numbers and special data types (such as true or false).
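The major types listed above can be exercised with a small encoder. The following Python sketch is a simplified illustration, not a complete CBOR implementation: the names `encode_head` and `encode` are hypothetical, tags (major type 6) and floating-point numbers are left out, and of major type 7 only true, false and null are handled.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Build the data type byte plus any following length/value bytes."""
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for info, length in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * length)):
            return bytes([(major_type << 5) | info]) + argument.to_bytes(length, "big")
    raise ValueError("argument does not fit in 8 bytes")


def encode(item) -> bytes:
    """Encode a Python value as a CBOR data item (subset only)."""
    # Booleans are checked first because bool is a subclass of int in Python.
    if item is False:
        return b"\xf4"                                  # major type 7, value 20
    if item is True:
        return b"\xf5"                                  # major type 7, value 21
    if item is None:
        return b"\xf6"                                  # major type 7, value 22
    if isinstance(item, int):
        if item >= 0:
            return encode_head(0, item)                 # unsigned integer
        return encode_head(1, -1 - item)                # negative integer
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item         # byte string
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return encode_head(3, len(raw)) + raw           # UTF-8 text string
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return encode_head(5, len(item)) + body         # key, value, key, value...
    raise TypeError(f"unsupported type: {type(item).__name__}")


if __name__ == "__main__":
    print(encode([1, 2, 3]).hex())     # 83010203
    print(encode({"a": 1}).hex())      # a1616101
    print(encode(-500).hex())          # 3901f3
```

Every nested item carries its own data type byte, which is visible in the array example: one byte for the array head and one byte for each of the three small integers.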
Finally, the CBOR specification provides some hints on CBOR-to-JSON transformation. Basically, base data types are directly mapped from CBOR to JSON (and vice versa). For instance, CBOR integers (major types 0 or 1) are mapped to JSON numbers and CBOR arrays (major type 4) are mapped to JSON arrays. Notably, JSON objects are converted to CBOR maps where each key/value pair represents one of the properties of the object. The map keys are CBOR text string data items containing the names of the JSON properties.
Some specifications based on CBOR may provide integer substitutes for the JSON property names. In these cases, property names are first transformed into the assigned integer substitutes and then used as keys for the CBOR map. Thus, the keys are integer data items instead of string data items, which results in a more efficient encoding. For instance, this is the approach followed by
3.2.2.2 Conclusions
CBOR does not rely on schema information and codifies the data types within the coded stream. This design decision simplifies the implementation, as no cross-references to context information are needed in order to decode data values. However, every data value must be preceded by a data type description (in the form of one or more data type bytes), which adds overhead to the size of the encoded output. This is the case even when a JSON mapping is used to define compact keys for the CBOR maps (which are transformed into JSON objects).
The compression approach proposed in this thesis takes full advantage of JSON Schema information to achieve good compression while resulting in simple enough implementations that fit the requirements of resource-constrained devices.
The most efficient CBOR compression is achieved by mapping the JSON property names to integers and using them as keys for the CBOR map, as in the approach followed by SenML [JSA+18].
However, CBOR does not define any formal concept of schema and does not provide any mechanism to define and distribute the mapping structure.
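The effect of integer keys on the encoded size can be illustrated with a short Python sketch. The property names, the `KEY_TO_INT` table and the helper names are hypothetical and are not taken from SenML or from the CBOR specification; the encoder only supports non-negative integers, text strings and maps, which is enough for the comparison.

```python
import json

# Hypothetical mapping that sender and receiver must agree on out of band;
# CBOR itself provides no mechanism to define or distribute it.
KEY_TO_INT = {"temperature": 1, "humidity": 2}


def encode_head(major_type: int, argument: int) -> bytes:
    """Build the data type byte plus any following length/value bytes."""
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for info, length in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * length)):
            return bytes([(major_type << 5) | info]) + argument.to_bytes(length, "big")
    raise ValueError("argument does not fit in 8 bytes")


def encode(item) -> bytes:
    """Encode non-negative integers, text strings and maps only."""
    if isinstance(item, int) and not isinstance(item, bool) and item >= 0:
        return encode_head(0, item)
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return encode_head(3, len(raw)) + raw
    if isinstance(item, dict):
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return encode_head(5, len(item)) + body
    raise TypeError(f"unsupported type: {type(item).__name__}")


def with_integer_keys(obj: dict, mapping: dict) -> dict:
    """Replace each property name by its assigned integer substitute."""
    return {mapping[name]: value for name, value in obj.items()}


if __name__ == "__main__":
    reading = {"temperature": 21, "humidity": 40}
    json_size = len(json.dumps(reading, separators=(",", ":")))
    text_key_size = len(encode(reading))
    int_key_size = len(encode(with_integer_keys(reading, KEY_TO_INT)))
    print(json_size, text_key_size, int_key_size)   # 32 25 6
```

Even in the 6-byte result, each key and each value is still preceded by its own data type byte, and the receiver can only recover the property names if it holds the same mapping table.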