
4.2 CTC Codification Algorithm

4.2.2 Example

To clarify how the rules and equations explained in the previous section are applied, this section describes the step-by-step codification of the XML instance document shown in Figure 4.9 (which follows the “Notebook” schema of Figure 4.6). For simplicity, the example below only expands the first occurrence of the XML element “note”.

First, the SchemaId of the schema is codified, followed by the root eContext:

$$\mathrm{CodS}(s_{NOTEBOOK}) \Rightarrow \mathrm{SchemaId}(s_{NOTEBOOK}) \oplus \mathrm{CodEC}(C_{ROOT})$$

Next, the prolog Element of the root eContext is processed, followed by the content of the data model instance:

$$\mathrm{CodEC}(C_{ROOT}) \Rightarrow \mathrm{CodE}(C_{ROOT}, e_{PROLOG}) \oplus \mathrm{CodE}(C_{ROOT}, e_{CONTENT})$$


The “notebook” XML element is codified into the stream taking into account that the eContext’s Type is choice:

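$$\mathrm{CodEC}(C_{CONTENT}) \Rightarrow 0_1 \oplus \mathrm{CodE}(C_{CONTENT}, e_{notebook})$$

$$\mathrm{CodE}(C_{CONTENT}, e_{notebook}) \Rightarrow \mathrm{CodT}(e_{notebook}) \Rightarrow \mathrm{CodEC}(C_{notebook})$$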

The “notebook” eContext contains two Elements, one for the XML attributes and another for the “note” XML element:

$$\mathrm{CodEC}(C_{notebook}) \Rightarrow \mathrm{CodE}(C_{notebook}, e_{notebook\_att}) \oplus \mathrm{CodE}(C_{notebook}, e_{note})$$

The “note” Element is an array with length two:

$$\mathrm{CodE}(C_{notebook}, e_{note}) \Rightarrow 1_1 \oplus \mathrm{CodT}(e_{note}) \oplus 1_1 \oplus \mathrm{CodT}(e_{note}) \oplus 0_1$$
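The array rule above can be sketched in Python. This is a minimal illustration, not the thesis implementation: it assumes that each occurrence is preceded by a single 1 bit and that a single 0 bit closes the array, which is how the flags in the formula read. The `encode_array` function and the `toy_note` stand-in for CodT(e_note) are hypothetical names introduced only for this example.

```python
def encode_array(items, encode_item):
    """Encode a variable-length array as a string of '0'/'1' characters.

    Assumption: each occurrence is announced by a 1 bit and the array is
    closed by a 0 bit, mirroring the flags in the formula above.
    """
    bits = []
    for item in items:
        bits.append("1")                # flag: another occurrence follows
        bits.append(encode_item(item))  # stand-in for CodT(e_note)
    bits.append("0")                    # flag: end of the array
    return "".join(bits)


def toy_note(value):
    """Hypothetical item encoder: a 4-bit unsigned integer."""
    return format(value, "04b")


# Two occurrences, as in the "note" array of length two.
stream = encode_array([5, 9], toy_note)
print(stream)  # 10101110010
```

With this scheme an array costs one extra bit per occurrence plus one terminating bit, so the decoder never needs to know the length in advance.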

The “note” eContext contains three Elements, one for the XML attributes and another two for the “subject” and “body” XML elements:

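$$\mathrm{CodT}(e_{note}) \Rightarrow \mathrm{CodEC}(C_{note}) \Rightarrow \mathrm{CodE}(C_{note}, e_{note\_att}) \oplus \mathrm{CodE}(C_{note}, e_{subject}) \oplus \mathrm{CodE}(C_{note}, e_{body})$$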

The “note_att” eContext contains the attributes of the “note” XML element. It is a dynamic eContext with two child Elements:

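$$\mathrm{CodE}(C_{note}, e_{note\_att}) \Rightarrow \mathrm{CodT}(e_{note\_att}) \Rightarrow \mathrm{CodEC}(C_{note\_att}) \Rightarrow 1_2 \oplus \mathrm{CodE}(C_{note\_att}, e_{date}) \oplus 2_2 \oplus \mathrm{CodE}(C_{note\_att}, e_{category})$$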
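A rough Python sketch of the dynamic eContext step follows. It assumes that each child that is present is prefixed with its 1-based position written in a fixed number of bits (two bits here, as in the example). Whether a terminating index follows the last child is not shown in this example, so the sketch does not model one. The names `encode_dynamic` and `encode_child` are hypothetical.

```python
def encode_dynamic(children, present, index_bits, encode_child):
    """Return the token sequence for a dynamic eContext.

    children   -- child Element names in schema order
    present    -- dict mapping the names that occur to their values
    index_bits -- width of the index prefix (2 in the example; an assumption)
    """
    tokens = []
    for position, name in enumerate(children, start=1):
        if name in present:
            # Index prefix: the child's 1-based position in index_bits bits.
            tokens.append(format(position, f"0{index_bits}b"))
            # Stand-in for CodE(C_note_att, e_<name>).
            tokens.append(encode_child(name, present[name]))
    return tokens


# Hypothetical attribute values; the placeholder encoder just tags the name.
tokens = encode_dynamic(
    ["date", "category"],
    {"date": "2015-01-01", "category": "work"},
    2,
    lambda name, value: f"<{name}>",
)
print(tokens)  # ['01', '<date>', '10', '<category>']
```

Children that are absent emit nothing, which is where the saving over a fixed-layout encoding would come from.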

Finally, basic type Elements are directly encoded using the EXI codification standard for built-in EXI data type representations. For instance, for the “subject” Element of type string, the value is codified as:

$$\mathrm{CodE}(C_{note}, e_{subject}) \Rightarrow \mathrm{CodT}(e_{subject}) \Rightarrow 3_8 \oplus \text{“EXI”}$$
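As a hedged illustration of the string case, the following Python sketch encodes a string as a length prefix followed by its characters, using the EXI Unsigned Integer representation (7 value bits per octet, least significant group first, high bit set when more octets follow). It is a simplification: EXI string tables, which alter how values are prefixed, are left out, and the function names are illustrative only.

```python
def encode_unsigned(value: int) -> bytes:
    """EXI-style Unsigned Integer: 7-bit groups, least significant first."""
    if value < 0:
        raise ValueError("unsigned integer expected")
    out = bytearray()
    while True:
        septet = value & 0x7F
        value >>= 7
        if value:
            out.append(septet | 0x80)  # continuation bit: more octets follow
        else:
            out.append(septet)         # last octet: high bit clear
            return bytes(out)


def encode_string(text: str) -> bytes:
    """Length (in characters) followed by one Unsigned Integer per code point.

    Simplification: no string table handling.
    """
    out = bytearray(encode_unsigned(len(text)))
    for char in text:
        out += encode_unsigned(ord(char))
    return bytes(out)


print(encode_string("EXI"))  # b'\x03EXI'
```

For the three-character value "EXI" the length fits in a single octet, so the result is one length octet followed by three character octets.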

4.3 Summary and Conclusions

In this chapter, we presented Context- and Template-based Compression (CTC), a compression approach for standard data model representation formats. CTC provides a data model representation encoding targeted at resource-constrained devices that is more efficient than standard formats but that allows seamless transformation between the CTC format and the original format. The specification of the core components of CTC (context table and template table) is included as well as how these core components are created from standard data format schemas. We also provided two specific examples for XML and JSON Schema mappings. Finally, the chapter described in detail the CTC Algorithm used to encode/decode CTC streams based on the information stored in the context table and template table.

The verbosity of text-based data formats requires system resources that might be beyond the capabilities of the resource-constrained devices typically used in IoT networks. CTC tackles this problem by enabling the interoperable integration of heterogeneous devices at the data representation level while requiring very few resources in terms of communication bandwidth, memory usage and processing power.

Additionally, CTC supports interoperability-driven approaches such as the Web of Things. CTC eases the seamless use of Web Services by enabling the native use of the standard data model representation formats on which Web Services are based.

5 CTC Communication Model

The previous chapter focused on the description of the core CTC components and mechanisms. However, CTC coding/decoding components alone do not provide all the functionality needed to be used directly within a distributed application. This chapter describes the CTC communication model, how it fits within a distributed system, and the complementary mechanisms it needs in order to be used effectively.

Although the CTC communication model is mainly designed to be used by resource-constrained devices, it is very flexible and simple. It can easily be adapted to various scenarios and used in conjunction with different technologies, whether or not they are targeted at resource-constrained domains.

Hence, although in this chapter we assume that the compression technology used is CTC, the proposed solution is also applicable to other data compression technologies for structured data, such as EXI or CBOR.

The following sections describe the general communication architecture followed by CTC-enabled systems, the complementary mechanisms needed to manage the interchange of schemas, and a specific and practical implementation of CTC based on CoAP that shows the applicability of the CTC communication model to a standard communication protocol.

5.1 Communication Architecture

CTC is conceived as a component within a distributed system such as the one shown in Figure 5.1: connected nodes (usually resource-constrained devices) are deployed in a local network and an edge router or gateway is used to access external networks and nodes. This architecture is similar to communication architectures found in traditional Low Power Wireless Personal Area Networks (LPWPAN) and the IoT in general.

The CTC communication architecture can be integrated into networks and architectures with other topologies, such as clusters of local networks or two local networks connected by an Internet link. Nevertheless, this section considers the basic architecture depicted in Figure 5.1 because it scales easily and can be extrapolated to other, more complex architectures.

Figure 5.1: CTC communication model general architecture (constrained devices network, gateway / schema repository, Internet, external nodes).

Depending on the application domain, nodes belonging to a (sub-)network interchange data with other nodes that may reside in the same (sub-)network or in an external network, i.e., a network accessed through an edge router. If two connected nodes codify the transmitted data following the same encoding/format, no data transformation will be required in order for the two nodes to understand each other’s data. If the two nodes are separated by a gateway, no application-level transformation will be needed and the communication will be effectively end-to-end, with the gateway acting as a mere router. This is the simplest communication use case.

However, compression technologies targeted at resource-constrained systems (such as CTC) are especially conceived for those cases in which an (external) node uses a data format not suitable for resource-constrained nodes. Thus, data needs to be translated to CTC in order to be efficiently used within the constrained nodes’ network. In this case, one of the connected nodes does not implement CTC (i.e., it makes use of data in its original format) and the gateway will act as an application-level gateway, translating the original data format to CTC and vice versa. In order for the gateway to fulfil this role, it needs to meet three requirements: 1) it must contain a CTC implementation, including schema management, 2) it must have access to the data (i.e., the payload of the messages) and 3) it must have access to the schema information of the interchanged data.
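The two gateway roles described above (plain router versus application-level gateway) can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the format labels, the `relay` function and the translators are hypothetical, and the translators are trivial placeholders rather than real CTC codecs. Schema lookup, the third requirement, is omitted.

```python
from typing import Callable, Dict, Tuple

Translator = Callable[[bytes], bytes]


def relay(payload: bytes, src_format: str, dst_format: str,
          translators: Dict[Tuple[str, str], Translator]) -> bytes:
    """Forward a payload, translating it only when the formats differ."""
    if src_format == dst_format:
        # Same encoding on both sides: the gateway acts as a mere router.
        return payload
    translate = translators.get((src_format, dst_format))
    if translate is None:
        raise LookupError(f"no translator from {src_format} to {dst_format}")
    # Different encodings: the gateway acts as an application-level gateway.
    return translate(payload)


# Placeholder translators (NOT real CTC encoding): they only wrap and unwrap.
translators = {
    ("xml", "ctc"): lambda data: b"CTC[" + data + b"]",
    ("ctc", "xml"): lambda data: data[4:-1],
}

inbound = relay(b"<note/>", "xml", "ctc", translators)
print(inbound)                                     # b'CTC[<note/>]'
print(relay(inbound, "ctc", "xml", translators))   # b'<note/>'
print(relay(inbound, "ctc", "ctc", translators))   # b'CTC[<note/>]'
```

Keeping the pass-through case separate makes it explicit that the gateway only needs access to the payload when a translation actually takes place.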

Regarding the third requirement, CTC-enabled gateways and nodes need to know the context tables and template tables (and their identifiers) associated with the data models they are using. As explained in Chapter 4, this information is extracted from the schemas of the data models themselves. Thus, the schemas of the data models must be disseminated before CTC can be applied. Additionally, schemas must be uniquely identified within the CTC-enabled (sub-)network. This requirement arises because, in order to decode a CTC stream, the identifier of the schema against which it has been encoded must be inserted in the stream itself. This identifier must be as compact as possible (as opposed to traditional URIs, which tend to be verbose) in order to avoid unnecessary overhead.
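To make the overhead argument concrete, the following Python sketch maps verbose schema URIs to small integer identifiers. It is an assumption-laden illustration: the `SchemaRegistry` class, the sequential numbering and the example URI are hypothetical, and the way CTC actually assigns SchemaIds is not described in this section.

```python
class SchemaRegistry:
    """Toy registry that assigns compact integer identifiers to schema URIs."""

    def __init__(self):
        self._id_by_uri = {}
        self._uri_by_id = []

    def register(self, uri: str) -> int:
        """Return the identifier for a URI, assigning the next free one if new."""
        if uri not in self._id_by_uri:
            self._id_by_uri[uri] = len(self._uri_by_id)
            self._uri_by_id.append(uri)
        return self._id_by_uri[uri]

    def lookup(self, schema_id: int) -> str:
        """Resolve an identifier back to the schema URI (needed when decoding)."""
        return self._uri_by_id[schema_id]


registry = SchemaRegistry()
uri = "http://example.org/schemas/notebook.xsd"  # hypothetical schema URI
schema_id = registry.register(uri)

uri_octets = len(uri.encode("utf-8"))
print(schema_id)    # 0
print(uri_octets)   # 39
```

Under these assumptions, carrying the identifier in each stream costs a single octet for up to 256 schemas, whereas carrying the URI would cost tens of octets per message.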

In the CTC communication model, schema information is collected and made available by the schema repository. Nodes communicate with the schema repository in an initial dissemination phase, in which schema information is distributed and registered. Thus, a schema repository acts as a centralized resource information base (where the resources are schemas) and provides

In document A Context- and Template-Based Data Compression Approach to Improve Resource-Constrained IoT Systems Interoperability (Page 98-103)