
3.2 Structured Data Compression

3.2.4 Other Proposals on Compression for Structured Data

This section describes other compression proposals found in the state of the art that are not covered in the previous sections. These works did not result in formalized compression technologies or standards, and can be considered open research. However, they are mentioned here because of their relation and significance to this thesis. Some of these proposals make explicit use of templates or pattern repetitions in order to represent structured data in a more compact encoding. Other proposals covered in this section are specifically targeted at resource-constrained devices.

Hoeller et al. [HRN+08, HRN+10a, HRN+10b] identified the inefficient management of XML-formatted data in resource-constrained devices as a barrier to overcome in order to achieve full interoperability. They defined a series of mechanisms to efficiently manage XML in terms of processing, storage and transmission.

In the solution proposed by Hoeller et al., identifiers (such as XML element names) are first separated from the original XML data and stored in program/flash memory. In a second step, the XML data is processed to extract repeating structures and templates. In this step, the static and dynamic parts of the XML data are separated and embedded into a structure denoted an XML Template Stream (XTS). The XTS is represented in a memory-efficient way. For instance, the repeated structures are substituted with references to the templates. Finally, the XTS is encoded using a binary format based on Huffman encoding [Huf52].
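The separation of static and dynamic data, and the substitution of repeated structures with template references, can be illustrated with a minimal sketch. It is not the XTS format defined by Hoeller et al.: the names split_static_dynamic and TemplateStore are hypothetical, the input is assumed to be compact XML (no whitespace between elements, no attributes carrying dynamic data, no curly braces), and the final Huffman encoding step is omitted.

```python
import re

# Matches the text content between two tags, e.g. ">21.5<".
# Assumption: compact XML without inter-element whitespace or curly braces.
_TEXT = re.compile(r">([^<>]+)<")


def split_static_dynamic(xml):
    """Return (template, values): every text node becomes a '{}' placeholder."""
    values = [m.group(1) for m in _TEXT.finditer(xml)]
    template = _TEXT.sub(">{}<", xml)
    return template, values


class TemplateStore:
    """Hypothetical table of templates; a list index acts as the reference."""

    def __init__(self):
        self.templates = []

    def encode(self, xml):
        """Replace a document by (template reference, dynamic values)."""
        template, values = split_static_dynamic(xml)
        if template not in self.templates:
            self.templates.append(template)
        return self.templates.index(template), values

    def decode(self, ref, values):
        """Rebuild the original document from a reference and its values."""
        return self.templates[ref].format(*values)
```

With this sketch, two documents that share the same structure are stored as a single template plus two short value lists, which is the effect that the template references in the XTS aim for.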

Hoeller et al. also developed a pre-compiler tool called XOBESN [HRN+10b]. This tool allows an easy integration of XML structures into C programs, which are later translated into plain C (compilable with standard compilers). Additionally, it exploits repeated structures to efficiently store and process XML documents. The client/server communication model is based on XPath queries and optimized for this purpose.

The use of pseudo XML structures in the code makes the solution proposed by Hoeller et al. heavily tied to XML. This thesis proposes a more natural data binding technique by using native C structures, giving a convenient abstraction of the underlying original data representation format.


The work presented by Hoeller et al. does not provide a formal encoding or compression format for data transmission. The use of templates is suggested for the transmission of data but few details are given.

Käbisch et al. propose in [?] a solution to generate optimized XML-based Web Services using EXI and targeted at resource-constrained devices. The solution takes a SOAP WSDL as input in order to generate the required EXI grammar, an optimized EXI processor and binding code stubs. The paper also includes performance results that show a significant improvement in message size and code footprint.

Later, Käbisch et al. extended the core approach presented in [?] to propose a solution for efficient processing and storing of RDF documents in resource-constrained devices: µRDF [?, ?]. This solution enables the efficient use of semantic data in resource-constrained devices, following a similar approach to WoT. The paper defines an XML Schema to describe the RDF document structure and uses it to generate the EXI grammar. Additionally, the paper presents µRDF, a semantic repository that efficiently represents and stores RDF data.

Käbisch et al. also explored other uses of EXI to optimize network traffic based on service filtering [?]. These filters are applied directly over EXI grammars and avoid the transmission of unwanted or useless data. The main contribution of the research line of Käbisch et al. to this thesis is that it provides an interesting use case of EXI applied to solutions targeted at resource-constrained devices.

TinyPack XML [SML12] is an XML compression method for WSN that takes advantage of the structured nature of XML and the similarity between data messages consecutively transmitted. Each data message to be transmitted is analysed and compared to previous messages. The common sections of the XML data are extracted and set as “format strings”. A compact identifier is assigned to the format strings and they are advertised to the (sub-)network. Dynamic data is encoded using techniques specific to the data type.

In TinyPack, the format string is refined with every transmitted data message. This means that each data message has to be preprocessed in order to check for variations in the structure. The paper argues that the additional processing involved is compensated by the savings in transmission time. Each time the format string is modified, it has to be advertised to the network. If the nature of the messages changes often, this implies sending the format string very frequently.
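The refinement cost discussed above can be illustrated with a minimal sketch. It is not the TinyPack algorithm from [SML12]: the FormatTracker class, the token-by-token comparison and the HOLE marker are assumptions made for illustration only. The sketch keeps one format string, turns every token that differs from the previous messages into a dynamic field, and reports whether the format changed and would therefore have to be advertised again.

```python
import re

HOLE = None  # hypothetical marker for a dynamic field in the format string


def tokenize(xml):
    """Split a message into tags and text, e.g. '<a>1</a>' -> ['<a>', '1', '</a>']."""
    return re.findall(r"<[^>]+>|[^<]+", xml)


class FormatTracker:
    """Refines a single format string from consecutively transmitted messages."""

    def __init__(self):
        self.fmt = None

    def update(self, xml):
        """Return (changed, values); 'changed' means the format must be re-advertised."""
        tokens = tokenize(xml)
        if self.fmt is None or len(self.fmt) != len(tokens):
            # First message or different structure: start over, all tokens static.
            self.fmt = list(tokens)
            return True, []
        # Tokens that differ from the current format become dynamic fields.
        refined = [f if f == t else HOLE for f, t in zip(self.fmt, tokens)]
        changed = refined != self.fmt
        self.fmt = refined
        values = [t for f, t in zip(refined, tokens) if f is HOLE]
        return changed, values
```

In this sketch, a stream of messages in which only one field varies stabilizes after the second message, whereas every newly varying field triggers another advertisement, which mirrors the overhead noted above for frequently changing messages.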

The authors also propose other optional methods to extract the format string. One of them is the extraction of the format string from the XML Schema, although this would imply sending a lot of overhead data because the elements defined in an XML Schema are rarely all used together. They also claim that the format string could be defined by hand on a message-by-message basis. Although this would result in an optimal assignment of format strings, it would rely on the end user's skills and could be cumbersome for development processes. Nevertheless, no performance data is provided for these methods and their benefits are only described qualitatively.

Packedobjects [Moo09, Moo10] was first designed to implement network protocols in a compact format. The compression approach used by Packedobjects is based on the efficient encoding of the data types, which are previously extracted from a schema. Later on, Packedobjects was applied to XML compression in [MKB13, MKB14, KMB13]. The data types used for the encoding are extracted from XML Schemas, for which Packedobjects implements a subset of the XML Schema specification.
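The general idea of encoding values according to schema-derived data types can be illustrated with a minimal sketch. It is not the Packedobjects wire format: the functions bits_needed, pack and unpack, and the representation of the schema as a list of (name, minimum, maximum) integer fields, are assumptions made for illustration. Each value is stored as an offset from its minimum, using only as many bits as the declared range requires.

```python
def bits_needed(lo, hi):
    """Bits required to distinguish every integer in the range [lo, hi]."""
    return max(1, (hi - lo).bit_length())


def pack(fields, record):
    """Pack a record into bytes; fields is a list of (name, lo, hi) tuples."""
    acc, nbits = 0, 0
    for name, lo, hi in fields:
        value = record[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
        width = bits_needed(lo, hi)
        acc = (acc << width) | (value - lo)  # store the offset from the minimum
        nbits += width
    pad = -nbits % 8  # zero bits appended to reach a byte boundary
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def unpack(fields, data):
    """Inverse of pack; needs the same field list that the sender used."""
    total = sum(bits_needed(lo, hi) for _, lo, hi in fields)
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - total)  # drop padding
    record = {}
    for name, lo, hi in reversed(fields):
        width = bits_needed(lo, hi)
        record[name] = (acc & ((1 << width) - 1)) + lo
        acc >>= width
    return record
```

For a hypothetical schema with a temperature in [-40, 85], a humidity in [0, 100] and a one-bit flag, the three values fit in 15 bits, that is, two bytes. No field names or type tags are transmitted because both ends are assumed to share the schema.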

The experiments performed in [KMB13] show that Packedobjects outperforms other XML compression technologies. However, most of the technologies used for the comparison (such as XMLPPM and XMILL) do not exploit schema information, whereas schema-based compressors are the ones that show the best performance. Additionally, Packedobjects is not compared to EXI, which was the leading XML compression technology at the time.

A performance evaluation of Packedobjects against EXI is presented in [BMKR15]. The paper shows that Packedobjects and EXI have similar compression ratios but that Packedobjects shows better processing performance. However, the comparison was made between a C implementation of Packedobjects and a Java EXI implementation (EXIProcessor) that is neither intended for resource-constrained devices nor as optimized as a C application. Thus, the comparison is not made under fair conditions.