
C. HEADER

4. Options (Header Part 4 of 5)

EXI provides several encoding options that allow EXI streams to be created with lossless or lossy considerations. Lossy encoding cannot recreate the exact input XML document at decode, whereas lossless encoding recreates an exact replica of it. Table 29 lists the available EXI options, their default settings, and a short description of each option's impact on an EXI stream. When exercised, these options enable a higher level of compactness, although such compactness can come at an efficiency cost depending on the domain of interest. XML security and XML encryption, both discussed later in this chapter, are introductory examples of domains where these options must be exercised carefully.

EXI Option                  Description                                                   Default Value
Alignment                   Alignment of event codes and content items                    Bit-packed
Compression                 EXI compression is used                                       False
Strict                      Strict interpretation of schema is used                       False
Fragment                    Body is a fragment and not a full XML document                False
Preserve                    Specifies which events to preserve (see Table 31)             All false
Self Contained              Enables self-contained elements                               False
Schema ID                   Identifies the schema used to encode the EXI body             None
DatatypeRepresentationMap   Datatypes used to encode values in the EXI body               None
Block Size                  The blocking size for EXI compression                         1,000,000
Value Max Length            Largest string that can be added to the string table          Unbounded
Value Partition Capacity    Maximum capacity of the VALUES portion of the string tables   Unbounded
User Defined                User-defined options                                          None

Table 29. EXI Header Options and Default Values (From W3C, 2008)

When options are employed, they are defined within an EXI options XML document based on the options schema listed in Table 30. This EXI options XML document is then encoded into the EXI options header field using the default EXI option settings, except that byte alignment is used instead of the default bit-packed alignment.

<xsd:schema  

Table 30. Options Schema for EXI Header Options XML Generation (From W3C, 2008)

The specification defines no method for handling options that were used in the encoding but are not listed within the EXI header. That is, the header's options-present bit is 0 and the options field is empty, yet non-default options were used when encoding the XML to EXI. The specification does note that the use of anything other than the default values would likely occur in a controlled environment where the communicating parties share preexisting knowledge about the EXI format in use. Outside of such preexisting domain knowledge, the best choice is likely to assume the default options, as the header indicates.

This loophole might potentially be used to hide information within an EXI stream if the EXI processor does not detect it.
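The defaulting rule above can be sketched as a small lookup: start from the Table 29 defaults, apply the header's options only when the options-present bit is set, and otherwise fall back on whatever out-of-band knowledge the system has. This is a minimal illustration; the dictionary key names and the `effective_options` function are hypothetical and do not come from the EXI specification.

```python
import copy

# Defaults from Table 29; key names are hypothetical, chosen for this sketch.
DEFAULT_OPTIONS = {
    "alignment": "bit-packed",
    "compression": False,
    "strict": False,
    "fragment": False,
    "preserve": {
        "comments": False,
        "pis": False,
        "dtd": False,
        "prefixes": False,
        "lexicalValues": False,
    },
    "selfContained": False,
    "schemaId": None,
    "datatypeRepresentationMap": None,
    "blockSize": 1_000_000,
    "valueMaxLength": None,          # None stands for "unbounded"
    "valuePartitionCapacity": None,  # None stands for "unbounded"
    "user": None,
}


def effective_options(options_present, header_options=None, out_of_band=None):
    """Return the options a decoder would apply to the EXI body.

    options_present mirrors the header's options-present bit. When it is
    False, header_options are ignored and only out-of-band knowledge
    (if any) can override the defaults.
    """
    options = copy.deepcopy(DEFAULT_OPTIONS)  # never mutate the defaults
    overrides = header_options if options_present else out_of_band
    for key, value in (overrides or {}).items():
        if key == "preserve":
            options["preserve"].update(value)  # merge individual flags
        else:
            options[key] = value
    return options
```

For example, `effective_options(False)` yields the Table 29 defaults, while a closed system that always uses byte alignment could call `effective_options(False, out_of_band={"alignment": "byte-aligned"})`.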

a. Alignment Options

As the name of the option suggests, this option controls the alignment of the body of the EXI stream. The available alignments are bit-packed, byte-aligned, pre-compression (blocked and channelized but not compressed), and compressed. The default alignment is bit-packed. Byte alignment is required for many domain cases, such as digital signatures and encryption. It is also a good option when troubleshooting, because event codes and content items fall on byte boundaries, so string values can be read in a hex editor or even in Notepad and other simple text editors.
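The size difference between bit-packed and byte-aligned output can be illustrated with a toy writer for n-bit unsigned integers such as event codes. This sketch assumes most-significant-bit-first packing and, under byte alignment, simply widens each field to whole bytes. `BitWriter` and `encode_event_codes` are hypothetical names, and real EXI encoders handle many more value types.

```python
class BitWriter:
    """Hypothetical writer for n-bit unsigned integers (MSB first)."""

    def __init__(self, byte_aligned=False):
        self.byte_aligned = byte_aligned
        self.bits = []  # one entry (0 or 1) per written bit

    def write_uint(self, value, n_bits):
        if self.byte_aligned:
            # Byte alignment: widen each field to a whole number of bytes.
            n_bits = ((n_bits + 7) // 8) * 8
        for shift in range(n_bits - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def to_bytes(self):
        # Pad the final partial byte with zero bits.
        padded = self.bits + [0] * (-len(self.bits) % 8)
        out = bytearray()
        for start in range(0, len(padded), 8):
            byte = 0
            for bit in padded[start:start + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


def encode_event_codes(codes, n_bits, byte_aligned):
    """Write each event code using n_bits under the chosen alignment."""
    writer = BitWriter(byte_aligned)
    for code in codes:
        writer.write_uint(code, n_bits)
    return writer.to_bytes()
```

With three 2-bit event codes `[1, 2, 3]`, the bit-packed form fits in a single byte (`0x6C`), while the byte-aligned form needs three bytes (`0x01 0x02 0x03`). The extra bytes are the cost of the easier inspection described above.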

b. Strict Option

The Strict option is normally false. When true, it prunes (discards or ignores) namespace, comment, processing instruction, and self-contained events from the input XML document. Strict mode relies entirely on the supplied schema for the structure of the EXI stream and uses the XML document only for values: attribute values and element content. At decode, using the same schema that was used at encode, all pruned events other than comments and processing instructions can be reconstructed from the schema. When exercised, this option produces a more compact stream.

The caveats to this option are that it can be used only if a schema is provided, and that any deviation of the input XML document from the provided schema results in a fatal EXI processing error. As its name implies, Strict enforces strict schema compliance for input XML documents.
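The two halves of strict mode described above, pruning certain events and failing hard on schema deviations, can be sketched over a simple event list. This is a deliberately simplified model: the event kinds use EXI-style abbreviations (SE, EE, CH, CM, PI, NS, SC), and the "schema" is a hypothetical flat set of declared element names rather than a real XML Schema grammar.

```python
# Event kinds the text lists as pruned under strict mode.
PRUNED_IN_STRICT = {"NS", "CM", "PI", "SC"}


class StrictSchemaError(Exception):
    """Raised when the input deviates from the (simplified) schema."""


def strict_filter(events, declared_elements):
    """Drop prunable events and reject undeclared elements.

    events: list of (kind, payload) tuples, e.g. ("SE", "order").
    declared_elements: hypothetical stand-in for the schema, a flat set
    of element names rather than a full XML Schema grammar.
    """
    kept = []
    for kind, payload in events:
        if kind in PRUNED_IN_STRICT:
            continue  # pruned: never written to the EXI stream
        if kind == "SE" and payload not in declared_elements:
            # Any deviation from the schema is fatal in strict mode.
            raise StrictSchemaError(f"element {payload!r} is not declared")
        kept.append((kind, payload))
    return kept
```

A comment inside a conforming document is silently dropped, whereas an undeclared element such as `note` stops processing with `StrictSchemaError`.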

c. Fragment Option

Normally false, this option indicates whether the stream is a document or a fragment. A fragment is a sequence of well-formed XML elements or processing instructions that, although it resembles an XML document, is not a standalone well-formed XML document.

For example, a fragment does not begin with an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>, which a standalone XML document typically carries. A fragment may still contain elements and attributes, and unlike a document it may contain more than one top-level element.
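The difference between a document and a fragment is easy to demonstrate with an ordinary XML parser: a sequence of sibling elements is rejected as a document but parses once it is wrapped in a synthetic root. This sketch uses Python's standard `xml.etree.ElementTree` only to illustrate the distinction; the wrapper name `fragment-root` is hypothetical and plays no part in EXI, whose fragment grammar accepts such sequences directly.

```python
import xml.etree.ElementTree as ET


def is_well_formed_document(text):
    """True if text parses as a single standalone XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def parse_fragment(text):
    """Parse a fragment by wrapping it in a synthetic root element.

    'fragment-root' is a hypothetical wrapper name, not part of EXI.
    """
    root = ET.fromstring(f"<fragment-root>{text}</fragment-root>")
    return list(root)  # the fragment's top-level elements
```

For instance, `'<item id="1">pen</item><item id="2">ink</item>'` fails `is_well_formed_document`, because it has two top-level elements, yet `parse_fragment` returns both `item` elements.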

d. Preserve Options

The preserve options are normally false. EXI can prune certain events from an XML document that, for some applications, do not affect the content of the document.

Often these events can be removed without affecting any meaningful aspect of the XML file, and for compactness they can be stripped from the EXI stream while retaining the spirit of the original input XML document. Table 31 describes the prunable event options. However, none of the preserve options can be exercised if the strict option is used.

Fidelity Option          Effect
Preserve.comments        Retains any XML comments within the document
Preserve.pis             Retains any processing instructions within the document
Preserve.dtd             Retains any DTD within the document
Preserve.prefixes        Retains any namespace prefixes within the document
Preserve.lexicalValues   Preserves the lexical form of element and attribute values

Table 31. EXI Fidelity Options: Event Preservation Options (From W3C, 2008)

Any events pruned during the encoding of an EXI stream from an XML document are lost and cannot be reconstructed from the EXI stream when decoding, so such encoding is lossy. These pruning options can normally be employed without risk, but domain-specific requirements must be considered before they are used.
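The lossy effect of leaving Preserve.comments and Preserve.pis false can be imitated with a plain XML round trip. Python's `xml.etree.ElementTree` is used here only as a stand-in for an EXI encoder and decoder: comments and processing instructions that are not kept at parse time are simply gone when the document is written back out. The `round_trip` function is a hypothetical helper, and the `insert_comments`/`insert_pis` arguments require Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def round_trip(xml_text, preserve_comments=False, preserve_pis=False):
    """Parse and re-serialize, keeping comments/PIs only when asked."""
    builder = ET.TreeBuilder(
        insert_comments=preserve_comments,  # analogous to Preserve.comments
        insert_pis=preserve_pis,            # analogous to Preserve.pis
    )
    parser = ET.XMLParser(target=builder)
    parser.feed(xml_text)
    root = parser.close()
    return ET.tostring(root, encoding="unicode")
```

For `'<order><!-- rush --><?audit id="7"?><item>pen</item></order>'`, the default call returns only `<order><item>pen</item></order>`. The comment and processing instruction survive only when the corresponding flag is set, mirroring how pruned events cannot be recovered at decode.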

e. Self-contained Option

Normally false, this option allows elements to be encoded so that they can be read independently of the rest of the EXI body, which enables faster indexing into the stream. The self-contained option cannot be used with the compression or pre-compression alignments; it applies only to bit-packed and byte-aligned streams.
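The restrictions stated so far, that no preserve option may be combined with strict and that self-contained elements require bit-packed or byte-aligned streams, lend themselves to a simple consistency check that an encoder might run before writing the header. This is a sketch only; the option key names and the `check_option_conflicts` function are hypothetical.

```python
def check_option_conflicts(options):
    """Return a list of conflicts among the options described in the text.

    options: dict using hypothetical key names (see the defaults sketch).
    """
    problems = []
    preserve = options.get("preserve", {})
    if options.get("strict") and any(preserve.values()):
        problems.append("preserve options cannot be used with strict")
    compressed = options.get("compression") or options.get("alignment") in (
        "pre-compression",
        "compressed",
    )
    if options.get("selfContained") and compressed:
        problems.append(
            "selfContained requires bit-packed or byte-aligned alignment"
        )
    return problems
```

An empty list means the combination is acceptable. Returning every conflict at once, rather than stopping at the first one, makes misconfigurations easier to diagnose.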

f. Schema ID Option

Normally omitted, this option identifies the schema used to encode the stream. The format of this field is not defined and is left to the implementers and users.

This field might be a URI or another identifier that enables identification and retrieval of the schema of interest. This flexibility allows domain-specific, schema-aware encoding, so that unique domain cases can leverage their architecture to achieve the maximum compactness possible. For example, a production system that uses only one schema, and always uses that schema, could forgo identifying the schema and instead hard-code it directly within the EXI processor, knowing that all EXI streams it will encounter are expected to conform to that one known schema.
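Because the format of the schema ID is left to implementers, one plausible design is a registry that maps identifiers to schema locations, with a hard-coded fallback for the single-schema production system described above. Everything in this sketch is hypothetical: the URNs, the file names, and the `resolve_schema` function are invented for illustration.

```python
# Hypothetical registry mapping schema IDs (here, URNs) to schema files.
SCHEMA_REGISTRY = {
    "urn:example:orders:v1": "orders-v1.xsd",
    "urn:example:orders:v2": "orders-v2.xsd",
}

# A single-schema production system could simply hard-code this.
HARD_CODED_SCHEMA = "orders-v1.xsd"


def resolve_schema(schema_id=None):
    """Map a schema ID from the header to a schema location."""
    if schema_id is None:
        # No ID in the header: fall back on the one known schema.
        return HARD_CODED_SCHEMA
    try:
        return SCHEMA_REGISTRY[schema_id]
    except KeyError:
        raise LookupError(f"unknown schema ID {schema_id!r}") from None
```

Rejecting unknown identifiers with an error, rather than guessing, keeps a schema-informed decoder from silently misreading a stream encoded against a different schema.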

g. Datatype Representation Map Option

Like the schema ID field, this field is normally omitted. The Datatype Representation Map identifies a list or map of datatypes used in the encoding of this stream. Datatype maps are covered in more detail later in this chapter within the datatype section.
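Conceptually, a datatype representation map overrides which representation is used for values of a given schema datatype. The sketch below models that idea as a dictionary of encoder functions with optional per-stream overrides. The datatype keys, the representation functions, and the tuple output are hypothetical simplifications, not the binary encodings EXI actually defines.

```python
def integer_representation(text):
    """Hypothetical typed representation: parse the lexical value."""
    return ("integer", int(text))


def string_representation(text):
    """Hypothetical fallback: keep the value as a string."""
    return ("string", text)


# Hypothetical defaults: which representation each schema datatype uses.
DEFAULT_REPRESENTATIONS = {
    "xsd:integer": integer_representation,
    "xsd:string": string_representation,
}


def encode_value(datatype, text, representation_map=None):
    """Encode text using the default or the overridden representation."""
    table = dict(DEFAULT_REPRESENTATIONS)
    table.update(representation_map or {})  # per-stream overrides win
    return table[datatype](text)
```

Without a map, `encode_value("xsd:integer", "42")` uses the integer representation. Passing `{"xsd:integer": string_representation}` switches the same value to a string representation, which the decoder can reverse only if it knows the same map.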

h. Block Size Option

This option defines the block size used for compression, with a default of 1,000,000. EXI compression operates on blocks of data, and this field defines the maximum size of each block. The EXI compression methodology is covered in greater detail later in this chapter within the compression section.
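The effect of the block size can be sketched by splitting a stream's values into blocks and compressing each block on its own with raw DEFLATE, the algorithm EXI compression builds on. This sketch assumes the block size counts values, and it uses a hypothetical NUL-separated framing. Real EXI also channelizes each block's structure and values before compression, which is omitted here.

```python
import zlib


def split_into_blocks(values, block_size=1_000_000):
    """Group values into blocks of at most block_size entries."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def compress_block(block):
    """Deflate one block independently of all other blocks.

    The NUL-separated framing is a hypothetical simplification.
    """
    payload = "\x00".join(block).encode("utf-8")
    compressor = zlib.compressobj(wbits=-15)  # raw DEFLATE, no zlib header
    return compressor.compress(payload) + compressor.flush()


def decompress_block(data):
    """Inverse of compress_block for non-empty blocks."""
    return zlib.decompress(data, wbits=-15).decode("utf-8").split("\x00")
```

A smaller block size bounds the memory needed per block but gives DEFLATE less context to find repetitions in. This is the trade-off the default of 1,000,000 balances.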

i. Value Max Length Option

Normally unbounded, this option indicates the largest string that can be added to the “value” portion of the string tables. String tables are essential components of EXI and are covered in detail later in this chapter. If the character count of a string exceeds this maximum length, the string is not added to the string table; instead, it is written directly to the EXI stream as a string literal. This keeps one-time long string values, such as a multi-sentence paragraph within an element, from clogging a string table with unique occurrences. The reasoning is that such paragraphs are unlikely to be repeated exactly within an XML document, so adding them to the string tables and creating indexes costs more in compactness than writing the original paragraph directly to the EXI stream.
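The rule can be sketched with a simplified value string table. A value already in the table is emitted as a compact index ("hit"). Otherwise it is emitted as a literal and added to the table only if its length does not exceed the maximum. `ValueStringTable` and its tuple outputs are hypothetical simplifications of EXI's string table mechanics.

```python
class ValueStringTable:
    """Simplified value string table honoring a maximum string length."""

    def __init__(self, value_max_length=None):  # None means unbounded
        self.value_max_length = value_max_length
        self.index = {}  # string -> compact index

    def encode(self, value):
        if value in self.index:
            return ("hit", self.index[value])  # compact reference
        fits = (
            self.value_max_length is None
            or len(value) <= self.value_max_length
        )
        if fits:
            self.index[value] = len(self.index)
        return ("literal", value)  # first occurrence, or too long to table
```

With a maximum length of 8, a short value such as `"red"` becomes an index on its second occurrence. A long paragraph is written as a literal every time and never occupies a table slot.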

j. Value Partition Capacity Option

Normally unbounded, this option specifies the maximum number of strings allowed in the value portion of the string tables. Each new string encountered within the XML document is added to the string tables only if they currently hold fewer than Value Partition Capacity items. In either case, the string is written to the EXI stream as a string literal on its first occurrence.
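The capacity rule, as described above, can be sketched in the same simplified style. New strings are added only while the table holds fewer than `capacity` entries, and every first occurrence is written as a literal. `CappedValueTable` is a hypothetical name, and the sketch models only the behavior stated in this section.

```python
class CappedValueTable:
    """Simplified value table that stops adding strings once full."""

    def __init__(self, capacity=None):  # None means unbounded
        self.capacity = capacity
        self.index = {}  # string -> compact index

    def encode(self, value):
        if value in self.index:
            return ("hit", self.index[value])
        if self.capacity is None or len(self.index) < self.capacity:
            self.index[value] = len(self.index)  # room left: remember it
        return ("literal", value)  # first occurrence is always literal
```

With a capacity of 2, the values `"a"` and `"b"` are tabled and later reused as indexes. A third distinct value `"c"` is written as a literal every time it appears.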

k. User-Defined Option

This field is left undefined for future or domain-specific needs.