B. VERIFICATION OF CANDIDATE BINARY XML FORMATS
7. Testing Framework - Results for Round-Trip Conversions
technique. The round-tripped (decoded) XML document was compared with the original input XML document using a differencing process that supported all the fidelity options and used PSVI criteria to be XML-structure aware, e.g., aware of the permitted variances in attribute and element order. The result of this test was a simple pass or fail: the round-tripped document either matched the original or it did not.
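The W3C framework's differencing process is not reproduced here. As a rough illustration of what a structure-aware, pass/fail comparison means, the following Python sketch, built only on the standard library's xml.etree.ElementTree, treats two documents as equal when they differ only in attribute order and, optionally, in surrounding whitespace. The function names and the preserve_whitespace switch are hypothetical; the sketch uses no PSVI information, so it does not model any schema-permitted variance in element order.

```python
import xml.etree.ElementTree as ET


def _text(value, preserve_whitespace):
    # A missing text node compares as the empty string.
    value = value or ""
    # Reduced fidelity: strip leading/trailing whitespace, so
    # whitespace-only text nodes compare equal to absent ones.
    return value if preserve_whitespace else value.strip()


def elements_equivalent(a, b, preserve_whitespace=False):
    """Recursively compare two elements; attribute order is ignored."""
    # attrib is a dict, so the comparison does not depend on attribute order.
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _text(a.text, preserve_whitespace) != _text(b.text, preserve_whitespace):
        return False
    if _text(a.tail, preserve_whitespace) != _text(b.tail, preserve_whitespace):
        return False
    children_a, children_b = list(a), list(b)
    if len(children_a) != len(children_b):
        return False
    # Child element order is treated as significant in this sketch.
    return all(elements_equivalent(x, y, preserve_whitespace)
               for x, y in zip(children_a, children_b))


def round_trip_passes(original_xml, decoded_xml, preserve_whitespace=False):
    """Pass/fail result for one original vs. round-tripped document."""
    return elements_equivalent(ET.fromstring(original_xml),
                               ET.fromstring(decoded_xml),
                               preserve_whitespace)


print(round_trip_passes('<a x="1" y="2"><b>t</b></a>',
                        '<a y="2" x="1"><b>t</b></a>'))  # True
```

Passing preserve_whitespace=True approximates a high-fidelity option, under which any difference in whitespace text causes a failure.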
The conclusion of round-trip testing was that the candidates Xebu, FXDI, Fast Infoset, and EXI passed all tests, while each of the other candidates received at least one failure.
C. EXI SELECTION AND BASELINE (GZIP) TESTING
Given the results from the W3C framework testing, EXI was the only technique that stood out in all three measurements. EXI was therefore selected as the candidate technique for further study and possible standardization into the XML stack.
Since GZip is the traditional method for compressing XML, as well as other text-based files, a full EXI-to-GZip benchmark test was then conducted, evaluating both in terms of size, efficiency, and compliance with the W3C use-case property demands. As with the candidate selection testing, the baseline testing is defined in the W3C Efficient XML Interchange Measurements Note (W3C, 2007), and its results are reported in Efficient XML Interchange Evaluation (W3C, 2008).
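A baseline harness of this kind can be sketched as follows. It assumes each codec is a plain function from XML bytes to encoded bytes and records, per document, the encoded size relative to the raw XML and the mean encode time. Only gzip is wired in, through Python's gzip.compress; the benchmark function, its parameters, and the result fields are hypothetical, and no EXI encoder is included because none ships with the Python standard library.

```python
import gzip
import time


def benchmark(documents, codecs, repeats=5):
    """Size ratio and mean encode time for every (document, codec) pair.

    documents: mapping of document name -> raw XML bytes
    codecs:    mapping of codec name -> function(bytes) -> bytes
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results = []
    for doc_name, xml_bytes in documents.items():
        for codec_name, encode in codecs.items():
            start = time.perf_counter()
            for _ in range(repeats):
                encoded = encode(xml_bytes)
            elapsed = (time.perf_counter() - start) / repeats
            results.append({
                "document": doc_name,
                "codec": codec_name,
                "original_bytes": len(xml_bytes),
                "encoded_bytes": len(encoded),
                # Size relative to the raw XML; 1.0 means no reduction at all.
                "ratio": len(encoded) / len(xml_bytes),
                "mean_encode_seconds": elapsed,
            })
    return results


available_codecs = {
    "gzip": gzip.compress,
    # An EXI encoder would be registered here as a hypothetical hook.
}

corpus = {"sample": b"<r>" + b"<item>value</item>" * 100 + b"</r>"}
for row in benchmark(corpus, available_codecs):
    print(row["document"], row["codec"], f"{row['ratio']:.1%}")
```

Decode time and memory footprint would need separate instrumentation.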
1. Compactness Comparison
One of the W3C’s property goals from the XBC effort was to produce a general, non-domain-specific algorithm whose binary XML output is compact, that is, smaller than the original XML document. In other words, the algorithm must always, in every case, deliver a compressed file that is smaller than the original file. If a technique ever delivers a result exceeding 100% of the original raw XML file size, that technique should be discarded, even if such occurrences are rare. This does not mean the selected candidate had to always outperform the other techniques, only that it must always deliver a result file smaller than the original input XML document. Of course, the best candidate should, more often than not, deliver a result file superior to all other candidates in addition
of the GZip and EXI comparison test results for the test-corpus of documents; EXI is the blue line, GZip is the pink line, and the raw text XML document is the red line.
Figure 23. EXI Compactness Comparison to Traditional GZip (From W3C, 2008)
A summary of the results of the baseline compactness comparison:
At worst, EXI is equal to GZip; in general, it is a more compact representation of XML.
EXI scales better than GZip as documents grow in size.
EXI output was smaller than the original XML document in every case, while GZip had a number of cases that exceeded the original document size.
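The compactness property reduces to a simple size test: the encoded output divided by the raw XML size must stay strictly below 1.0 (100%). The sketch below applies that test with Python's gzip module; the helper names and sample documents are hypothetical. A gzip member always carries a 10-byte header and an 8-byte trailer (RFC 1952), so on very small documents that fixed overhead alone can push the output past the original size, which is one plausible way a general-purpose compressor can fail this test.

```python
import gzip


def size_ratio(original: bytes, encoded: bytes) -> float:
    """Encoded size as a fraction of the original (1.0 == 100%)."""
    return len(encoded) / len(original)


def violates_compactness(original: bytes, encode) -> bool:
    """True if the encoder's output is not strictly smaller than the input."""
    return size_ratio(original, encode(original)) >= 1.0


tiny = b"<a/>"
large = b"<log>" + b"<entry level='info'>ok</entry>" * 500 + b"</log>"

for name, doc in (("tiny", tiny), ("large", large)):
    encoded = gzip.compress(doc)
    print(f"{name}: {len(doc)} -> {len(encoded)} bytes "
          f"({size_ratio(doc, encoded):.1%}), "
          f"violates={violates_compactness(doc, gzip.compress)}")
```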
2. Property Comparison
The comparison of EXI and GZip against the properties required by the W3C XBC working group is summarized in Table 17 and Table 18 (W3C, 2008).
XBC Property                     GZip                                                 EXI
Directly Readable and Writable   No (requires the creation of an intermediate file)   Yes (EXI is a dynamic, event-driven technique with a supporting API)
Fragmentable                     Yes                                                  Yes (can represent any fragment)
Streamable                       Yes                                                  Yes
Property                 GZip               EXI
Processing Efficiency    Prevents           Does Not Prevent (both memory footprint and speed are better)
Small Footprint          Does Not Prevent
Efficiency               Prevents           Does Not Prevent
Table 18. Comparison of W3C Binary XML Property Demands Between GZip and EXI (Must Not Prohibit) (After W3C, 2008)
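The Streamable entry for GZip in Table 17 can be illustrated with Python's standard library: zlib.decompressobj inflates a gzip stream chunk by chunk, and xml.etree.ElementTree.XMLPullParser turns the recovered text into start/end events as it arrives. This is a sketch under those assumptions, not an EXI API, and the stream_events name is hypothetical. It also shows why GZip is not directly readable: the decompressed XML text is an intermediate form that must still be parsed.

```python
import gzip
import zlib
import xml.etree.ElementTree as ET


def stream_events(compressed_chunks):
    """Yield (event, tag) pairs from gzip-compressed XML delivered in chunks."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in compressed_chunks:
        # Inflate this chunk and hand the intermediate XML text to the parser.
        parser.feed(decompressor.decompress(chunk))
        # Events are available as soon as enough input has arrived.
        for event, element in parser.read_events():
            yield event, element.tag
    parser.feed(decompressor.flush())
    parser.close()
    for event, element in parser.read_events():
        yield event, element.tag


payload = gzip.compress(b"<feed><item>1</item><item>2</item></feed>")
chunks = [payload[i:i + 8] for i in range(0, len(payload), 8)]
for event, tag in stream_events(chunks):
    print(event, tag)
```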
EXI is able to meet nearly all of the W3C property demands for a compact binary XML format. A line-by-line comparison does not show GZip to be a poor choice, but in terms of compactness, EXI excels.
GZip’s success stems from its wide acceptance in the IT world as the standard compression technique, much as XML achieved its success as the standard data exchange format. Unlike XML’s success story, however, GZip is not the best option for the XML family of technologies: there is a better compression algorithm for them, and that is EXI.
3. Generality Comparison
Similar to the property comparison, the overall generality results for EXI, GZip, and XML itself are listed in Table 19, where a marked cell indicates compliance and an empty cell indicates non-compliance. Out of the 20 generality criteria, EXI met 95% (19/20): every criterion except exact preservation of whitespace formatting, which by definition of PSVI rules is allowed to convey significant information.
Criteria XML GZip EXI
Can represent documents without a schema
Can represent documents that include elements and attributes not defined in the associated schema (i.e., open content)
Can represent any schema-invalid document
Can leverage available schema information to improve compactness, processing speed, and resource utilization
Can leverage available schema information to improve compactness, processing speed, and resource utilization even when documents contain elements and attributes not defined in the schema
Can leverage available schema information to improve compactness, processing speed, and resource utilization for any schema-invalid document
Can leverage document analysis to improve compactness
Can suppress document analysis to increase speed and reduce resource utilization
[optional] Can adjust document analysis to meet application performance and resource utilization criteria
Can structure the binary XML stream to increase net compactness when off-the-shelf compression software is built into the communications infrastructure
[optional] Supports high fidelity XML representations that preserve an exact copy of the original XML document, including all whitespace and formatting
Supports reduced fidelity XML representations that preserve all data model items, but discard whitespace and formatting to improve compactness
Supports reduced fidelity XML representations that preserve all information needed by a particular application, but discard specified information items that are not needed (e.g., comments and processing instructions) to improve compactness
Supports reduced fidelity XML representations that preserve the logical structures and values of an XML document, but discard lexical and syntactic constructs to improve compactness
Can consistently produce XML representations that are close to the same size or smaller than XML documents compressed using GZip
Can consistently produce more compact XML representations than XML documents compressed using GZip
Can consistently produce more compact XML representations than binary XML documents created with document analysis suppressed, then compressed using GZip
Can consistently produce XML representations that are close to the same size or smaller than the equivalent ASN.1 PER encoding plus 20%
Can consistently produce XML representations that are more compact than the equivalent ASN.1 PER encoding plus 20%
[optional] Can consistently produce XML representations that are more compact than the equivalent ASN.1 PER encoding plus 20% compressed using GZip
Totals 8 10 19
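The 95% figure quoted for EXI is simply its total from Table 19 divided by the 20 criteria. Applying the same arithmetic to every column gives the comparison below; the variable names are illustrative only.

```python
# Totals row of Table 19; there are 20 criteria in all.
TOTAL_CRITERIA = 20
totals = {"XML": 8, "GZip": 10, "EXI": 19}

# Percentage of criteria met by each representation.
percentages = {name: 100 * met / TOTAL_CRITERIA for name, met in totals.items()}

for name, pct in percentages.items():
    print(f"{name}: {totals[name]}/{TOTAL_CRITERIA} = {pct:.0f}%")
```

By this measure, XML alone meets 40% of the criteria and GZip meets 50%.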