Transformation rules from XML to systems of set equations

10.2 Practical representation of WDB as XML

10.2.3 Transformation rules from XML to systems of set equations

Let us show how any XML-WDB document, as described above, can be treated as a system of set equations by using the following simple transformations (applicable, in fact, to arbitrary XML documents, but giving the desired system of set equations only for XML-WDB documents). These transformation rules currently impose some restrictions on XML-WDB which can easily be relaxed; for example, attributes having many values, attr="value1 value2 ...", are not taken into account.

10.2.3.1 Elimination of attributes and text data

The first two transformation rules, applied recursively, eliminate attributes and atomic (text) data from an arbitrary XML element by treating them as tags.

Rule 1 (Attribute elimination, except attributes set:id, set:ref and set:href). XML tags which have attributes,

<tag attr="value" other-attributes> some-content </tag>

transform to

<tag other-attributes> <attr>value</attr> some-content </tag>

where attr is restricted to be any attribute name except the distinguished attributes set:id, set:ref and set:href belonging to the set namespace, which will be considered later. Additionally, some-content means arbitrary XML content of an XML element.

In the case of an empty element with attributes,

<tag attr="value" other-attributes />

the transformation quite analogously gives a similar result,

<tag other-attributes> <attr>value</attr> </tag>

This rule is applied until all attributes, except those belonging to the set namespace (set:id, set:ref and set:href), are eliminated. This way attributes are actually treated as tags.
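As a minimal sketch of Rule 1, attribute elimination might be written as follows. The choice of Python's standard xml.etree.ElementTree module, the namespace URI bound to the set prefix, and the function name eliminate_attributes are all assumptions made for illustration; none of them is prescribed by the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "set" prefix (an assumption, not from the text).
SET_NS = "http://example.org/set"
# Distinguished attributes that Rule 1 leaves in place.
KEEP = {f"{{{SET_NS}}}{name}" for name in ("id", "ref", "href")}


def eliminate_attributes(elem):
    """Rule 1: turn every attribute outside the set namespace into a child tag."""
    for child in list(elem):
        eliminate_attributes(child)
    new_children = []
    for name in list(elem.attrib):
        if name in KEEP:
            continue
        attr_elem = ET.Element(name)
        attr_elem.text = elem.attrib.pop(name)
        new_children.append(attr_elem)
    if new_children:
        # Keep any leading text after the new <attr>value</attr> children.
        new_children[-1].tail = elem.text
        elem.text = None
    for position, attr_elem in enumerate(new_children):
        elem.insert(position, attr_elem)
    return elem
```

The function modifies the element in place and returns it, so it can be chained with the later rules.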

Rule 2 (Atomic data elimination). Text data with no white spaces

any-text-data

transforms to the empty XML element

<any-text-data/>

In the case of text data containing whitespace characters (spaces, carriage returns, tabs),

any text data

all whitespace characters are ignored, and the result is the corresponding sequence of empty elements,

<any/> <text/> <data/>

As our set-theoretic approach ignores order and repetitions (in contrast with the ordinary XML approach), this in fact means that a sentence (any text data) is considered rather as an unordered set of words. This way text data are actually treated as tags. (Another alternative would be to replace all whitespace characters by the underscore symbol, thus giving rise to the single empty element <any_text_data/>.)
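Rule 2 can be sketched in the same hedged way: each whitespace-separated word of text becomes one empty element. The function name eliminate_text is hypothetical, and the sketch assumes that every word is acceptable as a tag name (ElementTree does not check this when an element is created).

```python
import xml.etree.ElementTree as ET


def eliminate_text(elem):
    """Rule 2: replace text data by one empty element per word, recursively."""
    # Words of the text that precedes the first child element.
    result = [ET.Element(word) for word in (elem.text or "").split()]
    elem.text = None
    for child in list(elem):
        eliminate_text(child)
        result.append(child)
        # Text following a child is stored by ElementTree in child.tail.
        result.extend(ET.Element(word) for word in (child.tail or "").split())
        child.tail = None
    elem[:] = result
    return elem
```

Since order and repetitions are ignored in the set-theoretic reading, keeping the words in document order here is only a convenience.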

Iterated application of Rules 1 and 2 eliminates all atomic (text) data and all attributes except those belonging to the set namespace (set:id, set:ref and set:href).

10.2.3.2 Elimination of tags

The remaining rules below allow transformation of XML elements with (simple) attributes and text data eliminated by the above rules into bracket expressions (possibly involving set names), and into set equations if there are tags set:eqns and set:eqn occurring as described in Definition 5. In the intermediate steps, the expression transformed will be in the mixed language.

Rule 3 (Tag elimination, except the tags set:eqns and set:eqn). For arbitrary XML tags, except set:eqns and set:eqn, which have no attributes,

<tag> some-content </tag>

transforms into

tag:{some-content}.

Any tags remaining in sub-elements of some-content are eliminated recursively by application of transformation Rules 3 and 4. The case of the empty element is quite analogous:

<tag/>

transforms to

tag:{}
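The recursion in Rule 3 can be illustrated by a short sketch that renders an element, assumed to be already free of attributes and text, as a bracket expression. The function name to_bracket and the use of a comma as separator between labelled elements are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def to_bracket(elem):
    """Rule 3: <tag> content </tag> becomes tag:{content}; <tag/> becomes tag:{}."""
    inner = ", ".join(to_bracket(child) for child in elem)
    return f"{elem.tag}:{{{inner}}}"
```

For instance, the element <a><b/><c><d/></c></a> is rendered as a:{b:{}, c:{d:{}}}.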

Rule 4 (Elimination of tags with set:ref and set:href attributes).

<tag set:ref="set-name" />

transforms to the sequence

tag:set-name

Recall that other attributes were already eliminated by Rule 1. Furthermore, according to the definition of a well-formed XML document, an attribute name may appear only once in any tag; however, set:ref and set:href may occur together in the same tag. The above elimination is considered typical, when only one of the attributes set:ref or set:href occurs. Additionally, we must consider the following more general, though unlikely, case in which some content is also present:

<tag set:ref="set-name1" set:href="set-name2"> some-content </tag>

transforms to

tag:set-name1, tag:set-name2, tag:{some-content}.

However, to be consistent with the first version of Rule 4, if some-content is empty, then (as an exception) the result should not contain the labelled element tag:{}.

The above rules hold also for the case of the attribute set:href, or when both set:ref and set:href are present within a tag. Note that after applying Rule 4, the difference between these two attributes is not taken into account in generating the result. Recall that set:ref refers to a simple set name, whereas set:href refers to a full set name, which is actually a URL together with a simple set name (see Section 10.2.2). Such syntax, explicitly differentiating between simple and full set names, is convenient for implementation. After applying this rule this feature will disappear, but the difference between the shapes of simple and full set names will remain, so that nothing essential will be lost.
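The cases of Rule 4, including the exception for empty content, can be sketched as below. The namespace URI, the constant names and the function name labelled_elements are hypothetical, and the input is assumed to have been processed by Rules 1 and 2 already.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "set" prefix (an assumption, not from the text).
SET_NS = "http://example.org/set"
REF = f"{{{SET_NS}}}ref"
HREF = f"{{{SET_NS}}}href"


def labelled_elements(elem):
    """Rules 3 and 4: list the labelled elements that one XML element gives rise to."""
    # One labelled element per reference; set:ref and set:href are treated alike.
    parts = [f"{elem.tag}:{elem.get(attr)}"
             for attr in (REF, HREF) if elem.get(attr) is not None]
    children = [part for child in elem for part in labelled_elements(child)]
    # Exception in Rule 4: with a reference and no content, tag:{} is omitted.
    if children or not parts:
        inner = ", ".join(children)
        parts.append(f"{elem.tag}:{{{inner}}}")
    return parts
```

A tag carrying both attributes and some content thus yields three labelled elements, while a tag with only set:ref yields exactly one.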

Rule 5 (Elimination of tags set:eqn and set:eqns).

<set:eqn set:id="simple-set-name">some-content</set:eqn>

is replaced by the equation

simple-set-name = {some-content}

and

<?xml ... > <set:eqns>some-content</set:eqns>

is replaced by

some-content

that is, by a system of set equations (in the case of a well-formed XML-WDB document; cf. Definition 5 above).
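Putting Rule 5 together with a compact rendering of Rules 3 and 4 gives the following sketch, which turns a set:eqns document into a list of equations. The namespace URI, the helper names q, render and to_equations, and the tag names in the usage example are hypothetical; the input is again assumed to be free of other attributes and of text data.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "set" prefix (an assumption, not from the text).
SET_NS = "http://example.org/set"


def q(name):
    """Qualified ElementTree name of a tag or attribute in the set namespace."""
    return f"{{{SET_NS}}}{name}"


def render(elem):
    """Rules 3 and 4 for one element, given as a list of labelled elements."""
    parts = [f"{elem.tag}:{elem.get(q(a))}"
             for a in ("ref", "href") if elem.get(q(a)) is not None]
    inner = ", ".join(part for child in elem for part in render(child))
    if inner or not parts:
        parts.append(f"{elem.tag}:{{{inner}}}")
    return parts


def to_equations(root):
    """Rule 5: each set:eqn becomes 'name = {content}'; set:eqns itself is dropped."""
    if root.tag != q("eqns"):
        raise ValueError("expected a set:eqns root element")
    equations = []
    for eqn in root.findall(q("eqn")):
        content = ", ".join(part for child in eqn for part in render(child))
        equations.append(f"{eqn.get(q('id'))} = {{{content}}}")
    return equations
```

For a document with the equations BibDB, containing <paper set:ref="p1"/> and <book><title/></book>, and p1, containing <author/>, the result is the two strings BibDB = {paper:p1, book:{title:{}}} and p1 = {author:{}}.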

Note that all the above rules can be applied in arbitrary order, leading to a unique system of set equations.

In document Hyperset approach to semi-structured databases and the experimental implementation of the query language Delta (Page 156-160)