10.2 Practical representation of WDB as XML
10.2.1 XML-WDB document format
In general, an arbitrary XML-WDB document is defined as follows.
Definition 5 (XML-WDB; see also Section 10.2.4 for the corresponding XML schema). A well-formed and valid XML-WDB file is an XML document with the root element <set:eqns> containing possibly several <set:eqn> sub-elements. The <set:eqns> element should contain no attributes, whereas the <set:eqn> element should contain the required set:id attribute only. The value of the set:id attribute, called the defined set name, should be unique across the whole document and can only be a string of symbols forming a simple set name (according to the syntactical category <simple set name> in the BNF). The elements <set:eqns> and <set:eqn> and the attribute set:id are not allowed to appear anywhere else in the document. The element <set:eqn> can contain possibly several arbitrary XML sub-elements. The attributes set:ref and set:href can appear (at any depth) in those arbitrary elements under <set:eqn>. The values of the attributes set:ref and set:href are called referenced set names, and must correspond to some existing set:id value in the same XML-WDB document in the case of set:ref, or to a set:id value in some other XML-WDB document in the case of set:href. To this end, the value of the attribute set:href should be a full set name (as discussed in Section 10.2.2; cf. the syntactical category <set name> in the BNF) consisting of an (XML-WDB file) URL and a simple set name defined in that file (delimited by #).
Everything else that the XML standard allows and that is not forbidden by the above restrictions is permitted in the XML-WDB format.
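The structural restrictions of Definition 5 can be checked mechanically. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module; the function name check_xml_wdb and the wording of its messages are hypothetical and are not part of the XML-WDB format, and the namespace URI is the one used in the example files of this section.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in the example files of this section.
SET_NS = "http://www.csc.liv.ac.uk/~molyneux/XML-WDB"
EQNS, EQN = f"{{{SET_NS}}}eqns", f"{{{SET_NS}}}eqn"
ID, REF, HREF = (f"{{{SET_NS}}}{a}" for a in ("id", "ref", "href"))


def check_xml_wdb(xml_text):
    """Return a list of violations of Definition 5 (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag != EQNS:
        return ["root element is not <set:eqns>"]
    errors = []
    if root.attrib:
        errors.append("<set:eqns> has attributes")
    # First pass: collect the defined set names and check their uniqueness.
    defined = set()
    for eqn in root:
        if eqn.tag != EQN:
            errors.append(f"unexpected child of <set:eqns>: {eqn.tag}")
            continue
        name = eqn.attrib.get(ID)
        if name is None or set(eqn.attrib) != {ID}:
            errors.append("<set:eqn> must carry set:id and nothing else")
        if name is not None:
            if name in defined:
                errors.append(f"duplicate set:id: {name}")
            defined.add(name)
    # Second pass: inspect the arbitrary elements nested under each <set:eqn>.
    for eqn in root.findall(EQN):
        for elem in eqn.iter():
            if elem is eqn:
                continue
            if elem.tag in (EQNS, EQN) or ID in elem.attrib:
                errors.append(f"reserved name used inside <set:eqn>: {elem.tag}")
            ref = elem.attrib.get(REF)
            if ref is not None and ref not in defined:
                errors.append(f"set:ref to undefined set name: {ref}")
            href = elem.attrib.get(HREF)
            if href is not None and "#" not in href:
                errors.append(f"set:href is not a full set name: {href}")
    return errors
```

This sketch does not check simple set names against the BNF, and it does not fetch other files to confirm that a set:href target exists; both checks would be needed for full validity in the sense of Definition 5.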
Note 6. The important feature of this definition is that XML-WDB documents can contain quite arbitrary XML elements under <set:eqn>, which allows arbitrary XML data with any nesting, any text data and any attributes (except set:id, and with the restrictions on the values of set:ref and set:href described above) to be included in our hyperset approach to WDB. However, the order and repetitions of data will be irrelevant for our approach, and the usual XML attributes (except the attributes set:ref and set:href, which have a special role, as described above) will be treated rather as tags which permit no further nesting.
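To illustrate the remark in Note 6 that order and repetitions are irrelevant, the following Python sketch reads the content of an XML element as a nested set of (label, value) pairs. It is only one possible reading and makes several assumptions that are not taken from the text: the function name as_set is hypothetical, text data is recorded under the made-up label "#text", text following a child element is ignored, and set:ref and set:href are given no special treatment.

```python
import xml.etree.ElementTree as ET


def as_set(elem):
    """Read the content of elem as a frozenset of (label, value) pairs.

    Hypothetical helper: attributes become labelled atomic values (no further
    nesting), child elements become labelled nested sets, and using a set
    discards both the order and the repetitions of the data.
    """
    members = set()
    for name, value in elem.attrib.items():
        members.add((name, value))  # attribute: a label with an atomic value
    text = (elem.text or "").strip()
    if text:
        members.add(("#text", text))  # "#text" is an assumed label
    for child in elem:
        members.add((child.tag, as_set(child)))
    return frozenset(members)


# Two documents that differ only in order and repetition of their data.
first = as_set(ET.fromstring('<r><x k="1"/><y>t</y><x k="1"/></r>'))
second = as_set(ET.fromstring('<r><y>t</y><x k="1"/></r>'))
```

Under these assumptions, first and second denote the same set, although the two XML texts differ.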
10.2.2 Distributed WDB
Any WDB system of set equations may be divided into several subsystems (as XML-WDB files), with the possibility for the set names s participating in one subsystem (XML-WDB file) to be defined by set equations s = {...} either in the same or in some other subsystem (XML-WDB file). Thus, strictly speaking, we should always consider the corresponding full versions of set names defined in set equations of a distributed WDB, even when a simple set name is used for simplicity. That is, each simple set name occurring as a value of a set:id or set:ref attribute within an XML-WDB file should be understood as the full set name obtained by concatenating the URL of this file with the simple name, using # to delimit these parts. Moreover, this technique avoids unintended simple set name clashes without requiring any cooperation between the authors of distributed XML-WDB files. (Unfortunately, unintended clashes caused by using the same label for different intuitive meanings are still possible; however, this is not a formal contradiction in our approach. Here the well-known idea of namespaces in XML could be used.)
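The construction of full set names described above amounts to simple string concatenation. The sketch below shows it in Python; the function names and the example.org URLs are hypothetical, and it assumes that a simple set name never contains the # symbol.

```python
def full_set_name(file_url, simple_name):
    """Build a full set name from a file URL and a simple set name."""
    if "#" in simple_name:
        raise ValueError("a simple set name must not contain '#'")
    return f"{file_url}#{simple_name}"


def split_full_set_name(full_name):
    """Split a full set name into (file URL, simple set name)."""
    url, sep, simple = full_name.rpartition("#")
    if not (sep and url and simple):
        raise ValueError(f"not a full set name: {full_name!r}")
    return url, simple


# The same simple name in two different files yields two distinct full names,
# so independently written files cannot clash by accident.
a = full_set_name("http://example.org/a.xml", "emma")
b = full_set_name("http://example.org/b.xml", "emma")
```

Here a and b differ even though both files define a set called emma, which is the clash-avoidance property mentioned above.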
Figure 10.2: Example distributed WDB representing two fictitious families, divided into two fragments represented as white and grey nodes
Defined set names appearing in some XML-WDB file can participate as referenced set names in the same or other XML-WDB files. Those set names defined in the same XML-WDB file are referenced as simple set name values of the attribute set:ref, whereas set names defined in some other XML-WDB file are referenced as full set name values of the attribute set:href. It is required that each full set name should refer to an existing XML-WDB file and to the set equation within that file for the simple set name part (after the # symbol).
Let us now consider an example of distributed WDB, representing two families (visualised in Figure 10.2), and the corresponding XML-WDB files family1.xml and family2.xml (XML-WDB files 2 and 3) appearing below. Both simple and full set names participate as referenced set names in this example distributed WDB. For example, take the labelled element daughter:emma represented in the XML-WDB file family1.xml as
<daughter set:ref="emma" />
where the attribute set:ref refers to the simple set name emma defined within the same file. As an illustration of distribution, consider the labelled element friend:mark represented as
<friend set:href="...family2.xml#mark" />
where the attribute set:href refers to the set name mark defined in the file family2.xml. Note that the URL in this example has been shortened for the sake of simplicity.
XML-WDB file 2: Family database fragment (cf. grey nodes in Figure 10.2): family1.xml
<?xml version="1.0"?>
<set:eqns xmlns:set="http://www.csc.liv.ac.uk/~molyneux/XML-WDB">
  <set:eqn set:id="bob">
    <daughter set:ref="emma" />
  </set:eqn>
  <set:eqn set:id="alice">
    <daughter set:ref="emma" />
  </set:eqn>
  <set:eqn set:id="emma">
    <friend set:href="...family2.xml#mark" />
  </set:eqn>
</set:eqns>
XML-WDB file 3: Family database fragment (cf. white nodes in Figure 10.2): family2.xml
<?xml version="1.0"?>
<set:eqns xmlns:set="http://www.csc.liv.ac.uk/~molyneux/XML-WDB">
  <set:eqn set:id="paul">
    <son set:ref="mark" />
  </set:eqn>
  <set:eqn set:id="amy">
    <son set:ref="mark" />
  </set:eqn>
  <set:eqn set:id="mark">
    <friend set:href="...family1.xml#emma" />
  </set:eqn>
</set:eqns>
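As a rough illustration of how the two files fit together, the following Python sketch parses both fragments and collects the labelled edges between full set names, resolving set:ref against the URL of the containing file and taking set:href values as they stand. The function names are hypothetical. The shortened URLs "...family1.xml" and "...family2.xml" are used verbatim as dictionary keys and stand in for the real, unspecified URLs. Only direct children of <set:eqn> are considered, which suffices for this example but not for nested data.

```python
import xml.etree.ElementTree as ET

SET_NS = "http://www.csc.liv.ac.uk/~molyneux/XML-WDB"
EQN = f"{{{SET_NS}}}eqn"
ID, REF, HREF = (f"{{{SET_NS}}}{a}" for a in ("id", "ref", "href"))
HEAD = f'<set:eqns xmlns:set="{SET_NS}">'

# Shortened URLs copied from the text; they are placeholders, not real URLs.
DOCUMENTS = {
    "...family1.xml": HEAD + """
      <set:eqn set:id="bob"><daughter set:ref="emma" /></set:eqn>
      <set:eqn set:id="alice"><daughter set:ref="emma" /></set:eqn>
      <set:eqn set:id="emma"><friend set:href="...family2.xml#mark" /></set:eqn>
    </set:eqns>""",
    "...family2.xml": HEAD + """
      <set:eqn set:id="paul"><son set:ref="mark" /></set:eqn>
      <set:eqn set:id="amy"><son set:ref="mark" /></set:eqn>
      <set:eqn set:id="mark"><friend set:href="...family1.xml#emma" /></set:eqn>
    </set:eqns>""",
}


def labelled_edges(documents):
    """Return the set of (source, label, target) triples over full set names."""
    edges = set()
    for url, text in documents.items():
        for eqn in ET.fromstring(text).findall(EQN):
            source = f"{url}#{eqn.attrib[ID]}"
            for elem in eqn:  # direct children only (assumption of the sketch)
                if REF in elem.attrib:  # simple name: same file
                    edges.add((source, elem.tag, f"{url}#{elem.attrib[REF]}"))
                elif HREF in elem.attrib:  # full name: possibly another file
                    edges.add((source, elem.tag, elem.attrib[HREF]))
    return edges


def dangling_targets(documents):
    """Return referenced full set names that no document defines."""
    defined = {
        f"{url}#{eqn.attrib[ID]}"
        for url, text in documents.items()
        for eqn in ET.fromstring(text).findall(EQN)
    }
    return {target for _, _, target in labelled_edges(documents)} - defined


EDGES = labelled_edges(DOCUMENTS)
```

For the two family files, EDGES contains six triples, among them the cross-file edge from emma in family1.xml to mark in family2.xml, and dangling_targets(DOCUMENTS) is empty, in line with the requirement that every referenced set name is defined somewhere.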
The analogy of WDB with the WWW and, in particular, the possibly distributed character of WDB do not imply that it is necessarily as huge and unorganised as the WWW. It could be distributed between several sites and supported by specialised WDB servers of some departments of an organisation which owns this WDB and maintains some specific structure for it.
Thus, a WDB might, in fact, be much more structured than the WWW; however, the general approach imposes no restrictions. Therefore, the concept of a WDB schema or typing relation between hypersets or graphs (much more flexible than for relational databases, and based on the notion of bisimulation or “one-way” simulation), relativised to some typing relation on labels/atomic values, can be considered for such databases [9, 41, 57, 69]. Here we will not go into the details of this important topic, as our main concern is the straightforward implementation of querying WDB, which does not take into account any such WDB schemas.