2.1 Data Integration towards XML & Relational Database Integration
2.1.8 XML data management and relational databases
2.1.8.4 Relational publishing in XML
The goal of relational publishing in XML is to transform the data held in the relational database of a legacy system into an XML format. The data usually remain stored in the pre-existing relational database (i.e. in the legacy system) and continue to be updated through its interfaces. XML wrappers are therefore needed to export the relational data into XML and make them accessible for Web publishing and/or data integration.
Many initiatives have been launched to fulfil this need. DB2XML, proposed in [Turau 1999], was one of the first tools to transform the result of a query or the complete content of a relational database into XML documents. It generates a description of the data in the form of a DTD and allows the XML documents to be transformed with XSLT stylesheets. Its mapping approach is similar to a straightforward relational-to-XML translation algorithm called Flat Translation, in which the flat relational model is mapped one-to-one onto a flat XML model.
However, DB2XML does not support importing XML documents, even those that conform to the same specification as the exported documents.
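The idea behind a flat, one-to-one translation can be illustrated with a minimal Python sketch. It is not the DB2XML implementation: the function name `flat_translate`, the `<row>` wrapper element and the `employee` table are assumptions made for illustration, and table and column names are assumed to be valid XML names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def flat_translate(conn, table):
    """Map each row of `table` to a <row> element with one child per column."""
    # Hypothetical helper: the table name is trusted and used as the root tag.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for record in cur:
        row = ET.SubElement(root, "row")
        for name, value in zip(columns, record):
            cell = ET.SubElement(row, name)
            if value is not None:  # a NULL value is exported as an empty element
                cell.text = str(value)
    return root


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])

xml_text = ET.tostring(flat_translate(conn, "employee"), encoding="unicode")
print(xml_text)
```

The output contains no nesting beyond table, row and column: the XML document mirrors the flat shape of the relation, which is exactly the limitation that the later approaches in this section try to overcome.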
Relational publishing in XML is mainly achieved by defining an XML view of the relational data and then running XML queries over this view. An XML view is thus a means of extracting data from a relational database into a specific XML format, i.e. an agreed arrangement of elements and attributes based on the design of a new XML schema expressed in DTD or XSD. The concept of XML views (or "XViews") was first introduced in [Baru 1999]. In this paper, the author presents various approaches for selecting a set of candidate XViews.
The final XView is then derived by manually refining the candidate XViews. An XView over a relational schema may be defined as an aggregation of some or all of the relations in this schema. The approach used to derive the candidate XViews represents the relational schema as a directed graph, and graph processing techniques are used to enumerate the candidates. The nodes with the maximum degree (or, alternatively, with zero degree) are the possible roots of the candidate XViews. When the schema graph has cycles, these cycles must be broken. User guidance is therefore always required to complete the enumeration of the XViews.
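The root-selection step can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of [Baru 1999]: edges are assumed to run from a referenced relation to the relation that references it, "zero degree" is read as zero in-degree, "maximum degree" as maximum out-degree, and the function names and the sample schema are hypothetical.

```python
from collections import defaultdict


def candidate_roots(relations, edges):
    """Return (relations with zero in-degree, relations with maximum out-degree)."""
    in_deg = {r: 0 for r in relations}
    out_deg = {r: 0 for r in relations}
    for parent, child in edges:
        out_deg[parent] += 1
        in_deg[child] += 1
    zero_in = sorted(r for r in relations if in_deg[r] == 0)
    top = max(out_deg.values())
    max_out = sorted(r for r in relations if out_deg[r] == top)
    return zero_in, max_out


def has_cycle(relations, edges):
    """Depth-first search for a back edge, i.e. a cycle that has to be broken."""
    graph = defaultdict(list)
    for parent, child in edges:
        graph[parent].append(child)
    state = {r: 0 for r in relations}  # 0 = unvisited, 1 = on the stack, 2 = done

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state[nxt] == 1 or (state[nxt] == 0 and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(state[r] == 0 and visit(r) for r in relations)


# Hypothetical schema graph: parent relation -> referencing relation.
relations = ["customer", "order", "order_line", "product"]
edges = [("customer", "order"), ("order", "order_line"), ("product", "order_line")]

zero_in, max_out = candidate_roots(relations, edges)
print(zero_in)                      # relations that nothing points to
print(max_out)                      # relations with the most outgoing edges
print(has_cycle(relations, edges))  # whether user guidance is needed to break a cycle
```

When `has_cycle` reports a cycle, no automatic choice of root is possible in this sketch, which mirrors the need for user guidance noted above.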
In [Carey et al. 2000a], [Carey et al. 2000b], [Shanmugasundaram et al. 2001a], XPERANTO is proposed as a middleware that allows existing relational data to be viewed and queried as XML. It provides XML views over the relational database so that users can query and structure XML data with an XML query language, without dealing with SQL.
This middleware uses the XML Query Graph Model (XQGM) as an intermediate representation, which is general enough to capture the semantics of a powerful language such as XQuery and flexible enough to be translated easily into SQL. An XQuery query is first translated into the intermediate XQGM representation. This representation supports the query rewriting step, in which complex XML view composition is performed, and is then translated into SQL statements. XPERANTO therefore allows the user to publish relational data as XML using a high-level XML query language, eliminating the need for application code. However, only the flat relational representation is considered in the XML view composition over the relational database, and update queries are not supported.
In [Shanmugasundaram et al. 2001b], the authors propose a SQL-based language specification that extends SQL with new scalar and aggregate functions for constructing XML documents inside the relational engine, which may significantly improve performance.
SilkRoute [Fernández et al. 2000], [Fernández et al. 2002] is another tool for publishing relational data in XML. The publishing process is accomplished in three steps: (1) the relational tables are represented in a canonical XML view; (2) a public, virtual XML view is specified in the XQuery language by the database administrator; (3) an application formulates queries in XQuery over the public view. The proposed algorithm then translates the XQuery expression into SQL and decomposes the XML view over the relational database into an optimal set of SQL queries. All the previous approaches consider only the flat representation of the relational tables and do not carry the integrity constraints over into the XML view, which may make it difficult to write the XML queries properly.
One important issue in relational publishing in XML is the transformation of the relational schema into an XML schema, which is then used to publish the relational data in XML. This issue has been well studied in the literature, and several approaches for transforming a relational schema into an XML schema have been proposed. In [Lee et al. 2001], a straightforward method called Flat Transformation is presented. This method converts the relations and attributes of a relational schema into elements and attributes of a DTD, respectively. Lee et al. in [Lee et al. 2002], [Lee et al. 2003] give two algorithms called NeT (Nested-based Transformation) and CoT (Constraints-based Transformation). The NeT algorithm derives nested structures from the flat relations by repeatedly applying the nest operator, based on the tuple values of each relation. The resulting nested structures are therefore not based on the relational semantics of the schema. The CoT transformation takes the inclusion dependencies into account to create a more intuitive DTD. An algorithm that transforms a relational schema into a DTD by taking the functional dependencies into account has been proposed in [Lv and Yan 2007]. However, all these works fail to capture all the semantic and integrity constraints of the relational schema, and they introduce some redundancy, due to the use of DTD as the target schema language.
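The nest operator on which NeT relies can be illustrated with a short Python sketch. It shows the classical nest operation of nested relational algebra (tuples that agree on every other attribute are merged, and the values of the nested attribute are collected), not the NeT algorithm of Lee et al. itself; the function name `nest` and the sample relation are assumptions made for illustration.

```python
def nest(rows, attr):
    """Merge rows that agree on all attributes except `attr`, collecting `attr` values."""
    groups = {}
    for row in rows:
        # The grouping key is every attribute except the one being nested.
        key = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        groups.setdefault(key, []).append(row[attr])
    # Collected values are stored as a sorted tuple so that the result stays
    # hashable and nest() can be applied again on another attribute.
    return [{**dict(key), attr: tuple(sorted(set(values)))}
            for key, values in groups.items()]


# Hypothetical flat relation enrolment(student, course).
rows = [
    {"student": "Ann", "course": "DB"},
    {"student": "Ann", "course": "XML"},
    {"student": "Bob", "course": "DB"},
]

nested = nest(rows, "course")
print(nested)
```

The grouping depends only on which tuples happen to share values, so a different database instance over the same schema can yield a different nested structure. This is why the text above notes that the result does not reflect the relational semantics of the schema.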
Several works also take XML Schema as the target schema language. In [Fong and Cheung 2005], the authors propose translating a relational schema into an Extended Entity Relationship (EER) model, which is then transformed into a conceptual XSD graph. The XSD graph is finally mapped into an XML schema. Two intermediate steps are thus introduced, which may result in some inefficiency. The method proposed in [Yang and Sun 2008] to automate the transformation of a relational schema into an XML Schema considers only the semantics of inclusion dependencies. Two other recent works on the transformation of a relational schema into XML Schema are presented in [Liu et al. 2006], [Zhou R. et al. 2008]. However, they cannot capture the hierarchical view of the relational model, which is especially useful for providing the complete and consistent mapping between XML and relational schemata needed to perform a coherent series of cascading SQL update queries.
In addition, these two approaches are not fully automated, because their transformation algorithm requires the intervention of an expert to decide which relation is the “dominant relation”.
In [Nayak et al. 2010], a relational data publishing approach driven by the customer's request and conditions is proposed. In this approach, the schema of the database is not exposed to the customer, which may be useful for security. The proposed middleware consists of four main components: (1) the XML-to-SQL schema mapping, which maps the requested XML format to the database schema; (2) the XML-to-SQL translation, which is designed for a set of statements for a particular business process; (3) the SQL-to-XML translation, which converts the SQL query result into an XML file; and (4) the XML-to-XML schema translation, which maps the schema of the resulting XML file into the customer schema.