An XML file can be either data-centric or document-centric. The data-centric type holds highly structured data of the kind commonly stored in databases. The document-centric type concerns semi-structured textual content such as books (Bourret, 2005; Sun and Wang, 2011; Noaman and Almansour, 2012). Only the data-centric type is relevant here because of its relationship to database applications.
There has been a debate as to whether XML is a database or not. XML can be considered a technology that is used to build databases, since it can store and retrieve data like other types of databases (Bourret, 2005; Sun and Wang, 2011; Noaman and Almansour, 2012). It includes many common database features: it stores data in XML files, has schema languages (DTD and XML Schema) and query languages (XPath and XQuery), and provides programming interfaces such as DOM and SAX. At the same time, it lacks many features of database management systems, such as security, multi-access, and recovery (Steegmans et al., 2004; Bourret, 2005; Noaman and Almansour, 2012). These limitations call into question XML’s status as a database. Researchers have been concerned about these limitations and have tried to develop the XML database environment. This research is one such attempt: it aims to improve security in XML databases.
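The database-like features listed above can be illustrated with a minimal sketch. It assumes Python's standard `xml.etree.ElementTree` module, which offers a tree-based interface and a limited subset of XPath. The document content, the element names, and the function `names_in_ward` are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-centric document; all names are illustrative only.
XML_TEXT = """
<patients>
  <patient id="p1"><name>Ann</name><ward>A</ward></patient>
  <patient id="p2"><name>Bob</name><ward>B</ward></patient>
  <patient id="p3"><name>Cy</name><ward>A</ward></patient>
</patients>
"""


def names_in_ward(xml_text, ward):
    """Return the names of all patients whose <ward> child equals `ward`."""
    root = ET.fromstring(xml_text)  # parse the text into an element tree
    # ElementTree supports only a subset of XPath; the predicate
    # [ward='A'] keeps patients with a <ward> child whose text is 'A'.
    matches = root.findall(f"./patient[ward='{ward}']")
    return [p.findtext("name") for p in matches]


print(names_in_ward(XML_TEXT, "A"))
```

Nothing in this interface restricts which elements a caller may read, which is the kind of missing database management feature noted above.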
An XML database can be categorised as either an XML-enabled database or a native XML database (Steegmans et al., 2004; Bourret, 2005; Molina et al., 2009; Papamarkos et al., 2009; Elmasri and Navathe, 2011). The XML-enabled database stores data using existing approaches built on traditional databases such as relational databases. Its most important advantage is that it supports existing applications, since a large number of XML files are already stored in relational databases (Steegmans et al., 2004; Papamarkos et al., 2009; Abd El-Aziz and Kannan, 2012b). This type depends on well-known and familiar approaches. It requires mapping techniques to transfer data from the XML structure to the relational structure (Steegmans et al., 2004; Elmasri and Navathe, 2011). It also suffers from limitations. It does not handle large XML files well because of the number of joins required (Papamarkos et al., 2009). It takes no account of hierarchical structure, nested data, or element order. Some information may be lost during the conversion (Steegmans et al., 2004; Bourret, 2005; Sun and Wang, 2011; Noaman and Almansour, 2012).
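The mapping step and the join cost described above can be sketched as follows. This is a minimal illustration using a generic edge-table mapping in SQLite, not the mapping technique of any of the cited systems. The table layout, the sample document, and the functions `shred` and `titles_of_books` are assumptions made for the example. The `parent_id` and `position` columns are added deliberately, because hierarchy and element order are lost unless the mapping records them.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document.
XML_TEXT = "<library><book><title>XML</title><author>Lee</author></book></library>"


def shred(xml_text):
    """Map every element to one row of a relational 'node' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "position INTEGER, tag TEXT, text TEXT)"
    )
    next_id = 0

    def walk(elem, parent_id, position):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (node_id, parent_id, position, elem.tag, (elem.text or "").strip()),
        )
        # 'position' keeps sibling order, which a relational table does not imply.
        for i, child in enumerate(elem):
            walk(child, node_id, i)

    walk(ET.fromstring(xml_text), None, 0)
    return conn


def titles_of_books(conn):
    """Answer the path /library/book/title: one self-join per path step."""
    rows = conn.execute(
        """
        SELECT t.text
        FROM node AS l
        JOIN node AS b ON b.parent_id = l.id AND b.tag = 'book'
        JOIN node AS t ON t.parent_id = b.id AND t.tag = 'title'
        WHERE l.tag = 'library' AND l.parent_id IS NULL
        ORDER BY b.position, t.position
        """
    ).fetchall()
    return [row[0] for row in rows]


conn = shred(XML_TEXT)
print(titles_of_books(conn))
```

Under this mapping, each additional step in a path expression adds another self-join on the node table, which suggests why deeply nested or large documents become expensive to query.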
The second approach is the native XML database, which uses an XML file as its basic unit of storage. This type is an appropriate way to manage XML data (Fiebig et al., 2002; Steegmans et al., 2004; Sun and Wang, 2011). It can easily be searched and its content managed because it is all in one place (Bourret, 2005; Sun and Wang, 2011). The native approach supports XML query languages, which improves the retrieval process (Steegmans et al., 2004; Bourret, 2005; Papamarkos et al., 2009; Sun and Wang, 2011). It is more flexible than XML-enabled databases (Bourret, 2005). The main limitation of this type is that it provides data only in XML format (Bourret, 2005; Abd El-Aziz and Kannan, 2012b). According to Bourret (2005) and Papamarkos et al. (2009), native XML databases can be further classified into two types: text-based and model-based. The text-based approach handles the XML file as text and stores it as a file in the file system or in a relational database as a CLOB/BLOB. The model-based type handles XML data as objects, and the file is represented as a tree, as in DOM (Staken, 2001; Steegmans et al., 2004; Bourret, 2005; Harold, 2005; Sun and Wang, 2011; Noaman and Almansour, 2012). This research will focus only on native XML databases.
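The difference between the text-based and model-based types can be sketched roughly as follows. The example assumes SQLite and Python's standard `xml.etree.ElementTree`. The table `docs`, the sample document, and the helper functions are hypothetical. The model-based side is only approximated here by parsing the stored text into an in-memory tree, whereas a real model-based store would persist the tree itself.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = "<order id='7'><item>pen</item><item>ink</item></order>"


def store_as_text(conn, name, xml_text):
    """Text-based storage: keep the whole document as one character value."""
    conn.execute("CREATE TABLE IF NOT EXISTS docs (name TEXT PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO docs VALUES (?, ?)", (name, xml_text))


def load_text(conn, name):
    """Return the stored document exactly as it was written."""
    row = conn.execute("SELECT body FROM docs WHERE name = ?", (name,)).fetchone()
    return row[0]


def load_model(conn, name):
    """Approximate the model-based view: expose the document as a tree of objects."""
    return ET.fromstring(load_text(conn, name))


conn = sqlite3.connect(":memory:")
store_as_text(conn, "order7", DOC)

raw = load_text(conn, "order7")      # opaque text, unchanged from the input
tree = load_model(conn, "order7")    # navigable tree of element objects
items = [e.text for e in tree.iter("item")]
print(raw == DOC, tree.get("id"), items)
```

The text form returns the document unchanged but offers no structure to query, while the tree form exposes elements and attributes directly.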
2.9 Conclusion
XML is a vast topic, and not all of its aspects could be covered in this Chapter. Only the basic points are included, to give sufficient background for understanding the research aims and the systems’ platforms. The next Chapter reviews the literature on access control systems, since the topic of the thesis is access control for XML.
3 Related Work on Security in XML Databases
3.1 Introduction
XML databases are widely used in many different areas. Like any database, they are used to store, retrieve, and provide data and information in an organised manner. They are multiuser systems, meaning they can be accessed by millions of users and can provide a huge amount of data. This large amount of data needs to be controlled, managed, and organised. In addition, this data can be sensitive and personal. All data, and especially confidential data, need to be protected and stored in a secure environment. Therefore, XML databases should manage data securely to protect user rights and data privacy from loss or misuse (Izadi et al., 2007; Li and Hong, 2008; Gollmann, 2011; Thimma et al., 2013).
This thesis focuses on access control, which is one of the main techniques for improving security in XML databases.
In this Chapter, the general background of XML security is discussed in Section 3.2. The rest of the Chapter describes the work related to access control.
In Section 3.3, access control concepts are explained and compared. Section 3.4 provides a literature review of several types of access control. In Section 3.5, access control techniques that are currently applied to XML databases are discussed in detail. The labelling technique is described in Section 3.6 because of its relationship to access control.