Chapter 6 Summary

This chapter summarises the main results and contributions of this thesis and comments on future research by listing some open problems.

6.1 Main Results

The major contribution of this thesis is a mathematically well-founded framework for studying different classes of dependencies in the presence of various combinations of data types. Data models are classified according to the data types that they support. The approach is therefore independent of any specific data model. Although it is not claimed that this is the ultimate unifying framework for investigating problems of dependency theory in complex-value data models, the presence of specific types in any data model does motivate the study of the kinds of problems investigated in this thesis. The examples used throughout the thesis illustrate that dependencies occur naturally among complex objects. Extending the relational theory of various dependency classes to complex data types allows more real-world constraints to be specified and therefore increases the number of application domains. Moreover, a formal foundation for automated reasoning about these constraints is provided, and a start is made on a design theory for complex-value databases.

It has been demonstrated that the presence of complex objects such as records, lists, sets and multisets leads to the algebraic structure of a Brouwerian algebra. The relational data model is based on the Boolean powerset algebra $(\mathcal{P}(R), \subseteq, \cup, \cap, (\cdot)^{c}, \emptyset, R)$. From a purely algebraic point of view, the gain in expressiveness due to the introduction of complex objects therefore results in the loss of the involutional character of the complement operation. Throughout the thesis it is shown that Brouwerian algebras are sufficiently powerful to generalise and extend well-known results from relational databases.
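The loss of involution can be seen on a very small example. The sketch below is illustrative only: it assumes that the subattributes of the list-valued attribute L[A] form the chain λ ≤ L[λ] ≤ L[A], and that the Brouwerian complement of Y is the least Z whose join with Y is the top element. The integer encoding and all function names are hypothetical.

```python
# Hypothetical toy model: the subattribute chain of L[A], encoded as integers.
# 0 = lambda (bottom), 1 = L[lambda], 2 = L[A] (top). In a chain, join is max.
NAMES = {0: "lambda", 1: "L[lambda]", 2: "L[A]"}
TOP = 2


def join(x, y):
    return max(x, y)


def brouwerian_complement(y):
    """Least z with join(y, z) == TOP (assumed definition of the complement)."""
    return min(z for z in NAMES if join(y, z) == TOP)


def double_complement(y):
    return brouwerian_complement(brouwerian_complement(y))


for y in NAMES:
    print(NAMES[y], "->", NAMES[brouwerian_complement(y)],
          "->", NAMES[double_complement(y)])
```

In this model the double complement of L[λ] is λ rather than L[λ], whereas in a Boolean powerset algebra complementing twice always returns the original set.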

In the presence of lists it is sufficient to consider the subattributes of a nested attribute in order to define functional and multi-valued dependencies, while in the presence of sets or multisets, sets of subattributes need to be introduced. In that sense lists are simpler than sets and multisets. This is not really surprising as the list type possesses both features: the elements of a list are totally ordered and multiple occurrences of the same element are allowed. Since multisets also allow the occurrence of duplicates, it must be the total order on the elements of a list that guarantees the soundness of the join axiom, i.e., that the values of the projections on subattributes determine the value of the projection on their join. While the list and multiset types allow reasoning about the number of their elements, the set type is only capable of distinguishing between empty and non-empty sets. The fact that elements of a list or multiset may occur more than once is decisive in that regard.
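The role of the total order can be illustrated with a minimal sketch. It models records as Python tuples, lists as Python lists and multisets as `collections.Counter` objects; the sample values and helper names are hypothetical and not taken from the thesis.

```python
from collections import Counter

# Two hypothetical values built from the same A and B entries, paired differently.
pairs_1 = [(1, "a"), (2, "b")]
pairs_2 = [(1, "b"), (2, "a")]


def project(records, index):
    """Project every record of a list on one component, keeping the order."""
    return [record[index] for record in records]


# List semantics: the order is kept, so the projection on B tells the values apart.
lists_agree_on_a = project(pairs_1, 0) == project(pairs_2, 0)
lists_agree_on_b = project(pairs_1, 1) == project(pairs_2, 1)

# A list is recovered from its two projections by pairing them position-wise.
rebuilt = list(zip(project(pairs_1, 0), project(pairs_1, 1)))

# Multiset semantics: the order is forgotten, multiplicities are kept.
bags_agree_on_a = Counter(project(pairs_1, 0)) == Counter(project(pairs_2, 0))
bags_agree_on_b = Counter(project(pairs_1, 1)) == Counter(project(pairs_2, 1))
bags_agree_on_join = Counter(pairs_1) == Counter(pairs_2)

print(lists_agree_on_a, lists_agree_on_b)
print(bags_agree_on_a, bags_agree_on_b, bags_agree_on_join)
```

As multisets the two values agree on both projections but differ on the join, so the projections do not determine the join. As lists they already differ on the projection on B.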

The dependency classes that have been studied add a complementary expressiveness to those that have previously been studied in the literature. MVDs have not been studied at all in the presence of lists. FDs have not been studied at all in the presence of lists and multisets.

Regarding the problem of axiomatising the class of FDs, Theorem 5.23 captures the main result. Once the framework of nested attributes is given, it is straightforward to obtain a generalisation of Armstrong's axioms for the class of FDs in the presence of records and lists. The introduction of set- and multiset-valued attributes calls for a more sophisticated definition of FDs. The left- and right-hand sides are now sets of subattributes instead of single subattributes. Besides Armstrong's original axioms, two more axioms are required to capture the class of FDs in the presence of sets or multisets. The completeness proof, which still remains constructive, uses rather deep arguments in the case of set- and multiset-valued attributes. Theorem 5.23 provides minimal axiomatisations for the class of FDs in the presence of all combinations of records, lists, sets and multisets that at least include records.

Figure 5.7 summarises the upper complexity bounds for the implication problem of FDs in the presence of all previous type combinations. In the context of records and lists, a provably correct, linear-time algorithm is proposed for computing the closure of a nested attribute with respect to a given set of FDs. The size of the input is measured in the number of join-irreducible subattributes and the number of FDs given. The representation theorem for Brouwerian algebras suggests a different, topological view of FDs. This alternative perspective is even more similar to the framework of relational databases, in the sense that operations are performed on (closed) sets. In the presence of sets or multisets, provably correct, polynomial-time algorithms are proposed for computing the closure of a set of nested attributes with respect to a given set of FDs. The size of the input, however, is now measured in the number of all subattributes and the number of FDs given. This is justified by the fact that a set of subattributes is semantically different from the join of these subattributes.
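For orientation, the relational closure computation that these algorithms generalise can be sketched as follows. This is the textbook fixpoint on plain attribute sets, not the linear-time algorithm of the thesis; the function name and the representation of FDs as pairs of frozensets are assumptions made for the example.

```python
def closure(attributes, fds):
    """Return the closure of `attributes` under `fds`.

    `fds` is a list of (lhs, rhs) pairs of frozensets. Textbook relational
    fixpoint; it may scan the FD list several times, so it is not linear-time.
    """
    closed = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD once its whole left-hand side has been derived.
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return frozenset(closed)


fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C']
```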

Theorem 4.3 shows that MVDs are still equivalent to binary join dependencies, even in the presence of records and lists. Theorem 4.13, Theorem 4.28 and Theorem 4.31 provide (minimal) axiomatisations for the class of FDs and MVDs; Theorem 4.43 and Theorem 4.44 propose minimal axiomatisations for the class of MVDs. An interesting fact is that the MVD $X \twoheadrightarrow Y$ implies the non-trivial FD $X \rightarrow Y \sqcap Y^{\mathcal{C}}$, which gives the set of inference rules a distinctive Brouwerian flavour. Recall that MVDs do not imply any non-trivial FDs in the context of the RDM. Further differences to the RDM are given by the minimal sets of inference rules. This is due to the fact that non-maximal join-irreducible subattributes cannot be represented as the Brouwerian complement of any set of subattributes.
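The contrast with the relational case can be written out explicitly. The display below is a sketch that assumes $Y^{\mathcal{C}}$ denotes the Brouwerian complement of $Y$, i.e. the least subattribute whose join with $Y$ is the whole nested attribute, and that $\lambda$ denotes the bottom element.

```latex
\[
  \frac{X \twoheadrightarrow Y}{X \rightarrow Y \sqcap Y^{\mathcal{C}}}
  \qquad\text{with}\qquad
  \underbrace{Y \cap Y^{c} = \emptyset}_{\text{Boolean (relational) case}}
  \qquad\text{but possibly}\qquad
  \underbrace{Y \sqcap Y^{\mathcal{C}} \neq \lambda}_{\text{Brouwerian case}}
\]
```

In the relational case the derived FD therefore has an empty right-hand side and is trivial. In a Brouwerian algebra the meet of an element and its complement need not be the bottom element, which is what allows the derived FD to be non-trivial.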

6.2. OPEN PROBLEMS Sebastian Link

The provably correct, polynomial-time Algorithm 4.4.1 computes the dependency basis and the nested attribute closure for a given subattribute and a given set of FDs and MVDs. It naturally generalises the well-known membership algorithm for FDs and MVDs in relational databases. This shows that the implication problem for FDs and MVDs is efficiently decidable in the presence of records and lists.

The applicability of efficiently solving the various implication problems is demonstrated by proposing efficient algorithms for computing non-redundant covers of sets of dependencies and deciding whether a (set of) nested attribute(s) is a superkey with respect to a given set of dependencies.
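Both applications reduce to repeated implication tests. The following sketch shows this reduction for the relational special case with FDs only; it is an illustration under that simplifying assumption, not the algorithms proposed in the thesis, and all names are hypothetical.

```python
def closure(attributes, fds):
    """Attribute closure under FDs given as (lhs, rhs) pairs of frozensets."""
    closed = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return closed


def implies(fds, fd):
    """Decide whether `fds` implies the single FD `fd`."""
    lhs, rhs = fd
    return rhs <= closure(lhs, fds)


def non_redundant_cover(fds):
    """Drop every FD that is implied by the FDs still kept."""
    cover = list(fds)
    for fd in list(fds):
        rest = [other for other in cover if other is not fd]
        if implies(rest, fd):
            cover = rest
    return cover


def is_superkey(attributes, schema, fds):
    """A superkey is a set of attributes whose closure is the whole schema."""
    return closure(attributes, fds) >= set(schema)


a_b = (frozenset("A"), frozenset("B"))
b_c = (frozenset("B"), frozenset("C"))
a_c = (frozenset("A"), frozenset("C"))
print(non_redundant_cover([a_b, b_c, a_c]) == [a_b, b_c])
print(is_superkey({"A"}, "ABC", [a_b, b_c]))
```

Here the FD from A to C is dropped because it follows from the other two by transitivity.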

Database design theory in terms of FDs is extended to the presence of records and lists. Formal definitions of design criteria such as the absence of redundancies and the absence of abnormal update behaviour are generalised and adapted to this framework. The Nested List Normal Form (NLNF) is proposed as a normal form that syntactically describes well-designed nested attributes. NLNF is strictly weaker than a straightforward extension of Boyce-Codd Normal Form. The proposal is semantically justified by formally showing the equivalence to the absence of redundancy, strong insertion anomalies, and strong type-1 and strong type-2 replacement anomalies. Furthermore, strong type-3 replacement anomalies cannot occur for nested attributes in NLNF. In order to verify that an instance of a nested attribute in NLNF satisfies all FDs given, it is sufficient to verify that this instance satisfies all key dependencies and all inevitable FDs. Finally, a provably correct algorithm is proposed which decomposes an arbitrary nested attribute with respect to a given set of FDs into subattributes that are all in NLNF with respect to the set of projected FDs. This decomposition is lossless in the sense that every instance satisfying all the FDs given is the generalised natural join of its projections on the decomposed subattributes. Some problems with the algorithm are pointed out. The algorithm may execute in time exponential in the size of the given nested attribute and set of FDs, and the cardinality of the decomposition may be exponential in the size of the given nested attribute. Moreover, deciding whether an arbitrary subattribute is in NLNF with respect to the projected set of FDs is coNP-complete. Finally, some of the FDs that have been specified may be lost during the decomposition process. The results obtained for NLNF as well as the problems with the decomposition algorithm generalise well-known results from the RDM.

6.2 Open Problems

Figure 1.1 gives an indication of opportunities for future research. Although an axiomatisation for the class of FDs has been achieved in all combinations of records, lists, sets and multisets, the expressiveness of the class of FDs can be increased. We have seen examples which suggest studying the interaction of FDs defined on embedded nested attributes. Consider for instance the nested attribute $L[N(A, B, C)]$ together with its embedded attribute $N(A, B, C)$. Suppose the functional dependency $N(A) \rightarrow N(B)$ has been specified on $N(A, B, C)$. An instance of $L[N(A, B, C)]$ whose embedded records all satisfy this dependency then also satisfies the FD $L[N(A)] \rightarrow L[N(B)]$. Considering embedded attributes of a nested attribute therefore leads to the soundness of further inference rules, which need to be studied in order to achieve an axiomatisation for this new class of FDs. The expressiveness can be increased even further if not only embedded attributes, but also combinations of embedded attributes are studied. This was suggested by the example of null-extended FDs and the FDs previously studied in XML. In the same spirit, one may try to increase the expressiveness of MVDs by studying the interaction between MVDs on embedded attributes.
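The interaction in this example can be checked on a small instance. The sketch below models an instance of L[N(A, B, C)] as a Python list of lists of triples; the sample data and the helper names are hypothetical, and the reading of the embedded FD as a constraint on all records occurring in any list is an assumption of the sketch.

```python
def satisfies_fd(rows, lhs, rhs):
    """Check the FD lhs -> rhs, where lhs and rhs are projection functions."""
    seen = {}
    for row in rows:
        key, value = lhs(row), rhs(row)
        if seen.setdefault(key, value) != value:
            return False
    return True


def list_a(lst):
    """Projection of a list of (A, B, C) records on L[N(A)]."""
    return tuple(record[0] for record in lst)


def list_b(lst):
    """Projection of a list of (A, B, C) records on L[N(B)]."""
    return tuple(record[1] for record in lst)


def embedded_fd_holds(instance):
    """N(A) -> N(B), checked on all records occurring in any list."""
    records = [record for lst in instance for record in lst]
    return satisfies_fd(records, lambda r: r[0], lambda r: r[1])


# Hypothetical instance: every element is a list of (A, B, C) records.
instance = [
    [(1, "x", 10), (2, "y", 20)],
    [(1, "x", 30), (2, "y", 40)],
    [(2, "y", 50)],
]
print(embedded_fd_holds(instance), satisfies_fd(instance, list_a, list_b))

# If the lifted FD fails, the embedded FD must fail as well.
violating = [[(1, "x", 0)], [(1, "z", 0)]]
print(embedded_fd_holds(violating), satisfies_fd(violating, list_a, list_b))
```

Two lists that agree on L[N(A)] have the same length and the same A-values position by position, so the embedded FD forces the B-values to agree position by position as well.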

Another approach to increasing the expressiveness is to extend the number of subattributes for a fixed nested attribute. This can be done by relating the information content of different data types to one another. The list-valued attribute $L[A]$, for instance, may have the subattribute $L\langle A \rangle$, which itself has the subattribute $L\{A\}$. In the first step we drop the information on the order of the elements; in the second step we drop the possibility of multiple occurrences of the same element. It is then interesting to study to what extent this approach still results in a sufficiently powerful structure in which dependencies can be investigated.
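The two information-dropping steps can be mimicked with standard Python containers. This is only an illustration of the idea, with `collections.Counter` standing in for multisets and `frozenset` for sets; the function names are hypothetical.

```python
from collections import Counter


def to_multiset(values):
    """First step: forget the order of a list but keep multiplicities."""
    return Counter(values)


def to_set(multiset):
    """Second step: forget multiplicities, keep only which elements occur."""
    return frozenset(multiset)


v1 = ["a", "b", "a"]
v2 = ["a", "a", "b"]
v3 = ["a", "b"]

# v1 and v2 differ as lists but coincide as multisets.
print(v1 == v2, to_multiset(v1) == to_multiset(v2))
# v1 and v3 differ as multisets but coincide as sets.
print(to_multiset(v1) == to_multiset(v3),
      to_set(to_multiset(v1)) == to_set(to_multiset(v3)))
```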

A more general treatment of data dependencies in complex-value databases may prove as successful as in the RDM [109, 264, 278]. The problem is that of finding a suitable logic (if there is one) such that dependencies can be associated with formulae in that logic. The first-order theories of lists, sets and multisets established in [99] seem promising.

What changes if lists, sets or multisets are allowed to have an infinite number of elements?

It is desirable to improve the running time of Algorithm 4.4.1 for deciding the implication of FDs and MVDs. Substantial research on that subject has again been done for relational databases and the papers [98, 118, 135, 152, 173, 223, 239, 277] may give some more information.

It seems interesting to study multi-valued dependencies in the presence of the set and multiset constructor. For relational databases, the flat relation $r$ satisfies the MVD $A \twoheadrightarrow B$ if and only if the relation $r^{*}$ that results from a NEST operation over attribute $B$ satisfies the FD $A \rightarrow (B)^{*}$, see [113]. This observation will have a direct impact on the interaction of FDs and MVDs on different combinations of embedded attributes.
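The relational observation can be tried out on small relations. The sketch below assumes a flat relation over three attributes A, B and C given as a set of triples, and models NEST over B as grouping by the remaining attributes; the function names and sample relations are hypothetical.

```python
from collections import defaultdict


def satisfies_mvd_a_b(relation):
    """MVD A ->> B on a flat relation of (a, b, c) triples."""
    rows = set(relation)
    # For tuples t1, t2 with equal A-value, the tuple combining the B-value
    # of t1 with the C-value of t2 must also be present.
    return all(
        (a1, b1, c2) in rows
        for (a1, b1, _c1) in rows
        for (a2, _b2, c2) in rows
        if a1 == a2
    )


def nest_b(relation):
    """NEST over B: group by (a, c) and collect the B-values into a set."""
    groups = defaultdict(set)
    for a, b, c in relation:
        groups[(a, c)].add(b)
    return {(a, frozenset(bs), c) for (a, c), bs in groups.items()}


def satisfies_fd_a_bset(nested):
    """FD A -> (B)* on the nested relation."""
    seen = {}
    return all(seen.setdefault(a, bs) == bs for a, bs, _c in nested)


good = {(1, "x", 10), (1, "y", 10), (1, "x", 20), (1, "y", 20)}
bad = {(1, "x", 10), (1, "y", 20)}
for relation in (good, bad):
    print(satisfies_mvd_a_b(relation), satisfies_fd_a_bset(nest_b(relation)))
```

On both sample relations the two tests give the same answer, as the equivalence cited from [113] predicts.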

The minimality results in this thesis have been achieved with respect to Definition 3.7. As mentioned before, the notion of minimality can be improved. Strictly speaking, minimality would also refer to the fact that the constraints for every inference rule cannot be weakened without losing completeness. This stronger notion of minimality should be studied in the future. Moreover, it would be interesting to find other (all?) minimal sets of inference rules for the various axiomatisations.

Although the context-dependent Brouwerian complement rule could be replaced by the much weaker context-dependent N-axiom, it is still interesting to ask how the mixed meet rule and the auto-complement rule, respectively, can be weakened.

Another interesting line of research is data mining. It would be interesting to develop algorithms that determine all FDs and MVDs on a nested attribute $N$ that are satisfied by a particular instance $r \subseteq dom(N)$. For relational databases, [167, 189, 198, 199] and [242, 300] have developed algorithms for FDs and MVDs, respectively.
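To make the task concrete, the following is a naive sketch of FD discovery on a flat relation. It is a brute-force enumeration written for illustration, exponential in the number of attributes, and is not one of the algorithms from the cited papers; rows are modelled as dictionaries and all names are hypothetical.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """True if the rows satisfy the FD lhs -> rhs (rhs is a single attribute)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def discover_fds(rows, attributes):
    """Enumerate all non-trivial FDs lhs -> rhs with a minimal left-hand side."""
    found = []
    for rhs in attributes:
        others = [a for a in attributes if a != rhs]
        # Try smaller left-hand sides first so that only minimal ones are kept.
        for size in range(len(others) + 1):
            for lhs in combinations(others, size):
                minimal = not any(
                    set(prev) <= set(lhs) for prev, r in found if r == rhs
                )
                if minimal and holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found


rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]
print(discover_fds(rows, ["A", "B", "C"]))
```

On this sample instance only the FDs from B to A and from A to B are reported; nothing determines C.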


There is a polynomial-time algorithm for obtaining a lossless BCNF decomposition for relation schemata [270] . The idea from that paper may give hints how to obtain a

In document Dependencies in complex value databases : a dissertation presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Systems at Massey University (Page 183-187)
