polynomial-time algorithm for achieving a lossless NLNF decomposition.

An open problem is to extend the NLNF proposal to the class of MVDs, and to the class of FDs and MVDs, to semantically justify the proposal in terms of the absence of redundancies and abnormal update behavior, and to generalise the decomposition approach. Relevant papers that address these problems in the context of relational databases are [280, 281, 283, 289, 290] and [133]. The key to solving these problems is an appropriate definition of inevitable MVDs. Let $N$ be a nested attribute and $\Sigma$ a set of FDs and MVDs on $N$.

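The conditions under which an FD from $\Sigma^+$ is in $\Sigma_{\mathrm{inev}}$ are exactly as before. An MVD $X \twoheadrightarrow Y \in \Sigma^+$ is in $\Sigma_{\mathrm{inev}}$ if and only if $Y \leq X$, or $Y \in \mathrm{NMaxB}(N)$, or $X \sqcup Y = N$.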

The set $\Sigma^+_{\mathrm{inev}}$ of inevitable FDs and MVDs on $N$ with respect to $\Sigma$ is then the closure of $\Sigma_{\mathrm{inev}}$ under the complete set of inference rules for FDs and MVDs in the presence of lists from Theorem 4.28. It follows for an MVD $X \twoheadrightarrow Y \in \Sigma^+$ that $X \twoheadrightarrow Y \in \Sigma^+_{\mathrm{inev}}$ if and only if $Y^{CC} \leq X$ or $Y^{C} \leq X$ holds. A nested attribute $N$ is said to be in Nested List Normal Form with respect to the set $\Sigma$ of FDs and MVDs on $N$ if and only if every $X \twoheadrightarrow Y \in \Sigma^+$ is inevitable on $N$ with respect to $\Sigma$ or $X$ is a superkey for $N$ with respect to $\Sigma$.
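The inevitability criterion $Y^{CC} \leq X$ or $Y^{C} \leq X$ can be tested mechanically once the subattribute lattice of $N$ is available. The sketch below is only an illustration under an assumption of our own: the lattice is encoded by Birkhoff's representation of a finite distributive lattice, so that each subattribute is the downward-closed set of basis attributes below it, join is set union, and $Y^{C}$ is the least $Z$ with $Y \sqcup Z = N$. The names `down_closure`, `complement` and `is_inevitable_mvd` are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical encoding (an assumption, not the thesis's data structure):
# each basis attribute of N maps to basis attributes below it. A subattribute
# X of N is the downward-closed set of basis attributes below X, so join is
# set union and the subattribute order is set inclusion.
Poset = Dict[str, Set[str]]


def down_closure(elems: Set[str], below: Poset) -> FrozenSet[str]:
    """Smallest downward-closed set of basis attributes containing elems."""
    result: Set[str] = set()
    stack = list(elems)
    while stack:
        b = stack.pop()
        if b not in result:
            result.add(b)
            stack.extend(below[b])
    return frozenset(result)


def complement(y: FrozenSet[str], below: Poset) -> FrozenSet[str]:
    """Brouwerian complement Y^C: the least Z with Y join Z = N."""
    n = frozenset(below)  # N itself is the set of all basis attributes
    return down_closure(set(n - y), below)


def is_inevitable_mvd(x: FrozenSet[str], y: FrozenSet[str], below: Poset) -> bool:
    """Test Y^CC <= X or Y^C <= X for an MVD X ->> Y already known to be in Sigma+."""
    yc = complement(y, below)
    ycc = complement(yc, below)
    return ycc <= x or yc <= x


# N = L[A]: the basis attribute L[λ] (the list length) lies below L[A].
list_attr: Poset = {"L[λ]": set(), "L[A]": {"L[λ]"}}
# λ ->> L[λ], with λ encoded as the empty set.
print(is_inevitable_mvd(frozenset(), frozenset({"L[λ]"}), list_attr))  # True
```

In this example $L[\lambda]$ is a non-maximal basis attribute, so $Y^{C} = N$ and $Y^{CC} = \lambda \leq X$; the test agrees with the condition $Y \in \mathrm{NMaxB}(N)$ stated above.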

It is conjectured that the results on the 4NF in relational databases from [280] carry over to the Nested List Normal Form. As was the case with BCNF and NLNF, a simple extension of 4NF implies NLNF, but not vice versa. In the course of these investigations it might prove useful to generalise such notions as reduced MVDs and minimal covers of sets of MVDs [216], pure sets of FDs and MVDs [154], envelope sets [301, 302] and conflict-free MVDs [247] from relational databases to the context of complex object types.

A further desirable goal is to propose normal forms for nested attributes in the presence of more type constructors. The axiomatisations of FDs in the context of records, lists, sets and multisets suggest continuing along those lines. The decisive notion in a Complex-value Normal Form proposal may be that of a unit of a nested attribute, taking over the role of maximal basis attributes in the NLNF proposal.

Normalisation is nothing but an optimisation. Considering the example of the prime factorisation Factor(Integer, Prime[Number], Exponent[Number]), one may ask whether the list constructor is really appropriate here. Instead of storing the list of prime factors and the list of exponents, one may store the set of prime factor/exponent pairs. It would be interesting to see whether the specification of inevitable FDs such as

$$Factor(Prime[\lambda]) \to Factor(Exponent[\lambda]) \quad\text{and}\quad Factor(Exponent[\lambda]) \to Factor(Prime[\lambda])$$

suggests that the data type is inappropriate. A further observation is the following. Consider the list-valued attribute $L[M]$ and suppose the FD $\lambda \to L[\lambda]$ has been specified. Informally, it says that all tuples coincide on $L[\lambda]$, i.e., the length of the list $L[M]$ is constant, say $k$. In this case, it is certainly more appropriate to use the record-valued attribute $L(M_1, \ldots, M_k)$ with $dom(M_i) = dom(M)$ for $i = 1, \ldots, k$.
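The suggested change of data type can be illustrated on a concrete relation. The following is a minimal sketch under assumptions of our own: a relation is a list of Python dicts, the list-valued attribute is stored under a single key, and the record fields are named `M_1`, ..., `M_k`. It checks whether $\lambda \to L[\lambda]$ holds (all lists have the same length $k$) and, if so, replaces each list by a record with $k$ fields. All names here are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Hypothetical encoding: a relation is a list of dicts; the list-valued
# attribute L[M] is stored under one key as a Python list.
Relation = List[Dict[str, Any]]


def constant_length(rel: Relation, attr: str) -> Optional[int]:
    """Return k if all lists under attr have length k, i.e. the FD
    λ -> attr[λ] holds in rel; return None otherwise (also for empty rel)."""
    lengths = {len(t[attr]) for t in rel}
    return lengths.pop() if len(lengths) == 1 else None


def lists_to_records(rel: Relation, attr: str) -> Relation:
    """Replace each list under attr by a record with fields M_1, ..., M_k."""
    k = constant_length(rel, attr)
    if k is None:
        raise ValueError(f"list lengths under {attr!r} are not constant")
    out: Relation = []
    for t in rel:
        new_t = dict(t)  # copy the tuple, keeping all other attributes
        new_t[attr] = {f"M_{i}": v for i, v in enumerate(t[attr], start=1)}
        out.append(new_t)
    return out


r = [{"id": 1, "L": [3, 4]}, {"id": 2, "L": [5, 6]}]
print(lists_to_records(r, "L")[0])  # {'id': 1, 'L': {'M_1': 3, 'M_2': 4}}
```

The sketch refuses the conversion when the lengths differ, since then the FD $\lambda \to L[\lambda]$ does not hold and no fixed record type fits all tuples.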

The notions of redundancy and update anomalies that were used in this thesis are not the only notions that appear in the literature. Vincent has introduced the concept of a redundant data value occurrence: given a relation schema $R$, a set $\Sigma$ of FDs and MVDs on $R$, a relation $r$ over $R$, a tuple $t \in r$ and an attribute $A \in R$, the data value occurrence $t[A]$ is redundant with respect to $\Sigma$ if and only if every replacement of $t[A]$ by a value $a'$ with $t[A] \neq a'$ results in a new relation $r'$ such that $\not\models_{r'} \Sigma$. The pair $(R, \Sigma)$ is defined to be in redundancy free normal form if and only if there does not exist a relation $r$ over $R$ with $\models_r \Sigma$ which contains a data value occurrence that is redundant with respect to $\Sigma$. Vincent shows that $(R, \Sigma)$ is in redundancy free normal form if and only if $R$ is in 4NF with respect to $\Sigma$.
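For the special case of FDs only (no MVDs), the definition of a redundant occurrence can be checked directly on a finite relation. The sketch below assumes an infinite domain and a relation that already satisfies $\Sigma$. Under these assumptions a single brand-new replacement value suffices: if a fresh value for $t[A]$ already violates some FD, then $t$ agrees with another tuple $t'$ on the left-hand side of an FD with $A$ on its right-hand side, so every other replacement $a' \neq t[A]$ violates that FD as well. The encoding and all function names are hypothetical.

```python
from typing import Any, Dict, FrozenSet, List, Tuple

# Hypothetical encoding: tuples are dicts, an FD is a pair (lhs, rhs).
Relation = List[Dict[str, Any]]
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def satisfies(rel: Relation, fds: List[FD]) -> bool:
    """True iff for every FD X -> Y, tuples agreeing on X agree on Y."""
    for lhs, rhs in fds:
        seen: Dict[Tuple[Any, ...], Tuple[Any, ...]] = {}
        for t in rel:
            key = tuple(t[a] for a in sorted(lhs))
            val = tuple(t[a] for a in sorted(rhs))
            if seen.setdefault(key, val) != val:
                return False
    return True


class _Fresh:
    """A value equal only to itself: models a constant not occurring in rel."""


def is_redundant(rel: Relation, fds: List[FD], i: int, attr: str) -> bool:
    """FD-only redundancy test: t[attr] of tuple i is redundant iff replacing
    it by a fresh value yields a relation that violates the FDs."""
    if not satisfies(rel, fds):
        raise ValueError("the relation must satisfy the FDs")
    changed = [dict(t) for t in rel]
    changed[i][attr] = _Fresh()
    return not satisfies(changed, fds)


r = [
    {"Emp": "e1", "Dept": "d1", "Mgr": "m1"},
    {"Emp": "e2", "Dept": "d1", "Mgr": "m1"},
    {"Emp": "e3", "Dept": "d2", "Mgr": "m2"},
]
fds = [(frozenset({"Dept"}), frozenset({"Mgr"}))]
print(is_redundant(r, fds, 0, "Mgr"), is_redundant(r, fds, 2, "Mgr"))  # True False
```

In the example, the schema (Emp, Dept, Mgr) with Dept → Mgr is not in BCNF, which for FDs alone coincides with 4NF, and the two occurrences of m1 are redundant. This is consistent with Vincent's characterisation.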

Update anomalies, as defined in this thesis, are called key-based update anomalies in [280]. So-called fact-based update anomalies are also introduced in [280], and the relationship between their absence and BCNF and 4NF is examined. The extensions of these notions to the framework of nested attributes and their relationship to the Nested List Normal Form are further directions of future research. Arenas and Libkin [13] use techniques from information theory to define a measure of the information content of elements in a database with respect to a set of constraints. This provides a set of tools for testing when a normal form proposal corresponds to a good design. The results give information-theoretic justifications for normal forms such as BCNF, 4NF, project-join normal form (PJ/NF), fifth normal form (5NF), domain-key normal form (DK/NF) and the XML normal form XNF proposed in [14], as well as information-theoretic criteria for justifying normalisation algorithms. It would be interesting to test the measure with respect to the Nested List Normal Form proposal, and later on the Complex-value Normal Form proposal.

Normalisation is a well-studied area in the context of relational databases. Besides BCNF and 4NF, there are many other normal form proposals. Extensions of third normal form (3NF) [70, 304], PJ/NF [104], 5NF [282] and DK/NF [105] to nested attributes seem desirable.

The (disjoint) union type is well worth investigating, as it can be used to represent alternatives. It is very important for the higher-order entity-relationship model [265] and XML [53]. To give a small illustration of the difficulty of the union type, we look at the following example. Figure 6.1 shows the structure of the union-valued attribute $L(A \oplus B)$.

[Figure 6.1: the subattributes of $L(A \oplus B)$, namely $L(A \oplus B)$, $L(A \oplus \lambda_B)$, $L(\lambda_A \oplus B)$, $L(\lambda_A \oplus \lambda_B)$ and $\lambda$.]

6.2. OPEN PROBLEMS Sebastian Link

Note that the subattribute $L(\lambda_A \oplus \lambda_B)$ indicates from which domain a value stems. If the projection on $L(\lambda_A \oplus \lambda_B)$ is $ok_A$, then the value comes from $dom(A)$, and if the projection is $ok_B$, then the value comes from $dom(B)$.

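Suppose that one needs to find two different elements $t_1, t_2 \in dom(L(A \oplus B))$ with

$$\pi^{L(A \oplus B)}_{W}(t_1) = \pi^{L(A \oplus B)}_{W}(t_2) \iff W \leq L(\lambda_A \oplus \lambda_B).$$

Consequently, both of the values must be $ok_A$ or both of the values must be $ok_B$ on $L(\lambda_A \oplus \lambda_B)$. That means we have either $t_1, t_2 \in dom(A)$ or $t_1, t_2 \in dom(B)$. However, in this case we also have

$$\pi^{L(A \oplus B)}_{L(\lambda_A \oplus B)}(t_1) = \pi^{L(A \oplus B)}_{L(\lambda_A \oplus B)}(t_2) \quad\text{or}\quad \pi^{L(A \oplus B)}_{L(A \oplus \lambda_B)}(t_1) = \pi^{L(A \oplus B)}_{L(A \oplus \lambda_B)}(t_2),$$

respectively, although neither $L(\lambda_A \oplus B)$ nor $L(A \oplus \lambda_B)$ is a subattribute of $L(\lambda_A \oplus \lambda_B)$. That shows that one cannot find any two elements of $dom(L(A \oplus B))$ with this property, which indicates that there is an FD where the right-hand side is a disjunction of subattributes. Such FDs are relevant if an axiomatisation of FDs in the presence of unions is pursued.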
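The case analysis can also be confirmed by exhaustive search over small finite domains. The sketch assumes an encoding of $dom(L(A \oplus B))$ as tagged pairs `("A", a)` or `("B", b)`, and hard-codes the five subattributes of Figure 6.1 together with the projections described above; the ASCII labels, the helper names and the sample domains are hypothetical. It searches for two distinct elements whose set of agreeing subattributes is exactly $\{\lambda, L(\lambda_A \oplus \lambda_B)\}$, i.e. the subattributes $W \leq L(\lambda_A \oplus \lambda_B)$.

```python
from itertools import combinations
from typing import Any, Callable, Dict, FrozenSet, Iterable, List, Tuple

# Assumed encoding: ("A", a) or ("B", b); "ok_A"/"ok_B" are the tags kept by
# the λ-subattributes. Labels: "lA" stands for λ_A and "lB" for λ_B.
Value = Tuple[str, Any]

PROJECTIONS: Dict[str, Callable[[Value], Any]] = {
    "lambda": lambda v: "ok",
    "L(lA+lB)": lambda v: "ok_" + v[0],
    "L(A+lB)": lambda v: v if v[0] == "A" else "ok_B",
    "L(lA+B)": lambda v: v if v[0] == "B" else "ok_A",
    "L(A+B)": lambda v: v,
}

# All subattributes W with W <= L(λ_A ⊕ λ_B).
TARGET = frozenset({"lambda", "L(lA+lB)"})


def agreement(t1: Value, t2: Value) -> FrozenSet[str]:
    """Subattributes on which the projections of t1 and t2 coincide."""
    return frozenset(w for w, p in PROJECTIONS.items() if p(t1) == p(t2))


def search(dom_a: Iterable[Any], dom_b: Iterable[Any]) -> List[Tuple[Value, Value]]:
    """All pairs of distinct elements that agree exactly on TARGET."""
    elems = [("A", a) for a in dom_a] + [("B", b) for b in dom_b]
    return [(t1, t2) for t1, t2 in combinations(elems, 2)
            if agreement(t1, t2) == TARGET]


print(search([1, 2, 3], ["x", "y"]))  # []
```

The search comes back empty: two $A$-tagged elements also agree on $L(\lambda_A \oplus B)$, two $B$-tagged elements also agree on $L(A \oplus \lambda_B)$, and differently tagged elements agree only on $\lambda$.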

Another challenge is the inclusion of the reference type into the types of interest. It is particularly important for object-oriented databases and XML. A possible approach to investigate the reference type is to represent nested attributes as labelled trees, where the labels of non-leaf nodes are used to define embedded attributes and leaf nodes are either null, flat attributes or labels referencing other nodes. This leads to rational trees, which can be infinite but in which the number of different subtrees is still finite.
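The rational-tree idea can be sketched with a finite graph in which a reference is simply an edge back to an existing node. This encoding, the `Node` class and the `Person(Name, Spouse)` example are hypothetical illustrations, not the thesis's construction. Unfolding the graph to greater depth yields ever larger trees, while the number of distinct subtrees is bounded by the number of nodes reachable from the root.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    label: str
    # Ids of child nodes; an id may point back to an ancestor (a reference).
    children: List[str] = field(default_factory=list)


Graph = Dict[str, Node]


def unfold(graph: Graph, node_id: str, depth: int) -> Any:
    """Unfold the possibly cyclic graph into a tree, truncated at depth."""
    node = graph[node_id]
    if depth == 0 or not node.children:
        return node.label
    return (node.label, [unfold(graph, c, depth - 1) for c in node.children])


def reachable(graph: Graph, root: str) -> Set[str]:
    """Nodes reachable from root; they bound the number of distinct subtrees."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(graph[n].children)
    return seen


# Hypothetical example: Person(Name, Spouse) where Spouse references Person.
g: Graph = {
    "person": Node("Person", ["name", "spouse"]),
    "name": Node("Name"),
    "spouse": Node("Spouse", ["person"]),
}
print(unfold(g, "person", 3))
print(len(reachable(g, "person")))  # 3
```

Increasing the depth passed to `unfold` produces deeper and deeper trees without bound, whereas only three different subtrees (rooted at Person, Name and Spouse) ever occur.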

More classes of relational dependencies will be the subject of future studies in the presence of various combinations of data types. The book [264] identifies more than 90 different constraint classes for relational databases. The class of join dependencies is more general than the class of MVDs. Interestingly, there does not exist a finite Hilbert-style axiomatisation for this class [229]; however, a sound and complete set of Gentzen-style inference rules is proposed in [39]. A different and important class is that of inclusion dependencies, which are not uni-relational, i.e., they refer to more than one relation schema.

Finally, the proposed concepts and algorithms should be implemented. The research report [252] contains a C++ implementation for computing the dependency basis and the nested attribute closure of a given subattribute with respect to a given set of functional and multi-valued dependencies in the presence of records and lists.

1. Abiteboul, S., P. Buneman and D. Suciu, "Data on the Web: From Relations to Semistructured Data and XML," Morgan Kaufmann Publishers, 2000.

2. Abiteboul, S., S. Cluet, T. Milo, P. Mogilevsky, J. Simeon and S. Zohar, Tools for data translation and integration, Data Engineering Bulletin 22 (1999), pp. 3-8.

3. Abiteboul, S., P. C. Fischer and H.-J. Schek, editors, "Nested Relations and Complex Objects, Papers from the Workshop "Theory and Applications of Nested Relations and Complex Objects", Darmstadt, Germany, April 6-8, 1987," Number 361 in Lecture Notes in Computer Science, Springer, 1989.

4. Abiteboul, S. and R. Hull, IFO: A formal semantic database model, Transactions on Database Systems (TODS) 12 (1987), pp. 525-565.

5. Abiteboul, S., R. Hull and V. Vianu, "Foundations of Databases," Addison-Wesley, 1995.

6. Abiteboul, S. and P. C. Kanellakis, Object identity as a query language primitive, in: Proceedings of the International Conference on Management of Data (SIGMOD), ACM, 1989, pp. 159-173.

7. Aho, A. V., C. Beeri and J. D. Ullman, The theory of joins in relational databases, Transactions on Database Systems (TODS) 4 (1979), pp. 297-314.

8. Amos, M., G. Paun, G. Rozenberg and A. Salomaa, Topics in the theory of DNA computing, Theoretical Computer Science 287 (2002), pp. 3-38.

9. Anderson, I., "Combinatorics of finite sets," Oxford Science Publications, The Clarendon Press Oxford University Press, New York, 1987.

10. Arapis, C., Temporal specifications of object behavior, in: Proceedings of the 3rd Symposium on Mathematical Fundamentals of Database and Knowledge Base Systems (MFDBS), number 495 in Lecture Notes in Computer Science (1991), pp. 308-324.

11. Arenas, M. and L. Libkin, On verifying consistency of XML specifications, in: Prin

In document Dependencies in complex value databases : a dissertation presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Systems at Massey University (Page 187-190)