An open problem is to extend the NLNF proposal to the class of MVDs, and to the class of FDs and MVDs, to semantically justify the proposal in terms of the absence of redundancies and abnormal update behavior, and to generalise the decomposition approach. Relevant papers that address these problems in the context of relational databases are [280, 281, 283, 289, 290] and [133]. The key to solving these problems is an appropriate definition of inevitable MVDs. Let $N$ be a nested attribute and $\Sigma$ a set of FDs and MVDs on $N$. The conditions under which an FD from $\Sigma^{+}$ is in $\Sigma_{inev}$ are exactly as before. An MVD $X \twoheadrightarrow Y \in \Sigma^{+}$ is in $\Sigma_{inev}$ if and only if $Y \le X$ or $Y \in NMaxB(N)$ or $X \sqcup Y = N$. The set $\Sigma^{+}_{inev}$ of inevitable FDs and MVDs on $N$ with respect to $\Sigma$ is then the closure of $\Sigma_{inev}$ under the complete set of inference rules for FDs and MVDs in the presence of lists from Theorem 4.28. It follows for an MVD $X \twoheadrightarrow Y \in \Sigma^{+}$ that $X \twoheadrightarrow Y \in \Sigma^{+}_{inev}$ if and only if $Y^{CC} \le X$ or $Y^{C} \le X$ holds. A nested attribute $N$ is said to be in Nested List Normal Form with respect to the set $\Sigma$ of FDs and MVDs on $N$ if and only if every $X \twoheadrightarrow Y \in \Sigma^{*}$ is inevitable on $N$ with respect to $\Sigma$ or $X$ is a superkey for $N$ with respect to $\Sigma$.
It is conjectured that the results on 4NF in relational databases from [280] carry over to the Nested List Normal Form. As was the case with BCNF and NLNF, a simple extension of 4NF implies NLNF, but not vice versa. Along these investigations it might prove useful to generalise notions such as reduced MVDs and minimal covers of sets of MVDs [216], pure sets of FDs and MVDs [154], envelope sets [301, 302] and conflict-free MVDs [247] from relational databases to the context of complex object types.

A further desirable goal is to propose normal forms for nested attributes in the presence of more type constructors. The axiomatisations of FDs in the context of records, lists, sets and multisets suggest continuing along those lines. The decisive notion in a Complex-value Normal Form proposal may be that of a unit of a nested attribute, taking over the role of maximal basis attributes in the proposal of the NLNF.

Normalisation is nothing but an optimisation.
Considering the example of the prime factorisation $Factor(Integer, Prime[Number], Exponent[Number])$, one may ask whether the list constructor is really appropriate here. Instead of storing the list of prime factors and the list of exponents, one may store the set of prime factor/exponent pairs. It would be interesting to see whether the specification of inevitable FDs such as $Factor(Prime[\lambda]) \to Factor(Exponent[\lambda])$ and $Factor(Exponent[\lambda]) \to Factor(Prime[\lambda])$ suggests that the data type is inappropriate.

A further observation is the following. Consider the list-valued attribute $L[M]$ and suppose the FD $\lambda \to L[\lambda]$ has been specified. It says informally that all tuples coincide on $L[\lambda]$, i.e., the length of the list $L[M]$ is constant, say $k$. In this case, it is certainly more appropriate to use the record-valued attribute $L(M_1, \ldots, M_k)$ with $dom(M_i) = dom(M)$ for $i = 1, \ldots, k$.

The notions of redundancy and update anomalies that were used in this thesis are not the only notions that appear in the literature. Vincent has introduced the concept of
redundancy of data value occurrences. Given a set $\Sigma$ of FDs and MVDs on $R$, a relation $r$ over $R$ and a tuple $t$ in $r$, the data value occurrence $t[A]$ is redundant with respect to $\Sigma$ iff every replacement of $t[A]$ by a value $a'$ with $t[A] \neq a'$, resulting in a new relation $r'$, yields $\not\models_{r'} \Sigma$. $(R, \Sigma)$ is defined to be in redundancy free normal form if and only if there does not exist an $r$ over $R$ with $\models_{r} \Sigma$ which contains a data value occurrence that is redundant with respect to $\Sigma$. Vincent shows that $(R, \Sigma)$ is in redundancy free normal form if and only if $R$ is in 4NF with respect to $\Sigma$.
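To make the redundancy notion recalled above concrete, the following Python sketch tests whether a single data value occurrence is redundant. It is only an illustration under simplifying assumptions that do not come from the text: only FDs are checked (MVDs are omitted), replacement values are drawn from a finite list `candidates` standing in for the attribute domain, and the names `satisfies_fd`, `satisfies` and `is_redundant` as well as the sample relation are hypothetical.

```python
def satisfies_fd(relation, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in the relation (a list of dicts)."""
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


def satisfies(relation, fds):
    """Return True if every FD in fds holds in the relation."""
    return all(satisfies_fd(relation, lhs, rhs) for lhs, rhs in fds)


def is_redundant(relation, fds, index, attr, candidates):
    """Check whether relation[index][attr] is redundant with respect to fds.

    Assumption: `candidates` is a finite stand-in for the attribute domain.
    The occurrence is redundant iff every replacement by a different value
    produces a relation that violates the given FDs.
    """
    original = relation[index][attr]
    for new_value in candidates:
        if new_value == original:
            continue
        changed = [dict(t) for t in relation]  # copy, leave the input intact
        changed[index][attr] = new_value
        if satisfies(changed, fds):
            return False  # a harmless replacement exists
    return True


# Hypothetical example: Course -> Lecturer, where Course is not a key.
fds = [(("Course",), ("Lecturer",))]
r = [
    {"Course": "DB", "Lecturer": "Smith", "Room": "R1"},
    {"Course": "DB", "Lecturer": "Smith", "Room": "R2"},
]
lecturers = ["Smith", "Jones", "Brown"]
rooms = ["R1", "R2", "R3"]
```

In this sample, `is_redundant(r, fds, 0, "Lecturer", lecturers)` holds, since any other lecturer in the first tuple violates the FD, whereas the room value of the same tuple can be changed freely. This agrees with the characterisation above, as the left-hand side of the FD is not a key of the sample schema.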
Update anomalies, as defined in this thesis, are called key-based update anomalies in [280]. So-called fact-based update anomalies are also introduced in [280], and the relationship between their absence and BCNF and 4NF is examined. The extensions of these notions to the framework of nested attributes and their relationship to the Nested List Normal Form are further directions of future research.

Arenas and Libkin [13] use techniques from information theory to define a measure of information content of elements in a database with respect to a set of constraints. This provides a set of tools for testing when a normal form proposal corresponds to a good design. The results give information-theoretic justifications for normal forms such as BCNF, 4NF, project-join normal form (PJ/NF), fifth normal form (5NF), domain-key normal form (DK/NF) and the XML normal form XNF proposed in [14], as well as information-theoretic criteria for justifying normalisation algorithms. It would be interesting to test the measure with respect to the Nested List Normal Form proposal, and later on the Complex-value Normal Form proposal.

Normalisation is a well-studied area in the context of relational databases. Besides BCNF and 4NF, there are many other normal form proposals. An extension of third normal form (3NF) [70, 304], PJ/NF [104], 5NF [282] and DK/NF [105] to nested attributes seems desirable.
The (disjoint) union type is well worth investigating as it can be used to represent alternatives. It is very important for the higher-order entity-relationship model [265] and XML [53]. In order to give a small illustration of the difficulty of the union type we look at the following example. Figure 6.1 shows the structure of the union-valued attribute $L(A \oplus B)$.

[Figure 6.1: the subattributes of $L(A \oplus B)$, ordered from $L(A \oplus B)$ at the top, through $L(A \oplus \lambda_B)$ and $L(\lambda_A \oplus B)$, to $L(\lambda_A \oplus \lambda_B)$ and finally $\lambda$ at the bottom.]

6.2. OPEN PROBLEMS    Sebastian Link
Note that the subattribute $L(\lambda_A \oplus \lambda_B)$ indicates from which domain a value stems. If the projection on $L(\lambda_A \oplus \lambda_B)$ is $ok_A$, then the value comes from $dom(A)$, and if the projection is $ok_B$, then the value comes from $dom(B)$. Suppose that one needs to find two different elements $t_1, t_2 \in dom(L(A \oplus B))$ with $\pi_W(t_1) = \pi_W(t_2)$ iff $W \le L(\lambda_A \oplus \lambda_B)$. Consequently, both of the values must be $ok_A$ or both of the values must be $ok_B$ on $L(\lambda_A \oplus \lambda_B)$. That means we have either $t_1, t_2 \in dom(A)$ or $t_1, t_2 \in dom(B)$. However, in this case we also have $\pi_{L(\lambda_A \oplus B)}(t_1) = \pi_{L(\lambda_A \oplus B)}(t_2)$ or $\pi_{L(A \oplus \lambda_B)}(t_1) = \pi_{L(A \oplus \lambda_B)}(t_2)$. That shows that one cannot find any two elements of $dom(L(A \oplus B))$ with this property, which indicates that there is an FD where the right-hand side is a disjunction of subattributes. Such FDs are relevant if an axiomatisation of FDs in the presence of unions is pursued.
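The argument above can be checked by brute force on small domains. The Python sketch below is an illustration only and rests on assumptions not made in the text: elements of $dom(L(A \oplus B))$ are encoded as tagged pairs, a projection replaces a value by the placeholder `"ok"` wherever the corresponding component is $\lambda$, and the finite lists `DOM_A` and `DOM_B` as well as all function names are hypothetical.

```python
from itertools import product

DOM_A = [0, 1, 2]      # hypothetical finite stand-in for dom(A)
DOM_B = ["x", "y"]     # hypothetical finite stand-in for dom(B)

# An element of dom(L(A + B)) is encoded as (tag, value).
UNION_DOM = [("A", a) for a in DOM_A] + [("B", b) for b in DOM_B]


def project(t, keep_a, keep_b):
    """Project a tagged value; a dropped component collapses to 'ok'."""
    tag, value = t
    keep = keep_a if tag == "A" else keep_b
    return (tag, value if keep else "ok")


def on_tags(t):
    """Projection onto L(lambda_A + lambda_B): only the origin remains."""
    return project(t, False, False)


def on_lambda_a_b(t):
    """Projection onto L(lambda_A + B): values from dom(A) are collapsed."""
    return project(t, False, True)


def on_a_lambda_b(t):
    """Projection onto L(A + lambda_B): values from dom(B) are collapsed."""
    return project(t, True, False)


def fd_holds(domain, lhs, rhs_alternatives):
    """Check an FD whose right-hand side is a disjunction of projections.

    Whenever two elements agree under lhs, they must agree under at least
    one of the projections in rhs_alternatives.
    """
    for t1, t2 in product(domain, repeat=2):
        if lhs(t1) == lhs(t2) and not any(
            p(t1) == p(t2) for p in rhs_alternatives
        ):
            return False
    return True
```

Under this encoding, `fd_holds(UNION_DOM, on_tags, [on_lambda_a_b, on_a_lambda_b])` succeeds, while neither projection alone is determined by `on_tags`. This is the behaviour that motivates a disjunction on the right-hand side.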
Another challenge is the inclusion of the reference type into the types of interest. It is particularly important for object-oriented databases and XML. A possible approach to investigate the reference type is to represent nested attributes as labelled trees where the labels of a non-leaf node are used to define embedded attributes and leaf nodes are either null or flat attributes or referencing labels to other nodes. This leads to rational trees which are infinite, but in which the number of different subtrees is still finite.

More classes of relational dependencies will be the subject of future studies in the presence of various combinations of data types. The book [264] identifies more than 90 different constraint classes for relational databases. The class of join dependencies is more general than the class of MVDs. Interestingly, there does not exist a finite Hilbert-style axiomatisation for this class [229]; however, a sound and complete set of Gentzen-style inference rules is proposed in [39]. A different and important class are inclusion dependencies, which are not uni-relational, i.e., refer to more than one relation schema.

Finally, the proposed concepts and algorithms should be implemented. The research report [252] contains a C++ implementation for computing the dependency basis and the nested attribute closure of a given subattribute with respect to a given set of functional and multi-valued dependencies in the presence of records and lists.

1. Abiteboul, S., P. Buneman and D. Suciu, "Data on the Web: From Relations to Semistructured Data and XML," Morgan Kaufmann Publishers, 2000.
2. Abiteboul, S., S. Cluet, T. Milo, P. Mogilevsky, J. Simeon and S. Zohar, Tools for data translation and integration, Data Engineering Bulletin 22 (1999), pp. 3-8.
3. Abiteboul, S., P. C. Fischer and H.-J. Schek, editors, "Nested Relations and Complex Objects, Papers from the Workshop "Theory and Applications of Nested Relations and Complex Objects", Darmstadt, Germany, April 6-8, 1987," Number 361 in Lecture Notes in Computer Science, Springer, 1989.
4. Abiteboul, S. and R. Hull, IFO: A formal semantic database model, Transactions on Database Systems (TODS) 12 (1987), pp. 525-565.
5. Abiteboul, S., R. Hull and V. Vianu, "Foundations of Databases," Addison-Wesley, 1995.
6. Abiteboul, S. and P. C. Kanellakis, Object identity as a query language primitive, in: Proceedings of the International Conference on Management of Data (SIGMOD), ACM, 1989, pp. 159-173.
7. Aho, A. V., C. Beeri and J. D. Ullman, The theory of joins in relational databases, Transactions on Database Systems (TODS) 4 (1979), pp. 297-314.
8. Amos, M., G. Paun, G. Rozenberg and A. Salomaa, Topics in the theory of DNA computing, Theoretical Computer Science 287 (2002), pp. 3-38.
9. Anderson, I., "Combinatorics of finite sets," Oxford Science Publications, The Clarendon Press Oxford University Press, New York, 1987.
10. Arapis, C.,