Complex Structured Values - 3 – Object-Oriented Paradigm

3 – Object-Oriented Paradigm – Object Basics

3.2.4 Complex Structured Values

The data model of Abiteboul and Beeri in [ABITE93A] addresses the non-commutativity of set and tuple sequences by advocating an unrestricted ordering of these type constructors. Their only restriction on the sequence of type constructors is that the last constructor must be a set. The authors named these structures complex values; Table 3.3 gives their data and type definitions, and figure 3.2 gives an example. The resultant schemas are acyclic: no part of the schema is defined in terms of itself. Finally, it is easily shown that complex values are generalisations of NF2 and relational values. One can also map a complex type into an NF2 type by converting a string of type constructors into a sequence of set and tuple pairs (see figure 3.3).
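The mapping from a complex type to an NF2 type can be sketched as follows. This is only an illustration under stated assumptions, not the construction of [ABITE93A]: the nested-tuple encoding of types, the function names to_nf2 and is_nf2, and the "_rel" suffix for interposed attributes are all hypothetical. Interposing a set around a nested tuple also changes the shape of the stored values (a single tuple becomes a singleton relation), which the sketch does not address.

```python
# Hypothetical encoding of types (an assumption of this sketch):
#   ("atom", "char")              a basic domain
#   ("set", t)                    the set constructor
#   ("tuple", {"a1": t1, ...})    the tuple constructor

def to_nf2(t, name="r"):
    """Return a type in which set and tuple constructors strictly alternate."""
    kind = t[0]
    if kind == "atom":
        return t
    if kind == "set":
        elem = t[1]
        if elem[0] != "tuple":
            # A set of atoms or a set of sets: interpose a one-attribute tuple.
            elem = ("tuple", {name + "_rel": elem})
        return ("set", _nf2_fields(elem))
    # A tuple that is not directly under a set: interpose a set.
    return ("set", _nf2_fields(t))

def _nf2_fields(tup):
    """Convert every non-atomic attribute of a tuple type."""
    return ("tuple", {a: (ft if ft[0] == "atom" else to_nf2(ft, a))
                      for a, ft in tup[1].items()})

def is_nf2(t):
    """True when t is a set of tuples whose attributes are atoms or NF2 types."""
    if t[0] != "set" or t[1][0] != "tuple":
        return False
    return all(ft[0] == "atom" or is_nf2(ft) for ft in t[1][1].values())

CHAR = ("atom", "char")
# A cut-down version of the TEAM type: a tuple nested in a tuple (manager)
# and a set of atoms (skills) both break the set/tuple alternation.
team = ("set", ("tuple", {
    "ref": CHAR,
    "manager": ("tuple", {"ref": CHAR, "name": CHAR}),
    "skills": ("set", CHAR),
}))
nf2_team = to_nf2(team, "team")
```

In this sketch the manager attribute becomes a relation of (ref, name) tuples and skills becomes a relation with the single attribute skills_rel.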

OOP – Object Basics - Page [ 43]

3.2.5 What gains?

What advantages are gained by adopting nested values and complex values? Firstly, we are freed from the restriction of the relational model’s first normal form; consequently some data can be accessed without explicitly joining a number of relations. Secondly, although not exemplified here, there are a number of reported algebraic and calculus-based languages that manipulate these complex structures. Unfortunately these complex structures are not friendly to instance sharing; this is addressed by interleaving object identifiers and complex structures in objects.

Table 3.3: Complex Structure Values

Rule Generating Part

The rules that generate the set of complex structure values, $val_{cs}$, on $D$ are:

1) All values of $D$ are values of $val_{cs}$. Therefore all atomic values are part of $val_{cs}$;

2) If $v_1, \ldots, v_n$ are $n$ values and $a_1, \ldots, a_n$ is a sequence of distinct attribute names from $A$ then $[a_1{:}v_1, \ldots, a_n{:}v_n]$ is a tuple type complex structure value, $val_{cs}$;

3) If $v_1, \ldots, v_n$ are $n$ distinct values then $\{v_1, \ldots, v_n\}$ is a set complex structure value, $val_{cs}$. (Note the implicit prohibition of a heterogeneous set; this is common in databases).

Typing Part

The world of type is: $t ::= D \mid [a_1{:}t, \ldots, a_k{:}t] \mid \{t\}$

Type $t$’s interpretation, $\llbracket t \rrbracket$, follows:

1) $\llbracket \emptyset \rrbracket = \emptyset$;

2) $\llbracket D \rrbracket = D$;

3) $\llbracket [a_1{:}t_1, \ldots, a_k{:}t_k] \rrbracket = \{ [a_1{:}v_1, \ldots, a_k{:}v_k] \mid v_i \in \llbracket t_i \rrbracket,\ i = 1, \ldots, k \}$; and
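As an illustration of the typing part, the following minimal sketch tests whether a value belongs to the interpretation of a type. The encodings are assumptions made for this example only: the basic domain is taken to be Python strings, tuple values are dicts, set values are lists, and the names freeze and has_type are hypothetical. The set case (distinct members, all of the one element type) is likewise an assumption, derived from rule 3 of the rule generating part and its note on heterogeneous sets.

```python
# Hypothetical encodings, chosen for this sketch only:
#   types : ("atom",) | ("tuple", {"a1": t1, ...}) | ("set", t)
#   values: str for the basic domain, dict for tuples, list for sets

def freeze(v):
    """Turn a value into a hashable form so that set members can be compared."""
    if isinstance(v, dict):
        return ("tuple", frozenset((a, freeze(x)) for a, x in v.items()))
    if isinstance(v, list):
        return ("set", frozenset(freeze(x) for x in v))
    return v

def has_type(v, t):
    """True when value v lies in the interpretation of type t."""
    kind = t[0]
    if kind == "atom":
        return isinstance(v, str)
    if kind == "tuple":
        fields = t[1]
        return (isinstance(v, dict) and v.keys() == fields.keys()
                and all(has_type(v[a], ft) for a, ft in fields.items()))
    if kind == "set":
        return (isinstance(v, list)
                and len({freeze(x) for x in v}) == len(v)   # distinct members
                and all(has_type(x, t[1]) for x in v))      # homogeneous
    return False

D = ("atom",)
MEMBER = ("tuple", {"ref": D, "name": D, "skills": ("set", D)})
jade = {"ref": "101", "name": "JADE", "skills": ["DOC", "PLAN"]}

well_typed = has_type(jade, MEMBER)
heterogeneous = has_type(["DOC", jade], ("set", D))
duplicated = has_type(["DOC", "DOC"], ("set", D))
```

Here well_typed holds, while the heterogeneous set and the set with a repeated member are both rejected.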


Figure 3.2: Complex Values Relation: a) Structure, b) Instance and c) Type Tree. In a), attributes in angle brackets (e.g. members) take a set of values. The legend of c) distinguishes the complex structure, the set type constructor, the tuple type constructor, attribute names and basic domains.


3.3 Object Identity

Entities and instances that are of interest to an information system must be identifiable and reachable, and consequently identification is part of many conceptual data models. In the database culture there is a strong notion of keys for identification of instances.

In the object-oriented paradigm a key is called an identifier and each object has one. An identifier is a handle through which other objects refer to its holder; the seminal papers on this theme are [KHOSH86] and [ABITE89B]. Typically this identifier is 1) unique, in that it distinguishes its holder from every other object within a collection, 2) immutable throughout the object’s life span, 3) an internal property of an object and therefore not usually seen or usable by end users, and 4) unrelated in its value to its object’s properties. Furthermore, for object databases Atkinson and Morrison [ATKIN95] assert that, beyond identification issues, one also needs to correctly map application domain instances to corresponding repository objects.

Figure 3.3: Nested Relational Structure (NF2) Type Equivalent to the Complex Value Structure type of Figure 3.2 (c). The darkened parts of the figure represent the changes introduced to match the structures.

When an object is created, an act called instantiation, an identifier is created for it. In any object collection there is a huge quantity of un-instantiated objects; for example integers, being values of a basic domain. At this point the subtle issue of what is a value and what is an object is reinforced with the help of object identifiers (an argument already introduced in section 3.2). The general consensus documented by Beeri [BERRI90A] is to make the following distinction: elements of the basic domains are “universal” abstractions and ever present, while elements of an information system’s entities are “local” abstractions, created on an ad hoc basis and having a lifetime. A technical refinement to this distinction is that the former carry their own information and therefore identify themselves, while the latter carry an object’s information content, which includes relationships to values and to other objects represented by identifiers.

The paradigm offers object access methods based on navigation but must not exclude access based on values. This contrasts sharply with the access methods found in most programming languages and in relational databases. The following two examples explain common pitfalls in either access method:

1) In programming languages, a memory address pointer (that is, identification by addressing) facilitates access to a variable's state value, and the address depends on the executing environment rather than on the object it identifies. A problem may arise when a number of addresses appear to depict "different" objects that are in fact identical. For example, a certain employee satisfies a query (return the name, a non-distinctive property, of an employee who earns more than £35,000) and is bound to the variable WELL_PAID_EMP; contemporaneously the same employee satisfies another query (return the name of an employee with a managerial job status who is working in London) and is bound to the variable LONDON_MANAGER. These two variables are accessible through different paths (for example by pointer re-direction), and for the programming language to support equality semantics it needs specialised constructs that can ascertain the equality of the variables WELL_PAID_EMP and LONDON_MANAGER through some pointer chasing mechanism (e.g. dereferencing).

2) The relational model introduces the notion of a value-based key. Furthermore the relational key is unique only within the scope of one table. A marked confusion arises with value-based keys because the key attributes serve both as values and as identity, two differing concepts. A serious drawback of this double usage is that the values of the key attributes cannot change without affecting the referential integrity of the database. In some of the world’s societies, for example, when a woman marries she forsakes her family’s surname for that of her spouse. If the key attribute set contains the family’s surname then a change in that value introduces a second entity (with the new surname) for the same person. To address this in value-based keys, it is common practice to introduce an “artificial” attribute (e.g. employee number) that has no semantic meaning. This scheme is an effective one and, ironically, the artificial attribute takes the role of a value-based pointer because of its lack of semantic meaning.
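The second pitfall can be made concrete with a small sketch. The tables are modelled as plain Python dictionaries, and every name in it (rename_value_key, rename_surrogate_key, the sample employee and the employee number 1001) is a hypothetical choice for illustration rather than anything taken from a particular database system.

```python
def rename_value_key(employees, references, old_key, new_surname):
    """Change a surname when (first name, surname) is the key.

    Returns the names of the references left pointing at a key value
    that no longer exists.
    """
    first_name, _old_surname = old_key
    employees[(first_name, new_surname)] = employees.pop(old_key)
    return sorted(name for name, key in references.items()
                  if key not in employees)

def rename_surrogate_key(employees, emp_no, new_surname):
    """Change a surname when an artificial employee number is the key."""
    employees[emp_no]["surname"] = new_surname

# Value-based key: the surname is part of the key, and another table
# refers to the employee by that key value.
employees = {("Jane", "Smith"): {"salary": 36000}}
team_manager = {"R&D": ("Jane", "Smith")}
dangling = rename_value_key(employees, team_manager, ("Jane", "Smith"), "Jones")

# Artificial key: the surname is an ordinary attribute, so changing it
# leaves every reference intact.
staff = {1001: {"first_name": "Jane", "surname": "Smith", "salary": 36000}}
team_manager_no = {"R&D": 1001}
rename_surrogate_key(staff, 1001, "Jones")
still_valid = all(no in staff for no in team_manager_no.values())
```

With the value-based key the R&D reference is left dangling after the change of surname, whereas with the artificial key all references stay valid.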

In some pre-relational data models the use of pointers for linking one record to another resembles the paradigm’s identifiers. This resemblance led to numerous, and rather loud, criticisms of identifiers in early object-oriented data models, on the grounds that they re-introduce the pointer chasing present in the traditional models. Furthermore Ullman in [ULLMA87] categorically states that “object-identity does not mesh well with declarativeness”. Insisting on the conceptual nature of object identifiers, and leaving the physical pragmatics as an implementation detail, avoids this polemic.

The identification property in an object-oriented data model offers implicit support for object equality and sharing. Through its identifier an object is distinguishable from any other object, and furthermore the identifier is independent of the object's values, location (in main or secondary store), and addressability. By assigning an identifier to an object's state, sharing of objects throughout the collection is possible and is independent of the assigned values. This sharing introduces the possibility of building cyclic graphs in an object collection. Specifically, a number of objects can share an object (or refer to it) without the latter being replicated.
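A minimal sketch of sharing through identifiers follows. The ObjectStore class, the Oid wrapper and the sample team data are hypothetical and only illustrate the idea: identifiers are allocated at instantiation, carry no information about the state they identify, and allow several objects to refer to one object, including cyclically.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Oid:
    """An opaque identifier; its value says nothing about the object's state."""
    n: int

class ObjectStore:
    def __init__(self):
        self._ids = count(1)
        self._state = {}

    def create(self, **state):
        """Instantiate an object: allocate a fresh identifier for its state."""
        oid = Oid(next(self._ids))
        self._state[oid] = dict(state)
        return oid

    def state(self, oid):
        return self._state[oid]

store = ObjectStore()
jade = store.create(name="JADE")
# The team refers to the same member object twice; nothing is replicated.
team = store.create(desc="R&D", manager=jade, members=[jade])
# A reference back from the member to the team makes the graph cyclic.
store.state(jade)["team"] = team

# An update made through one path is visible through every other path.
store.state(jade)["name"] = "JADE B."
manager_name = store.state(store.state(team)["manager"])["name"]
```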


Three binary predicates, IDENTITY, SHALLOW_EQUALITY and DEEP_EQUALITY, are useful to test for the equality of two objects and their level of sharing; this takes care of the identification by addressing issue presented previously. IDENTITY evaluates to true when the two objects have the same identifier. SHALLOW_EQUALITY implies that the two objects have equal values and refer to the same objects for their properties. DEEP_EQUALITY is satisfied if each object’s state has equal values when all references are recursively “folded out” into the values they represent. Two identical objects are also shallow equal, and two shallow equal objects are also deep equal. In general, however, two deep equal objects need not be shallow equal, and two shallow equal objects need not be identical.
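The three predicates can be sketched as below. This is an illustration under simplifying assumptions: object states are dictionaries held in a plain dict keyed by an integer identifier, references are wrapped in a hypothetical Ref class, property values are either atoms or references, and the object graph is acyclic (the recursive deep_equal would not terminate on a cycle).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """A reference to another object, holding only its identifier."""
    oid: int

def identical(a, b):
    """IDENTITY: both references carry the same identifier."""
    return a.oid == b.oid

def shallow_equal(store, a, b):
    """SHALLOW_EQUALITY: equal atomic values and the same referenced objects."""
    return store[a.oid] == store[b.oid]

def deep_equal(store, a, b):
    """DEEP_EQUALITY: equal once references are folded out (acyclic graphs only)."""
    sa, sb = store[a.oid], store[b.oid]
    if sa.keys() != sb.keys():
        return False
    for prop in sa:
        x, y = sa[prop], sb[prop]
        if isinstance(x, Ref) and isinstance(y, Ref):
            if not deep_equal(store, x, y):
                return False
        elif x != y:
            return False
    return True

store = {
    1: {"city": "London"},
    2: {"city": "London"},                 # equal to object 1, not identical
    3: {"name": "JADE", "addr": Ref(1)},
    4: {"name": "JADE", "addr": Ref(1)},   # shares object 1 with object 3
    5: {"name": "JADE", "addr": Ref(2)},   # refers to an equal, distinct address
}
```

Objects 3 and 4 are shallow equal without being identical, while objects 3 and 5 are deep equal without being shallow equal, matching the hierarchy stated above.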

A formal treatment of deep equality in the presence of cyclic graphs (and consequently the possibility of infinite trees) is found in Abiteboul’s paper [ABITE95B]. Different operators for copying, as opposed to sharing, are also required, such as shallow and deep copying, as first found in Smalltalk-80 [GOLDB88] and in the Common Lisp Object System (CLOS) [BOBRO86]. Nonetheless it is known from [BEERI99] that, in general, identifiers alone in cyclic structures are not sufficient to distinguish two objects.
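The difference between shallow and deep copying can be illustrated with Python's standard copy module, used here only as a convenient stand-in for the Smalltalk-80 and CLOS operators cited above. The Node class and the two-object cycle are hypothetical examples.

```python
import copy

class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None

# Two objects that refer to each other: a cyclic structure.
a = Node("a")
b = Node("b")
a.peer = b
b.peer = a

# Shallow copy: a new object whose references are shared with the original.
s = copy.copy(a)

# Deep copy: the referenced objects are copied too; copy.deepcopy keeps a
# memo of objects already copied, so the cycle is reproduced, not unrolled.
d = copy.deepcopy(a)
```

After these calls s is a new object that still shares b with a, whereas d has its own copy of b, and that copy points back to d.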

In document An object-oriented data and query model (Page 58-64)