Selecting a Database System
4.1 Performance Evaluation of Database Systems
4.1.1 Existing Abstract Database Benchmarks
Three abstract database benchmarks have been defined. The first one is the so-called Simple Benchmark [DHS+91, DEL92]. The second one is the Hypermodel Benchmark defined in [ABM+90]. The recently suggested OO7 Benchmark [CDN93] has gained substantial acceptance for comparison of object database systems. We outline the definitions of each of these benchmarks in the following subsections to be able to explain why they cannot be used for evaluating database performance of software engineering applications.
To simplify the comparison, we use a common notation for describing the conceptual schemas of the three different benchmarks. We select the entity relationship notation suggested by [BPR88] since it can express inheritance and ordered relationships, both of which are used in the benchmarks. In this notation, a rectangle represents an entity. A solid arrow between entities represents an aggregation relationship. Its semantics are that no object can exist without being related in an aggregation to an existing object, i.e. the aggregation relationship models the part-of/belongs-to relationship. Dotted lines model reference relationships. At the end of arrows or lines, black circles represent a many-end of a relationship, and white circles represent a one-end. A circle placed on a line declares the relationship to be ordered. Attributes of entities are described within the rectangle, whereas attributes of relationships are shown in a circle connected to the respective arrow. A triangle on a line between entities indicates an inheritance relationship, meaning that a sub-entity inherits all attributes from its super-entity and can participate in the same relationships as the super-entity.
4.1.1.1 The Simple Benchmark
The Simple Benchmark was defined in order to measure the performance of elementary OMS operations. The conceptual schema for this benchmark (as well as for the others in this thesis) is shown as an extended Entity-Relationship Model (EER-model) in Figure 4.1.
[Figure 4.1 labels: entities DIR, SMALL, BIG; relationships DIRREL, MNREL; attributes str10, str80, str160, longfield]
Figure 4.1: EER-Diagram for the Simple Benchmark
Objects of type DIR model a composite object, which is composed of a number of other objects. DIRREL models the composition relationship. Component objects can have different sizes. Small objects represent, for instance, syntax graph nodes that have three small attributes. Big objects have an additional long-field attribute that can be used to store, for instance, long comments, source code or even compiled object code. Apart from the composition relationship, the benchmark considers reference relationships and defines a relationship MNREL that models references between small and big objects. MNREL, in addition, defines three small attributes.
The operations of the Simple Benchmark include creation and deletion of small and big objects, as well as creation and deletion of relationships between them. Furthermore, operations on attributes of entities and relationships are defined, such as storing and retrieving strings of lengths 10, 80, and 160 bytes or long-fields of lengths 10 and 128 KBytes.
The benchmark requires that the operations access and modify a non-empty database. This avoids the possibility that objects accessed by the operations reside only in database caches (i.e. in main memory). A realistic size of the initial database, which is created before performance measurements start, guarantees that operations have to access secondary storage (as usually happens in real applications). The Simple Benchmark requires the initial database to contain 3,000 objects of type SMALL and 400 objects of type BIG.
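The schema and the initial database described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the benchmark's reference implementation: the class and function names, the use of a single DIR object as the root, the random lowercase contents and the 10 KByte default for the long-field are all assumptions made for the example.

```python
import random
import string
from dataclasses import dataclass, field
from typing import List, Union


def random_string(length: int, rng: random.Random) -> str:
    """Return a random lowercase ASCII string of exactly `length` characters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


@dataclass
class Small:
    # The three small attributes, named after their lengths in bytes.
    str10: str
    str80: str
    str160: str


@dataclass
class Big:
    # The same small attributes as SMALL plus one long-field attribute.
    str10: str
    str80: str
    str160: str
    longfield: bytes


@dataclass
class Dir:
    # DIRREL: composition relationship from a DIR object to its components.
    components: List[Union[Small, Big]] = field(default_factory=list)


def build_initial_database(seed: int = 0, longfield_size: int = 10 * 1024) -> Dir:
    """Create 3,000 SMALL and 400 BIG objects below one DIR object.

    Assumption: a single DIR root holds all components; the text above
    only fixes the object counts, not their distribution.
    """
    rng = random.Random(seed)
    root = Dir()
    for _ in range(3000):
        root.components.append(
            Small(random_string(10, rng), random_string(80, rng), random_string(160, rng))
        )
    for _ in range(400):
        root.components.append(
            Big(
                random_string(10, rng),
                random_string(80, rng),
                random_string(160, rng),
                bytes(longfield_size),  # zero-filled placeholder content
            )
        )
    return root
```

A real measurement would store these objects in the database under test rather than in Python lists; the sketch only fixes the shape and size of the data.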
4.1.1.2 The Hypermodel Benchmark
The Hypermodel Benchmark differs from the Simple Benchmark in using more complex data structures and operations. The benchmark was developed for comparing databases with respect to hypertext applications.
The conceptual schema of the Hypermodel Benchmark is shown in Figure 4.2. It includes three different entities, namely Node, TextNode, and FormNode. TextNode and FormNode are
[Figure 4.2 labels: entities Node, TextNode, FormNode; relationships parent/children, partOf/parts, refTo/refFrom; attributes uniqueId, ten, hundred, thousand, million, text, bitmap, width, height, offsetFrom, offsetTo]
Figure 4.2: Conceptual Schema of the Hypermodel Benchmark
TextNodes represent an unstructured text and FormNodes represent a bitmap. Three relationships are defined, namely the parent/children relationship, the partOf/parts relationship and the refTo/refFrom relationship. The parent/children relationship is of cardinality 1:n, ordered and defines the aggregation structure between nodes. The m:n partOf/parts relationship models the section/subsection structure of a hypertext and the m:n relationship refTo/refFrom models arbitrary hypertext links. Each Node has a unique identifier and four attributes (ten, hundred, thousand, and million) for storing randomly selected numbers of a particular range. A TextNode contains an additional text attribute and a FormNode has three additional attributes: width and height store the dimensions of a picture and a long-field attribute bitmap stores the picture itself. The refTo/refFrom relationship contains two attributes, offsetFrom and offsetTo, that store relative coordinates of hypertext links within text attributes.
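To make the schema more concrete, the following Python sketch models the three entities and the three relationships as plain in-memory objects. It is an illustration under assumptions, not part of the benchmark definition: the snake_case names, the modelling of TextNode and FormNode as subclasses of Node, the integer type of the offsets and the helper functions are all chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False, repr=False)
class Node:
    unique_id: int
    ten: int
    hundred: int
    thousand: int
    million: int
    # parent/children: 1:n, ordered aggregation structure.
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    # partOf/parts: m:n section/subsection structure.
    part_of: List["Node"] = field(default_factory=list)
    parts: List["Node"] = field(default_factory=list)
    # refTo/refFrom: m:n hypertext links carrying their own attributes.
    refs_to: List["RefTo"] = field(default_factory=list)
    refs_from: List["RefTo"] = field(default_factory=list)


@dataclass(eq=False, repr=False)
class TextNode(Node):
    text: str = ""


@dataclass(eq=False, repr=False)
class FormNode(Node):
    bitmap: bytes = b""  # long-field attribute holding the picture
    width: int = 0
    height: int = 0


@dataclass(eq=False, repr=False)
class RefTo:
    source: Node
    target: Node
    offset_from: int  # assumed to be integer coordinates
    offset_to: int


def add_child(parent: Node, child: Node) -> None:
    """Append `child` to the ordered children of `parent`."""
    child.parent = parent
    parent.children.append(child)


def add_part(whole: Node, part: Node) -> None:
    """Record `part` as one of the parts of `whole` (both directions)."""
    whole.parts.append(part)
    part.part_of.append(whole)


def add_reference(source: Node, target: Node, offset_from: int, offset_to: int) -> RefTo:
    """Create a hypertext link and register it at both ends."""
    link = RefTo(source, target, offset_from, offset_to)
    source.refs_to.append(link)
    target.refs_from.append(link)
    return link
```

Keeping both directions of each relationship in the objects mirrors the benchmark's lookups in normal and reverse order; a database system would typically maintain the inverse direction itself.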
The operations of the Hypermodel Benchmark are mainly retrieval operations such as queries for attributes with particular names or attribute values in particular ranges, lookups for node sets connected by the above-mentioned relationships in normal or reverse order, and finally, operations performing a sequential scan and a transitive closure traversal following different relationships. The only update operations substitute words in the text attribute of a TextNode and invert a sub-rectangle within the bitmap attribute of a randomly selected FormNode. A detailed description of these operations is of no concern for this thesis.
The initial database contains a balanced tree of nodes and parent/children relationships. The size of the initial database depends on the height of this tree. Each inner node is of type Node and has exactly five children. Each leaf node is either of type FormNode or TextNode. The partOf/parts relationship is created for each node by selecting one inner node of level k and relating it to five random nodes at level k+1. The refTo/refFrom relationship is created for each node to another random node. Nodes are numbered and the numbers are stored in the uniqueId attribute. Ten, hundred, thousand, and million are initialised by random numbers selected from the corresponding interval. The text attribute of each object of type TextNode is initialised with a text containing up to 100 words, each word having up to ten characters. A FormNode consists of a random square bitmap with an edge length of up to 400 pixels.
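Since every inner node has exactly five children, the size of the initial database follows directly from the number of tree levels. The Python sketch below computes the number of inner and leaf nodes and generates a text of the kind described above. It rests on assumptions: the tree is parameterised by its number of levels (a single root counts as one level), texts contain at least one word, words contain at least one character, and the function names are hypothetical.

```python
import random
import string
from typing import Tuple

FAN_OUT = 5  # each inner node has exactly five children


def node_counts(levels: int) -> Tuple[int, int]:
    """Return (inner_nodes, leaf_nodes) of a balanced tree with `levels` levels."""
    if levels < 1:
        raise ValueError("a tree needs at least one level")
    leaves = FAN_OUT ** (levels - 1)
    # Geometric series 1 + 5 + ... + 5^(levels-2) = (5^(levels-1) - 1) / 4
    inner = (leaves - 1) // (FAN_OUT - 1)
    return inner, leaves


def random_text(rng: random.Random, max_words: int = 100, max_word_length: int = 10) -> str:
    """Return up to `max_words` random words of up to `max_word_length` characters."""
    word_count = rng.randint(1, max_words)  # lower bound of 1 is an assumption
    words = []
    for _ in range(word_count):
        length = rng.randint(1, max_word_length)
        words.append("".join(rng.choice(string.ascii_lowercase) for _ in range(length)))
    return " ".join(words)
```

For example, `node_counts(7)` yields 3,906 inner nodes of type Node and 15,625 leaf nodes, which shows how quickly the database grows with each additional level.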
4.1.1.3 The OO7 Benchmark
The OO7 Benchmark is intended as a yardstick for comparing the performance of ODBSs when they are used in complex engineering applications such as CAD, CAM or CASE. Therefore, the benchmark is more complex than the Simple or the Hypermodel Benchmark.
[Figure 4.3 labels: entities Assembly, ComplAssembl, BaseAssembly, CompositePart, AtomicPart, Document; relationship labels parent/children, ToCompositePart, ToAtomicParts, ToDoc, ToManual, relatedParts; attributes id, buildDate, type, title, text, x, y, docId, length]
Figure 4.3: Conceptual Schema of the OO7 Benchmark
The conceptual schema of the OO7 Benchmark, displayed in Figure 4.3, includes six entities. All the entities have an attribute id modelling a unique object identifier. Entity CompositePart models design primitives of the application area such as a register cell in chip design or a procedure in a programming language. A composite part has a buildDate attribute storing information about object creation time and a string attribute for storing type information of composite parts. Each composite part has a reference relationship to one documentation object, i.e. an instance of entity Document. Documents have a string attribute for storing a title and a long-field attribute for storing text. A composite part is composed of a set of atomic parts. These parts model statements in procedures or gates in register cells. They have a number of small attributes that store graphical coordinates, type information, creation time and the like. In addition to the aggregation relationship, atomic parts can have reference relationships with each other. The set of all composite parts models a library from which complex designs or programs can be composed. The entity Assembly with its sub-entities ComplAssembl and BaseAssembly, as well as the aggregation relationships between them, determine this composition. Complex assemblies are composed by the parent/children relationship from a set of other assemblies, i.e. either complex or base assemblies. A base assembly, in turn, is composed of a set of composite parts.
The benchmark defines traversal and query operations. The traversal operations navigate through the composition hierarchy. Some of them update attributes during traversal. Updates that modify the composition hierarchy are not included in any of the operations. The query operations retrieve the set of all atomic parts or subsets of those that fulfil particular conditions on the buildDate attribute.
The OO7 Benchmark defines three initial database states: small, medium and large. All of them contain balanced ternary trees of assembly parts of height seven, i.e. 1093 assembly parts. The databases differ in that a small database contains 10,000 atomic parts, whereas medium and large databases contain 100,000 atomic parts. For a large database, ten assembly trees are established, whereas small and medium databases only contain one tree.
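The figure of 1093 assembly parts can be checked with a short calculation: a balanced ternary tree with seven levels has 1 + 3 + 9 + ... + 729 nodes. The Python sketch below reproduces this number and tabulates the three database states. The names are hypothetical, and two points are assumptions rather than statements from the benchmark definition: that the leaves of the tree are the base assemblies, and that all ten trees of a large database have the same shape.

```python
from typing import Dict, NamedTuple


class OO7Size(NamedTuple):
    atomic_parts: int
    assembly_trees: int


# Values taken from the description of the three initial database states.
DATABASE_SIZES: Dict[str, OO7Size] = {
    "small": OO7Size(atomic_parts=10_000, assembly_trees=1),
    "medium": OO7Size(atomic_parts=100_000, assembly_trees=1),
    "large": OO7Size(atomic_parts=100_000, assembly_trees=10),
}


def assemblies_per_tree(fan_out: int = 3, levels: int = 7) -> int:
    """Number of nodes in a balanced tree: sum of fan_out^k for k < levels."""
    return sum(fan_out ** level for level in range(levels))


def base_assemblies_per_tree(fan_out: int = 3, levels: int = 7) -> int:
    """Number of leaves, assuming that exactly the leaves are base assemblies."""
    return fan_out ** (levels - 1)


def total_assemblies(size_name: str) -> int:
    """Assemblies in a database state, assuming identically shaped trees."""
    return DATABASE_SIZES[size_name].assembly_trees * assemblies_per_tree()
```

With the default parameters, `assemblies_per_tree()` returns 1093, of which 729 are leaves under the stated assumption, and a large database would hold 10,930 assembly objects in total.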