COMP 378 Database Systems Notes for Chapter 7 of Database System Concepts Database Design and the Entity-Relationship Model

(1)

COMP 378 Database Systems

Notes for Chapter 7 of Database System Concepts Database Design and the Entity-Relationship Model

The entity-relationship (E-R) model is a a data model in which information stored in the database is viewed as sets of entities and sets of relationships among entities. The primary use of the E-R model is as a design tool for relational databases.

1 The Design Process

One (relatively) typical process:

1. the designers meet with prospective database users and domain experts to characterize what data needs to be stored and how that data will be accessed. The result of this step is a set of user requirements, which are expressed via some combination of text and diagrams.

2. the designers choose a data model and translate the user requirements into a conceptual schema in this model. We will use the E-R model at this step as a way to analyze data and relationships among data items.

3. the designers review the database schema to ensure that it supports the functional requirements of the application as a whole - that is, that data can be accessed and updated in the required ways.

4. the conceptual schema is implemented

(a) logical design phase: the conceptual schema is mapped to an implementation schema. We will use an algorithm to transform an E-R schema into a relational database schema.

(b) physical design phase: file and index structures are chosen for the physical schema of the database In particular, the database schema should avoid:

• redundancy: repeated information wastes storage and increases opportunities for inconsistency

• incompleteness: bad design may make information difficult to store. For example, if we eliminate the customer relation by adding name and address attributes to the purchased relation, we have no clean way to store information about customers who have not purchased a flyrod.

2 The Entity-Relationship Model

2.1 Entity Sets

An entity is something that exists and can be distinguished from other entities. Examples:

• customer entities with unique social security numbers

• account entities with unique account numbers An entity set is a set of entities of the same type.

• example: all of the account entities for a bank

• entity sets need not be disjoint. Example: a person entity could be in both the customer and employee entity sets.

An attribute is a characteristic or property of an entity.

(2)

• example: a customer entity might have attributes such as: customer name, social security number, address, ...

• as all entities in an entity set have the same attributes, entity sets also have attributes - the attributes of the contained entities. The value of the attribute can be different for each entity in the set.

The domain of an attribute is the set of possible values for the attribute (the type of the attribute).

Examples:

• the domain of customer name might be strings of some fixed length

• the domain of social security number might be 9 digit positive integers An entity is represented by (consists of) a value for each of its attributes.

• entities can be written as a record or tuple (like a row in a table). Example: for a customer entity with attributes custnum, name, and address:

2 Elvis Presley Graceland

Note that this representation forces attributes to always be written in the same order.

The custnum attribute is added to ensure that customer entities can always be uniquely identified based only on attribute values, as two customers might have the same name and address.

• entity sets can be written as a group of entities in a box Example: entity sets for an ER database for a fly shop

• logical schema (entity sets and their attributes):

– entity set fly

attribute domain

stocknum positive integer pattern string[30]

size positive integer color {black, olive, natural}

inventory positive integer

– entity set flyrod

attribute domain

stocknum positive integer manufacturer string[30]

length positive real (feet) lineweight positive integer inventory positive integer

(3)

– entity set customer

attribute domain

custnum positive integer name string[30]

address string[50]

• instance:

1 Tim Wahls Tome 244

1 G. Loomis 8.5 6 3

2 gold−ribbed hare’s ear 14 natural 4

1 dahlberg diver 2 natural 4

3 Sage 7.0 2 5 3 Linda Null E258

customer

flyrod

fly

5 woolly bugger 10 olive 3

4 woolly bugger 2 black 7

2 Orvis 9.5 9 4

Figure 1: The entity sets in an instance of the flyshop database

Types of attributes:

• simple vs. composite attributes

– simpleattributes are atomic - they can not be divided into smaller parts

– composite attributes are attributes that can be subdivided. For example, an address attribute could be divided into house number, street, city, state and zip code.

• single vs. multivalued attributes

– single valued attributes have only one value for a particular entity

– multivalued attributes can have multiple values for the same entity. For example, a customer could have 0, 1 or many telephone numbers.

• derived attributes which can be calculated from other information in the database. For example, if all of a students grades are stored, then gpa is a derived attribute. Values of derived attributes are typically computed when needed rather than being stored directly.

The special value null is used to indicate that the value for an attribute does not exist or is not known.

(4)

2.2 Relationship Sets

A relationship is an association among 2 or more entities - for example, the relationship between a customer and a Sage 2-weight flyrod purchased at the fly shop.

A relationship set is a set of relationships of the same type. The following diagram shows two relationship sets:

• purchased, which is a relationship between customers and flyrods

• requestedby, which is a relationship between customers and flies

requestedby

purchased

1 Tim Wahls Tome 244

1 G. Loomis 8.5 6 3

2 gold−ribbed hare’s ear 14 natural 4

1 dahlberg diver 2 natural 4

3 Sage 7.0 2 5 3 Linda Null E258

customer

flyrod

fly

2 Orvis 9.5 9 4

Figure 2: An instance of an entity-relationship database for the flyshop.

A relationship set is a mathematical relation on two or more entity sets. A relationship set R on entity sets E₁, E₂, ..., En for n ≥ 2 is a subset of:

{(e₁, e₂, ..., en) | e₁∈ E₁, e₂∈ E₂, ..., en∈ En} where (e₁, e₂, ..., en) is a relationship.

The value of {(e₁, e₂, ..., en) | e₁∈ E₁, e₂ ∈ E₂, ..., en ∈ En} is the cartesian product of E₁, E₂, ..., En - the set of all possible relationships between entities in E₁, entities in E₂, ..., and entities in Eⁿ.

Notes:

• a relationship set is part of the schema of an ER database, while each relationship in the set is part of an instance

• the way that an entity participates in a relationship is called the role of the entity. Roles are only needed when an entity set participates in the same relationship set multiple times. For example, given an entity set employee, a relationship set manages could be defined that relates each employee to his or her manager (who is also an employee). Roles are needed to clarify which employee is the manager and which is the managee.

(5)

• a relationship may have attributes, and so a relationship set can have attributes. For example, relationship set purchased could have a pdate attribute recording the date of purchase.

• each relationship in a relationship set must be uniquely identifiable from only the participating entities (without using any attributes of the relationship set). For example, if the same customer can purchase the same flyrod model multiple times (which is realistic), the attribute pdate of relationship set purchased must be a multivalued attribute.

• a relationship set on two entity sets is called a binary relationship set, or a relationship set of degree 2. This is the most common case, but relationship sets of degree 3 and higher are allowed.

3 Constraints

Constraints are restrictions on the data stored in a database. The constraints themselves are part of the schema of the database, and dictate whether or not a particular instance is a legal instance of that schema.

3.1 Mapping Cardinalities

A mapping cardinality is a data constraint that specifies how many entities an entity can be related to in a relationship set.

Example: each customer can purchase as many kinds of flyrods as they want (a constraint on purchased), but the fly shop only tracks the first customer to request each fly pattern (a constraint on requestedby).

Mapping cardinalities on binary relationship sets are simplest, although they can also be defined for relationship sets of higher degree.

Consider a binary relationship set R on entity sets A and B. There are four possible mapping cardinalities in this case:

1. one to one - an entity in A is related to at most one entity in B, and an entity in B is related to at most one entity in A.

a1 a2 a3

b1

b4 b2 b3

2. one to many - an entity in A is related to any number of entities in B, but an entity in B is related to at most one entity in A.

a1 a2 a3

b1 b2 b3 b5 b4

Example: if the fly shop stocks flies by customer request only, then relationship set requestedby from customer to fly has one to many mapping cardinality because no one else need request a fly once the shop stocks it.

3. many to one - an entity in A is related to at most one entity in B, but an entity in B is related to any number of entities in A.

a1 b1

b2 b3 b5 b4 a2

a3 a4 a5

(6)

4. many to many - an entity in A is related to any number of entities in B, and an entity in B is related to any number of entities in A.

a1 b1

b2 b3 b5 b4 a2

a3 a4 a5

Relationship set purchased has many to many mapping cardinality because a customer can buy many different kinds of flyrods, and each kind of flyrod can be purchased by multiple customers.

The mapping cardinality of a relationship set depends on the real world relationships it is modeling.

3.2 Keys

Issue: distinguishing between entities in an entity set (and relationships in a relationship set). We require that all entities are distinguishable based only on attribute values - no two (different) entities are allowed to have the same values for all attributes. The concept of keys (and associated terminology) is very similar to that of the relational model.

3.2.1 Entity Sets

A superkey is a set of one or more attributes that allow entities to be uniquely identified.

All of the following are superkeys for entity set fly:

• {stocknum}

• {stocknum, size}

• {pattern, size, color}

• {stocknum, pattern, size, color}

This list is not comprehensive.

A superkey must allow entities (or relationships) to be distinguished across all possible instances. This often requires outside knowledge about what attributes mean.

A candidate key is a superkey that has no superkeys as proper subsets. A candidate key is a minimal superkey.

Candidate keys for fly:

• {stocknum}

• {pattern, size, color}

and no others.

The primary key is the (one) candidate key chosen (by the database designer or database administrator) as the primary means of uniquely identifying entities.

Example: {stocknum} is the most likely primary key for fly.

Primary keys should be chosen so that they are unlikely to ever change. Frequently, primary keys are artificially generated (stock numbers, order numbers, social security numbers, ...) to ensure that this is the case.

3.2.2 Relationship Sets

Let R be a relationship set on entity sets E₁, E₂, ..., En, and let primary-key(Ei) denote the set of primary key attributes for entity set Ei.

(7)

• if R has no attributes, then the set of attributes

primary-key(E₁) ∪ primary-key(E₂) ∪ ... ∪ primary-key(En) is sufficient to completely describe any relationship in R

• if R has attributes a₁, a₂, ..., am, then the set of attributes

primary-key(E₁) ∪ primary-key(E₂) ∪ ... ∪ primary-key(En) ∪ {a₁, a₂, ..., am} is sufficient to completely describe any relationship in R

In either case, the set of attributes:

primary-key(E₁) ∪ primary-key(E₂) ∪ ... ∪ primary-key(Eⁿ)

forms a superkey for the relationship set. If the primary key attributes from the entity sets do not have unique names, they must be renamed (typically by prefixing the attribute name with the entity set name). If one entity set participates multiple times in the relationship set, then the role name (rather than the entity set name) is used when forming unique attribute names.

The primary key of a relationship set depends on the mapping cardinality:

1. one to one mapping cardinality: the primary key of any of the entity sets involved is a candidate key for the relationship set, because any entity (from any entity set) can be involved in at most one relationship.

2. one to many mapping cardinality: the primary key of the relationship set is the primary key of the second entity set (the to many set), because an entity in the second entity set can be involved in at most one relationship.

3. many to one mapping cardinality: the primary key of the relationship set is the primary key of the first entity set (the to many set), because an entity in the first entity set can be involved in at most one relationship.

4. many to many mapping cardinality: the primary key of the relationship set includes the primary keys of all of the entity sets involved.

Examples - assuming that the primary keys of entity sets are as follows:

entity set primary key customer {custnum}

flyrod {stocknum}

fly {stocknum}

then the primary keys for our relationship sets are:

relationship set mapping cardinality primary key

purchased many to many {custnum, stocknum}

requestedby one to many {stocknum}

3.3 Participation Constraints

The participation of an entity set in a relationship set is total if every entity must participate in at least one relationship. Otherwise, the participation of the entity set is partial. Examples:

• if the fly shop stocks only flies that customers have requested, then the participation of entity set fly in the requestedby relationship set is total. Since a customer is not required to request a fly, the participation of customer in requestedby is partial.

• the participation of entity sets customer and flyrod in relationship set purchased is partial in both cases. A customer is not required to purchase a flyrod, and some flyrod models may not (yet) have been purchased.

(8)

4 Entity-Relationship Diagrams

The logical schema of an E-R database can be represented graphically by an E-R diagram as follows:

• entity sets are represented by rectangles

• attributes are represented by ovals

• relationship sets are represented by diamonds

• lines link attributes to entity and relationship sets, and entity sets to relationship sets.

• multivalued attributes are represented by double ovals

• derived attributes (attributes whose value can be computed from other attributes) are represented by dashed ovals

• weak entity sets (more later) are represented by double rectangles Constraints are represented in E-R diagrams as follows:

• mapping cardinalities:

– a directed link (arrow) represents a to one relationship – an undirected link represents a to many relationship

• participation constraints:

– a double line represents total participation – a single line represents partial participation

stocknum length

fly

pdate pattern

size color

inventory

custnum name

address

manufacturer stocknum

lineweight inventory

requestedby

customer

purchased

flyrod

Figure 3: Entity-relationship diagram for the flyshop DBMS Note that primary key attributes of the entity sets are underlined.

(9)

Composite attributes are represented by attributes with attributes. For example:

person

ssn name address

number street

city state zip

Figure 4: Composite attributes

Roles are indicated by labeling the line between the entity set and the relationship set with the role name:

person mother

father

ssn name

address

child dad mom

child

Figure 5: Using roles to clarify a relationship between an entity set and itself

Mapping cardinalities can also be indicated by labeling the line between an entity set and a relationship set with l..h, where l is the smallest number of relationships that an entity can be involved in and h is the largest.

• ∗ can be used as the upper bound to indicate no limit

• a lower bound of 1 specifies that the participation of the entity set is total

• if the upper bound on both sides of a relationship set is one, then the relationship set is one to one Examples:

person mother

father

ssn name

address

child dad mom

child 0 .. 1

0 .. * 0 .. *

0 .. 1

Figure 6: Alternate notation for mapping cardinalities

Note that 0..1 is on the “to many” side of the relationship (the opposite side from where the arrowhead would go).

(10)

stocknum length

fly

pdate pattern

size color

inventory

custnum name

address

manufacturer stocknum

lineweight inventory

requestedby

customer

purchased

flyrod

1 .. 1 0 .. * 0 .. *

0 .. *

Figure 7: Another example of the alternate notation for mapping cardinalities

E-R diagrams are often used for designing other kinds of databases (especially relational databases) and for modeling software systems.

5 Design Issues

• use of entity sets vs. attributes – example: addresses – how to decide:

∗ do we need to store extra information associated only with addresses?

∗ can entities be related to multiple addresses?

• use of entity sets vs. relationship sets – example: an online shopping cart

∗ if the shopping cart is represented as a relationship set, then each item placed in the cart must be represented by a separate relationship. This forces attributes of the shopping cart (date, transaction number, ...) to be stored once for every item placed in the cart.

∗ it the shopping cart is represented as an entity set, then the cart can be involved in multiple relationships with items purchased with no redundant information stored.

– example: the purchase of a flyrod in the flyshop. Since each purchase involves one customer and one flyrod, this is best represented as a relationship set.

• binary vs. n-ary relationship sets

– an n-ary relationship relationship set can always be represented by several binary relationship sets – example: a ternary (3-ary) relationship set parents storing the parents of each child could be

represented by two binary relationship sets mother and father.

– how to decide: must all entities involved in the relationship be known in order to have valid information to store? For example, if we have information about only one parent, we must use a nullvalue in the parents relationship set.

• placement of relationship set attributes

Issue: can attributes of a relationship set be moved to one of the participating entity sets?

– if the mapping cardinality of the relationship set is one to one, one to many, or many to one, then the attributes can be moved to the entity set on the “to many” side (or either side in the one to one case) of the relationship set

(11)

∗ example: if requestedby had a requestdate attribute, this attribute could be moved to entity set fly with no loss of information

∗ if the participation of the entity set in the relationship set is not total, then null values would have to be used for the values of those attributes in some cases

– if the mapping cardinality of the relationship set is many to many, then any attributes of the relationship set can not be moved to either participating entity set. For example, purchased has many to many mapping cardinality, which means that each customer can purchase multiple flyrods and each flyrod model can be purchased by multiple customers. Hence, attribute pdate can not be moved to either of these entity sets without losing information.

6 Weak Entity Sets

A weak entity set is an entity set that does not have sufficient attributes to form a primary key. An entity set that does have a primary key is called a strong entity set.

For example, consider shopping carts used by an online store. If the only attributes stored for shopping cards are the time, date and the amount (value of items in the cart), then two distinct shopping carts could have the same values on all attributes. Since all attributes together do not form a superkey, entity set scart does not have sufficient attributes to form a primary key. One solution to this problem is to add a synthetic attribute (such as a transaction number) to serve as the primary key. This works, but is not always necessary or desirable.

A weak entity set must always be associated with another entity set (the identifying or owner entity set), and every entity in the weak entity set must be associated with exactly one entity in the identifying entity set. The weak entity set is said to be existence dependent on the identifying entity set. For example, the scart entity set could be owned by a customer entity set, which means that two shopping carts that seem to be identical could be identified by which customer they belong to.

The relationship between the weak entity set and the identifying entity set is called the identifying relationship. Participation of the weak entity set in the identifying relationship must be total, and the identifying relationship is many to one (or possibly one to one) from the weak entity set to the identifying entity set. Given these possible mapping cardinalities, an identifying relationship is not allowed to have attributes - they can always be moved to the weak entity set.

A discriminator is a set of attributes of a weak entity set that allow two weak entities belonging to the same strong entity to be distinguished. For example, if one customer can not have two different shopping carts at the same time, then the time and date attributes form a discriminator for scart. The primary key of a weak entity set is constructed by combining the primary key of the identifying entity set with the discriminator of the weak entity set. For example, if the primary key of customer is custnum, then the primary key of scart is {custnum, tdate, ttime}, where tdate and ttime are the date and time attributes of scart.

In an E-R diagram:

• a weak entity set is indicated by a double rectangle

• an identifying relationship is indicated by a double diamond

• discriminator attributes are underlined with a dashed line

has

customer scart holds inventory

name address

tdate

ttime amount stocknum description

custnum qty

Figure 8: E-R Diagram for an Online Store (with a Weak Entity Set)

(12)

7 Inheritance

In the E-R model, one entity set can inherit the attributes and relationships of another entity set. An inheritance relationship is indicated in an E-R diagram using a triangle labeled with ISA.

person has address

ISA

customer employee

street

ssn dob

name

number zip

salary dept

city state

credit

Figure 9: E-R Diagram for Inheritance Entity sets customer and employee inherit:

• the attributes ssn, name and dob

• {ssn} as the primary key

• the has relationship with address entities

The entity set inherited from (person in this example) is called a higher level entity set, while the inheriting entity sets (customer and employee) are called lower level entity sets.

Several kinds of constraints can be placed on inheritance relationships:

• disjoint or overlapping

– whether or not a higher level entity can belong to multiple lower level entity sets

– a disjointness constraint is indicated in an E-R diagram by writing “disjoint” next to the ISA relationship (triangle)

– overlapping is the default

• total or partial specialization

– whether or not every higher level entity must belong to a lower level entity set

– total specialization is indicated in an E-R diagram by drawing a double line from the higher level entity set to the ISA relationship

– partial is the default

So in the example E-R diagram above, a person is allowed to be both a customer and an employee, and every person must be either a customer or an employee (or both).

8 Reducing Entity-Relationship Schemas to Relational Schemas

Idea: represent the data and relationships in an E-R database as a collection of relations.

(13)

8.1 Reducing Entity Sets to Relations

The table corresponding to a strong entity set has a column for each attribute of the entity set and a row for each entity in the entity set. The table has the same name and primary key as the entity set.

Example: the table for entity set fly (from Figure 1).

stocknum pattern size color inventory

2 gold-ribbed hare’s ear 14 natural 4

1 Dahlberg Diver 2 natural 4

Table 1: fly Does the order of rows matter?

For a weak entity set, the table has one column for each attribute of the entity set, and an additional column for each primary key attribute of the identifying entity set. The primary key is the union of the primary key of the identifying entity set and the discriminator of the weak entity set. For example, the table for the scart weak entity set (Figure 8) has columns custnum, ttime, tdate and amount, and primary key {custnum, ttime, tdate}.

8.2 Reducing Relationship Sets to Tables

The table for a relationship set will have a column for each attribute of the primary key of each entity set involved in the relationship, and one for each attribute of the relationship set. These attributes are renamed if necessary to avoid name clashes. The table will have one row for each relationship, and will have the same name and primary key as the relationship set.

Example: relationship set requestedby (from Figure 2) becomes:

custnum stocknum

1 5

3 2

2 4

2 1

Table 2: requestedby with {stocknum} as the primary key.

The primary keys of the entity sets involved in the relationship become foreign keys in the table for the relationship set. For the requestedby table above, {custnum} is a foreign key referencing the customer table (derived from the customer entity set), and {stocknum} is a foreign key referencing the fly table.

The identifying relationship set for a weak entity set is not reduced to a table. Since the table for a weak entity set includes the primary key of the identifying entity set, including a table for the identifying relationship is redundant. For example, the table for the weak entity set scart (Figure 8) has columns for custnum, ttime, tdate and amount. The table for the identifying relationship set has would have columns for custnum, ttime and tdate, and so all data contained in this table would already be contained in the table for scart.

The table for an identifying relationship is guaranteed to be redundant because identifying relationships are not allowed to have attributes. Additionally, participation of the weak entity set in the identifying relationship must be total, so the table for the weak entity set will never contain null values for the primary key attributes of the identifying entity set. Because identifying relationships must be many to one (or one to one), information about one weak entity is never repeated in the table for the weak entity set.

In general, the table for a relationship set can be combined with the table for an entity set whenever:

• the relationship set is many to one

(14)

• the participation of the entity set on the “to many” side of the relationship is total

The attributes of the relationship set itself and the primary key attributes of the entity set on the “to one”

side of the relationship are added as columns in the table for the entity set on the “to many” side of the relationship.

For example, the table for relationship set requestedby could be eliminated by adding a custnum column to table fly (to store the customer number of the customer requesting the fly). Since each fly is requested by exactly one customer:

• information about a fly never has to be repeated in order to store that multiple customers have requested the fly

• the column for custnum will never contain null values

8.3 Complex Attributes

Composite attributes are handled by “flattening” them into simple attributes. For example, the table generated from entity set person in Figure 4 would have columns ssn, name, number, street, city, state and zip.

Unlike other attribute types, a multivalued attribute is converted to a table. This table has one column for the attribute itself, and additional columns for the primary key attributes of the entity or relationship set that it belongs to. For example, the following entity set employee:

employee

salary ssn

name dept

dependents

would be reduced to two tables:

• a table named employee with columns ssn, name, dept and salary

• a table named (for example) dependents with columns ssn and dependent

With these tables, any number of dependents can be stored for any employee, and no null values are needed if an employee has no dependents.

If all attributes of the entity or relationship set other than the multivalued attribute are primary key attributes, then only the table constructed from the multivalued attribute is needed. For example, relationship set purchased (from Figure 2) becomes:

custnum stocknum pdate

1 3 9/4/1998

3 3 9/3/1997

2 2 8/8/2002

3 2 9/2/2003

3 1 9/1/1995

Table 3: purchased

In general, all attributes of the table derived from a multivalued attribute are needed to form a primary key. Can you argue that this is true for the employee and purchased tables we have just derived? Additionally, the primary key attributes of the original entity or relationship set become foreign keys referencing the tables derived from those sets.

(15)

8.4 Inheritance

Tables for entity sets constructed using inheritance are reduced to tables as follows:

1. construct a table for the higher level entity set as usual

2. for each lower level entity set, construct a table with columns for each attribute of the lower level entity set, and for the primary key attributes of the higher level entity set.

For example, the tables for entity sets person, customer and employee in Figure 9 would have the following columns:

• person: ssn, name, dob

• customer: ssn, credit

• employee: ssn, salary, dept

Note that {ssn} is the primary key for each of these tables, and that {ssn} in both customer and employee is a foreign key referencing person.

If the inheritance relationship is disjoint and complete (i.e. there is no overlap in the lower level entity sets and every higher level entity is also a lower level entity), then another possibility is as follows:

• no table is created for the higher level entity set

• each lower level entity set is reduced to a table as usual, except that the attributes of the higher level entity set are added as columns

The primary key of each of the tables derived from lower level entity sets is the primary key of the higher level entity set. If the inheritance relationship were not complete, then some entities could not be stored with this representation. If the inheritance relationship were not disjoint, then this representation would have some redundancy.