COMP 378 Database Systems
Notes for Chapter 7 of Database System Concepts Database Design and the Entity-Relationship Model
The entity-relationship (E-R) model is a a data model in which information stored in the database is viewed as sets of entities and sets of relationships among entities. The primary use of the E-R model is as a design tool for relational databases.
1 The Design Process
One (relatively) typical process:
1. the designers meet with prospective database users and domain experts to characterize what data needs to be stored and how that data will be accessed. The result of this step is a set of user requirements, which are expressed via some combination of text and diagrams.
2. the designers choose a data model and translate the user requirements into a conceptual schema in this model. We will use the E-R model at this step as a way to analyze data and relationships among data items.
3. the designers review the database schema to ensure that it supports the functional requirements of the application as a whole - that is, that data can be accessed and updated in the required ways.
4. the conceptual schema is implemented
(a) logical design phase: the conceptual schema is mapped to an implementation schema. We will use an algorithm to transform an E-R schema into a relational database schema.
(b) physical design phase: file and index structures are chosen for the physical schema of the database In particular, the database schema should avoid:
• redundancy: repeated information wastes storage and increases opportunities for inconsistency
• incompleteness: bad design may make information difficult to store. For example, if we eliminate the customer relation by adding name and address attributes to the purchased relation, we have no clean way to store information about customers who have not purchased a flyrod.
2 The Entity-Relationship Model
2.1 Entity Sets
An entity is something that exists and can be distinguished from other entities. Examples:
• customer entities with unique social security numbers
• account entities with unique account numbers An entity set is a set of entities of the same type.
• example: all of the account entities for a bank
• entity sets need not be disjoint. Example: a person entity could be in both the customer and employee entity sets.
An attribute is a characteristic or property of an entity.
• example: a customer entity might have attributes such as: customer name, social security number, address, ...
• as all entities in an entity set have the same attributes, entity sets also have attributes - the attributes of the contained entities. The value of the attribute can be different for each entity in the set.
The domain of an attribute is the set of possible values for the attribute (the type of the attribute).
Examples:
• the domain of customer name might be strings of some fixed length
• the domain of social security number might be 9 digit positive integers An entity is represented by (consists of) a value for each of its attributes.
• entities can be written as a record or tuple (like a row in a table). Example: for a customer entity with attributes custnum, name, and address:
2 Elvis Presley Graceland
Note that this representation forces attributes to always be written in the same order.
The custnum attribute is added to ensure that customer entities can always be uniquely identified based only on attribute values, as two customers might have the same name and address.
• entity sets can be written as a group of entities in a box Example: entity sets for an ER database for a fly shop
• logical schema (entity sets and their attributes):
– entity set fly
attribute domain
stocknum positive integer pattern string[30]
size positive integer color {black, olive, natural}
inventory positive integer
– entity set flyrod
attribute domain
stocknum positive integer manufacturer string[30]
length positive real (feet) lineweight positive integer inventory positive integer
– entity set customer
attribute domain
custnum positive integer name string[30]
address string[50]
• instance:
1 Tim Wahls Tome 244
1 G. Loomis 8.5 6 3
2 gold−ribbed hare’s ear 14 natural 4
1 dahlberg diver 2 natural 4
2 Elvis Presley Graceland
3 Sage 7.0 2 5 3 Linda Null E258
customer
flyrod
fly
5 woolly bugger 10 olive 3
4 woolly bugger 2 black 7
2 Orvis 9.5 9 4
Figure 1: The entity sets in an instance of the flyshop database
Types of attributes:
• simple vs. composite attributes
– simpleattributes are atomic - they can not be divided into smaller parts
– composite attributes are attributes that can be subdivided. For example, an address attribute could be divided into house number, street, city, state and zip code.
• single vs. multivalued attributes
– single valued attributes have only one value for a particular entity
– multivalued attributes can have multiple values for the same entity. For example, a customer could have 0, 1 or many telephone numbers.
• derived attributes which can be calculated from other information in the database. For example, if all of a students grades are stored, then gpa is a derived attribute. Values of derived attributes are typically computed when needed rather than being stored directly.
The special value null is used to indicate that the value for an attribute does not exist or is not known.
2.2 Relationship Sets
A relationship is an association among 2 or more entities - for example, the relationship between a customer and a Sage 2-weight flyrod purchased at the fly shop.
A relationship set is a set of relationships of the same type. The following diagram shows two rela- tionship sets:
• purchased, which is a relationship between customers and flyrods
• requestedby, which is a relationship between customers and flies
requestedby
purchased
1 Tim Wahls Tome 244
1 G. Loomis 8.5 6 3
2 gold−ribbed hare’s ear 14 natural 4
1 dahlberg diver 2 natural 4
2 Elvis Presley Graceland
3 Sage 7.0 2 5 3 Linda Null E258
customer
flyrod
fly
5 woolly bugger 10 olive 3
4 woolly bugger 2 black 7
2 Orvis 9.5 9 4
Figure 2: An instance of an entity-relationship database for the flyshop.
A relationship set is a mathematical relation on two or more entity sets. A relationship set R on entity sets E1, E2, ..., En for n ≥ 2 is a subset of:
{(e1, e2, ..., en) | e1∈ E1, e2∈ E2, ..., en∈ En} where (e1, e2, ..., en) is a relationship.
The value of {(e1, e2, ..., en) | e1∈ E1, e2 ∈ E2, ..., en ∈ En} is the cartesian product of E1, E2, ..., En - the set of all possible relationships between entities in E1, entities in E2, ..., and entities in En.
Notes:
• a relationship set is part of the schema of an ER database, while each relationship in the set is part of an instance
• the way that an entity participates in a relationship is called the role of the entity. Roles are only needed when an entity set participates in the same relationship set multiple times. For example, given an entity set employee, a relationship set manages could be defined that relates each employee to his or her manager (who is also an employee). Roles are needed to clarify which employee is the manager and which is the managee.
• a relationship may have attributes, and so a relationship set can have attributes. For example, rela- tionship set purchased could have a pdate attribute recording the date of purchase.
• each relationship in a relationship set must be uniquely identifiable from only the participating en- tities (without using any attributes of the relationship set). For example, if the same customer can purchase the same flyrod model multiple times (which is realistic), the attribute pdate of relationship set purchased must be a multivalued attribute.
• a relationship set on two entity sets is called a binary relationship set, or a relationship set of degree 2. This is the most common case, but relationship sets of degree 3 and higher are allowed.
3 Constraints
Constraints are restrictions on the data stored in a database. The constraints themselves are part of the schema of the database, and dictate whether or not a particular instance is a legal instance of that schema.
3.1 Mapping Cardinalities
A mapping cardinality is a data constraint that specifies how many entities an entity can be related to in a relationship set.
Example: each customer can purchase as many kinds of flyrods as they want (a constraint on purchased), but the fly shop only tracks the first customer to request each fly pattern (a constraint on requestedby).
Mapping cardinalities on binary relationship sets are simplest, although they can also be defined for relationship sets of higher degree.
Consider a binary relationship set R on entity sets A and B. There are four possible mapping cardinalities in this case:
1. one to one - an entity in A is related to at most one entity in B, and an entity in B is related to at most one entity in A.
a1 a2 a3
b1
b4 b2 b3
2. one to many - an entity in A is related to any number of entities in B, but an entity in B is related to at most one entity in A.
a1 a2 a3
b1 b2 b3 b5 b4
Example: if the fly shop stocks flies by customer request only, then relationship set requestedby from customer to fly has one to many mapping cardinality because no one else need request a fly once the shop stocks it.
3. many to one - an entity in A is related to at most one entity in B, but an entity in B is related to any number of entities in A.
a1 b1
b2 b3 b5 b4 a2
a3 a4 a5
4. many to many - an entity in A is related to any number of entities in B, and an entity in B is related to any number of entities in A.
a1 b1
b2 b3 b5 b4 a2
a3 a4 a5
Relationship set purchased has many to many mapping cardinality because a customer can buy many different kinds of flyrods, and each kind of flyrod can be purchased by multiple customers.
The mapping cardinality of a relationship set depends on the real world relationships it is modeling.
3.2 Keys
Issue: distinguishing between entities in an entity set (and relationships in a relationship set). We require that all entities are distinguishable based only on attribute values - no two (different) entities are allowed to have the same values for all attributes. The concept of keys (and associated terminology) is very similar to that of the relational model.
3.2.1 Entity Sets
A superkey is a set of one or more attributes that allow entities to be uniquely identified.
All of the following are superkeys for entity set fly:
• {stocknum}
• {stocknum, size}
• {pattern, size, color}
• {stocknum, pattern, size, color}
This list is not comprehensive.
A superkey must allow entities (or relationships) to be distinguished across all possible instances. This often requires outside knowledge about what attributes mean.
A candidate key is a superkey that has no superkeys as proper subsets. A candidate key is a minimal superkey.
Candidate keys for fly:
• {stocknum}
• {pattern, size, color}
and no others.
The primary key is the (one) candidate key chosen (by the database designer or database administrator) as the primary means of uniquely identifying entities.
Example: {stocknum} is the most likely primary key for fly.
Primary keys should be chosen so that they are unlikely to ever change. Frequently, primary keys are artificially generated (stock numbers, order numbers, social security numbers, ...) to ensure that this is the case.
3.2.2 Relationship Sets
Let R be a relationship set on entity sets E1, E2, ..., En, and let primary-key(Ei) denote the set of primary key attributes for entity set Ei.
• if R has no attributes, then the set of attributes
primary-key(E1) ∪ primary-key(E2) ∪ ... ∪ primary-key(En) is sufficient to completely describe any relationship in R
• if R has attributes a1, a2, ..., am, then the set of attributes
primary-key(E1) ∪ primary-key(E2) ∪ ... ∪ primary-key(En) ∪ {a1, a2, ..., am} is sufficient to completely describe any relationship in R
In either case, the set of attributes:
primary-key(E1) ∪ primary-key(E2) ∪ ... ∪ primary-key(En)
forms a superkey for the relationship set. If the primary key attributes from the entity sets do not have unique names, they must be renamed (typically by prefixing the attribute name with the entity set name). If one entity set participates multiple times in the relationship set, then the role name (rather than the entity set name) is used when forming unique attribute names.
The primary key of a relationship set depends on the mapping cardinality:
1. one to one mapping cardinality: the primary key of any of the entity sets involved is a candidate key for the relationship set, because any entity (from any entity set) can be involved in at most one relationship.
2. one to many mapping cardinality: the primary key of the relationship set is the primary key of the second entity set (the to many set), because an entity in the second entity set can be involved in at most one relationship.
3. many to one mapping cardinality: the primary key of the relationship set is the primary key of the first entity set (the to many set), because an entity in the first entity set can be involved in at most one relationship.
4. many to many mapping cardinality: the primary key of the relationship set includes the primary keys of all of the entity sets involved.
Examples - assuming that the primary keys of entity sets are as follows:
entity set primary key customer {custnum}
flyrod {stocknum}
fly {stocknum}
then the primary keys for our relationship sets are:
relationship set mapping cardinality primary key
purchased many to many {custnum, stocknum}
requestedby one to many {stocknum}
3.3 Participation Constraints
The participation of an entity set in a relationship set is total if every entity must participate in at least one relationship. Otherwise, the participation of the entity set is partial. Examples:
• if the fly shop stocks only flies that customers have requested, then the participation of entity set fly in the requestedby relationship set is total. Since a customer is not required to request a fly, the participation of customer in requestedby is partial.
• the participation of entity sets customer and flyrod in relationship set purchased is partial in both cases. A customer is not required to purchase a flyrod, and some flyrod models may not (yet) have been purchased.
4 Entity-Relationship Diagrams
The logical schema of an E-R database can be represented graphically by an E-R diagram as follows:
• entity sets are represented by rectangles
• attributes are represented by ovals
• relationship sets are represented by diamonds
• lines link attributes to entity and relationship sets, and entity sets to relationship sets.
• multivalued attributes are represented by double ovals
• derived attributes (attributes whose value can be computed from other attributes) are represented by dashed ovals
• weak entity sets (more later) are represented by double rectangles Constraints are represented in E-R diagrams as follows:
• mapping cardinalities:
– a directed link (arrow) represents a to one relationship – an undirected link represents a to many relationship
• participation constraints:
– a double line represents total participation – a single line represents partial participation
stocknum length
fly
pdate pattern
size color
inventory
custnum name
address
manufacturer stocknum
lineweight inventory
requestedby
customer
purchased
flyrod
Figure 3: Entity-relationship diagram for the flyshop DBMS Note that primary key attributes of the entity sets are underlined.
Composite attributes are represented by attributes with attributes. For example:
person
ssn name address
number street
city state zip
Figure 4: Composite attributes
Roles are indicated by labeling the line between the entity set and the relationship set with the role name:
person mother
father
ssn name
address
child dad mom
child
Figure 5: Using roles to clarify a relationship between an entity set and itself
Mapping cardinalities can also be indicated by labeling the line between an entity set and a relationship set with l..h, where l is the smallest number of relationships that an entity can be involved in and h is the largest.
• ∗ can be used as the upper bound to indicate no limit
• a lower bound of 1 specifies that the participation of the entity set is total
• if the upper bound on both sides of a relationship set is one, then the relationship set is one to one Examples:
person mother
father
ssn name
address
child dad mom
child 0 .. 1
0 .. * 0 .. *
0 .. 1
Figure 6: Alternate notation for mapping cardinalities
Note that 0..1 is on the “to many” side of the relationship (the opposite side from where the arrowhead would go).
stocknum length
fly
pdate pattern
size color
inventory
custnum name
address
manufacturer stocknum
lineweight inventory
requestedby
customer
purchased
flyrod
1 .. 1 0 .. * 0 .. *
0 .. *
Figure 7: Another example of the alternate notation for mapping cardinalities
E-R diagrams are often used for designing other kinds of databases (especially relational databases) and for modeling software systems.
5 Design Issues
• use of entity sets vs. attributes – example: addresses – how to decide:
∗ do we need to store extra information associated only with addresses?
∗ can entities be related to multiple addresses?
• use of entity sets vs. relationship sets – example: an online shopping cart
∗ if the shopping cart is represented as a relationship set, then each item placed in the cart must be represented by a separate relationship. This forces attributes of the shopping cart (date, transaction number, ...) to be stored once for every item placed in the cart.
∗ it the shopping cart is represented as an entity set, then the cart can be involved in multiple relationships with items purchased with no redundant information stored.
– example: the purchase of a flyrod in the flyshop. Since each purchase involves one customer and one flyrod, this is best represented as a relationship set.
• binary vs. n-ary relationship sets
– an n-ary relationship relationship set can always be represented by several binary relationship sets – example: a ternary (3-ary) relationship set parents storing the parents of each child could be
represented by two binary relationship sets mother and father.
– how to decide: must all entities involved in the relationship be known in order to have valid information to store? For example, if we have information about only one parent, we must use a nullvalue in the parents relationship set.
• placement of relationship set attributes
Issue: can attributes of a relationship set be moved to one of the participating entity sets?
– if the mapping cardinality of the relationship set is one to one, one to many, or many to one, then the attributes can be moved to the entity set on the “to many” side (or either side in the one to one case) of the relationship set
∗ example: if requestedby had a requestdate attribute, this attribute could be moved to entity set fly with no loss of information
∗ if the participation of the entity set in the relationship set is not total, then null values would have to be used for the values of those attributes in some cases
– if the mapping cardinality of the relationship set is many to many, then any attributes of the relationship set can not be moved to either participating entity set. For example, purchased has many to many mapping cardinality, which means that each customer can purchase multiple flyrods and each flyrod model can be purchased by multiple customers. Hence, attribute pdate can not be moved to either of these entity sets without losing information.
6 Weak Entity Sets
A weak entity set is an entity set that does not have sufficient attributes to form a primary key. An entity set that does have a primary key is called a strong entity set.
For example, consider shopping carts used by an online store. If the only attributes stored for shopping cards are the time, date and the amount (value of items in the cart), then two distinct shopping carts could have the same values on all attributes. Since all attributes together do not form a superkey, entity set scart does not have sufficient attributes to form a primary key. One solution to this problem is to add a synthetic attribute (such as a transaction number) to serve as the primary key. This works, but is not always necessary or desirable.
A weak entity set must always be associated with another entity set (the identifying or owner entity set), and every entity in the weak entity set must be associated with exactly one entity in the identifying entity set. The weak entity set is said to be existence dependent on the identifying entity set. For example, the scart entity set could be owned by a customer entity set, which means that two shopping carts that seem to be identical could be identified by which customer they belong to.
The relationship between the weak entity set and the identifying entity set is called the identifying relationship. Participation of the weak entity set in the identifying relationship must be total, and the identifying relationship is many to one (or possibly one to one) from the weak entity set to the identifying entity set. Given these possible mapping cardinalities, an identifying relationship is not allowed to have attributes - they can always be moved to the weak entity set.
A discriminator is a set of attributes of a weak entity set that allow two weak entities belonging to the same strong entity to be distinguished. For example, if one customer can not have two different shopping carts at the same time, then the time and date attributes form a discriminator for scart. The primary key of a weak entity set is constructed by combining the primary key of the identifying entity set with the discriminator of the weak entity set. For example, if the primary key of customer is custnum, then the primary key of scart is {custnum, tdate, ttime}, where tdate and ttime are the date and time attributes of scart.
In an E-R diagram:
• a weak entity set is indicated by a double rectangle
• an identifying relationship is indicated by a double diamond
• discriminator attributes are underlined with a dashed line
has
customer scart holds inventory
name address
tdate
ttime amount stocknum description
custnum qty
Figure 8: E-R Diagram for an Online Store (with a Weak Entity Set)
7 Inheritance
In the E-R model, one entity set can inherit the attributes and relationships of another entity set. An inheritance relationship is indicated in an E-R diagram using a triangle labeled with ISA.
person has address
ISA
customer employee
street
ssn dob
name
number zip
salary dept
city state
credit
Figure 9: E-R Diagram for Inheritance Entity sets customer and employee inherit:
• the attributes ssn, name and dob
• {ssn} as the primary key
• the has relationship with address entities
The entity set inherited from (person in this example) is called a higher level entity set, while the inheriting entity sets (customer and employee) are called lower level entity sets.
Several kinds of constraints can be placed on inheritance relationships:
• disjoint or overlapping
– whether or not a higher level entity can belong to multiple lower level entity sets
– a disjointness constraint is indicated in an E-R diagram by writing “disjoint” next to the ISA relationship (triangle)
– overlapping is the default
• total or partial specialization
– whether or not every higher level entity must belong to a lower level entity set
– total specialization is indicated in an E-R diagram by drawing a double line from the higher level entity set to the ISA relationship
– partial is the default
So in the example E-R diagram above, a person is allowed to be both a customer and an employee, and every person must be either a customer or an employee (or both).
8 Reducing Entity-Relationship Schemas to Relational Schemas
Idea: represent the data and relationships in an E-R database as a collection of relations.
8.1 Reducing Entity Sets to Relations
The table corresponding to a strong entity set has a column for each attribute of the entity set and a row for each entity in the entity set. The table has the same name and primary key as the entity set.
Example: the table for entity set fly (from Figure 1).
stocknum pattern size color inventory
5 woolly bugger 10 olive 3
2 gold-ribbed hare’s ear 14 natural 4
4 woolly bugger 2 black 7
1 Dahlberg Diver 2 natural 4
Table 1: fly Does the order of rows matter?
For a weak entity set, the table has one column for each attribute of the entity set, and an additional column for each primary key attribute of the identifying entity set. The primary key is the union of the primary key of the identifying entity set and the discriminator of the weak entity set. For example, the table for the scart weak entity set (Figure 8) has columns custnum, ttime, tdate and amount, and primary key {custnum, ttime, tdate}.
8.2 Reducing Relationship Sets to Tables
The table for a relationship set will have a column for each attribute of the primary key of each entity set involved in the relationship, and one for each attribute of the relationship set. These attributes are renamed if necessary to avoid name clashes. The table will have one row for each relationship, and will have the same name and primary key as the relationship set.
Example: relationship set requestedby (from Figure 2) becomes:
custnum stocknum
1 5
3 2
2 4
2 1
Table 2: requestedby with {stocknum} as the primary key.
The primary keys of the entity sets involved in the relationship become foreign keys in the table for the relationship set. For the requestedby table above, {custnum} is a foreign key referencing the customer table (derived from the customer entity set), and {stocknum} is a foreign key referencing the fly table.
The identifying relationship set for a weak entity set is not reduced to a table. Since the table for a weak entity set includes the primary key of the identifying entity set, including a table for the identifying relationship is redundant. For example, the table for the weak entity set scart (Figure 8) has columns for custnum, ttime, tdate and amount. The table for the identifying relationship set has would have columns for custnum, ttime and tdate, and so all data contained in this table would already be contained in the table for scart.
The table for an identifying relationship is guaranteed to be redundant because identifying relationships are not allowed to have attributes. Additionally, participation of the weak entity set in the identifying relationship must be total, so the table for the weak entity set will never contain null values for the primary key attributes of the identifying entity set. Because identifying relationships must be many to one (or one to one), information about one weak entity is never repeated in the table for the weak entity set.
In general, the table for a relationship set can be combined with the table for an entity set whenever:
• the relationship set is many to one
• the participation of the entity set on the “to many” side of the relationship is total
The attributes of the relationship set itself and the primary key attributes of the entity set on the “to one”
side of the relationship are added as columns in the table for the entity set on the “to many” side of the relationship.
For example, the table for relationship set requestedby could be eliminated by adding a custnum column to table fly (to store the customer number of the customer requesting the fly). Since each fly is requested by exactly one customer:
• information about a fly never has to be repeated in order to store that multiple customers have requested the fly
• the column for custnum will never contain null values
8.3 Complex Attributes
Composite attributes are handled by “flattening” them into simple attributes. For example, the table generated from entity set person in Figure 4 would have columns ssn, name, number, street, city, state and zip.
Unlike other attribute types, a multivalued attribute is converted to a table. This table has one column for the attribute itself, and additional columns for the primary key attributes of the entity or relationship set that it belongs to. For example, the following entity set employee:
employee
salary ssn
name dept
dependents
would be reduced to two tables:
• a table named employee with columns ssn, name, dept and salary
• a table named (for example) dependents with columns ssn and dependent
With these tables, any number of dependents can be stored for any employee, and no null values are needed if an employee has no dependents.
If all attributes of the entity or relationship set other than the multivalued attribute are primary key at- tributes, then only the table constructed from the multivalued attribute is needed. For example, relationship set purchased (from Figure 2) becomes:
custnum stocknum pdate
1 3 9/4/1998
3 3 9/3/1997
2 2 8/8/2002
3 2 9/2/2003
3 1 9/1/1995
Table 3: purchased
In general, all attributes of the table derived from a multivalued attribute are needed to form a primary key. Can you argue that this is true for the employee and purchased tables we have just derived? Additionally, the primary key attributes of the original entity or relationship set become foreign keys referencing the tables derived from those sets.
8.4 Inheritance
Tables for entity sets constructed using inheritance are reduced to tables as follows:
1. construct a table for the higher level entity set as usual
2. for each lower level entity set, construct a table with columns for each attribute of the lower level entity set, and for the primary key attributes of the higher level entity set.
For example, the tables for entity sets person, customer and employee in Figure 9 would have the following columns:
• person: ssn, name, dob
• customer: ssn, credit
• employee: ssn, salary, dept
Note that {ssn} is the primary key for each of these tables, and that {ssn} in both customer and employee is a foreign key referencing person.
If the inheritance relationship is disjoint and complete (i.e. there is no overlap in the lower level entity sets and every higher level entity is also a lower level entity), then another possibility is as follows:
• no table is created for the higher level entity set
• each lower level entity set is reduced to a table as usual, except that the attributes of the higher level entity set are added as columns
The primary key of each of the tables derived from lower level entity sets is the primary key of the higher level entity set. If the inheritance relationship were not complete, then some entities could not be stored with this representation. If the inheritance relationship were not disjoint, then this representation would have some redundancy.