CONCEPTUAL DATA AND KNOWLEDGE MODELING
6.2 ENTITY-RELATIONSHIP DATA MODELING
6.2.1 WHAT IS THE ENTITY-RELATIONSHIP (ER) APPROACH?
The entity-relationship (ER) [Chen 1976] approach provides an effective way for conceptual modeling of data. (Actually, ER approach goes beyond database modeling, such as in software engineering, but we will not discuss this issue here.) The underlying idea is simple: Data can be described in terms of "things" and their connections. Consequently, there are two kinds of basic constructs in an ER model: entity sets consists of entities, as well as relationship sets connecting the entity sets. Both entity sets and relationship sets can be described by attributes. ER modeling typically makes use of ER diagrams. An ER diagram (ERD) is the graphical expression of the overall
logical structure of a database. In an ERD, each entity set is represented by a rectangle, each relationship set is represented by a diamond and is connected to associated entity sets by lines, and each attribute is represented by an oval.
6.2.2 A SIMPLE EXAMPLE
Consider a university database of student information, the courses offered, and the courses taken by individual students. Information recorded for students includes S-ID, name, major, etc. Information recorded for courses include call numbers, department offering this course, and the section of this course. For the time being, we will assume a course can have many students enrolled, but each student can only register for one course. The information for the date a course is taken is also recorded. Based on this design requirement, an ERD like Figure 6.1(a) can be developed to capture this mini- world. Note that an arrow has been used to denote the restriction that each student can only take one course.
Figure 6.1 (a) An ER diagram
A shorthand form may be more convenient for drawing the ERD. In this shorthand form, rectangles, diamonds and ovals are replaced by squared brackets, acute brackets and parenthesis, respectively. (Each weak entity set will be represented using double squared brackets, like [[child]].) We will refer this form of ERD as its linear representation. This form is particularly suitable for small, non-sophisticated ERDs. In the remaining part of this book we will stay with this form.
Figure 6.1 (b) An ER diagram (in linear notation)
Although the ER approach seems to be simple, there are a lot of design issues to be considered. For example, we should decide which "things" should be treated as entity sets. Other issues should be considered including which attributes should go to the "student" entity set, which should go to the relationship set "takes," etc.
si
(s-id) (s-name) (major) (date) (c-id, dept, section) [student] <take> → [course]
Student Course
Take
sid s-name
major date c-id dept
6.2.3 MAJOR CONSTRUCTS
We now discuss the major constructs involved in ER modeling in more detail. An entity set (i.e., a strong entity set) is a set of entities of the same type that share the same properties (or attributes). An entity is a thing or object in the real world that is distinguishable from all other things. Entity sets do not need to be disjoint. An entity is represented by a set of attributes.
A relationship set is a set of relationships of the same type. A relationship is an association among several entities. The degree of a relationship set is determined by the number of entity sets associated. Typically we have binary relationship sets, but we may also have n-ary relationship sets. A relationship set may have its own attributes (just like an entity set). An important aspect of a relationship set is the mapping constraint (namely, cardinalities):
• one to one (1:1) ←→. For example, a student can only take one course and each course can only have one student. (Of course, this restriction is not realistic.)
• one to many (1:N) ← For example, a student can take many courses, while each course can have only one student. (Again, not realistic in our current example.)
• many to one (N:1) →. For example, many students can take one course, and each course can have many students.
• many to many (M:N) . For example, each student can take one or more courses and each course may have more than one student. This is a reasonable assumption in our example.
The function that an entity plays in a relationship is called that entity's role. For example, a graduate student can play the role of either a student as well as an instructor.
Both entity sets and relationship sets are described by attributes. There are different kinds of attributes, such as simple versus composite attributes, single-valued versus multivalued attributes, null attributes and derived attributes (such as age being derived by data of birth and today's date). The collection of attributes is referred to as the schema (as we already discussed earlier in this chapter).
6.2.4 SOME IMPORTANT CONCEPTS
In relational databases, we have already learned concepts related to keys, such as candidate key or primary key. These concepts can be extended to entity-relationship models.
• Keys for entity sets: There are several different types of keys for entity sets (collection of attributes). They are different to search keys at file structures level, and are also different to each other.
♦ Superkey: It uniquely identifies an entity in the entity set.
♦ Candidate key: It is the minimal super key (attributes used as candidate key are usually underlined).
♦ Foreign key: It is a set of attributes which form the primary key of another relation.
• Keys for relationship sets: Primary keys for relationship sets are formed from primary keys of associated entity sets.
• Existence dependency: If the existence of entity x depends on the existence of entity y, then:
♦ x is existence dependent on y;
♦ Entity y is a dominant entity,
♦ Entity x is a subordinate entity.
• Total versus partial participation of entity set E in relationship set R:
♦ Total participation: every entity in E participates in at least one relationship in R (closely related to existence dependency).
♦ Partial participation: only some entities in E participates in relationships in R.
• Weak entity sets: An entity set which does not have sufficient attributes to form a primary key. For example, the employees' dependents in a company database are reasonably treated as weak entities, because the dependents' information is included only because their existence would affect the benefit of the employees. The discriminator of a weak entity set is a set of attributes that distinguish among all the entities in the weak entity set. The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set's discriminator. A portion of an ERD involving a weak entity set "dependent" is depicted in Figure 6.2, where "dependent" is a weak entity set. The discriminator of this weak entity set is "d-first-name," and the primary key of the set "dependent" is the combination of e-id and d-first-name.
Figure 6.2 An ERD with a weak entity set
6.2.5 DESIGN ISSUES IN ER MODELING
The following is the general process of developing an ERD:
• Obtain data requirements;
• Entity sets designation;
• Relationship sets designation (refinement of entity sets designation) Designing an ER diagram could be tricky. The following are some issues that need to be considered.
• Use an attibute or an entity set to represent an object?
(e-id, e-first-name, e-last-name) (p-id, dept-name) [employee] --- <participate> --- [project] || (start-time)
<<has>> || [[dependent]] (d-first-name)
• Use an entity set or a relationship set to represent a real world concept? (Note: The notions of an entity set and a relationship set are not precise.)
• Use binary relationship set or n-ary relationship set?
• Use a strong or weak entity set?
• Use extended ER features?
Readers are referred to [Elmasri and Navathe 1994] for a more detailed discussion on ER modeling, including a description of the Enhanced-ER model.
6.2.6 MAPPING ER DIAGRAMS INTO RELATIONS
ERDs can be converted to a form closely related to predicate logic, which is a relation. The general steps needed for converting an ERD to tabular format are stated below (an example can be found in Section 6.2.8). After the ERD is converted to the table format, relational database techniques (as described in previous chapters) can then be used.
• For strong entity set E:
We represent E by a table with distinct columns; each column corresponds to one of the attributes of E. Each row corresponds to an entity of the entity set.
• For weak entity set A owned by strong entity set B:
We represent it by a table with distinct columns; each column corresponds to one of the attributes of A or attributes of the primary key of B.
• For relationship set R (R does not link a weak entity set to its owner strong entity set):
We represent it by a table with distinct columns; each column corresponds to one of the attributes in primary keys of associated entity sets or R's own descriptive attributes.
• For many-to-one relationship:
For a N:1 relationship set R from entity set A to entity set B, if there is an existence dependency of A on B, combine the tables A and R.
• For multivalued attribute M:
We create a table T with a column C that corresponds to M and columns corresponding to the primary key of the entity set or relationship set of which M is an attribute.
6.2.7 KEYS IN CONVERTED TABLES
Important to relational database design is the concept of key. We have already discussed various issues related to keys in Chapter 4. When the ER approach is used, primary keys and foreign keys in the converted tables can be determined from the primary keys in the corresponding constructs (entity sets or relationship sets) in the ERD. Examples for the following definitions can be found in Section 6.2.8.
• Entity relation: The primary key of the entity set in ERD becomes the primary key of the entity relation.
• Relationship relation: The union of the primary keys of the related entity sets becomes a superkey of the relation. (Note here union refers to put together all the attributes in both primary keys.)
Foreign key: An attribute in a relation is a primary key of another relation.
6.2.8 AN EXAMPLE: A BANKING ENTERPRISE
6.2.8.1 Data requirements
We use an example to illustrate ER modeling and its conversion to table format. The example used here is a banking enterprise, which is similar to the one discussed in some other books (such as the one in [Silberschatz, Korth and Sudarshan 1997]). The most important feature of our treatment lies in the common treatment of knowledge and data. Since a relationship resembles a predicate, it would be beneficial to use a verb or a noun or an adjective as the name of a relationship. Also we will use the linear format to represent an ERD.
The data requirements for a banking enterprise are described below. The bank is organized into branches. Each branch is located in a particular city, and identified by a unique name. The assets of each branch are recorded. The bank offers accounts to customers. For each customer, his or her name, city and street information is recorded. Accounts can be held by more than one customer, and a customer can have more than one account. Each account is assigned a unique account number. The balance of each account is maintained. The bank also provides loans to customers. For each loan, a unique loan number is assigned and the amount of loan is maintained.
6.2.8.2 ER Diagram for banking enterprise
The ER diagram can be constructed as shown in Figure 6.3, with primary keys in entity sets underlined.
Figure 6.3 The banking ER diagram
6.2.8.3 Converting to tables
Applying the conversion method described in the previous section, we obtain the following tables (each represents a relational schema). Note that while each entity set is converted to a table, only those relationship sets which do not involve a 1 to n mapping are converted into tables (otherwise they are
(a-no, bal) (b-name, b-city, assets) [account] <deposit> [branch]
| ↑ <own> <approve> [customer] <borrow> [loan] (c-name, c-addr, c-city) (l-no, amount)
"absorbed" into one of the associated entity tables). Primary keys are underscored.
Account relation: a-no, balance, b-name Branch relation: b-name, b-city, assets Borrow relation: c-name, l-no
Customer relation: c-name, c-addr, c-city Loan relation: l-no, amount, b-name Owns relation: c-name, a-no
Note that relationships "approve" and "deposit" are not converted to tables. Also note that b-name in both Account relation and Loan relation is a foreign key, because it is a primary key in Branch relation.
6.2.9 EXTENDED ER FEATURES AND RELATIONSHIP WITH OBJECT-ORIENTED MODELING
There are some well-known problems of ER modeling which have been used to lead to more advanced modeling techniques, including extended ER models and object-oriented models:
• Specialization: For example, in a university database, both instructors and students are specialization of "persons." Semantically, these two entity sets should have a closer relationship than relationship with other entity sets. The original ER modeling technique does not reflect this.
• Generalization: For instance, in the above example, "persons" is the entity set generalized from instructors and students. Common attributes such as Social Security Number, first name and last name, can be stored in the persons entity set. However, the original ER modeling technique does not support this.
• Aggregation. Another limitation of the ER model is that it is not possible to express relationships among relationships. For example, in a banking database, a customer may be both a depositor and a borrower, but according to the standard ER modeling technique, we have to treat "deposit" and "loan" as two separate activities.
Efforts have been made to enrich the ER model. However, most extensions are much less well known than the original ER modeling approach. In addition, these extensions have been overshadowed by object-oriented data modeling approaches (which have become increasingly popular). Loosely speaking, an object corresponds to an entity in the ER model. The object- oriented paradigm is based on encapsulating data and code related to an object into a single unit. Conceptually, an object communicates with the rest of the system by sending messages to invoke various methods. Because of this feature, object-oriented data modeling has a behavior part (in addition to the conventional structure part), with inheritance a strength of this kind of modeling. We will take a look at the issue of inheritance in Section 6.6 when we discuss frame systems.
One criticism of the ER approach is its lack of the behavior part (which is supported by object-oriented modeling approaches). Nevertheless, the
simplicity of ER modeling makes it continuously a favorable tool in many applications.