• No results found

Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations

N/A
N/A
Protected

Academic year: 2021

Share "Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations"

Copied!
9
0
0

Loading.... (view fulltext now)

Full text

(1)

Topics for this week:

1. Good Design

2. Functional Dependencies 3. Normalization

Readings for this week:

1. E&N, Ch. 10.1-10.6; 12.2 2. Quickstart, Ch. 3

3. Complete the tutorial at http://sqlcourse2.com/

In-class Activity:

1. Review components of DBMS, with an eye to viewing the whole process as systematic activities:

conceptual, logical, and physical design.

2. Demonstrate by creating a small conceptual and logical model then translating this into part of the physical model - creating the database and tables using MySQL’s terminal window. is demonstrates, too, the DDL.

3. Demonstrate a few commands issued against the data (DML).

4. Review some of your ER questions.

Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations No doubt, you’ve learned already some of the concepts & terminology of relational database modeling.

To get up to speed with what you’ve covered, here are some quick notes that may help us harmonize our perspectives.

Components of a DBMS:

Many roles and activities are involved; not all of which you may have encountered. Here’s a view of the components of a DBMS. Notice the different contributions of programmers, users, and the DBA - the database administrator. Note, too, the functions that constitute a DBMS (and where the DDL and DML fit in). Ultimately the commands and functions must be communicated to the computer system itself - via the file manager, various other access methods and buffers before reaching the actual data

stored in the relationship databases and tables.

Query Processor: transforms queries into a series of low-level instructions directed to the database manager.

Database Manager (DM): Interacts with the user- submitted application programs and queries. e DM accepts queries and examines the external and conceptual schema to determine what conceptual records are required to satisfy the request. e DN places a call to the File Manger to perform the request.

File Manager: manipulates the underlying storage files and manages the allocation of storage space on the disk. Actual physical manipulation of the data is passed to the appropriate access method.

DML preprocessor: converts DML statements

embedded in an application program into a

(2)

LIS458 - Database Management - Week 11 (4/6/09)

2 standard function call in the host language (e.g., MySQL).

DDL compiler: compiler converts DDL statements into a set of tables containing meta-data. ese tables are then stored in the catalog while control info is stored in the data files headers.

Catalog manager: [not the library kind!] - manages access to and maintains the system catalogue; the system catalog is accessed by most of the DBMS components.

ere are other important functions in SQL soware Terms:

Relation: a relation is a table with columns & rows Attribute: an attribute is a named column of a relation

Domain is the set of allowable values for one or more attributes.

e idea of domain is important because it allows the user to define in a central place the meaning and source of values that attributes can hold. [is leads to the data dictionary; example below.]

Tuple: a tuple is a row of a relation.

Degree: the degree of a relation is the number of attributes it contains.

Cardinality: the cardinality of a relation is the number of tuples (rows) it contains. As rows are added or deleted, cardinality changes.

Relational database is a collection or normalized relations.

Example of domains for some branches and staff relations:

Attribute Bno Street Area

Domain Name BRANCH_NO STREET_NAME AREA_NAME

Meaning

The set of all possible branch numbers

Set of all streets in Boston Set of all local area names

Domain definition

characters: size 3, range B1- B99

character: size 25 character: size 2

(3)

LIS458 - Database Management - Week 11 (4/6/09)

3

Alternative Names:

Relation Table File

Tuple Row Record

Attribute Column Field

Now consider the various mathematical combinations from your other readings. ese become important when we consider joins, unions, and other kinds of set operations.

Entity-Relationship Modeling

ree main items: entity types, relationship types, and attributes.

Entity type: an object or concept that is identified by the organization as having an independent existence.

Entity: an instance of an entity type that is uniquely identifiable.

Weak entity type: an entity type that is existence-dependent on some other entity type.

Strong entity type: an entity type that is not existence-dependent on some other entity type.

Example: e Staff and Branch items above are strong entities; if there were data about a staff member’s next-of-kin, that entity is weak: it exists only if a staff member has a next-of-kin.

Attributes are a property of an entity or a relationship type. e Attribute domain is a set of values that may be assigned to that attribute. E.g., Staff->Lname should include only the last names of employees.

Some attributes are simple, others composite. A single attribute is composed of a single component with an independent existence. E.g., cannot be further subdivided (e.g., sex, salary; aka “atomic attributes”).

Composite attributes consist of multiple components; e.g., e “Address attribute” could be divided into multiple attributes each with their own existence (e.g., street, city, state, zip).

Derived attributes are values that are derived from the value of a related attribute or set of attributes, not necessarily in the same entity. For example, paycheck could be derived from hourly_wage *

hours_worked.

Keys

Candidate keys are attributes or sets of attributes that uniquely identify individual occurrences of an entity type. For example, branch number (Bno) is a candidate key for the Branch entity type and has a distinct value for each branch entity. Must be unique for each occurrence (like unique IDs). Primary key is that candidate key selected because of its current and future uniqueness. Composite key is a key

consisting of two or more attributes.

Diagrammatic Representations

Here’s an example of Staff, Branch, and Next_of_Kin entities and attributes represented graphically followed by the composition of keys:

Staff entity composition:

Staff (Staff_No, FName, LName, Address, Tel_No, Sex, DOB, Position, Salary) Primary Key Staff_no

Alternate Key FName, LName, DOB

Composite Attribute Name (FName, LName) Derived Attribute Total_Staff

City Zip and so on...

CITY_NAMES ZIP_CODE

Names of all towns in Boston

Set of zips in Boston

character: size 20 character: size 12

(4)

LIS458 - Database Management - Week 11 (4/6/09)

3

LIS458 - Database Management - Week 11 (4/6/09)

4 Branch entity type composition:

Branch (Branch_No, Street, Area, City, Zip, Tel_No, Fax_No) Primary Key Branch_No

Alternate Key Fax_No

Composite Attribute Address (Street, Area, City, Zip) Multi-valued Attribute Tel_No

Next_of_Kin entity type composition:

Next_of_Kin (NName, Address, Tel_No, Relationship)

Figure: Diagrammatic representation compare this chart to the above compositions.

Other kinds of relationships: Binary, Ternary, Quaternary... (ternary, quaternary are examples of n-ary)

(5)

LIS458 - Database Management - Week 11 (4/6/09)

5

In addition, there’s the issue of recursive relationships and views:

Cardinality constraints: Cardinality is the number of possible relationships for each participating entity. e most common degree for relationships is binary and the cardinality ratio for binary

relationships are one-to-one (1:1), one-to-many (1:M), and many-to-many (M:N).

If the business rule permits it, it is possible to have M:N but with an intervening relationship, e.g., Newspaper [M] ---<advertises> --- [N] Rentals Constraints: Sometimes you can indicate when the relationship participation is mandator and when optional. In addition, you can display the minimum and maximum values (e.g., 5, N) means minimum of 5 instances, no maximum).

Problems with ER Models

ese problems are called connection traps and occur due to a misinterpretation of the meaning of certain relationships. Two main types: fan traps and chasm traps. In general, to identify connection traps, you must ensure that the meaning of a relationship is fully understood and clearly defined. If not, then the representation doesn’t reflect the real world issue.

Fan trap: is where a model represents a relationship between entity types, but the pathway between

certain entity occurrences is ambiguous. Usually where two or more 1:M relationships fan out from the

(6)

LIS458 - Database Management - Week 11 (4/6/09)

5

LIS458 - Database Management - Week 11 (4/6/09)

6 sane entity. Chasm map is where a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences. [ere are many more traps; there is also the Enhanced Entity Relationship Model (EER).

Normalization:

to produce a set of relations with desirable properties, given the data requirements of the organization. A main goal is to remove redundancies (although later you’ll see there’s a time for including redundant data). ere are also many forms of normalization - we’ll look at only 1st, 2nd, and 3rd.

We want to achieve several goals: (1) to be able to identify uniquely through single or composite keys every datum in the table or sets of tables. (2) e purpose is to avoid anomalies: insertion (when data is added but not in all the right places, e.g., adding new staff but not to the branch where they work);

deletion anomalies (same idea but in reverse); modification anomalies (like adding money to your checking account using the ATM but the new balance doesn’t reflect the new cash!).

Part of the technique of normalization is understanding how data might depend on each others.

Functional dependencies: is is a relationship between attributes in a relation. For example, if A and B are attributes of a relation R, B is functionally dependent upon A (denoted A → B), if each value of A is associated with exactly one value of B. E.g., Staff_No -- [Position is functionally dependent on Staff_No]

--> Position. For example: Staff number S100 ---> Manager

Fundational dependency is a property of the meaning or semantics of the attributes in a relation. e semantics indicate how attributes relate to one another, and specify the functional dependencies between them. when a functional dependency is present, the dependency is specified as a constraint between them. For example: Let’s say “B” is dependent upon “A”: if we know the value of A and examine the relationship of the dependency, we find only one value of B in all the rows that have a given value of A.

Let’s say every staff member has a staff ID # and a job position [S100, Librarian]. at would be represented as StaffNo --> Position. If we know the StaffNo (the data in A), we know the job position (Librarian, the data in B). Conversely we cannot know the data in B (job title = Librarian) unless we know A (StaffNo=S100). [Position attribute is functionally dependent upon Staff_No.]

If a person has two job IDs (say login as the department manager or login as the librarian), then then Staff_No is not functionally dependent on Position. [In other words, “Jane” is the manager and the librarian. She has two ID #s (M5 for manager, S100 for librarian). While knowing staff no. gives us Jane’s position, there are many people who are also managers and many who are librarians. Cannot identify uniquely the position. e relationship between Staff_No and Position is 1:1; for each staff number there is only one position. On the other hand, the relationship between Position and Staff_No is 1:M; there are several staff numbers (members of staff) associated with a position.

ere is an example of “functional dependencies of the Staff_Branch relation” [e ER diagram 2 pages back].

Staff_No --> SName, SAddress, Position, Salary, Branch_no, BAddress, Tel_no Branch_No --> B_Address, Tel_No

BAddress -> Branch_No, Tel_No Tel_No --> Branch_No, BAddress

Now, to create candidate key(s) for this relation, what attribute or group of attributes uniquely identify each row?

Process of normalization: first analyze the relations based on their primary key (or candidate keys) and

functional dependencies.

(7)

LIS458 - Database Management - Week 11 (4/6/09)

7 1st Normal Form (1NF): e intersection of each row and column contains one and only one value.

2nd Normal Form (2NF): A relation that is in first normal form and every non-primary key attribute is fully functionally dependent on the primary key. [Full functional dependency indicates that if A and B are attributes or a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A]

3rd NF: A relation that is in first and second normal form and in which no non-primary-key attribute is transitively dependent on the primary key. [Transitive dependence is a condition where A, B, and C are attributes of a relation such that if A→B and B→C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).]

Methodology - Conceptual Database Design.

e logic to creating databases has 3 major components: conceptual database design (to build the conceptual representations of the DB, including identification of important entity and relationships);

logical database design (to translate the conceptual representation to the logical structure of the database, which includes designing the relations); and Physical database design to allow the designer to decide how the logical structure is to be physically implemented (as tables) on the target database management system.

Factors for success in database design:

• Work interactively with users as much as possible (“Joint Application Design” - JAD)

• Follow a structured methodology throughout the data modeling process

• Employ a data-driven approach

• incorporate structural and integrity considerations into the data model

• Combine conceptual, normalization, and transaction validation techniques into the data modeling methods

• Use diagrams to represent as much of the data models as possible

• Use a database design language (DBDL) to represent additional data semantics

• Build a data dictionary to supplement the data model diagrams

• Be willing to repeat these steps!

Overview of the Database Design Methodology Conceptual DB Design

Step 1 Build local conceptual model for each user view 1.1. Identify entity types

1.2. Identify relationship types

1.3. Identify and associate attributes with entity or relationship types 1.4. Determine attribute domains

1.5. Determine candidate and primary key attributes 1.6. Specialize/generalize entity types (optional step) 1.7. Draw ER diagram

1.8. Review local conceptual data model with the user Logical database design for the relationship model

Step 2: Build and validate local logical data model for each user view 1.1. Map local conceptual data model to local logical data model 1.2. Derive relations from local logical data model

1.3. Validate model using normalization

1.4. Validate model against user transactions

(8)

LIS458 - Database Management - Week 11 (4/6/09)

7

LIS458 - Database Management - Week 11 (4/6/09)

8 1.5. Draw ER diagrams

1.6. Define integrity constraints

1.7. Review local logical data model with user Step 3: Build and validate global logical data model

1.1. Merge local logical data models into a global model 1.2. Validate global logical data model

1.3. Check for future growth 1.4. Draw final ER diagram 1.5. Review with users Physical Database Design

Step 4: Translate global local data model for target DBMS 1.1 Design base relations for target DBMS

1.2 Design enterprise constraints for the target DBMS Step 5: Design physical representation

1.1 Analyze transactions 1.2 Choose file organizations 1.3 Choose secondary indices

1.4 Consider the introduction of controlled redundancy 1.5 Estimate disk space requirements

Step 6: Design security mechanisms 1.1 Design user views

1.2 Design access rules

Step 7: Monitor and tune the operational system

* ** * ** *

Focusing on the local logical data model (Step 2 above): e goal is to refine the conceptual model to remove undesirable features.

1. Remove M:N relationships: if a M:N is found, try to decompose this relationship to identify an intermediate entry. e M:N relationship becomes two 1:M relationships to the newly identified entity. For example, consider this M:N “(Many) newspapers <advertise> (many) rentals”. e

<advertise> can be decomposed we identify the “Advert” entity and two new 1:M relationships

(9)

LIS458 - Database Management - Week 11 (4/6/09)

9 (“Lists” and “AdvertisesIn” Advert (see image):

2. Remove complex relationships

3. Remove recursive relationships

4. Remove relationships with attributes

5. Remove multi-valued relationships

6. Re-examine 1:1 relationships

7. Remove redundant relationships.

References

Related documents

However, such local opinion and the popular discourse obscure the fact that while the globally domi- nant narrative of the Maasai Olympics has been appropriated by the outsiders, the

Implementation design is concerned with mapping the conceptual data model (which may be expressed as objects, E-R diagrams, 3NI relations, and so on) into logical database

In addition to the lineshape widths having a noticeable difference for various heat pipe locations, the amplitudes show that the high number density, in

Introduction to database systems, entity-relationship data model, relational model, network model, hierarchical model, formal query languages, commercial query languages,

Uses evidence-based pharmacologic treatments for substance use, addictive ,and co-occurring disorders, including monitoring of patient response and appropriate adjustment

Both of the following games were lost on time; However, the opening moves and middlegame may be of use to those interested in this opening.. After having read a few of the UON, I

09.30—11.00 Visit the city of Volendam and picture of the Summer Tour Group in traditional clothing.. SATERDAY 20 AUGUST SATERDAY 20 AUGUST SATERDAY 20 AUGUST SATERDAY

Outline 1: Introduction 2: Data Climate data: Observation &amp; ERA-40 &amp; CCLM model Wheat phenological data &amp; GDD 3: Statistical methods Correlation &amp; linear