SOLUTIONS TO REVIEW QUESTIONS
2
Solutions to Review Questions and Exercises
Part One Background 4
Chapter 1 Introduction to Databases... 4
Chapter 2 Database Environment ... 6
Chapter 3 The Relational Model... 8
Chapter 4 Database Planning, Design, and Administration ... 12
Part Two Methodology 14 Chapter 5 Entity-Relationship Modeling... 14
Chapter 6 Normalization... 17
Chapter 7 Methodology - Conceptual Database Design ... 22
Chapter 8 Methodology - Logical Database Design for Relational Model... 23
Chapter 9 Methodology - Physical Database Design for Relational DBMSs ... 25
Chapter 10 Conceptual Database Design Methodology - Worked Example ... 27
Chapter 11 Logical Database Design Methodology - Worked Example ... 29
Chapter 12 Physical Database Design Methodology – Worked Example ... 32
Part Three Database Languages 33 Chapter 13 SQL ... 33
Chapter 14 Advanced SQL... 39
Chapter 15 QBE... 46
Part Four Selected Database Issues 47 Chapter 16 Security... 47
Chapter 17 Transaction Management... 49
Chapter 18 Query Processing ... 54
Part Five Current Trends 62 Chapter 19 Distributed DBMSs - Concepts and Design ... 62
Chapter 20 Distributed DBMSs - Advanced Concepts ... 67
Chapter 21 Introduction to Object DBMSs ... 73
Chapter 23 Object-Relational DBMSs ... 84
Part Six Future Trends 89
Chapter 24 Web Technology and DBMSs ... 89 Chapter 25 Data Warehousing... 91 Chapter 26 OLAP and Data Mining... 93
4
Part One Background Chapter 1 Introduction to Databases
Review Questions
1.1 List four examples of database systems other than those listed in Section 1.1.
Some examples could be:
• A system that maintains component part details for a car manufacturer;
• An advertising company keeping details of all clients and adverts placed with them;
• A training company keeping course information and participants' details;
• An organization maintaining all sales order information.
1.2 Discuss each of the following terms:
Data For end users, this constitutes all the different values connected with the various objects/entities that are of concern to them. (See also Section 1.3.3)
Database (See Section 1.3.1)
Database Management System (See Section 1.3.2)
Data Independence This is essentially the separation of underlying file structures from the
programs that operate on them, also called program-data independence. (See also Sections 1.2.2 and 1.3.1)
Security The protection of the database from unauthorized users, which may involve
passwords and access restrictions. (See also Section 1.6)
Integrity The maintenance of the validity and consistency of the database by use of
particular constraints that are applied to the data. (See also Section 1.6)
Views These present only a subset of the database that is of particular interest to a
user. Views can be customized, for example, field names may change, and they also provide a level of security preventing users from seeing certain data. (See also Section 1.3.2)
1.3 Describe the approach taken to the handling of data in the early file-based systems. Discuss the
disadvantages of this approach.
Focus was on applications for which programs would be written, and all the data required would be stored in a file or files owned by the programs. (See also Section 1.2).
Clearly, each program was responsible for only its own data, which could be repeated in other program’s data files. Different programs could be written in different languages, and would not be able to access another program’s files. This would be true even for those programs written in the same language, because a program needs to know the file structure before it can access it. (See also Section 1.2.2).
1.4 Describe the main characteristics of the database approach and contrast it with the file-based
approach.
Focus is now on the data first, and then the applications. The structure of the data is now kept separate from the programs that operate on the data. This is held in the system catalog or data dictionary. Programs can now share data, which is no longer fragmented. There is also a reduction in redundancy, and achievement of program-data independence. (See also Section 1.3)
1.5 Describe the five components of the DBMS environment and discuss how they relate to each
other.
See Section 1.3.3.
Data Administrator See Section 1.4.1
Database Administrator See Section 1.4.1
Logical Database Designer See Section 1.4.2
Physical Database Designer See Section 1.4.2
Application Programmer See Section 1.4.3
End Users See Section 1.4.4
1.7 Discuss the advantages and disadvantages of database processing.
See Section 1.6 Exercises
1.8 Interview some users of database systems. Which DBMS facilities do they find most useful and
why? Which DBMS facilities do they find least useful and why? What do these users perceive to be the advantages and disadvantages of the DBMS?
Select a variety of users for a particular DBMS. If the users are using different DBMSs, group the answers for the different systems, which will give an overall picture of specific systems.
1.9 Write a small program that allows entry and display of renter details including a renter number,
name, address, telephone number, preferred number of rooms and maximum rent. The details should be stored in a file. Enter a few records and display the details. Now repeat this process but rather than writing a special program, use any DBMS that you have access to. What can you conclude from these two approaches?
The program can be written in any appropriate programming language, such as Pascal, FORTRAN, C. It should adhere to basic software engineering principles including being well-structured, modular, and suitably commented. It is important to appreciate the process involved even in developing a small program such as this. The DBMS facilities to structure, store, and retrieve data are used to the same effect. The differences in the approaches, such as the effort involved, potential for extension, ability to share the data should be noted.
1.10 Study the DreamHome case study presented in Section 1.7. In what ways would a DBMS help this
organization? What data can you identify that needs to be represented in the database? What relationships exist between the data? What queries do you think are required?
It may be useful to review the file-based approach and the database approach here before tackling the first part of the exercise. Careful reading and thinking about how people might use the applications should help in carrying out the rest of the exercise.
1.11 Study the Wellmeadows Hospital case study presented in Appendix A. In what ways would a
DBMS help this organization? What data can you identify that needs to be represented in the database? What relationships exist between the data?
6
Chapter 2 Database Environment
Review Questions
2.1 Discuss the concept of data independence and explain its importance in a database environment.
See Section 2.1.5
2.2 To address the issue of data independence, the ANSI-SPARC 3-level architecture was proposed.
Compare and contrast the 3 levels of this model.
See Section 2.1
2.3 What is a data model? Discuss the main types of data models.
An integrated collection of concepts for describing data, relationships between data and constraints on the data in an organization. (See also Section 2.3).
Object-based data models such as the entity-relationship model (see Section 2.3.1). Record-based data models such as the relational data model, network data model, and hierarchical data model (see Section 2.3.2).
2.4 Discuss the function and importance of conceptual modeling.
See Section 2.3.4.
2.5 Describe the types of facilities you would expect to be provided in a multi-user Database
Management System.
Data Storage, Retrieval and Update Authorization Services
A User-Accessible Catalog Support for Data Communication
Transaction Support Integrity Services
Concurrency Control Services Services to Promote Data Independence
Recovery Services Utility Services
See also Section 2.4
2.6 Of the facilities in 2.4, which ones do you think would not be needed in a standalone PC
Database Management System? Provide justification for your answer.
Concurrency Control Services - only single user.
Authorization Services - only single user, but may be needed if different individuals are to use the DBMS at different times.
Utility Services - limited in scope.
Support for Data Communication - only standalone system.
2.7 Describe the main components in a DBMS and suggest which components are responsible for
each facility identified in question 2.5.
Query Processor, DML Preprocessor, Data Storage, Retrieval and
Query Optimizer, Data Manager Update
Dictionary Manager A User-Accessible Catalog
Data Manager Transaction Support
Scheduler Concurrency Control Services
Utilities Recovery Services
Authorization Control Authorization Services
Utilities Support for Data Communication
Integrity Checker Integrity Services
Database Manager, DDL Compiler, File Manager
Utility Services See also Sections 2.5 and 2.4.
2.8 What is meant by the term "client-server architecture" and what are the advantages of this
approach? Compare the client-server architecture with two other architectures.
The client is a process that requires some resource, and the server provides the resource. Neither need reside on the same machine. Advantages include:
• Better performance
• Likely reduction in hardware costs
• Reduction in communication costs
• Better consistency See also Section 2.6.
2.9 Discuss the function and importance of the data dictionary.
See Section 2.7 Exercises
2.10 Analyze the DBMSs that you are currently using. Determine each system’s compliance with the
functions that we would expect to be provided by a DBMS. What types of languages does each system provide? What type of architecture does each DBMS use? Check the accessibility and extensibility of the data dictionary. Is it possible to export the data dictionary to another system?
To do this you will need to obtain appropriate information about each system. There should be manuals available or possibly someone in charge of each system who could supply information.
2.11 Write a program that stores names and telephone numbers in a database. Write another program
that stores names and addresses in a database. Modify the programs to use external, conceptual, and internal schemas. What are the advantages and disadvantages of this modification?
The programs can be written in any suitable language and should be well structured and appropriately commented. Two distinct files result. The structures can be combined into one containing name, address, and tel_no, which can be the representation of both the internal and conceptual schemas. The conceptual schema should be created separately with a routine to map the conceptual to the internal schema. The two external schemas also must be created separately with routines to map the data between the external and the conceptual schema. The two programs should then use the appropriate external schema and routines.
2.12 Write a program that stores names and dates of birth in a database. Extend the program so that it
stores the format of the data in the database; in other words, create a data dictionary. Provide an interface that makes this data dictionary accessible to external users.
Again, the program can be written in any suitable language. It should then be modified to add the data format to the original file. This should not be difficult, if the original program is well structured. The interface for other users operates on the data dictionary and is separate from the original program. A menu-based interface is adequate.
2.13 How would you modify this program to conform to a client-server architecture? What would be
the advantages and disadvantages of this modification?
The server should hold the data dictionary and the programs that operate on it. The user interface should be separate, on the client, and call the data dictionary programs.
8
Chapter 3 The Relational Model
Review Questions
3.1 Discuss each of the following concepts in the context of the relational data model:
(a) Relation
(b) Attribute
(c) Tuple
(d) Intension and Extension
(e) Degree and Cardinality.
Each term defined in Section 3.2.1.
3.2 Discuss the differences between the candidate keys and the primary key of a relation. Explain
what is meant by a foreign key. How do foreign keys of relations relate to candidate keys?
The primary key is the candidate key that is selected to identify tuples uniquely within a relation. A foreign key is an attribute or set of attributes within one relation that matches the candidate key of some (possibly the same) relation.
3.3 Define the two principal integrity rules for the relational model. Discuss why it is desirable to
enforce these rules.
Two rules are Entity Integrity (Section 3.3.2) and Referential Integrity (Section 3.3.3).
3.4 Define the five basic relational algebra operations. Define the remaining three relational algebra
operations in terms of the five basic operations.
Five basic operations are:
• Selection and Projection (Unary)
• Cartesian Product, Union and Set Difference (Binary). There is also the Join, Intersection and Division operations:
• Can rewrite θ-Join in terms of the basic selection and Cartesian product operations:
R F S = σF(R × S)
• Can express the intersection operator in terms of the set difference operation: R ∩ S = R - (R - S)
• Can express the division operator in terms of the basic operations: T1 = ΠC(R)
T2 = ΠC( (S x T1) - R) T = T1 - T2
3.5 What is a view? Discuss the difference between a view and a base relation. Explain what happens
when a user accesses a database through a view.
View is the dynamic result of one or more relational operations operating on the base relations to produce another relation. Base relation exists as a set of data in the database. A view does not contain any data, rather a view is defined as a query on one or more base relations and a query on the view is translated into a query on the associated base relations.
Exercises
The following tables form part of a database held in a relational DBMS:-Hotel (Hotel_No, Name, Address)
Room (Room_No, Hotel_No, Type, Price)
Booking (Hotel_No, Guest_No, Date_From, Date_To, Room_No) Guest (Guest_No, Name, Address)
where Hotel contains hotel details and Hotel_No is the primary key
Room contains room details for each hotel and (Hotel_No, Room_No) forms the primary key Booking contains details of the bookings and the primary key comprises (Hotel_No, Guest_No,
and Date_From)
and Guest contains guest details and Guest_No is the primary key.
3.6 Generate the relational algebra for the following queries:
(a) List all hotels.
HOTEL
(b) List all single rooms with a price below £20 per night.
σtype='S' AND price < 20(ROOM)
(c) List the names and addresses of all guests.
Πname, address(GUEST)
(d) List the price and type of all rooms at the Grosvenor Hotel.
Πprice, type(ROOM hotel_no (σname='Grosvenor Hotel'(HOTEL)))
(e) List all guests currently staying at the Grosvenor Hotel.
GUEST guest_no (σdate_from <= '01-01-99' AND date_to >= '01-01-99' ( BOOKING hotel_no (σname='Grosvenor Hotel'(HOTEL)))) (substitute '01-01-99' for today’s date).
(f) List the details of all rooms at the Grosvenor Hotel, including the name of the guest
staying in the room, if the room is occupied.
(ROOM hotel_no (σname='Grosvenor Hotel'(HOTEL)) // Outer Join
Πguest.name, hotel.hotel_no, room.room_no(
(GUEST guest_no (σdate_from <= '01-01-99' AND date_to >= '01-01-99' ( BOOKING hotel_no (σname='Grosvenor Hotel'(HOTEL)))) (substitute '01-01-99' for today’s date).
(g) List the guest details (Guest_No, Name and Address) of all guests staying at the
Grosvenor Hotel.
Πguest_no, name, address(GUEST guest_no (σdate_from <= '01-01-99' AND date_to >= '01-01-99' ( BOOKING hotel_no (σname='Grosvenor Hotel'(HOTEL)))))
10
3.7 Using relational algebra, create a view of all rooms in the Grosvenor Hotel, excluding price
details. What would be the advantages of this view?
Πroom_no, hotel_no, type(ROOM hotel_no (σname='Grosvenor Hotel'(HOTEL))) Security - hides the price details from people who should not see it.
Reduced complexity - a query against this view is simpler than a query against the two underlying base relations.
3.8 Produce the equivalent tuple and domain relational calculus statements for the above queries.
Tuple Relational Calculus
(a) RANGE OF H IS HOTEL {H}
(b) RANGE OF R IS ROOM
{R | R.Type = 'S' AND R.Price < 20} (c) RANGE OF G IS GUEST
{G.Name, G.Address} (d) RANGE OF H IS HOTEL
RANGE OF R IS ROOM
{R.Price, R.Type | ∃H (R.Hotel_No = H.Hotel_No AND H.Name = 'Grosvenor Hotel')}
(e) RANGE OF H IS HOTEL RANGE OF G IS GUEST RANGE OF B IS BOOKING
{G | B ((B.Date_From <= '01-01-99' AND
B.Date_To >= '01-01-99') AND (B.Guest_No = G.Guest_No) AND
∃H (B.Hotel_No = H.Hotel_No AND H.Name = 'Grosvenor Hotel'))}
(f) Need to use Union of a relation containing all Rooms that are occupied with a relation extended by a NULL name for all unoccupied rooms.
(g) RANGE OF H IS HOTEL RANGE OF G IS GUEST RANGE OF B IS BOOKING
{G.Guest_No, G.Name, G.Address | B((B.Date_From <= '01-01-99' AND B.Date_To >= '01-01-99') AND
(B.Guest_No = G.Guest_No) AND
∃H (B.Hotel_No = H.Hotel_No AND H.Name = 'Grosvenor Hotel'))} Domain Relational Calculus
(a) {Hotel_No, Name, Address | ∃Hotel_No, Name, Address (HOTEL(Hotel_No, Name, Address)}
(b) {Room_No, Hotel_No, Type, Price | ∃Room_No, Hotel_No, Type, Price (ROOM(Room_No, Hotel_No, Type, Price) AND
Type = 'S' AND Price < 20)}
(c) {Name, Address | ∃Name, Address (GUEST(Name, Address)} (d) {Price, Type | ∃Room_No, Type, Price, Hotel_No
(ROOM(Hotel_No, Type, Price) AND
∃Name (HOTEL(Hotel_No, Name) AND Name = 'Grosvenor Hotel')}
(e) {Guest_No, Guest.Name, Guest.Address | ∃Guest_No, Name, Address, (GUEST(Guest_No, Name, Address) AND
∃Date_From, Date_To, Hotel_No
(BOOKING(Hotel_No, Guest_No, Date_From, Date_To) AND Date_From <= '01-01-99' AND Date_To >= '01-01-99') AND
∃Name (HOTEL(Hotel_No, Name) AND Name = 'Grosvenor Hotel')))}
(f) Rest similar to above.
3.9 Explain how the entity and referential integrity rules apply to these relations.
For each relation, the primary key must not contain any nulls.
Room is related to Hotel through the attribute Hotel_No. Therefore, the Hotel_No in Room should either be null or contain the number of an existing hotel in the Hotel relation. In this case study, it would probably be unacceptable to have a Hotel_No in Room with a null value.
Booking is related to Hotel through the attribute Hotel_No. Therefore, the Hotel_No in Booking should either be null or contain the number of an existing hotel in the Hotel relation. However, because Hotel_No is also part of the primary key, a null value for this attribute would be unacceptable. Similarly for Guest_No. Booking is also related to Room through the attribute Room_No.
3.10 Analyze the RDBMSs that you are currently using. Determine the support the system provides for
primary keys, alternate keys, foreign keys, relational integrity and views. What types of relational languages does the system provide? For each of the languages provided, what are the equivalent operations for the eight relational algebra operations?
12
Chapter 4 Database Planning, Design, and Administration
Review Questions
4.1 Discuss the relationship between the information system lifecycle and the database application
lifecycle.
See Sections 4.1 and 4.2.
4.2 Describe the purpose of each of the stage of the database application lifecycle.
See Section 4.2.
4.3 Identify some of the techniques available to help document the users' requirements specification.
See Section 4.2.3.
4.4 Describe the main aims of the conceptual and logical database design phases.
The aims of conceptual database design are described in Section 4.3.2 and the aims of logical database design are described in Section 4.3.3.
4.5 Explain why it is necessary to select the target database management system before commencing
with the physical database design phase. Describe the main aims of physical database design.
Physical database design is tailored to a specific DBMS, therefore it is essential the DBMS is determined before the physical design phase can begin. The aims of physical database design are described in Section 4.3.4
4.6 Describe the prototype approach and identify the potential advantages of using this approach.
See Section 4.2.7.
4.7 Outline the procedure for selecting a DBMS.
See Section 4.6.
4.8 Define the purpose and tasks associated with data administration and database administration.
See Section 4.7. Exercises
4.10 Assume that you are responsible for selecting a new DBMS product for a group of users in your
organization. To undertake this exercise, you must first establish a set of requirements for the group and then identify a set of features that a DBMS product must provide to fulfill the requirements. Describe the process of evaluating and selecting the best DBMS product.
The student should follow the approach to DBMS selection described in Section 4.6 and produce a report that identifies a suitable DBMS product that meets the requirements of the organization. The selection should be fully justified.
4.11 Assume that you are responsible for selecting a DBMS product for the Wellmeadows Hospital
case study. Describe the process of evaluating and selecting the best DBMS product.
The student should follow the approach to DBMS selection described in Section 4.6 and produce a report that identifies a suitable DBMS product that meets the requirements of the
Wellmeadows Hospital case study. The selection should be fully justified and any assumptions
made about the case study should be highlighted..
4.12 Investigate whether data administration and database administration exists as distinct
functional areas within your organization. If identified, describe the organization, responsibilities, and tasks associated with each functional area.
The student should investigate the organization and identify whether data administration and database administration exists as distinct functional areas. However, the student should be careful to note that these functions may be named differently, merged as a single function or included as part of a larger IT/IS function. If identified, the student should compile a report documenting the organization, responsibilities, and tasks.
Doctor Clinic
NOK
Supply OPClinic Appointmt InPatient
Patient
Item Drug Medication
Staff Ward Bed
M 1 1 1 1 1 1 1 Supplier Requisitio M M M M M M M M M M M M M 1 M M M M M N M M 1 1 1 1 1 1 1 1 M Hospital Supplies Ward Management Patient Management d M N Attend Refers RelatedT Takes Form I Requires Has WorksIn WorksIn Is GoesT Examine Has Makes For AssignedTo Receives Provides 1
14
Part Two Methodology Chapter 5 Entity-Relationship Modeling
Review Questions
5.1 Describe the purpose of high-level data models in database design.
The main purpose for developing a high-level data model is to support a user’s perception of data and to conceal the more technical aspects associated with database design. Furthermore, a conceptual data model is independent of the particular DBMS and hardware platform that is used to implement the database.
5.2 Describe the basic concepts of the Entity-Relationship (ER) model. Present the diagrammatic
representation of these concepts.
The basic concepts of the Entity-Relationship model include entity types, relationship types, and attributes.
See Section 5.1
5.3 Describe the constraints that may be placed on participating entities in a relationship.
There are two main types of restrictions on relationships called cardinality and participation constraints.
See Section 5.2
5.4 Describe the problems that may occur when creating an ER model.
Problems may occur due to the misinterpretation of the meaning of certain relationships and these problems are referred to as connection traps. There are two main types of connection traps called fan traps and chasm traps.
See Section 5.3
5.6 Describe the main concepts associated with the Enhanced Entity-Relationship model. Present the
diagrammatic representation of these concepts.
The main concepts of the EER model are specialization/generalization, aggregation and categorization. However, in the accompanying textbook, we focus only on specialization/ generalization and categorization, which is associated with the related concepts of entity types described as superclasses or subclasses and the process of attribute inheritance.
See Section 5.4 Exercises
The University Accommodation Office Case Study
The Director of the University Accommodation Office requires you to design a database to assist with the administration of the office. The requirements collection and analysis phase of the database design process based on the Director’s view has provided the following requirements specification for the Accommodation Office database.
(1) The data stored on each full-time student includes the matriculation number, name (first and last name), home address (street, city/town, postcode), date of birth, sex, category of student (for example, first year undergraduate (1UG), postgraduate (PG)), nationality, smoker (yes or no), special needs, any additional comments, current status (placed/waiting), and what course the student is studying on. The student information stored relates to those currently renting a room
and those on the waiting list. Students may rent a room in a university owned hall of residence or student flat. When a student joins the University he or she is assigned to a member of staff who acts as his or her Advisor of Studies. The Advisor of Studies is responsible for monitoring the student’s welfare and academic progress. The data held on a student’s Advisor includes their full name, position, name of department, internal telephone number, and room number.
(2) Each hall of residence has a name, address, telephone number, and a hall manager who supervises the operation of the hall. The halls provide only single rooms, which have a room number, place number, and monthly rent rate. The place number uniquely identifies each room in all the halls controlled by the Accommodation Office and is used when renting a room to a student.
(3) The Accommodation Office also offers student flats. These flats are fully furnished and provide single room accommodation for groups of 3, 4, or 5 students. The information held on student flats includes a flat number, address, and the number of single bedrooms available in each flat. The flat number uniquely identifies each flat. Each bedroom in a flat has a monthly rent rate, a room number, and a place number. The place number uniquely identifies each room available in all student flats and is used when renting a room to a student.
(4) A student may rent a room in a hall or student flat for various periods of time. New lease agreements are negotiated at the start of each academic year with a minimum rental period of one semester (15 weeks) and a maximum rental period of one year, which includes Semesters 1, 2, and the Summer Semester. Each individual lease agreement between a student and the Accommodation Office is uniquely identified using a lease number. The data stored on each lease includes the lease number, duration of the lease (given as semesters), name, and matriculation number of the student, place number, room number, address details of the hall or student flat, the date the student wishes to enter the room, and the date the student wishes to leave the room (if known).
(5) Student flats are inspected by staff on a regular basis to ensure that the accommodation is well maintained. The information recorded for each inspection is the name of the member of staff who carried out the inspection, the date of inspection, an indication of whether the property was found to be in a satisfactory condition (yes or no), and any additional comments.
(6) Some information is also held on members of staff on the Accommodation Office and includes the staff number, name (first and last name), home address (street, city/town, postcode), date of birth, sex, position (for example, Hall Manager, Administrative Assistant, Cleaner), and location (for example, Accommodation Office or Hall).
(7) The Accommodation Office also stores a limited amount of information on the courses run by the University including the course number, course title (including year), course leader’s name, internal telephone number, and room number, and department name. Each student is associated with a single course.
(8) Whenever possible, information on a student’s next-of-kin is stored which includes the name, relationship, address (street, city/town, postcode), and contact telephone number.
5.6 Create an Enhanced Entity–Relationship (EER) model to represent the data requirements of the
University Accommodation Office case study. Develop the model using the following the steps:
(a) Identify entity types.
(b) Identify relationship types and determine the cardinality and participation constraints of the relationships.
(c) Identify attributes and associate attributes with entity or relationship types. (d) Determine candidate and primary key attributes.
(e) Specialize / generalize entity types (where appropriate). (f) Categorize entity types (where appropriate).
(g) Draw the EER diagram.
State any assumptions you made when creating the EER model.
An example of a possible ER model for the University Accommodation Office case study is shown below.
16
The student should identify any ambiguities in the specification for the University Accommodation Office and state clearly his or her assumptions. For example, we may assume that we only hold information on a single next-of-kin per student and that some students do not provide this information.
Lease Matric Course Reques Studi Fo Related Run Advis AcdStaff Place Provid HallN FlatN Manag Works O IDat Located StaffN Lease Studen Cours Academi Staff NOK Room Accomm adatio Hall Flat Accom Staff Undertak Office d M M 1 1 M 1 M Inspecti 1 1 1 1 M 1 M 1 1 M M M 1 1 1 1 M ∩ ∩
Chapter 6 Normalization
Review Questions
6.1 Describe the purpose of normalizing data.
When we design a database for a relational system, the main objective in developing a logical data model is to create an accurate representation of the data, its relationships and constraints. To achieve this objective, we must identify a suitable set of relations. A technique that we can use to help identify such relations is called normalization. Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise. Normalization supports database designers by presenting a series of tests, which can be applied to individual relations so that a relational schema can be normalized to a specific form to prevent the possible occurrence of update anomalies.
See also Sections 6.1 and 6.4.
6.2 Describe the problems that are associated with redundant data.
A major aim of relational database design is to group attributes into relations so as to minimize information redundancy and thereby reduce the file storage space required by the base relations. Another serious difficulty using relations that have redundant information is the problem of update anomalies. These can be classified as insertion, deletion, or modification anomalies. See Section 6.2
6.3 Describe the concept of functional dependency.
Functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B), if each value of A in R is associated with exactly one value of B in R.
Functional dependency is a property of the meaning or semantics of the attributes in a relation. The semantics indicate how the attributes relate to one another and specify the functional dependencies between attributes. When a functional dependency is present, the dependency is specified as a constraint between the attributes.
See also Section 6.3.
6.4 How is the concept of functional dependency associated with the process of normalization?
Normalization is a formal technique for analyzing relations based on their primary key (or candidate keys in the case of BCNF) and functional dependencies. Normalization is often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form. Three normal forms were initially proposed, which are called first (1NF), second (2NF) and third (3NF) normal form. Subsequently, a stronger definition of third normal form was introduced and is referred to as Boyce-Codd normal form (BCNF). All of these normal forms are based on the functional dependencies among the attributes of a relation.
6.5 Provide a definition for first, second, third and Boyce-Codd normal forms.
First Normal Form (1NF) is a relation in which the intersection of each row and column contains one and only one value.
Second Normal Form (2NF) is a relation that is in first normal form and every non-primary-key attribute is fully functionally dependent on the primary key.
Third Normal Form (3NF) is a relation that is in first and second normal form in which no non-primary-key attribute is transitively dependent on the primary key.
18
Boyce-Codd Normal Form (BCNF) is a relation in which every determinant is a candidate key. See also Sections 6.5 to 6.8.
6.6 Describe the purpose of fourth (4NF) and fifth (5NF) normal form. For 4NF see Section 6.11 and for 5NF see Section 6.12.
Exercises
The table shown in Figure 6.25 lists dentist/patient appointment data. A patient is given an appointment at a specific time and date with a dentist located at a particular surgery. On each day of patient appointments, a dentist is allocated to a specific surgery for that day.
StaffNo DentistName PatNo PatName Appointment
Date Time
SurgeryNo
S1011 Tony Smith P100 Gillian White 12/9/95 10.00 S15
S1011 Tony Smith P105 Jill Bell 12/9/95 12.00 S15
S1024 Helen Pearson P108 Ian MacKay 12/9/95 10.00 S10
S1024 Helen Pearson P108 Ian MacKay 14/9/95 14.00 S10
S1032 Robin Plevin P105 Jill Bell 14/9/95 16.30 S15
S1032 Robin Plevin P110 John Walker 15/9/95 18.00 S13
Figure 6.25: Lists Dentist/Patient Appointment Data.
6.7 The table shown in Figure 6.25 is susceptible to update anomalies. Provide examples of insertion,
deletion, and update anomalies.
The student should provide examples of insertion, deletion and update anomalies using the data shown in the table. An example of a deletion anomaly is if we delete the details of the dentist called 'Helen Pearson', we also lose the appointment details of the patient called 'Ian MacKay'.
6.7 Describe and illustrate the process of normalizing the table shown in Figure 6.25 to Boyce-Codd
normal forms. State any assumptions you make about the data shown in this table.
The student should state any assumptions made about the data shown in the table. For example, we may assume that a patient is registered at only one surgery. Also, a patient may have more than one appointment on a given day.
StaffNo ADate ATime StaffNo DName PK 3NF / BCNF fd1 fd2 StaffNo ADate fd4 FK FK PK PatNo SurgeryNo fd5 FK PK
StaffNo ADate ATime DName PatNo PName
PK fd2 SurgeryNo fd4 fd2 and fd4 violates 2NF 2NF StaffNo ADate Fd3’ violates 3NF
SurgeryNo StaffNo DName
fd1
fd3
StaffNo ADate ATime PatNo PName
fd5
PatNo PName
PK
20
An agency called Instant Cover supplies part-time/temporary staff to hotels within Strathclyde Region. The table shown in Figure 6.26 lists the time spent by agency staff working at various hotels. The National Insurance Number (NIN) is unique for every member of staff.
NIN ContractNo Hours EName H_No H_Loc
1135 C1024 16 Smith J H25 East Kilbride
1057 C1024 24 Hocine D H25 East Kilbride
1068 C1025 28 White T H4 Glasgow
1135 C1025 15 Smith J H4 Glasgow
Figure 6.26: Instant Cover’s Contracts.
6.8 The table shown in Figure 6.26 is susceptible to update anomalies. Provide examples of insertion,
deletion, and update anomalies.
The student should provide examples of insertion, deletion and update anomalies using the data shown in the table. An example of an update anomaly is if we wish to change the name of the employee called 'Smith J', we may only change the entry in the first row and not the last with the result that the database becomes inconsistent.
6.9 Describe and illustrate the process of normalizing the table shown in Figure 6.26 to Boyce-Codd
Normal Form. State any assumptions you make about the data shown in this table.
The student should state any assumptions made about the data shown in the table. For example, we may assume that a hotel may be associated with one or more contracts.
NIN Contract_No Hours EName H_No H_Loc
NIN Contract_No Hours
NIN EName
NIN Contract_No Hours Contract_No H_No
H_No H_ Loc PK PK FK PK PK fd1 fd2 fd3 fd4 fd2 and fd3 violates 2NF 2NF 3NF / BCNF fd1 fd3 fd4
Contract_No H_No H_Loc
fd4 violates 3NF FK FK NIN EName PK fd2
22
Chapter 7 Methodology - Conceptual Database Design
Review Questions
7.1 Describe the purpose of a design methodology.
A structured approach that uses procedures, techniques, tools, and documentation aids, to support and facilitate the process of design. A design methodology consists of phases that contain steps, which guide the designer in the choice of techniques that are appropriate at each stage of the project and also helps to plan, manage, control and evaluate database development projects. Furthermore, it is a structured approach for analyzing and modeling a set of requirements for a database in a standardized and organized manner.
See Section 7.1.
7.2 Describe the main phases involved database design.
See Section 7.1.2
7.3 Identify important factors in the success of logical database design.
See Section 7.1.3
7.4 Discuss the important role played by users in the process of database design.
The users' involvement throughout the database design phase is critical to providing the 'correct' system. In particular, the user should clarify any ambiguities in the specification that describes the required system and also review continually the development of the database design. The process of developing the database design is repeated until the user is prepared to 'sign-off' the design as being a 'true' representation of the part of the enterprise that is being modeling.
See Sections 7.1.3 and 7.2.
7.5 Describe the main objective of conceptual database design.
See Section 7.3 Step 1.
7.6 Describe what a user view represents and the approaches that may be used to identify user
views.
User views can be identified using various methods. First, by examining the data flow diagrams that should have been produced previously to identify functional areas and possibly individual functions. Alternatively, by interviewing users, examining procedures, reports, forms, and/or observing the enterprise in operation.
See also Section 7.3
7.7 Identify the main tasks associated with conceptual database design.
See Section 7.3
7.8 Discuss the purpose of specialization/generalization of entity types, and discuss why this is an
optional step in conceptual database design.
See Section 7.3 Step 1.6.
7.9 Identify and describe the purpose of the documentation generated during conceptual database
design.
Chapter 8 Methodology - Logical Database Design for Relational Model
Review Questions
8.1 Identify the three main phases of database design and discuss the purpose of logical database
design.
The three main phases of database design are conceptual, logical, and physical. For purpose of logical database design see Section 8.1.
8.2 Describe the steps involved in refining a conceptual data model into a logical data model.
See Section 8.1 Step 2.
8.3 Describe the rules for deriving relations that represent strong entity types, weak entity types,
one-to-one binary relationship types, one-to-many relationship types, multi-valued attributes, and superclass/subclass relationships.
See Section 8.1 Step 2.2.
8.4 Discuss how the technique of normalization can be used to validate the logical data model and
the relations derived from the model.
The logical data model can be validated using the technique of nomalization and against the transactions that the model is required to support. Normalization is used to improve the model so that it satisfies various constraints that avoid unnecessary duplication of data. Normalization ensures that the resultant model is a closer model of the enterprise that it serves, it is consistent, and has minimal redundancy and maximum stability.
See also Section 8.1 Step 2.3.
8.5 Discuss two approaches that can be used to validate that the logical data model is capable of
supporting the transactions required by the user’s view.
Two possible approaches to ensure that the logical data model supports the required transactions include: checking that all the information (entities, relationships and their attributes) required by each transaction is provided by the model in documenting a description of each transaction’s requirements, and diagrammatically representing the pathway taken by each transaction directly on the ER diagram.
See also Section 8.1 Step 2.4.
8.6 Describe the purpose of integrity constraints and identify the five main types of constraints. The main types of constraints include: There are five types of integrity constraints: required data, attribute domain constraints, entity integrity, referential integrity, and enterprise constraints.
See also Section 8.1 Step 2.6.
8.7 Describe the alternative strategies that can be applied if there exists a child occurrence
referencing a parent occurrence that we wish to delete.
There are several strategies to consider when there exists a child occurrence referencing the parent occurrence that we are attempting to delete: NO ACTION, CASCADE, SET NULL, SET DEFAULT, and NO CHECK.
See also Section 8.1 Step 2.6.
8.8 Identify the tasks typically associated with merging local logical data models into a global
24 See Section 8.1 Step 3 and in particular Step 3.1.
Chapter 9 Methodology - Physical Database Design for Relational DBMSs
Review Questions
9.1 Explain the difference between logical and physical database design. Why might these tasks be
carried out by different people?
Logical database design is concerned with building the data model for the organization that is completely independent of any DBMS. Physical database design, however, is concerned with actually defining the data model using the DDL of a particular DBMS. Consequently, logical database design focuses on what the data model represents, whereas physical database design focuses on how the data model is to be implemented. Different skills are required to undertake these design phases, which are often found in different people.
See Section 9.1.
9.2 Describe the inputs and outputs of physical database design.
The inputs are the global logical data model and the data dictionary. The outputs are the base relations, integrity rules, file organization specified, secondary indexes determined, user views and access rules.
See Section 9.3.
9.3 Describe the purpose of the main steps in the physical design methodology presented in this
chapter.
Step 4 Produces a relational database schema from the global logical data model. This includes integrity rules.
Step 5 Determines the file organizations for the base relations. This takes account of the nature of the transactions to be carried out, which also determine where secondary indexes will be of use. As a result of analyzing the transactions, the design may be altered by incorporating controlled redundancy into it.
Step 6 Designs the security measures for the database implementation. This includes designing the user views and the access rules on the relations and views. Step 7 Monitors the database application systems and improves performance by
making amendments to the design as appropriate. See also Section 9.3.
9.4 "One of the main objectives of physical database design is to store data in an efficient way." How
might we measure efficiency in this context?
This can be measured by the number of transactions that can be processed by the system in a given time frame, or by the length of time it takes to complete one transaction, or by the amount of disk storage taken up by the database files.
See also Section 9.3 Step 5.
9.5 Under what circumstances would we want to denormalize a logical data model? Use examples to
illustrate your answer.
Generally, if overall performance needs to be improved, controlled redundancy can be introduced. See also Section 9.3 Step 5.4.
Examples:
If queries on staff always required the branch address, this attribute could be posted into staff. The effect on updating would be minimal if branch data was relatively static, and it removes a join from the query.
26
If the number of staff for each branch was often required with branch details, a derived attribute could be placed in branch. This would remove the need to access and repeatedly count the relevant records in staff. When staff joined or left a branch, this attribute would required updating.
Chapter 10 Conceptual Database Design Methodology - Worked Example
Exercises
The Wellmeadows Hospital case study
10.1 Identify user views for the Medical Director and Charge Nurse in the Wellmeadows Hospital case
study, described in Appendix A.
See Appendix A
10.2 List the users' requirements specification for each of these views.
The student should examine the case study in detail and identify and compile a user specification for the Medical Director’s and the Charge Nurse’s views. As an additional task the student may also compile a specification for the Personnel Officer’s view.
Of course to complete this exercise will require that the student makes some ascertions about the precise requirements of each user view. Any assumption should be documented along with each view.
Examples of specification for the Medical Director’s and Charge Nurse’s views are shown below. Medical Director
The Director is responsible for the overall management of the hospital and must maintain control over the use of resources (including staff, beds, and supplies) in the provision of cost-effective treatment for all patients.
1. The hospital is composed of many wards. Each ward is managed by a Charge Nurse. The information to be held on each ward includes the ward name, number (e.g. W1), phone number, location (e.g. Block E), number of beds and the name of the Charge Nurse.
2. Each ward is allocated staff including (e.g. Charge nurse, senior and junior nurses, doctors, consultants, auxiliaries).
3. The hospital maintains a central stock of surgical (e.g. syringe, bandages) and non-surgical (e.g. plastic bags, aprons). The details of non-surgical and non-non-surgical supplies includes item number and name, item description, quantity in stock, re-order level, cost per unit. The supplies used by each ward is monitored.
The hospital also maintains a stock of pharmaceutical supplies (e.g. antibiotics, pain killers). The details of pharmaceutical supplies includes drug number and name, description, dosage, quantity in stock, re-order level, cost per unit. The pharmaceutical supplies used by each ward is monitored.
4. The details of the suppliers of the surgical, non-surgical and pharmaceutical items are stored. The information stored includes the supplier name and number, address, phone and fax number.
5. Patients are normally referred to the hospital for treatment by their local doctor. The details of local doctors are stored including the their name, clinic number, address and phone number.
6. The details of patients referred to the hospital includes the patient number, name (first and last name), address, phone number, date of birth, martial status, next-of-kin details (name, relationship, address and phone number).
28
7. When a patient is referred by their doctor to attend the hospital, the patient is given an appointment and is examined by a consultant. The details of the appointment are stored including the consultant’s name and number, appointment number, date, time and examination room (e.g. Room E112).
As a result of the examination, the patient is either recommended to attend the outpatient clinic or placed on a waiting list until a bed can be found in a particular ward.
8. The details of outpatients are stored. The information stored includes the patient details as stated earlier (see 6) and the date and time of the appointment at the outpatient clinic. 9. The details of patients currently placed in a ward and those on the waiting list for a place
on a ward are stored. The information stored includes the patient details as stated earlier (see 6) and the date placed on waiting list, ward required, expected duration of stay, date placed in the ward and date left the ward.
Charge Nurse
The Charge Nurse has overall responsibility for the management of a single ward. The Charge Nurse is allocated a budget to run the ward and must ensure that all resources (staff, beds and supplies) are used effectively in the care of patients.
The Charge Nurse and other senior medical staff are responsible for the allocation of beds to patients on the waiting list.
1. The information to be held on each ward includes the details of staff allocated to each ward including the staff number, name, address, phone number, position, number of hours worked per week and shift (e.g. early, late).
2. The information stored on each patient on the waiting list includes the patient number, name (first and last name), address, phone number, date of birth, martial status, next-of-kin details, date placed on waiting list, required ward, date placed in ward, expected duration of stay, date left ward.
3. When a patient enters the ward they are allocated a bed with a unique bed number. Each patient is prescribed medication and the details of this medication includes the patient number, drug number and name, units per day, start and finish date. The medication (pharmaceutical supplies) given to each patient is monitored.
4. Staff are allocated to work in wards, as required. The Charge Nurse of each ward is responsible for creating a staff rota which ensures that the correct complement of staff are on duty for each shift (early, late, night). Nursing staff may be specifically allocated to patients who required specialist care.
5. When required the Charge Nurse may obtain surgical, non-surgical and pharmaceutical supplies from the central stock of supplies held by the hospital. The information to be stored includes the requisition number, staff name and number, ward number, item number (or drug number), quantity required, date ordered and date received.
10.3 Create local conceptual data models for each of the user views. State any assumptions
necessary to support your design.
The student should create local conceptual data models using the requirements specification for the Medical Director and the Charge Nurse. This should include an ER model representing each user view and the supporting documentation that describes the models. Throughout the process of design, the student should clearly state any assumptions necessary to support his or her design.
Chapter 11 Logical Database Design Methodology - Worked Example
Exercises
The Wellmeadows Hospital case study
11.1 Create and validate the local logical data models for each of the user views of the Wellmeadows
Hospital case study identified in Exercise 10.1.
The student should refine the local conceptual models to create local logical data models based on the Medical Director and Charge Nurse views. The logical models representing each user view should also be validated. The student should produce an ER model for each user view and the supporting documentation that describes each model. Throughout the process of design, the student should clearly state any assumptions necessary to support his or her design.
11.2 Merge the local data models to create a global logical data model of the Wellmeadows Hospital
case study. State any assumptions necessary to support your design.
Once the local data models have been validated, the student should demonstrate the view integration approach to create a global logical data model. The student should produce an ER model of the global data model, representing both user views and the supporting documentation that describes the model. Throughout the process of design, the student should clearly state any assumptions necessary to support his or her design.
11.3 Create or update the supporting documentation for the global logical data model of the
Wellmeadows Hospital case study.
An example of an ER model and the relational schema of the global data model of the Wellmeadows Hospital case study is shown below. Note that this answer also includes the Personnel Officer’s view.
30
Ward (WardNo, WName, Location, TotalBeds, TelExtn, CNStaffNo) Primary Key WardNo
Alternate Key TelExtn
Foreign Key CNStaffNo is NOT NULL references Staff(StaffNo) on delete SET DEFAULT on update CASCADE
Staff (StaffNo, FName, LName, Address, TelNo, DOB (Date_of_Birth), Sex, (National Insurance Number) NIN, Position, Salary, SScale, WeekHrs, ContType, TypePay) Primary Key StaffNo
Alternate Key NIN
Qualification (QDate, QType, Institution, StaffNo) Primary Key QType, StaffNo
Foreign Key StaffNo is NOT NULL references Staff(StaffNo) on delete CASCADE on update CASCADE
Work_Experience (SDate, FDate, Position, OrgName, StaffNo) Primary Key OrgName, StaffNo
Foreign Key StaffNo is NOT NULL references Staff(StaffNo) on delete CASCADE on update CASCADE
Staff_Rota (Shift, WeekNo, StaffNo, WardNo) Primary Key StaffNo, WeekNo
Foreign Key StaffNo is NOT NULL references Staff(StaffNo) on delete CASCADE on update CASCADE)
Foreign Key WardNo is NOT NULL references Ward(WardNo) on delete CASCADE on update CASCADE)
Patient (PatNo, FName, LName, Address, TelNo, DOB, Sex, MStatus, DateReg, DocName, ClinicNo, NName, NRelationship, NAddress, NTelNo)
Primary Key PatNo
NOK
Supply OPClinic Appointmt InPatient
Patient
Item Drug Medication
Staff Ward Bed
M 1 1 1 1 1 1 Supplier Requisition M M M M M M M M M M M M 1 M M M M M 1 M M 1 1 1 1 1 1 N 1 M M N Attends Refers RelatedTo Takes Form s In Requires WorksIn WorksIn Is GoesTo Examines Has Makes For AssignedTo Receives Provides 1 Qualificatn WorkE xp Has Has 1 1 M M d
Foreign Key DocName, ClinicNo is NOT NULL references Doctor(DocName, ClinicNo) on delete NO ACTION on update CASCADE
Doctor (DocName, ClinicNo, Address, TelNo) Primary Key DocName, ClinicNo
Appointment (AppNo, PatNo, ConsStaffNo, ADate, ATime, RoomNo) Primary Key AppNo
Foreign Key PatNo is NOT NULL references Patient(PatNo) on delete NO ACTION on update CASCADE)
Foreign Key ConStaffNo is NOT NULL references Staff(StaffNo) on delete NO ACTION on update CASCADE)
OutPatient_Appointment (OutPatDate, OutPatTime, PatNo) Primary Key OutPatDate, PatNo
Foreign Key PatNo is NOT NULL references Patient(PatNo) on delete CASCADE on update CASCADE)
InPatient_Allocation (ListDate, WardReq, Duration, PlacedDate, ExLeaveDate, ActLeaveDate, PatNo, BedNo)
Primary Key PatNo, ListDate
Foreign Key PatNo is NOT NULL references Patient(PatNo) on delete CASCADE on update CASCADE)
Foreign Key WardReq is NOT NULL references Ward(WardNo) on delete NO ACTION on update CASCADE
Medication (PatNo, DrugNo, UnitsDay, AMethod, SDate, FDate) Primary Key PatNo, DrugNo, SDate
Foreign Key PatNo is NOT NULL references Patient(PatNo) on delete CASCADE on update CASCADE)
Foreign Key DrugNo is NOT NULL references Pharmaceutical(DrugNo) on delete NO ACTION on update CASCADE)
Pharmaceutical (DrugNo, DName, Description, Dosage, MAdmin, QStock, RLevel, UnitCost , SupplierNo)
Primary Key DrugNo
Foreign Key SupplierNo is NOT NULL references Supplier(SupplierNo) on delete NO ACTION on update CASCADE
Non-Surgical/Surgical (ItemNo, IName, IDescription, QStock, RLevel, UnitCost, SupplierNo) Primary Key ItemNo
Foreign Key SupplierNo is NOT NULL references Supplier(SupplierNo) on delete NO ACTION on update CASCADE
Requisition (ReqNo, CNStaffNo, WardNo, ItemDrugNo, QuantReq, DateOrder, DateReceive) Primary Key ReqNo
Foreign Key CNStaffNo is NOT NULL references Staff(StaffNo) on delete NO ACTION on update CASCADE
Foreign Key WardNo is NOT NULL references Ward(WardNo) on delete NO ACTION on update CASCADE
Foreign Key ItemDrugNo is NOT NULL references Non-Surgical/Surgical(ItemNo) and Pharmaceutical(DrugNo) on delete NO ACTION on update CASCADE
Supplier (SupplierNo, SName, SAddress, TelNo, FaxNo) Primary Key SupplierNo
Alternative Key TelNo Alternative Key FaxNo
32
Chapter 12 Physical Database Design Methodology – Worked Example
Exercises
12.1 Create a physical database design for the logical design of the DreamHome case study
(described in Chapter 11) based on the DBMS that you have access to.
The assumption made here is that any DBMS that is being used is a relational DBMS. It is important to know the facilities that are provided by the DBMS, and understand how to make use of them for physical database design. Assumptions made may made, for example, on the performance of transactions. The student should produce the required documentation for the target DBMS.
12.2 Implement the DreamHome database using the physical design created in Section 12.1.
The student should implement the physical database design created in Exercise 12.1 using the target DBMS.
12.3 Investigate whether your DBMS can accommodate the two new requirements for the
DreamHome case study given in Step 7 of this chapter.
Again the student will have to investigate the functionality of the target DBMS to assess whether the new requirements can be made. If yes, the student should implement the new requirements and also suggest further enhancements.
12.4 Create a physical database design for the Wellmeadows Hospital case study (described in
Appendix A) based on the DBMS that you have access to.
The student should create a physical database design for the Wellmeadows Hospital case study based on the logical design created in Exercise 11.3.
12.5 Implement the Wellmeadows Hospital database using the physical design created in 12.4.
The student should implement the physical database design created in Exercise 12.4 using the target DBMS.
Part Three Database Languages Chapter 13 SQL
Review Questions
13.1 What are the two major components of SQL and what function do they serve?
A data definition language (DDL) for defining the database structure. A data manipulation language (DML) for retrieving and updating data.
13.2 What are the advantages and disadvantages of SQL?
Advantages
• Satisfies ideals for database language
• (Relatively) Easy to learn
• Portability
• SQL standard exists
• Both interactive and embedded access
• Can be used by specialist and non-specialist. Disadvantages
• Impedance mismatch - mixing programming paradigms with embedded access
• Lack of orthogonality - many different ways to express some queries
• Language is becoming enormous (SQL-92 is 6 times larger than predecessor)
• Handling of nulls in aggregate functions
• Result tables are not strictly relational - can contain duplicate tuples, imposes an ordering on both columns and rows.
13.3 Explain the function of each of the clauses in the SELECT statement. What restrictions are
imposed on these clauses?
FROM Specifies the table or tables to be used. WHERE Filters the rows subject to some condition.
GROUP BY Forms groups of rows with the same column value. HAVING Filters the groups subject to some condition. SELECTSpecifies which columns are to appear in the output. ORDER BY Specifies the order of the output.
If the SELECT list includes an aggregate function and no GROUP BY clause is being used to group data together, then no item in the SELECT list can include any reference to a column unless that column is the argument to an aggregate function.
When GROUP BY is used, each item in the SELECT list must be single-valued per group. Further, the SELECT clause may only contain:
• Column names.
• Aggregate functions.
• Constants.
• An expression involving combinations of the above.
All column names in the SELECT list must appear in the GROUP BY clause unless the name is used only in an aggregate function.
13.4 What restrictions apply to the use of the aggregate functions within the SELECT statement? How
do nulls affect the aggregate functions?
34
Apart from COUNT(*), each function eliminates nulls first and operates only on the remaining non-null values. COUNT(*) counts all the rows of a table, regardless of whether nulls or duplicate values occur.
13.5 Explain how the GROUP BY clause works. What is the difference between the WHERE and
HAVING clauses?
SQL first applies the WHERE clause. Then it conceptually arranges the table based on the grouping column(s). Next, applies the HAVING clause and finally orders the result according to the ORDER BY clause.
WHERE filters rows subject to some condition; HAVING filters groups subject to some condition.
13.6 What is the difference between a subquery and a join? Under what circumstances would you not
be able to use a subquery?
With a subquery, the columns specified in the SELECT list are restricted to one table. Thus, cannot use a subquery if the SELECT list contains columns from more than one table.
Exercises
The following tables form part of a database held in a relational DBMS: Hotel (Hotel_No, Name, Address)
Room (Room_No, Hotel_No, Type, Price)
Booking (Hotel_No, Guest_No, Date_From, Date_To, Room_No) Guest (Guest_No, Name, Address)
where Hotel contains hotel details and Hotel_No is the primary key
Room contains room details for each hotel and (Hotel_No, Room_No) forms the primary key Booking contains details of the bookings and the primary key comprises (Hotel_No, Guest_No
and Date_From)
and Guest contains guest details and Guest_No is the primary key. Simple Queries
13.7 List full details of all hotels.
SELECT * FROM hotel;
13.8 List full details of all hotels in London.
SELECT * FROM hotel WHERE address LIKE '%London%';
Strictly speaking, this would also find rows with an address like: '10 London Avenue, New York'. 13.9 List the names and addresses of all guests in London, alphabetically ordered by name.
SELECT name, address FROM guest WHERE address LIKE '%London%' ORDER BY name;
13.10 List all double or family rooms with a price below £40.00 per night, in ascending order of price.
SELECT * FROM room WHERE price < 40 AND type IN ('D', 'F') ORDER BY price;
(Note, ASC is the default setting).
SELECT * FROM booking WHERE date_to IS NULL; Aggregate Functions
13.12 How many hotels are there?
SELECT COUNT(*) FROM hotel; 13.13 What is the average price of a room?
SELECT AVG(price) FROM room;
13.14 What is the total revenue per night from all double rooms?
SELECT SUM(price) FROM room WHERE type = 'D'; 13.15 How many different guests have made bookings for August?
SELECT COUNT(DISTINCT guest_no) FROM booking
WHERE (date_from <= DATE'1999-08-01' AND date_to >= DATE'1999-08-01') OR (date_from >= DATE'1999-08-01' AND date_from <= DATE'1999-08-31'); Subqueries and Joins
13.16 List the price and type of all rooms at the Grosvenor Hotel.
SELECT price, type FROM room WHERE hotel_no =
(SELECT hotel_no FROM hotel WHERE name = 'Grosvenor Hotel'); 13.17 List all guests currently staying at the Grosvenor Hotel.
SELECT * FROM guest WHERE guest_no =
(SELECT guest_no FROM booking
WHERE date_from <= CURRENT_DATE AND date_to >= CURRENT_DATE AND hotel_no =
(SELECT hotel_no FROM hotel WHERE name = 'Grosvenor Hotel'));
13.18 List the details of all rooms at the Grosvenor Hotel, including the name of the guest staying in the room, if the room is occupied.
SELECT r.* FROM room r LEFT JOIN
(SELECT g.name, h.hotel_no, b.room_no FROM Guest g, Booking b, Hotel h WHERE g.guest_no = b.guest_no AND b.hotel_no = h.hotel_no AND
h.name= 'Grosvenor Hotel' AND
b.date_from <= CURRENT_DATE AND b.date_to >= CURRENT_DATE) AS XXX ON r.hotel_no = XXX.hotel_no AND r.room_no = XXX.room_no; 13.19 What is the total income from bookings for the Grosvenor Hotel today?
SELECT SUM(price) FROM booking b, room r, hotel h WHERE (b.date_from <= CURRENT_DATE AND
b.date_to >= CURRENT_DATE) AND
36
13.20 List the rooms that are currently unoccupied at the Grosvenor Hotel.
SELECT * FROM room r WHERE room_no NOT IN
(SELECT room_no FROM booking b, hotel h WHERE (date_from <= CURRENT_DATE AND
date_to >= CURRENT_DATE) AND
b.hotel_no = h.hotel_no AND name = 'Grosvenor Hotel'); 13.21 What is the lost income from unoccupied rooms at the Grosvenor Hotel?
SELECT SUM(price) FROM room r WHERE room_no NOT IN
(SELECT room_no FROM booking b, hotel h WHERE (date_from <= CURRENT_DATE AND
date_to >= CURRENT_DATE) AND
b.hotel_no = h.hotel_no AND name = 'Grosvenor Hotel'); Grouping
13.22 List the number of rooms in each hotel.
SELECT hotel_no, COUNT(room_no) AS count FROM room GROUP BY hotel_no;
13.23 List the number of rooms in each hotel in London.
SELECT hotel_no, COUNT(room_no) AS count FROM room r, hotel h WHERE r.hotel_no = h.hotel_no AND address LIKE '%London%' GROUP BY hotel_no;
13.24 What is the average number of bookings for each hotel in August?
SELECT AVG(X)
FROM ( SELECT hotel_no, COUNT(hotel_no) AS X FROM booking b
WHERE (b.date_from <= DATE'1999-08-01' AND b.date_to >= DATE'1999-08-01') OR (b.date_from >= DATE'1999-08-01' AND
b.date_from <= DATE'1999-08-31') GROUP BY hotel_no);
Yes - this is legal in SQL-92!
13.25 What is the most commonly booked room type for each hotel in London?
SELECT MAX(X)
FROM ( SELECT type, COUNT(type) AS X FROM booking b, hotel h, room r
WHERE r.room_no = b.room_no AND b.hotel_no = h.hotel_no AND h.address LIKE '%London%'
GROUP BY type);
13.26 What is the lost income from unoccupied rooms at each hotel today?
SELECT hotel_no, SUM(price) FROM room r WHERE room_no NOT IN
(SELECT room_no FROM booking b, hotel h WHERE (date_from <= CURRENT_DATE AND