SCHOLAR Study Guide
SQA Higher Information Systems
Unit
2:
Relational
Database
Systems
David Bethune
Heriot-Watt UniversityTom Kelly
Heriot-Watt UniversityTom Liversidge
Heriot-Watt University Heriot-Watt UniversityFirst published 2007 by Heriot-Watt University
This edition published in 2007 by Heriot-Watt University Copyright c2007 Heriot-Watt University
Members of the SCHOLAR Forum may reproduce this publication in whole or in part for educational purposes within their establishment providing that no profit accrues at any stage, Any other use of the materials is governed by the general copyright statement that follows. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, without written permission from the publisher. Heriot-Watt University accepts no responsibility or liability whatsoever with regard to the information contained in this study guide.
SCHOLAR Study Guide Unit 2: Information Systems 1. Information Systems
Printed and bound in Great Britain by Graphic and Printing Services, Heriot-Watt University, Edinburgh.
Acknowledgements
Thanks are due to the members of Heriot-Watt University’s SCHOLAR team who planned and created these materials, and to the many colleagues who reviewed the content.
We would like to acknowledge the assistance of the education authorities, colleges, teachers and students who contributed to the SCHOLAR programme and who evaluated these materials. Grateful acknowledgement is made for permission to use the following material in the
SCHOLAR programme:
The Scottish Qualifications Authority for permission to use Past Papers assessments. The Scottish Executive for financial support.
All brand names, product names, logos and related devices are used for identification purposes only and are trademarks, registered trademarks or service marks of their respective holders.
i
Contents
1 Database Fundamentals 1
1.1 Prior Knowledge and Revision . . . 2
1.2 Introduction . . . 2
1.3 Traditional Databases . . . 3
1.4 Database Management Systems (DBMS) . . . 4
1.5 Data, Entities and Relationships . . . 5
1.6 Summary . . . 6
2 Data Modelling 9 2.1 Prior Knowledge and Revision . . . 10
2.2 Introduction . . . 10
2.3 Early Data Models . . . 10
2.4 The Relational Model . . . 12
2.5 Data Analysis and Specification . . . 13
2.6 Entity Occurrence Modelling . . . 13
2.7 Summary . . . 16
3 The Relational Model 17 3.1 Prior Knowledge and Revision . . . 18
3.2 Introduction . . . 18
3.3 Data Structure . . . 18
3.4 Data Operators . . . 19
3.5 From Data Model to Database . . . 20
3.6 Keys . . . 21
3.7 Translating an Entity-Relationship Diagram into Tables . . . 22
3.8 Data Integrity . . . 23
3.9 Summary . . . 25
4 Representing the Relational Data Model 27 4.1 Prior Knowledge and Revision . . . 28
4.2 Introduction . . . 28
4.3 Properties of Tables . . . 28
4.4 Relational Data Modeling . . . 29
4.5 Database Integrity: Primary Keys and Entity Integrity . . . 31
4.6 Summary . . . 32
5 Normalisation 33 5.1 Prior Knowledge and Revision . . . 34
5.2 Introduction . . . 34
ii CONTENTS
5.4 Why Normalise? . . . 43 5.5 Summary . . . 45
Glossary 47
c
1
Topic 1
Database Fundamentals
Contents
1.1 Prior Knowledge and Revision . . . 2
1.2 Introduction . . . 2
1.2.1 Database Terminology . . . 2
1.3 Traditional Databases . . . 3
1.3.1 Problems with file-based systems . . . 3
1.3.2 The Database Solution . . . 4
1.4 Database Management Systems (DBMS) . . . 4
1.4.1 Advantages of Using a Database System . . . 5
1.5 Data, Entities and Relationships . . . 5
1.5.1 Relationships . . . 6
1.6 Summary . . . 6
Learning Objectives
By the end of this topic you should be able to:
¯ define a database management system
¯ describe the problems associated with traditional paper-based databases ¯ define a database
¯ describe the advantages of using computerised databases ¯ identify data entities and relationships
2 TOPIC 1. DATABASE FUNDAMENTALS
1.1
Prior Knowledge and Revision
The topic assumes that you have previously considered some of the basic principles of database design including the benefits of electronic systems in comparison with manual databases. This may have been done through the study of the corresponding Unit at Intermediate 2 or in Standard Grade Computing Studies, or any other course on Computing or ICT.
Check your understanding by trying some revision questions.
1.2
Introduction
This topic takes an introductory look at how the storage, processing and retrieval of information have developed from paper based systems to computerised databases. The problems encountered in such systems are covered and what needs to be considered to overcome them. Included in this topic are:
1. the terminology of databases
2. a look at tradition forms of databases
3. problems associated with traditional methods 4. advantages in using database systems
1.2.1 Database Terminology
A database is a collection of related information about a set of persons, objects or events.
Databases have been around for longer than computers! Some common everyday
examples of databases which are stored on paper are:
an address book which stores names, addresses and other contact details about people
the Phone BookTM which stores names, addresses and telephone numbersabout people or companies a recipe book which stores ingredients and instructions about dishes
Nowadays, most databases are stored electronically on computers or other electronic equipment. Examples include:
your contacts list in your email or instant messaging application
which stores contact information about people
names and numbers in your mobile phone which stores contact information about people
playlists in an MP3 player which stores information about tracks of music
A database management system (DBMS) is a software package which is used to create, manipulate and present data from electronic databases. Example include Microsoft Access, Filemaker Pro, MySQL, Oracle, Ingres, Postgres, SQL Server and many others.
1.3. TRADITIONAL DATABASES 3
1.3
Traditional Databases
Why do database systems exist?
In order to understand the answer to this important question, it is useful to consider what was used before database systems were created. The first databases were manual paper-based systems. Often paper records were organised into folders, and stored in filing cabinets. There were several problems associated with paper-based databases:
¯ the storage of paper records was very bulky, often requiring several large filing
cabinets
¯ it was very easy to mis-file a paper record, or for records to be lost or damaged ¯ data was often duplicated in several records
¯ keeping records up-to-date was difficult and time consuming, and often resulted in
data inconsistency, where duplicated values were updated in one record but not in others
¯ many people were employed to maintain the records, which was costly ¯ searching for records was time consuming
¯ producing reports, such as sorted lists or data collated from several sources, was
extremely time consuming, if not impossible
The first computerised data processing systems were file-based systems with data stored in files, and application programs operating on that data.
Although this system was an improvement on previous approaches where data was stored within the application itself, there were still various difficulties with file-based systems.
1.3.1 Problems with file-based systems
If a system is developed following an evolutionary approach, with the systems being created and modified, then the structure of the data may need to change. Changes to this data means that all programs that refer to this data must also be changed. This problem is often known as data dependence.
c
4 TOPIC 1. DATABASE FUNDAMENTALS
Data dependence
Please visit the online course to view an interactive demonstration of this concept. In a file-based systems, the same information may well be duplicated in multiple files. For example, the full name and address of a customer may appear in several files, depending on what sort of orders they placed (small or large) and when they placed them. This is known as duplication of data, sometimes known as data redundancy because the same information does not need to be repeated.
This duplication of data often leads to another problem, that of inconsistency of the data where different copies of the data do not agree.
Data Redundancy
Please visit the online course to view an interactive demonstration of this concept.
1.3.2 The Database Solution
As a result of these problems a more effective solution was required. This solution is having a collection of related data in a database. This database is then controlled and accessed by means of a database management system (DBMS). It should be noted, however, that although a DBMS can help to address the problem, it still requires certain rules to be followed. In particular, if a database is not designed correctly, issues such as duplication leading to inconsistency can still occur.
Good design of the system is therefore essential to avoid these potential problems.
1.4
Database Management Systems (DBMS)
A database management system (DBMS, or database system) is a collection of interrelated data and a set of programs to manage and access that data.
It provides the ability to present many different views of the database to support many different types of user. In fact, most database systems have a variety of people who use it, all for different purposes. Common categories of user include database designers, database administrators, application programmers as well as general users. These users all use the database system in different contexts with differing requirements. Here are some examples of different users of a database system:
¯ Database designers will be involved in tasks including storage and structure of the
database.
¯ Database administrators will determine access authorisation for users.
¯ Application programmers will work on the development of application programs
that operate on the data.
¯ General database users manipulate the data in the database system by means
of application programs, or by using queries to interact with the database more directly.
1.5. DATA, ENTITIES AND RELATIONSHIPS 5
1.4.1 Advantages of Using a Database System
Computerised databases have several advantages over manual databases:
¯ The ability to store large amounts of data
¯ Searching and sorting operations can be performed much more quickly and easily
than manually
¯ Information is more easily available to users, due to improved methods of data
retrieval.
¯ Databases are usually stored and accessed using computers in a local area
network (LAN). This means that any user using a computer on the LAN can access the data when required. (Please note, this is not a requirement of all database management systems as some smaller "personal" database management systems exist that cater only for single user applications. However, it is still a desirable feature, and can be essential for some applications.)
¯ Security and integrity of data is improved, due to the methods of data storage and
updating
¯ There is no data redundancy: that is, no unnecessary duplication of data. In a
relational database, data is never stored more often than is necessary
¯ Data consistency: this means that when data is updated, this is only done once,
so removing the possibility of data inconsistency. As soon as the data is updated, all users have access to it.
¯ Data independence: the software which processes the data is separate from
the data itself. This means that several programs can be developed to perform operations on the data without affecting the data itself.
1.5
Data, Entities and Relationships
A data entity is something that information will be recorded about in a database. This could be a person, an object, or something abstract like a holiday booking. Examples of entities about persons include a customer, airline passenger, a library borrower, an employee, a school pupil, a college student. Example of objects include a product, a flight, a book, a job, a course at school or college. Examples of abstract entities include a borrowing, a booking, or a rental.
An entity set is a collection of entities of the same type and is represented as a table in a database system.
Each entity has a set of attributes which describe examples or instances of that entity. For example, the attributes of a school pupil might include the following: forename, surname, address, date of birth, year group, tutor group.
An instance of a school pupil might be Joanna Hamilton, whose attributes are: Joanna, Hamilton, 10 Main Road, Kinglass, 01/04/1990, 5, SW.
The attributes of a DVD Rental entity are code, title, cost, date out, date due and
c
6 TOPIC 1. DATABASE FUNDAMENTALS
member number, and the attributes of the Member entity are member number, name and telephone number.
Attributes will have constraints on what is permitted. For example, an attribute representing a person’s name will contain characters only. This is called the domain of an attribute and will define the type and value of data that the attribute can hold. Examples of data types that attribute may use are:
Text - e.g. a person’s name
Integer - e.g. the number of books borrowed from a library Real - e.g. the conversion rate for foreign currency
Object - a photographic image Boolean - a True or False value Date - someone’s date-of-birth Time - the departure of a train
Entities
Please visit the online course to view an interactive demonstration of this concept.
1.5.1 Relationships
Entities can be linked, or related to one-another. When a link exists, it is called a. A school will have a headteacher, and that headteacher will have a school. This is a one-to-one relationship between the entities School and Headteacher.
In the same school, a pupil will study many subject and each subject will be studied by many pupils. This is a many-to-many relationship between the entities Pupil and Subject.
A one-to-many relationship in a school exists between the entities School and Pupil. Here a school has many pupils and each pupil attends one school.
These relationships form the basis of how relational database systems are modelled. There is a table on the online course shows how some entities can be linked by a relationship.
Some of these will be further exemplified in the next Topic
1.6
Summary
By now you should be able to:
¯ describe a database in terms of information storage, processing and retrieval ¯ describe the problems associated with traditional paper-based databases
1.6. SUMMARY 7
¯ describe the problems associated with early file-based systems ¯ describe the advantages of using a database management system ¯ identify data entities and relationships
End of Topic Test
An online assessment is provided to help you review this topic.
c
9
Topic 2
Data Modelling
Contents
2.1 Prior Knowledge and Revision . . . 10
2.2 Introduction . . . 10
2.3 Early Data Models . . . 10
2.4 The Relational Model . . . 12
2.5 Data Analysis and Specification . . . 13
2.6 Entity Occurrence Modelling . . . 13
2.6.1 Relationships . . . 14
2.6.2 Keys . . . 15
2.7 Summary . . . 16
Learning Objectives
By the end of this topic you should be able to:
¯ understand the historical motivation for the development of relational database systems
¯ identify other database models in addition to the relational model ¯ correctly interpret an Entity Relationship Diagram (ERD)
10 TOPIC 2. DATA MODELLING
By the end of this topic you should be able to:
¯ understand the historical motivation for the development of relational database
systems
¯ identify other database models in addition to the relational model ¯ correctly interpret an Entity Relationship Diagram (ERD)
¯ design basic ERD from an application scenario
2.1
Prior Knowledge and Revision
Please test your Prior Knowledge using the online revision questions.
2.2
Introduction
This topic builds on the fundamentals of databases and introduces the idea of modelling data. This is done by considering the early methods of data modeling that led to the development of the relational model. The relational model is introduced with some of the basic ideas covered. At this stage this extends to the use of ER diagrams as a representation of a relational data model.
2.3
Early Data Models
The first database systems (in the 1960s) were all hierarchical, with data being stored as trees of records and links as shown below:
2.3. EARLY DATA MODELS 11
Department
Project
Employee
Manager
Worker
Figure 2.1: Hierarchical StorageModel
This type of model is good for one-to-many relationships between entities e.g. a Department having many Employees, with each Employee assigned to just one Department. However, this model would be poor for many-to-many relationships as it is difficult to structure a hierarchy with multiple parent nodes for each child. For example within a Department, many Projects could be underway with many Workers involved. This would be difficult to show using a hierarchical model.
Another type of model is a network DBMS (developed from the 1960s), with data again being stored as records and links as shown below:
c
12 TOPIC 2. DATA MODELLING
Project A
Project B
Employee 1
Employee 2
Employee 3
Employee 4
Department
Figure 2.2: Networked DBMS Storage Model
Here the many-to-many relationship between Project(s) and Employee(s) can be represented. This logical data model is better than the hierarchical model for many-to-many relationships as the top-down hierarchy no longer needs to be enforced. This allows nodes to have multiple relationships with other nodes.
Both network and hierarchical systems provide an efficient means of storing large amounts of data and were a a significant improvement on file-based systems. However, there are some problems associated with network and hierarchical data models. Both models are difficult to navigate and require complex programs to be written to answer a query. Other problems include data dependence where changing the data can mean that application programs have to be changed.
2.4
The Relational Model
The hierarchical and network models do not offer data independence and therefore a better solution was needed. A model that was designed to solve some of these problems is the relational data model.
The relational model was designed to include:
¯ data independence ¯ a sound theory
¯ querying without having to traverse a tree or graph to find information
The relational data model was developed in the 1970s and is currently the most common database model. Example database management systems based on this model include
2.5. DATA ANALYSIS AND SPECIFICATION 13
Oracle, Ingres, IBM DB2, mySQL and Microsoft Access.
In the relational model, data is stored as records in tables and is interrogated using the Structured Query Language (SQL).
2.5
Data Analysis and Specification
Before a designer can create a database using any model, an understanding of the requirements of the system is necessary. This analysis of the data will lead to the choice of tables, identification of keys , and an understanding of the requirements of users of the database. At this early stage of the design process the system is considered in terms of concepts and ideas produce an abstract model of the required system.
In the relational model, this design makes use of Entity-Relationship (E-R) modelling. This modelling is independent of the choice of relational database used to implement the database so it does not matter at this stage whether the system is to be implemented on a large scale using something like Oracle, or on a small scale using a system such as mySQL, Microsoft Access or Filemaker.
The modelling of the requirements is separate from the final implementation so that important design decisions can be taken independently of a particular product.
2.6
Entity Occurrence Modelling
Entity occurrence modelling was developed to help design databases by modelling with objects (entities) and relationships between these objects. The properties associated with an entity are known as the attributes of the entity. Entity Relationship Diagrams (ERDs) use a limited number of components and are easily understood as illustrated below:
Entity
Entity
Attribute
Attribute
Attribute
Figure 2.3: An example of an Entity Relationship Diagrams (ERDs)
ERDs can include entities (represented by rectangles), attributes of these entities (ellipses) and finally relationships (lines with a trapezoid label).
c
14 TOPIC 2. DATA MODELLING
2.6.1 Relationships
Entities can be associated with each other by relationships. The number of entities that another entity relates to is known as the mapping cardinality, or the degree of the relationship, and can be one of the following:
¯ One-to-one
An entity is related to at most one other entity, written as a 1:1 relationship
¯ One-to-many
An entity is related to more than one other entity, written as a 1:M relationship
¯ Many-to-one
A number of entities are related to one entity, written as a M:1 relationship
¯ Many-to-many
Many entities relate to many other entities, written as a M:N relationship.
One to One Examples
Bowmore High School
St. Cuthbert's Academy
St. Joseph's Secondary
Lochmaben High School
Tyree High School
Martin Greer
Scott Smyth
Sarah Kelly
Paul Fryer
Christopher Onions
Figure 2.4: An example of various One to One relationships
Each School has one Head teacher and each head teacher has one school. This can be represented in an ERD as 1:1
Headteacher
School
Figure 2.5: One to One (1:1) One to Many Examples
2.6. ENTITY OCCURRENCE MODELLING 15
Standard Grade English
Higher English
Intermediate English
English for Speakers of Other Languages
Standard Grade Computing
Higher Information Systems
PC Passport Tom Dickson
Elaine Rankin
Figure 2.6: A few examples of One to Many relationships
Here a teacher can teach many courses and each course is tought by one teacher. This can be represented in an ERD as a one to many relationship
Course
Teacher
Figure 2.7: One to Many (1:M) Many to Many Example
On the online course you can see an interactive example of a Many to Many relationship This can be represented in an ERD as a many-to-many relationship.
Course
Pupil
Figure 2.8: Many to Many (M:N)
2.6.2 Keys
It is important to be able to distinguish between different entities. This is achieved by using one or more attributes of that entity as the key that identifies an entity. A key is a set of attributes that uniquely identifies an entity.
For example, the order number attribute of an order entity is a unique identifier thus
order number is a key.
When looking for the key to an entity, there may be more than one attribute or
c
16 TOPIC 2. DATA MODELLING
combination of attributes that could uniquely identify the entities. These attributes under consideration are referred to as candidate keys. Once a key is chosen from the candidate keys as the principal identifier for an entity set, this becomes the primary key.
2.7
Summary
By the end of this topic you should be able to:
¯ undertand the historical motivation for the development of relational database
systems
¯ identify other database models in addition to the relational model ¯ correctly interpret an ERDs
¯ design basic ERDs from an application scenario
End of Topic Test
17
Topic 3
The Relational Model
Contents
3.1 Prior Knowledge and Revision . . . 18 3.2 Introduction . . . 18 3.3 Data Structure . . . 18 3.4 Data Operators . . . 19 3.5 From Data Model to Database . . . 20 3.6 Keys . . . 21 3.7 Translating an Entity-Relationship Diagram into Tables . . . 22 3.8 Data Integrity . . . 23 3.8.1 Entity Integrity . . . 23 3.8.2 Referential Integrity . . . 23 3.8.3 Data Dictionary . . . 24 3.8.4 Data Validation . . . 24 3.9 Summary . . . 25 Learning Objectives
By the end of this topic you should be able to:
¯ describe the relational data model
¯ describe the correspondence between entity-relationship diagrams and relational data models.
¯ explain how data is stored in tables within a database ¯ define a key field
¯ define atomic and compound keys ¯ identify candidate keys for an entity ¯ select a primary key for an entity ¯ describe a surrogate key
¯ use foreign keys in a data model
18 TOPIC 3. THE RELATIONAL MODEL
By the end of this topic you should be able to:
¯ describe the relational data model
¯ describe the correspondence between entity-relationship diagrams and relational
data models.
¯ explain how data is stored in tables within a database ¯ define a key field
¯ define atomic and compound keys ¯ identify candidate keys for an entity ¯ select a primary key for an entity ¯ describe a surrogate key
¯ use foreign keys in a data model
¯ understand the term entity and referential integrity
3.1
Prior Knowledge and Revision
This topic takes a closer look at the relational database model and builds on the introductory work already covered in previous Topics. In order to check your understanding of previous work, take the revision questions on the website.
3.2
Introduction
This topic looks at the relational data model which is the most common data model in use today. This model is defined by the following features:
¯ data structure ¯ data operators ¯ data integrity
3.3
Data Structure
The structure of data within the relational data must comply with the following rules: Each table in the database is identified by a unique name
Each column in a table is identified by a unique name within that table The data in each column of a table is same domain and type
3.4. DATA OPERATORS 19
Each row in a table is unique Each cell in a table contains one value only Rows and columns in a table can be in any order
Columns
Columns within a table must be identified with a unique name. However it is acceptable to have the same name of a column used in separate tables in the same database. The example below shows two tables from the same database, each with columns called ’Name’ and ’Subject’
StudentID Name Subject
TeacherID Name Subject
The data in each column must be from the same domain and type. In the table above, example of the data in the ’Subject’ column would include ’English, Maths, Geography
etc’ . An entry of ’101’, for example, in this column would be incorrect. Rows
Each row within a table must be unique i.e. no row should appear more than once.
Data Items
Client No. Client Contact Date Facility
16523 Eastdyke YC Alan Donnelly
01/09/05 03/09/05 15/09/05 Pool Hall Pool
Data items should also be atomic to help search and maintain the database. However, non-atomic data items are still found in many implemented database. In the example above the Contact name is a multiple data item with two components. This would be better represented by decomposing Contact as Forename and Surname
3.4
Data Operators
Operations on data include:
¯ enter ¯ delete ¯ amend ¯ retrieve
In a relational database these operations are carried out by data operators to allow data to be inserted, deleted, amended, and retrieved respectively.
c
20 TOPIC 3. THE RELATIONAL MODEL
3.5
From Data Model to Database
As has been already mentioned, the relational model uses the following structure: Data is stored as records in tables;
The model is interrogated using a query;
This topic will describe the relational data model in detail. Most DBMS use the terminology tables and records but these are sometimes referred to as relations and tuples respectively.
Each column in a table is known as an attribute. Each table has a heading, and each row in the table is a record.
Name
Street
Town
Joe Smith 1 Brown Road Newton
Omar Singh 3 Straight Street Littletown Eng Min Lee 22 Station Road Bigtown
Attributes
Table
Rows
Once the entities, attributes and relationships in the data model have been defined, a database can be created.
Each entity in the data model becomes a table in the database. Each attribute of an entity becomes a field in a table in the database.
It is not always easy to work out exactly which entities or tables are required in a relational database. This is particularly so because many-to-many relationships between entities are not desirable. A procedure called normalisation is used to help determine which entities are required.
The list of possible values for an attribute is known as its domain. Each attribute must have a distinct name within the table as is shown below:
3.6. KEYS 21
Name Owner Species Sex Birth Death
Fluffy Harold Cat F 04-02-1993 NULL Claws Gwen Cat M 17-03-1994 NULL Buffy Harold Dog F 13-05-1989 NULL Fang Benny Dog M 27-08-1990 NULL Bowser Diane Dog M 31-08-1989 27-09-1995 Chirpy Gwen Bird F 09-12-1997 NULL Whistler Gwen Bird NULL 11-09-1998 NULL Slim Benny Snake M 29-04-1996 NULL Puffball Diane Hamster F 30-03-1999 NULL
The ordering of rows and columns is not significant. Each item must be atomic, i.e. only single items are allowed.
A database may contain multiple tables. Both the relational database itself and each table within it must have unique names assigned to them.
3.6
Keys
In order to establish the relationships between the tables in the database, each record in a table must be able to be uniquely identified. A key is a field, or set of fields, whose values uniquely identify a record.
In any table, there may be more than one field, or set of fields, which can uniquely identify each record-these are called candidate keys. The candidate key which is chosen to be used is called the primary key. Once a primary key is selected, and implemented in a database, it is wise to avoid changing anything about it as these keys act like the foundation stones of the database. Occasions may arise when you may want to change the data in primary key. For example, the contact name of someone in an organisation could be a candidate the primary key in a table called company. However, it is likely that the contact person for a company will change over time which would require significant updating if the contact name was the primary key. It is better to select a primary key that will not change in the course of time. In this example, company name would be better, but that too could change. As far as the database is concerned, arbitrary values are best. So, in this case, the best name for primary key of the company table would be something like company code with numbers identifying the company i.e. company 1, 2, 3 etc.. A key consisting of only one field is called an atomic key.
A key which is made up of a set of fields is called a compound key.
Sometimes it is necessary to "make up" a key to uniquely identify each record. This is called a surrogate key. A surrogate key is usually a number or similar code, which has
c
22 TOPIC 3. THE RELATIONAL MODEL
no meaning or purpose other than to uniquely identify each record. Surrogate keys are used when there is no candidate key available, or when the only candidate key is a large compound key.
What makes a good primary key?
In general, the key should be as short and simple as possible:
¯ an atomic key is a better choice than a compound key
¯ a numeric field is usually a better choice of key than a text field. This is because
they can be stored more compactly in the database than text fields.
A key must also have the following properties:
¯ a key must have a value-it can never be blank (or "null"). ¯ a key should not change over time.
You can now take the online Activities "Choosing a Key" and "Foreign Keys"
3.7
Translating an Entity-Relationship Diagram into Tables
Databases can be designed and specified by means of Entity-Relationship diagrams. However, it has not been shown how to take the information described in an E-R diagram and form tables for a database. In fact, any database that is specified by an E-R diagram can also be represented by a collection of tables. In other words, an E-R design can be converted into a relational design.
Converting an E-R design to a relational design makes use of the following steps:
¯ Convert each entity set to a table
¯ Convert many-to-many and one-to-many relationships ¯ Convert each relationship set to a table of relationships
Each strong entity set is converted to a table, with each attribute of the entity becoming a column in the table. The primary key for each table should be identified along with other candidate keys as appropriate.
In a similar way, each weak entity set is converted to a table consisting of those attributes of weak set combined with the corresponding attributes (i.e. primary key) from the strong set that owns the current set.
Each relationship set becomes a table of relationships with one column for each attribute of the set. Only one-to-one and one-to-many relationships need to be represented as many-to-many can be split into two one-to-many relationships.
For a 1:1 relationship a foreign key is added to one or both tables that are related. For example, consider the case that a customer and an address are related by a 1:1
3.8. DATA INTEGRITY 23
relationship (each customer has an address, and each address is associated with at most 1 customer). In this case the Customer table might have a primary key of customer id, and the Address table might have a primary key of address id. In this case, the customer table would also include a foreign key containing the address id of the Address table, and the Address table would include a foreign key containing the customer id.
Foreign keys are also used to represent one-to-many relationships. A single foreign key represents a one-to-many relation, where the "one" portion of the relation is the reference from the foreign key, and the "many" portion has the foreign key. For example, consider the case of a customer who can place multiple orders. The relationship between customer and order is one-to-many (where one customer can place multiple orders, and each order matches only 1 customer). In this case, the Order table might have a primary key of order number, and the Customer table might have a primary key of customer id. The Order table would therefore contain a foreign key that references the customer id field in the Customer table.
3.8
Data Integrity
A user of a database must have confidence and trust in the data that is stored. The data must accurately reflect what is reality. To achieve this a database uses the rules of
entity integrity and referential integrity.
3.8.1 Entity Integrity
For an entity to be stored in a relational database, it must possess two characteristics:
¯ It must have a primary key which is "non-null", i.e. its value will never be blank ¯ It must have no multi-valued attributes
Together, these two characteristics define what is called "entity integrity".
3.8.2 Referential Integrity
The rules for referential integrity ensure that data linking tables together is present, and accurate. Tables are linked by using foreign keys, therefore referential integrity is concerned with foreign keys and how they link to a primary key in a related table. To ensure referential integrity between linked tables, a foreign key should be linked to the primary key of a related table. This would assume that a foreign key should always be present in order to establish the link. However, there are circumstances when a foreign key can have a null ’value’. The following example illustrates this.
A school football club three teams U13, U15 and U16s. Each team has pool of 15 players, and one player from each team is the Captain. Extracts from two tables from the football database are shown below:
Captain
c
24 TOPIC 3. THE RELATIONAL MODEL
Captain name Team
Greig U13
McNeil U15
Charlton U16
Player
Player name Team Captain name
Simpson U13 McNeil
Henderson U15 Greig
Gemmell U13 McNeil
Ferguson U15 Greig
McNeil U13
Law U18 Charlton
The primary key for the Captain table is Captain name. This attribute appears in the related table Player where it is used as a foreign key to link the two tables. To ensure referential integrity each player should have a captain. However, McNeil is a player and a captain, so Captain name in that row is ’null’.
3.8.3 Data Dictionary
A data dictionary is a formal way of defining what is in a database. It could be considered as data about data! This is often called metadata. The data dictionary is a flat table describing and naming the tables that form a database together with the attributes of the data held in each table.
This will include:
the names of each table;
the data type, size and format of each attribute; the identification of keys and index;
and a range of validation checks.
The data dictionary is an integral part of a relational database and is used to define, control and validate the data and operations in the database. Here is an example of how table attributes are defined in a data dictionary:
Table Name Type Size Validation Key/Index Player Player name Text 50 Required Yes, Primary
Player Team Text 3 Required No
Player Captain nameText 50 Required Yes, Foreign
3.8.4 Data Validation
Data validation is carried out by defining, in the data dictionary, the constraints on
data being accepted into the database. Validation checks will include:
3.9. SUMMARY 25
domain constraints e.g. a surname is not numeric
presence e.g. data that is part of a primary key must be present range e.g. values for a temperature will not be less than -273C
permitted values e.g. American shoe sizes must be integers between 17 and 48 but not 22, 25, 27, 30, 32, 35 or 37
format e.g. a post code
3.9
Summary
By now you should be able to:
¯ describe the relational data model
¯ describe the correspondence between entity-relationship diagrams and relational
data models
¯ explain how data is stored in tables within a database ¯ define a key field
¯ define atomic and compound keys ¯ identify candidate keys for an entity ¯ select a primary key for an entity ¯ describe a surrogate key
¯ use foreign keys in a data model
End of topic Test
End of topic Test
An online assessment is provided to help you review this topic.
c
27
Topic 4
Representing the Relational Data
Model
Contents
4.1 Prior Knowledge and Revision . . . 28 4.2 Introduction . . . 28 4.3 Properties of Tables . . . 28 4.4 Relational Data Modeling . . . 29 4.4.1 Anytown High School . . . 30 4.4.2 ACTIVITIES . . . 30 4.5 Database Integrity: Primary Keys and Entity Integrity . . . 31 4.5.1 Database Integrity: Foreign Keys and Referential Integrity . . . 31 4.6 Summary . . . 32
Learning Objectives
¯ Prior Knowledge and Revision ¯ Learning Objectives
¯ Introduction
¯ Properties of Tables ¯ Relational Data Modeling ¯ Database Integrity
¯ Primary Keys and Entity Integrity ¯ Foreign Keys and Referential Integrity
28 TOPIC 4. REPRESENTING THE RELATIONAL DATA MODEL
By the end of this topic you will be able to:
¯ Prior Knowledge and Revision ¯ Learning Objectives
¯ Introduction
¯ Properties of Tables ¯ Relational Data Modeling ¯ Database Integrity
¯ Primary Keys and Entity Integrity ¯ Foreign Keys and Referential Integrity
4.1
Prior Knowledge and Revision
In order to check your understanding of previous work, take the revision questions on the website.
Revision Questions
10 min
There is an Online Assesment at this point
4.2
Introduction
This topic builds on the previous one and develops a method for representing a relational database model. To do this successfully the relational database modeler must understand full how the properties of tables in the model affect the final model. Only then can the model be developed to show the relationships between the different entities. Finally, the integrity of the database has to be ensured and this is done by careful consideration of primary and foreign keys and an understanding of the these keys role determine that integrity.
4.3
Properties of Tables
You have seen how tables are used to show how data can be represented in a database. The use of ’columns’ and ’rows’ show how data is structured and how values of data items can be represented. However, in relational data modeling a table has to adhere to a set of rules:
i Every table in the database must have a unique name.
4.4. RELATIONAL DATA MODELING 29
intersect, there is only one value. For example, this table could not be used in a relational database model:
Candidate Number Exam Subject
994667823 English
994532345 English, History
994534332 History, French, Spanish
995657829 English
994657847 History
iii The values that are held under a column, i.e. the values of an attribute, must be in the same domain. Why would the table below not obey this rule?
Candidate Number SName Course
994667823 Anderson 2318
994532345 Anwar Spanish
994534332 Scanlon 2318
995657829 Weir 2318
994657847 Woodrow Spanish
iv Each column has a unique name, i.e. each attribute in the table is distinct. In a relational database the name of an attribute (i.e. the column name) is used for reference e.g. when searching the table, so each attribute must be unique.
v Each row in the table is unique, i.e. no row is duplicated anywhere else in the table. This is a fundamental principle of databases. Each row will contain a primary key that is used to identify the existence of a row in a table. Since the primary key must be unique, then we can say that each row must be unique.
vi The order of the rows and the columns are not significant. In (iv) and (v) above we have seen how reference to columns is by name (e.g. when searching) and reference to rows is by primary key, so as far as the database model is concerned, it is not necessary to have the columns or rows in any particular order. Reference to column 3 or row 5 has no meaning. Compare the table below to the table in (iii) above. What can you deduce?
Candidate Number SName Course
994667823 Anderson 2318
994534332 Scanlon 2318
994532345 Anwar Spanish
994657847 Woodrow Spanish
995657829 Weir 2318
4.4
Relational Data Modeling
This section builds on previous study of entities, attributes and relationships and the modeling techniques used i.e. ER Diagrams
c
30 TOPIC 4. REPRESENTING THE RELATIONAL DATA MODEL
4.4.1 Anytown High School
Consider some of the entities found in a school:
There are pupils, there are courses, there are teachers (to teach the course) and there are teachers to advise pupils (lets call them guidance teachers). Of course, we could identify lots of other entities (like classrooms, lockers, cleaners, cooks, library books, timetables, appointments, homework, exam grades, etc. etc. etc), but like most data models you will study on this course it will be simplified to help understanding.
Firstly, consider the entities of pupil, and course:
This cannot be represented as two tables in a relational database since several Pupil identifiers(providing a foreign key to link the table) would be needed in each entry of the Course table because the same pupil can study many courses. Similarly several Course identifiers would be needed in each entry of Pupil. That would mean that in the Course table there would be multi values identifying all the pupils doing a particular course and in the Pupil table, multiple values identifying all the course each pupils studied. This breaks the rule (in both tables) of data items requiring to be atomic.
1:M & M:N Relationships
This activity takes the Anytown High Scool example and shows how relationships can be refined to overcome the problems with M:N relationships.
For example, another table can be introduced that will represent unique instances of pupils studying courses. By taking the primary keys of the existing entities we can combine them to form a new entity: A Pupil can be identified by an unique number called pupilID and a course by courseID. Combine these to form the primary key of the new table (called Enrollment) so that every instance of a pupil on a course is uniquely identified. Separately, these attributes become foreign keys in the new table and provide the link to the two existing tables showing the 1:m relationships.
4.4.2 ACTIVITIES
Redraw the ER Diagram in the stages described above and include verbs to describe the relationships between the tables in the model.
ER Diagram 1
10 min
Redraw the ER Diagram in the stages described above and include verbs to describe the relationships between the tables in the model.
From the following scenario identify the entities and relationships and construct an E-R Diagram:
ER Diagram 2
30 min
Kirklee Vegetable Club is a club of gardeners keen on growing vegetables in the allotment plots that are leased from the Council. Each member can rent more than one plot to grow vegetables. Each year the Club records details of the amount vegetables that are grown on each plot (i.e. its produce) as there are prizes awarded to the plot that produces the largest yield of each type of vegetable. Each plot has a unique number and
4.5. DATABASE INTEGRITY: PRIMARY KEYS AND ENTITY INTEGRITY 31
members are identified by their membership number. The vegetables that are grown are identified by the vegetable name (e.g. carrot, potato etc). Any vegetable type may be grown in any of the plots.
Three unique identifiers are mentioned in the scenario and you will have to introduce a new entity to deal with the m:n relationship
Task - Identify the entities and relationships and draw an E-R Diagram that would enable this scenario to be modelled for a relational database.
4.5
Database Integrity: Primary Keys and Entity Integrity
Identifying and declaring a primary key in a table introduces an inherent restriction on the table i.e. that two rows in the table cannot be the same. This allows each row to be identified. For example, the details of pupil in a school database can be found by using a unique identifier i.e. the primary key. If the database is relational then the details of that pupil will be recorded once to avoid data redundancy. There may be several attributes of a pupil that are unique e.g. Scottish Candidate Number (SCN), School Enrollment Number, and possibly National Insurance Number. Only one of these candidate keys can be used as the primary key. If the SCN is used, and no two SCNs are the same, then the uniqueness of each pupil record is guaranteed.
Also, if the primary key is the SCN, then the primary key must have a value for the pupil to exist on the database. This is the same for any primary key in any table and ensures entity integrity i.e. the primary key ensure the existence of a record in a table.
In the pupil example, the value for National Insurance may not exist, but since NI Number is not the primary key, then entity integrity is maintained i.e. a pupil can exist on the school’s database without necessarily having a National Insurance Number.
4.5.1 Database Integrity: Foreign Keys and Referential Integrity
A foreign key in one table has the same values of the primary key in another table. For example the attribute used to identify a pupil (SCN) in the Enrolment table for the Anytown High School model, will be a foreign key because the same attribute is used as the primary key in the Pupil table. It follows that both must have the same value in order to identify a pupil. This is known as referential integrity.
The same applies to the attribute identifying the Course that is been studied and both together i.e. the pupil identifier and the course identifier, form the composite primary key of the enrolment table. (The same entity integrity applies and for each enrolment to exist it must have a pupil and a course identified.)
Referential integrity ensures that a link (or relationship) exists between tables. Not only does it ensure the link, it defines the relationship as 1:m i.e a primary key in one table will exist once, but may appear many times as a foreign key in other tables.
c
32 TOPIC 4. REPRESENTING THE RELATIONAL DATA MODEL
4.6
Summary
By the end of this topic you will be able to:
¯ Prior Knowledge and Revision ¯ Learning Objectives
¯ Introduction
¯ Properties of Tables ¯ Relational Data Modeling ¯ Database Integrity
¯ Primary Keys and Entity Integrity ¯ Foreign Keys and Referential Integrity
End of topic Test
End of topic Test
33
Topic 5
Normalisation
Contents
5.1 Prior Knowledge and Revision . . . 34
5.2 Introduction . . . 34
5.3 The Normalisation Process . . . 35
5.3.1 Unormalised Form (UNF) . . . 35
5.3.2 First Normal Form (1NF) . . . 37
5.3.3 Second Normal Form (2NF) . . . 39
5.3.4 Third Normal Form (3NF) . . . 41
5.3.5 Normalisation in a Nutshell . . . 42 5.3.6 Beyond 3NF . . . 42 5.4 Why Normalise? . . . 43 5.4.1 Insertion Anomolies . . . 44 5.4.2 Deletion Anomolies . . . 45 5.4.3 Update Anomalies . . . 45 5.5 Summary . . . 45 Learning Objectives
By the end of this topic you will be able to;
¯ analyse and extract data from up to two source documents to produce a model in un-normalised form (UNF)
¯ identify and remove repeating groups of data from UNF to produce a model in first normal form(1NF)
¯ identify and remove partial dependencies from 1NF to produce a model in second normal form (2NF)
¯ identify and remove transitive dependences from 2NF to produce a model in third normal form (3NF)
34 TOPIC 5. NORMALISATION
By the end of this topic you will be able to;
¯ analyse and extract data from up to two source documents to produce a model in
un-normalised form (UNF)
¯ identify and remove repeating groups of data from UNF to produce a model in first
normal form (1NF)
¯ identify and remove partial dependencies from 1NF to produce a model in second
normal form (2NF)
¯ identify and remove transitive dependences from 2NF to produce a model in third
normal form (3NF)
5.1
Prior Knowledge and Revision
The topic assumes that you have previously worked with databases that use linked tables with a single one-to-many relationship and that you can identify and remove multi-valued fields found in records. This may have been done through the study of the corresponding Unit at Intermediate 2 , or another course on Computing or ICT. In order to check your understanding of previous work, take the revision questions on the website.
Revision Questions
10 min
5.2
Introduction
So far we have dealt with entities and the relationships between them and considered the table structure of how these entities can be represented in a relational database model. This topic considers how entities, tables and records can be structured into a form that makes up a relational model. This form is called normal form and this topic will describe a methodical process that takes entities, relations and records to that required form. This process is called normalisation and can be considered in stages from an un-normalised model (UNF), to first normal form (1NF), to second normal form (2NF), and finally to third normal form (3NF). Data can be modelled to higher levels of normalisation but that is beyond the scope of this course. When data is in 3NF it is in a form that can be implemented and used effectively within a relational database software package. The process of normalisation takes account of the relationships between entities that we have been considering in previous topics and the importance of keys within the tables considered. We know that we have a normalised model when every attribute in a table that is not part of the primary key is determined by the primary key and nothing else. We will use the example of the vegetable club to go through the process of normalisation and then consider the problems that arise if the model is not normalised.
5.3. THE NORMALISATION PROCESS 35
5.3
The Normalisation Process
The best way to consider the normalisation process is by working through an example and then practising on other scenarios. The example we will use to take you through the process will be the vegetable club model that was considered earlier. We will look at this methodically to enable you to apply the process to any example you may encounter. We start by analysing an existing scenario and data recording methods (if any exist). From that we can produce an UNF model of the data, progressing to 1NF, 2NF and finally 3NF.
5.3.1 Unormalised Form (UNF)
Case Study: Kirklee Vegetable Club Kirklee Vegetable Club is a club of gardeners keen on growing vegetables in the allotment plots that are leased from the Council. Each member can rent more than one plot to grow vegetables. Each year the ClubRecords details of the amount vegetables that are grown on each plot (i.e. its produce) as there are prizes awarded to the plot that produces the largest yield of each type of vegetable. Each plot has a unique number and members are identified by their membership number. The vegetables that are grown are identified by the VegetableName (e.g. carrot, potato etc). Any vegetable type may be grown in any of the plots.
For every vegetable season the Club secretary keeps card records for each plot. An example is shown below:
Kirklee Vegetable Club 2006
PlotNumber 5
PlotSize 1.2 ha
Member No 102
Member Name James MacArthur
VegetableName Yield ClubRecord
Carrot 5.5 Kg 10.4 Kg
Potato 23.2Kg 36.8 Kg
Onion 12.3Kg 16.4Kg
At the end of the season, a card similar to this will be produced for all 64 plots on the allotment site.
Notes: This is a simplified example that would have limited use in reality e.g. variety types of the same vegetable are not recorded and it is assumed that the model is used only for the one season of growing: only the ClubRecord for yields are carried over from previous seasons. However, there is enough in the model to take you through the normalisation process which is why it has been simplified.
Do not be too concerned about the model not reflecting reality, as it is the process of normalising that you are more concerned with at this stage. As you get more comfortable with the process, then you will be able to normalise more complex situations, but at this level of course, all examples will be simplified in order to keep you focussed on the process. (Details of levels of treatment of examples are outlined in the Course Arrangements e.g. in the examination you will only ever be presented with two source document like the card above, and you will end up with a maximum of four tables when
c
36 TOPIC 5. NORMALISATION
your model is normalised.)
5.3.1.1 UNF - stage (i)
The first stage in producing an unormalised model is to create a single entity that lists all the attributes that are recorded on the original source documents and in the scenario:
PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord
Note: if the same attribute appears twice in the source documents, you only need to have it in the list once.
5.3.1.2 UNF - stage (ii)
Choose an attribute that will identify each record. (In data modelling terms we are selecting a primary key). The scenario tells us that cards are kept for each plot, so the PlotNumber should uniquely identify each card. Indicate this by underlining the attribute:
PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord
5.3.1.3 UNF - stage (iii)
From the source document(s) identify what attributes have more than one entry. From the record cards we see that VegetableName, Yield, and ClubRecord have more than one value. This is called a repeating group of attributes and should not be confused with a multi-valued attribute. There is one multi-valued attribute in this scenario - can you identify it?
Repeating groups of attributes should be identified in the UNF by indenting the list (or by some other means)
5.3. THE NORMALISATION PROCESS 37 PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord
5.3.1.4 UNF - stage (iv)
Finally, the entity should be named and the attributes enclosed in brackets: Plot (PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord) We now have the data model in UNF.
5.3.1.5 UNF Interactivity
The following activity takes you through the UNF process that you have just followed.
UNF Interactivity
5.3.2 First Normal Form (1NF)
We could consider a formal definition of when entities are in 1NF, but that would just confuse you. The simplest way to get to 1NF is to remove repeating groups and form a new entity with them. As it stands, Plot has repeating values for VegetableName, Yield and ClubRecord. An entity in this state would be impossible to implement in a database and to match the correct VegetableName with its associated values for Yield and ClubRecord. By splitting the entity into two separate entities then it would be possible to link the two entities using a foreign key but 1NF models would have some important anomalies limiting its use. These anomalies are considered later on in this topic.
5.3.2.1 1NF stage (i)
Remove the identified repeating group and give it a name:
c
38 TOPIC 5. NORMALISATION Plot (PlotNumber PlotSize MemberNumber MemberName) Vegetable (VegetableName Yield ClubRecord)
These two tables are now separated but we need to show how they are linked.
5.3.2.2 1NF stage (ii)
We know that here is a link between them as a plot can produce many vegetables. The primary key (i.e. PlotNumber) provides the link between the entities and should be included in both: Plot (PlotNumber PlotSize MemberNumber MemberName) Vegetable (VegetableName Yield ClubRecord PlotNumber)
The attribute PlotNumber is the primary in Plot and is a foreign key in Vegetable. Foreign keys should be indicated with an asterix:
Plot (PlotNumber PlotSize MemberNumber MemberName) Vegetable (VegetableName Yield ClubRecord *PlotNumber) 5.3.2.3 1NF Stage (iii)
All that is left to do before the model is in 1NF is to identify the primary key in the second entity. A primary key must be able to identify a unique occurrence of the entity. The VegetableName alone would identify vegetable with the same name of which there would be many and therefore not unique.
5.3. THE NORMALISATION PROCESS 39
its present form the Vegetable entity records details of a vegetable grown on a particular plot. So each occurrence of this entity could be identified by the VegetableName and the PlotNumber. This is called a composite key and indicated by underlining all the attributes in it and listing them together:
Plot (PlotNumber PlotSize MemberNumber MemberName) Vegetable (VegetableName *PlotNumber Yield ClubRecord) The model is now in 1NF.
5.3.2.4 1NF Activity
The following activity takes you through the 1NF process you have just followed.
1NF Activity
5.3.3 Second Normal Form (2NF)
Tables are in 2NF when every non-primary key attribute is determined by the entire primary key. In the first table Plot, all attributes other than PlotNumber, which is the primary key, can be determined by the primary key. What this means is that as it stands the non-primary key attributes are dependant on the primary key. You may be thinking that Member Name is dependent on Member Number, but in this form, Member Number is not the primary key and so Member Name is indirectly dependent on the primary key i.e. Member Name is dependent on Member Number which in turn is dependent on PlotNumber, so we can conclude that Member Name is dependent on PlotNumber. This is called transitive dependency which we shall return to in the next normalisation form. The table Vegetable has a composite key consisting of VegetableName and PlotNumber. Vegetable (VegetableName
*PlotNumber Yield
ClubRecord)
Each non-primary key is dependent on the primary key, but are they each dependent on the entire primary key? If not, the table is not in 2NF.
The Yeild of a vegetable is determined by the the type of vegetable it is (i.e. the VegetableName) and where it is grown (i.e. the PlotNumber). There will be many values of Yield for the same VegetableName that can be stored in this form, so in order to find a unique occurrence of a particular vegetable yield we must know both the PlotNumber and the VegetableName. We can conclude that Yield is entirely dependent on the primary key and this part of the model is in 2NF.
c
40 TOPIC 5. NORMALISATION
What about ClubRecord? This is a value for the largest amount of a vegetable grown in the Club in any one year. This vegetable could have been grown in any plot i.e. it is not determined by the PlotNumber; it is determined by the VegetableName. This means that ClubRecord is not fully dependent on the entire Primary Key and therefore not in 2NF. Since it is dependent on part of the primary key we call this a partial dependency and in order to get to 2NF we must split the entity to remove this condition.
(VegetableName (VegetableName *PlotNumber ClubRecord) Yield)
When splitting entities to remove partial dependencies, we must ensure that existing primary key is not split. VegetableName and PlotNumber is still the primary key on the left hand entity, but VegetableName alone becomes the primary key of the entity on the right.
For this example it would be appropriate to assign the name of Vegetable to the entity on the right and assign a new name to the entity on the left. So we would have:
Produce (VegetableName Vegetable (VegetableName
*PlotNumber ClubRecord)
Yield)
The primary key in the entity we are now calling Vegetable is VegetableName. This attribute is left in the entity we are now calling Produce as a foreign key and determining the relationship between the entities. So indicating the keys would give:
Produce (*VegetableName Vegetable (VegetableName
*PlotNumber ClubRecord)
Yield)
These entities are now represented in 2NF The three entities together would be: Plot (PlotNumber PlotSize MemberNumber MemberName) Produce (*VegetableName *PlotNumber Yield) Vegetable (VegetableName ClubRecord)
The relationship between these entities would be represented by an E-R Diagram. The attribute PlotNumber now appears in the entities Plot (as the primary key) and in Produce (as a foreign key), so that must be where the relationship exists. Furthermore, we can deduce from the keys that the relationship is 1:n.
5.3. THE NORMALISATION PROCESS 41
and Vegetable.
2NF Activity
5.3.4 Third Normal Form (3NF)
Entities in third normal form ensure that every attribute is dependent on the primary key and nothing else. When we consider the Plot entity we discovered that although PlotNumber was the primary key identifying each occurrence of the entity, MemberName is also determined by MemberNumber i.e. MemberName is transitively dependent on PlotNumber.
Plot (PlotNumber PlotSize
MemberNumber MemberName)
In 3NF it is not permissible to have attributes transitively dependent on the primary key. To remove transitive dependencies we must create a new entity with MemberNumber the primary key which determines MemberName:
Plot (PlotNumber Member (MemberNumber
MemberNumber MemberName)
MemberName)
MemberName becomes the linking attribute and therefore the foreign key in Plot indicating a m:1 relationship:
Plot (PlotNumber Member (MemberNumber
*MemberNumber) MemberName)
The model would now appear to be in 3NF but if we check the Rules for normalisation we discover that the attribute MemberName is not atomic i.e. it is made up of two character strings representing the members first name and surname.
So by splitting this attribute into atomic values we complete the normalisation process: Plot (PlotNumber PlotSize *MemberNumber) Member (MemberNumber MemberFName MemberSName) Produce (*VegetableName *PlotNumber Yield) Vegetable (VegetableName ClubRecord) c
42 TOPIC 5. NORMALISATION
The normalisation process has only produced 1:m relationships which is the intended outcome.
3NF Activity
5.3.5 Normalisation in a Nutshell
When presented with a scenario to normalise the following stages are required:
1. To get to UNF
a) From the scenario (and source documents) list all the attributes you can identify
b) Identify any repeating group of attribute and indent them on the list c) Choose an attribute to act as a primary key which determines the rest d) Close the list with brackets and give it a name
2. To get to 1NF
a) Form a new entity from the repeating group
b) Include the primary key from UNF to act as foreign key in the new entity c) Identify the primary key in the new entity
d) Add brackets and give it a name
3. To get to 2NF
a) Identify and remove partial dependences making sure that primary keys are present as foreign keys in any new entities
b) Note: this will only apply where there is a composite key (i.e. more than one attribute) in a 1NF entity and where some non-key attributes are dependent only on part of the primary key
4. To get to 3NF
a) Identify and remove transitive dependencies again ensuring the primary/foreign key link
b) Note: transitive dependent attributes and dependent on the primary key via a non-key attribute
5. Finally check the normalisation rules for any other anomalies e.g. non-atomic attributes
5.3.6 Beyond 3NF
At 3NF a relational data model can be adequately implemented and further levels of normalisation are not required. However, they do exist but are beyond the scope of this course. In fact, most of the relational models that you will encounter will satisfy the rules of higher levels of normalisation than 3NF.
You will come across a similar type of situation at the normalisation levels that you will be dealing with in this course. For example, it is not uncommon in some cases to find