• No results found

SQA Higher Information Systems Unit 2: Relational Database Systems

N/A
N/A
Protected

Academic year: 2021

Share "SQA Higher Information Systems Unit 2: Relational Database Systems"

Copied!
54
0
0

Loading.... (view fulltext now)

Full text

(1)

SCHOLAR Study Guide

SQA Higher Information Systems

Unit

2:

Relational

Database

Systems

David Bethune

Heriot-Watt University

Tom Kelly

Heriot-Watt University

Tom Liversidge

Heriot-Watt University Heriot-Watt University

(2)

First published 2007 by Heriot-Watt University

This edition published in 2007 by Heriot-Watt University Copyright c­2007 Heriot-Watt University

Members of the SCHOLAR Forum may reproduce this publication in whole or in part for educational purposes within their establishment providing that no profit accrues at any stage, Any other use of the materials is governed by the general copyright statement that follows. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, without written permission from the publisher. Heriot-Watt University accepts no responsibility or liability whatsoever with regard to the information contained in this study guide.

SCHOLAR Study Guide Unit 2: Information Systems 1. Information Systems

Printed and bound in Great Britain by Graphic and Printing Services, Heriot-Watt University, Edinburgh.

(3)

Acknowledgements

Thanks are due to the members of Heriot-Watt University’s SCHOLAR team who planned and created these materials, and to the many colleagues who reviewed the content.

We would like to acknowledge the assistance of the education authorities, colleges, teachers and students who contributed to the SCHOLAR programme and who evaluated these materials. Grateful acknowledgement is made for permission to use the following material in the

SCHOLAR programme:

The Scottish Qualifications Authority for permission to use Past Papers assessments. The Scottish Executive for financial support.

All brand names, product names, logos and related devices are used for identification purposes only and are trademarks, registered trademarks or service marks of their respective holders.

(4)

i

Contents

1 Database Fundamentals 1

1.1 Prior Knowledge and Revision . . . 2

1.2 Introduction . . . 2

1.3 Traditional Databases . . . 3

1.4 Database Management Systems (DBMS) . . . 4

1.5 Data, Entities and Relationships . . . 5

1.6 Summary . . . 6

2 Data Modelling 9 2.1 Prior Knowledge and Revision . . . 10

2.2 Introduction . . . 10

2.3 Early Data Models . . . 10

2.4 The Relational Model . . . 12

2.5 Data Analysis and Specification . . . 13

2.6 Entity Occurrence Modelling . . . 13

2.7 Summary . . . 16

3 The Relational Model 17 3.1 Prior Knowledge and Revision . . . 18

3.2 Introduction . . . 18

3.3 Data Structure . . . 18

3.4 Data Operators . . . 19

3.5 From Data Model to Database . . . 20

3.6 Keys . . . 21

3.7 Translating an Entity-Relationship Diagram into Tables . . . 22

3.8 Data Integrity . . . 23

3.9 Summary . . . 25

4 Representing the Relational Data Model 27 4.1 Prior Knowledge and Revision . . . 28

4.2 Introduction . . . 28

4.3 Properties of Tables . . . 28

4.4 Relational Data Modeling . . . 29

4.5 Database Integrity: Primary Keys and Entity Integrity . . . 31

4.6 Summary . . . 32

5 Normalisation 33 5.1 Prior Knowledge and Revision . . . 34

5.2 Introduction . . . 34

(5)

ii CONTENTS

5.4 Why Normalise? . . . 43 5.5 Summary . . . 45

Glossary 47

c

(6)
(7)

1

Topic 1

Database Fundamentals

Contents

1.1 Prior Knowledge and Revision . . . 2

1.2 Introduction . . . 2

1.2.1 Database Terminology . . . 2

1.3 Traditional Databases . . . 3

1.3.1 Problems with file-based systems . . . 3

1.3.2 The Database Solution . . . 4

1.4 Database Management Systems (DBMS) . . . 4

1.4.1 Advantages of Using a Database System . . . 5

1.5 Data, Entities and Relationships . . . 5

1.5.1 Relationships . . . 6

1.6 Summary . . . 6

Learning Objectives

By the end of this topic you should be able to:

¯ define a database management system

¯ describe the problems associated with traditional paper-based databases ¯ define a database

¯ describe the advantages of using computerised databases ¯ identify data entities and relationships

(8)

2 TOPIC 1. DATABASE FUNDAMENTALS

1.1

Prior Knowledge and Revision

The topic assumes that you have previously considered some of the basic principles of database design including the benefits of electronic systems in comparison with manual databases. This may have been done through the study of the corresponding Unit at Intermediate 2 or in Standard Grade Computing Studies, or any other course on Computing or ICT.

Check your understanding by trying some revision questions.

1.2

Introduction

This topic takes an introductory look at how the storage, processing and retrieval of information have developed from paper based systems to computerised databases. The problems encountered in such systems are covered and what needs to be considered to overcome them. Included in this topic are:

1. the terminology of databases

2. a look at tradition forms of databases

3. problems associated with traditional methods 4. advantages in using database systems

1.2.1 Database Terminology

A database is a collection of related information about a set of persons, objects or events.

Databases have been around for longer than computers! Some common everyday

examples of databases which are stored on paper are:

an address book which stores names, addresses and other contact details about people

the Phone BookTM which stores names, addresses and telephone numbersabout people or companies a recipe book which stores ingredients and instructions about dishes

Nowadays, most databases are stored electronically on computers or other electronic equipment. Examples include:

your contacts list in your email or instant messaging application

which stores contact information about people

names and numbers in your mobile phone which stores contact information about people

playlists in an MP3 player which stores information about tracks of music

A database management system (DBMS) is a software package which is used to create, manipulate and present data from electronic databases. Example include Microsoft Access, Filemaker Pro, MySQL, Oracle, Ingres, Postgres, SQL Server and many others.

(9)

1.3. TRADITIONAL DATABASES 3

1.3

Traditional Databases

Why do database systems exist?

In order to understand the answer to this important question, it is useful to consider what was used before database systems were created. The first databases were manual paper-based systems. Often paper records were organised into folders, and stored in filing cabinets. There were several problems associated with paper-based databases:

¯ the storage of paper records was very bulky, often requiring several large filing

cabinets

¯ it was very easy to mis-file a paper record, or for records to be lost or damaged ¯ data was often duplicated in several records

¯ keeping records up-to-date was difficult and time consuming, and often resulted in

data inconsistency, where duplicated values were updated in one record but not in others

¯ many people were employed to maintain the records, which was costly ¯ searching for records was time consuming

¯ producing reports, such as sorted lists or data collated from several sources, was

extremely time consuming, if not impossible

The first computerised data processing systems were file-based systems with data stored in files, and application programs operating on that data.

Although this system was an improvement on previous approaches where data was stored within the application itself, there were still various difficulties with file-based systems.

1.3.1 Problems with file-based systems

If a system is developed following an evolutionary approach, with the systems being created and modified, then the structure of the data may need to change. Changes to this data means that all programs that refer to this data must also be changed. This problem is often known as data dependence.

c

(10)

4 TOPIC 1. DATABASE FUNDAMENTALS

Data dependence

Please visit the online course to view an interactive demonstration of this concept. In a file-based systems, the same information may well be duplicated in multiple files. For example, the full name and address of a customer may appear in several files, depending on what sort of orders they placed (small or large) and when they placed them. This is known as duplication of data, sometimes known as data redundancy because the same information does not need to be repeated.

This duplication of data often leads to another problem, that of inconsistency of the data where different copies of the data do not agree.

Data Redundancy

Please visit the online course to view an interactive demonstration of this concept.

1.3.2 The Database Solution

As a result of these problems a more effective solution was required. This solution is having a collection of related data in a database. This database is then controlled and accessed by means of a database management system (DBMS). It should be noted, however, that although a DBMS can help to address the problem, it still requires certain rules to be followed. In particular, if a database is not designed correctly, issues such as duplication leading to inconsistency can still occur.

Good design of the system is therefore essential to avoid these potential problems.

1.4

Database Management Systems (DBMS)

A database management system (DBMS, or database system) is a collection of interrelated data and a set of programs to manage and access that data.

It provides the ability to present many different views of the database to support many different types of user. In fact, most database systems have a variety of people who use it, all for different purposes. Common categories of user include database designers, database administrators, application programmers as well as general users. These users all use the database system in different contexts with differing requirements. Here are some examples of different users of a database system:

¯ Database designers will be involved in tasks including storage and structure of the

database.

¯ Database administrators will determine access authorisation for users.

¯ Application programmers will work on the development of application programs

that operate on the data.

¯ General database users manipulate the data in the database system by means

of application programs, or by using queries to interact with the database more directly.

(11)

1.5. DATA, ENTITIES AND RELATIONSHIPS 5

1.4.1 Advantages of Using a Database System

Computerised databases have several advantages over manual databases:

¯ The ability to store large amounts of data

¯ Searching and sorting operations can be performed much more quickly and easily

than manually

¯ Information is more easily available to users, due to improved methods of data

retrieval.

¯ Databases are usually stored and accessed using computers in a local area

network (LAN). This means that any user using a computer on the LAN can access the data when required. (Please note, this is not a requirement of all database management systems as some smaller "personal" database management systems exist that cater only for single user applications. However, it is still a desirable feature, and can be essential for some applications.)

¯ Security and integrity of data is improved, due to the methods of data storage and

updating

¯ There is no data redundancy: that is, no unnecessary duplication of data. In a

relational database, data is never stored more often than is necessary

¯ Data consistency: this means that when data is updated, this is only done once,

so removing the possibility of data inconsistency. As soon as the data is updated, all users have access to it.

¯ Data independence: the software which processes the data is separate from

the data itself. This means that several programs can be developed to perform operations on the data without affecting the data itself.

1.5

Data, Entities and Relationships

A data entity is something that information will be recorded about in a database. This could be a person, an object, or something abstract like a holiday booking. Examples of entities about persons include a customer, airline passenger, a library borrower, an employee, a school pupil, a college student. Example of objects include a product, a flight, a book, a job, a course at school or college. Examples of abstract entities include a borrowing, a booking, or a rental.

An entity set is a collection of entities of the same type and is represented as a table in a database system.

Each entity has a set of attributes which describe examples or instances of that entity. For example, the attributes of a school pupil might include the following: forename, surname, address, date of birth, year group, tutor group.

An instance of a school pupil might be Joanna Hamilton, whose attributes are: Joanna, Hamilton, 10 Main Road, Kinglass, 01/04/1990, 5, SW.

The attributes of a DVD Rental entity are code, title, cost, date out, date due and

c

(12)

6 TOPIC 1. DATABASE FUNDAMENTALS

member number, and the attributes of the Member entity are member number, name and telephone number.

Attributes will have constraints on what is permitted. For example, an attribute representing a person’s name will contain characters only. This is called the domain of an attribute and will define the type and value of data that the attribute can hold. Examples of data types that attribute may use are:

Text - e.g. a person’s name

Integer - e.g. the number of books borrowed from a library Real - e.g. the conversion rate for foreign currency

Object - a photographic image Boolean - a True or False value Date - someone’s date-of-birth Time - the departure of a train

Entities

Please visit the online course to view an interactive demonstration of this concept.

1.5.1 Relationships

Entities can be linked, or related to one-another. When a link exists, it is called a. A school will have a headteacher, and that headteacher will have a school. This is a one-to-one relationship between the entities School and Headteacher.

In the same school, a pupil will study many subject and each subject will be studied by many pupils. This is a many-to-many relationship between the entities Pupil and Subject.

A one-to-many relationship in a school exists between the entities School and Pupil. Here a school has many pupils and each pupil attends one school.

These relationships form the basis of how relational database systems are modelled. There is a table on the online course shows how some entities can be linked by a relationship.

Some of these will be further exemplified in the next Topic

1.6

Summary

By now you should be able to:

¯ describe a database in terms of information storage, processing and retrieval ¯ describe the problems associated with traditional paper-based databases

(13)

1.6. SUMMARY 7

¯ describe the problems associated with early file-based systems ¯ describe the advantages of using a database management system ¯ identify data entities and relationships

End of Topic Test

An online assessment is provided to help you review this topic.

c

(14)
(15)

9

Topic 2

Data Modelling

Contents

2.1 Prior Knowledge and Revision . . . 10

2.2 Introduction . . . 10

2.3 Early Data Models . . . 10

2.4 The Relational Model . . . 12

2.5 Data Analysis and Specification . . . 13

2.6 Entity Occurrence Modelling . . . 13

2.6.1 Relationships . . . 14

2.6.2 Keys . . . 15

2.7 Summary . . . 16

Learning Objectives

By the end of this topic you should be able to:

¯ understand the historical motivation for the development of relational database systems

¯ identify other database models in addition to the relational model ¯ correctly interpret an Entity Relationship Diagram (ERD)

(16)

10 TOPIC 2. DATA MODELLING

By the end of this topic you should be able to:

¯ understand the historical motivation for the development of relational database

systems

¯ identify other database models in addition to the relational model ¯ correctly interpret an Entity Relationship Diagram (ERD)

¯ design basic ERD from an application scenario

2.1

Prior Knowledge and Revision

Please test your Prior Knowledge using the online revision questions.

2.2

Introduction

This topic builds on the fundamentals of databases and introduces the idea of modelling data. This is done by considering the early methods of data modeling that led to the development of the relational model. The relational model is introduced with some of the basic ideas covered. At this stage this extends to the use of ER diagrams as a representation of a relational data model.

2.3

Early Data Models

The first database systems (in the 1960s) were all hierarchical, with data being stored as trees of records and links as shown below:

(17)

2.3. EARLY DATA MODELS 11

Department

Project

Employee

Manager

Worker

Figure 2.1: Hierarchical StorageModel

This type of model is good for one-to-many relationships between entities e.g. a Department having many Employees, with each Employee assigned to just one Department. However, this model would be poor for many-to-many relationships as it is difficult to structure a hierarchy with multiple parent nodes for each child. For example within a Department, many Projects could be underway with many Workers involved. This would be difficult to show using a hierarchical model.

Another type of model is a network DBMS (developed from the 1960s), with data again being stored as records and links as shown below:

c

(18)

12 TOPIC 2. DATA MODELLING

Project A

Project B

Employee 1

Employee 2

Employee 3

Employee 4

Department

Figure 2.2: Networked DBMS Storage Model

Here the many-to-many relationship between Project(s) and Employee(s) can be represented. This logical data model is better than the hierarchical model for many-to-many relationships as the top-down hierarchy no longer needs to be enforced. This allows nodes to have multiple relationships with other nodes.

Both network and hierarchical systems provide an efficient means of storing large amounts of data and were a a significant improvement on file-based systems. However, there are some problems associated with network and hierarchical data models. Both models are difficult to navigate and require complex programs to be written to answer a query. Other problems include data dependence where changing the data can mean that application programs have to be changed.

2.4

The Relational Model

The hierarchical and network models do not offer data independence and therefore a better solution was needed. A model that was designed to solve some of these problems is the relational data model.

The relational model was designed to include:

¯ data independence ¯ a sound theory

¯ querying without having to traverse a tree or graph to find information

The relational data model was developed in the 1970s and is currently the most common database model. Example database management systems based on this model include

(19)

2.5. DATA ANALYSIS AND SPECIFICATION 13

Oracle, Ingres, IBM DB2, mySQL and Microsoft Access.

In the relational model, data is stored as records in tables and is interrogated using the Structured Query Language (SQL).

2.5

Data Analysis and Specification

Before a designer can create a database using any model, an understanding of the requirements of the system is necessary. This analysis of the data will lead to the choice of tables, identification of keys , and an understanding of the requirements of users of the database. At this early stage of the design process the system is considered in terms of concepts and ideas produce an abstract model of the required system.

In the relational model, this design makes use of Entity-Relationship (E-R) modelling. This modelling is independent of the choice of relational database used to implement the database so it does not matter at this stage whether the system is to be implemented on a large scale using something like Oracle, or on a small scale using a system such as mySQL, Microsoft Access or Filemaker.

The modelling of the requirements is separate from the final implementation so that important design decisions can be taken independently of a particular product.

2.6

Entity Occurrence Modelling

Entity occurrence modelling was developed to help design databases by modelling with objects (entities) and relationships between these objects. The properties associated with an entity are known as the attributes of the entity. Entity Relationship Diagrams (ERDs) use a limited number of components and are easily understood as illustrated below:

Entity

Entity

Attribute

Attribute

Attribute

Figure 2.3: An example of an Entity Relationship Diagrams (ERDs)

ERDs can include entities (represented by rectangles), attributes of these entities (ellipses) and finally relationships (lines with a trapezoid label).

c

(20)

14 TOPIC 2. DATA MODELLING

2.6.1 Relationships

Entities can be associated with each other by relationships. The number of entities that another entity relates to is known as the mapping cardinality, or the degree of the relationship, and can be one of the following:

¯ One-to-one

An entity is related to at most one other entity, written as a 1:1 relationship

¯ One-to-many

An entity is related to more than one other entity, written as a 1:M relationship

¯ Many-to-one

A number of entities are related to one entity, written as a M:1 relationship

¯ Many-to-many

Many entities relate to many other entities, written as a M:N relationship.

One to One Examples

Bowmore High School

St. Cuthbert's Academy

St. Joseph's Secondary

Lochmaben High School

Tyree High School

Martin Greer

Scott Smyth

Sarah Kelly

Paul Fryer

Christopher Onions

Figure 2.4: An example of various One to One relationships

Each School has one Head teacher and each head teacher has one school. This can be represented in an ERD as 1:1

Headteacher

School

Figure 2.5: One to One (1:1) One to Many Examples

(21)

2.6. ENTITY OCCURRENCE MODELLING 15

Standard Grade English

Higher English

Intermediate English

English for Speakers of Other Languages

Standard Grade Computing

Higher Information Systems

PC Passport Tom Dickson

Elaine Rankin

Figure 2.6: A few examples of One to Many relationships

Here a teacher can teach many courses and each course is tought by one teacher. This can be represented in an ERD as a one to many relationship

Course

Teacher

Figure 2.7: One to Many (1:M) Many to Many Example

On the online course you can see an interactive example of a Many to Many relationship This can be represented in an ERD as a many-to-many relationship.

Course

Pupil

Figure 2.8: Many to Many (M:N)

2.6.2 Keys

It is important to be able to distinguish between different entities. This is achieved by using one or more attributes of that entity as the key that identifies an entity. A key is a set of attributes that uniquely identifies an entity.

For example, the order number attribute of an order entity is a unique identifier thus

order number is a key.

When looking for the key to an entity, there may be more than one attribute or

c

(22)

16 TOPIC 2. DATA MODELLING

combination of attributes that could uniquely identify the entities. These attributes under consideration are referred to as candidate keys. Once a key is chosen from the candidate keys as the principal identifier for an entity set, this becomes the primary key.

2.7

Summary

By the end of this topic you should be able to:

¯ undertand the historical motivation for the development of relational database

systems

¯ identify other database models in addition to the relational model ¯ correctly interpret an ERDs

¯ design basic ERDs from an application scenario

End of Topic Test

(23)

17

Topic 3

The Relational Model

Contents

3.1 Prior Knowledge and Revision . . . 18 3.2 Introduction . . . 18 3.3 Data Structure . . . 18 3.4 Data Operators . . . 19 3.5 From Data Model to Database . . . 20 3.6 Keys . . . 21 3.7 Translating an Entity-Relationship Diagram into Tables . . . 22 3.8 Data Integrity . . . 23 3.8.1 Entity Integrity . . . 23 3.8.2 Referential Integrity . . . 23 3.8.3 Data Dictionary . . . 24 3.8.4 Data Validation . . . 24 3.9 Summary . . . 25 Learning Objectives

By the end of this topic you should be able to:

¯ describe the relational data model

¯ describe the correspondence between entity-relationship diagrams and relational data models.

¯ explain how data is stored in tables within a database ¯ define a key field

¯ define atomic and compound keys ¯ identify candidate keys for an entity ¯ select a primary key for an entity ¯ describe a surrogate key

¯ use foreign keys in a data model

(24)

18 TOPIC 3. THE RELATIONAL MODEL

By the end of this topic you should be able to:

¯ describe the relational data model

¯ describe the correspondence between entity-relationship diagrams and relational

data models.

¯ explain how data is stored in tables within a database ¯ define a key field

¯ define atomic and compound keys ¯ identify candidate keys for an entity ¯ select a primary key for an entity ¯ describe a surrogate key

¯ use foreign keys in a data model

¯ understand the term entity and referential integrity

3.1

Prior Knowledge and Revision

This topic takes a closer look at the relational database model and builds on the introductory work already covered in previous Topics. In order to check your understanding of previous work, take the revision questions on the website.

3.2

Introduction

This topic looks at the relational data model which is the most common data model in use today. This model is defined by the following features:

¯ data structure ¯ data operators ¯ data integrity

3.3

Data Structure

The structure of data within the relational data must comply with the following rules: Each table in the database is identified by a unique name

Each column in a table is identified by a unique name within that table The data in each column of a table is same domain and type

(25)

3.4. DATA OPERATORS 19

Each row in a table is unique Each cell in a table contains one value only Rows and columns in a table can be in any order

Columns

Columns within a table must be identified with a unique name. However it is acceptable to have the same name of a column used in separate tables in the same database. The example below shows two tables from the same database, each with columns called ’Name’ and ’Subject’

StudentID Name Subject

TeacherID Name Subject

The data in each column must be from the same domain and type. In the table above, example of the data in the ’Subject’ column would include ’English, Maths, Geography

etc’ . An entry of ’101’, for example, in this column would be incorrect. Rows

Each row within a table must be unique i.e. no row should appear more than once.

Data Items

Client No. Client Contact Date Facility

16523 Eastdyke YC Alan Donnelly

01/09/05 03/09/05 15/09/05 Pool Hall Pool

Data items should also be atomic to help search and maintain the database. However, non-atomic data items are still found in many implemented database. In the example above the Contact name is a multiple data item with two components. This would be better represented by decomposing Contact as Forename and Surname

3.4

Data Operators

Operations on data include:

¯ enter ¯ delete ¯ amend ¯ retrieve

In a relational database these operations are carried out by data operators to allow data to be inserted, deleted, amended, and retrieved respectively.

c

(26)

20 TOPIC 3. THE RELATIONAL MODEL

3.5

From Data Model to Database

As has been already mentioned, the relational model uses the following structure: Data is stored as records in tables;

The model is interrogated using a query;

This topic will describe the relational data model in detail. Most DBMS use the terminology tables and records but these are sometimes referred to as relations and tuples respectively.

Each column in a table is known as an attribute. Each table has a heading, and each row in the table is a record.

Name

Street

Town

Joe Smith 1 Brown Road Newton

Omar Singh 3 Straight Street Littletown Eng Min Lee 22 Station Road Bigtown

Attributes

Table

Rows

Once the entities, attributes and relationships in the data model have been defined, a database can be created.

Each entity in the data model becomes a table in the database. Each attribute of an entity becomes a field in a table in the database.

It is not always easy to work out exactly which entities or tables are required in a relational database. This is particularly so because many-to-many relationships between entities are not desirable. A procedure called normalisation is used to help determine which entities are required.

The list of possible values for an attribute is known as its domain. Each attribute must have a distinct name within the table as is shown below:

(27)

3.6. KEYS 21

Name Owner Species Sex Birth Death

Fluffy Harold Cat F 04-02-1993 NULL Claws Gwen Cat M 17-03-1994 NULL Buffy Harold Dog F 13-05-1989 NULL Fang Benny Dog M 27-08-1990 NULL Bowser Diane Dog M 31-08-1989 27-09-1995 Chirpy Gwen Bird F 09-12-1997 NULL Whistler Gwen Bird NULL 11-09-1998 NULL Slim Benny Snake M 29-04-1996 NULL Puffball Diane Hamster F 30-03-1999 NULL

The ordering of rows and columns is not significant. Each item must be atomic, i.e. only single items are allowed.

A database may contain multiple tables. Both the relational database itself and each table within it must have unique names assigned to them.

3.6

Keys

In order to establish the relationships between the tables in the database, each record in a table must be able to be uniquely identified. A key is a field, or set of fields, whose values uniquely identify a record.

In any table, there may be more than one field, or set of fields, which can uniquely identify each record-these are called candidate keys. The candidate key which is chosen to be used is called the primary key. Once a primary key is selected, and implemented in a database, it is wise to avoid changing anything about it as these keys act like the foundation stones of the database. Occasions may arise when you may want to change the data in primary key. For example, the contact name of someone in an organisation could be a candidate the primary key in a table called company. However, it is likely that the contact person for a company will change over time which would require significant updating if the contact name was the primary key. It is better to select a primary key that will not change in the course of time. In this example, company name would be better, but that too could change. As far as the database is concerned, arbitrary values are best. So, in this case, the best name for primary key of the company table would be something like company code with numbers identifying the company i.e. company 1, 2, 3 etc.. A key consisting of only one field is called an atomic key.

A key which is made up of a set of fields is called a compound key.

Sometimes it is necessary to "make up" a key to uniquely identify each record. This is called a surrogate key. A surrogate key is usually a number or similar code, which has

c

(28)

22 TOPIC 3. THE RELATIONAL MODEL

no meaning or purpose other than to uniquely identify each record. Surrogate keys are used when there is no candidate key available, or when the only candidate key is a large compound key.

What makes a good primary key?

In general, the key should be as short and simple as possible:

¯ an atomic key is a better choice than a compound key

¯ a numeric field is usually a better choice of key than a text field. This is because

they can be stored more compactly in the database than text fields.

A key must also have the following properties:

¯ a key must have a value-it can never be blank (or "null"). ¯ a key should not change over time.

You can now take the online Activities "Choosing a Key" and "Foreign Keys"

3.7

Translating an Entity-Relationship Diagram into Tables

Databases can be designed and specified by means of Entity-Relationship diagrams. However, it has not been shown how to take the information described in an E-R diagram and form tables for a database. In fact, any database that is specified by an E-R diagram can also be represented by a collection of tables. In other words, an E-R design can be converted into a relational design.

Converting an E-R design to a relational design makes use of the following steps:

¯ Convert each entity set to a table

¯ Convert many-to-many and one-to-many relationships ¯ Convert each relationship set to a table of relationships

Each strong entity set is converted to a table, with each attribute of the entity becoming a column in the table. The primary key for each table should be identified along with other candidate keys as appropriate.

In a similar way, each weak entity set is converted to a table consisting of those attributes of weak set combined with the corresponding attributes (i.e. primary key) from the strong set that owns the current set.

Each relationship set becomes a table of relationships with one column for each attribute of the set. Only one-to-one and one-to-many relationships need to be represented as many-to-many can be split into two one-to-many relationships.

For a 1:1 relationship a foreign key is added to one or both tables that are related. For example, consider the case that a customer and an address are related by a 1:1

(29)

3.8. DATA INTEGRITY 23

relationship (each customer has an address, and each address is associated with at most 1 customer). In this case the Customer table might have a primary key of customer id, and the Address table might have a primary key of address id. In this case, the customer table would also include a foreign key containing the address id of the Address table, and the Address table would include a foreign key containing the customer id.

Foreign keys are also used to represent one-to-many relationships. A single foreign key represents a one-to-many relation, where the "one" portion of the relation is the reference from the foreign key, and the "many" portion has the foreign key. For example, consider the case of a customer who can place multiple orders. The relationship between customer and order is one-to-many (where one customer can place multiple orders, and each order matches only 1 customer). In this case, the Order table might have a primary key of order number, and the Customer table might have a primary key of customer id. The Order table would therefore contain a foreign key that references the customer id field in the Customer table.

3.8

Data Integrity

A user of a database must have confidence and trust in the data that is stored. The data must accurately reflect what is reality. To achieve this a database uses the rules of

entity integrity and referential integrity.

3.8.1 Entity Integrity

For an entity to be stored in a relational database, it must possess two characteristics:

¯ It must have a primary key which is "non-null", i.e. its value will never be blank ¯ It must have no multi-valued attributes

Together, these two characteristics define what is called "entity integrity".

3.8.2 Referential Integrity

The rules for referential integrity ensure that data linking tables together is present, and accurate. Tables are linked by using foreign keys, therefore referential integrity is concerned with foreign keys and how they link to a primary key in a related table. To ensure referential integrity between linked tables, a foreign key should be linked to the primary key of a related table. This would assume that a foreign key should always be present in order to establish the link. However, there are circumstances when a foreign key can have a null ’value’. The following example illustrates this.

A school football club three teams U13, U15 and U16s. Each team has pool of 15 players, and one player from each team is the Captain. Extracts from two tables from the football database are shown below:

Captain

c

(30)

24 TOPIC 3. THE RELATIONAL MODEL

Captain name Team

Greig U13

McNeil U15

Charlton U16

Player

Player name Team Captain name

Simpson U13 McNeil

Henderson U15 Greig

Gemmell U13 McNeil

Ferguson U15 Greig

McNeil U13

Law U18 Charlton

The primary key for the Captain table is Captain name. This attribute appears in the related table Player where it is used as a foreign key to link the two tables. To ensure referential integrity each player should have a captain. However, McNeil is a player and a captain, so Captain name in that row is ’null’.

3.8.3 Data Dictionary

A data dictionary is a formal way of defining what is in a database. It could be considered as data about data! This is often called metadata. The data dictionary is a flat table describing and naming the tables that form a database together with the attributes of the data held in each table.

This will include:

the names of each table;

the data type, size and format of each attribute; the identification of keys and index;

and a range of validation checks.

The data dictionary is an integral part of a relational database and is used to define, control and validate the data and operations in the database. Here is an example of how table attributes are defined in a data dictionary:

Table Name Type Size Validation Key/Index Player Player name Text 50 Required Yes, Primary

Player Team Text 3 Required No

Player Captain nameText 50 Required Yes, Foreign

3.8.4 Data Validation

Data validation is carried out by defining, in the data dictionary, the constraints on

data being accepted into the database. Validation checks will include:

(31)

3.9. SUMMARY 25

domain constraints e.g. a surname is not numeric

presence e.g. data that is part of a primary key must be present range e.g. values for a temperature will not be less than -273C

permitted values e.g. American shoe sizes must be integers between 17 and 48 but not 22, 25, 27, 30, 32, 35 or 37

format e.g. a post code

3.9

Summary

By now you should be able to:

¯ describe the relational data model

¯ describe the correspondence between entity-relationship diagrams and relational

data models

¯ explain how data is stored in tables within a database ¯ define a key field

¯ define atomic and compound keys ¯ identify candidate keys for an entity ¯ select a primary key for an entity ¯ describe a surrogate key

¯ use foreign keys in a data model

End of topic Test

End of topic Test

An online assessment is provided to help you review this topic.

c

(32)
(33)

27

Topic 4

Representing the Relational Data

Model

Contents

4.1 Prior Knowledge and Revision . . . 28 4.2 Introduction . . . 28 4.3 Properties of Tables . . . 28 4.4 Relational Data Modeling . . . 29 4.4.1 Anytown High School . . . 30 4.4.2 ACTIVITIES . . . 30 4.5 Database Integrity: Primary Keys and Entity Integrity . . . 31 4.5.1 Database Integrity: Foreign Keys and Referential Integrity . . . 31 4.6 Summary . . . 32

Learning Objectives

¯ Prior Knowledge and Revision ¯ Learning Objectives

¯ Introduction

¯ Properties of Tables ¯ Relational Data Modeling ¯ Database Integrity

¯ Primary Keys and Entity Integrity ¯ Foreign Keys and Referential Integrity

(34)

28 TOPIC 4. REPRESENTING THE RELATIONAL DATA MODEL

By the end of this topic you will be able to:

¯ Prior Knowledge and Revision ¯ Learning Objectives

¯ Introduction

¯ Properties of Tables ¯ Relational Data Modeling ¯ Database Integrity

¯ Primary Keys and Entity Integrity ¯ Foreign Keys and Referential Integrity

4.1

Prior Knowledge and Revision

In order to check your understanding of previous work, take the revision questions on the website.

Revision Questions

10 min

There is an Online Assesment at this point

4.2

Introduction

This topic builds on the previous one and develops a method for representing a relational database model. To do this successfully the relational database modeler must understand full how the properties of tables in the model affect the final model. Only then can the model be developed to show the relationships between the different entities. Finally, the integrity of the database has to be ensured and this is done by careful consideration of primary and foreign keys and an understanding of the these keys role determine that integrity.

4.3

Properties of Tables

You have seen how tables are used to show how data can be represented in a database. The use of ’columns’ and ’rows’ show how data is structured and how values of data items can be represented. However, in relational data modeling a table has to adhere to a set of rules:

i Every table in the database must have a unique name.

(35)

4.4. RELATIONAL DATA MODELING 29

intersect, there is only one value. For example, this table could not be used in a relational database model:

Candidate Number Exam Subject

994667823 English

994532345 English, History

994534332 History, French, Spanish

995657829 English

994657847 History

iii The values that are held under a column, i.e. the values of an attribute, must be in the same domain. Why would the table below not obey this rule?

Candidate Number SName Course

994667823 Anderson 2318

994532345 Anwar Spanish

994534332 Scanlon 2318

995657829 Weir 2318

994657847 Woodrow Spanish

iv Each column has a unique name, i.e. each attribute in the table is distinct. In a relational database the name of an attribute (i.e. the column name) is used for reference e.g. when searching the table, so each attribute must be unique.

v Each row in the table is unique, i.e. no row is duplicated anywhere else in the table. This is a fundamental principle of databases. Each row will contain a primary key that is used to identify the existence of a row in a table. Since the primary key must be unique, then we can say that each row must be unique.

vi The order of the rows and the columns are not significant. In (iv) and (v) above we have seen how reference to columns is by name (e.g. when searching) and reference to rows is by primary key, so as far as the database model is concerned, it is not necessary to have the columns or rows in any particular order. Reference to column 3 or row 5 has no meaning. Compare the table below to the table in (iii) above. What can you deduce?

Candidate Number SName Course

994667823 Anderson 2318

994534332 Scanlon 2318

994532345 Anwar Spanish

994657847 Woodrow Spanish

995657829 Weir 2318

4.4

Relational Data Modeling

This section builds on previous study of entities, attributes and relationships and the modeling techniques used i.e. ER Diagrams

c

(36)

30 TOPIC 4. REPRESENTING THE RELATIONAL DATA MODEL

4.4.1 Anytown High School

Consider some of the entities found in a school:

There are pupils, there are courses, there are teachers (to teach the course) and there are teachers to advise pupils (lets call them guidance teachers). Of course, we could identify lots of other entities (like classrooms, lockers, cleaners, cooks, library books, timetables, appointments, homework, exam grades, etc. etc. etc), but like most data models you will study on this course it will be simplified to help understanding.

Firstly, consider the entities of pupil, and course:

This cannot be represented as two tables in a relational database since several Pupil identifiers(providing a foreign key to link the table) would be needed in each entry of the Course table because the same pupil can study many courses. Similarly several Course identifiers would be needed in each entry of Pupil. That would mean that in the Course table there would be multi values identifying all the pupils doing a particular course and in the Pupil table, multiple values identifying all the course each pupils studied. This breaks the rule (in both tables) of data items requiring to be atomic.

1:M & M:N Relationships

This activity takes the Anytown High Scool example and shows how relationships can be refined to overcome the problems with M:N relationships.

For example, another table can be introduced that will represent unique instances of pupils studying courses. By taking the primary keys of the existing entities we can combine them to form a new entity: A Pupil can be identified by an unique number called pupilID and a course by courseID. Combine these to form the primary key of the new table (called Enrollment) so that every instance of a pupil on a course is uniquely identified. Separately, these attributes become foreign keys in the new table and provide the link to the two existing tables showing the 1:m relationships.

4.4.2 ACTIVITIES

Redraw the ER Diagram in the stages described above and include verbs to describe the relationships between the tables in the model.

ER Diagram 1

10 min

Redraw the ER Diagram in the stages described above and include verbs to describe the relationships between the tables in the model.

From the following scenario identify the entities and relationships and construct an E-R Diagram:

ER Diagram 2

30 min

Kirklee Vegetable Club is a club of gardeners keen on growing vegetables in the allotment plots that are leased from the Council. Each member can rent more than one plot to grow vegetables. Each year the Club records details of the amount vegetables that are grown on each plot (i.e. its produce) as there are prizes awarded to the plot that produces the largest yield of each type of vegetable. Each plot has a unique number and

(37)

4.5. DATABASE INTEGRITY: PRIMARY KEYS AND ENTITY INTEGRITY 31

members are identified by their membership number. The vegetables that are grown are identified by the vegetable name (e.g. carrot, potato etc). Any vegetable type may be grown in any of the plots.

Three unique identifiers are mentioned in the scenario and you will have to introduce a new entity to deal with the m:n relationship

Task - Identify the entities and relationships and draw an E-R Diagram that would enable this scenario to be modelled for a relational database.

4.5

Database Integrity: Primary Keys and Entity Integrity

Identifying and declaring a primary key in a table introduces an inherent restriction on the table i.e. that two rows in the table cannot be the same. This allows each row to be identified. For example, the details of pupil in a school database can be found by using a unique identifier i.e. the primary key. If the database is relational then the details of that pupil will be recorded once to avoid data redundancy. There may be several attributes of a pupil that are unique e.g. Scottish Candidate Number (SCN), School Enrollment Number, and possibly National Insurance Number. Only one of these candidate keys can be used as the primary key. If the SCN is used, and no two SCNs are the same, then the uniqueness of each pupil record is guaranteed.

Also, if the primary key is the SCN, then the primary key must have a value for the pupil to exist on the database. This is the same for any primary key in any table and ensures entity integrity i.e. the primary key ensure the existence of a record in a table.

In the pupil example, the value for National Insurance may not exist, but since NI Number is not the primary key, then entity integrity is maintained i.e. a pupil can exist on the school’s database without necessarily having a National Insurance Number.

4.5.1 Database Integrity: Foreign Keys and Referential Integrity

A foreign key in one table has the same values of the primary key in another table. For example the attribute used to identify a pupil (SCN) in the Enrolment table for the Anytown High School model, will be a foreign key because the same attribute is used as the primary key in the Pupil table. It follows that both must have the same value in order to identify a pupil. This is known as referential integrity.

The same applies to the attribute identifying the Course that is been studied and both together i.e. the pupil identifier and the course identifier, form the composite primary key of the enrolment table. (The same entity integrity applies and for each enrolment to exist it must have a pupil and a course identified.)

Referential integrity ensures that a link (or relationship) exists between tables. Not only does it ensure the link, it defines the relationship as 1:m i.e a primary key in one table will exist once, but may appear many times as a foreign key in other tables.

c

(38)

32 TOPIC 4. REPRESENTING THE RELATIONAL DATA MODEL

4.6

Summary

By the end of this topic you will be able to:

¯ Prior Knowledge and Revision ¯ Learning Objectives

¯ Introduction

¯ Properties of Tables ¯ Relational Data Modeling ¯ Database Integrity

¯ Primary Keys and Entity Integrity ¯ Foreign Keys and Referential Integrity

End of topic Test

End of topic Test

(39)

33

Topic 5

Normalisation

Contents

5.1 Prior Knowledge and Revision . . . 34

5.2 Introduction . . . 34

5.3 The Normalisation Process . . . 35

5.3.1 Unormalised Form (UNF) . . . 35

5.3.2 First Normal Form (1NF) . . . 37

5.3.3 Second Normal Form (2NF) . . . 39

5.3.4 Third Normal Form (3NF) . . . 41

5.3.5 Normalisation in a Nutshell . . . 42 5.3.6 Beyond 3NF . . . 42 5.4 Why Normalise? . . . 43 5.4.1 Insertion Anomolies . . . 44 5.4.2 Deletion Anomolies . . . 45 5.4.3 Update Anomalies . . . 45 5.5 Summary . . . 45 Learning Objectives

By the end of this topic you will be able to;

¯ analyse and extract data from up to two source documents to produce a model in un-normalised form (UNF)

¯ identify and remove repeating groups of data from UNF to produce a model in first normal form(1NF)

¯ identify and remove partial dependencies from 1NF to produce a model in second normal form (2NF)

¯ identify and remove transitive dependences from 2NF to produce a model in third normal form (3NF)

(40)

34 TOPIC 5. NORMALISATION

By the end of this topic you will be able to;

¯ analyse and extract data from up to two source documents to produce a model in

un-normalised form (UNF)

¯ identify and remove repeating groups of data from UNF to produce a model in first

normal form (1NF)

¯ identify and remove partial dependencies from 1NF to produce a model in second

normal form (2NF)

¯ identify and remove transitive dependences from 2NF to produce a model in third

normal form (3NF)

5.1

Prior Knowledge and Revision

The topic assumes that you have previously worked with databases that use linked tables with a single one-to-many relationship and that you can identify and remove multi-valued fields found in records. This may have been done through the study of the corresponding Unit at Intermediate 2 , or another course on Computing or ICT. In order to check your understanding of previous work, take the revision questions on the website.

Revision Questions

10 min

5.2

Introduction

So far we have dealt with entities and the relationships between them and considered the table structure of how these entities can be represented in a relational database model. This topic considers how entities, tables and records can be structured into a form that makes up a relational model. This form is called normal form and this topic will describe a methodical process that takes entities, relations and records to that required form. This process is called normalisation and can be considered in stages from an un-normalised model (UNF), to first normal form (1NF), to second normal form (2NF), and finally to third normal form (3NF). Data can be modelled to higher levels of normalisation but that is beyond the scope of this course. When data is in 3NF it is in a form that can be implemented and used effectively within a relational database software package. The process of normalisation takes account of the relationships between entities that we have been considering in previous topics and the importance of keys within the tables considered. We know that we have a normalised model when every attribute in a table that is not part of the primary key is determined by the primary key and nothing else. We will use the example of the vegetable club to go through the process of normalisation and then consider the problems that arise if the model is not normalised.

(41)

5.3. THE NORMALISATION PROCESS 35

5.3

The Normalisation Process

The best way to consider the normalisation process is by working through an example and then practising on other scenarios. The example we will use to take you through the process will be the vegetable club model that was considered earlier. We will look at this methodically to enable you to apply the process to any example you may encounter. We start by analysing an existing scenario and data recording methods (if any exist). From that we can produce an UNF model of the data, progressing to 1NF, 2NF and finally 3NF.

5.3.1 Unormalised Form (UNF)

Case Study: Kirklee Vegetable Club Kirklee Vegetable Club is a club of gardeners keen on growing vegetables in the allotment plots that are leased from the Council. Each member can rent more than one plot to grow vegetables. Each year the ClubRecords details of the amount vegetables that are grown on each plot (i.e. its produce) as there are prizes awarded to the plot that produces the largest yield of each type of vegetable. Each plot has a unique number and members are identified by their membership number. The vegetables that are grown are identified by the VegetableName (e.g. carrot, potato etc). Any vegetable type may be grown in any of the plots.

For every vegetable season the Club secretary keeps card records for each plot. An example is shown below:

Kirklee Vegetable Club 2006

PlotNumber 5

PlotSize 1.2 ha

Member No 102

Member Name James MacArthur

VegetableName Yield ClubRecord

Carrot 5.5 Kg 10.4 Kg

Potato 23.2Kg 36.8 Kg

Onion 12.3Kg 16.4Kg

At the end of the season, a card similar to this will be produced for all 64 plots on the allotment site.

Notes: This is a simplified example that would have limited use in reality e.g. variety types of the same vegetable are not recorded and it is assumed that the model is used only for the one season of growing: only the ClubRecord for yields are carried over from previous seasons. However, there is enough in the model to take you through the normalisation process which is why it has been simplified.

Do not be too concerned about the model not reflecting reality, as it is the process of normalising that you are more concerned with at this stage. As you get more comfortable with the process, then you will be able to normalise more complex situations, but at this level of course, all examples will be simplified in order to keep you focussed on the process. (Details of levels of treatment of examples are outlined in the Course Arrangements e.g. in the examination you will only ever be presented with two source document like the card above, and you will end up with a maximum of four tables when

c

(42)

36 TOPIC 5. NORMALISATION

your model is normalised.)

5.3.1.1 UNF - stage (i)

The first stage in producing an unormalised model is to create a single entity that lists all the attributes that are recorded on the original source documents and in the scenario:

PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord

Note: if the same attribute appears twice in the source documents, you only need to have it in the list once.

5.3.1.2 UNF - stage (ii)

Choose an attribute that will identify each record. (In data modelling terms we are selecting a primary key). The scenario tells us that cards are kept for each plot, so the PlotNumber should uniquely identify each card. Indicate this by underlining the attribute:

PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord

5.3.1.3 UNF - stage (iii)

From the source document(s) identify what attributes have more than one entry. From the record cards we see that VegetableName, Yield, and ClubRecord have more than one value. This is called a repeating group of attributes and should not be confused with a multi-valued attribute. There is one multi-valued attribute in this scenario - can you identify it?

Repeating groups of attributes should be identified in the UNF by indenting the list (or by some other means)

(43)

5.3. THE NORMALISATION PROCESS 37 PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord

5.3.1.4 UNF - stage (iv)

Finally, the entity should be named and the attributes enclosed in brackets: Plot (PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord) We now have the data model in UNF.

5.3.1.5 UNF Interactivity

The following activity takes you through the UNF process that you have just followed.

UNF Interactivity

5.3.2 First Normal Form (1NF)

We could consider a formal definition of when entities are in 1NF, but that would just confuse you. The simplest way to get to 1NF is to remove repeating groups and form a new entity with them. As it stands, Plot has repeating values for VegetableName, Yield and ClubRecord. An entity in this state would be impossible to implement in a database and to match the correct VegetableName with its associated values for Yield and ClubRecord. By splitting the entity into two separate entities then it would be possible to link the two entities using a foreign key but 1NF models would have some important anomalies limiting its use. These anomalies are considered later on in this topic.

5.3.2.1 1NF stage (i)

Remove the identified repeating group and give it a name:

c

(44)

38 TOPIC 5. NORMALISATION Plot (PlotNumber PlotSize MemberNumber MemberName) Vegetable (VegetableName Yield ClubRecord)

These two tables are now separated but we need to show how they are linked.

5.3.2.2 1NF stage (ii)

We know that here is a link between them as a plot can produce many vegetables. The primary key (i.e. PlotNumber) provides the link between the entities and should be included in both: Plot (PlotNumber PlotSize MemberNumber MemberName) Vegetable (VegetableName Yield ClubRecord PlotNumber)

The attribute PlotNumber is the primary in Plot and is a foreign key in Vegetable. Foreign keys should be indicated with an asterix:

Plot (PlotNumber PlotSize MemberNumber MemberName) Vegetable (VegetableName Yield ClubRecord *PlotNumber) 5.3.2.3 1NF Stage (iii)

All that is left to do before the model is in 1NF is to identify the primary key in the second entity. A primary key must be able to identify a unique occurrence of the entity. The VegetableName alone would identify vegetable with the same name of which there would be many and therefore not unique.

(45)

5.3. THE NORMALISATION PROCESS 39

its present form the Vegetable entity records details of a vegetable grown on a particular plot. So each occurrence of this entity could be identified by the VegetableName and the PlotNumber. This is called a composite key and indicated by underlining all the attributes in it and listing them together:

Plot (PlotNumber PlotSize MemberNumber MemberName) Vegetable (VegetableName *PlotNumber Yield ClubRecord) The model is now in 1NF.

5.3.2.4 1NF Activity

The following activity takes you through the 1NF process you have just followed.

1NF Activity

5.3.3 Second Normal Form (2NF)

Tables are in 2NF when every non-primary key attribute is determined by the entire primary key. In the first table Plot, all attributes other than PlotNumber, which is the primary key, can be determined by the primary key. What this means is that as it stands the non-primary key attributes are dependant on the primary key. You may be thinking that Member Name is dependent on Member Number, but in this form, Member Number is not the primary key and so Member Name is indirectly dependent on the primary key i.e. Member Name is dependent on Member Number which in turn is dependent on PlotNumber, so we can conclude that Member Name is dependent on PlotNumber. This is called transitive dependency which we shall return to in the next normalisation form. The table Vegetable has a composite key consisting of VegetableName and PlotNumber. Vegetable (VegetableName

*PlotNumber Yield

ClubRecord)

Each non-primary key is dependent on the primary key, but are they each dependent on the entire primary key? If not, the table is not in 2NF.

The Yeild of a vegetable is determined by the the type of vegetable it is (i.e. the VegetableName) and where it is grown (i.e. the PlotNumber). There will be many values of Yield for the same VegetableName that can be stored in this form, so in order to find a unique occurrence of a particular vegetable yield we must know both the PlotNumber and the VegetableName. We can conclude that Yield is entirely dependent on the primary key and this part of the model is in 2NF.

c

(46)

40 TOPIC 5. NORMALISATION

What about ClubRecord? This is a value for the largest amount of a vegetable grown in the Club in any one year. This vegetable could have been grown in any plot i.e. it is not determined by the PlotNumber; it is determined by the VegetableName. This means that ClubRecord is not fully dependent on the entire Primary Key and therefore not in 2NF. Since it is dependent on part of the primary key we call this a partial dependency and in order to get to 2NF we must split the entity to remove this condition.

(VegetableName (VegetableName *PlotNumber ClubRecord) Yield)

When splitting entities to remove partial dependencies, we must ensure that existing primary key is not split. VegetableName and PlotNumber is still the primary key on the left hand entity, but VegetableName alone becomes the primary key of the entity on the right.

For this example it would be appropriate to assign the name of Vegetable to the entity on the right and assign a new name to the entity on the left. So we would have:

Produce (VegetableName Vegetable (VegetableName

*PlotNumber ClubRecord)

Yield)

The primary key in the entity we are now calling Vegetable is VegetableName. This attribute is left in the entity we are now calling Produce as a foreign key and determining the relationship between the entities. So indicating the keys would give:

Produce (*VegetableName Vegetable (VegetableName

*PlotNumber ClubRecord)

Yield)

These entities are now represented in 2NF The three entities together would be: Plot (PlotNumber PlotSize MemberNumber MemberName) Produce (*VegetableName *PlotNumber Yield) Vegetable (VegetableName ClubRecord)

The relationship between these entities would be represented by an E-R Diagram. The attribute PlotNumber now appears in the entities Plot (as the primary key) and in Produce (as a foreign key), so that must be where the relationship exists. Furthermore, we can deduce from the keys that the relationship is 1:n.

(47)

5.3. THE NORMALISATION PROCESS 41

and Vegetable.

2NF Activity

5.3.4 Third Normal Form (3NF)

Entities in third normal form ensure that every attribute is dependent on the primary key and nothing else. When we consider the Plot entity we discovered that although PlotNumber was the primary key identifying each occurrence of the entity, MemberName is also determined by MemberNumber i.e. MemberName is transitively dependent on PlotNumber.

Plot (PlotNumber PlotSize

MemberNumber MemberName)

In 3NF it is not permissible to have attributes transitively dependent on the primary key. To remove transitive dependencies we must create a new entity with MemberNumber the primary key which determines MemberName:

Plot (PlotNumber Member (MemberNumber

MemberNumber MemberName)

MemberName)

MemberName becomes the linking attribute and therefore the foreign key in Plot indicating a m:1 relationship:

Plot (PlotNumber Member (MemberNumber

*MemberNumber) MemberName)

The model would now appear to be in 3NF but if we check the Rules for normalisation we discover that the attribute MemberName is not atomic i.e. it is made up of two character strings representing the members first name and surname.

So by splitting this attribute into atomic values we complete the normalisation process: Plot (PlotNumber PlotSize *MemberNumber) Member (MemberNumber MemberFName MemberSName) Produce (*VegetableName *PlotNumber Yield) Vegetable (VegetableName ClubRecord) c

(48)

42 TOPIC 5. NORMALISATION

The normalisation process has only produced 1:m relationships which is the intended outcome.

3NF Activity

5.3.5 Normalisation in a Nutshell

When presented with a scenario to normalise the following stages are required:

1. To get to UNF

a) From the scenario (and source documents) list all the attributes you can identify

b) Identify any repeating group of attribute and indent them on the list c) Choose an attribute to act as a primary key which determines the rest d) Close the list with brackets and give it a name

2. To get to 1NF

a) Form a new entity from the repeating group

b) Include the primary key from UNF to act as foreign key in the new entity c) Identify the primary key in the new entity

d) Add brackets and give it a name

3. To get to 2NF

a) Identify and remove partial dependences making sure that primary keys are present as foreign keys in any new entities

b) Note: this will only apply where there is a composite key (i.e. more than one attribute) in a 1NF entity and where some non-key attributes are dependent only on part of the primary key

4. To get to 3NF

a) Identify and remove transitive dependencies again ensuring the primary/foreign key link

b) Note: transitive dependent attributes and dependent on the primary key via a non-key attribute

5. Finally check the normalisation rules for any other anomalies e.g. non-atomic attributes

5.3.6 Beyond 3NF

At 3NF a relational data model can be adequately implemented and further levels of normalisation are not required. However, they do exist but are beyond the scope of this course. In fact, most of the relational models that you will encounter will satisfy the rules of higher levels of normalisation than 3NF.

You will come across a similar type of situation at the normalisation levels that you will be dealing with in this course. For example, it is not uncommon in some cases to find

References

Related documents

The various instructional conditions could be categorized into three: instructional variables, mainly the instructional approaches adopted and teaching strategies

We discussed many examples of this interaction, for different types of institutions (such as political and legal institutions, regulation, and the welfare state) and

Appointed Circuit Judge by the Illinois Supreme Court in March 2000, Judge Hall was elected Circuit Judge in the 2000 general election and took his oath of office on December

For interval-valued intu- itionistic fuzzy multicriteria group decision-making problem with incomplete information on the weights of criteria, an entropy weight model is established

2 of research on six major areas (age, gender, motivation, intelligence, aptitude and personality) in the field of individual differences, as well as the most

[r]

The paper argues that there is a breakup of community social structures as evident from changes in family relations in critical domains such as extra-marital relations, sexual

Top panel: samples from a MC simulation (gray lines), mean computed over these samples (solid blue line) and zero-order PCE coefficient from the SG (dashed red line) and ST