Candidate Key Explained
A candidate key is one or more data elements that uniquely identify an entity instance.
Sometimes a single data element identifies an entity instance, such as ISBN for a book, or Account Code for an account. Sometimes it takes more than one data element to uniquely identify an entity instance. For example, both a Promotion Code and Promotion Start Date are necessary to identify a promotion. When more than one data element makes up a key, we use the term 'composite key'. So Promotion Code and Promotion Start Date together are a composite candidate key for a promotion.
A candidate key has three main characteristics:
• Unique. There cannot be duplicate values in the data in a candidate key, and it cannot be empty (that is, it cannot be 'nullable'). Therefore, the number of distinct values of a candidate key must be equal to the number of distinct entity instances. If the entity Book has ISBN as its candidate key, and if there are 500 book instances, there will also be 500 unique ISBNs.
• Non-volatile. A candidate key value on an entity instance should never change. Since a candidate key is used to find a unique entity instance, you would be unable to find that instance if you were still trying to use the value before it was changed. Changing a candidate key would also mean changing it in every other entity in which it appears with the original value.
• Minimal. A candidate key should contain only those data elements that are needed to uniquely identify an entity instance. If four data elements are listed as the composite candidate key for an entity, but only three are really needed for uniqueness, then only those three should make up the candidate key.
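The unique and minimal characteristics can be tested against sample data. The following is a minimal sketch in Python, not a definitive implementation: the function names and the promotion rows (including the `channel` element) are hypothetical and exist only for illustration. Non-volatility cannot be checked from a single snapshot of data, so it is left out. Sample data can only disprove a candidate key; it cannot prove that the key will hold for all future data.

```python
from itertools import combinations


def is_unique(rows, columns):
    """True if no value in `columns` is empty and no combination of values repeats."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if any(value is None or value == "" for value in key):
            return False  # a candidate key cannot be empty
        if key in seen:
            return False  # duplicate key value
        seen.add(key)
    return True


def is_minimal(rows, columns):
    """True if no proper subset of `columns` is already unique on its own."""
    for size in range(1, len(columns)):
        for subset in combinations(columns, size):
            if is_unique(rows, subset):
                return False
    return True


def is_candidate_key(rows, columns):
    """Unique and minimal for the given sample rows."""
    return is_unique(rows, columns) and is_minimal(rows, columns)


# Hypothetical promotion rows, invented for this example.
promotions = [
    {"code": "SPRING", "start_date": "2024-03-01", "channel": "web"},
    {"code": "SPRING", "start_date": "2025-03-01", "channel": "web"},
    {"code": "SUMMER", "start_date": "2024-03-01", "channel": "store"},
]

print(is_candidate_key(promotions, ("code",)))
print(is_candidate_key(promotions, ("code", "start_date")))
print(is_candidate_key(promotions, ("code", "start_date", "channel")))
```

In this sample, the promotion code alone repeats, the code and start date together are unique and minimal, and adding the third element keeps the combination unique but breaks minimality.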
Figure 7.1 contains a data model before candidate keys have been identified.
Figure 7.1: Data model before candidate keys have been identified
• Each Student may attend one or many Classes.
• Each Class may contain one or many Students.
Note that we have a many-to-many relationship between Student and Class that was replaced by the entity Attendance and two one-to-many relationships (more on this in our normalization section). In reading a many-to-many relationship, I have found it helpful to ignore the entity in the middle (Attendance, in this example) and just read the labels between the entities on either side. For example, each Student may attend one or many Classes and each Class may contain one or many Students.
Table 7.1 contains sample values for each of these entities.
Table 7.1: Sample values for Figure 7.1
Student
Student Number   Student First Name   Student Last Name   Student Date Of Birth
SM385932         Steve                Martin              1/25/1958
EM584926         Eddie                Murphy              3/15/1971
HW742615         Henry                Winkler             2/14/1984
MM481526         Mickey               Mouse               5/10/1982
DD857111         Donald               Duck                5/10/1982
MM573483         Minnie               Mouse               4/1/1986
LR731511         Lone                 Ranger              10/21/1949
EM876253         Eddie                Murphy              7/1/1992
Attendance
Class
Class Full Name   Class Short Name   Class Description Text
Data Modeling
An introductory class covering basic data modeling concepts and principles.
A fast-paced class covering techniques such as advanced normalization and ragged hierarchies.
Based on our definition of a candidate key and a candidate key's characteristics of being unique, non-volatile, and minimal, what would you choose as the candidate keys for each of these entities?
For Student, Student Number appears to be a valid candidate key. There are eight students and eight distinct values for Student Number. So unlike Student First Name and Student Last Name, which can contain duplicates like Eddie Murphy, Student Number appears to be unique. Student Date Of Birth can also contain duplicates, such as 5/10/1982, which is the Student Date Of Birth for both Mickey Mouse and Donald Duck. However, the combination of Student First Name, Student Last Name, and Student Date Of Birth may make a valid candidate key.
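The reasoning for Student can be checked by counting distinct values in the Student rows of Table 7.1. This is a small sketch in Python; the helper name `distinct_count` and the variable names are my own and are only illustrative.

```python
# Student rows from Table 7.1:
# (Student Number, Student First Name, Student Last Name, Student Date Of Birth)
students = [
    ("SM385932", "Steve", "Martin", "1/25/1958"),
    ("EM584926", "Eddie", "Murphy", "3/15/1971"),
    ("HW742615", "Henry", "Winkler", "2/14/1984"),
    ("MM481526", "Mickey", "Mouse", "5/10/1982"),
    ("DD857111", "Donald", "Duck", "5/10/1982"),
    ("MM573483", "Minnie", "Mouse", "4/1/1986"),
    ("LR731511", "Lone", "Ranger", "10/21/1949"),
    ("EM876253", "Eddie", "Murphy", "7/1/1992"),
]

COLUMNS = (
    "Student Number",
    "Student First Name",
    "Student Last Name",
    "Student Date Of Birth",
)


def distinct_count(rows, indexes):
    """Count distinct combinations of the values at the given column positions."""
    return len({tuple(row[i] for i in indexes) for row in rows})


# Distinct values per single data element.
counts = {name: distinct_count(students, [i]) for i, name in enumerate(COLUMNS)}

# Distinct values for two composite alternatives.
name_only = distinct_count(students, [1, 2])
name_and_dob = distinct_count(students, [1, 2, 3])

for name, count in counts.items():
    print(f"{name}: {count} distinct of {len(students)}")
print(f"First + Last Name: {name_only} distinct of {len(students)}")
print(f"First + Last Name + Date Of Birth: {name_and_dob} distinct of {len(students)}")
```

Only Student Number and the three-element combination reach eight distinct values for eight students. A matching count in sample data is necessary but not sufficient: two students with the same name and birth date could still enroll later, which is why the text says the combination "may" make a valid candidate key.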
For Attendance, we are currently missing a candidate key. Although the Attendance Date is unique in our sample data, we will probably need to know which student attended which class on this particular date.
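One way to picture what Attendance is missing is a composite key that combines the student, the class, and the date. The sketch below is an assumption on my part, not the model's actual design: the `AttendanceKey` type, its field names, the `record_attendance` helper, and the dates are all hypothetical. The student numbers come from Table 7.1, and Juggling is one of the classes discussed in this section.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AttendanceKey:
    """Hypothetical composite key: which student attended which class on which date."""
    student_number: str
    class_name: str
    attendance_date: date


def record_attendance(store, key):
    """Add an attendance record, rejecting a duplicate composite key."""
    if key in store:
        raise ValueError(f"duplicate attendance record: {key}")
    store[key] = True


attendance = {}

# Two students attend the same class on the same (made-up) date.
# Attendance Date alone would no longer be unique, but the composite key is.
record_attendance(attendance, AttendanceKey("SM385932", "Juggling", date(2024, 5, 10)))
record_attendance(attendance, AttendanceKey("EM584926", "Juggling", date(2024, 5, 10)))

distinct_dates = {key.attendance_date for key in attendance}
print(len(attendance), "records,", len(distinct_dates), "distinct date")
```

Because the dataclass is frozen, a key value cannot be modified after it is created, which loosely mirrors the non-volatile characteristic.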
For Class, at first glance it appears that each of its data elements is unique and would therefore qualify as a candidate key. However, Juggling does not have a Class Short Name. Because Class Short Name can be empty, we cannot consider it a candidate key. Also, one of the characteristics of a candidate key is that it is non-volatile. I know, based on my teaching experience, that class descriptions can change. Therefore, Class Description Text also needs to be ruled out as a candidate key, leaving Class Full Name as the best option for a candidate key.