Learning Objectives
After completing the session, you will be able to:
Identify different means of achieving data integrity across the database by DB2 Introduction to Data Integrity
Data integrity refers to the validity, consistency, and accuracy of the data in a database. It cannot be overstated that the level of accuracy of the information retrieved from the database is in direct proportion to the level of data integrity imposed within the database. Data integrity is one of the most important aspects of the database design process, and it should not be underestimated, overlooked, or even partially neglected. To make any of these mistakes would result in a high risk of undetectable errors.
There are three types of data integrity and are as follows:
Entity Integrity
Referential Integrity
Domain Integrity
Entity Integrity
This is the “Table-level integrity” which ensures that the field that identifies each record within the table is unique and is never missing its value.
Entity integrity requires the specification of a primary key (PK) for each table.
Key Notes about Primary Key:
Each table can have zero or one primary key.
Primary key should not be Null and if the primary key is a composite key, make sure that each component should not be Null.
Every primary key explicitly defined for a table must be associated with a corresponding unique index.
If you do not create a unique index for a primary key, then an incomplete key is defined for the table, making the table inaccessible.
Unique Constraint:
A unique constraint is similar to a primary key constraint which also enforces unique values on an individual or a group of columns. Each table can have zero, one, or many unique constraints consisting of one or more columns each. The values stored in the unique column, or combination of columns, must be unique within the table. Unique constraint column should not be Null.
A unique index needs to be created on the columns of the unique constraint to ensure uniqueness.
Unique Index:
In addition of creating unique index in primary key column or unique constraint column (which is mandatory), you can create as many unique indexes as we need on any other columns of the table to ensure uniqueness.
The following table will show the difference between unique index on Primary Key/Unique constraint column and other column other than primary/unique constraint column.
Primary Key or unique constraint column Column other than primary/unique constraint but defined with unique index
A table can contain only one primary key constraint and multiple unique constraints
A table can contain multiple unique indexes
Cannot allow NULL values Can allow NULL values
Supports Referential Integrity Cannot support Referential Integrity
Referential Integrity
This is a “Relationship-level integrity” which ensures that the relationship between a pair of tables is sound and that there is synchronization between the two tables whenever data is entered, updated, or deleted.
Referential integrity is achieved through the foreign key.
Foreign Key:
A foreign key (FK) is a column or combination of columns used to establish and enforce a link between the data in two tables.
A link is created between two tables by adding the column or columns that hold one table's primary key values to the other table. This column becomes a foreign key in the second table.
The table with the primary key is called parent table and the table with the foreign key is called dependent table (or child table).
Referential integrity (RI) means each row in a dependent table must have a foreign key that is equal to a primary key in the parent table.
Rules ensuring RI:
Insert:
When inserting a row with a foreign key in the dependent table, DB2 checks the values of the foreign key column against the values of the primary key column in the parent table. If there is a matching primary key column, the insert is allowed. If there is no matching primary key column, the insert will not happen.
A new row can be inserted in the parent table as long as the primary key value of the new row is unique.
Update:
When updating foreign key values in the dependent table, DB2 checks whether there is a matching primary key in the parent table or not. If there is a matching primary key, update is allowed. If there is no matching primary key, update is not allowed.
The primary key in the parent table cannot be updated if any rows in the dependent tables refer to it.
Delete:
Deleting a row with a foreign key in the dependent table is permitted.
When deleting a row with a primary key in the parent table, DB2 takes one of the following actions as indicated while defining the table.
RESTRICT: Disallows the deletion of the primary key row if any foreign keys relate to that row.
CASCADE: Allows the deletion of the primary key row and also deletes the foreign key rows that relate to it.
SET TO NULL: Allows the deletion of the primary key row and, instead of deleting all related foreign key rows, sets the foreign key columns to NULL.
Operation Child Table Parent Table
Insert Allowed if the foreign key value matches the value of Primary key of the parent table.
Allowed as long as the primary key value is unique.
Update Allowed if the foreign key value matches the value of Primary key of the parent table.
Allowed if there are no foreign key references in the child tables.
Delete Allowed. Depending upon the action specified during table definition.
Restrict: not allowed if there are any foreign key references
Cascade: allowed and deletes the foreign key references if any in the child tables.
SET TO NULL: allowed and a Null value will be set in the foreign key references of the child tables
Domain Integrity
This is the “Field-level integrity” which ensures that the structure of every field is sound, that the values in each field are valid, consistent, and accurate, and that fields of the same type (such as City fields) are consistently defined throughout the database.
Domain integrity is enforced using:
By specifying the data type and maximum length for each column when a table is created, the DBMS will automatically ensure that only the correct type of data with the maximum length allowed is stored in that column.
Default Values
When columns are created within tables, they can be assigned a default value that will be used when inserting or loading the data which do not provide an explicit value for that column. Each column can have only one default value. User can provide a default value for a column. If there is no user specific default value, DB2 will assign the default value based on the data type of that column.
DB2 Data Types with Default Values:
Data Type COBOL PIC
COBOL USAGE
Default Description
CHAR(n) PIC X(n) DISPLAY Blanks Fixed length character (EBCDIC) data
VARCHAR(n) PIC X(n) DISPLAY Empty String Variable length character data SMALLINT PIC S9(4) COMP OR DATE PIC X(10) DISPLAY Current Date Date data (yyyy-mm-dd) TIME PIC X(8) DISPLAY Current Time Time data (hh.mm.ss) TIMESTAMP PIC X(26) DISPLAY Current
timestamp
Date and Time data with microseconds
(yyyy-mm-dd-hh.mm.ss.mmmmmm) GRAPHIC PIC G(n) DISPLAY-1 Blanks Double byte character set
(DBCS) data
VARGRAPHIC PIC G(n) DISPLAY-1 Empty String Variable length DBCS data FLOAT(n) None COMP-1 or
COMP-2
0 Floating point data in single or double precision format
NULL Values:
Whenever a value is missing or unknown, it is said to be Null. A null value represents neither zero (in the case of numeric data) nor blank (represented by one or more spaces in the case of textual data). Zero and blank are actual values and can be meaningful in some way under certain circumstances. For example, a zero can represent the current state of an Account Balance; a blank in a Middle Initial field can represent the fact that an employee has no middle initial in his or her name. In the following figure, a blank represents the fact that Washington, D.C., is not located in any county whatsoever.
A null value is typically used to represent an unknown value in a field. In the above figure, for example, there are null values in the County field. Shannon McLain did not know what county she lived in at the time her data was entered into the database, so no entry was made into the County field. As a result, the County field contains a null value. This value can be changed, however, once Shannon finds out what county she lives in.
A null value is also used to represent a missing value in a field. If the person who entered the data for Shannon McLain failed to ask her for the name of the county she lives in, the data is
considered missing since no entry was made into the County field due to operator error. Once the error is recognized, it can be easily corrected by obtaining the appropriate value from Ms. McLain.
A drawback to null values is that they cannot be evaluated by mathematical expressions or aggregate functions. If a null value is used in a mathematical expression, that expression will
logically reasonable—if the number is unknown, the value will necessarily be unknown. Also there is a serious undetected error that occurs if all the values in the Total Value field are then added together: an inaccurate total. The only way to obtain an accurate total is to provide a value for the entries in the Qty On Hand field that are currently Null.
The result of an aggregate function, such as "Count()," will be Null if it is based on a field that contains null values. For example, the following figure shows the results of a summary query that counts the total number of occurrences of each category in the PRODUCTS table shown above.
The value of Total Occurrences in the summary query is the result of the function expression
"Count([Total Occurrences])." Notice that the summary query shows 0 occurrences of an
unspecified Category, implying that each product has been assigned a category. This information is clearly inaccurate because there are two products in the PRODUCTS table that have not been assigned a category.
Check Constraint
A check constraint is a rule that specifies the values that are allowed in one or more columns of every row of a table. Check constraint enforces business rules directly into the database without requiring additional application logic. This can be defined during column definition.
Summary
Data Integrity ensures the accuracy of the data in a database.
Types of Data Integrity
o Entity Integrity
o Referential Integrity
o Domain Integrity
Entity integrity enforces each occurrence of an entity must be uniquely identifiable and is achieved through primary key.
A table definition can be complete only when a unique index is created on its primary key.
Unique constraint also behaves similar to primary key constraint except the fact of the number of unique constraints can be more than one in a table.
Unique index column allows null values and does not support RI as against primary or unique constraint column.
Foreign key is a column in a child table which holds the value of primary key column of a parent table.
Referential integrity (RI) ensures each row in a dependent table must have a foreign key that is equal to a primary key in the parent table.
Rules to ensure Referential integrity:
Operation Child Table Parent Table Insert Allowed if the foreign key value
matches the value of Primary key of the parent table.
Allowed as long as the primary key value is unique.
Update Allowed if the foreign key value matches the value of Primary key of the parent table.
Allowed if there are no foreign key references in the child tables.
Delete Allowed. Depending upon the action specified during table definition.
Restrict: Not allowed if there are any foreign key references
Cascade: Allowed and deletes the foreign key references if any in the child tables.
SET TO NULL: Allowed and a Null value will be set in the foreign key references of the child tables
Domain integrity ensures the possible values of a column and is achieved through:
o Data Type and Length:
o Commonly used DB2 data types: Char, Varchar, Smallint, Integer, Decimal, Date, Time, Timestamp
o Default Values:
Can be user defined or depends on the data type
Used for columns with the missing values during insertion
o NULL Values: Unknown or missing value
o Check constraint: Enforces business rules and allows only the predefined values Test Your Understanding
1. What is data integrity?
2. What is entity integrity and how do you achieve it?
3. Can you create a table without a primary key?
4. Can a primary key column hold a value of null?
5. State the importance of unique index in the primary key.
6. What is a unique constraint?
7. Differentiate between the primary key constraint and unique constraint.
8. State the differences between primary key index and unique index.
9. What is a foreign key?
10. What is a referential integrity?
11. List the rules applied to ensure RI while inserting, updating, and deleting the data both in the parent table as well as in the child table.
12. What is domain integrity and how to achieve it?