Dbms Notes for Anna University

(1)

Velammal College of Engineering and Technology Department of Computer Science and Engineering

Database Management Systems UNIT-I

Introduction: Definition:

DBMS contains information about a particular enterprise. DBMS is a • Collection of interrelated data and

• Set of programs to access those data

Primary Goal of a DBMS: is to provide an Environment that is both convenient and efficient for people to use in retrieving and storing database information

Advantages of a DBMS:

Using a DBMS to manage data has many advantages

a. Data Independence (DBMS provide an abstract View of the data to insulate application code from internal details such as data representation and storage) b. Redundancy Control (DBMS control the redundancy of persistent data)

c. Efficient Data Access (DBMS uses sophisticated techniques to Store and Retrieve data efficiently)

d. Data Integrity and Security (DBMS can enforce Integrity Constraints on the data and can enforce access controls that that governs what data is visible to different classes of users)

e. Concurrent access (DBMS provides correct, concurrent access to data by multiple users)

f. Authorized Access (DBMS provides Security and Authorization Subsystem, which the DBA uses to create accounts and to specify Account restriction)

g. Data Administration (When several users share the data, centralizing the Administration of offer significant improvements)

h. Reduced Application Development (DBMS in conjunction with the High-Level Interface to the data can facilitates Quick Development of Applications)

i. Back up and Crash Recovery (DBMS provides facilities for Recovering from Hardware and Software Failure. The Backup and Recovery subsystem of DBMS is responsible for Recovery)

j. Database Communication Interfaces (Current DBMS provides Special Communication Routines designed to allow the database to accept End User Requests within a Computer Networks. For example, the DBMS provides communication functions to access the database through the Internet, using Internet Browsers such as Netscape or Explorer as the Front End)

k. Flexibility

l. Availability of Up-to-Date Information to all Users

Disadvantages of DBMS: • When not to use a DBMS

• High Initial Investment --Cost of Hardware --Cost of Software

--Cost for Training people to use DBMS --Cost of maintaining the DBMS

(2)

--Overhead for providing security, recovery, integrity, and concurrency control.

• When a DBMS may be unnecessary:

--If the database and application are simple, well Defined, and not expected to change

--If there are stringent real-time requirements that may not be met because of DBMS overhead.

--If access to data by multiple users is not required. Database Applications:

• Banking: all transactions

• Airlines: reservations, schedules • Universities: registration, grades • Sales: customers, products, purchases

• Online retailers: order tracking, customized recommendations • Manufacturing: production, inventory, orders, supply chain • Human resources: employee records, salaries, tax deductions Purpose:

In the early days, database applications were built directly on top of file systems Drawbacks of using file systems to store data:

• Data redundancy and inconsistency

Multiple file formats, duplication of information in different files • Difficulty in accessing data

Need to write a new program to carry out each new task • Data isolation — multiple files and formats

• Integrity problems

Integrity constraints (e.g. account balance > 0) become “buried” in program code rather than being stated explicitly

Hard to add new constraints or change existing ones

Earlier System: File Management System (FMS)

The FMS was the first method used to store data in a Computerized Database. The data item is stored SEQUENTIALLY in one large file. And the data items are accessed using some Access Programs written in a Third Generation Language (3GL) such as BASIC, COBOL, FORTRAN, C and Visual Basic. A particular relationship cannot be drawn between the stored items other than the sequence in which it is stored. If a particular data item has to be located, the Search starts at the beginning and items are checked sequentially till the required items is found. (This is SEQUENTIAL Search)

Characteristics of FMS:

1. Sequential Record Access Method.

2. Meta-data is embedded into programs accessing files.

3. File System exhibit Structural Dependence (i.e. the access to the file is dependent on its Structure (Fields))

4. File System exhibit Data Dependence (i.e. change in data types of File Data item such as from “int” to “float” requires changes in all programs that access the File.)

(3)

Disadvantages of File System: (Purpose/Need of/for Database Systems) Drawbacks of using file systems to store data are:

1. Data Redundancy and Inconsistency:

In the File Processing System, the same information may appear in more than one file. For example the column “LOCATION” appears in two Files. So the same Information was duplicated in several places (in both the Files). This occurrence of same data column in several places (Files) is called Data Redundancy and it leads to Higher Storage, Access Cost and mainly Data Inconsistency. If suppose Mr. Abraham is transferred from “Chennai” to “Madurai”, the job of Data Processing Specialist is to update the “LOCATION” column exists in both the Files using Access Programs but unfortunately he updates the data item in File “ Employee’s Detail” only with File “Employee Location” Un-updated. Now, generating the Report yields Inconsistency results depending on which version of the data is used. This is called the Data Inconsistency and in simple terms it is the Lack of Data Integrity.

2. Difficulty in accessing Data:

Suppose the HR of the Company ask the Data Processing Department to generate the list of Employee whose Postal Code is Less than 66666666. Because this request was not anticipated when the original system was developed, there is no application program on hand to meet it. But there is an application program to generate the List of all Employees with their Postal Code. So the HR has to two Options: Either he can obtain the list of all Employees and Work out manually to find the persons, or he can ask the DP Specialist to write the necessary application program. Both alternatives are obviously unsatisfactory. Suppose such a program is written, and several days later, the same Person needs to trim that list to include only those whose Salary is > $20000. As expected, a program to generate such a list does not exist. Again, the HR has the same two options, neither of which is satisfactory. There is a need to write a New Program to carry out Each New Task. So the Conventional File processing environments do not allow needed data to be retrieved in a convenient and efficient manner. 3. Data Isolation:

Because data are scattered in various files, and files may be in different formats, it is difficult to write new application programs to retrieve the appropriate.

4. Integrity Problems:

The data values stored in the Database must satisfy certain types of Consistency Constraints. For example, the minimum salary of any person should never fall below the prescribed amount (say, $7/hr). Developers should enforce these constraints in the system by adding appropriate code in the various applications programs. However, when new constraints are added, it is difficult to change the programs to enforce them. The problem is compounded when constraints involve several data items from different files.

5. Atomicity of Updates:

A Computer System, like any other mechanical or electrical device, is subject to failure. In many applications, it is crucial to ensure that, once a failure has occurred and has been detected, the data are restored to the consistent state that existed prior to the failure. Consider a program to change the salary of “Mr. Sudarshan” from Rs.20000 to Rs.22000. If a system failure occurs during the execution the program, it is possible that the Rs.20000 was removed from the Salary column of “Mr. Sudarshan” but it is not added with the new value of Rs.22000, resulting in an Inconsistent Database State. Clearly, it is essential that either both the operation (Removing and Adding) occurs, or that neither occurs. That is the Salary Change must be Atomic—it must happen in its entirety or not at all. It is difficulty to ensure this property in a conventional File Processing System. In simpler words, the change of Salary should either Complete or not happen at all.

6. Concurrent –Access Anomalies: (Concurrent access by multiple users)

Many systems allow multiple users to update the data simultaneously, so that the overall performance of the system is improved and a faster response time is possible.

(4)

Uncontrolled concurrent accesses can lead to inconsistencies. For example, reading a balance by two people and updating it at the same Time can leave the System Inconsistent.

/*consider Bank account A, containing $500. If two customers withdraw funds $50 and $100 respectively from Account A at the same time, the result of the concurrent executions may leave the account in an Incorrect (or Inconsistent) state. Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by the amount being withdrawn, and write the result back. If two programs run concurrently, they may both read the value $500, and write back $450 and $400, respectively. Depending on which one writes the value last, the account may contain $450 or $400, rather than the correct value of $350. To guard against this possibility, the system must maintain some form of Supervision. Because data may be accessed previously, however, Supervision is difficult to provide.*/ 7. Security Problems:

Not every user of the Database System should be able to access all the data. For example, in a University Database System, the Students need to see only that part of the database that has information about the various Courses and Syllabus available to them for Study. They do not need access to information about Personal details of Faculty Members. Since application programs are added to the System in an Ad Hoc (or Unplanned or Unprepared) Manner, it is difficulty to enforce such Security Constraints on Files.

8. Cost:

Since due to Redundancy, Inconsistency, Concurrent access and low level Security offered by FMS, the Higher Cost is involved in every area ranges from the Low Programmer Productivity to Maintenance.

Database systems offer solutions to all the above problems. These difficulties have prompted the development of DBMS.

History of Database Systems 1950s and early 1960s:

• Data processing using magnetic tapes for storage Tapes provide only sequential access • Punched cards for input

Late 1960s and 1970s:

• Hard disks allow direct access to data

• Network and hierarchical data models in widespread use • Ted Codd defines the relational data model

Would win the ACM Turing Award for this work IBM Research begins System R prototype

UC Berkeley begins Ingres prototype • for the era) transaction processing

1980s:

• evolve into commercial systems

SQL becomes industrial standard • Parallel and distributed database systems • Object-oriented database systems

1990s:

• Large decision support and data-mining applications • Large multi-terabyte data warehouses

• Emergence of Web commerce 2000s:

• XML and XQuery standards • Automated database administration

(5)

Levels of Abstraction

The major purpose of the database system is to provide users with an abstract view of the data.

• Physical level: describes how a record (e.g., customer) is stored.

• Logical level: describes what data are stored in database, and what relationships exists among the data.

type customer = record

customer_id : string; customer_name : string; customer_street : string; customer_city : integer; end;

• View level: it describes only part of the entire database. The application programs hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes.

VIEW OF DATA: (Three Levels of Abstraction) An architecture for a database system

INSTANCES AND SCHEMAS Instances:

The collection of information stored in the database at a particular moment is called as an instance of the database.

Schema:

The overall design of the database is called as the database schema. Ie)

(6)

Example: The database consists of information about a set of customers and accounts and the relationship between them

Analogous to type information of a variable in a program

Physical schema: it specifies the database design at the physical level Logical schema: it specifies the database design at the logical level

Subschema : the different schemas at the view level are called as subschemas. Instance – the actual content of the database at a particular point in time

Analogous to the value of a variable

o The applications programs are depend on the logical schema. So the logical schema is most important.

o Physical Data Independence is the ability to modify the physical schema without changing the logical schema

o In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.

DATA MODELS

Definition: A collection of conceptual tools for describing • Data

• Data relationships • Data semantics • Data constraints

TYPES:

Relational model: Relational Model uses a collection tables to represent both data and the relationship among those data. It is an example of record-based model. The relational model is at a lower level of abstraction than E-R Model.

Entity-Relationship data model (mainly for database design) : it consists of a collection of basic objects called “Entities”, and the relationships among those entities.

Object-based data models (Object-oriented and Object-relational)

Semistructured data model (XML) Other older models:

o Network model

o Hierarchical model Hierarchical Model:

In this model, Data items are represented using Records and its Parent/Child (1:N) Relationship is implemented using Links. Each parent has many Child but each Child may have only one Parent. Records are organized as collection of Trees. This model supports only One-to Many Relationship.

Advantages of OO Data Model:

• Structural Independence and Data

Independence

• Addition of Semantic content to the data

model gives the data greater meaning

• Easier to visualize more complex

relationships within and between objects

• Database Integrity is protected by the use of

Inheritance

Disadvantages of OO Data Model:

• No Standard Data Access

method

• Implementation requires

substantial Hardware and O.S Overhead

(7)

Network Model:

In this model, data are represented as collection of records, and the Parent/Child Relationship is implemented using Links (Pointer) and too Ring Structures. This model supports Many-to-Many Relationship. This model is also called as CODASYL or DBTG model. The records in the database are organized as collections of arbitrary graphs.

Database Languages:

Data Definition Language (DDL):

Specification notation for defining the database schema Example: create table account ( account-number char(10),

balance integer)

DDL compiler generates a set of tables stored in a data dictionary or data directory. Data dictionary contains metadata (i.e., data about data)

• Database schema

• Data storage and definition language

Specifies the storage structure and access methods used • Integrity constraints

Domain constraints

Referential integrity (references constraint in SQL) Assertions

• Authorization

Data Manipulation Language (DML):

DML is a Language for accessing and manipulating the data organized by the appropriate data model. DML also known as query language

Advantages of Hierarchical Model:

• Data Sharing

• Data Independence

• Parent/Child Relationship

promotes Database Integrity

• Efficient when dealing with

a Large Database

• Large Installed (Mainframe)

Base

• Abundant Business

Applications

Disadvantages of Hierarchical Model:

• Requirement of Knowledge of physical level

of data storage

• Complex and Inflexible to manage the

database

• Lacks Structural Independence

• Time Consuming and Complicated

Application Programming

• Lack Ad Hoc Query capability for End

Users

• Extensive Programming Activity is required

• No DDL or DML in DBMS

Advantages of Network Model:

• Easier M:N relationship

Implementation

• Better Data Access comparing

Hierarchical Data Model

• Sufficient Data Independence

• Enforced Data Integrity

Disadvantages of Network Model:

• Use of Pointers leads to Complexity in

Structure. As a result of the increased Complexity, Mapping of related record becomes very difficult.

• Difficult to make change in the database

(Lacks Structural Independence)

(8)

Attributes Two classes of DMLs are:

• Procedural DMLs require a user to specify what data is required and how to get those data

• Declarative (nonprocedural) DMLs require a user to specify what data is required without specifying how to get those data. It is easy to learn and use DMLs.

• A query is a statement requesting the retrieval of information. • SQL is the most widely used query language

RELATIONAL MODEL:

Advantages of Relational Model:

• Data Independence and Structural Independence • Easy to design the Database and to manage its contents • Less Programming Effort required

• Powerful and flexible Query capability (SQL) • Ad Hoc capability

Disadvantages of Relational Model:

• It tends to slower than other Database System

• RDBMS requires substantial Hardware and Operating System Overhead Example of tabular data in the relational model

(9)

SQL

SQL: widely used non-procedural language

Example 1: Find the name of the customer with customer-id 192-83-7465 select customer.customer_name

from customer

where customer.customer_id = ‘192-83-7465’

Example 2: Find the balances of all accounts held by the customer with customer-id 192-83-7465

select account.balance from depositor,account

where depositor.customer_id=‘192-83-7465’and depositor.account_number = account.account_number

Application programs generally access databases through one of Language extensions to allow embedded SQL

Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database

Database Design

: The process of designing the general structure of the database:

Logical Design – Deciding on the database schema. Database design requires that we find a “good” collection of relation schemas.

Business decision – What attributes should we record in the database? Computer Science decision – What relation schemas should we have

and how should the attributes be distributed among the various relation schemas?

Physical Design – Deciding on the physical layout of the database ER Model:

It Models an enterprise as a collection of entities and relationships

Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects. It is described by a set of attributes.

Relationship: an association among several entities Represented diagrammatically by an entity-relationship diagram:

(10)

Advantages of ER Model:

o Structural Independence and Data Independence

o ER model gives Database Designer, Programmers, and end users an easily understood Visual Representation of the data and the data relationship.

o ER model is well integrated with the Relational database model

o Visual modeling helps in conceptual simplicity Disadvantages of ER Model:

o No DML

o Limited Constraint Representation

o Limited Relationship Representation

o Loss of Information Content because attributes are usually removed to avoid crowed displays

Object-Relational Data Models

Extend the relational data model by including object orientation and constructs to deal with added data types.

Allow attributes of tuples to have complex types, including non-atomic values such as nested relations.

Preserve relational foundations, in particular the declarative access to data, while extending modeling power.

Provide upward compatibility with existing relational languages. XML: Extensible Markup Language

Defined by the WWW Consortium (W3C)

Originally intended as a document markup language not a database language

The ability to specify new tags, and to create nested tag structures made XML a great way to exchange data, not just documents

XML has become the basis for all new generation data interchange formats.

A wide variety of tools is available for parsing, browsing and querying XML documents/data

Database Architecture

Database Users and Administrators:

The people who work with the database can be categorized as database users or database administrator.

Database Users:

Users are differentiated by the way they expect to interact with the system • Application programmers – interact with system through DML calls • Sophisticated users – form requests in a database query language

• Specialized users – write specialized database applications that do not fit into the traditional data processing framework

• Naïve users – invoke one of the permanent application programs that have been written previously

(11)

Database Administrator

Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs.

Functions of the DBA:

• Schema Definition: The DBA creates the original database schema by writing a set of definitions that is translated by the DDL Compiler to a set of tables that is stored permanently in the Data Dictionary.

• Storage Structure and Access-Method Definition: The DBA creates appropriate storage structures and access methods by writing a set of definitions, which is translated by the Data Storage and DDL Compiler.

• Schema and Physical-Organization Modification: Programmers accomplish the relatively rare modifications either to the database Schema or to the description of the Physical Storage Organization by writing a set of definitions that is used by either DDL Compiler or the Data-Storage and DDL Compiler to generate modifications to the appropriate Internal System Tables ( Eg: Data Dictionary)

• Granting of Authorization for Data Access: The granting of different types of Authorization allows the DBA to regulate which parts of the Database various Users can Access. The Authorization information is kept in a special system Structure that is consulted by the DB System whenever access to the Data is attempted in the system.

• Integrity-Constraint Specification: The data values stored in the database must satisfy certain Consistency Constraints. For example, the Salary of the any employee(Programmer) in an organization should never fall below some limit(say, $4000 / month). This constraint must be specified explicitly by the DBA. And these Integrity Constraints are kept in a Special System Structure that is consulted by the DB System whenever an update takes place in the System.

• Routine Maintenance: Examples of the DBA’s routine maintenance activities are:

o Periodically backing up the Database, either onto Tapes or onto Remote servers, to prevent Loss of Data in case of Disasters such as Flooding.

o Ensuring that enough free Disk Space is available for normal operations, and updating disk space as required.

• Monitoring jobs running on the Database and ensuring that performance is not degraded by very expensive tasks submitted by some users.

Transaction management:

A transaction is a collection of operations that performs a single logical function in a database application

Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.

Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.

The architecture of database systems is greatly influenced by the underlying computer system on which the database is running:

(12)

• Centralized • Client-server

• Parallel (multi-processor) • Distributed

System Structure/Architecture of a Database System (DBMS Components):

Usually, Database System is partitioned into various modules of functions. Some of the functions of the database System is provided by the Operating System. So, the design of the Database System must include consideration of the interface between the Database system and OS.

The functional components of the Database System are: 1. Query Processor

2. Storage Manager 1. Query Processor:

This module contain the following Components:

a. DML Compiler: This component translates the DML Statements in a Query Language into the Low-Level instructions that the Query Evaluation Engine understands. And too, it attempts to translate transform a user’s request into an equivalent and efficient form, thus finding a good strategy for executing the query (Query Optimization).

b. Embedded DML Precompiler: This Component converts DML statements embedded in an Application Program to normal Procedure Calls in the Host Language. And it interacts with the DML Compiler to generate the appropriate Code.

c. DDL Interpreter: This Component interprets DDL statements and records them in a Set of Tables containing Metadata (Data Dictionary)

d.Query Evaluation Engine: This component executes Low-Level Instructions generated by the DML Compiler.

2. Storage Manager:

It is a Program Module that provides the interfaces between the Low-Level data stored in the database and the application programs and queries submitted to the System. And it is responsible for the interaction with the File Manager. The raw data are stored on the disk using the File System, which is provided by the OS. The Storage Manager translates the various DML statements into Low-Level File-System commands. Thus, the Storage Manager is responsible for Storing, retrieving, and updating data in the database. The components present in this module are:

a.Authorization and Integrity Manager: This component tests for the satisfaction of the Integrity Constraints and checks the authority of the users to access the data. b.Transaction Manager: This component ensures that the database remains in a

consistent (correct) state despite System Failure, and that Concurrent Transaction executions proceed without conflicting.

c. File Manager: This component manages the allocation of the space on the Disk Storage and the Data Structures used to represent information stored on the Disk. d.Buffer Manager: This component is responsible for Fetching data from the Disk

Storage into Main Memory, and deciding what data to cache in Memory. This is a Critical part of the database, since it enables the database to handle data sizes that are much larger than the size of Main Memory.

(13)

Overall System Structure

Data Structures used by the Storage Manager for the Physical System Implementation of Database System:

1.Data Files: It Stores the Database itself.

2.Data Dictionary: It stores the Metadata about the Structure of the database, in particular the Schema of the Database. Since it is heavily used, greater emphasis should be placed in developing a good Design and Implementation of it.

3.Indices: It provides fast access to data items that hold particular values.

4. Statistical Data: It stores Statistical Information about the data in the Database. This information is used by the Query Processor to select efficient ways to execute a Query.

Data Dictionary:

It is a data structure used by Storage Manager to store Metadata (data about the data) that is Structure of the database, in particular the Schema of the Database. Since it is heavily used, greater emphasis should be placed in developing a good Design and Implementation of it.

E-R Model:

It Models an enterprise as a collection of entities and relationships

Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects. It is described by a set of attributes.

Relationship: an association among several entities Entity Sets

• An entity is an object that exists and is distinguishable from other objects. Example: specific person, company, event, plant

(14)

• Entities have attributes

Example: people have names and addresses

• An entity set is a set of entities of the same type that share the same properties. Example: set of all persons, companies, trees, holidays

• An entity is represented by a set of attributes, that is descriptive properties possessed by all members of an entity set.

• Domain – Each entity has a set of values for each of its attributes. The set of permitted values for each attribute is known as domain

• Attribute types ( in E-R model):

• Simple and composite attributes:

Simple Attributes: The attributes which can not be divided into subparts. Ex. Roll number

Composite Attributes: The attributes which can be divided into subparts. Ex. Address (Door number, street name, city, state etc.) • Single-valued and multi-valued attributes

Single Valued Attributes: the attributes which are having single value for a particular entity. Ex. Loan number

Multi valued Attribute: the attributes which are having more than one value. Ex. A person may have more than one phone number

• Derived attributes: the value of this type of attribute can be computed from other attributes. E.g. age can be computed from the date of birth.

An attribute may take null values when the values are not known for the entities.

Relationship Sets

■ A relationship is an association among several entities Example:

Hayes depositor A-102

customer entity relationship set account entity

■ A relationship set is a mathematical relation among n ≥ 2 entities, each taken from entity sets

{(e1, e2, … en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En} where (e1, e2, …, en) is a relationship

★ Example:

(Hayes, A-102) ∈ depositor

Relational Model

Basic Structure:

Formally, given sets D1, D2, …. Dn a relation r is a subset of D1 x D2 x … x Dn

Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di Example: If

customer_name = {Jones, Smith, Curry, Lindsay} customer_street = {Main, North, Park}

customer_city = {Harrison, Rye, Pittsfield} Then r = { (Jones, Main, Harrison),

(15)

(Curry, North, Rye),

(Lindsay, Park, Pittsfield) } is a relation over

customer_name x customer_street x customer_city

Attribute Types:

Each attribute of a relation has a name

• The set of allowed values for each attribute is called the domain of the attribute

• Attribute values are (normally) required to be atomic; that is, indivisible Note: multivalued attribute values are not atomic

Note: composite attribute values are not atomic • The special value null is a member of every domain

• The null value causes complications in the definition of many operations

Relation Schema

A1, A2, …, An are attributes

R = (A1, A2, …, An ) is a relation schema Example:

Customer_schema = (customer_name, customer_street, customer_city)

r(R) is a relation on the relation schema R Example:

customer (Customer_schema)

Relation Instance

The current values (relation instance) of a relation are specified by a table An element t of r is a tuple, represented by a row in a table

Keys

Let K ⊆ R

K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R)

• by “possible r ” we mean a relation r that could exist in the enterprise we are modeling.

Example: {customer_name, customer_street} and {customer_name}

are both superkeys of Customer, if no two customers can possibly have the same name. K is a candidate key if K is minimal

Example: {customer_name} is a candidate key for Customer, since it is a

superkey (assuming no two customers can possibly have the same name), and no subset of it is a superkey.

(16)

Query Languages

Language in which user requests information from the database. Categories of languages

Procedural

Non-procedural, or declarative “Pure” languages:

Relational algebra Tuple relational calculus Domain relational calculus

Pure languages form underlying basis of query languages that people use.

Relational Algebra:

Procedural language Six basic operators

• select: σ • project: ∏ • union: ∪ • set difference: – • Cartesian product: x • rename:

ρ

The operators take one or two relations as inputs and produce a new relation as a result.

Select Operation – Example

Relation r

σA=B ^ D > 5 (r)

Select Operation:

Notation:

σ

p(r)

p is called the selection predicate Defined as:

σ

p(r) = {t | t ∈ r and p(t)} A B C D

α

β

α

β

1 5 12 23 7 7 3 10 A B C D

α

β

α

β

1 2 3 7 1 0

(17)

Where p is a formula in propositional calculus consisting of terms connected by : ∧ (and), ∨ (or), ¬ (not)

Each term is one of:

o <attribute> op <attribute> or <constant>

o where op is one of: =, ≠, >, ≥. <. ≤

Example of selection:

σ

branch_name=“Perryridge”(account) Project Operation – Example

Relation r ∏∏∏∏A,C (r)

Notation: where A1, A2 are attribute names and r is a relation name.

The result is defined as the relation of k columns obtained by erasing the columns that are not listed

Duplicate rows removed from result, since relations are sets

Example: To eliminate the branch_name attribute of account

∏account_number, balance (account) Union Operation – Example

Notation: r ∪ s

Defined as: r ∪ s = {t | t ∈ r or t ∈ s} For r ∪ s to be valid.

• r, s must have the same arity (same number of attributes)

• The attribute domains must be compatible (example: 2nd column of r deals with the same type of values as does the 2nd column of s)

Example: to find all customers with either an account or a loan ∏customer_name (depositor) ∪ ∏customer_name (borrower)

Relation: r s rUs A B C

α

β

10 20 30 40 1 1 1 2 A C

α

β

1 1 1 2 A C

α

β

1 1 2 = A B

α

β

2 3 A B 1 2 1

α

β

A B

α

β

1 2 1 3

(18)

Set Difference Operation – Example Relation:r s Notation r – s Defined as: o r – s = {t | t ∈ r and t ∉ s}

Set differences must be taken between compatible relations.

o r and s must have the same arity

o attribute domains of r and s must be compatible

Cartesian-Product Operation – Example

Notation r x s Defined as:

r x s = {t q | t ∈ r and q ∈ s}

Assume that attributes of r(R) and s(S) are disjoint. (That is, R ∩ S =

∅

). If attributes of r(R) and s(S) are not disjoint, then renaming must be used.

r – s:

_A _B

α

β

1 1 A B

α

β

1 2 1 A B

α

β

2 3

(19)

Composition of Operations

• Can build expressions using multiple operations • Example: σA=C(r x s)

r x s

σ

A=C(r x s)

Rename Operation

• Allows us to name, and therefore to refer to, the results of relational-algebra expressions.

• Allows us to refer to a relation by more than one name. Example:

ρ

x (E) A B

α

β

1 1 1 1 2 2 2 2 C D 10 10 20 10 10 10 20 10 E a a b b a a b b

α

β

γ

α

β

γ

A B

α

β

1 2 A B

α

β

1 1 1 1 2 2 2 2 C D

α

β

γ

α

β

γ

10 10 20 10 10 10 20 10 E a a b b a a b b C D

α

β

γ

10 10 20 10 E a a b b

r

s

r x s

A B C D E a a b

α

β

1 2 2

α

β

10 10 20 =

(20)

returns the expression E under the name X • If a relational-algebra expression E has arity n, then

returns the result of expression E under the name X, and with the attributes renamed to A1 , A2 , …., An .

Example queries:

• Find all loans of over $1200 : σamount > 1200 (loan)

• Find the loan number for each loan of an amount greater than $1200: ∏loan_number (σamount > 1200 (loan))

• Find the names of all customers who have a loan, an account, or both, from the bank: ∏customer_name (borrower) ∪ ∏customer_name (depositor) • Find the names of all customers who have a loan and an account at bank.

∏customer_name (borrower) ∩ ∏customer_name (depositor) • Find the names of all customers who have a loan at the Perryridge branch.

∏customer_name (σbranch_name=“Perryridge”

(

σ

borrower.loan_number = loan.loan_number(borrower x loan))) • Find the names of all customers who have a loan at the Perryridge branch but do

not have an account at any branch of the bank.

∏customer_name (σbranch_name = “Perryridge” (σborrower.loan_number = loan.loan_number(borrower x loan))) – ∏customer_name(depositor) • Find the names of all customers who have a loan at the Perryridge branch. • Query 1

∏customer_name (σbranch_name = “Perryridge” (

σborrower.loan_number = loan.loan_number (borrower x loan))) • Query 2

∏customer_name(σloan.loan_number = borrower.loan_number ( (σbranch_name = “Perryridge” (loan)) x borrower)) • Find the largest account balance

Strategy:

o Find those balances that are not the largest

o Rename account relation as d so that we can compare each account balance with all others

o Use set difference to find those account balances that were not found in the earlier step.

o The query is:

balance(account) - ∏account.balance

(σaccount.balance < d.balance (account x rd (account)))

Formal Definition:

• A basic expression in the relational algebra consists of either one of the following: A relation in the database

A constant relation ) ( ) ,..., , (A1A2 An E x ρ

(21)

• Let E1 and E2 be algebra expressions; the following are all relational-algebra expressions:

E1 ∪ E2 E1 – E2 E1 x E2

σ

p (E1), P is a predicate on attributes in E1

∏s(E1), S is a list consisting of some of the attributes in E1

ρ

x (E1), x is the new name for the result of E1

Additional Definition:

e define additional operations that do not add any power to the relational algebra, but that simplify common queries.

Set intersection Natural join Division Assignment Set-Intersection Operation Notation: r ∩ s Defined as: r ∩ s = { t | t ∈ r and t ∈ s } Assume:

o r, s have the same arity

o attributes of r and s are compatible Note: r ∩ s = r – (r – s) Example: R s r∩∩∩∩ s Natural-Join Operation: Notation: r s

o Let r and s be relations on schemas R and S respectively. Then, r s is a relation on schema R ∪ S obtained as follows:

o Consider each pair of tuples tr from r and ts from s.

o If tr and ts have the same value on each of the attributes in R ∩ S, add a tuple t to the result, where

t has the same value as tr on r t has the same value as ts on s Example:

R = (A, B, C, D) S = (E, B, D)

Result schema = (A, B, C, D, E) A B

α

β

1 2 1 A B

α

β

2 3 A B α 2

(22)

r s is defined as: ∏r.A, r.B, r.C, r.D, s.E (σr.B = s.B ∧ r.D = s.D (r x s)) Example: Relation r s Division Operation • Notation: r ÷ s

• Suited to queries that include the phrase “for all”.

• Let r and s be relations on schemas R and S respectively where R = (A1, …, Am , B1, …, Bn )

S = (B1, …, Bn)

The result of r ÷ s is a relation on schema R – S = (A1, …, Am)

r ÷ s = { t | t ∈ ∏ R-S (r) ∧ ∀ u ∈ s ( tu ∈ r ) }

Where tu means the concatenation of tuples t and u to produce a single tuple

Relation r s r ÷ s A B 1 2 4 1 2 C D

α

γ

β

γ

β

a a b a b

α

β

γ

α

δ

1 3 1 a a a A B

α

δ

1 1 1 1 2 C D

α

γ

β

a a a a b E

α

γ

α

γ

δ

B D E

α

β

γ

(23)

Properties:

• Property

Let q = r ÷ s

Then q is the largest relation satisfying q x s ⊆ r

• Definition in terms of the basic algebra operation Let r(R) and s(S) be relations, and let S ⊆ R

r ÷ s = ∏R-S (r ) – ∏R-S ( ( ∏R-S (r ) x s ) – ∏R-S,S(r )) To see why

∏R-S,S (r) simply reorders attributes of r

∏R-S (∏R-S (r ) x s ) – ∏R-S,S(r) ) gives those tuples t in ∏R-S (r ) such that for some tuple u ∈ s, tu ∉ r.

Assignment Operation

• The assignment operation (←) provides a convenient way to express complex queries.

Write query as a sequential program consisting of a series of assignments

followed by an expression whose value is displayed as a result of the query.

Assignment must always be made to a temporary relation variable. Example: Write r ÷ s as

temp1←∏R-S(r)

temp2←∏R-S ((temp1xs) – ∏R-S,S (r )) result = temp1 – temp2

o The result to the right of the ← is assigned to the relation variable on the left of the ←.

o May use variable in subsequent expressions.

B 1 2 A B α α α β γ δ δ δ ∈ ∈ β 1 2 3 1 1 1 3 4 6 1 2 A

α

β

=

(24)

24 EXTENDED RELATIONAL-ALGEBRA-OPERATIONS Generalized Projection Aggregate Functions Outer Join Generalized Projection

Extends the projection operation by allowing arithmetic functions to be used in the projection list.

E is any relational-algebra expression

Each of F1, F2, …, Fn are are arithmetic expressions involving constants and attributes in the schema of E.

Given relation credit_info(customer_name, limit, credit_balance), find how much more each person can spend:

∏customer_name, limit – credit_balance (credit_info) Aggregate Functions and Operations

• Aggregation function takes a collection of values and returns a single value as a result. avg: average value

min: minimum value max: maximum value sum: sum of values count: number of values • Aggregate operation in relational algebra E is any relational-algebra expression

G1, G2 …, Gn is a list of attributes on which to group (can be empty) Each Fi is an aggregate function

Each Ai is an attribute name

g sum(c) (r) Relation r: • Result of aggregation does not have a name

Can use rename operation to give it a name

For convenience, we permit renaming as part of aggregate operation

Outer Join

• An extension of the join operation that avoids loss of information.

• Computes the join and then adds tuples form one relation that does not match tuples in the other relation to the result of the join.

• Uses null values:

null signifies that the value is unknown or does not exist

All comparisons involving null are (roughly speaking) false by definition. We shall study precise meaning of comparisons with nulls later

Relation loan Relation borrower

) ( ,..., , 2 1 F Fn E F ∏ A B C

α

β

α

β

7 7 3 10 sum(c ) 27

branch_name

g

sum(balance) as sum_balance

(account)

L-170 loan_numbe

r customer

(25)

Inner join

Loan Borrower

Left outer Join

Loan Borrower

Null Values

It is possible for tuples to have a null value, denoted by null, for some of their attributes

null signifies an unknown value or that a value does not exist. The result of any arithmetic expression involving null is null. Aggregate functions simply ignore null values (as in SQL)

For duplicate elimination and grouping, null is treated like any other value, and two nulls are assumed to be the same (as in SQL)

Comparisons with null values return the special truth value: unknown

o If false was used instead of unknown, then not (A < 5) would not be equivalent to A >= 5

Three-valued logic using the truth value unknown:

o OR: (unknown or true) = true, (unknown or false) = unknown (unknown or unknown) = unknown

o AND: (true and unknown) = unknown, (false and unknown) = false,

(unknown and unknown) = unknown

o NOT: (not unknown) = unknown

o In SQL “P is unknown” evaluates to true if predicate P evaluates to unknown Result of select predicate is treated as false if it evaluates to unknown

3000 4000 1700 loan_nu mber L-170 L-230 L-260 branch_na me amount Downtown Redwood Perryridge loan_number amount L-170 L-230 3000 4000 customer_name Jones Smith branch_name Downtown Redwood loan_number amount L-170 L-230 3000 4000 customer_name Jones Smith branch_name Downtown Redwood Jones Smith Hayes

(26)

Modification of the Database

The content of the database may be modified using the following operations: Deletion

Insertion Updating

All these operations are expressed using the assignment operator. Deletion

A delete request is expressed similarly to a query, except instead of displaying tuples to the user, the selected tuples are removed from the database.

Can delete only whole tuples; cannot delete values on only particular attributes A deletion is expressed in relational algebra by:

r ← r – E

where r is a relation and E is a relational algebra query.

Example:

Delete all account records in the Perryridge branch.

ccount ← account – σ branch_name = “Perryridge” (account ) Delete all loan records with amount in the range of 0 to 50

oan ← loan – σ amount

≥

0and amount

≤

50 (loan) Delete all accounts at branches located in Needham.

r1 ← σ branch_city = “Needham” (account branch ) r2 ← ∏branch_name, account_number, balance (r1) r3 ← ∏ customer_name, account_number (r2 depositor) account ← account – r2

depositor ← depositor – r3

Insertion

To insert data into a relation, we either:

o specify a tuple to be inserted

o write a query whose result is a set of tuples to be inserted in relational algebra, an insertion is expressed by:

← r ∪ E

here r is a relation and E is a relational algebra expression.

The insertion of a single tuple is expressed by letting E be a constant relation containing one tuple.

Example:

Insert information in the database specifying that Smith has $1200 in account A-973 at the Perryridge branch.

ccount ← account ∪ {(“Perryridge”, A-973, 1200)} epositor ← depositor ∪ {(“Smith”, A-973)}

Updating

A mechanism to change a value in a tuple without charging all values in the tuple Use the generalized projection operator to do this task

Each Fi is either

(27)

o if the attribute is to be updated Fi is an expression, involving only constants and the attributes of r, which gives the new value for the attribute

Example:

Make interest payments by increasing all balances by 5 percent.

ccount ← ∏ account_number, branch_name, balance * 1.05 (account)

Relational Calculus:

It is a Non-Procedural Query Language associated with the Relational Model It is a formal Query Language where we write declarative expression to specify a

retrieval request and hence there is no description of how to evaluate a Query. So it is also called as Declarative Query Language

A Calculus Expression specifies what is to be retrieved rather than how to retrieve it.

The Relational Calculus had a big influence in the Design of Commercial Query Languages such as SQL and, especially Query-by-Example (QBE)

Types of Relational Calculus:

o Tuple Relational Calculus (TRC)

o Domain Relational Calculus (DRC)

Tuple Relational Calculus (TRC):

It is a Non-Procedural/Declarative Query Language A Variables in TRC take on Tuples as Values TRC has had a more influence on SQL

A Query in a TRC is expressed as { t | P(t) } Where t a tuple variable

P(t) Predicate Calculus Formula/ Condition

A tuple variable can ranges over a particular database relation ie it can take any individual tuple from that relation as its value

The result of the query is set of all tuples t that satisfy the Condition/ formula P(t)

Domain Relational Calculus (DRC):

It is a Non-Procedural/Declarative Query Language This method uses domain variables

Domain variables take values from an attributes’ domain, rather than values from an entire tuple

DRC has had a more influence on QBE

An expression in DRC has the general form of

{ < x1, x2 , x3 , ….., xn > | P( x1, x2 , x3 , ….., xn ) } where

x1, x2 , x3 , ….., xn domain variables ranges over domains of attributes P Formula/ Condition

(28)

UNIT-II SQL:

Introduction:

It is the most widely used Commercial and Standard Relational Database Language

It is originally developed at IBM’s San Jose Research Laboratory

This language is originally called as SEQUEL and implemented as part of the System R project (1974—1977)

Almost other vendors introduced DBMS products based on SQL, and it is now a de facto standard

Now the SEQUEL is called as SQL(Structured Query Language)

In 1986, the ANSI(American National Standards Institute) and ISO (International Standards Organization) published an SQL standard, called SQL86

IBM published its own corporate SQL standard, the Systems Application Architecture Database Interface (SAA-SQL) in 1987

In 1989, an extended standard for SQL was published SQL-89 and the Database System available today supports at least the features of SQL-89

And in 1992, ANSI/ISO proposed a new SQL standard namely SQL-92

The Current Standard is SQL99 or SQL 1999. And in this Standard, the “Object Relational” concepts have been added.

Foreseen Standard is SQL:200x , which is in draft form.

SQL can either be specified by a command-line tool or it can be embedded into a general purpose programming language such as Cobol, "C", Pascal, etc.

Parts in SQL Language:

The SQL Language has 7 parts:

1. Data Definition Language (DDL): It provides commands for defining relation schemas, deleting relations, creating indices, and modifying relation schemas 2. Interactive Data-Manipulation (DML): It includes a Query Language based on

both the relational algebra and the Tuple Relational Calculus. It includes commands to insert tuples into, delete from, and to modify tuples in the database 3. Embedded DML: This form of SQL is designed to use within general purpose

Programming Languages, such as PL/I, COBOL, PASCAL, FORTRAN, and C 4. View Definition: The SQL DDL includes commands for defining View

5. Authorization: The SQL DDL includes commands for specifying access rights to relations and Views

6. Integrity: The SQL DDL includes commands for specifying Integrity Constraints that the data stored in the database must satisfy. Updates that violates the ICs are disallowed

7. Transaction Control: SQL includes commands for specifying the beginning and Ending of Transactions.

Basic Structure of SQL:

The basic structure of an SQL expression consists of three clauses: Select, from and where.

Select clause corresponds to the Projection operation of the Relational Algebra. It is used to list the desired attributes in the result relation.

From clause corresponds to the Cartesian Product operation of the Relational Algebra. It lists the relations to be scanned in the evaluation of expression.

(29)

Where clause corresponds to the Selection Predicate of the Relational Algebra. It consists of a predicate involving attributes of the relations that appear in the from clause

A typical SQL query has the form : Select A1, A2 , A3 , …….., An From r1 , r2 , ……, rm Where P A1, A2 , A3 , …….., An Attributes r1 , r2 , ……, rm Relations P Predicate

The Relational algebra equivalent of the above query is

∏A1, A2 , A3 , …….., An₍P (r1 × r2 ×……× rm))

The SELECT Clause:

1. Write the Description 2. Write the Syntax

3. Explain the use of distinct , * , and all keywords 4. Provide at least 4 examples Select Queries The WHERE Clause:

1. Write the Description

2. Write the Syntax (Select A1, A2 ,….., An from r1 , r2 , ……, rm where P ) 3. List the Logical Connectives that can be used in where clause

4. Provide atleast 3 example queries that use Logical connectives 5. Explain the use of between and and keywords

6. Provide atleast 3 example queries that use between and and keywords 7. Explain the use of not between keywords

8. Provide atleast 2 example queries that use not between clause The FROM Clause:

1. Write the description

2. Write the Syntax (Select A1, A2 ,….., An from r1 , r2 , ……, rm where P ) 3. Provide atleast 3 example queries that involves single relation/table only 4. Provide atleast 3 example queries that involves two relations/tables CREATE TABLE:

An SQL relation is defined using the create table command: create table r (A1 D1, A2 D2, ..., An Dn,

(integrity-constraint1), ..., (integrity-constraintk))

• r is the name of the relation

• each Ai is an attribute name in the schema of relation r • Di is the data type of values in the domain of attribute Ai Example:

create table branch (branch_namechar(15) not null, branch_city char(30), assets integer)

Example: Declare branch_name as the primary key for branch and ensure that the values of assets are non-negative.

(30)

create table branch (branch_namechar(15), branch_city char(30), assets integer, primary key (branch_name))

DROP AND ALTER TABLE:

The drop table command deletes all information about the dropped relation from the database.

The alter table command is used to add attributes to an existing relation:

alter table r add A D

where A is the name of the attribute to be added to relation r and D is the domain of A.

The alter table command can also be used to drop attributes of a relation: alter table r drop A

where A is the name of an attribute of relation r

Rename Operation:

SQL provides mechanism for renaming both relations and attributes General form is : oldname as newname

The as clause can appear in both the select and from clause Provide atleast 3 example queries

String Operation:

Some of the pattern-matching operators available in SQL are a. Like it a comparison operator

b. Percent(%) _{% character matches any substring} c. Underscore(_) _ character matches any character Provide atleast 3 example queries

SQL allows us to search for mismatches instead of matches by using the not like SQL also permits a variety of functions on character strings, such as concatenation

(||), extracting substrings , finding the length of strings, converting between uppercase and lowercase

Provide atleast 3 example queries which we saw in the cs235 lab String Functions Tuple Variable:

Tuple variables are defined in the from clause via the use of the as clause.

Example: Find the customer names and their loan numbers for all customers having a loan at some branch.

select customer_name, T.loan_number, S.amount from borrower as T, loan as S

where T.loan_number = S.loan_number

Ordering the Display of Tuples:

Tuples present in the relation can sorted

SQL provides the order by clause to arrange the tuples in result relation in sorted order

Provide atleast 3 example queries that uses order by clause for sorting in ascending order

Explain the use of desc keyword

Provide atleast 3 example queries which uses order by with desc Set Operations:

1. UNION operation a. UNION syntax b. UNION ALL syntax Eaxmple:

(31)

Find all customers who have a loan, an account, or both: (select customer_name from depositor)

union

(select customer_name from borrower)

2. INTERSECT Operation a. INTERSECT syntax b. INTERSECT ALL syntax

Example: Find all customers w`ho have both a loan and an account. (select customer_name from depositor)

intersect

3. EXCEPT Operation a. EXCEPT Syntax b. EXCEPT ALL syntax

Example: Find all customers who have an account but no loan. (select customer_name from depositor)

except

Aggregate Operation:

Aggregation functions are 1. Average : avg 2. Minimum : min 3. Maximum : max 4. Total : sum 5. Count : count View:

Views are “virtual relations” defined by a Query expression

Any relation that is not part of the logical model but is made visible to a user as virtual relation, is called a View

Views are useful mechanism for simplifying database queries, but modification of the database through views nay have potential disadvantageous consequences. Why we need a View?

It is not desirable for all users to see the entire logical model that is certain data has to be hided from users for Security concern

To create a personalized collection of relations that is matched with user’s intuition rather than the entire Logical model

View definition:

We can define/create a view using the create view statement. Syntax: create view <view name> as <query expression>

where <query expression> _{any relational Algebra expression}

View name can be used in places wherever a relation name can be allowed Example:1: create view vstud as Пrollno,sname (Stud)

Example:2: create view vstudfees as Пrollno,sname,(total—paid) AS feebalance (Studfees)

(32)

Example:4: create view vstudperc as Π_{rollno,sname(} σ_{pcen>85 (StudPercent) )} Consider the following view definition

Create view vstudper as Π_{rollno,pcen(StudPercent)}

Here if there is any modification(Insertion, Deletion or Update) in relation studpercent, then the set of tuples in the view vstudper also changes. So at any given time, the set of tuples in the view relation is defined as the result of evaluation of the Query expression that defines the view at that time.

Updates through Views and Null Values:

Although views are a useful tool for queries, they present significant problems if Updates, Insertions, or Deletions are expressed with them

The difficulty is that a modification to database expressed in terms of a view must be translated o modification to the actual relations in the Logical model of the database

Consider the relation studfees(rollno,name,feebal) and following view definition Create view vstudfees as Π_{rollno,name(Studfees)}

Suppose we plan to insert the following tuple into the view vstudfees vstudfees ∪ { (100190, “kumar” ) }

This insertion must also take place in the relation studfees since the view vstudfees is constructed from this relation.

But to insert a tuple into the original relation, we need a value for feebal. There are two approaches to deal with the insertion

o Reject the insertion, and return an error message to the user

o Insert a tuple (100190, “kumar”, null) into the relation studfees

Due to these problems, modifications are generally not permitted on the views, except in limited cases.

Null Values:

• It is possible for tuples to have a null value, denoted by null, for some of their attributes

• null signifies an unknown value or that a value does not exist. • The predicate is null can be used to check for null values.

o Example: Find all loan number which appear in the loan relation with null values for amount.

select loan_number from loan

where amount is null

• The result of any arithmetic expression involving null is null

o Example: 5 + null returns null

• However, aggregate functions simply ignore nulls • Any comparison with null returns unknown

o Example: 5 < null or null <> null or null = null • Three-valued logic using the truth value unknown:

o OR: (unknown or true) = true, (unknown or false) = unknown (unknown or unknown) = unknown

o AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown

o NOT: (not unknown) = unknown