Part 4: Database Language - SQL


Junping Sun Database Systems 4-1


Database Languages and Implementation Data Model

Data Model = Data Schema + Database Operations + Constraints

• Database languages such as SQL and QUEL can be viewed as tools for implementing the database schema and data operations at the logical (implementation) level.

• Database Language = Data Definition Language (DDL) + Data Manipulation Language (DML)

• DDL implements the database schema.

• DML implements the database operations.

• The separation of DDL and DML is the major distinction between application systems developed with database languages and those developed with programming languages.

SQL - Structured Query Language

SQL:

• It is the most widely accepted and implemented interface language for relational database systems ("intergalactic dataspeak").

History of Relational Database Languages:

• SEQUEL (1974 -- 1975)

• It was the Application Programming Interface (API) to System R.

• It was revised to SEQUEL/2 after several years, and SEQUEL/2 was later renamed SQL.

• SQL/DS (1981)

• DB2 (1983)

• SQL (ANSI-86), the first standardized version of SQL, called SQL1

• SQL (ANSI-89)

• SQL (ANSI-92), called SQL2

• SQL3, which supports recursive operations and the object-oriented paradigm

• SQL-99 Standard

Data Definition

Schema Definition at Three Levels of a Database:

View data schema (table) definition:

• A view table can be defined on top of one or more base tables.

Base data table schema definition:

• A base table corresponds to one physical data file in the storage system.

Physical schema definition:

• Each base table can be stored in a different type of storage schema or data organization structure, such as:

sequential file, hash index, ISAM, VSAM,
B-Tree, B+-Tree, B*-Tree, K-D Tree, KDB Tree, R-Tree, R+-Tree, R*-Tree

• Integrity constraints on the schema

• Authorization and security mechanisms on user-defined database operations such as query, update, and insert/delete operations.

Data Definition

Create Statements:

• create table statement (to define a base table)

• create index statement (to define an index at the internal level)

• create view statement (to define a view at the user level)

• create schema statement (to treat a database as a whole unit in SQL89 and SQL2)

Drop Statements:

• drop table statement (to delete the definition and all instances of the table)

• drop index statement (to remove an existing index)

• drop view statement (to delete the view)

• drop schema statement (to delete the schema)

Schema and Catalog in ANSI-SQL Standard

SQL Schema:

• It is identified by a schema name and includes an authorization identifier that indicates the user or account who owns the schema.

Example:

CREATE SCHEMA COMPANY AUTHORIZATION JSMITH;

• It creates a schema called COMPANY, owned by the user with authorization identifier JSMITH.

Syntax:

schema ::= CREATE SCHEMA schema-name AUTHORIZATION user


CREATE TABLE EMPLOYEE Statement

CREATE TABLE EMPLOYEE
(NAME      VARCHAR2(19) NOT NULL,
 SSN       CHAR(9),
 BDATE     DATE,
 ADDRESS   VARCHAR(30),
 SEX       CHAR,
 SALARY    NUMBER(10,2),
 SUPERSSN  CHAR(9),
 DNO       VARCHAR(8) NOT NULL,
 CONSTRAINT EMPPK PRIMARY KEY (SSN),
 CONSTRAINT EMPSUPERFRK
   FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE (SSN) DISABLE,
 CONSTRAINT EMPDUMFRK
   FOREIGN KEY (DNO) REFERENCES DEPARTMENT (DNUMBER) DISABLE);

• The constraint can be enabled by using the ALTER TABLE statement after the data is loaded into the table.

ALTER TABLE EMPLOYEE ENABLE CONSTRAINT EMPSUPERFRK;

Specifying Referential Triggered Actions

CREATE TABLE EMPLOYEE
(NAME      VARCHAR2(19) NOT NULL,
 SSN       CHAR(9),
 BDATE     DATE,
 ADDRESS   VARCHAR(30),
 SEX       CHAR,
 SALARY    NUMBER(10,2)
   CHECK (SALARY BETWEEN 10000 AND 99000),
 SUPERSSN  CHAR(9),
 DNO       VARCHAR(9) DEFAULT '1' NOT NULL,
 CONSTRAINT EMPPK PRIMARY KEY (SSN),
 CONSTRAINT EMPSUPERFK
   FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE (SSN)
   ON DELETE CASCADE DISABLE);

Specifying Referential Triggered Actions

CREATE TABLE DEPARTMENT
(DNAME    VARCHAR2(15) NOT NULL,
 DNUMBER  VARCHAR(8),
 MGRSSN   CHAR(9) DEFAULT '888665555' NOT NULL,
 CONSTRAINT DEPTPK PRIMARY KEY (DNUMBER),
 CONSTRAINT DEPTSK UNIQUE (DNAME),
 CONSTRAINT DEPTMGRFRK
   FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE (SSN)
   ON DELETE CASCADE DISABLE);

ALTER TABLE EMPLOYEE ADD (CONSTRAINT EMPDNOFRK

FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER) );

Data Types

SQL Data Types:

ANSI-SQL                            ORACLE
CHARACTER(n)                        CHAR(n)
CHARACTER VARYING(n)                VARCHAR(n), VARCHAR2(n)
NUMERIC(p,s), DECIMAL(p,s)          NUMBER(p,s)
INTEGER, INT, SMALLINT              NUMBER(38)
FLOAT(b), DOUBLE PRECISION, REAL    NUMBER
DATE                                DATE
(no ANSI equivalent)                RAW, LONG, LONG RAW, ROWID

Data Manipulation in SQL

Data Manipulation at Base Table Level:

• Query the database via the select statement.

• Modify data (tuples) in a table of the database via the update statement.

• Remove data (tuples) from a table of the database via the delete statement.

• Append data (tuples) to a table in the database via the insert statement.

Data Manipulation at View (virtual table) Level:

• Query part of the database via a select statement on a view.

• Update or modify the part of the data defined at the view level by mapping the view update to the underlying base table(s):

  • single-table view update

  • multiple-table view update, which still has unsolved problems.

Query Database In SQL

• Querying a database in SQL is done via the select statement.

General format of the select statement:

select <attribute list>
from <table list>
where <condition>

• <attribute list> is a list of attribute names whose values are to be retrieved by the query.

• <table list> is a list of the relation names required to process the query. Multiple tables listed in the <table list> imply that a join operation is involved.

• <condition> is a conditional (Boolean) expression that identifies the tuples to be retrieved by the query.

<condition> specifies the selection and join operations.

<condition> can include another select statement as a subquery (nested query).

SELECT-PROJECT QUERY

Q0: Retrieve the birth date and address of the employee whose name is 'John B. Smith'.

SQL Script for Q0:

Q0: select bdate, address
    from employee
    where fname = 'John' and minit = 'B' and lname = 'Smith';

Relational Algebra Expression for Q0:

$\pi_{\text{bdate, address}}(\sigma_{\text{fname = 'John' and minit = 'B' and lname = 'Smith'}}(\text{employee}))$

Target Attributes: bdate, address
Constraint: fname = 'John' and minit = 'B' and lname = 'Smith'
Target Relation: employee

SELECT-PROJECT-JOIN QUERY

Q1. Retrieve the first and last names and addresses of all employees who work for the 'Research' department.

select fname, lname, address
from employee, department
where dname = 'Research' and dnumber = dno;

Target Attributes: fname, lname, address
Constraints:
  Select Condition: dname = 'Research'
  Join Condition: dnumber = dno
Target Relations: employee, department

• This query involves one selection on the department relation and a join of the employee and department relations.

Q2. For every project located in 'Stafford', list the project number, the controlling department number, and the department manager's last name, address, and birthdate.

select pnumber, dnum, lname, address, bdate
from project, department, employee
where plocation = 'Stafford' and dnum = dnumber and mgrssn = ssn;

Target Attributes: pnumber, dnum, lname, address, bdate
Constraints:
  Select Condition: plocation = 'Stafford'
  Join Conditions: dnum = dnumber, mgrssn = ssn
Target Relations: project, department, employee

• A selection on the project relation selects the project tuples located in 'Stafford'.

• A join of the project and department relations finds the controlling department.

• A join of the department and employee relations finds the manager's information in the employee relation.

• The two join operations implement two relationships in the ER schema of the database, MANAGES and CONTROLS.

Dealing with Ambiguous Attribute Names and Aliasing

Q1A: select fname, lname, address
     from employee, department
     where department.dname = 'Research' and department.dnumber = employee.dnumber;

• If the attribute names for the department number are the same in both the employee and department tables, qualifiers are necessary in the query to avoid ambiguity.

Q8. For each employee, retrieve the employee's first and last name and the first and last name of his or her immediate supervisor.

select e.fname, e.lname, s.fname, s.lname
from employee e, employee s
where e.superssn = s.ssn;


Discussion on Aliasing

• Ambiguity also arises in queries that refer to the same relation twice.

• The query above declares two alternative relation names (aliases), e and s, for the employee relation.

• e and s can be thought of as two different copies of the employee relation:

  e represents employees in the role of supervisees;
  s represents employees in the role of supervisors.

• Join and selection operations are involved.

• The join attributes are superssn and ssn.

The join condition e.superssn = s.ssn links each employee to the corresponding supervisor's information, such as fname and lname.

• The join condition implements the recursive relationship SUPERVISION in the original ER schema.

• This is an example of one-level recursion.

• A general recursive query, with an unknown number of levels, cannot be specified.
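The self-join described above can be tried out with a small script. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the course DBMS; the three employee rows are illustrative data, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee (fname text, lname text, ssn text primary key, superssn text);
    -- Illustrative rows (assumed): Borg has no supervisor.
    insert into employee values ('James',    'Borg',  '888665555', null);
    insert into employee values ('Franklin', 'Wong',  '333445555', '888665555');
    insert into employee values ('John',     'Smith', '123456789', '333445555');
""")

# e plays the supervisee role, s the supervisor role (one level of recursion).
pairs = conn.execute("""
    select e.fname, e.lname, s.fname, s.lname
    from employee e, employee s
    where e.superssn = s.ssn
    order by e.ssn
""").fetchall()

for row in pairs:
    print(row)
```

Borg does not appear as a supervisee because his superssn is NULL and therefore matches no ssn in the s copy.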

Query Examples

Query with PROJECT:

Q9: List all employees' social security numbers.

select ssn
from employee;

Query with SELECT:

Q1C: Retrieve all employees' tuples from department 5.

select *
from employee
where dno = 5;

Query Examples

Query with CARTESIAN PRODUCT:

Q10: List all combinations of EMPLOYEE SSN and DEPARTMENT DNAME.

select ssn, dname
from employee, department;

Query with Retrieving Distinct Attribute Values:

Q11: Retrieve the salary of every employee.

select ALL salary
from employee;

Q11A: Retrieve all distinct salary values.

select DISTINCT salary
from employee;

Query Involving with Union

Q4. Make a list of all project numbers for projects that involve an employee whose last name is 'Smith' as a worker or as a manager of the department that controls the project.

(select distinct pnumber
 from project, employee, department
 where lname = 'Smith' and dnum = dnumber and mgrssn = ssn)
union
(select distinct pnumber
 from project, employee, works_on
 where lname = 'Smith' and pnumber = pno and essn = ssn);

• The first select query retrieves the projects that involve a 'Smith' as manager of the department that controls the project.

• The second select query retrieves the projects that involve a 'Smith' as a worker on the project.

• If several employees have the last name 'Smith', the project numbers involving any of them are retrieved.

Discussion

The first part of union:

Target Attributes: pnumber
Constraints:
  Select Condition: lname = 'Smith'
  Join Conditions: dnum = dnumber (implements the relationship CONTROLS)
                   mgrssn = ssn (implements the relationship MANAGES)
Target Relations: project, employee, department

The second part of union:

Target Attributes: pnumber
Constraints:
  Select Condition: lname = 'Smith'
  Join Conditions: pnumber = pno and essn = ssn (implement the M:N relationship WORKS_ON)
Target Relations: project, employee, works_on

Predicate IN

• The IN predicate selects those rows for which a specified value appears in a list of constant values enclosed in parentheses or in the result of a subquery.

Q13: Retrieve the social security numbers of all employees who work on any of the projects with project number 1, 2, or 3.

select distinct essn
from works_on
where pno in (1, 2, 3);

Result from the query:

essn
123456789
666884444
453453453
333445555

Workson Table

Predicate NOT IN

• The NOT IN predicate is true if the expression preceding the keyword IN does not match any value in the list.

Q13b: Retrieve the social security numbers of all employees who work on a project other than projects 1, 2, and 3.

select distinct essn
from works_on
where pno not in (1, 2, 3);

Result from the query:

essn
333445555
888665555
987654321
987987987
999887777

Quantifier ANY/SOME

Predicate ANY /SOME:

• The ANY/SOME predicates select those rows for which a specified value appears in the results from a subquery.

Query: Retrieve the social security numbers of employees who work on some project controlled by department 5.

select distinct essn
from works_on
where pno = any (select pnumber from project where dnum = 5);

• The = ANY predicate is the same as the IN predicate.

• ANSI-SQL supports both the ANY and SOME predicates, even though they are equivalent.

• ORACLE supports only the ANY predicate, not SOME.

• The difference between the IN and = ANY (= SOME) predicates is that IN can be followed by either a list of values or a subquery, whereas ANY (SOME) can be followed only by a subquery.

Quantifier SOME and ANY

• Both SOME and ANY are designed to link a simple relational operator with a subquery that returns a multi-row result.

• The sequence preceding the subquery, {expression relational-operator quantifier}, is called a quantifier predicate:

Expression    Comparison-operator    Quantifier    Subquery
quantity      >                      ANY           (select ... )

• The comparison in the quantifier predicate is applied in turn to each row of the subquery result.

The logical expression is true if and only if one or more rows in the subquery result satisfy the comparison.

It is false if and only if none of the subquery result rows satisfies the comparison.

Quantifier ALL

Quantifier ALL:

• The ALL predicate evaluates to true if and only if the comparison between a single value and the set of values retrieved by the subquery is true for every value retrieved by the subquery.

Query: List the names of employees whose salary is greater than the salary of all the employees in department 5.

select lname, fname
from employee
where salary > all (select salary from employee where dno = 5);

• The predicates ANY, SOME, and ALL can be prefixed with any comparison operator in {=, <, >, ≤, ≥, ≠}.

• ≠ can be expressed by <> or != in an SQL condition expression.
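The truth conditions of the quantified predicates can be modelled directly. The following is a sketch in plain Python rather than SQL (some engines, SQLite among them, do not implement ANY/ALL); the function name `quantified`, the `OPS` table, and the salary list are hypothetical, and NULL handling is deliberately left out.

```python
import operator

# Comparison operators that may prefix a quantifier (SQL spelling -> Python function).
OPS = {"=": operator.eq, "<>": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def quantified(value, op, quantifier, subquery_rows):
    """Evaluate `value op ANY|SOME|ALL (subquery)` over a list of non-NULL values."""
    compare = OPS[op]
    results = (compare(value, v) for v in subquery_rows)
    if quantifier in ("ANY", "SOME"):
        return any(results)   # true iff at least one row satisfies the comparison
    if quantifier == "ALL":
        return all(results)   # true iff every row satisfies the comparison
    raise ValueError(f"unknown quantifier: {quantifier}")

dept5_salaries = [30000, 40000, 38000, 25000]  # hypothetical subquery result

print(quantified(43000, ">", "ALL", dept5_salaries))
print(quantified(38000, ">", "ALL", dept5_salaries))
print(quantified(38000, ">", "ANY", dept5_salaries))
print(quantified(40000, "=", "ANY", dept5_salaries))   # behaves like IN
print(quantified(55000, "<>", "ALL", dept5_salaries))  # behaves like NOT IN
```

On an empty subquery result this model yields false for ANY/SOME and true for ALL, which matches the "one or more rows" and "for every value" wording above.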

Discussions on Predicates IN and NOT IN

• The predicate

a IN (x, y, z) is equivalent to a = x OR a = y OR a = z

select essn
from works_on
where pno = 1 or pno = 2 or pno = 3;

• The predicate

a NOT IN (x, y, z) is equivalent to a <> x AND a <> y AND a <> z
a NOT IN (x, y, z) is equivalent to a <> ALL (x, y, z)

select essn
from works_on
where pno <> 1 and pno <> 2 and pno <> 3;

Nested Query (Type-N)

Q4A. Make a list of all project names for projects that involve an employee whose last name is 'Smith' as a worker, or as a manager of the department that controls the project.

select distinct pname
from project
where pnumber in (select pnumber
                  from project, department, employee
                  where lname = 'Smith' and dnum = dnumber and mgrssn = ssn)
   or pnumber in (select pno
                  from works_on, employee
                  where lname = 'Smith' and essn = ssn);

• The comparison operator IN compares a value v (here v is pnumber) with a set (or multiset) of values V and evaluates to TRUE if v is one of the elements of V.

Decomposition of Nested Query

Subquery 1:

temp1: select pnumber
       from project, department, employee
       where dnum = dnumber and mgrssn = ssn and lname = 'Smith'

Subquery 2:

temp2: select pno
       from works_on, employee
       where essn = ssn and lname = 'Smith'

Subquery 3:

select distinct pnumber
from project

Comparison of Nested and Flattened Queries

Query: Retrieve the social security numbers of employees who work on some project controlled by department 5.

select distinct essn
from works_on
where pno in (select pnumber from project where dnum = 5);

Equivalent Query:

select distinct essn
from works_on, project
where dnum = 5 and pno = pnumber;

• The first implementation, which uses a subquery, avoids an explicit join operation.

• The second implementation has to use a join operation, where pno = pnumber is the join condition (join path).
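The equivalence of the nested and flattened forms can be checked on a small instance. This is a sketch using Python's sqlite3 module; the project and works_on rows are an assumed sample, and the nested form is written with in so that a multi-row subquery result is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table project (pnumber integer primary key, dnum integer);
    create table works_on (essn text, pno integer);
    -- Assumed sample rows: projects 1-3 belong to department 5.
    insert into project values (1, 5), (2, 5), (3, 5), (10, 4);
    insert into works_on values
        ('123456789', 1), ('123456789', 2),
        ('666884444', 3),
        ('333445555', 2), ('333445555', 10),
        ('999887777', 10);
""")

# Nested form: the subquery produces the department-5 project numbers.
nested = conn.execute("""
    select distinct essn from works_on
    where pno in (select pnumber from project where dnum = 5)
    order by essn
""").fetchall()

# Flattened form: the same condition expressed as a join on pno = pnumber.
flattened = conn.execute("""
    select distinct essn from works_on, project
    where dnum = 5 and pno = pnumber
    order by essn
""").fetchall()

print(nested)
print(nested == flattened)
```

Without distinct, the flattened form would list an employee once per matching project, so the two forms agree only as sets.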

Correlated Nested Query (Type-J)

Q12. Retrieve the name of each employee who has a dependent with the same first name and same sex as the employee.

select e.fname, e.lname
from employee e
where e.ssn in (select essn
                from dependent
                where essn = e.ssn and
                      sex = e.sex and
                      e.fname = dependent_name);

• The where clause of the inner query block contains join predicates that reference the table of an outer query block (a table that is not included in the from clause of the inner query block).

• essn = e.ssn correlates the current dependent tuple with the employee the dependent belongs to.

• sex = e.sex and e.fname = dependent_name check that the sex and first-name values of the employee and dependent tuples are equal.

Rule for Subqueries and Nested Queries

1. The subquery should be enclosed within parentheses.

2. Subqueries may contain nested subqueries. When subqueries are nested, SQL evaluates them from the inside out.

a. The innermost query is processed first

b. Then the result of query is passed to the next outer query.

3. In general, there may be several levels of nested queries. Ambiguity among attribute names is possible if attributes of the same name exist, one in a relation in the from clause of the outer query and the other in a relation in the from clause of the nested (inner) query.

The rule is that a reference to an unqualified attribute refers to the relation declared in the innermost nested query.

4. Column names in a subquery are implicitly qualified by the table name in the FROM clause of the subquery (that is, the FROM clause at the same level).

5. A subquery may refer only to column names from tables that are named in outer queries or in the subquery's own FROM clause.

A subquery may not access tables that are used only by a child query.

6. When a subquery is one of the two operands involved in a comparison, the subquery must be written as the second operand.

Query with Exists Function

Q12B: Retrieve the name of each employee who has a dependent with the same first name and same sex as the employee.

select e.fname, e.lname
from employee e
where exists (select *
              from dependent
              where essn = e.ssn and
                    sex = e.sex and
                    e.fname = dependent_name);

The Exists Function in SQL

• exists and not exists in SQL are used to check whether the result of a correlated query is empty.

• exists and not exists in SQL are usually used in conjunction with a correlated nested query.

• In Q12B, the nested query within the exists function references the ssn, fname, and sex attributes of the employee relation from the outer query.

• For each employee tuple, evaluate the nested query, which retrieves all dependent tuples with the same social security number, sex, and name as the employee tuple.

If at least one tuple exists in the result of the nested query, then select that employee tuple.

In general,

exists(Q) returns TRUE if there is at least one tuple in the result of query Q and returns FALSE otherwise.

not exists(Q) returns TRUE if there are no tuples in the result of query Q and returns FALSE otherwise.
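The general rule for exists(Q) and not exists(Q) can be observed on two employees, one with dependents and one without. This is a sketch with Python's sqlite3 module; the rows and the `QUERY` template are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee (fname text, lname text, ssn text primary key);
    create table dependent (essn text, dependent_name text);
    -- Assumed sample rows: only Smith has dependents.
    insert into employee values ('John', 'Smith', '123456789'),
                                ('Joyce', 'English', '453453453');
    insert into dependent values ('123456789', 'Alice'), ('123456789', 'Michael');
""")

# The same correlated subquery is tested once with exists and once with not exists.
QUERY = """
    select lname from employee e
    where {test} (select * from dependent d where d.essn = e.ssn)
"""
with_dependents = conn.execute(QUERY.format(test="exists")).fetchall()
without_dependents = conn.execute(QUERY.format(test="not exists")).fetchall()

print(with_dependents)
print(without_dependents)
```

Every employee falls into exactly one of the two results, since the nested query result for a given employee is either empty or non-empty.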

Query with Not Exists Function

Q6: Retrieve the names of employees who have no dependents.

select fname, lname
from employee
where not exists (select *
                  from dependent
                  where ssn = essn);

• The correlated nested query retrieves all dependent tuples related to an employee tuple; if none exist, the employee tuple is selected.

• For each employee tuple, the nested query selects all dependent tuples whose essn value matches the employee ssn.

• If the result of the nested query is empty, then no dependents are related to the employee, so that employee tuple is selected and its fname and lname are retrieved.

Nested Query with Two Exists Function

Q7. List the names of managers who have at least one dependent.

select fname, lname
from employee
where exists (select *
              from dependent
              where ssn = essn)
  and exists (select *
              from department
              where ssn = mgrssn);

• The first nested query selects all dependent tuples related to an employee.

• The second nested query selects all department tuples managed by the employee.

• If at least one tuple of the first kind and at least one of the second kind exist for the same ssn, the employee tuple is selected and its fname and lname are retrieved.

• This is an implementation of the intersection operation.

Query with Division (use contains)

Q3. Retrieve the name of each employee who works on all the projects controlled by department 5.

select fname, lname
from employee
where ((select pno
        from works_on
        where ssn = essn)
       contains
       (select pnumber
        from project
        where dnum = 5));

• The second nested query, which is not correlated to the outer query, retrieves the project numbers of all projects controlled by department 5.

• For each employee tuple, the first nested query, which is correlated, retrieves the project numbers on which the employee works; if these contain all projects controlled by department 5, the employee tuple is selected and the name of that employee is retrieved.

Query with Division

Q3: Retrieve the name of each employee who works on all the projects controlled by department 5.

select fname, lname
from employee e
where not exists
      ((select pnumber from project where dnum = 5)
       minus
       (select pno from works_on w where e.ssn = w.essn));

Query with Division

Q3: Retrieve the name of each employee who works on all the projects controlled by department 5.

select fname, lname
from employee
where not exists (select *
                  from works_on b
                  where (b.pno in (select pnumber from project where dnum = 5)) and
                        not exists (select *
                                    from works_on c
                                    where c.essn = ssn and
                                          c.pno = b.pno));

Discussion

• The outer nested query selects any works_on (b) tuple whose pno is that of a project controlled by department 5 and for which there is no works_on (c) tuple with the same pno and the same ssn as that of the employee tuple under consideration in the outer query.

If no such tuple exists, we select the employee tuple and retrieve its fname and lname.

The equivalent interpretation of the query is as follows:

There does not exist a project controlled by department 5 that the employee does not work on.

Equivalently,

select each employee who works on all the projects controlled by department 5.
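Both division formulations (set difference and double not exists) can be compared on a small instance. This is a sketch using Python's sqlite3 module; SQLite spells Oracle's minus as except, and the employees, projects, and assignments below are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee (lname text, ssn text primary key);
    create table project (pnumber integer primary key, dnum integer);
    create table works_on (essn text, pno integer);
    -- Assumed sample rows: department 5 controls projects 1 and 2.
    insert into employee values ('Smith', '123456789'), ('Wong', '333445555');
    insert into project values (1, 5), (2, 5), (10, 4);
    insert into works_on values ('123456789', 1), ('123456789', 2),
                                ('333445555', 2), ('333445555', 10);
""")

# "There is no department-5 assignment b whose project e does not work on."
double_not_exists = conn.execute("""
    select lname from employee e
    where not exists (select * from works_on b
                      where b.pno in (select pnumber from project where dnum = 5)
                        and not exists (select * from works_on c
                                        where c.essn = e.ssn and c.pno = b.pno))
""").fetchall()

# "Department-5 projects minus e's projects is empty."
set_difference = conn.execute("""
    select lname from employee e
    where not exists (select pnumber from project where dnum = 5
                      except
                      select pno from works_on w where w.essn = e.ssn)
""").fetchall()

print(double_not_exists, set_difference)
```

Wong is excluded by both forms because project 1 is controlled by department 5 and Wong has no works_on row for it.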

Renaming Attributes and Join Tables

Q8a: Retrieve the last name of each employee and his or her supervisor, while renaming the resulting attribute names as employee_name and supervisor_name.

select e.lname as employee_name, s.lname as supervisor_name
from employee as e, employee as s
where e.superssn = s.ssn;

Q1a: Retrieve the names and addresses of the employees who work for the 'Research' department.

select fname, lname, address
from (employee join department on dno = dnumber)
where dname = 'Research';

Natural Join, Outer Join, and Nested Join

Q1b: select fname, lname, address
     from (employee natural join
           (department as dept (dname, dno, mssn, msdate)))
     where dname = 'Research';

Q8b: Retrieve the last name of each employee and, if the employee has a supervisor, the last name of the supervisor.

select e.lname as employee_name, s.lname as supervisor_name
from (employee e left outer join employee s
      on e.superssn = s.ssn);

Q2A: select pnumber, dnum, lname, address, bdate
     from ((project join department on dnum = dnumber)
           join employee on mgrssn = ssn)
     where plocation = 'Stafford';

Outer Join in ORACLE

Q8b: Retrieve the last name of each employee and, if the employee has a supervisor, the last name of the supervisor.

select e.lname as employee_name, s.lname as supervisor_name
from employee e, employee s
where e.superssn = s.ssn (+);

• This is equivalent to the employee table in the role of employee left outer joined with the employee table in the role of supervisor.

Q8c: Retrieve the last name of each employee and, if the employee has supervisees, the last names of the supervisees.

select s.lname as employee_name, e.lname as supervisee_name
from employee s, employee e
where s.ssn = e.superssn (+);

• This is equivalent to the employee table in the role of supervisor left outer joined with the employee table in the role of supervisee.
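The difference between the inner join and the left outer join of Q8b shows up as soon as one employee has no supervisor. This is a sketch with Python's sqlite3 module; it uses the ANSI left outer join syntax because the Oracle (+) operator is not available in SQLite, and the three rows are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee (lname text, ssn text primary key, superssn text);
    -- Assumed sample rows: Borg has no supervisor.
    insert into employee values ('Borg',  '888665555', null),
                                ('Wong',  '333445555', '888665555'),
                                ('Smith', '123456789', '333445555');
""")

# Inner join: employees without a supervisor are dropped.
inner_rows = conn.execute("""
    select e.lname, s.lname
    from employee e join employee s on e.superssn = s.ssn
    order by e.lname
""").fetchall()

# Left outer join: every employee is kept; a missing supervisor becomes NULL.
outer_rows = conn.execute("""
    select e.lname as employee_name, s.lname as supervisor_name
    from employee e left outer join employee s on e.superssn = s.ssn
    order by e.lname
""").fetchall()

print(inner_rows)
print(outer_rows)
```

The NULL supervisor is returned to Python as None.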


Aggregation Functions

Aggregate Functions:

• An aggregate function takes an entire column as its argument and computes a single value based on the contents of the column.

• The function result is an "aggregate" of the individual data values in the rows of the column.

Q15': Find the total number of employees in the company, the sum of the salaries of all employees, the maximum, the minimum, and the average salary.

select count(*), sum(salary), max(salary), min(salary), avg(salary)
from employee;

• count(*) counts the total number of tuples in the employee table.

• sum(), max(), min(), and avg() are applied to the salary column values of the tuples in the employee table.

Q16': Find the total number of employees of the 'Research' department, as well as the sum of the salaries, the maximum salary, the minimum salary, and the average salary in this department.

select count(*), sum(salary), max(salary), min(salary), avg(salary)
from employee, department
where dno = dnumber and dname = 'Research';

• All the aggregation functions, count(), sum(), max(), min(), and avg(), are applied to the employee tuples from the 'Research' department.

• The constraints dno = dnumber and dname = 'Research' in the where clause are evaluated before the aggregate functions.

Q19: Count the number of distinct salary values in the database.

select count(distinct salary)
from employee;

Q5: Retrieve the names of all employees who have two or more dependents.

Incorrect one:

select lname, fname
from employee
where (select count(*)
       from dependent
       where ssn = essn) >= 2;

• When a subquery is one of the two operands involved in a comparison, the subquery must be written as the second operand.

Correct one:

select lname, fname
from employee
where 2 <= (select count(*)
            from dependent
            where ssn = essn);

Group By Clause

• In many cases, we want to apply aggregate functions to subgroups of tuples in a relation, based on some attribute values.

Examples:

Find the average salary of employees in each department.
Find the number of employees who work on each project.

• In these cases, we want to group the tuples that have the same value of some attribute(s), called the grouping attribute(s), and apply the function to each such group independently.

• SQL has a group by clause for this purpose.

• The group by clause specifies the grouping attributes, which must also appear in the select clause, so that the value resulting from applying each function to a group of tuples appears along with the value of the grouping attribute(s).

Group by Clause

Q20: For each department, retrieve the department number, the number of employees in the department, and their average salary.

select dno, count(*), avg(salary)
from employee
group by dno;

Q21: For each project, retrieve the project number, the project name, and number of employees who work on that project.

select pnumber, pname, count(*)
from project, works_on
where pnumber = pno
group by pnumber, pname;

• The grouping and aggregate functions are applied after the joining of the two relations.

Having Clause

Q22. For each project on which more than two employees work, retrieve the project number, project name, and number of employees who work on that project.

select pnumber, pname, count(*)
from project, works_on
where pnumber = pno
group by pnumber, pname
having count(*) > 2;

• SQL provides a having clause, which can appear only in conjunction with a group by clause.

• having provides a condition on the group of tuples associated with each value of the grouping attributes; only the groups that satisfy the condition are retrieved in the result of the query.

• The selection condition in the where clause limits the tuples to which the group functions are applied.

Q23. For each project, retrieve the project number, project name, and number of employees from department 5 who work on that project.

select pnumber, pname, count(*)
from project, works_on, employee
where pnumber = pno and ssn = essn and dno = 5
group by pnumber, pname;

Q5. Retrieve the names of all employees who have two or more dependents.

select lname, fname
from employee
where ssn in (select essn
              from dependent
              where ssn = essn
              group by essn
              having count(essn) >= 2);

Where Condition before Having

Q24. Count the total number of employees with salaries greater than $40,000 who work in each department, but only for those departments with more than five employees.

select dname, count(*)
from department, employee
where dnumber = dno and salary > 40000
group by dname
having count(*) > 5;

• This is not the correct query statement.

• The selection condition (salary > 40000) has eliminated those employee tuples whose salary <= 40000 before the group by and having clauses are applied.

• It selects only departments that have more than five employees who each earn more than $40,000.

• The rule is that the where clause is executed first to select individual tuples; the having clause is applied later to select individual groups of tuples.

• By the time having is applied, the tuples are already restricted to employees earning more than $40,000.

The correct one:

select dname, count(*)
from department, employee
where dnumber = dno and salary > 40000 and
      dno in (select dno
              from employee
              group by dno
              having count(*) > 5)
group by dname;

• The constraints dnumber = dno and salary > 40000 in the where clause join the department tuples with the employee tuples whose salary is greater than 40000.

• The subquery selects the departments in which more than five employees work.
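The effect of applying where before having in Q24 can be reproduced on a small instance. This is a sketch with Python's sqlite3 module; the department names and salaries are assumed data, chosen so that 'Research' has six employees of whom only two earn more than 40000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table department (dname text, dnumber integer primary key)")
conn.execute("create table employee (ssn integer primary key, salary integer, dno integer)")
conn.executemany("insert into department values (?, ?)", [("Research", 5), ("Sales", 4)])
# Assumed rows as (salary, dno): six employees in department 5, two in department 4.
employees = [(30000, 5), (35000, 5), (38000, 5), (39000, 5), (45000, 5), (50000, 5),
             (60000, 4), (70000, 4)]
conn.executemany("insert into employee (salary, dno) values (?, ?)", employees)

# having sees only the rows that survived the salary filter, so no group has > 5 rows.
having_version = conn.execute("""
    select dname, count(*) from department, employee
    where dnumber = dno and salary > 40000
    group by dname
    having count(*) > 5
""").fetchall()

# The department-size test is moved into a subquery over all employees.
subquery_version = conn.execute("""
    select dname, count(*) from department, employee
    where dnumber = dno and salary > 40000
      and dno in (select dno from employee group by dno having count(*) > 5)
    group by dname
""").fetchall()

print(having_version)
print(subquery_version)
```

The first form returns nothing even though 'Research' has more than five employees; the second reports its two high earners.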

Having Clause

• The HAVING clause is designed for use in conjunction with GROUP BY when it is desired to restrict the groups that appear in the final result.

• HAVING conditions often involve aggregation functions, permitting the filtering of groups based on summary calculations.

• Aggregation functions may not be used within a WHERE clause.

• The WHERE clause filters individual rows going into the final or intermediate result.

• HAVING filters groups going into the final result.

• WHERE and HAVING may be used together:

WHERE is applied first to filter single rows, then groups are formed from the rows that remain, and finally the HAVING clause is applied to filter the groups.

Summary of GROUP BY/HAVING Clauses

1. Attribute names or column names not listed in the GROUP BY clause may not appear in the HAVING condition in ANSI-1989 and ANSI-1992 SQL.

2. Aggregation functions may always be used in the HAVING clause, even if they do not appear in the SELECT attribute list.

3. The HAVING condition can involve compound conditions formed by

combining simple logical expressions with the logical operators AND, OR, and NOT.

4. HAVING and WHERE can work together.

• The HAVING condition is always applied to the groups formed by the GROUP BY clause.

• The WHERE condition is always applied to the attributes involved in selection or join.

5. Non-aggregation expressions may be used in the HAVING clause, provided the expressions involve only columns that are named in the GROUP BY clause.

Syntax Structure of SELECT Statements

SELECT <attribute list>
FROM <table list>
[WHERE <condition>]
[GROUP BY <grouping attribute(s)>]
[HAVING <grouping condition>]
[ORDER BY <attribute list>]

• The SELECT clause lists the attributes or functions to be retrieved.

• The FROM clause specifies all relations needed in the query, but not those needed only in nested queries.

• The WHERE clause specifies the conditions for selecting tuples from these relations.

• GROUP BY specifies the grouping attribute(s), whereas the HAVING clause specifies a condition on the groups being selected rather than on the individual tuples.

• The built-in aggregation functions COUNT, SUM, MIN, MAX, and AVG are used in conjunction with grouping.

• ORDER BY specifies an order for the query result.

Sequence

1. FROM: The FROM clause is processed first. It specifies the table(s) or views which serve as the source of all data for the final result. If multiple tables are involved, the join operation is necessary.

2. WHERE: The WHERE clause is processed second. It eliminates those rows defined in FROM clause which do not satisfy the search condition.

3. GROUP BY: The GROUP BY clause groups the remaining rows on the basis of shared values in the GROUP BY column(s). The partial result now has the form of a set of groups.

4. HAVING: The HAVING clause is now applied to eliminate those groups which do not satisfy the HAVING condition.

5. SELECT: The SELECT list is used to remove unwanted columns or attributes from the partial result. Only elements which appear in the SELECT list remain.

6. ORDER BY: The final result is sorted according to the ORDER BY list.
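The processing sequence above can be observed with a small experiment. The sketch below is only an illustration: it assumes Python's built-in sqlite3 module as the DBMS, and the employee rows are made-up sample data. WHERE removes tuples before grouping, so a department can lose its group in HAVING because of rows that WHERE already discarded.

```python
import sqlite3

# In-memory database with made-up sample rows (assumption, not course data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee (lname text, salary integer, dno integer);
    insert into employee values
        ('Smith',   30000, 5),
        ('Wong',    40000, 5),
        ('Narayan', 38000, 5),
        ('Zelaya',  25000, 4),
        ('Wallace', 43000, 4),
        ('Borg',    55000, 1);
""")

# FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY
rows = conn.execute("""
    select dno, count(*), avg(salary)
    from employee
    where salary > 28000        -- step 2: drops Zelaya before grouping
    group by dno                -- step 3: groups the remaining rows
    having count(*) >= 2        -- step 4: drops groups with fewer than 2 rows
    order by dno                -- step 6: sorts the final result
""").fetchall()

# Same query without WHERE: department 4 now keeps both of its rows.
rows_without_where = conn.execute("""
    select dno, count(*), avg(salary)
    from employee
    group by dno
    having count(*) >= 2
    order by dno
""").fetchall()

print(rows)
print(rows_without_where)
```

Department 4 appears only in the second result, because in the first query one of its two rows was eliminated by WHERE before the groups were formed.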

Insert Statement in SQL

Insert Statement:

Insert a new tuple into employee table:

insert into employee
values ('Richard', 'K', 'Marini', '653298653', '30-DEC-52', '98 Oak Forest, Katy, TX', 'M', 37000, '987654321', 4);

insert into employee (fname, lname, ssn)
values ('Richard', 'Marini', '653298653');

• Attributes not specified in the insert statement are set to their DEFAULT value if one is defined, or to NULL otherwise.

• The insert operation will be rejected if NOT NULL has been specified for an attribute that is given no value and has no DEFAULT.


Insert a set of tuples into a table:

• create a relation and load it with result of a query.

create table depts_info
(deptname varchar(15), noofemps integer, totalsal integer);

insert into depts_info (deptname, noofemps, totalsal)
select dname, count(*), sum(salary)
from department, employee
where dnumber = dno
group by dname;
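As a hedged illustration of loading a table from a query, the sketch below runs the depts_info statements in SQLite through Python's sqlite3 module. The department and employee rows are invented sample data, and the column lists are cut down to what the query needs. It also shows one consequence worth keeping in mind: the loaded table is a snapshot and does not follow later changes to employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced schemas and made-up rows (assumptions for this sketch).
    create table department (dname varchar(15), dnumber integer);
    create table employee (lname varchar(15), salary integer, dno integer);
    insert into department values ('Research', 5), ('Administration', 4);
    insert into employee values
        ('Smith', 30000, 5), ('Wong', 40000, 5), ('Zelaya', 25000, 4);

    create table depts_info
        (deptname varchar(15), noofemps integer, totalsal integer);

    -- Load depts_info with the result of a grouped join query.
    insert into depts_info (deptname, noofemps, totalsal)
        select dname, count(*), sum(salary)
        from department, employee
        where dnumber = dno
        group by dname;
""")

read_info = "select deptname, noofemps, totalsal from depts_info order by deptname"
snapshot = conn.execute(read_info).fetchall()

# A later insert into the base table does not change depts_info.
conn.execute("insert into employee values ('Narayan', 38000, 5)")
after_base_change = conn.execute(read_info).fetchall()

print(snapshot)
print(after_base_change)
```

Keeping such a table current would require reloading it, which is one motivation for defining the same query as a view instead.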

Delete Statement in SQL

Delete a tuple:

to delete the employee tuple with lname 'Brown'

delete from employee
where lname = 'Brown';

Delete a set of tuples:

to delete the employee tuples from 'Research' department

delete from employee
where dno in (select dnumber
              from department
              where dname = 'Research');

To delete all the tuples in employee table:

delete from employee;


Update Statement in SQL

Update a single tuple:

to change the location and controlling department number of project number 10 to 'Bellaire' and 5.

update project
set plocation = 'Bellaire', dnum = 5
where pnumber = 10;

Update a set of tuples in a table:

to raise the salary of employees from 'Research' department by 10%.

update employee
set salary = salary * 1.1
where dno in (select dnumber
              from department
              where dname = 'Research');
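The set-oriented update and delete statements can be tried out as follows. This is a minimal sketch that assumes SQLite via Python's sqlite3 module, with reduced table definitions and made-up rows. The nested query selects the department number, and the outer statement then affects every employee tuple with that dno.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced schemas and made-up rows (assumptions for this sketch).
    create table department (dname text, dnumber integer);
    create table employee (lname text, salary real, dno integer);
    insert into department values ('Research', 5), ('Administration', 4);
    insert into employee values
        ('Smith', 30000, 5), ('Wong', 40000, 5), ('Zelaya', 25000, 4);
""")

# Raise the salary of all 'Research' employees by 10%.
cur = conn.execute("""
    update employee
    set salary = salary * 1.1
    where dno in (select dnumber from department where dname = 'Research')
""")
updated_count = cur.rowcount
# Salaries are rounded because salary * 1.1 is floating-point arithmetic.
salaries = {lname: round(salary)
            for lname, salary in conn.execute("select lname, salary from employee")}

# Delete all 'Research' employees with the same kind of nested condition.
cur = conn.execute("""
    delete from employee
    where dno in (select dnumber from department where dname = 'Research')
""")
deleted_count = cur.rowcount
remaining = [row[0] for row in conn.execute("select lname from employee")]

print(updated_count, salaries)
print(deleted_count, remaining)
```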

Views in SQL

View:

• A view is a single table that is derived from other tables; these other tables can be base tables or previously defined views.

• A view does not necessarily exist in physical form; it is considered a virtual table, in contrast to base tables, whose tuples are actually stored in the database.

Advantages and Disadvantages of View:

• The advantage is that a frequently used query involving join operations can be defined once as a view. Users then query the view as a single table and do not have to specify the join every time.

• The disadvantage is that the possible update operations applied to views are limited.


Specification of Views in SQL

Create a view on fname, lname, pname, hours

V1: create view works_on1
as select fname, lname, pname, hours
from employee, project, works_on
where ssn = essn and pno = pnumber;

works_on1: fname | lname | pname | hours

V2: create view dept_info (dept_name, no_of_emps, total_sal)
as select dname, count(*), sum(salary)
from department, employee
where dnumber = dno
group by dname;

dept_info: dept_name | no_of_emps | total_sal

Querying on View

QV1: To retrieve the last name and first name of all employees who work on 'ProductX'

select pname, fname, lname
from works_on1
where pname = 'ProductX';

• A view is always up to date: if we modify the tuples in the base tables on which the view is defined, the view automatically reflects these changes.

• The view is not realized at the time of view definition but rather at the time we specify a query on the view.

• It is the responsibility of the DBMS and not the user to make sure that the view is up to date.

• If the view is no longer useful, it can be disposed of with the drop command.

V1d: drop view works_on1;
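The claim that a view is always up to date can be checked with a short experiment. The sketch below assumes SQLite through Python's sqlite3 module, uses reduced schemas, and fills the tables with made-up rows. It defines works_on1, runs QV1, changes a base table, runs QV1 again, and finally drops the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced schemas and made-up rows (assumptions for this sketch).
    create table employee (fname text, lname text, ssn text);
    create table project (pname text, pnumber integer);
    create table works_on (essn text, pno integer, hours real);
    insert into employee values
        ('John', 'Smith', '123456789'), ('Joyce', 'English', '453453453');
    insert into project values ('ProductX', 1), ('ProductY', 2);
    insert into works_on values ('123456789', 1, 32.5), ('453453453', 2, 20.0);

    create view works_on1 as
        select fname, lname, pname, hours
        from employee, project, works_on
        where ssn = essn and pno = pnumber;
""")

qv1 = """
    select pname, fname, lname
    from works_on1
    where pname = 'ProductX'
    order by lname
"""
before = conn.execute(qv1).fetchall()

# Change a base table only; the view definition is untouched.
conn.execute("insert into works_on values ('453453453', 1, 20.0)")
after = conn.execute(qv1).fetchall()

# Dispose of the view; the base tables and their tuples remain.
conn.execute("drop view works_on1")
views_left = conn.execute(
    "select count(*) from sqlite_master where type = 'view'").fetchone()[0]
base_rows_left = conn.execute("select count(*) from works_on").fetchone()[0]

print(before)
print(after)
```

The second result contains the new assignment without any action on the view, because the view's defining query is evaluated when the view is queried.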


Updating in Views

Single Table View Update:

An update on a view defined on a single table can be mapped to an update on the underlying base table.

Multi Table View Update:

For a view involving joins, an update operation may be mapped to update operations on the underlying base relations in multiple ways.

Suppose a view update changes the pname attribute of 'John Smith' from 'ProductX' to 'ProductY'.

UV1: update works_on1
set pname = 'ProductY'
where lname = 'Smith' and fname = 'John' and pname = 'ProductX';

This query can be mapped into several updates on the base relations to give the desired update on the view.

• There are two possible updates, (a) and (b), on the base relations corresponding to UV1.

(a). update works_on
set pno = (select pnumber
           from project
           where pname = 'ProductY')
where essn = (select ssn
              from employee
              where lname = 'Smith' and fname = 'John')
and pno = (select pnumber
           from project
           where pname = 'ProductX');

(b). update project
set pname = 'ProductY'
where pname = 'ProductX';


Discussion

• Update (a) relates 'John Smith' to the 'ProductY' project tuple in place of the 'ProductX' tuple, and is the most likely desired update.

• The original update changes the project name pname in the works_on1 view. It is unlikely that the user wants to change the project name itself; the intended semantics is to change the project that 'John Smith' works on.

• So update (a) replaces the project number in John Smith's works_on tuple with the number of the project whose pname = 'ProductY'.

• Update (b) would also give the desired effect on the view, but it accomplishes this by changing the name of the 'ProductX' tuple in the project relation to 'ProductY'.

It is quite unlikely that the user who specified the view update UV1 wants the update to be interpreted as in update (b).

Observation

• A view with a single defining table is updatable if the view attributes contain the primary key or some other candidate key of the base relation, because this maps each (virtual) view tuple to a single base tuple.

• Views defined on multiple tables using joins are generally not updatable.

• Views defined using grouping and aggregate functions are not updatable.

Example:

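UV2: update dept_info
set total_sal = 100000
where dept_name = 'Research';

This update is not feasible: total_sal is computed as sum(salary) over many employee tuples, and many different changes to those tuples would produce the same total.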

• A view update is feasible when only one possible update on the base relations can accomplish the desired update effect on the view.

• Whenever an update on the view can be mapped to more than one update on the underlying base relations, we must have a certain procedure to choose the desired update.

• Some researchers have developed methods for choosing the most likely update.

• Other researchers prefer to have the user choose the desired update mapping during view definition.
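The ambiguity of UV1 can be made concrete with a small experiment. The following sketch assumes SQLite through Python's sqlite3 module and made-up rows in which two employees both work on 'ProductX'. SQLite treats views as read-only unless triggers are defined, so UV1 itself is rejected; the base-table updates (a) and (b) are then applied to separate copies of the database to compare their effects.

```python
import sqlite3

# Reduced schemas and made-up rows (assumptions for this sketch).
SETUP = """
    create table employee (fname text, lname text, ssn text primary key);
    create table project (pname text, pnumber integer primary key);
    create table works_on (essn text, pno integer, hours real);
    insert into employee values
        ('John', 'Smith', '123456789'), ('Joyce', 'English', '453453453');
    insert into project values ('ProductX', 1), ('ProductY', 2);
    insert into works_on values ('123456789', 1, 32.5), ('453453453', 1, 20.0);
    create view works_on1 as
        select fname, lname, pname, hours
        from employee, project, works_on
        where ssn = essn and pno = pnumber;
"""

def fresh_db():
    """Return a new in-memory database in the initial state."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn

def view_rows(conn):
    return conn.execute(
        "select fname, lname, pname from works_on1 order by fname, pname"
    ).fetchall()

# UV1 issued directly against the join view is rejected by SQLite.
db = fresh_db()
try:
    db.execute("""update works_on1 set pname = 'ProductY'
                  where lname = 'Smith' and fname = 'John'
                    and pname = 'ProductX'""")
    direct_update_accepted = True
except sqlite3.Error:
    direct_update_accepted = False

# Mapping (a): reassign John Smith to the other project.
db_a = fresh_db()
db_a.execute("""
    update works_on
    set pno = (select pnumber from project where pname = 'ProductY')
    where essn = (select ssn from employee
                  where lname = 'Smith' and fname = 'John')
      and pno = (select pnumber from project where pname = 'ProductX')
""")
rows_a = view_rows(db_a)

# Mapping (b): rename the project itself.
db_b = fresh_db()
db_b.execute("update project set pname = 'ProductY' where pname = 'ProductX'")
rows_b = view_rows(db_b)

print(rows_a)
print(rows_b)
```

Both mappings make John Smith's view tuple show 'ProductY', but mapping (b) also changes the tuple of every other employee on that project, which is the side effect described in the discussion above.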


Specifying Additional Constraints as Assertions

• To specify the constraint "The salary of an employee must not be greater than the salary of the manager of the department that the employee works for":

create assertion salary_constraint
check ( not exists ( select *
                     from employee e, employee m, department d
                     where e.salary > m.salary
                       and e.dno = d.dnumber
                       and d.mgrssn = m.ssn ) );

• If tuples in the database cause the condition of an assertion statement to evaluate to FALSE, the constraint is violated.
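Many systems do not implement create assertion, but the condition inside the check can still be evaluated as an ordinary query. The sketch below is an emulation under stated assumptions, not an assertion mechanism: it uses SQLite through Python's sqlite3 module, made-up rows, and a hypothetical helper named salary_constraint_holds that evaluates the same not exists condition on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced schemas and made-up rows (assumptions for this sketch).
    create table department (dname text, dnumber integer, mgrssn text);
    create table employee (lname text, ssn text, salary integer, dno integer);
    insert into department values ('Research', 5, '333445555');
    insert into employee values
        ('Wong',  '333445555', 40000, 5),   -- manager of department 5
        ('Smith', '123456789', 30000, 5);
""")

def salary_constraint_holds(conn):
    """Hypothetical helper: evaluate the assertion's condition as a query."""
    (holds,) = conn.execute("""
        select not exists (
            select *
            from employee e, employee m, department d
            where e.salary > m.salary
              and e.dno = d.dnumber
              and d.mgrssn = m.ssn)
    """).fetchone()
    return bool(holds)

holds_initially = salary_constraint_holds(conn)

# Give Smith a salary above the manager's; the condition becomes FALSE.
conn.execute("update employee set salary = 45000 where lname = 'Smith'")
holds_after_raise = salary_constraint_holds(conn)

print(holds_initially, holds_after_raise)
```

A DBMS that supports assertions would evaluate this condition itself and reject the offending update, whereas this sketch only detects the violation afterwards.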

Specifying Index in SQL

Specifying index on single attribute:

I1: create index lname_index
on employee (lname);

Specifying index on multiple attributes:

I2: create index names_index
on employee (lname asc, fname desc, minit);

Specifying index on the attribute with unique value:

I3: create unique index ssn_index
on employee (ssn);

Specifying cluster index:

I4: create index dno_index
on employee (dno)
cluster;


Cluster in ORACLE

create cluster deptandemp (deptemp varchar(9));

create table department
( dname varchar(19),
  dnumber varchar(9),
  ...
)
cluster deptandemp (dnumber);

create table employee
( name varchar(19),
  ...
  dno varchar(9)
)
cluster deptandemp (dno);

Discussion on Index

• The reason and motivation for an index is to support efficient search and maintenance.

Advantages:

Indices support binary search.

Indices support dynamic maintenance.

Disadvantages:

Indices cost extra storage space.

Algorithms to support indices are more complex.

• The keyword unique can be used to enforce the key constraint.

The reason behind linking the definition of a key constraint with specifying an index is that it is much more efficient to enforce uniqueness of key values on a file if an index is defined on the key attribute, since a search on the index is much more efficient.

• A clustering and unique index is similar to a primary index.

• A clustering and non-unique index is similar to a clustering index.

• A nonclustering index is similar to a secondary index.
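The link between a unique index and the key constraint can be seen in a short sketch. It assumes SQLite through Python's sqlite3 module, a reduced employee table, and made-up rows. The cluster keyword from I4 is not part of SQLite and is left out. The query-plan check at the end relies on the wording of SQLite's explain query plan output, which may differ in other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced schema and made-up row (assumptions for this sketch).
    create table employee (fname text, lname text, ssn text, dno integer);
    create index lname_index on employee (lname);
    create unique index ssn_index on employee (ssn);
    insert into employee values ('John', 'Smith', '123456789', 5);
""")

# The unique index rejects a second tuple with the same ssn value.
try:
    conn.execute("insert into employee values ('Jane', 'Doe', '123456789', 4)")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

employee_count = conn.execute("select count(*) from employee").fetchone()[0]

# Ask SQLite how it would evaluate a search on lname.
plan = conn.execute(
    "explain query plan select * from employee where lname = 'Smith'"
).fetchall()
uses_lname_index = any("lname_index" in row[-1] for row in plan)

print(duplicate_accepted, employee_count, uses_lname_index)
```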