Relational Databases and SQL
○ Basic Concepts
○ The 3rd normal form
○Structured Query Language (SQL)
Graça Abrantes
The process of database development
Reality (Universe of Discourse)
Conceptual model (e.g. Entity-Relationship model)
2
Logical model (e.g. Relational model)
Implementation
Logical model: relational DBMS
Relational DBMS are based on a set of theoretical concepts introduced on 1970 by E. F. Codd. p y
Advantages of relational DBMS:
– simplicityof concepts
– the concepts have formal definitions
• quick learning
• generally adopted by different software development companies
3
– they support adequately the representation of most characteristics of reality
• entities, objects, phenomena
• relationships among them
Relation
Relationis the basic concept of relational databases
A relation is defined byy o one schemaand o one table
A schema is defined by:
o the relation name o the name of each attribute o thedatatypeof eachattribute
4
o the datatype of each attribute o ...
A relational database is a set of relations
o thedatabase schema is the set of relation schema of every database table
Table – example:
Table and schema - examples
Relation schema:
soil (FID, Shape, AREA, PERIMETER, CODSOLO, NOME, SUBNOME, ESPECIFI)
5
SUBNOME, ESPECIFI)
Other examples of relation schema:
Camping (code, name, capacity, owner) River (FID, Shape, length, code, name, type)
Attributes
An attribute defines a property of an object, entity, phenomena
phenomena
An attribute Aitakes values in a set Di, called the attribute domain
• the domain specifies the set of values that the attribute can take
Given U={A A A } a relation R over U is a
6
Given U={A1, A2,...,An}, a relation R over U is a subsetof the cartesian product D1x D2x ... x Dn.
Every tuple of this cartesian product is called an instanceof relation R.
Table
The set of instances of relation R is a tablein which - the instancesare also called rowsor records
7
the instancesare also called rowsor records
o orgeographic objects in GIS
- the attributesare also called columnsor fields
Remarks
the values of one attribute belong all to the same domain;
one attribute value must be atomic;
one relation cannotcontain equal instances;
the orderby which the instances are in a table is meaningless;
instances may have some attributes without value; in this case the attribute is called optional; when, given an instance, an optional attribute does not have one value, we say that its value isnull;
8
value is null;
the names (or identifiers) of attributes belonging to the same relation schema must be unique inside this schema
Key(s) of a relation
A set of attributes that takes different values on every instance is a primary keyof the relation
• th l f i k i l ifi i t ithi
• the value of a primary key uniquely specifies an instance within a table
• a composite key is a key made up of two or more attributes
Within a relation, the primary key is the minimum subset of attributesby which every instance are uniquely specified
• natural attributes are sometimes good primary keys
• often an artificial attribute is assigned to an object in order to
9
• often, an artificial attribute is assigned to an object in order to identify it uniquely; this kind of keys have no intrinsic meaning but they are useful to uniquely identify every instance of the relation
• for instance, in a table of data about students at a school, they might all be assigned a student ID
• one table may have several candidate keys but it must have one and only one primary key
Primary keys and Foreign Keys
On relational DBMS, the relationships among entities are represented in tables using common attributes.
i k f
– Example: Primary key of
table soilDiss$
Primary key of table soil
Foreign key in table soil
10
A foreign key is an attribute (or a set of attributes) in a relation that matches the primary key of another relation.
Relational model: 1st normal form
The value of each attribute is atomic.
E l i t d f
Example: instead of
use the 1NF
name type border
Tejo major maritime, land Mondego major maritime
11
name type maritime border
land border
Tejo major Yes Yes
Mondego major Yes No
• Given a relation R, an attribute Aiin R is said to functionally determine another attribute Aj also
Relational model – Functional dependency
functionally determine another attribute Aj, also in R, (written Ai→ Aj) if, and only if, each Ai value is associated with precisely one Ajvalue.
Functional dependency
12
Relational model: 2nd normal form
A table is in 2NF if and only if it is in 1NF and no non-prime attribute is dependent on any proper p p y p p subset of any candidate key of the table.
• a non-prime attribute of a table is an attribute that is not a part of any candidate key of the table
• a table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is dependent on the whole of a candidate key
13
whole of a candidate key.
• when a 1NF table has no composite candidate keys, the table is automatically in 2NF
• the general case concerning GIS tables
A table is in 3NF if and only if it is in 2NF and no non-prime attribute is dependent on any other non-
Relational model: 3rdnormal form
prime attribute.
• For instance, table conc_1998 is not in 3NF
Functional dependency
14
Normalization
• A table is in 3NF if every non-key attribute provides a fact about the key, the whole key, and nothing but the key.
• Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining
relationships between them.
• Database normalization is the process of organizing the attributes and tables of a relational database to minimize
15
redundancy and dependency
• the objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships
Example: 2FN 3FN
Example: given the following table verifying 2NF
to replace it with 2 tables:
16
Example:
Suppose that you have to design a database schema concerning remarkable trees in order to be stored as g vector data in a GIS.
For each tree it is required to record its common name, scientific name, family, height and trunk diameter.
In the study area there is around one thousand remarkable trees and several species are represented
17
p p
by more than one tree.
What relation schema best fits these requirements?
Structured Query Language (SQL)
SQLis a standard programming language (ANSI - 1986) for managing data held in a relational 1986) for managing data held in a relational DBMS
• data insert
• query
• update and delete
18
• schema creation and modification
• data access control
The SELECTstatement
The most common operation in SQL is the query, which is performed with the SELECT statement.
SELECT retrieves data from one (or more) tables. Standard SELECT statements have no persistent effects on the database
• the SELECT statement can have persistent effects when an output table is also mentioned to store the SELECT statement result
The SELECT statement retrieves a subset of a given table
19
according to a given logical condition
• a logical condition is an expression that returns either TRUE or FALSE
• the SELECT statement retrieves the instances for which the given logical condition returns TRUE
SELECT By Attributes (ArcMap)
20
The WHERE clause
SELECT attribute1, attribute2, .... (or *) FROM table1, table 2, ...
WHERE condition;
this statement retrieves the instances that hold the condition
•elementary conditions are built using relational operators (<,
<=, >, >=, =, <>) and other operators (IN LIKE
21
operators (IN, LIKE, BETWEEN)
•conditions may also be built using elementary conditions and logical operators (NOT, AND, OR).
The WHERE clause (examples)
select * from soil where “CODE” >=800
select * from soil
where “CODE” >400 and“CODE” <=700
select * from soil
where not(“CODE” >400 and“CODE” <=700)
select * from soil
where “CODE” <=400 or“CODE” >700) (1) Remark: this statement is equivalent to the previous statement
select * from NUTSII where “NAME”like‘A%’
22
select from NUTSII where NAME like A%
select * from rivers
where "TYPE" =‘major' or
“description" in( 'land border' , ‘maritime border' )
select * from NUTSII where “name” =‘’
The WHERE clause in GIS
In GIS the condition of a SELECT statement may also use spatial operatorssuch as intersect, are within a distance of, contain, are within, touch the boundary of, ...
23
Joining tables
The FROM clause specifies the names of the tables containing the records to select.
When the FROM clause refers to more than one table, the SELECT statement executes the cartesian product (joining) of the referred tables
• each tuple of the cartesian product is composed of one row from each of these tables.
The WHERE clause of a SELECT statement that
24
The WHERE clause of a SELECT statement that joins tables is used to constrain the resulting subset of the cartesian product
• those tuples in which a foreign key value is equal to a primary key value.
SELECT *
FROM NUTS_1998, AddedValue Example (joining tables):
WHERE DTCC=code;
where code is the primary key of table AddedValue and DTCC is a foreign key of table NUTS_1998
25
Another example:
Suppose that you want to create a GIS vector theme concerning agricultural cultures in a given region.
For each land parcel it is required to record (i) the species common name For each land parcel, it is required to record (i) the species common name, (ii) the species scientific name, (iii) the average yield of each species in the region, (iv) the seeding or planting dates and (v) the parcel area.
The study area is a region with a high level of parcel disaggregation.
1. What GIS data structure best suits these requirements?
2. Design a suitable database schema.
3. Explain the main advantages of your schema.
4. Suppose that it is necessary to get information concerning the first
26
pp y g g
seeding or planting date of each species in the region. How to do it?