Often an enterprize or problem domain consists of data that can be organized on multiple attributes (multiple organizations).
This classroom has people with different roles {student,
professor}, associations {major, departments}, courses {sections and classes}
In a single program collections can be used to "manage" data objects (references to data objects).
For example, the CompSci department could use:
a TreeMap to hold professors data with their names "Barnes" as keys.
a HashMap to hold employment data using an employee id as a key
an ArrayList to hold professors that serve on a committee. The three collections hold references to the same data objects
Data w/ multiple organizations
All dynamic objects are "reference" variables – their "value" is a pointer / reference to their memory location.
Barnes 01010101 Computer Science renzo@csun.edu Noga 111110000 Computer Science noga@csun.edu Barkataki 90909090 Computer Science shan@csun.edu
Retrieval operations on integrated is easy. Access by key retrieval with the TreeMap or HashMap or search with the Arraylist accesses data object and "getter" methods can be invoked.
Retrievals do not affect the organizations.
What happens with insertion or deletions operations that update the organization?
All collections must be kept in sync. Insert / delete into one requires associated insert / delete into all.
The program must "know" and maintain the "semantics" of the
associations among the data records. The relationships exist and are defined by the collections.
There is no independence between data and management – between storage and application. Soln: Database Management Systems.
Databases
Database = collection of data organized for access, updated and managed.
Information is the interpretation of data… Database:
collection of tables (relations), files, or datasets
table is a collection of fields (columns of data items) attributes = table columns – domain of values tuples = table rows – attribute values of an entity
primary key = field(s) that uniquely selects table row foreign key = fields in one table that are primary keys
in another table.
no duplicates of tuples or attributes
order of rows and columns is not important
Relation && Relationships
RelationName (attribute0, attribute1, … attributen) key = attribute
profs ( pid, name, office)
section ( snum, profID, studentID, class) students( sid, name, major)
deptFacutly( dept, profID)
Relationships exist between tables w/ shared values
profs ( pid, name, office)
students( sid, name, major) section ( snum, profID, studentID, class)
teaches
attends
deptFacutly( dept, profID)
These notes cover SQL (mySQL examples) and sqlite3. The sqlite3 shell program can be used to test sqlite3 queries. We will use an
"elections" database of USA Presidential elections since 1960.
A Windows "dos command prompt" window (console session) for sqlite3 is on the next page
java application JDBC
SQLite DB
sqlite-jdbc connector Use posted SQLite – Java
Installation page notes to
download, configure, and test your sqlite3 – Java JDBC installation. SQLite (sqlite3) is a
zero-installation, serverless, SQL
compliant open source database. It is widely used in common
C: sqlite3 // run sqlite
SQLite version 3.8.4.1 2014-03-11 15:27:36 Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open elections.db // creates empty elections.db
sqlite> .mode columns // set output mode to columns
sqlite> .mode tabs // set input separator to tabs
sqlite> .read createTables.txt // enable tables from file
sqlite> .schema
CREATE TABLE President( // use blanks if not .mode tabs
Name text not null, Party text not null, Year int not null, primary key (Name, Year));
// ... shows VicePresident and Vote tables also ...
// the relations, tables have been defined, next load the data
sqlite> .import President.txt President
sqlite> .import VicePresident.txt VicePresident sqlite> .import Vote.txt Vote
sqlite> .tables
President VicePresident Vote sqlite3 session
// count: how many entries are there for President ? sqlite> select count(*) from President;
48
// select: who ran for President in 2000 ?
sqlite> select Name, Year, Party from President where Year = 2000;
Albert Gore 2000 Democratic Ralph Nader 2000 Green
Harry Browne 2000 Libertarian Patrick Bucannan 2000 Reform George W Bush 2000 Republican
SQL
SQL ("ess-que-el") ≡ Structured Query Language.
SQL is used to communicate with a relational database.
SQL statements are used to perform tasks :update or retrieve Relational database management systems that use SQL:
Oracle, Sybase, Microsoft SQL Server, Access, MySql, Ingres … Use SQL and have proprietary extensions
SQL commands:
DML – Data Manipulation Language: SELECT, INSERT, UPDATE, DELETE, and other commands to load, read, data.
DDL – Data Definition Language: CREATE TABLE,
ALTER TABLE, and others can change the database's design. DCL – Data Control Language: GRANT, REVOKE, set privilege.
The elections mySql (and sqlite elections.db) database has 3 tables President(Name, Year, Party )
VicePresident ( Name, Year, Party ) Vote ( Party, Year, Percent )
mysql> SHOW DATABASES; +---+ | Database | +---+ | information_schema | | elections | | mysql | +---+ 3 rows in set (0.00 sec)
mysql> USE elections; Database changed
sqlite and mySQL are case insensitive SQL commands shown uppercase sqlite commands shown lowercase
sqlite shell has its own commands (lowercase only)
".<cmd>"
mysql> SHOW TABLES; +---+ | Tables_in_elections | +---+ | President | | VicePresident | | Vote | +---+ 3 rows in set (0.00 sec)
mysql> DESCRIBE President;
+---+---+---+---+---+---+ | Field | Type | Null | Key | Default | Extra | +---+---+---+---+---+---+ | Name | varchar(30) | NO | PRI | | | | Party | varchar(20) | YES | | NULL | | | Year | int(4) | NO | PRI | 0 | | +---+---+---+---+---+---+ 3 rows in set (0.00 sec)
sqlite> .table
President VicePresident Vote sqlite> .schema President
CREATE TABLE President(
Name text not null, Party text not null, Year int not null, primary key (Name,
The schema of tables below is shown with tabs and line feeds, as I
created the elections.db from a text files (createTables.txt, President.txt, VicePresident.txt and Vote.txt). I had .mode tabs set so the data values are tab separated.
sqlite> .schema
CREATE TABLE President(
Name text not null, Party text not null, Year int not null, primary key (Name, Year)); CREATE TABLE VicePresident(
Name text not null, Party text not null, Year int not null, primary key (Name, Year)); CREATE TABLE Vote(
Party text not null, Percent int not null, Year int not null,
SELECT column1 [,"column2", ...] FROM tablename
[WHERE condition]; [] = optional
SELECT
mysql> SELECT * FROM President; +---+---+---+ | Name | Party | Year |
+---+---+---+ | John Kennedy | Democratic | 1960 |
| Richard Nixon | Republican | 1960 | | Lyndon Johnson | Democratic | 1964 |
...
| Barak Obama | Democratic | 2008 |
| John McCain | Republican | 2008 | | Ralph Nader | Independent | 2008 |
| Bob Barr | Libertarian | 2008 | +---+---+---+
Note when using SQL commands (not sqlite shell commands) there is no "." before the command and the command terminates with a ";"
sqlite> select * from President;
Name Party Year
--- --- ---Barack H. Obama Democratic 2012
Jill Stein Green 2012 Gary Johnson Libertarian 2012 Willard Mitt Romney Republican 2012 ....
Lyndon Johnson Democratic 1964 Barry Goldwater Republican 1964 John Kennedy Democratic 1960 Richard Nixon Republican 1960
sqlite> select Name, Year from President order by Year; Name Year --- ---John Kennedy 1960 Richard Nixon 1960 Lyndon Johnson 1964 Barry Goldwater 1964 ... Barack H. Obama 2012 Jill Stein 2012 Gary Johnson 2012
Willard Mitt Romney 2012
Various sort orders can be used depending on column's data type.
mysql> SELECT Name, Year, Party FROM President WHERE Year = '2000';
+---+---+---+ | Name | Year | Party | +---+---+---+ | Albert Gore | 2000 | Democratic | | Ralph Nader | 2000 | Green | | Harry Browne | 2000 | Libertarian | | Patrick Bucannan | 2000 | Reform | | George W Bush | 2000 | Republican | +---+---+---+ 5 rows in set (0.00 sec)
mysql> SELECT COUNT (Year) FROM President; +---+ | COUNT(Year) | +---+ | 50 | +---+ 1 row in set (0.00 sec)
sqlite> select Name, Year, Party from President ...> where Year = 2000;
Name Year Party
--- ---
---Albert Gore 2000 Democratic Ralph Nader 2000 Green
Harry Browne 2000 Libertarian Patrick Bucannan 2000 Reform
George W Bush 2000 Republican sqlite> select count (*) from President;
count (*)
---48
The elections database for the mySQL examples had data errors. The count for the sqlite election.db is "more" correct.
JOIN: Return rows when there is at least one match in both tables An INNER JOIN is the same as a JOIN.
OUTER JOIN: return rows even when there are no matches through the JOIN critieria on the second table
LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table. A type of "outer join".
RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table A type of "outer join"
FULL JOIN: Return rows when there is a match in one of the tables A type of "outer join"
CROSS JOIN: Cartesian product of the sets of rows from the joined tables. There is no "on condition".
SQLite has three joins: cross, inner (default join), and left outer
Cross joins can return large tables the product of table attributes
sqlite> select count (*) from President cross join Vote; count (*)
---2304
President has 48 rows, Vote has 48 rows (2304 = 48 * 48)
Does join, inner join, and left outer join give the same result for the above query with the election.db? Try it.
Query results can create "new tables" that are joins of existing tables. WHERE clause specifies the values to join tables on.
mysql> SELECT President.Name, VicePresident.Name
FROM President, VicePresident
WHERE President.Year = VicePresident.Year AND President.Party = VicePresident.Party; +---+---+
| Name | Name | +---+---+ | John Kennedy | Lyndon Johnson | | Richard Nixon | Henry Lodge | | Lyndon Johnson | Hubert Humphrey | | Barry Goldwater | William Miller |
| George Wallace | Curtis LeMay |
...
PresidentName and VicePresidentName might be better attribute labels, wrt to queries, or use an alias
sqlite> select President.Name as 'President', ...> VicePresident.Name as 'Vice President' ...> from President, VicePresident
...> where President.Year = VicePresident.Year ...> and President.Party = VicePresident.Party; President Vice President
--- ---Barack H. Obama Joseph R. Biden
Jill Stein Cheri Honkala Gary Johnson James P. Gray Willard Mitt Romney Paul Ryan
...
John Kennedy Lyndon Johnson Richard Nixon Henry Lodge
DISTINCT removes duplicate rows from results
mysql> SELECT DISTINCT President.Party FROM President JOIN Vote;
+---+ | Party | +---+ | Democratic | | Republican | | American | | Libertarian | | Independent | | Green | | Reform | +---+
7 rows in set (0.00 sec)
sqlite> select distinct President.Party ...> from President join Vote;
Party ---Democratic Green Libertarian Republican Independent Reform American
sqlite> select distinct Name, Vote.Year ...> from President, Vote
...> where Name = 'Ralph Nader'; Name Year --- ---Ralph Nader 2012 Ralph Nader 2008 Ralph Nader 2000 Ralph Nader 1996 Ralph Nader 1992 Ralph Nader 1988 Ralph Nader 1984 Ralph Nader 1980 Ralph Nader 1976 Ralph Nader 1972 Ralph Nader 1968 Ralph Nader 1964 Ralph Nader 1960
implicit join creates too many entries for 'Ralph Nader'
ON clause determines columns to search for matches between tables.
sqlite> select Name, President.Party, Vote.Percent, ...> Vote.Year
...> from President join Vote
...> on President.Party = Vote.Party ...> and President.Year = Vote.Year ...> where Name = 'Ralph Nader';
Name Party Percent Year
--- --- ---
---Ralph Nader Green 1 1996 Ralph Nader Green 3 2000 Ralph Nader Independent 0 2008
INSERT
Insert values into tables. When a new row is to be inserted the columns are not needed with values in column order.
INSERT INTO tableName (columni, ... columnn VALUES (valuei ... valuen);
sqlite> insert into President (Name, Party, Year) ...> values ('Renzo Ghia', 'Yes please', 1999); sqlite> insert into President
...> values ('Renzo Ghia', 'Halloween', 2010); sqlite> select * from President
...> where Name = 'Renzo Ghia'; Name Party Year
--- ---
---Renzo Ghia Yes please 1999 Renzo Ghia Halloween 2010
When inserting values into tables that are related to other tables (foreign keys) all tables must have appropriate values inserted.
UPDATE is used to alter (modify) tables
UPDATE TableName SET columni = valuei, ... WHERE ...; sqlite> update President
...> set Party = 'Halloween' ...> where Year = 1999;
sqlite> select * from President
...> where Name = 'Renzo Ghia'; Name Party Year
--- ---
---Renzo Ghia Halloween 1999 Renzo Ghia Halloween 2010
DELETE
If not WHERE clause is used with UPDATE then all rows in table are updated.
Delete a record from a table. Delete used w/o a WHERE clause deletes all rows from the table (no undo).
DELETE FROM tableName WHERE ...; mysql> DELETE FROM President
WHERE Name = 'Renzo Ghia';
Query OK, 2 rows affected (0.00 sec) mysql> SELECT * FROM President
WHERE Name = 'Renzo Ghia'; Empty set (0.00 sec)
Deleting related records from a table.
Use ALTER to change a table's design. ALTER TABLE tableName
CHANGE columnNameOld columnName New columnType columnOptions;
ALTER TABLE tableName
ADD newColumnName columnType columnOptions; ALTER TABLE tableName
ADD newColumnName columnType columnOptions;
other add options: primary key, foreign keys, index,
ALTER TABLE tableName DROP columnName; sqlite's alter can't add composite keys
To delete a table
CREATE
There are GUI based mySql administration tools to use for large databases: phpMyAdmin, MySqlWorkbench (updates
MySqlAdministrator)
CREATE DATABASE databaseName;
CREATE TABLE [IF NOT EXISTS] tableName (
columnNamei, columnTypei, columnOptionsi);
mysql> CREATE TABLE Vote ( Party VARCHAR(20), Percent int(2), Year int(4),
PRIMARY KEY (Party, Year) );
mysql> CREATE TABLE President ( Name VARCHAR(30), Party VARCHAR(20), Year int(4),
PRIMARY KEY (Name, Year) );
mysql> CREATE TABLE VicePresident ( Name VARCHAR(30), Party VARCHAR(20), Year int(4),
CREATE TABLE tableName
SELECT * FROM columnName WHERE condition;
Fill table with data from local file
mysql> LOAD DATA LOCAL INFILE 'Vote.txt' INTO TABLE Vote;
Query OK, 50 rows affected (0.00 sec)
Records: 50 Deleted: 0 Skipped: 0 Warnings: 0 mysql> LOAD DATA LOCAL INFILE 'President.txt'
INTO TABLE President; ...
mysql> LOAD DATA LOCAL INFILE 'VicePresident.txt' INTO TABLE VicePresident;
...
========================================================== sqlite see page 8 for creating tables and loading them
SQL Backup / restore
Create can be used to make a backup of tables
CREATE TABLE backupTableName SELECT * FROM TableName;
Restore a table from a backup
DELETE FROM TableName;
INSERT INTO TableName SELECT * FROM BackupTableName;
Make a backup copy of entire database – done outside mySql at command prompt.
$ mysldump – u loginName –p databaseName > backupFile Enter password: xxxx
Restore a database.
$ mysql –u loginName –p dbname < backupFile Enter password: xxxx
sqlite shell dump command can be used to create a text file of sqlite commands to create (re-create) the database.
sqlite> .output election.bak // output sent to fileName
sqlite> .dump
// selected, representative, lines from election.bak
PRAGMA foreign_keys=OFF; BEGIN TRANSACTION;
CREATE TABLE President(
Name text not null, Party text not null, Year int not null, primary key (Name, Year));
INSERT INTO "President" VALUES('Barack H. Obama','Democratic',2012); ...
INSERT INTO "President" VALUES('Richard Nixon','Republican',1960); CREATE TABLE VicePresident(
Name text not null, Party text not null, Year int not null, primary key (Name, Year));
INSERT INTO "VicePresident" VALUES('Joseph R. Biden','Democratic',2012); ...
COMMIT;
sqlite shell backup and restore
sqlite shell backup command can be used to create a "binary" file backup of named or current "main"
sqlite> .backup election.bak2 sqlite> .quit
// restart sqlite3
sqlite> .open election.bak2 sqlite> .tables
President VicePresident Vote sqlite> .schema
CREATE TABLE President(
Name text not null, Party text not null, Year int not null, primary key (Name, Year));
...
=====================================================================
Restore database from text file dump
sqlite> .read election.bak sqlite> .tables
// Code fragments from posted MakeDB.java example
import java.sql.*; public class MakeDB {
Connection connect = null; // connection to database
Statement stmt = null; // a SQL statement
ResultSet rs = null; // values returned from SQL query
String sql = null; // string for the sql statement
public int insertDataObjects(int n) {
// create values for id, age, name and gender here
try {
sql = String.format("insert into DataObject (ID, Name, Age, Gender) values (%d, '%s', %d, '%s')",
id, name, age, gender); stmt.executeUpdate(sql); } catch ( Exception e ) {
System.err.println( e.getClass().getName() + " insert into table " + i + " " + e.getMessage() );
System.exit(0); }
// ...
return i; }
public makeDB() { try { Class.forName("org.sqlite.JDBC"); connect = DriverManager.getConnection("jdbc:sqlite:dataObject.db"); // create statement stmt = connect.createStatement(); // create table
sql = "drop table if exists DataObject"; stmt.executeUpdate(sql);
sql = "create table if not exists DataObject " + "(ID int primary key not null," +
" Name varchar(10) not null," + " Age int not null, " +
" Gender varchar(10))"; stmt.executeUpdate(sql);
System.out.println("defined table DataObject"); }
catch ( Exception e ) {
System.err.println( e.getClass().getName() +
". attempt to create table: " + e.getMessage() ); System.exit(0); }
try { // Query table
rs = stmt.executeQuery( "select * from DataObject where Name = 'ginger';");
int id, age;
String name, gender; while ( rs.next() ) {
id = rs.getInt("ID");
name = rs.getString("Name"); age = rs.getInt("Age");
gender = rs.getString("Gender");
System.out.printf("%d \t %s \t %d \t %s \n", id, name, age, gender); } rs.close(); stmt.close(); } catch ( Exception e ) { System.err.println( e.getClass().getName() +
". attempt to query DataObject: " + e.getMessage() ); System.exit(0);
DBMS Normalization
Entity specification (aka relation) declares an entity’s attributes Normalization process “designs” an optimum set of entity
specifications for all types of modifications.
create many simple entity specifications from many associations minimizes update anomalies and data redundancy
single-valued dependencies: 1:1 primary key is the determinant A’s value determines B’s value (where B is all attributes) multi-valued dependencies: 1:N and M:N
A determines many B and B determines single A A’s value determines finite set of B values A determines many B and B determines many A
A’s value determines finite set of B values B’s value determines finite set of A values
Prof ( pid, office, name, dept)
pid name, office, dept office dept
Member( office, dept)
20 413 Barnes CS 21 414 Smith CS 25 404 Gumb CS 34 304 Wong EE 55 402 Katz EE 57 402 Schwartz EE 27 414 Schwartz CS 414 CS 413 CS 404 CS 304 EE 402 EE pid office name dept
If attributes of an entity are determined by the value of the determinant the semantics (associations that exist among attribute values) can be represented in the data – not in the application program (or database). functional non-key attributes of entity are determined by the
entire set of the primary key (not a subset).
partial entity non-key attributes is dependent on a subset of the primary key.
transitive exist within non-key attributes of an entity when a functional dependency exists among themselves. Boyce-Codd exists when a non-key attribute determines part of a
composite primary key set. Infrequent, not discussed.
Prof ( pid, office, name,)
pid name, office, office dept
Member( office, dept)
20 413 Barnes 21 414 Smith 25 404 Gumb 34 304 Wong 55 402 Katz 57 402 Schwartz 27 414 Schwartz 414 CS 413 CS 404 CS 304 EE 402 EE pid office name dept
Entities:
E1 ( A1, A2, A3, B, C ) E2 ( A1, A3, C ) E3 ( F, G, H, I ) E4 ( H, I ) un-normalized example A1 A3 A2 B C F G H I
Normalization seeks to make all single-valued dependencies functional dependencies.
First Normal Form An entity specification is in first normal form if all its attributes are atomic values (not structures, or
repeating groups). All "flat files" (common files, files with nonvariant records) are 1NF.
All the entities on the previous slide are 1NF.
Second Normal Form An entity specification is 2NF if it is 1NF and there are no partial dependencies.
A partial dependency requires a composite key.
There exists a partial dependency in E1, C is functionally dependent only on A1, A3 – A2 is not required to determine C.
E1.1 ( A1, A2, A3, B ) // remove C from E1.
Third Normal Form An entity specification is 3NF if it is 2NF and there are no transitive dependencies.
E3 is not 3NF H is not a key (in E3) and it determines I
E3.1 ( F, G, H ) // remove I from E3
3NF
Normalized example
Entities:
E1.1 ( A1, A2, A3, B ) E2 ( A1, A3, C ) E3.1 ( F, G, H ) E4 ( H, I ) F G H I A1 A3 A2 B C
20 cs132 456 20 cs132 457 20 cs182 458 30 cs182 458 20 cs182 459 30 cs182 459 38 cs242 461 42 cs242 461 38 cs242 463 42 cs242 463 42 cs440 540
A multivalued dependency (MVD) exists when key can determine more than one value (from a finite set of values) for another attribute. Texts ( bid, course, section)
bid course course bid
course section
course section
bid A course could have multiple
sections. Each section could require a different textbook. The same text
4NF
Fourth Normal Form An entity specification is 4NF if it is BCNF and there is at most 1 MVD within the entity.
violations of 4NF have high data redundancy with possible update anomalies
Texts entity has 1 : N between course and section
1 : N between course and bid (book id)
M : N between course and bid
ClassTexts (bid, course) ClassSections ( course, section ) bid →→ course course →→ section section → course 20 cs132 20 cs182 30 cs182 38 cs242 42 cs242 42 cs440 cs132 456 cs132 457 cs182 458 cs182 459 cs242 460 cs242 461 cs440 540 bid course section
Fifth Normal Form An entity specification is 5NF if it is 4NF and dependencies cannot be decomposed into simpler entity specifications w/o loss of information.
Domain Key Normal Form Is the goal of normalization.
All dependencies in the data are represented by keys determining non-keys.
Practically, SVDs are designed to remove violations of 3NF MVDs are designed to remove violations of 4NF.
Complex entity specifications (stuff all attributes into 1 relation or 1 file) are prone to be poorly designed because dependencies can't be expressed as a function of primary keys.
(BCNF and 5NF violations are infrequent and usually non-intuitive entity specifications.)
exercise
Determine the normal form for each entity specification and normalize the set of entitites. R1 ( B, E, A, C, H ) R2 ( S, U, R, F ) R3 ( N, O, W ) ( B, E) A, C, H C H (S, U) R, F U F N O, W W O
SQLite tutorial
http://www.tutorialspoint.com/sqlite/
SQLite org, command syntax diagrams
http://www.sqlite.org/lang.htm
MySQL on-line reference manual
http://dev.mysql.com/doc/refman/5.6/en/ On-line SQL guides, tutorials
http://www.sqlguides.com/sql_join.php http://www.sqlcourse.com
http://w3schools.com/sql/
MongoDB, an open-source no SQL document database