Agenda
Announcements
Chapter 17 material – Physical Database
Design
15 Minute Break
Annoucements
PMI meeting this evening at the IT Club
12/29
Discuss actions to date Organize future activities
Collect emails for further interest
Project Designs due 1/8 or ealier
Any questions?
Project Presentations start 1/5
Review: Logical DB Design
Logical DB Design is about converting ERDs or
Object Models into a logical record structure
The logical record structure is then converted to a
relational model
Logical Record Structure: a way of describing
records at the system level
We use relational analysis to convert logical record
structure to a set of relations
Relational analysis is a standard technique for
reducing models into non-redundant forms by
Review: DB Design 101
If we wanted to do so, we could store every item in a student table just like that below.
What is wrong with this design?
STUDENT_ID NAME DOB CITA_110_GRADE CITA_205_GRAD E
Tlombardi Thomas Lombardi Feb 12, 1970 75 Olombardi Ola Lombardi Jun 13, 1972 89
JSmith John Smith May 27, 1975
Review: DB Design 101
Database Normalization: a technique for
designing relational databases that aims to reduce data duplication, null fields, and data anomalies
If the same piece of data is stored in more than 1
place, it is possible for an update to put the data into an inconsistent state thus violating data
integrity.
Sufficiently normalized tables preserve data
integrity
Different types of databases require different
Review: Relations
Relation
: a table or list of values
Attribute in a relational model
: a column
of the table of list of values
Relational Model
: A set of tables that
describe the data in a system
Review: Relations
USER-ID USERNAME PASSWORD
1 Tlombardi Pwasd
2 Bob 23424jkd
GROUP-ID GROUPNAME
1 Admin
2 Member
GROUP-ID USER-ID
1 1
2 1
Review: Non-Normal Form
ORDER-NO ORDER-DATE ORDER-LINES
Ord1
Ord2
6 June 1998
3 May 1998
PART-NO QTY-ORDERED p1 10 p6 30
PART-NO QTY-ORDERED p5 10
p6 50 p2 30
Review: 1st Normal Form
ORDER-NO
ORDER-DATE PART-NO QTY-ORDERED
Ord1 6 June 1998 p1
10
Ord1 6 June 1998 p6 30
Ord2 3 May 1998 p5 10
Ord2 3 May 1998 p6 50
Ord2 3 May 1998Each attribute has a discrete value!p2 30
Review: Functional Dependencies
Functional dependency: where one value of
an attribute determines a single value of another attribute.
For example, in our previous case,
ORDER-DATE is functionally dependent on ORDER-NO.
There can be only one ORDER-DATE for each
ORDER-NO!
In the same way, there can be only 1
Review: Relation Keys
Relation Key
: a set of attributes whose
values identify a unique row in a relation
Can include more than 1 column Relations can have more than 1 relation key Must apply for all possible contents of the
relation
Key attribute
: an attribute is a key attribute
if it is at least part of one relation key
Non-key attribute
: an attribute that is not
Review: Normal-Form Relations
Normal-form Relations:
a set of relations
that describes the data in the system, but
where each data component is a simple value
We would like relation keys to determineunique values of non-key attributes
Data Redundancies
The design goal is to eliminate data
redundancies
Relations without such redundancy satisfy a
Review: Normal Forms
1NF: each column is a discrete value 2NF: All non-key attributes must be functionally
dependent on the whole of each relation key
3NF: There can be no functional dependencies
between non-key attributes
BCNF (Optimal Normal Form): set of relations
that have no single-value redundancy
4NF/5NF and beyond: address problems
arising from multivalued dependencies
Mutlivalued dependency: where one value of
Normalization and its Discontents
Is a higher degree of normalization better?
The higher the degree of normalization the more the
database is write optimized
The lower the degree of normalization the more the
database is read optimized
Different types of databases require different levels A reporting system that is too normalized will
require more joins and impact performance adversely
A transaction system that is denormalized will result
Chapter 17: Physical DB Design
Databases are implemented using databasemanagement systems (DBMS)
Each DBMS has its own properties and
proprietary technologies
The benefit of moving from logical DB design to
physical DB design
Ensures that design is free of proprietary solutions
Materialized views are unique to Oracle
Chapter 17: Physical DB Design
Major questions of physical DB design: how will
records be stored? how will records be accessed?
The default option is the table scan: row by row
search for data in the order of storage
In general table scans are horrible for performance!!!
Direct access: Retrieval of records based on
indexed fields (keywords)
Index: uses one or more fields as a key to find
records quickly
Chapter 17: Database Types
Relational Databases Oracle
SQL Server
MySQL
Hierarchical Databases
IBM Information Management System (IMS)
Network Databases
CODASYL Standard (Consortium on Data Systems
Languages)
Object Databases
Chapter 17: Convert LRS
Converting the Logical Record Structure to a
data model supported by the DBMS
RDBMS – simply convert LRS into relations
that will become tables in database
Network DB
Logical records become record types Logical record links become set types
Hierarchical DB
Only one parent
Chapter 17: Access
Requirements
How will the data be accessed? Example: the login screen
Every login screen accepts a username / password
You can be sure that the database table behind
this application has an index on the username field
SELECT * FROM USER WHERE
USERNAME=‘TOM’;
The USERNAME field should be indexed so that
Chapter 17: Access
The record types accessed by each request
Which tables? How often?
The sequence in which records are accessed
ASC, DESC, ordered on multiple fields
The access keys
What fields need to be indexed?
The items retrieved from each record
What fields need to be pulled back?
The number of records accessed
Chapter 17: DB Structures
Indexes are only part of the story of physical
design
Tablespaces are containers for tables that affect
how data can be accessed
Defines the storage characteristics for data including
Maps to logical containers to physical files and devices
Allocates new storage when needed in extents
Tablespaces can be spread across several physical
devices
Indexes and Tables can be stored on separate
Chapter 17: DB Structures
Block Sizes – the chunks of data allocated to a
table
Block Size should be multiple of Disk Block Size Block Size should be tuned to record length
Buffers, Buffer Pools, Cache, Cache-hit ratio Stored Procedures, Triggers, Distributed
Transactions
DB Configuration and DB Design
In order to meet your performance requirements,
you may need to tweak design purity
Chapter 17: Design Trade-Offs
Performance vs. Storage – Indexes requirespace. You can always lean on storage to improve performance
Performance vs. Structure – Redundancy in
data can improve performance in certain areas.
Read vs. Write – Indexes improve retrieval
metrics, but degrade the performance of
insert/update because the index must be rebuilt
Performance vs. Administration – Indexes
require maintenance
Rebuild indexes periodically
Chapter 17: Database Design
We have talked about the differences
between
OLTP Databases Data warehouses
Let’s review these differences in light of
Break
15 Minute Break
When we get back we’ll discuss Program
Chapter 16: Program Design
We have captured business logic in a number
of different ways:
DFD Sequence Diagrams Transition Diagrams Structured English Decision Tables
How do we turn these analysis and design
Chapter 16: Program Design
Identify automation boundaries in a DFD
This highlights which part of a process should
be automated
Deciding which tasks should be automated vs.
manual can influence the feasibility of a project
Identify component subsystem boundaries
What are the major components?
Can we work on these concurrently?
How will we test these? Can they be tested
Chapter 16: Program Design
After subsystems are chosen, processspecifications are refined for each DFD process
DFD processes are converted into program
modules
Program modules are arranged on a structure
chart
Structure chart displays program modules and
their interconnection
Chapter 16: Program Design
Program Design Metrics
How do we know if our modules, packages,
programs make sense in terms of design?
We use coding metrics to help us
Static Analysis Tools PMD
CheckStyle
Dynamic Analysis Tools Findbugs
Chapter 16: Program Design
Coupling
describes the nature, direction,
and quantity of parameters passed between
modules
loose coupling
Cohesion
describes how system functions
are coded into modules
One function, one module = high strength
Loose coupling and high strength =
Chapter 16: Coupling
Content coupling: when a module makes direct
references to the contents of another module
These should be passed as parameters
Imagine if an object had all public variables!!! This is horrible!!!
Common-Environment Coupling: when
modules refer to the same data structure in a common environment
Shared files
Chapter 16: Coupling
Control coupling
: when a module passes a
control element to another module
Control flags, function codes, switches
Changes made to called function require
changes to calling function
Data coupling
: when a module passes data
through a well-defined interface to another
module
Most desirable form of coupling
Chapter 16: Cohesion
Coincidental Strength: there is no meaningful
relationship between the parts in a module
Logical Strength: all of the elements in a module
perform similar tasks: edit, read, write
Temporal Strength: all functions are grouped
according to when they are used: initialization, etc.
Procedural Strength
Communication Strength: all functions that
communicate with each other are in the same module
Chapter 16: Code Complexity
McCabe’s Cyclomatic Complexity Number
measures the number branches in code to estimate the complexity of code.
The higher the number the more complicated the code Coupling/Cohesion metrics can help plan for testing
Modules that have many other modules that depend on
them should be more thoroughly tested
Modules that have many other modules that depend on
them may need to be restructured
Quality metrics
Software Architecture Notes
Transactions across multiple systems
Distributed Transactions
Oracle
SQL Server
Message Queue LDAP system
Transaction coordinator uses 2 Phase Commit
Protocol to ensure that all members in the transaction agree
Software Architecture Notes
Sacrificing design purity for performance The SQL Server 2000 driver debacle
SQL Server 2000 Driver for Java had a bug in its code
that made statement caching not work
The driver rebuilt a query from scratch each time it was
called
Executed 100,000 times
Worked perfectly with other platforms Oracle, SQL Server 2005
For performance, added if it’s SQL Server 2000 logic
then do this other crazy thing!!!
Software Architecture Notes
Sacrificing design purity for ease of maintainence
Validation engine required frequent changes due to new and changing requirements
Rather than change interfaces again and again we invented the bag interface
Rules for the bag: you can put anything you want into it
Every validator accepted a bag as a parameter with its configuration items inside