University of Warwick institutional repository: http://go.warwick.ac.uk/wrap
A Thesis Submitted for the Degree of PhD at the University of Warwick
http://go.warwick.ac.uk/wrap/60752
This thesis is made available online and is protected by original copyright. Please scroll down to view the document itself.
[Title page: hand-lettered in the original; the title and the author's name are not legible in this copy.]
A dissertation submitted for the degree of Doctor of Philosophy
Department of Computer Science
Dedicated to my mother and the memory of my father
TABLE OF CONTENTS
Abstract
Prologue
- The author's first contact with a database system
- Mathematical notation
- Acknowledgements
Chapter I: A critical survey of present database systems: What can be learned from them?
A - The problem
B - Record based databases
C - Problems of the record based databases
D - Assessment of the relational approach to databases
E - New trends in Computer Science
F - Motivation for Varqa
Chapter II: An overview of the language
A - Objects in the real world and objects in a database
B - Databases as algebras
C - On variables
D - Variable binding operators
E - The extra object I
F - Queries
G - Comparison with other work
Chapter III: Formal specification of Varqa
A - Symbols and symbolic expressions
B - Semantics of the language
Chapter IV: Inductively defined sets
A - Introduction
B - Related work
C - The WHERE notation
D - Inductive definitions
E - Evaluation in the presence of O
Chapter V: Incomplete information
A - Many-valued logic, a survey
B - Null values in databases
C - The principle of growing certainty
D - Extending Varqa to handle null values
E - Some remarks
Chapter VI: Varqa viewed in terms of relational algebra
A - A simple functional query language
B - Relational algebra, an alternative approach
C - Translation of SFQL to RAQL
D - Some remarks
Epilogue
Bibliography
Abstract
We propose a functional query language for databases where both syntax and semantics are based on conventional mathematics. We argue that database theory should not be separated from other fields of Computer Science, and that database languages should have the same properties as those of other non-procedural languages.
The data are represented in our database as a collection of sets, and the relationships between the data are represented by functions mapping these sets to each other. A database is therefore a many-sorted algebra; i.e. a collection of indexed sets and indexed operations. As in abstract data type specification, we specify the consequences of applying operations to the data without reference to any particular internal structure of the data.
A query is simply an expression which is built up from the symbols in the signature of the algebra and which complies with the formation rules given by the language. The meaning of a query is the value which is assigned to it by the algebra.
There are several ways of extending our language. Two ways are studied here. The first extension is to allow queries in which sets are defined inductively (i.e. recursively). This mechanism is essential for queries dealing with transitive closures over interrelated objects. Secondly, since incomplete information is common to many databases, we extend our language to handle partially available data. One main principle guides our extensions: 'whenever information is added to an incomplete database, subsequent answers to queries must not be less informative than previously'.
Finally, we show the correspondence between Varqa and methods used in current database software. A subset of Varqa, including all features whose implementation is not obvious, is mapped to relational algebra, thus showing that our language, though it has been designed with no reference to internal structure, is not incompatible with present database software.
Prologue
- The author's first contact with a database system
About six years ago (1976) the author joined a large team whose task was to computerise a large stock control and storage operation system. This system held, among others, information on over twenty thousand line items, flow of items from supplying sources (through the receiving, storage and shipment departments) to the consumer, stock level, and the number of items on order, delays, denials and other matters of that type.
One of the first tasks of our team was to conduct a survey on the existing (manual) information handling techniques used in warehouses. Here is a summary of the accounts noted in one typical warehouse: The warehouse supervisor (the person in charge of the warehouse) had spent over twenty years in exactly that warehouse. He knew the place of almost every single item by heart. (There were over 5,000 items in his warehouse.) Although there was a card system as the location reference, the supervisor used it very rarely. There were relatively few errors and the warehouse staff were capable of correcting them. For example, misplaced items were usually detected quickly and were moved to their correct locations. However, the tedious part of the operation was for the staff to keep track of the transactions and input/output of the warehouse. There were masses of paper to be sorted and filed every day. A transaction form had to be filed for every movement of an item. There were more staff busy with sorting and filing than were assigned to all other tasks in the warehouse.
In later investigations it was discovered that further "paperwork, sorting and filing" took place in two other offices (viz. on the other copies of the transaction forms). This seemed extremely wasteful.
Anyway, despite considerable resistance from warehouse staff, the operation was eventually computerised. The advantages of this computerised system over the old manual one were numerous. The more striking ones were a remarkable speeding up of the whole operation, and relief of the warehouse staff from repetitive and altogether avoidable paperwork: a central control system supervised the entire operation.
After a short while, however, occurrences of several errors revealed that we had been naive in thinking that the new system was perfect. The unhappy fact was that these errors manifested themselves in a diverse fashion. Some were human errors (e.g. wrong entries to the system, like punching mistakes) and some were completely [...]
Later, we all (both warehouse staff and us) grew to accept the problems of the new system as a fact of life. Whenever anybody asked the project manager about the errors he replied: "Well.... What else do you expect? You know what machines are like."
The project manager was not totally wrong and, anyway, at that time it was beyond our means to do much better. Years later, however, the situation has seen very little improvement. Most experienced computer users are not surprised if they find errors for which no reason exists. Indeed, many users, like our former project manager, do not distinguish between the 'software faults' and the machine's faults. Perhaps a totally different approach can solve (at least part of) the problem.
Mathematical Notation
We discuss here the set-theoretical, logical and metalinguistic notation which will be applied in this thesis. Part of the set-theoretical concepts derive from ideas presented in [HMT-71].
In set theory, the fundamental relationship between objects is membership; we use x∈A to express that object x belongs to the set A. x∉A expresses the opposite. A ⊆ B and A ⊂ B stand for "A is a subset of B" and "A is a proper subset of B" respectively. The union of two sets A and B is written as A ∪ B, and their intersection is written as A ∩ B.
Given a set A and a formula P(x) we use { x | x∈A and P(x) }_x and { x∈A | P(x) }_x to denote the set of all objects from A which satisfy the predicate P(x), i.e. those elements of A for which the predicate P(x) holds. Thus, y ∈ { x∈A | P(x) }_x iff y is an element of A and P(y) is true. Whenever unambiguous we may omit the trailing subscript x.
For any two sets A and B, the cartesian product of A and B, denoted by A × B, is the set of all ordered pairs (a,b) such that a∈A and b∈B. A relation between A and B is any subset of the cartesian product of A and B. (We will see a more general definition of relations in the later chapters.) A function f from A to B, written as f : A → B, is a relation between A and B where no two pairs have equal first and unequal second members. That is, a relation R between A and B is a function iff (a,b)∈R and (a,b')∈R implies b=b'.
The set of all first members of the elements of a function f is called the domain of f and is written as Dom(f). Thus, Dom(f) = { a | ∃b: (a,b)∈f }. Similarly, the range of function f is Rng(f) = { b | ∃a: (a,b)∈f }.
Given a function f : A → B and A' ⊆ A, the restriction of f to A' is the set of all pairs from f whose first member belongs to A'.
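The definitions above can be mirrored almost literally in code. The following is a minimal sketch in Python, not part of the thesis: the helper names (is_function, dom, rng, restrict) and the sample sets are hypothetical, chosen only to illustrate functions as sets of ordered pairs.

```python
# A relation is a set of ordered pairs; a function is a relation in which
# no two pairs have equal first and unequal second members.
# All names below are hypothetical illustrations, not notation from the thesis.

def is_function(relation):
    """True iff (a, b) and (a, b') in the relation imply b == b'."""
    seen = {}
    for a, b in relation:
        if a in seen and seen[a] != b:
            return False
        seen[a] = b
    return True

def dom(f):
    """Dom(f): the set of all first members of the pairs of f."""
    return {a for a, _ in f}

def rng(f):
    """Rng(f): the set of all second members of the pairs of f."""
    return {b for _, b in f}

def restrict(f, subset):
    """Restriction of f to subset: the pairs whose first member lies in it."""
    return {(a, b) for a, b in f if a in subset}

A = {1, 2, 3}
B = {"x", "y"}
cartesian = {(a, b) for a in A for b in B}   # A x B
f = {(1, "x"), (2, "y"), (3, "x")}           # a function from A to B
r = {(1, "x"), (1, "y")}                     # a relation that is not a function
```

Representing a function by its set of pairs keeps Dom, Rng and restriction as ordinary set operations, which is the reading the notation section relies on.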
In this thesis each non-negative integer is regarded as the set of all preceding natural numbers (they are 0, 1, 2, 3, ...): number zero is the empty set ∅, one is the set {0}, two is {1,0} and so on. Thus, the number n is the set {0, 1, 2, ..., n-1}. We will see that this notation is remarkably helpful in later formalisms. Given a natural number n, i∈n will be frequently used to indicate that i is one of the natural numbers from the set {0, 1, 2, ..., n-1}.
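As a small illustration of this convention, the sketch below builds each number as the set of its predecessors. It is an assumption-laden example in Python, not taken from the thesis; the function name numeral is hypothetical, and frozenset is used only because Python sets must contain hashable members.

```python
def numeral(n):
    """Return n as the set {0, 1, ..., n-1}, each member built the same way."""
    result = frozenset()              # zero is the empty set
    for _ in range(n):
        result = result | {result}    # successor: n + 1 = n U {n}
    return result

zero = numeral(0)
one = numeral(1)
two = numeral(2)
three = numeral(3)
```

With this encoding the statement i∈n is literally set membership, and the number of members of n is n itself. In everyday Python the test `i in range(n)` plays the same role.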
In this thesis, a sequence is a function whose domain is the set corresponding to a natural number. For example, a sequence s of length n (i.e. a sequence whose domain is n) has an ith element for every i∈n, and is represented as (s_0, s_1, ..., s_{n-1}). |s| is the length of the sequence s.
For any set A, P(A) is the powerset of A which is the set of all subsets of A, i.e. D ∈ P(A) iff D ⊆ A.
For any two formulae P and Q we use P & Q, P ∨ Q, P → Q and ¬P to denote conjunction of P and Q, disjunction of P and Q, P implies Q, and the negation of P respectively. ∀ and ∃ stand for the universal and the existential quantifiers respectively. We sometimes use F and T in place of the truth values false and true. Greek letters will be used as metalinguistic variables.
The structure of this thesis
The chapters in this thesis are designated by Roman numerals I, II, ... Reference to a whole chapter is therefore given as, e.g. chapter IV. Major divisions of each chapter are called 'sections'. We identify the sections by Roman capital letters A, B, C, ...
The outline of the thesis is as follows.
Chapter I contains a brief survey of the state of the field of database systems. An analysis of the existing problems which are relevant to query languages is given. The primary aim of this chapter is to give motivation for the design of a new query language and to state its perspectives.
An informal introduction to the major concepts and constructs of the proposed language is given in chapter II. One of the fundamental concepts is that of algebra: a brief introduction to algebra and the algebraic approach is therefore included in this chapter.
In chapter III we formally define what a database is and give precise semantics of a database and of our language. Several queries against two different databases are formulated and their evaluation is discussed in the final sections of chapters II and III.
In chapter IV we extend our language to cater for certain types of queries which cannot be expressed using the tools provided in [...]; we then allow queries which define sets inductively. This mechanism increases the power of our language considerably.
Chapter V is focused on a problem which is common to most database systems, namely coping with incomplete information. After studying various aspects relating to nonavailability of data we enrich the language to handle such cases. Obviously when the information in the database is not complete, the answer to any query is only an approximation to the true result. The aim is to get the most precise approximation.
In chapter VI a translation of a subset of Varqa into relational algebra is given. The intention is not to suggest a technique for implementation, but to give a correspondence between our language and methods used in current database software.
Finally, the epilogue gives an outlook on areas where further work can be done. The thesis closes with the list of references.
Acknowledgements
In the preparation of this thesis I have enjoyed constant advice and much encouragement from my supervisor Bill Wadge. This is the place to say a special "thank you" to him.
I would like to thank Tom Maibaum for his valuable comments and stimulating suggestions.
Being a member of the Semantics group has provided the ideal environment in which to write this thesis. Thanks are due to the whole group: Tony Faustini, Steve Matthews, Paul Pilgram and Ali Yaghi.
I would also like to thank all other members of the Computer Science Department for having been of assistance in various ways.
Mrs. Shirley Sarnon did the beautiful calligraphies.
To end the prologue, here are some words about the name "Varqa". It is a Farsi word (with Arabic origin) and means "(male) nightingale". It was the name of two Baha'is who were martyred in nineteenth century Iran for their religious belief. The name Varqa has been chosen to record the fact that the persecution and execution of the Baha'is is continuing in Iran today. It is pronounced "varko" in the same rhythm as [...]
Chapter I
A CRITICAL STUDY OF PRESENT DATABASE SYSTEMS
What can be learned from them?
In this thesis we propose a functional query language based on the notation of conventional mathematics. In the specification of this language, we side with the user and direct our attempts towards the design of a simple language which is also practical from the implementors' point of view.
This chapter contains a brief survey of some aspects of database systems which are relevant to our proposal. The primary purpose of this chapter is to analyse the existing problems, and to justify our attempt at designing a new language.
A. The Problem
In many circumstances concerning the management of data, one is confronted with large amounts of interrelated information. How can it be handled systematically?
This basic question spurred research into the field of databases. The usual problem of representation and formulation of a part of the real world by the computer is of central importance. Only the amount of stored information distinguishes databases from ordinary computerized systems.
There are several descriptions of database systems. A widely accepted one is: "A database is a symbolic representation of knowledge about part of the real world" [We-76]. Or, as in [Da-7?]: "a database is a collection of stored operational data used by the application system of some particular enterprise". We are not going to point out the subtle differences between these views (and other views) because, at this level of detail, it is not important.
A more detailed investigation of databases reveals a number of critical points. The more important issues in designing a database are:
how can data be organised?
what is the best way of storing data?
how can the stored data be accessed?
is the database a true representation of the real world?
Further questions concern security, maintenance, redundancy and several others. Each of the above principal issues is rather complex. In the case of accessing the data, for example, the problems range from search strategies within the database files to formulation of various queries. Storage of the data is concerned with the choice of appropriate computer devices, storage architectures and the filing system organisation.
In this study, we do not aim at giving solutions to all of these issues. We intend, however, to study and to suggest solutions to some of the problems related to data access by the user.
To explain our goal, we look briefly at the developments in computer languages and, in particular, database query languages.
In the early days of computing, the programmer could only be either a computer designer or engineer, or someone with similar knowledge of the system. Programming required specialised knowledge of every detail of the system.
Later, with the invention of more suitable hardware and specialised languages, the user only needed to learn the principles of a programming language. However, these principles were, more or less, those of the system itself. In other words, the language designer started from the system designers' view and simplified it for the user. Therefore, the languages were machine oriented.
Years after, this method is still in use.
Considering the recent improvements in both hardware and software, we suggest that, in the design of languages, the highest importance should now be given to the users' requirements and not to the hardware and software requirements. This may result in a considerable amount of extra work for the system engineers, but that is what they have been trained for.
Our suggestion is equally important in query languages.
In the design of our query language, we will side with the user, and further, will keep the user's view clearly separated from the engineer's view of the system.
[Figure: the construction of a database from facts about the real world, such as "Mary is married".]
B. Record Based Databases
Most database systems fall into three groups: hierarchical, network and relational. Approaches which cannot be categorized as above will be individually studied in the next chapter.
The fundamental concept behind all these models is that of a record. (A record is basically a finite collection of labeled data.)
The hierarchical model of organising data is based on the notion of a tree: a hierarchical database model is a tree with records as its nodes.
In the network models, the organisation of data is not restricted to trees: a network model is a directed graph which has records as its nodes.
[Figure: example structure with record nodes A, B, C and D.]
In the relational model the records of the same type are grouped together and pointers are removed.
The database in all these models is, therefore, a vast collection of records, and database management is the art of handling these records. The unhappy fact is that, traditionally, these structures are made visible to the user. With the first two models, the user has to navigate a path through the jungle of records.
In a company database, for example, a simple query such as "name of the department in which Smith works" in Codasyl [Ol-78] notation is:
EMPLOYEE.NAME = 'SMITH' .
FIND EMPLOYEE RECORD BY CALC-KEY .
FIND OWNER OF CURRENT ITEM RECORD SET . GET DEPT .
PRINT DEPT.DNAME .
In the relational model the paths are eliminated, but the user still has to think in terms of records. As an example, we formulate the query "all employees who earn more than their managers" in Codd's sublanguage ALPHA [Co-71]:
RANGE R1 X
RANGE R2 Y
GET W (R1.EMPLOYEE): ∃X ∃Y ((R1.DEPT = Y.DEPT) & (Y.MANAGER = X.EMPLOYEE) & (R1.SALARY > X.SALARY))
where R1 is a relation on EMPLOYEE, SALARY and DEPT, and R2 is a relation on DEPT and MANAGER.
Note that the variables X and Y range over records.
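To make the record-oriented reading of this query concrete, here is a minimal sketch in Python. It is not ALPHA and not part of the thesis: the sample rows and the function name earn_more_than_manager are invented for illustration, and only the attribute names (EMPLOYEE, SALARY, DEPT, MANAGER) come from the text. The variables e, x and y range over whole records, just as X and Y do in the ALPHA formulation.

```python
# Hypothetical sample data; R1 is on EMPLOYEE, SALARY, DEPT and R2 on DEPT, MANAGER.
R1 = [
    {"EMPLOYEE": "Smith", "SALARY": 300, "DEPT": "Sales"},
    {"EMPLOYEE": "Jones", "SALARY": 250, "DEPT": "Sales"},
    {"EMPLOYEE": "Brown", "SALARY": 400, "DEPT": "Stores"},
    {"EMPLOYEE": "Green", "SALARY": 450, "DEPT": "Stores"},
]
R2 = [
    {"DEPT": "Sales", "MANAGER": "Jones"},
    {"DEPT": "Stores", "MANAGER": "Green"},
]

def earn_more_than_manager(r1, r2):
    """Employees e for which some records x in r1 and y in r2 satisfy:
    e and y share a department, x is the manager named in y,
    and e earns more than x."""
    return {
        e["EMPLOYEE"]
        for e in r1
        if any(
            e["DEPT"] == y["DEPT"]
            and y["MANAGER"] == x["EMPLOYEE"]
            and e["SALARY"] > x["SALARY"]
            for x in r1
            for y in r2
        )
    }

result = earn_more_than_manager(R1, R2)
```

Even in this compact form the query is phrased in terms of rows and their fields, which is the point the surrounding text makes about record-based languages.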
Even recent proposals, such as the entity-relationship model [Ch-76], Aggregate model [SS-77] and SDM [HM-79], merely build additional structures over records.
C. Problems of the Record Based Databases
Many authors now argue that records are unnecessarily complex and still inadequate [Ke-78, Se-79], and that the database user should view the data in a less technical way.
We note that with the present technology, records are most suited for internal representation and processing. However, we argue that the user should not be aware of the internal structure of the database.
We stated before that the user's view and the engineer's view need not necessarily be the same. Once this distinction has been made, we can then look for a more appropriate model of the real world for the user.
"The more we are motivated to produce a faithful model of real information, the more we will have difficulty with record based constructs." [Ke-79]
One of the main reasons why database query languages merely reflect the internal model lies in the tendency of some database researchers to work in isolation and ignorance of new developments in other fields of Computer Science. We argue that database theory should not be separated from the other fields of computer science. In particular, database languages should have the same properties as those of other nonprocedural languages, i.e. values should be specified in an abstract mathematical way, with no reference to any particular implementation method.
Suppose that Fred wants to ask Joe to cut a metal disc of a certain diameter. There are several ways in which Fred can specify what is needed.
In the tedious way, Fred may (somehow) draw a circle (or something approximating to a circle) and give it to Joe, so he can use it as a pattern. Here, Fred anticipates Joe's work in a token way.
This approach resembles formulating queries in COBOL style query languages. In such languages, the user finds his answer by directing a pointer through the collection of records.
Alternatively, Fred may specify the circle by its mathematical notion. Correspondingly, the user of nonprocedural languages would give an abstract specification of the objective.
We note, however, that the relational calculus query languages, although nonprocedural, reveal some of the operational aspects to the user and involve him (at least partly) in technicalities. In our analogy, it is as if Fred had to say that he wants a piece of metal in the shape drawn by a pair of compasses! We would criticize that Fred is burdened with details of how the work shall be done.
It may be the case that a pair of compasses is the most appropriate tool, but why should Fred be concerned with that?
Records in the relational calculus languages resemble the pair of compasses. Records may be good operational tools, but the user should not be aware of them. (The next section of this chapter focuses on other aspects of the relational approach to databases.)
In the subsequent chapters we will demonstrate that the algebraic approach [Zi-75, Gu-77, GTW-78] provides what we are aiming for: independence from any concrete representation and from any technique for implementation.
D. Assessment of the Relational Approach to Databases
The relational theory of databases is based on relational algebra.
Informally speaking, an algebra of relations is basically a collection of relations together with some operations which can operate on these relations.
A relation on a sequence of domains is any subset of the cartesian product over that sequence. For example, if the domains are D_0, D_1, ... and D_n then any subset of D_0 × D_1 × ... × D_n is a relation over D_0, D_1, ... and D_n. (There is an alternative method of definition for relations and it will be studied later.) Members of relations are called 'tuples'.
As relations are sets, the set theoretical operators, such as union and intersection, can be applied to them. Other operations are:
concatenation; (wrongly called "cartesian product" in the literature)
Given two relations R and S, their concatenation R × S is { r^s | r∈R and s∈S } where r^s is the concatenation of r and s. (Concatenation of two finite sequences r and s is a sequence t, such that: i) |t| = |r| + |s|, ii) the restriction of t to |r| is r, and ∀j∈|s|: t_{|r|+j} = s_j.)
projection;
If R is a relation over a sequence of domains D, then the projection of R over i, where i is a sequence of natural numbers less than |D|, is π_i(R) = { (r_{i_0}, r_{i_1}, ...) | r∈R }
selection;
Given a relation R over a sequence of domains D, and a formula F built up from:
- relation symbols =, >, ≥, ... which operate on elements of |D| or constants
- the logical operators &, ∨ and ¬,
the selection of R over F, denoted by σ_F(R), is the set of all elements of R which satisfy F.
Based on these operations several other operations such as join and quotient are defined. Relations are often represented as tables. Thus, projection and selection correspond to choosing columns and rows from a relation respectively.
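The three operations defined above can be sketched directly over relations represented as sets of tuples. The following Python fragment is an illustration only, under the assumption that a tuple is a Python tuple indexed from 0 and that a formula F is given as a Python predicate; the function names and the sample relations EMP and DEPT are hypothetical.

```python
from itertools import product

def concatenation(r, s):
    """Every tuple of r followed by every tuple of s (the set of all r^s)."""
    return {a + b for a, b in product(r, s)}

def projection(r, indices):
    """For each tuple, keep the components at the given sequence of indices."""
    return {tuple(t[i] for i in indices) for t in r}

def selection(r, formula):
    """Keep the tuples for which the formula (a Python predicate) holds."""
    return {t for t in r if formula(t)}

# Hypothetical sample relations, invented for this sketch.
EMP = {("Smith", 300, "Sales"), ("Jones", 250, "Sales"), ("Brown", 400, "Stores")}
DEPT = {("Sales", "Jones"), ("Stores", "Green")}

# A join built from the three operations: match column 2 of EMP with
# column 0 of DEPT (column 3 after concatenation), then keep name and manager.
joined = projection(
    selection(concatenation(EMP, DEPT), lambda t: t[2] == t[3]),
    (0, 4),
)
```

Note that every projected tuple is again a sequence indexed from 0, whatever column numbers were used to produce it; this follows from the definition of a sequence given in the prologue.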
All counterpart definitions in the available literature make use of a notion which, when relations are defined in this manner, cannot be defined; namely the 'arity' of relations. The arity of a relation is defined to be the length of its sequence of domains. For example, if R is a relation on D_0, D_1, ... and D_n, the arity of R is said to be n+1. Some argue that, although the above statement is obvious when the relation is not empty, the arity of the empty relation is undefined. Note that we did not make use of the notion of arity in our definition of relational algebra. This demonstrates the redundancy of the use of arity.
The immediate issue arising from this type of specification is that it associates a number with each column of the relation, that is, the columns are identified by a sequence of numbers. Consequently, most relational algebra expressions involving projection (which are commonly accepted) are not mathematically clear.
For instance, if R is a relation on domains D_0, D_1, D_2 and D_3 then the following expression is not meaningful
σ_{3=...}(π_{0,3}(R))   (*)
because projection of R over columns 0 and 3 results in a binary relation which obviously does not have a column corresponding to 3 (since after the projection the columns are renumbered).
To eliminate the ordering on columns, among others, [Sy-78, ASU-79] made alternative definitions for relations. They regarded a relation on a set of attributes as a set of tuples where each tuple is a mapping from the set of attributes to the set of values (attributes are considered to be labels, i.e. names). Tuples are then sets and their elements are not ordered. (Incidentally, a misuse of notation repeatedly appears in the literature adopting such a definition: the attribute (i.e. a name) is commonly used also to denote the set of its corresponding values.)
The relational database work has overlooked a number of other points. We will, however, merely look at two more fundamental weaknesses.
The first problem is in the design of relational databases. The dependencies which exist on the data cannot be expressed in relational algebra.
There are various types of such dependencies; namely, functional, multivalued and join dependencies. The database designer should look for all types because these dependencies play major roles in the maintenance of consistency (that is, correctness) and in reducing the size of the database. The study of data dependencies has resulted in production of a great deal of literature which is perhaps more than the literature on other aspects of relational database theory. Yet, no suggestion is fully satisfactory; [BB-79 and among many others Fa-77, Bi-78 and Fa-81].
Secondly, the traditional theory of relational algebra assumes that the data has a tabular form, and, therefore, makes no provision for computable relations. For example, suppose, in a company, the salary of each employee is determined solely by his age, say: 150 times his age. Although this relationship is simple and straightforward, it still has to be viewed as a table.
We, therefore, argue that the relational approach is limited because it enforces a particular method of representation for relations.
In chapter IV we will discuss the weakness of relational database query languages for expressing certain types of queries.
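The salary example above can be sketched as follows, purely as an illustration in Python. The function name salary, the name salary_table and the age range 16 to 65 are assumptions made for this sketch; only the rule "150 times his age" comes from the text.

```python
def salary(age):
    """The computable relationship from the example: salary is 150 times age."""
    return 150 * age

# The same relationship forced into tabular form: one row per age value.
# The age range is an assumption; a table must fix some finite range in advance.
salary_table = {(age, 150 * age) for age in range(16, 66)}
```

The function answers for any age, while the table answers only for the rows that happen to be stored and must be kept consistent with the rule by other means. This is the sense in which a purely tabular representation makes no provision for computable relations.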
E. New Trends in Computer Science
Since the early 1950's when the first high level programming languages were invented, the search for more powerful languages has steadily continued. The number of existing languages is so large that the introduction of a new language gives no immediate cause for interest. The reason, perhaps, is that the similarities between these languages are greater than the differences. [Tu-81] indicates that these differences are usually superficial, whereas the similarities are fundamental.
At a certain level of abstraction all conventional programming languages are the same:
- they have the inherent defect of having the von Neumann computer as their common conceptual origin. [Ba-78]
- they are sequential and imperative (due to the nature of the machine), and assignment is their primary operation.
- they cannot easily relate to conventional mathematics because their variables are not static within their scopes
- above all, reasoning about correctness of programs written in these languages is difficult and often totally impractical.
In an attempt to remedy the crisis, conscious Computer Scientists have tried, for the past decade, to mathematically formalise the art of design and analysis of programming languages. It is now widely accepted that mathematics, particularly logic, can be usefully employed for the study of computer languages.
Ashcroft and Wadge [AW-79], however, distinguish two points of view on the application of mathematics: one which sees mathematics as a tool to describe, to model or to analyze programming languages (the descriptive role), and the other, which sees mathematics as "playing primarily an active role to discover the principles on which new languages and features should be based" (the prescriptive role). Production of a vast amount of highly sophisticated mathematics for the description of languages such as PL/I is an example of an approach with the first point of view, while some other languages, such as PROLOG [Ko-79], Lucid [AW-77] and SASL [Tu-81] are based on mathematics in a prescriptive role.
In this thesis we take the second point of view and define a query language based on the notation used in conventional mathematics. This method has the advantage of making use of the user's mathematical knowledge rather than introducing new concepts which are inherently complicated and often counter-intuitive.
F. Motivation for Varqa
The perspectives in the development of Varqa can be summarised as:
1. The query language should provide the tools by which the user can get the best results from the database without having to go through masses of printout.
2. A meaningful, easy to understand and rigorous 'error analysis' should be mandatory with any computer language.
3. The user should have the simplest view (which is obviously the natural one) of the data. The user should not be required to know anything about the internal representation and implementation techniques.
4. The language should be independent from the data structure: the internal data model should not affect the language.
5. To avoid dealing with the natural ambiguity of words, the use of mathematics which has the same meaning in all contexts is preferred. Correctness can be proved in formal systems.
6. It is to the benefit of the user if the notation of the language is simple. Most users are familiar with conventional mathematical notation. This notation, in addition, has several hundred years of testing and development behind it.
7. Finally, implementation of the whole specified language must be possible.
Chapter II
AN OVERVIEW OF THE LANGUAGE
This chapter contains an informal introduction to the major concepts and constructs of Varqa.
A. Objects in the real world and objects in a database
Extensional objects can be loosely defined as follows: conceptual, static objects, such as integers, which are invariant with time, place or context etc. Intensional objects, on the other hand, are objects of the real world which are capable of changing and yet in essence remaining the same. For example, TEMPERATURE is intensional, but TEMPERATURE AT A SPECIFIC TIME is extensional assuming that we know where and how this value was obtained. (Montague's example: "The temperature is 90 and rising".)
In the field of databases, particularly relational databases, objects are coded up intensionally. Let us consider a simple database consisting of the relation COURSES:
COURSES
COURSE     LECTURER   EXAMINER   2ND-MARKER
"Maths"    "John"     "Jack"     "Bill"
"Phys"     "Jack"     "Mary"     "Pam"
Note that the column headings, called relation schemes, are intensional objects and the entries to the columns are extensional objects.
There are 'functional dependencies' which cannot be expressed in the relational form. They are usually given in the following form:
[Figure: functional dependencies of COURSES; not recoverable from this copy.]
Although LECTURER, EXAMINER and 2ND-MARKER differ intensionally, they range over a common domain, namely the set of all instructors. The extensional approach is to forget about different intensional characteristics of the instructors, and to deal with them just as members of a set (of instructors). The relationship between the extensions would then be expressed by functions mapping these sets together.
[Figure: the set 'courses' mapped to the set 'instructors' by functions including lecturer-of and 2nd-marker-of.]
It is important to note that 'courses' is now a set, and that it is different from the intensional object COURSE in the relational approach.
Let us look at another example. In the relational approach, if we want to include the 'prerequisites of the courses' in the database we have to introduce a new relation R on COURSE and PRE-REQUISITE. (We can extend the relation COURSES to include PRE-REQUISITE, but it would not satisfy the normal forms of Codd. See [Co-71].)
COURSE        PRE-REQUISITE
"Phys-II"     "Phys-I"
"Maths-II"    "Maths-I"
"Phys-III"    "Phys-II"
The hidden functional dependency contained in R is made explicit in the following figure:
[Figure: the dependency between COURSE and PRE-REQUISITE.]
Extensionally, the entries to both columns of relation R belong to one set; that is 'courses'. An extensional representation is:
[Figure: the function prerequisites-of, drawn with double lines, mapping 'courses' to 'courses'.]
(We have used double lines to indicate that for every course any number of prerequisites is possible.)
As before, 'courses' is a set, and 'prerequisites-of' is a function which maps the elements of 'courses' to sets of elements of 'courses'.
B. Databases as Algebras
During the past decade algebra has emerged as a promising tool for the specification of a number of concepts, in particular, abstract data types [Zi-75, Gu-77, GTW-78]. Algebra also plays a fundamental role in the design of our language. We, therefore, start this section by studying algebras and the algebraic approach.
An algebra is defined as follows [HMT-71]:
"By an algebra (or an algebraic structure) we understand a pair 𝔄 = (A, Q) where A is a non-empty set and Q is a function which correlates with every element i of its domain a finitary operation Q_i, of positive rank, on and to elements of A."
For a more detailed definition of algebras see [MB-79, CH-78].
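The quoted definition can be illustrated with a small sketch in Python. Everything named here is an assumption made for the example, not notation from the thesis: the class Algebra, its methods rank and is_closed, and the sample algebra of integers modulo 5. The carrier plays the role of the set A, and the dictionary of operations plays the role of Q, correlating each index (a name) with a finitary operation.

```python
from dataclasses import dataclass
from inspect import signature
from itertools import product
from typing import Callable, Dict, FrozenSet

@dataclass(frozen=True)
class Algebra:
    carrier: FrozenSet[int]            # the non-empty set A
    operations: Dict[str, Callable]    # Q: index -> finitary operation on A

    def rank(self, name):
        """Rank of an operation: the number of arguments it takes."""
        return len(signature(self.operations[name]).parameters)

    def is_closed(self):
        """Check that every operation maps elements of A to elements of A."""
        for name, op in self.operations.items():
            for args in product(self.carrier, repeat=self.rank(name)):
                if op(*args) not in self.carrier:
                    return False
        return True

# Hypothetical example: integers modulo 5 with addition (rank 2) and negation (rank 1).
Z5 = Algebra(
    carrier=frozenset(range(5)),
    operations={"add": lambda x, y: (x + y) % 5, "neg": lambda x: (-x) % 5},
)

# A pair that fails the definition: "succ" leads outside the carrier.
broken = Algebra(carrier=frozenset(range(5)), operations={"succ": lambda x: x + 1})
```

The closure check is what "on and to elements of A" demands: an operation whose results fall outside the carrier does not give an algebra. The exhaustive check is only feasible for small finite carriers and is meant as an illustration.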
Many sorted algebras have been particularly important in both the practice and the theory of the specification of abstract data types. The primary aim in the abstract data type specification is to (precisely) describe a data type independently of any representation of its data objects and independently of any implementation of the operations. [GTW-78]
A