Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
5-6-1985
Fuzzy data and logical data base design
Karen Steelman
Rochester Institute of Technology
School of Computer Science and Technology
Fuzzy Data and Logical Data Base Design
A thesis submitted in partial fulfillment of
Master of Science in Computer Science Degree Program
By: Karen Steelman
Approved By:
Chris Comte
Professor Chris Comte, Advisor
Jeffrey Lasky
Professor Jeffrey Lasky
Rayno Niemi
Title of Thesis: Fuzzy Data and Logical Data Base Design
I, Karen Steelman, hereby deny permission to the Wallace Memorial Library, of Rochester Institute of Technology, to reproduce my thesis in whole or in part.
Date: May 24, 1985
Signature:
Karen Steelman
TABLE OF CONTENTS
1. INTRODUCTION
2. FUZZY DATA
2.1 INTRODUCTION
2.2 EXAMPLES OF FUZZY DATA
3. CRITERIA FOR EVALUATION OF LOGICAL DATA BASE DESIGN
3.1 INTRODUCTION
3.2 CRITERIA FOR THE DESIGN
3.3 CRITERIA FOR THE DESIGN PROCESS
4. APPROACHES TO DESIGN OF LOGICAL DATA MODELS
4.1 INTRODUCTION
4.2 ASSOCIATIVE DATA MODEL
4.2.1 ASSOCIATIVE DATA MODEL - DESIGN
4.2.2 ASSOCIATIVE DATA MODEL - DESIGN PROCESS
4.3 SEMANTIC RELATION DATA MODEL
4.3.1 SEMANTIC RELATION DATA MODEL - DESIGN
4.3.2 SEMANTIC RELATION DATA MODEL - DESIGN PROCESS
4.4 VETTER/MADDISON RELATIONAL MODEL
4.4.1 VETTER/MADDISON RELATIONAL MODEL - DESIGN
4.4.2 VETTER/MADDISON RELATIONAL MODEL - DESIGN PROCESS
5. THREE LOGICAL DATA MODELS OF THE GENEALOGICAL EXAMPLE
5.1 INTRODUCTION
5.2 NATURE OF GENEALOGICAL DATA
5.3 GENEALOGICAL EXAMPLE
5.4 ASSOCIATIVE DATA MODEL
5.4.1 ASSOCIATIVE DATA MODEL - GENEALOGICAL EXAMPLE DESIGN
5.4.2 ASSOCIATIVE DATA MODEL - GENEALOGICAL EXAMPLE APPLICATION STATE
5.5 SEMANTIC RELATION DATA MODEL
5.5.1 SEMANTIC RELATION DATA MODEL - GENEALOGICAL EXAMPLE DESIGN
5.5.2 SEMANTIC RELATION DATA MODEL - GENEALOGICAL EXAMPLE APPLICATION STATE
5.6 VETTER/MADDISON RELATIONAL MODEL
5.6.1 VETTER/MADDISON RELATIONAL MODEL - GENEALOGICAL EXAMPLE DESIGN
5.6.2 VETTER/MADDISON RELATIONAL MODEL
6. EVALUATION OF THREE LOGICAL DATA MODELING APPROACHES
6.1 INTRODUCTION
6.2 ASSOCIATIVE DATA MODEL
6.2.1 ASSOCIATIVE DATA MODEL - DESIGN
6.2.2 ASSOCIATIVE DATA MODEL - DESIGN PROCESS
6.3 SEMANTIC RELATION DATA MODEL
6.3.1 SEMANTIC RELATION DATA MODEL - DESIGN
6.3.2 SEMANTIC RELATION DATA MODEL - DESIGN PROCESS
6.4 VETTER/MADDISON RELATIONAL MODEL
6.4.1 VETTER/MADDISON RELATIONAL MODEL - DESIGN
6.4.2 VETTER/MADDISON RELATIONAL MODEL - DESIGN PROCESS
7. CONCLUSION
APPENDIX A
APPENDIX B
APPENDIX C
CHAPTER ONE
INTRODUCTION
The goal of this thesis is to gain understanding of the conceptualization phase of the data base design process for fuzzy data and to determine characteristics of the data base design process which are particularly appropriate for deriving the conceptual model for fuzzy data. Several potentially useful design methodologies will be evaluated. Genealogical research data will be used as an example of fuzzy data.
The data base design process has three phases: (1) analysis, (2) conceptualization and (3) accommodation (Buchmann and Dale 1981). The conceptualization phase is the focus of this thesis. The analysis phase and the accommodation phase will not be discussed.
The purpose of the conceptualization phase is to describe the logical structure of the information of interest by means of a data model. Other names for the data model are 'conceptual model' or 'logical data model'. For the purposes of this paper 'logical data base design' refers to the derivation of the logical data model.
The derivation process for the data model should not be influenced by the physical storage of the data, data access by a data language, or usage of the data in a specific application. Therefore, physical storage, data languages and data usage will not be discussed in this paper.
The proposed approach to the problem of deriving a logical data model includes these activities:
(1) describe fuzzy data
(2) identify criteria for a 'good' logical data model
(4) derive logical data models using each of the approaches described in (3) to model a representative subset of genealogical research data
(5) evaluate the processes described in (3) and the data models derived in (4) using the criteria identified in (2)
(6) summarize results of (5) to identify criteria for logical data base design methodologies for fuzzy data.
The following considerations are beyond the scope of this thesis: (1) exhaustive evaluation of existing methodologies, (2) recommendations of modifications to existing methodologies to meet the criteria with the possible exception of the selected techniques and (3) derivation of a new
CHAPTER TWO
FUZZY DATA
2.1 INTRODUCTION
Webster's New Collegiate Dictionary defines 'fuzzy' as 'indistinct' or 'not clear'. In the data base environment 'fuzzy' refers to data which may be (1) imprecise (Babad and Hoffer 1984, Bouille 1978, Gaines 1981, Gray 1981), (2) inapplicable (Bouille 1976, Kambayashi et al. 1982, Vassiliou 1979, Wiederhold 1983), (3) incomplete (Babad and Hoffer 1984, Kambayashi et al. 1982, Vassiliou 1980), (4) inconsistent (Babad and Hoffer 1984, Bouille 1978, Vassiliou 1978, Vassiliou 1980) and (5) not known (Babad and Hoffer 1984, Bouille 1978, Gaines 1981, Gray 1981, Kambayashi et al. 1982, Vassiliou 1979, Vassiliou 1980, Wiederhold 1983).
A model of the real world needs to reflect the inexactness of fuzzy data. The type of fuzziness associated with data provides a context in which to interpret and to use the data. Often the 'null value' has been used to represent all of these cases. 'Null' means 'none' or 'not any'. Information is lost when a single descriptor, 'null', is used for all of these cases as there is no means to distinguish between the different types of fuzzy data.
Much of the literature relative to null values and fuzzy data in the data base environment is oriented to physical implementation considerations, not logical design methodology. A computer search of the literature did not identify articles of value in selecting a logical data base design methodology for fuzzy data. Logical data base design is recognized as an element indispensable to the success of data base systems. The absence of design techniques to apply when data are fuzzy indicates that
This chapter considers five types of fuzziness which may be associated with data. Examples of each type are described. The design implementations carried out in Chapter Five include data with these types
2.2 EXAMPLES OF FUZZY DATA
The five types of fuzziness identified in the first section of this chapter are found when the data of interest include dates or names. The most informative representation of a date is year, month, and day.
IMPRECISE DATA
Assume that it is possible to determine that the date of birth falls within a range of years. The year of birth is not precisely defined. The exact year of birth cannot be distinguished in the known range. The range is the most precise information available. The year of birth exists but is imprecise. The information that the year of birth is included in the range of years should not be lost.
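One way to keep the range itself, rather than replacing it with a null, is to record the earliest and latest possible year. The following is a minimal sketch only; the type name `YearRange`, its fields and the sample years are assumptions introduced for illustration and are not part of any methodology discussed in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YearRange:
    """Closed range of years in which an event is known to fall (hypothetical type)."""
    earliest: int
    latest: int

    def __post_init__(self) -> None:
        # A range whose bounds are reversed carries no usable information.
        if self.earliest > self.latest:
            raise ValueError("earliest year must not exceed latest year")

    def contains(self, year: int) -> bool:
        """True if the given year is consistent with what is known."""
        return self.earliest <= year <= self.latest

    def is_exact(self) -> bool:
        """True only when the range has collapsed to a single year."""
        return self.earliest == self.latest


# Illustrative values: the birth is known to fall between 1840 and 1845.
birth_year = YearRange(1840, 1845)
```

A query such as `birth_year.contains(1842)` can then be answered from the recorded range, which would not be possible if the imprecise year had been stored as a null.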
INAPPLICABLE DATA
The value of a middle name for an individual who has no middle name, only a single given name and a surname, is an example of inapplicable data.
INCOMPLETE DATA
If only the year and month of birth are known, the birth date is incomplete. The day is not known. The information that the birth took
INCONSISTENT DATA
If different sources give different dates for the date of birth and it is not possible to identify the more accurate source, the values for the date of birth are inconsistent. It would be helpful to have both pieces of information available. This prevents having less information at hand than, in fact, is known.
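The five kinds of fuzziness listed in section 2.1 can be carried alongside a value instead of being collapsed into a single null. The sketch below is one possible illustration under stated assumptions: the names `Fuzziness`, `FuzzyValue` and `describe`, and the sample dates, are hypothetical and do not come from the methodologies evaluated later.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Fuzziness(Enum):
    """The five types of fuzziness named in section 2.1."""
    IMPRECISE = "imprecise"
    INAPPLICABLE = "inapplicable"
    INCOMPLETE = "incomplete"
    INCONSISTENT = "inconsistent"
    NOT_KNOWN = "not known"


@dataclass(frozen=True)
class FuzzyValue:
    """Candidate value(s) plus the reason they are fuzzy (hypothetical structure).

    kind is None for an ordinary, exact value.
    """
    candidates: Tuple[str, ...] = ()
    kind: Optional[Fuzziness] = None


def describe(value: FuzzyValue) -> str:
    """Render a value without discarding what is known about it."""
    if value.kind is None:
        return value.candidates[0]
    if value.kind is Fuzziness.INCONSISTENT:
        # Keep every conflicting source value rather than choosing one.
        return "one of: " + " | ".join(value.candidates)
    if value.kind in (Fuzziness.INAPPLICABLE, Fuzziness.NOT_KNOWN):
        return f"({value.kind.value})"
    return f"{value.candidates[0]} ({value.kind.value})"


# Illustrative data: two sources disagree on the birth date; no middle name exists.
birth_date = FuzzyValue(("1842-03-07", "1843-03-07"), Fuzziness.INCONSISTENT)
middle_name = FuzzyValue((), Fuzziness.INAPPLICABLE)
```

With this representation both conflicting birth dates remain available, and an inapplicable middle name can be told apart from one that is simply not known.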
UNKNOWN DATA
Assume that the year, month, and day of birth are unknown. The information exists (the individual does have a birth date) but the value is
CHAPTER THREE
CRITERIA FOR EVALUATION OF LOGICAL DATA BASE MODEL
3.1 INTRODUCTION
There is no one optimal way to derive a logical data model. Selection of a design process from which to derive the logical data model depends on (1) area of application, (2) problem structure, (3) expertise of designer and (4) preference of user (Falkenberg 1983). The area of application and the problem structure are particularly relevant considerations for the fuzzy data which are associated with the genealogical example referenced in this paper.
Criteria should be identified to evaluate the selected design process and the resulting logical data model. The design process has an impact on the quality of the resulting data model. The design process describes how the perceptions of the environment are translated into the final logical data model. A process which permits assumptions leads to inconsistencies and misrepresentations in the logical data model. This is particularly true as the environment becomes more complex, i.e. as the number of objects and interrelationships increases. An incorrect data model may result from an erroneous translation of correct or incorrect perceptions of the environment. In addition, an error-free translation of incorrect
In this chapter criteria are identified for the evaluation of a design process and the final logical data model. The criteria are used in Chapter Six to evaluate the design processes and the final logical data
3.2 CRITERIA FOR THE DESIGN
The following criteria are used in Chapter Six to evaluate the logical data designs derived by application of the processes described in this paper:
(1) convenient
(2) expressive
(3) minimally redundant
CONVENIENT
A convenient design is easy to use. The user has easy access to the data. The right data are available to the right people at the right time. The design can be understood by all users. The design is clear, not ambiguous. Data are represented in simple and easily comprehensible form. Non-essentials are removed. No unnecessary complexity is introduced. The number of logical constructs is minimal (Atre 1980; Biller and Neuhold 1977; Borkin 1980; Bracchi 1983; Curtice and Jones 1982; Date 1981; Deen 1981; Kahn 1982; Martin 1981b; Nijssen 1977; Robinson 1981; Smith and Smith 1982; Ullman 1980; Vetter and Maddison 1981).
EXPRESSIVE
depends on the objectives to be met. A correct design conforms to fact and to truth. It is error-free. A design is accurate if it reflects the essential properties of the data. It is precise and exact. An expressive design is not confusing or misleading. The design is meaningful. It is effective for the user. The design is called 'natural' if the descriptions are meaningful to the user. It is consistent. The design is described in terms which are familiar to the user. The design has simplicity. There are a reasonable number of constructs, a limited number of ways to express the same thing. The design is not overloaded, that is, the same constructs are not used to specify too many different kinds of objects. The design is complete (Atre 1980; Biller and Neuhold 1977; Bracchi 1983; Curtice and Jones 1982; Date 1981; Deen 1981; Falkenberg 1983; Kahn 1982; Martin 1981a; Martin 1981b; Nijssen 1977; Robinson 1981; Smith and Smith 1982; Vetter and Maddison 1980).
MINIMALLY REDUNDANT
Redundancy may occur for data items or for associations between data items. Redundancy may result in multiple states of the same data. The several states may be inconsistent. An attribute which is derived from other attributes represents one form of redundancy. This type of redundancy can be avoided by deriving the attribute when it is needed rather than by including the derived attribute in the logical data model. For example, age is an attribute derived from the current date and the date of birth. Age will not be correct unless new calculations are made as the years pass. If the date of birth is part of the logical data model, age can be calculated as needed. If facts occur once, the design is consistent (Atre 1980; Bracchi 1983; Curtice and Jones 1982; Date 1981;
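The age example in the discussion of minimal redundancy can be made concrete: only the date of birth is held, and age is computed whenever it is needed. This is a small sketch; the function name `age_on` and the sample dates are illustrative assumptions.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Whole years elapsed between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not yet occurred.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Illustrative call: age is derived on demand and never stored.
example_age = age_on(date(1900, 5, 6), date(1985, 5, 6))
```

Because nothing derived is stored, there is no second copy of the fact that could drift out of step with the date of birth as the years pass.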
3.3 CRITERIA FOR THE DESIGN PROCESS
The following criteria are used in Chapter Six for the evaluation of the three selected design processes:
(1) complete
(2) consistent
(3) efficient
(4) error-free
(5) implementation independent
(6) precise
(7) theoretically sound
(8) useable
COMPLETE
A complete design process is thorough. Careful attention is paid to details. The steps of the process are described in a way such that there is no reason to omit any step or to make assumptions. The process is comprehensive, that is, it can be used to model a range of data problems. It is not limited to a specific problem environment. A variety of types of data can be modeled. Redundant data can be identified and a decision made about its inclusion in the design. A complete design process results in an accurate design, a model of the real world (Bracchi 1983; Curtice and Jones 1982; Hubbard 1981; Nijssen 1977).
CONSISTENT
If a design process is consistent, there is agreement as to what rule or usage should be followed for a given step in the process.
a given problem which are equivalent, not necessarily identical. Consistency in the design process results in designs which are not ambiguous.
The classification of entities should be consistent. Objects which are similar in some way can be classified appropriately and objects which are not identical to each other can be differentiated as necessary. Special attention should be given to the classification boundaries so that a given entity is classified in the same way each time it is considered in the process.
Ambiguity should not be introduced through improper choice of words. It is important to use the same word or phrase for the same entity or association each time it is referenced in the process (Curtice and Jones 1982; Date 1981).
EFFICIENT
The process should use resources efficiently. Time is needed to understand the process as well as to implement it. The number of iterations necessary to produce the design should be as few as possible. The time needed for each iteration should be minimal (Bracchi 1983; Curtice and Jones 1982; Date 1981; Hubbard 1981; Robinson 1981).
ERROR-FREE
The design process should be free of errors. The process should not introduce errors into the data model. The process has integrity. The details of implementation of the process are consistent and not contradictory. The logical data model should not be a false description of the environment as a result of including or excluding inappropriate objects or relationships when following the design process (Curtice and Jones
IMPLEMENTATION INDEPENDENT
Development of the logical data model should be completed with no constraints imposed by prior selection of an implementation method for the physical data base. The process is independent of hardware and software considerations. The result of the design process is a logical data model which describes the environment of interest to the user. This design is useful even if a change is made in the implementation method (Atre 1980; Borkin 1980; Date 1981; Martin 1981b; Nijssen 1977; Olle 1983; Robinson 1981; Smith and Smith 1982; Vetter and Maddison 1981).
PRECISE
There is nothing vague or inexactly defined in a precise process. The rules and their usage are clear. There is no ambiguity about how to proceed from the beginning to the end of the process (Nijssen 1977; Vetter and Maddison 1981).
THEORETICALLY SOUND
A design process is theoretically sound if it is based on scientifically acceptable principles. Examples of such principles include set theory, functions, graph theory and normalization (Date 1981; Vetter
USEABLE
A useable methodology is (1) easy to understand, (2) easy to use, (3) simple and (4) straight-forward.
A process which is easy to understand is clear. It is not confusing or misleading to the designer. Use of graphical or other visual representation may increase clarity. Concepts and terminology are clearly defined. The vocabulary is not excessively large.
A process which is easy to use is not tedious. Ease of use is related to familiarity with other processes. A more complex process which resembles a known process will seem easier to use than it would be if it were the first process learned. The process can be modified to fit a particular need of the application without destroying its integrity.
A simple methodology is not complex. The number of constructs and rules is not excessive. A straight-forward process is free of unnecessary elements, is easier to understand and is easier to use. The number of partitions and the number of levels of detail are adequate to model the data but not so great that the designer becomes confused (Biller and
CHAPTER FOUR
APPROACHES TO DESIGN OF LOGICAL DATA MODELS
4.1 INTRODUCTION
There are a number of approaches to deriving a logical data model. Some appear to have potential for representing fuzzy data. In this chapter three logical data models are discussed. The process needed to develop each model is also described. An application state of each logical data model is presented in Chapter Five. The logical data models and design procedures are evaluated in Chapter Six.
The three approaches selected are (1) Associative Data Model, (2) Semantic Relation Data Model and (3) Vetter/Maddison Relational Model.
The Associative Data Model was developed to provide the designer with a technique for creating logical structures to define the environment of interest. Particular attention is given to identification of the important data items and to defining them at an appropriate level of detail. The notation of the Associative Data Model permits identification of optional variables and specification of domain types. These features may be helpful when modeling fuzzy data.
The Semantic Relation Data Model was selected as representative of semantic models. The structure of the semantic model is such that terminology which is relevant to the application becomes part of the data model. Constraints are an essential part of the data model. The effect of
Vetter and Maddison have designed a systematic procedure for deriving a logical data model. The result is at least one list of non-redundant elementary relations. The design can best be described as a special case of the 'relational model'. For convenience, in this paper the model is referred to as the 'Vetter/Maddison Relational Model'. This method was selected because it is possible that some, if not all, of the features included in the process may be especially useful in modeling
4.2 ASSOCIATIVE DATA MODEL
4.2.1 ASSOCIATIVE DATA MODEL - DESIGN
The Associative Data Model is a set of assertion templates expressed in the special notation of the model. The assertion templates describe the logical structure of the environment of interest to the user. Additional information is included in the key-level overview and the data element definitions.
An entity is an object of interest. An entity identifier is associated with each entity. For example:
ENTITY    ENTITY IDENTIFIER
truck     T12345
The entity identifier belongs to a domain of truck identification numbers.
Data elements are basic units in an assertion template. They are a representation of the entity identifier in an assertion template. A data element includes the name of a variable, a symbol, and the connection to another element in an assertion template. A data element takes its values from a particular domain. For instance:
name of variable    vehicle
symbol              [symbol not reproduced]
connection          [symbol not reproduced]
Data elements are grouped to become assertion templates. For example:
data elements         vehicle, SERIAL NUMBER
assertion template    VEHICLE | SERIAL NUMBER
Typical values of vehicle are sedan, station wagon, truck, or van.
Each assertion template contains detailed information about one simple data element called the key or subject of the template. A particular variable name may appear only once as a key in any design. The symbol for the key is [symbol not reproduced]. The key appears on the left side of a connection in a template. The bar at the right end of the symbol connects the key to the remainder of the assertion template.
All other data elements in the template are targets or objects. Targets appear on the right side of the connection in a template. The symbol for the target includes a bar at the left end to connect the target to the rest of the template. There are two types of targets, associators and non-associators. An associator is a cross-reference to the key of
A non-associator does not reference the key of another assertion template. The symbol for a non-associator is [symbol not reproduced]. In the example
VEHICLE | SERIAL NUMBER
VEHICLE is the key and SERIAL NUMBER is an associator.
A vertical line separates the subject (vehicle) from the target (serial number). An interpretation of the template is that a particular vehicle has a serial number. Serial number is identified as an associator because it is the key of another template. If serial number were not the key of another template, serial number would be a non-associator and the template would be
VEHICLE | (SERIAL NUMBER)
The determination of whether a data element is an associator or a non-associator is made by the designer after analysis of the environment of interest.
Enhancements to the target symbols provide additional
indicate that more than one value of the data element is possible. These sets are unordered and do not contain duplicate values.
Any of the targets may be mandatory or optional. The value of an optional target does not have to be included. A value of a mandatory target must be present for each value of the key. If the target is optional, a small circle appears on the horizontal line which connects the symbol to the template. For example:
VEHICLE | -o- SERIAL NUMBER
This template indicates that each vehicle may have up to one serial number.
Any target which is not optional is mandatory. For example:
VEHICLE | SERIAL NUMBER
This is to say, each vehicle must have a serial number.
Assertion templates are classified as primitive or compound. A primitive assertion template represents a relationship between the key and a simple data element. For example:
VEHICLE | SERIAL NUMBER
A compound assertion template results when multiple data elements are
For example:
SERIAL NUMBER | MODEL NUMBER
              | -o- LICENSE PLATE ID
              | -o- OWNER NAME
              | (COLOR)
The subject (serial number) appears to the left of the vertical line. The targets (model number, license plate id, owner name, color) appear to the right.
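The vocabulary introduced so far (key, target, associator, non-associator, mandatory, optional) can be pictured as a small data structure. The sketch below is an informal reading of the notation, not the Associative Data Model's own definition: the class names, the field names and the rule that more than one target makes a template compound are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Target:
    name: str
    associator: bool = False    # True if the target is also the key of another template
    optional: bool = False      # False means the target is mandatory
    multi_valued: bool = False  # True if a set of values is allowed


@dataclass(frozen=True)
class AssertionTemplate:
    key: str
    targets: Tuple[Target, ...]

    def is_compound(self) -> bool:
        # Assumption: a template with more than one target is treated as compound.
        return len(self.targets) > 1

    def missing_mandatory(self, values: Dict[str, object]) -> List[str]:
        """Names of mandatory targets that have no value for one key value."""
        return [t.name for t in self.targets
                if not t.optional and values.get(t.name) is None]


# The SERIAL NUMBER template from the example above.
serial_number = AssertionTemplate(
    key="serial number",
    targets=(
        Target("model number", associator=True),
        Target("license plate id", associator=True, optional=True),
        Target("owner name", associator=True, optional=True),
        Target("color"),
    ),
)
```

Calling `serial_number.missing_mandatory({"model number": "123A"})` reports that color is still required, while the two optional targets may legitimately be absent.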
When more than two data elements are part of one relationship, the key and targets form a cascade. The vertical line appears at the lower center rather than at the right center to increase readability of the template. For example:
COUNTRY | STATE | COUNTY
A vertical line identifies a subject. Each subject has a relationship to the
For example:
SUBJECT          TARGET
country          state
country-state    county
This assertion template indicates that a country is made up of states. A state in a country is made up of counties. A particular county depends on the combination of country and state. This combination (country and state) is a secondary subject in a relationship with county.
The Associative Data Model requires that each associator appears as the key of an assertion template such that the inverse of the first relationship is expressed by the second relationship. This is known as the Associator Reversal Rule. For example:
ORIGINAL RELATIONSHIP
KEY      TARGET
state    county
STATE | COUNTY
TEMPLATE CREATED BY APPLICATION OF ASSOCIATOR REVERSAL RULE
KEY       TARGET
county    state
COUNTY | STATE (set)
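The Associator Reversal Rule lends itself to a mechanical check. The following sketch assumes a deliberately simplified representation in which a design is a mapping from each key to the set of its associator targets; cascades, optional targets and multi-valued targets are ignored. The function name `apply_associator_reversal` is hypothetical.

```python
from typing import Dict, Set


def apply_associator_reversal(templates: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return a design in which every associator is also a key.

    `templates` maps each key to the set of its associator targets.
    For each key -> associator pair, the reversed pair is added.
    """
    result = {key: set(targets) for key, targets in templates.items()}
    for key, targets in templates.items():
        for associator in targets:
            # Make the associator a key whose targets include the original key.
            result.setdefault(associator, set()).add(key)
    return result


# The STATE | COUNTY example: reversal adds a COUNTY template targeting STATE.
original = {"state": {"county"}}
reversed_design = apply_associator_reversal(original)
```

Applying the function a second time changes nothing, which gives a simple way to test whether a set of templates already satisfies the rule.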
For example:
ORIGINAL TEMPLATE
KEY             TARGET
state           county
state-county    town
STATE | COUNTY | TOWN
TEMPLATES AFTER APPLYING ASSOCIATOR REVERSAL RULE
KEY       TARGET
county    state
town      state
COUNTY | STATE (set)
TOWN | STATE | COUNTY
The COUNTY-STATE template indicates that several states may have a county with a given name.
The assertion templates are summarized in a key-level overview. All keys in the design are shown. The relationship between two keys is indicated by a line connecting the corresponding boxes in the overview. A
one-to-one. An arrowhead at the end of the line (>) signifies a one-to-many relationship. A many-to-many relationship is described by (< >). For example:
ORIGINAL TEMPLATE
key              associator          non-associator
serial number    model number        color
                 license plate id
                 owner name
SERIAL NUMBER | MODEL NUMBER
              | -o- LICENSE PLATE ID
              | -o- OWNER NAME
              | (COLOR)
KEY-LEVEL OVERVIEW
[Diagram: boxes for LICENSE PLATE ID, MODEL NUMBER and OWNER NAME, each joined by a line to the box for SERIAL NUMBER]
The keys in the design are (1) model number, (2) license plate id, (3) owner
The relationships in the key-level overview are summarized as follows:
KEY PAIR                            RELATIONSHIP
model number - serial number        1:N
license plate id - serial number    1:1
owner name - serial number          M:N
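The surviving text does not spell out how the 1:1, 1:N and M:N labels are obtained from the templates. One plausible reading, offered here only as an assumption, is that the label depends on whether each key of a pair can be associated with more than one value of the other. The function name and the multiplicity flags below are illustrative and were chosen to reproduce the table above.

```python
from typing import Dict, List, Tuple


def relationship(a_has_many_b: bool, b_has_many_a: bool) -> str:
    """Label the relationship between keys a and b, read as 'a : b'."""
    if a_has_many_b and b_has_many_a:
        return "M:N"
    if a_has_many_b:
        return "1:N"
    if b_has_many_a:
        return "N:1"
    return "1:1"


# (key a, key b, a has many b, b has many a); the flags are assumed for illustration.
pairs: List[Tuple[str, str, bool, bool]] = [
    ("model number", "serial number", True, False),
    ("license plate id", "serial number", False, False),
    ("owner name", "serial number", True, True),
]

summary: Dict[Tuple[str, str], str] = {
    (a, b): relationship(a_many, b_many) for a, b, a_many, b_many in pairs
}
```

Under this reading, one model number covers many serial numbers while each serial number has a single model number, which yields the 1:N entry.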
Additional information about the design is included in the definition of each data element. This definition should include one or more of the following:
(1) function of variable named in data element
(2) description of conditions under which data element is optional
(3) description of conditions under which relationship exists
(4) domain
For example:
COLOR
indicates exterior paint color of vehicle
domain: black, blue, red, silver, white
LICENSE PLATE ID
indicates license plate identification for vehicle
optional conditions: vehicle not assigned identification (inapplicable)
domain: A-Z, 0-9
MODEL NUMBER
indicates name specified by manufacturer to indicate group of vehicles with specific characteristics
domain: three integers (0-9) followed by a single alpha
OWNER NAME
indicates name of vehicle owner other than original manufacturer
optional condition: vehicle not sold by manufacturer (inapplicable)
domain: names of persons or organizations
SERIAL NUMBER
indicates unique identification for each vehicle
4.2.2 ASSOCIATIVE DATA MODEL - DESIGN PROCESS
The design process for the Associative Data Model is:
(1) identify objects of interest
(2) identify relationships between objects
(3) draw assertion templates
(4) apply Associator Reversal Rule
(5) draw key-level overview
(6) write data element definitions
(1) IDENTIFY OBJECTS OF INTEREST
All of the objects of interest in the environment of interest need to be identified. Objects which are the same object but are called by different names should be treated as a single object in the design. Conversely, objects which are called by the same name may be different objects and should be treated as such in the design. Each object has a unique name. Compound objects are divided into simple pieces and the separate pieces are considered for inclusion in the design.
(2) IDENTIFY RELATIONSHIPS BETWEEN OBJECTS
Pairs of objects which are related in some way are identified. Compound relationships are separated into simple relationships.
(3) DRAW ASSERTION TEMPLATES
Those objects which have a substantial amount of information to be recorded in the data base are identified as keys of the assertion template for that key. Each target is identified as an associator or as a non-associator. The name of the target may reflect the nature of the relationship between the key and the target. Special characteristics of the relationship (multiple valued, mandatory, optional) are identified on the templates.
(4) APPLY ASSOCIATOR REVERSAL RULE
The Associator Reversal Rule is applied for all associators of all of the templates. Multiple primitive templates with the same key should be combined into a compound assertion template.
(5) DRAW KEY-LEVEL OVERVIEW
The key-level overview can be drawn after all of the assertion templates are completed. The steps to create the key-level overview are:
(1) represent each assertion template key by a labeled rectangle
(2) represent each special associator by a labeled rectangle
Note: A 'special' associator is one which is immediately subordinate to the key and has other associators subordinate to itself.
(3) connect key and associator at immediately subordinate level by a line
Note: If associator is not connected directly to a key, connect it to the next higher level associator. Label duplicate lines. Label line drawn from a key to itself.
(6) WRITE DATA ELEMENT DEFINITION
A definition is written for each data element in each assertion template. One or more of the following is included for each of the data elements: (1) function of variable named in data element, (2) description of condition under which data element is optional, (3) description of
4.3 SEMANTIC RELATION DATA MODEL
4.3.1 SEMANTIC RELATION DATA MODEL - DESIGN
The Semantic Relation Data Model views the data base as sets of statements describing an existing application state. A particular statement contains information about an entity. An entity is something about which information is to be recorded, an object or concept which exists and can be distinguished. An entity may be associated in some way with another entity. An entity type is the name of an entity set. A characteristic is a descriptor of an entity type. One or more characteristics describe an entity type.
Example:
entity                  entity type    characteristic    characteristic
truck in my driveway    truck          serial number     color
An entity and its characteristics may be represented in tabular format.
truck
serial number    color
A characteristic is essential if it is necessary and sufficient to define a unique entity. There may be several statements with the same value of the essential characteristic. The value of the essential characteristic does not uniquely identify the tuple. This differs from the
to distinguish a particular tuple from all others within a relation.
A column-group is the set of all columns of characteristics of an entity type in a relation. If a column-group includes all of the essential characteristics of an entity type, then the essential characteristic name(s) are italicized or highlighted in some way. None of the characteristic names of the column-group are italicized or highlighted if one or more of the essential characteristics of the entity is missing from the column-group. In this paper '_' is used to highlight the essential characteristic.
For example, the column-group for truck is made up of the serial number and color columns. The serial number is an essential characteristic. This information is displayed as follows:
truck
_serial number_    color
A domain is a set of values from which the value of the characteristic is taken. The null value is a part of every domain. The null value means 'no value' and is represented by [symbol not reproduced]. If one characteristic of an entity is null, then all characteristics of the entity are null. For example, the characteristics of a truck are serial number and color. If the color is not known (null), then the serial number must also be null.
Associations and entities are described by predicate: case pairs. In languages the subject and the predicate are the two principal elements of a sentence. A subject is that about which anything is said or done. The predicate is the verb and its modifiers, object, or complement. An object is that which is acted on or accomplished by the verb. 'Case'
relation between a noun or pronoun and some other element in the sentence.
Example:
sentence         John drives a truck.
subject          John
predicate        drives a truck
verb             drives
object           truck
case of truck    objective
In the Semantic Relation Data Model the predicate is a verb phrase. 'Case' is a noun (entity) associated with the predicate as the subject or the object. 'Agent' is often used instead of 'subject'. Predicate: agent and predicate: object are predicate: case pairs which describe an association.
Example:
association         drive
predicate           to drive
to drive: agent     person
to drive: object    truck
Each entity type is described by a 'be' predicate, 'be entity type: object'. This special form of the predicate: case pair is used to state the existence of an entity whether or not it is part of an association.
Example:
entity            person
entity type       person
'be' predicate    be person: object
Organized true statements form relations. A relation is specified
entities and (4) domains of characteristics.
Two or more relations can be combined to form a new, more complex relation. The new relation has multiple predicates. At least one of the predicates must not have the null value.
TRUCK
be truck: object
truck
serial number
number
x12345
PERSON
be person: object
person
name
name
John A. Smith
DRIVE
be truck: object    be person: object
to drive: object    to drive: agent
truck               person
serial number       name
number              name
x12345              John A. Smith
y67891              James B. Good
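Two of the null-value rules stated above can be checked row by row: the characteristics of one entity are either all null or all present, and a combined relation must have at least one predicate that is not null. The sketch below is an illustration under assumptions. It treats a predicate as null when its whole column-group is null, and it adds the color column to the truck column-group so that the first rule has something to test; neither choice comes from the DRIVE relation as printed.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]

# Column-groups of a combined relation: entity type -> its characteristics.
# The color column is added here for illustration only.
COLUMN_GROUPS: Dict[str, Tuple[str, ...]] = {
    "truck": ("serial number", "color"),
    "person": ("name",),
}


def violations(row: Row) -> List[str]:
    """List the null-value rules that one statement (row) breaks."""
    problems: List[str] = []
    null_groups = 0
    for entity, columns in COLUMN_GROUPS.items():
        nulls = [row.get(column) is None for column in columns]
        if all(nulls):
            null_groups += 1
        elif any(nulls):
            # One characteristic is null but another is not.
            problems.append(f"{entity}: partly null")
    if null_groups == len(COLUMN_GROUPS):
        problems.append("all predicates are null")
    return problems


complete_row: Row = {"serial number": "x12345", "color": "red", "name": "John A. Smith"}
partial_row: Row = {"serial number": "x12345", "color": None, "name": "John A. Smith"}
```

The second row is rejected because a truck with a known serial number but a null color breaks the all-or-nothing rule, which is exactly the situation a fuzzy genealogical record is likely to produce.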
A 'be' predicate and one or more of the predicate: case pairs in an association form a set of predicate: case pairs in a relation. A particular predicate: case pair cannot appear in two different predicate: case sets in a given relation. Relations can be named for easier reference. There cannot be duplicate statements (rows) in a relation. The order in which [...]

A relation is displayed as follows:

    RELATION NAME
    predicate: case pair
    entity type
    characteristic
    domain
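The display format can also be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the names `Column`, `Relation` and `add` are hypothetical and are not part of the Semantic Relation Data Model. Each column records its predicate: case set, entity type, characteristic and domain, and a duplicate statement is rejected.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    """One column of a relation (hypothetical helper type)."""
    predicate_case_pairs: tuple  # e.g. ("be truck: object", "to drive: object")
    entity_type: str
    characteristic: str
    domain: str


@dataclass
class Relation:
    """A named relation holding statements (rows) without duplicates."""
    name: str
    columns: tuple
    rows: list = field(default_factory=list)

    def add(self, *values):
        """Add one statement; duplicate statements are not allowed."""
        if len(values) != len(self.columns):
            raise ValueError("one value is needed per column")
        if values in self.rows:
            raise ValueError("duplicate statement")
        self.rows.append(values)


drive = Relation(
    "DRIVE",
    (
        Column(("be truck: object", "to drive: object"),
               "truck", "serial number", "number"),
        Column(("be person: object", "to drive: agent"),
               "person", "name", "name"),
    ),
)
drive.add("x12345", "John A. Smith")
drive.add("y67891", "James B. Good")
```

Adding the statement ("x12345", "John A. Smith") a second time raises `ValueError` in this sketch, which mirrors the rule that a relation cannot contain duplicate statements.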
Constraints are an essential part of the Semantic Relation Data Model. They give added meaning to the statements. Constraints specify conditions which are always true of the relational state which describes a particular application state. Restricted and mandatory characteristics of an entity or association are described in the constraints. An example of a constraint for a mandatory characteristic is 'each truck must have a serial number'. A restriction of the characteristic is 'each truck must have a serial number which is unique'.
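The two truck constraints can be checked mechanically. The sketch below is a minimal illustration, not part of the model itself: the function name `check_truck_constraints`, the dictionary layout of a row and the sample rows are all assumptions made for the example, and `None` stands for the null value.

```python
def check_truck_constraints(trucks):
    """Return a list of constraint violations for a list of truck rows.

    Each row is a dict with the keys "serial number" and "color".
    """
    violations = []
    seen = set()
    for row in trucks:
        serial = row.get("serial number")
        if serial is None:
            # mandatory characteristic: each truck must have a serial number
            violations.append("missing serial number")
        elif serial in seen:
            # restriction: the serial number must be unique
            violations.append(f"duplicate serial number {serial}")
        else:
            seen.add(serial)
    return violations


# Hypothetical sample rows; the second and third rows each break one constraint.
trucks = [
    {"serial number": "x12345", "color": "red"},
    {"serial number": "x12345", "color": "black"},
    {"serial number": None, "color": "white"},
]
violations = check_truck_constraints(trucks)
```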
Logical statements about the application environment can be derived from the specification of the relation. The interpretation of a [...]

A relation for a particular application is:

    TRUCK
    be truck: object
    truck
    serial number     color
    number            black/white/red

    Constraint: serial number is unique

The interpretation of this relation is: 'A truck is black, white or [...]

4.3.2 SEMANTIC RELATION DATA MODEL - DESIGN PROCESS
The design process to derive the Semantic Relation Data Model is:
    1. identify entity types
    2. identify associations
    3. identify predicate: case pairs
    4. form relations
    5. identify characteristics of entity types
    6. define domains of characteristics
    7. identify constraints

The items of interest in the final data base design are the entity types and the associations between entity types. All of these items should be identified as early as possible in the design process.

Each entity type is described by a predicate: case pair called a 'be' predicate. Two predicate: case pairs, predicate: agent and predicate: object, are developed for each association.

Relations are formed from one or more predicate: case pairs. It is useful to name relations as they are formed although it is not mandatory that relations be named.

The design process is iterative. Review of the first four steps in the design process before moving to the final three steps may reduce the number of iterations necessary to complete the design.

The characteristics of each entity type are identified. An appropriate label is selected for each characteristic. In this step it is useful to consider the likelihood of the occurrence of 'null' as a value of the characteristic and its effect on the use of the design. Essential characteristics are identified.

The constraints are listed. Review of the design should include consideration of the possibility of generating duplicate tuples. A design which results in duplicate tuples should be modified since duplicate tuples are not allowed. Entering sample data in the relations may indicate design weaknesses for the particular [...]

4.4 VETTER / MADDISON RELATIONAL MODEL

4.4.1 VETTER / MADDISON RELATIONAL MODEL - DESIGN
The Vetter / Maddison Relational Model is a list of non-redundant elementary relations describing the environment of interest to the user. Several such lists may be generated. Each of the lists fully describes the environment of interest. One of these models is selected for implementation. This paper considers the generation of the list(s) of elementary relations. It does not discuss the selection and implementation of one of the models.

An entity is something about which information is recorded. An occurrence of an entity is called a tuple. There are n elements in an n-tuple. If an n-tuple is ordered, each element is selected from a separate set of possible values. The first element belongs to the first set. The nth element belongs to the nth set. Each of the n sets is sometimes called a domain. Vetter and Maddison, however, use the term domain for the values which occur at a particular instant in time. An example of a 4-tuple is:

    truck  20L  A579T  red

where:
    element of tuple    belongs to set
    truck               vehicle type
    20L                 model number
    A579T               serial number
    red                 color
A relation is composed of ordered n-tuples. N is the degree of the relation. A relation of degree n can be displayed as a table with n columns. [...] relation name and A(N) is a set. An example of a relation is:

    VEHICLE(TYPE, MODEL ID, SERIAL NUMBER, COLOR)

    VEHICLE
    TYPE     MODEL ID    SERIAL NUMBER    COLOR
The project operation forms a subset of the original relation by extracting specified columns of the relation and removing duplicate rows. For example:

Original Relation
    TYPE     MODEL ID    SERIAL NUMBER    COLOR
    truck    20L         A579T            red
    van      50M         D432V            blue
    sedan    35N         G771S            black
    truck    20L         B688T            green

Apply project operation to vehicle type
    TYPE
    truck
    van
    sedan
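The project operation is short enough to sketch directly. The Python below is an illustrative sketch only: the function name `project` and the choice of a list of dictionaries to hold the rows are assumptions, not notation used by Vetter and Maddison. It extracts the named columns and removes duplicate rows while keeping the original order.

```python
def project(rows, columns):
    """Keep only the named columns and drop duplicate rows, preserving order."""
    result = []
    for row in rows:
        reduced = {c: row[c] for c in columns}
        if reduced not in result:
            result.append(reduced)
    return result


vehicle = [
    {"TYPE": "truck", "MODEL ID": "20L", "SERIAL NUMBER": "A579T", "COLOR": "red"},
    {"TYPE": "van", "MODEL ID": "50M", "SERIAL NUMBER": "D432V", "COLOR": "blue"},
    {"TYPE": "sedan", "MODEL ID": "35N", "SERIAL NUMBER": "G771S", "COLOR": "black"},
    {"TYPE": "truck", "MODEL ID": "20L", "SERIAL NUMBER": "B688T", "COLOR": "green"},
]

# Projecting on TYPE leaves three rows because the two trucks collapse into one.
types = project(vehicle, ["TYPE"])
```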
Two relations can be joined to create a new relation in which each tuple is formed by concatenating a tuple from each of the original relations. Each of the two tuples which is joined satisfies a specified condition. For example:

Original Relations

    VEHICLE
    TYPE     MODEL ID    SERIAL NUMBER
    truck    20L         A579T
    van      50V         D432V
    sedan    35N         G771S
    truck    20L         B688T

    MILEAGE
    MODEL ID    MPG
    20L         27
    50V         24

Apply join operation to two original relations when MODEL ID's are equal

    VEHICLE MILEAGE
    TYPE     MODEL ID    SERIAL NUMBER    MPG
    truck    20L         A579T            27
    van      50V         D432V            24
    truck    20L         B688T            27
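The join on equal MODEL ID values can be sketched as a pair of nested loops. This is an illustrative sketch under stated assumptions: the function name `join` and the list-of-dictionaries layout are choices made for the example, and only the equality condition used in the text is supported.

```python
def join(left, right, column):
    """Concatenate every pair of rows whose values in `column` are equal."""
    result = []
    for l_row in left:
        for r_row in right:
            if l_row[column] == r_row[column]:
                # the shared column appears once in the combined row
                result.append({**l_row, **r_row})
    return result


vehicle = [
    {"TYPE": "truck", "MODEL ID": "20L", "SERIAL NUMBER": "A579T"},
    {"TYPE": "van", "MODEL ID": "50V", "SERIAL NUMBER": "D432V"},
    {"TYPE": "sedan", "MODEL ID": "35N", "SERIAL NUMBER": "G771S"},
    {"TYPE": "truck", "MODEL ID": "20L", "SERIAL NUMBER": "B688T"},
]
mileage = [
    {"MODEL ID": "20L", "MPG": 27},
    {"MODEL ID": "50V", "MPG": 24},
]

vehicle_mileage = join(vehicle, mileage, "MODEL ID")
```

Every vehicle row with a matching MODEL ID is paired, so both trucks of model 20L appear in the result. The sedan has no row in MILEAGE and therefore does not appear.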
An elementary relation is a relation which cannot be reduced into relations of smaller degree by using the project operation such that the original relation is recreated when the relations resulting from the use of the project operation are joined. An example of an elementary relation is:

    VEHICLE-MODEL
    TYPE     MODEL NUMBER
    truck    20L
    van      50V
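The definition can be tried out on sample data: project a relation onto two smaller sets of columns, join the projections, and compare the result with the original. The sketch below is only an illustration. The function names are hypothetical, and the test inspects a single set of sample rows, whereas a relation is truly reducible only if the decomposition holds for every state the relation may take.

```python
def project(rows, columns):
    """Projection as a set of tuples (duplicates disappear automatically)."""
    return {tuple(row[c] for c in columns) for row in rows}


def natural_join(rows_a, cols_a, rows_b, cols_b):
    """Join two projections on the columns they share."""
    shared = [c for c in cols_a if c in cols_b]
    out_cols = list(cols_a) + [c for c in cols_b if c not in shared]
    joined = set()
    for a in rows_a:
        da = dict(zip(cols_a, a))
        for b in rows_b:
            db = dict(zip(cols_b, b))
            if all(da[c] == db[c] for c in shared):
                merged = {**da, **db}
                joined.add(tuple(merged[c] for c in out_cols))
    return out_cols, joined


def recreated_by(rows, all_cols, cols_a, cols_b):
    """True if joining the two projections gives back exactly the original rows."""
    out_cols, joined = natural_join(project(rows, cols_a), cols_a,
                                    project(rows, cols_b), cols_b)
    original = {tuple(row[c] for c in out_cols) for row in rows}
    return set(out_cols) == set(all_cols) and joined == original


vehicle = [
    {"TYPE": "truck", "MODEL ID": "20L", "SERIAL NUMBER": "A579T"},
    {"TYPE": "van", "MODEL ID": "50V", "SERIAL NUMBER": "D432V"},
    {"TYPE": "sedan", "MODEL ID": "35N", "SERIAL NUMBER": "G771S"},
    {"TYPE": "truck", "MODEL ID": "20L", "SERIAL NUMBER": "B688T"},
]
vehicle_model = [
    {"TYPE": "truck", "MODEL NUMBER": "20L"},
    {"TYPE": "van", "MODEL NUMBER": "50V"},
]

# The degree-3 sample relation is recreated from two degree-2 projections.
vehicle_reducible = recreated_by(
    vehicle, ["TYPE", "MODEL ID", "SERIAL NUMBER"],
    ["MODEL ID", "TYPE"], ["MODEL ID", "SERIAL NUMBER"])

# VEHICLE-MODEL is not recreated from its two single-column projections.
vehicle_model_reducible = recreated_by(
    vehicle_model, ["TYPE", "MODEL NUMBER"], ["TYPE"], ["MODEL NUMBER"])
```

With these sample rows the first check succeeds and the second fails, which agrees with VEHICLE-MODEL being given as an example of an elementary relation.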
4.4.2 VETTER / MADDISON RELATIONAL MODEL - DESIGN PROCESS
The design process described by Vetter and Maddison includes the following phases:
    (I) specify environment of interest
    (II) transform description of environment of interest into list of elementary relations
    (III) expand list of elementary relations to include all possible elementary relations
    (IV) reduce list of all possible elementary relations to a minimal set of elementary relations

There are several steps associated with each phase.

SPECIFY ENVIRONMENT OF INTEREST (PHASE I)
The steps to specify the environment of interest are:
    (1) identify entity sets
    (2) identify properties of entity sets
    (3) identify domains of properties of entity sets
    (4) identify key domains of entity sets
    (5) identify entity attributes and express as relations with entity set replaced by primary key domains
    (6) identify relationship sets and express as relations with entity sets replaced by primary key domains
    (7) identify relationship attributes and express as a relation with entity sets replaced by primary key domains
An entity is anything of interest to the user. An entity may be a thing, a person, a concept, an event or a relationship. It can be [...]

Examples of entities are:
    TYPE OF ENTITY      ENTITY
    thing               apple
    person              John Brown
    abstract concept    July 1, 1984
    event               birth
    relationship        marriage

Entities of the same type form entity sets. Each entity set has a unique name. For example, the entity 'apple' belongs to the entity set 'fruit'.

An entity is identified and classified by its properties. Properties are named characteristics of the entity. Some properties of a person are name, age and eye color. Properties which are relevant to the application area are selected for the model. An occurrence of a property of an entity is a property value. Property values for an entity belong to a specified domain. The association from an entity set to a domain is called an entity attribute. For example, 'age' is an entity attribute for the entity set 'person'. The domain is 'positive integers'.

An attribute which has different values for each occurrence of an entity is called an entity key. If there are several entity keys, then one of them is selected as the primary entity key. A primary entity key may also be called a primary key. A primary key is identified for each entity set. It is generally assumed that the primary key always has a non-null value. The domain underlying the primary key is used to identify the entity set. This domain is called the primary key domain.

For example, the attribute 'social security number' is a primary entity key for the entity set 'person'. The primary key domain is social security numbers. The attribute associates the domain 'social security number' with the domain 'name'. 'Social security number', the attribute, associates the domain 'social security number' with itself.
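Candidate entity keys can be screened from sample occurrences. The sketch below is an illustration only: the function name `entity_keys` is hypothetical and the sample people and their social security numbers are invented for the example. It returns the attributes whose values are non-null and all different. Passing this check on a sample does not prove that an attribute is a key for every possible occurrence.

```python
def entity_keys(occurrences):
    """Return the attributes whose values are non-null and all different."""
    if not occurrences:
        return []
    keys = []
    for attribute in occurrences[0]:
        values = [occ[attribute] for occ in occurrences]
        if None not in values and len(set(values)) == len(values):
            keys.append(attribute)
    return keys


# Invented sample occurrences of the entity set 'person'.
people = [
    {"social security number": "111-11-1111", "name": "John Brown", "age": 40},
    {"social security number": "222-22-2222", "name": "John Brown", "age": 35},
    {"social security number": "333-33-3333", "name": "Mary Jones", "age": 40},
]

keys = entity_keys(people)
```

In this sample, name and age repeat, so only the social security number qualifies as an entity key.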
An association between entities is a relationship. Each relationship has a unique name. For example:

    ENTITY                                John    coffee
    ASSOCIATION                           prefers
    RELATIONSHIP - BEVERAGE PREFERENCE    John prefers coffee
TRANSFORM DESCRIPTION INTO ELEMENTARY RELATIONS (PHASE II)
The conceptual objects identified in Phase I are transformed into elementary relations. An elementary relation is not reducible. It is a relation with one of the following properties:
    (1) all key,
    (2) single attribute is functionally dependent on a second single attribute or
    (3) single attribute is fully functionally dependent on several attributes.

Execution of the following steps, in order, leads to elementary relations:
    (1) eliminate non-full functional dependencies on candidate keys
    (2) eliminate transitive dependencies on candidate keys
    (4) if necessary, determine single or composite primary key and take projections such that each projection has the primary key and one attribute from the complement of the primary key.
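The final step, taking projections that pair the primary key with one other attribute, can be sketched as follows. This is a minimal illustration: the function name `elementary_projections` and the relation PERSON(SSN, NAME, AGE) are hypothetical, and the sketch assumes that the earlier dependency-elimination steps have already been carried out.

```python
def elementary_projections(attributes, primary_key):
    """Pair the primary key with each attribute in its complement."""
    key = list(primary_key)
    others = [a for a in attributes if a not in key]
    if not others:
        # an all-key relation is kept as it is
        return [tuple(key)]
    return [tuple(key + [a]) for a in others]


# Hypothetical relation PERSON(SSN, NAME, AGE) with primary key SSN.
person_projections = elementary_projections(["SSN", "NAME", "AGE"], ["SSN"])

# Hypothetical all-key relation OPERATE(MACHINE ID, EMPLOYEE ID).
operate_projections = elementary_projections(
    ["MACHINE ID", "EMPLOYEE ID"], ["MACHINE ID", "EMPLOYEE ID"])
```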
For a given relation, an attribute B is functionally dependent on attribute A if, and only if, each value of A is associated with exactly one value of B at any given time. Attribute B is fully functionally dependent on attribute A if attribute B is functionally dependent on attribute A but is not functionally dependent on any proper subset of attribute A. In the case of dependence on a single attribute, there is no difference between functional dependence and full functional dependence. The dependence is usually shown as functional dependence (---->). Full functional dependence is shown as (====>).

For example:
    relation:       bank account
    attribute A:    bank account number
    attribute B:    account balance

Attribute B is functionally dependent on attribute A. Attribute B is also fully functionally dependent on attribute A.

A transitive dependency exists between attribute X and attribute Z in a relation if there is an attribute Y such that a value of Y cannot be associated with attribute X unless a value of attribute Z is associated with attribute Y. For example:

    RELATION: MANUFACTURE(PRODUCT ID, MACHINE ID, EMPLOYEE ID)

    MANUFACTURE
    PRODUCT ID    MACHINE ID    EMPLOYEE ID
    P44           M234          E12

EMPLOYEE ID is transitively dependent on PRODUCT ID through MACHINE ID.
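Functional and full functional dependence can be tested against sample rows. The sketch below is illustrative only: the function names and the sample account numbers and balances are assumptions made for the example. A sample can show that a dependence does not hold, but it cannot prove that one holds at every point in time.

```python
from itertools import combinations


def functionally_dependent(rows, a_cols, b_col):
    """True if every combination of values in a_cols goes with one value of b_col."""
    seen = {}
    for row in rows:
        a_value = tuple(row[c] for c in a_cols)
        if seen.setdefault(a_value, row[b_col]) != row[b_col]:
            return False
    return True


def fully_functionally_dependent(rows, a_cols, b_col):
    """Dependence on a_cols that holds for no proper, non-empty subset of a_cols."""
    if not functionally_dependent(rows, a_cols, b_col):
        return False
    for size in range(1, len(a_cols)):
        for subset in combinations(a_cols, size):
            if functionally_dependent(rows, list(subset), b_col):
                return False
    return True


# Invented sample rows for the bank account relation.
accounts = [
    {"ACCOUNT NUMBER": "1001", "BALANCE": 250},
    {"ACCOUNT NUMBER": "1002", "BALANCE": 250},
    {"ACCOUNT NUMBER": "1003", "BALANCE": 900},
]

balance_on_number = functionally_dependent(accounts, ["ACCOUNT NUMBER"], "BALANCE")
balance_fully_on_number = fully_functionally_dependent(
    accounts, ["ACCOUNT NUMBER"], "BALANCE")
number_on_balance = functionally_dependent(accounts, ["BALANCE"], "ACCOUNT NUMBER")
```

Because the determining attribute is single, the functional and full functional checks agree for the account balance. The reverse direction fails in this sample because two accounts share the same balance.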
A multivalued dependency exists for attribute B if the value of B depends on the value of attribute A and there is more than one value of B for a given value of attribute A in the relation. For example:

    RELATION: OPERATE(MACHINE ID, EMPLOYEE ID)

where a given machine is operated by any one of several employees

    OPERATE
    MACHINE ID    EMPLOYEE ID
    M234          E12
    M234          E22
    M234          E24

There is a multivalued dependence of EMPLOYEE ID on MACHINE ID.
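The situation in the OPERATE example can be detected by grouping the rows. The sketch below follows only the informal description given here, more than one value of B for a given value of A, and is not a test of the formal multivalued dependency of relational theory. The function name `multivalued` is hypothetical.

```python
from collections import defaultdict


def multivalued(rows, a_col, b_col):
    """Map each value of a_col that has several values of b_col to those values."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[a_col]].add(row[b_col])
    return {a: sorted(bs) for a, bs in groups.items() if len(bs) > 1}


operate = [
    {"MACHINE ID": "M234", "EMPLOYEE ID": "E12"},
    {"MACHINE ID": "M234", "EMPLOYEE ID": "E22"},
    {"MACHINE ID": "M234", "EMPLOYEE ID": "E24"},
]

employees_per_machine = multivalued(operate, "MACHINE ID", "EMPLOYEE ID")
```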
EXPAND LIST OF ELEMENTARY RELATIONS (PHASE III)
The expanded list of elementary relations includes the original list of elementary relations described in Phase II and elementary relations derived from those of the original list. The complete set of elementary [...] of elementary relations.

The steps to derive the expanded list are:
    (1) represent original list of elementary relations by digraph
    (2) represent digraph by connectivity matrix
    (3) modify connectivity matrix to represent transitive closure
    (4) eliminate meaningless elementary relations from transitive closure
The attributes of the elementary relation are represented by the nodes of the directed graph. A cycle of length one occurs at a node which represents an entity key or at a node which represents an elementary relation which is all key. The arcs of the digraph represent the dependencies. Arcs are named by the name of the elementary relation whose dependence is shown by the arc. The different types of dependence (functional, full functional and trivial) are distinguished by different line formats. A single line (---->) represents functional dependence. A double line (====>) represents full functional dependence. A broken line (- - ->) represents trivial dependence.
A digraph can be represented by a square matrix called a connectivity matrix. The size of the matrix is N x N where N is the number of nodes in the digraph. The first row and the first column are identified with the first node. The Nth row and the Nth column are identified with the Nth node. The connections between the nodes of the digraph are represented by either one or zero. One indicates that there is a connection between the nodes. Zero or a blank indicates that there is no connection. No distinction is made between the several types of dependence in the matrix.
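A connectivity matrix can be built from a list of arcs in a few lines. The sketch below is a minimal illustration under stated assumptions: the function name `connectivity_matrix` and the nodes A to D are hypothetical, and the convention that the row is the node an arc leaves and the column is the node it enters is a choice made for the example.

```python
def connectivity_matrix(nodes, arcs):
    """Build an N x N matrix; entry [i][j] is 1 for an arc from nodes[i] to nodes[j]."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in arcs:
        matrix[index[source]][index[target]] = 1
    return matrix


# Hypothetical digraph with arcs A -> B, B -> C and C -> D.
nodes = ["A", "B", "C", "D"]
arcs = [("A", "B"), ("B", "C"), ("C", "D")]
matrix = connectivity_matrix(nodes, arcs)
```

As in the description, the matrix records only whether a connection exists and not the type of dependence.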
Node A may be repeatedly dependent on node B. Node A is [...] repetitions. A descriptor is added to the node name to distinguish the multiple rows and columns. For example:

    STATE is part of HOME ADDRESS and SCHOOL ADDRESS
    H.STATE can be used for HOME ADDRESS
    S.STATE can be used for SCHOOL ADDRESS
A transitive closure is the set of all elementary relations, original and derived, which describes the environment of interest. The transitive closure is unique. The transitive closure for the original list of elementary relations can be created by identifying transitive dependencies of length 2 until no further such dependencies can be identified. The derived transitive dependencies are added to the connectivity matrix to form a modified connectivity matrix. Any new elementary relations derived in this way must be reviewed. Only those relations which have meaning in the environment of interest are retained in a modified connectivity matrix. This modified matrix defines a digraph which completely describes the environment of interest.

REMOVE REDUNDANT ELEMENTARY RELATIONS (PHASE IV)
A redundant elementary relation is an elementary relation which is a composition of other elementary relations. Removal of redundant elementary relations results in one or more minimal covers. Each minimal cover includes all of the information in the original elementary relations.

There are two conditions for the removal of an elementary relation. The first condition requires that the elementary relation be a composition of two other elementary relations. The second condition requires that an elementary relation can be removed only if it is not part of a composition of another elementary relation. For example:

ELEMENTARY RELATIONS
    ER1(A,B)
    ER2(B,C)
    ER3(C,D)
    ER4(A,C)
    ER5(B,D)
    ER6(A,D)

DIGRAPH
    [Figure: digraph with nodes A, B, C and D and arcs ER1 to ER6]

    REDUNDANT ELEMENTARY RELATION    COMPOSED OF
    ER4                              ER1, ER2
    ER5                              ER2, ER3
    ER6                              ER1, ER5
    ER6                              ER4, ER3

Consider removal of ER4
    Condition 1    yes    ER4 is composed of ER1 and ER2
    Condition 2    no     ER4 is part of the composition of ER6

Consider removal of ER6
    Condition 1    yes    ER6 is composed of ER1 and ER5
                          ER6 is composed of ER4 and ER3
    Condition 2    yes    ER6 is not part of another ER
    ER6 can be removed.

Consider removal of ER4 after removal of ER6
    Condition 1    yes    ER4 is composed of ER1 and ER2
    Condition 2    yes    ER4 is not part of another ER
    ER4 can be removed.

Similarly, ER5 can also be removed. The remaining elementary relations are non-redundant.

NON-REDUNDANT ELEMENTARY RELATIONS
    ER1 ER2 ER3
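The expansion to a transitive closure and the removal of redundant elementary relations can both be sketched in Python. This is an illustration under stated assumptions and not the procedure as published by Vetter and Maddison: each elementary relation is reduced to a single arc between two attributes, the function names and the "left*right" naming of derived arcs are hypothetical, the review for meaningless relations is omitted, and only one minimal cover is returned even though several may exist.

```python
def transitive_closure(relations):
    """Add compositions of length 2 until no new arc can be derived."""
    closure = dict(relations)
    changed = True
    while changed:
        changed = False
        for n1, (a, b) in list(closure.items()):
            for n2, (c, d) in list(closure.items()):
                if b == c and a != d and (a, d) not in closure.values():
                    closure[f"{n1}*{n2}"] = (a, d)
                    changed = True
    return closure


def compositions(arcs):
    """For each relation, list the pairs of other relations that compose it."""
    return {
        name: [(n1, n2)
               for n1, (a, b) in arcs.items()
               for n2, (c, d) in arcs.items()
               if b == c and (a, d) == arc and name not in (n1, n2)]
        for name, arc in arcs.items()
    }


def remove_redundant(arcs):
    """Repeatedly remove one relation that meets both removal conditions."""
    arcs = dict(arcs)
    removed = True
    while removed:
        removed = False
        comp = compositions(arcs)
        for name in list(arcs):
            is_composed = bool(comp[name])                      # condition 1
            used_elsewhere = any(name in pair                   # condition 2
                                 for pairs in comp.values()
                                 for pair in pairs)
            if is_composed and not used_elsewhere:
                del arcs[name]
                removed = True
                break
    return arcs


original = {"ER1": ("A", "B"), "ER2": ("B", "C"), "ER3": ("C", "D")}
expanded = transitive_closure(original)

all_six = {
    "ER1": ("A", "B"), "ER2": ("B", "C"), "ER3": ("C", "D"),
    "ER4": ("A", "C"), "ER5": ("B", "D"), "ER6": ("A", "D"),
}
cover = remove_redundant(all_six)
```

For the six relations of the example, the sketch removes ER6 first, then ER4 and ER5, and leaves ER1, ER2 and ER3, which matches the walk-through above.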
CHAPTER FIVE
THREE LOGICAL DATA MODELS OF THE GENEALOGICAL EXAMPLE

5.1 INTRODUCTION
This chapter includes the implementation of the three designs described in Chapter Four. The application selected for implementation is a genealogical example including instances of the types of fuzziness described in Chapter Two.
5.2 NATURE OF GENEALOGICAL DATA
The two problems solved by genealogists are (1) determination of the pedigree of an individual and (2) creation of a family genealogy.

The pedigree identifies the direct ancestors of an individual. Basic data in the pedigree are name, date and place of birth, date and place of marriage and date and place of death. The data are often presented on an Ancestor Chart (see Appendix A).

The family genealogy includes data on all members of the family. In the simplest form it includes the relevant data for a husband, wife and their children. The data may be entered on a Family Group Sheet (see Appendix A).