Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
1987
Display of molecular models with interactive
computer graphics
Steve Wuter Yang
Follow this and additional works at:
http://scholarworks.rit.edu/theses
This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion
in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact
ritscholarworks@rit.edu
.
Recommended Citation
To
my
wife?Li-Yun?
for
her
encouragement? andlove
Acknowledgement
I
wouldlike
to
thank
my
adviser? professorGuy
Johnson?
for
his
guidance and encouragementduring
my
thesis
work.I
alsothank
professorDonald
Kreher
andprofessor
Andrew
Kitchen
for
their
suggestions andcr
it icisms.
Also?
I
amvery
gratefulto
rny
parents?brothers?
sisters? relatives? and
friends
for
their
moral supportand
financial
support.Lastly?
I
thank
my
lovely
daughter?Kail ing?
for
her
n 2 it n n n c E J^.:
1-
Introduction.
...i_- i."
Display
ofMo 1
ec u1
arModel
.2
.!. 2
.Relat ional
Database
Model
. ...9
.!_
3..
Cornp
uterGraphics
Overview.
...13.
!.
4
Vect
ri
y,Graphics
Processor.
... t ...1 4
= 3."Relat ional
Database
Management
in
the
CAMP
System. 18
=2
.1_
Relat ional
Model
Approach.
...18.?. 2
Xq.
D
i
sp1
ay aMolecule
in
Data
Poo
1
. ...32.
2.-
=[Primary
Key
Handl
ing
withIndex
Tree.
...39.
2. 4.
The
CAMP
System
Program
Coding
Charts.
...44.
si-
Computer
Graphics
in
the
CAMP
system. ...60.
S
i."
Tr
ansf
or mati
on andProj
ection.
...60.
3. 2.
Changing
Color
Table
Entries.
...67.
3. 3..
Algorithm
to
Draw
aSphere.
...70
.3
.4
.Shading
aPolygonal
Facet
in
Sphere.
...81.3." 5"
Vectr ix
Tools
in,
CAMP
System.
...86.
4
.Proper
t ies
ofMolecule
Structure.
...91.4
. JUPrototype
ofAtom.
...91.4. 2
.Prototype
of
Molecule.
...93
.4. 3.
Prototype
ofGeometric
Model
. ...96.
5"
Conclusions
andF
uturePirect ions.
...105.
&
Bib 1
iography
. ... ....111./ n
APPen d j
eX... no. ...."... "...""... 11*TeP^sa
1
Abstract
Computer
graphicsdevices
offer away
to
presentimages
ofthree-dimensional
objects with enoughdetail
to
aidhuman
understanding
ofthe
structureof molecules.
In
this
thesis?
the
Computer
Aided
Molecular
Design
system(CAMD
system) willillus
trate
this
idea.
The
CAMD
systemis
an educational package
designed
for
studentsin
high
school.The
user ofthe
CAMD
system can createdifferent
molecules and store
these
moleculesinto
a relational
data
base
for
future
display.
A
Relational
Data
E-iase
Management
system wasimplicitly
imple
1.
Introduction.
1.1.
Display
ofMolecular
Model.
Chemists
have
used mechanical modelsto
study
andunderstand
the
structure of molecules?particularly
large?
organic ones.
These
models representthe
locations
ofatoms
in
the
molecule relativeto
one another.In
this
project? users ?
especially
the
studentin
high
school-can create and retrieve
different
moleculesto
display
onthe
screen.The
CAMD
system produces colored? andbal
1-and-st ick
renditions of molecular models with
highlighting
(illumination?
shining? or shading).The
purposeis
to
allow
humans
to
visualise moleculesinteractively
in
amanner similar
to
that
offeredby
plasticbal
1-and-st
ick
models.
Some
basic
geometric structures are supplied astools
for
usersto
create new molecular models.These
geometric structures are
line?
triangle?
trigonal
plane?square plane?
tetrahedron?
trigonal
pyramid?trigonal
bi-pyramid? octahedron? and square pyramid.
For
example?both
methane(CH4)
and carbontetrachloride
(CC14)
aretetrahedral
models.If
these
two
molecular modelshave
to
be
created?then
these
two
tetrahedral
models willbe
page j
of carbon will
be
the
same of color and size atany
time
whenever
it
appearsin
any
geometric modelin
the
CAMP
system.
During
the
last
fifteen
years a number of graphicsystems
have
been
developed
for
researchersto
model andstudy
molecular structures.The
ATOMS
systemhas
been
developed
atBell
Laboratories.
ATOMS
is
a set ofFORTRAN
subrout
ines
for
depicting?
in
color? configurations ofspace-filling
atoms and/orbal
I-and-st
ick
models.Pictures
are rendered asin
traditional
cartoon animation:solid monochrome areas separated
by
thin
black
lines
C253.
The
user supplies x? y? z? radius? and colorinformation
for
atoms? atom pairs?thickness
and colorfor
bonds?
alsoparameters
defining
rulesfor
viewing
and picturedrawing-including
position and orientation of ahypothetical
camerain
use-defined space.The
systemis
ideally
suitedfor
use with afilming
unitthat
containsor
is
driven
by
a minicomputer-The
host
computer canthen
performvisibility
calculations?yielding
parameterized
descriptions
of visible patches of variouscolors? a routine resident
in
the
mini generatesthe
appropriate set of parallel vectors
to
fill
solidly
eachpatch area.
The
systemis
intended
for
both
educationalstills and movies? and
for
research purposesthat
benefit
from
good visualdisplay
ofhypothetical
morphologies andaffordable-computation
yielding
approximate perspectiveviews of user-defined situations.
This
is
achievedby
using
simplified picture elements and simple? approximatecalculations while
maintaining
visualvalidity
for
the
configurations
that
make sensefor
chemistry.Picture
elements
treated
arein
fact
bounded
only
by
circular arcsand straight
lines?
efficiency
is
further
enhancedby
doing
most ofthe
computation onthe
picture plane-acircumstance
commonly
called"2
1/2
D"which
deals
withflat
objectsthat
hide
pieces of one anotheraccording
to
an
ordering
in
depth.
The
intended
pictorial outputconsists of
filled?
solid-color areas separatedfrom
oneanother
by
thin
black
lines
asin
standard cartoon cellanimation.
This
is
done
normally
eitherby
exposing
colorfilm
directly
in
afilming
device
containing
a colorwheel? or
by
making
the
appropriate sequence of colorseparation negatives?
to
be
combined and coloredlater
by
optical printing.
This
systemis
particularly
well suitedfor
use with afilm
exposing
device
containing
ordriven
by
a minicomputer:the
main(host)
computer'sjob
is
then
to
performvisibility
calculations?delivering
to
the
actual picture-drawer a
list
of parameterizeddescriptions
of patches of various colors?
the
mini contains a programfor
calculating
the
many
parallel vectorsthat
effectively
fill
the
patch.This
is
the
system withthe
following
properties:
it
treats
bal 1-and-st
ick
models?page
b
perspective pictures
in
color withhidden
surfacesremoved?
it
is
computationally
fast.
Chemically
Oriented
Graphics
System
(COGS)
is
amolecular
modelling
system?for
small and macro-rnolecuies?which
incorporate
a wide range offunctionality
has
been
developed
C263.
The
systemis
*user friendly' andis
controlled almost
exclusively
by
a puck(mouse)?
in
amanner akin
to
the
Apple
Macintosh.
The
systemis
writtenin
FORTRAN
77
andthe
graphics adheresto
the
CORE
standard? so
that
a reasonabledegree
ofportability
is
assured.
COGS
is
a molecularmodelling
programdesigned
to
be
usefulin
four
major areas: small moleculestereochemical
design?
modelbuilding?
and comparison?investigation
ofthe
interaction
between
organic orinorganic
crystallattices
with small and medium sizedorganic molecules? enzyme-substrate or enzyme-inhibitor
docking
studies? and protein engineering.The
systemis
designed
to
be
as comprehensive as possible? whileremaining
easy
to
use.COGS
wasdeveloped
by
White
andPearson
in
1985?
UK.
Most
ofthe
facilities
in
COGS
perform
their
operationsimmediately?
withthe
obviousexception of molecular mechanics and molecular
dynamics
calculations? and conformational search procedures.
COGS
incorporates
extensive errorchecking
sothat
it
is
very
difficult?
but
not yetimpossible?
to
crashthe
programby
user
is
always given anopportunity
to
correctit
(interactively)
withoutlosing
placein
a sequence ofcommands.
There
are a number of main menu entriesin
COGS?
each of whichis
the
head
of atree
of a number ofsubmenus.
The
"draw
molecules"option allows models
to
be
constructed
from
a2D
structural chemicalformula
ofthe
desired
moleculedrawn
withthe
puck.This
occursin
exactly
the
sameway
asit
wouldbe
drawn
by
a chemistusing
pencil and paper.This
2D
diagram?
of connectedatoms of
different
types?
is
convertedautomatically
by
COGS
into
afull
3D
molecular representation whichis
then
displayed
onthe
screen.Models
constructed withdraw
molecules option will overwrite
any
other molecule(s)
currently
in
the
workspace.COGS
consists of afamily
ofsubroutines
(around
150)?
each with a single welldefined
and
logically
distinct
function?
which communicate witheach other via
FORTRAN
commonblocks
(arguments
to
subroutines are avoided where
sensibly
possible).Extensive
useis
made of structuredprogramming
techniques
and
long
variable names(with
words separatedby
underscores) so
that
the
intent
of each subroutineis
clear and
may
almostbe
read asEnglish
text.
In
additioneach subroutine
has
a multilinetext
header
describing
its
function?
entry
points? referencesto
the
algorithm used?page
7
The
MAGIC
system(Modeling
And
Graphics
In
Chemistry)
has
been
developed
for
interactively
handling
andmanipulating
in
the
computer moleculesusually
dealt
within
organic orinorganic
chemistry
C273.
The
emphasishas
been
put onfast
input?
display?
andediting
ofstructures.
MAGIC
overcomesdisturbing
tracking
subpictures
for
the
light
pen andthe
clumsy
operationby
simultaneously
manipulating
light
pen andkeyboard.
The
system
is
designed
for
easy
use?is
self explanatory? anddoes
not requireany
complicated commandlanguage.
All
functions
ofthe
system areinvoked
via menususing
alight
pen.The
systemhas
powerfulmodeling
anddisplay
functions
allowing
for
directly
modeling
molecularstructures with a
light
pen.The
MAGIC
systemis
designed
by
Bauer
andSchubert
atOrganisch-Chernisches
Institute?
West
Germany?
in
1982.
In
orderfacilitate
direct
editing
of structures stick
diagrams
are usedto
display
the
structures on
the
screen.More
elaborate pictures withatoms represented as spheres and
hidden
lines
eliminatedcan
be
generated on a plotter.The
functions
ofMAGIC
may
be
conceptually
divided
into
modeling
functions
andthose
for
computing
structural parameters andsaving
orrecalling
models.Modeling
a structure comprisesentering? editing? and
display
the
structure.Stick
diagrams
appear onthe
display
in
orderto
keep
the
numberof subpictures small and
to
avoidflickering
ofthe
highlighted
to
give abetter
impression
ofthe
spatialarrangement of
the
atoms.More
elaborated representationslike
ball
and stickdiagrams
ordiagrams
of spacefilling
models are not chosen
to
appear onthe
screen.These
representations
usually
containtoo
many
lines
to
allowspecific
light
penhits
for
editing
ofthe
picture.MAGIC
fully
utilizesthe
storage capabilities of a computer.Complete
models orthose
being
developed?
substructures orbuilding
blocks
may
be
stored and retrievedby
MAGIC.
This
way
the
useris
ableto
add?modify
ordelete
items
in
the
collection ofbuilding
blocks
maintainedby
the
system.
Therefore?
MAGIC
may
be
usedto
build
adata
bank
of structures and
their
models.Instead
ofstoring
generated structures?
they
can eitherbe
printed orpunched
in
interpreted
form
andthen
be
usedfor
otherpurposes.
The
interpreted
form
contains all pertinentdata
for
reconstructing
the
three
dimensional
model sothe
data
may
be
transferred
to
other programs orinstallations.
Many
structural properties of moleculesare
readily
described
in
terms
ofdistances
and anglesbetween
atoms?bonds?
lines?
and planes withinthe
molecule.
MAGIC
therefore
allowsfor
calculation of allthese
parameters.The
operationsfor
calculating
geometric parameters are
loaded
from
a menu andthe
atoms?bonds?
lines?
or planes are specified as attributesthrough
light
penhits.
The
value ofthe
geometricpage
9
1.2.
Relational
Database
Model.
A
"database"is
a computerized storehouse ofdata?
The
data
is
organized sothat
it
canbe
retrieved andprocessed?
for
one andonly
one purpose:
to
answerquestions.
A
"database
management system"is
a computerprogram
that
works as adata
librarian.
The
database
management system
(DBMS)
organizesthe
electronicequivalent of
bookshelves.
It
placesdata
on shelves?retrieves
data
from
the
shelves? and combines andmanipulates
data
according
to
instructions
providedby
the
user.
It's
like
having
your own personal assistant whoremembers
data
you provide.keeps
track
of whereit
is
and assembles
the
data
in
various waysto
provide answersto
questions you ask.A
data
base
canbe
analyzedfrom
two
viewpointsthe
physical storage ofthe
data
andthe
logical,
orconceptual, view of
data.
Files
are usedto
physically
store
data
in
adata
base.
Most
data
bases
use eitherdirect
files
orindexed
files?
or a combination ofthe
two?
to
physically
storedata
ondisk.
The
logical,
or conceptual, view of adata
base
is
concerned with
how
data
is
logically
organized andhow
the
data
canbe
retrievedfor
information
purposes.The
three
base?
a networkdata
base?
and a relationaldata
base.
A
hierarchical
data
base
consists of elements which actin
aparent-child
relationship.
This
is
the
oldest ofconceptual
data
base?
such asIMS
from
IBM.
A
networkdata
base
actsin
muchthe
same manner as ahierachical
data
base
exceptthat
an element canhave
morethan
oneparent.
In
networkdatabase
nomenclature? a parentis
called an owner and
the
childis
called amember-A
relationaldata
base
is
the
newest conceptual viewof
data
and offersmany
advantages.The
mostimportant
advantage
is
that
the
relationshipsbetween
data
canbe
determined
dynamically
atthe
time
the
user requestsinformation
from
the
data
base.
To
begin?
a relationis
defined
as a conceptualfile
where each conceptual recordis
unique and each conceptual recordhas
the
same numberand
type
offields.
A
conceptualfile
containing
recordsis
also called atable
and each record withinthe
file
is
sometimes called a
tuple.
The
relationaldata
base
modelwas selected
for
the
CAMD
system.Binary
tree
algorithms andindex
sequentialalgorithms can
be
usedto
create and search molecularmodels
in
the
relationaldata
base.
For
example? a usermay
wishto
store ordisplay
a molecule of water<H20)
.A
user will go
to
a relationaltable
to
check whetherthis
page
11
then
the
user stores some relevantdata
for
this
molecule.The
name ofthe
molecule shouldbe
aprimary
key
in
the
relational
table
to
avoidduplicate
molecules.If
user-wants
to
display
this
molecule?the
CAMD
system will goto
one relational
table
calledGEOMETRY
to
getthe
centralvalue of each atom, and go
to
another relationaltable
called
ATOM
TABLE
to
get radius and colorfor
each atom.The
name ofthe
basic
geometric structure also willbe
akey
in
atable.
The
following
relations existin
the
CAMD
system.relation
l:
MOLECULE (
MOLE
.NAME
,MODEL
.T YPE )
.relation
25
ATOM
TABLE
(
ATOM. NAME,
RADIUS,
COLOR).
relation
3:
LINEAR (
MOLE.
NAME,
AT0M.1?
ATOM.
2,
ATOM. 3).
relation
4:
TRIANGLE(
MOLE. NAME,
AT0M.1,
ATOM. 2,
ATOM.
3).
relation
5:
TR
I GONAL
PL ANE (
MOLE
.NAME
,ATOM
.1
,ATOM
.2
,ATOM
.3
,ATOM
.4 )
.relation
6:
SQUARE
PLANE
(MOLE. NAME,
AT0M.1,
ATOM. 2,
ATOM.
3,
ATOM. 4,
ATOM. 5)
relation
7:
relation
8:
TETRAHEDRAL ( MOLE
.NAME
,ATOM
.1
,ATOM
.2
?ATOM
.3
?ATOM
.4
,ATOM
.5
)
relation
9:
TRIGONAL
B I
-PYRAMID(MOLE. NAME,
ATOM.l,
ATOM. 2,
ATOM. 3,
ATOM. 4,
ATOM. 5,
ATOM. 6).
relation
10:
SQUARE
P YRAM I D ( MOLE
.NAME
?ATOM
.1
,ATOM
.2
,ATOM
.3
,ATOM. 4,
ATOM. 5,
ATOM. 6)
.relation
11:
OCTAHEDRAL (MOLE. NAME,
ATOM.l,
ATOM. 2?
ATOM. 3,
ATOM. 4,
ATOM. 5,
ATOM. 6?
ATOM. 7)
relation
12:
GEOMETRY (
GEOM
.NAME
,X
.1
,Y
.1
,Z
.1
,RAD
I
US
.1
?COLOR
.1
,X.2,
Y.2,
Z.2,
RADIUS.
2,
COLOR. 2,
X.3,
Y.3,
Z.3,
RADIUS. 3,
COLOR. 3,
X.4,
Y.4,
Z.4,
RADIUS.
4?
COLOR. 4?
X.5,
Y.5,
Z.5,
RADIUS. 5?
COLOR. 5,
X.6?
Y.6?
Z.6?
RADIUS.
6,
COLOR. 6,
page
13
1.3.
Computer
Graphics
Overview.
Computer
graphicsis
often usedto
assistbusiness
executives
in
analyzing
data
andto
assistindividuals
in
preparing
data
for
analysis and presentationto
others[73.
Computer
graphics allowsinformation
to
be
displayed
in
the
form
of charts? graphs? or pictures sothat
the
information
caneasily
andquickly
be
understood.A
picture
is
the
fundamental
cohesive conceptin
computergraphics.
This
thesis
considershow
molecules arerepresented
in
computer graphics?how
molecules areprepared
for
presentation? andhow
interaction
withthe
molecule
is
accomplished.Many
computer graphicsapplications
involve
the
display
ofthree-dimensional
objects and scenes
L'93.
For
example? computer-aideddesign
systems allowtheir
usersto
manipulate models ofmachined components? automobile
bodies
and aircraft parts.These
applicationsdiffer
from
two-dimensional
applications not
only
in
the
addeddimension?
but
they
also require concern
for
realismin
the
display
ofobjects.
In
this
thesis,
the
Computer
Aided
Molecular
Design
system(the
CAMD
system)is
a3-D
computer graphics1.4.
Vectrix
Graphic
Processor.
In
the
CAMD
system,the
Vectrix
'v'X384,
a graphicprocessor? and
Vectrix
VXM
monitoris
usedto
display-molecules.
The
VX384
canbe
used withany
computerhaving
a serial or parallel
interface.
This
processor canbe
operated with
any
operating
system orlanguage.
The
CAMD
system was
implemented
in
the
C
programming
language
running
underthe
Unix
operating
system.The
VX384
candisplay
512
colorssimultaneously
from
a palette of
16
million colors andhas
a resolution of672
x
480
pixels.The
screenis
laid
out as a rectanglein
2-D
mode.The
pixels are numbered0
-671
along
the
horizontal
axis, and0
-479
onthe
vertical axis.The
coordinate
(0,0)
is
in
the
lower
left
hand
corner ofthe
screen;
The
coordinate(671,
479)
is
in
the
upper righthand
corner-The
Vectrix
graphic processoris
designed
to
relieve
the
host
computer of graphic calculations andto
relieve
the
programmer of subroutine and procedure calls.The
Vectrix
VXM
monitoris
ahigh
quality,high
resolution monitor well matched
to
the
capabilities ofthe
frame
buffer.There
are almost76
available commandsthat
can
be
usedin
interactive
modeto
demonstrate
use ofthe
Vectrix.
The
Vectrix
graphics processor offers a widevariety
of geometricforms
that
canbe
combined onthe
page
15
These
basic
forms
are called graphics primitives.The
Vectrix
command setincludes
primitivesto
draw
dots?
lines?
polygons? arcs? patterns? and severaltypes
ofcolor
fills
andfloods.
Most
ofthese
commands workin
both
two
dimensional
andthree
dimensional
modes;D
(Dot)?
M
(Move),
L
(Line)?
andP
(Polygon),
andF
(Filled
Polygon)
commands workin
either2-D
or3-D
space.These
commands will
be
described
moredetail
in
chapter3.5
ofthis
thesis.
The
color manipulation commands allow precise colorcontrol
for
graphics andtext
primitives andthe
display-background.
Colors
canbe
mixedin
several waysto
createvery
subtleshading
effects.Colors
can alsobe
usedto
hide
images
in
the
background
until a new colordefinition
puts
them
in
the
foreground,
making
passible animation andhidden
drawing
effects.The
graphicsmemory
is
organized as ninebitplanes,
and each pixel's color
is
determined
by
varying
combinations of
the
ninebits.
The
VX384
usesthe
selected color number as a nine
bit
index
to
a colorlookup
table
wherethe
color values are stared.This
index
is
groupedinto
three
bits
of red?three
bits
ofgreen, and
three
bits
ofblue.
Since
there
are ^combinations possible,
512
colors canbe
storedin
the
is
referenced,three
stored eightbit
values are outputto
the
RGB
monitor-These
three
valuesare amounts of red,
green, and
blue
anentry
willdisplay.
They
controlthe
intensities
of each ofthe
three
color gunsin
the
monitor.
Using
eightbits
for
each color allows each
color value
to
be
from
0
to
255.
Over
16
million
<
255
)
colors canbe
createdby
combining
the
red, green, andblue
components atdifferent
levels.
Changing
the
contents of a colortable
entry
changesthe
colorthat
willbe
displayed
when a color commandreferences
the
table.
The
display
is
driven
by
the
table
contents at
time
it
is
scanned.For
example? anentry
canbe
madeto
display
agrey
by
setting
each ofthe
three
stored values? red? green and
blue?
to
the
same value.A
256
entry
grey
table
canbe
built
by
giving
entries0
through
255
equalincremental
valuesfor
red? green? andblue.
The
default
colortable's
512
entries are arranged sothat
the
lower
three
bits
of color numberindex
representthe
red components?the
middlethree
bits
representthe
greens?
the
high
bits
representthe
blues.
As
the
valueof each
group
ofbits
increases,
the
intensity
ofits
page i/
The
algorithm usedto
determine
the
default
colortable
valuesis
this:
The
last
entry?#511,
is
setto
R=255?
G=255?
B=255.
Red?
green andblue
values aredecreased
by
32
in
nestedloops
until ailthree
equal31.
Red
is
decremented
first?
then
green?then
blue.
The
first
entry,#0?
is
setto
R=0?
G=0,
B=0.
Some
other2.
Relational
Database
Management
in
the
CAMD
system.2.i.
Relational
Model
Approach.
A
relationaldata
base
is
madeup
ofany
numberrelations.
Each
relationis
given a name.For
example?"ATOM
TABLE"is
one ofthe
relationin
Computer
Aided
Molecular
Design
system(CAMD
system).The
relation canbe
viewed as atable
that
is
madeup
of a number of rowsand columns.
Each
columnis
called an attribute and givena name such as
ATOM. NAME,
RADIUS,
andCOLOR
in
"ATOM
TABLE"
relation.
The
rows ofthe
relation are calledtuples
or record and containthe
data.
such as?(H?
0.1250?
GREEN)
is
atuple
ofdata.
The
valuesin
eachcolumn of each row come
from
adomain
of values.One
suchdomain
is
associated with each attribute anddefines
the
range of allowed values
for
that
attribute.The
instance
of a relation
is
the
content ofthe
relation at aparticular
instant
oftime.
There
is
adistinction
between
domain
and attribute.A
domain
is
a set of values? as suchit
may
appearin
morethan
one relation or sometimes morethan
oncein
the
samerelation.
It
is
commonto
choosedomain
namesto
signify
value sets?
for
example,INTEGER?
ALPHA?
NUMERIC
aredomain
names.Attribute
names? onthe
other hand? arepage
19
enterprise.
COLOR
is
the
third
attribute namein
"ATOM
TABLE"
relation.
It
meansthat
atomH
willbe
drawn
in
green.
Each
relation possess several rules asfollows:
(1)
There
is
one columnin
the
relationfor
each attribute ofthe
relation.Each
such columnis
given a namethat
is
unique
in
the
relation.(2)
The
entriesin
the
columncome
from
the
samedomain.
(3)
The
order ofthe
columnsor attributes
in
the
relationhas
no significance.(4)
The
order ofthe
rowsis
not significant.(5)
There
areno
duplicate
rows.The
mostimportant
term
of allis
the
relation
key.
This
key
is
the
attribute or set ofattributes
that
uniquely
identifies
tuples
in
a relation.A
relationkey
is
formally
defined
as a set of one or morerelation as attributes concatenated so
that
the
following
three
propertieshold
for
alltime
andfor
any
instance
ofthe
relationC43:
(1)
Uniqueness
:
The
set of attributestakes
on a unique valuein
the
relationfor
eachtuple.
(2)
Nonredundancy
:If
an attributeis
removedfrom
the
rest of attributes.
the
remaining
attributesdo
notpossess
the
uniqueness property.(3)
Validity
:No
attribute value
in
the
key
may
be
null.Null
valueis
aspecial value
that
is
usedto
represent"value
unknow" or
"value
inapplicable".It
is
notthe
same asblank
orIt
is
possiblefor
relationsto
have
morethan
onerelation
key.
Each
key
is
madeup
of adifferent
set ofattributes? and
is
often calledthe
candidatekey.
If
akey
is
the
only
key
of relation?it
is
generally
referredto
asthe
primary
key.
There
is
only
oneprimary
key
implemented
in
CAMD
system.A
primary
key
also canbe
aprimary
key
for
another relation. such as?MOLE. NAME?
which
is
a attribute name? canbe
aprimary
key
ofboth
"MOLECULE"
table
and "TRIANGLE"table.
When
an attributein
one relationis
akey
of anotherrelation?
the
attributeis
called aforeign
key.
The
term
means
that
the
attributeis
akey?
but
in
aforeign
relation.
Thus?
in
the
CAMD
system? attribute"MODEL.
TYPE"of "MOLECULE" relation
is
aforeign
key.
To
construct
the
relationship?the
system will match"MODEL.
TYPE" with values of"GEOM.NAME"
in
"GEOMETRY"relation.
Foreign
keys
areimportant
whendefining
constraints across relations.
For
example?the
database
designer
may
wantto
specify
that
no "GEOMETRY"tuple
canbe
deleted
if
its
"GEOM.NAME"is
a value offoreign
key.
The
relational model wasfirst
proposedby
Dr.
E.
F-Codd
in
a seminal paperin
1970
C53.
sincethen?
there
have
been
many
implementations
ofthe
relational approach.System
R
is
a wellknown
relationaldata
base
system.It
page
21
the
IBM
San
Jose
Research
Laboratory.
All
accessto
this
database
is
via adata
sublanguage calledSQL.
The
original version of
SQL
wasbased
on an earlierlanguage
called
SQUARE.
The
two
languages
areessentially
the
same.
However?
SQUARE
uses a rather mathematical syntaxwhereas
SQL
is
much moreEnglish-like.
SQL
is
an acronymfor
"Structured
Query
Language".
It
provides notonly
retrieval
functions
but
also afull
range of updateoperations.
It
includes
bath
adata
definition
language
(DDL)
and adata
manipulationlanguage
(DML)
.DDL
provides
for
the
definition
ordescription
ofdatabase
objects? such as
PEFINE
operation,DESCRIBE
operation.DML
supportsthe
manipulation orprocessing
ofdatabase
objects? such as
SELECT
operation?DELETE
operation.The
more
detail
description
canbe
found
in
C173.
The
CAMD
system provides aDDL
andDML
based
on someideas
from
SQL.
SQL
is
a self-contained commandlanguage
classified
by
mode.The
CAMD
systemis
a self-containedmenu selection
language.
In
traditional relationaldata
base
system?there
areseveral
kinds
of operations? such asDEFINE?
DESCRIBE?
INSERT,
DISPLAY,
SELECT,
PROJECT,
DELETE,
andJOIN.
The
DEFINE
operationis
usedto
create a relation.The
DESCRIBE
operationis
usedto
describe
the
properties ofoperation
is
usedto
add one newtuple
ofdata
in
arelation.
The
SELECT
operationis
usedto
extract certaintuples
(rows)
from
a relation.The
PROJECT
operationis
used
to
extract attribute(s)
(columns)
from
a relation.The
DELETE
operationis
usedto
removetuples
from
arelation.
The
DISPLAY
operationis
usedto
showthe
wholerelation.
The
JOIN
operationis
usedto
combinerelat
ions.
The
CAMD
systemdoes
not provide aSQL-like
language
to
do
interactive
queries. However? some ofthese
commands are
implicitly
performedin
the
CAMD
system.It
is
simpleto
usethe
SQL
language.
SELECT-FROM-WHERE
is
the
fundamental
structure ofSQL
language
commands.The
SELECT
expression specifiesthe
name of all attributes ofrelation.
FROM
specifiesthe
relationto
be
used, and newexpression?
WHERE?
providesthe
conditionsfor
the
page
23
There
are severalSQL-like
examplesbelow.
Note
that
no such
interface
is
currently
implemented
as part ofthe
CAMD
system.(1).
To
create a relationusing
DEFINE
operator-DEFINE
MOLECULE
FIELD
MOLE. NAME
TYPE
CHARACTER
( 10)
PRIMARY
KEY
F
I ELD
MODEL
.TYPE
TYPE
CHARACTER
(
20 )
,(2)
.To
add newdata
using
INSERT
operator.INSERT
INTO
ATOM
TABLE (
ATOM.
NAME,
RADIUS,
COLOR)
VALUES
<
'H
' -1
.250
, 'GREEN
'>
;
(3).
To
project one columnusing
PROJECT
operatorPROJECT
MOLE.
NAME
FROM
MOLECULE;
(4)
.To
select onetuple
using
SELECT
operator.
SELECT
MOLE
.NAME
,MODEL
.TYPE
FROM
MOLECULE
(5).
To
display
wholetable
using
DISPLAY
operator.DISPLAY
ATOM
TABLE;
(6).
To
select onetuple
withspecify
attributesusing
SELECT
operator-SELECT
ATOM.l,
ATOM. 2,
ATOM.
3
FROM
TRIANGLE
WHERE
MOLE. NAME
='S02'?
(7).
Subquery
with comparison operatorto
get onetuple,
SELECT
*FROM
ATOM
TABLE
WHERE
ATOM.
NAME
-(SELECT
ATOM. 1
FROM
TRIANGLE
WHERE
MOLE. NAME
=*S02'>;
(8).
To
remove onetuple
data
using
DELETE
operator.DELETE
GEOMETRY
WHERE
GEOM
.NAME
= 'page
25
(9).
To
combinetwo
relationusing
JOIN
operator-JOIN
MOLECULE,
TRIANGLE
WHERE
MOLECULE.
MOLE. NAME
=TRIANGLE. MOLE. NAME;
(10).
To
describe
properties of a relationusing
DESCRIBE.
DESCRIBE
ATOM
TABLE
Actually,
the
CAMD
systemdoes
not allowthe
userto
employ
this
kind
of commandlanguage,
but
the
CAMD
systemdoes
implement
all ofthese
operationsin
its
internal
processes.
The
systemhas
been
made morefriendly
to
the
user
by
using
aninteractive
environment.It
may
ask userquestions.
It
is
notnecessary
to
realizeSQL-like
language.
CAMD
does
not support all relationaloperations.
For
example,in
the
PROJECT
operation,the
CAMD
systemonly
projectsthe
first
attributefrom
any
onerelation.
It
does
not support projection ofany
otherattributes,
because
it
is
notvery
important
to
projectany
other attributesto
the
user ofthe
system.To
project
the
primary
key
from
any
table,
it
is
good enoughto
know
how
many
molecules existedin
the
system,how
many-kinds
of geometric model are offeredin
the
system? orThe
other reasonthat
systemdoes
not usethe
traditional
query
language
is
to
avoidthe
userinserting
data
in
one relation withoutinserting
data
to
anotherrelation.
Suppose
the
user adds onetuple
of new moleculedata
(S02?
TRIANGLE)
to
relationMOLECULE (MOLE. NAME,
MODEL. TYPE).
Later
on?the
system can notfind
the
S02
tuple
in
relationTRIANGLE (
MOLE. NAME?
ATOM.l,
ATOM.
2,
ATOM. 3).
So
the
systemautomatically
requests userinput
data
for
relationTRIANGLE.
The
CAMD
systemdoes
notprovide a parser system
to
analyzethe
query
language
according
to
the
syntax.From
the
user's point of view,it
is
more convenient? since a userfollows
the
menuto
do
data
retrieval.Here
is
an examplethat
employs aSQL-like
language
to
illustrate
the
underlying
operations.Assuming
that
a user wantedto
display
moleculeS02
(Sulfur
dioxide)
onthe
Vectrix
graphic screen.Also,
assuming
that
the
CAMD
system supportsSQL-like
query
page
27
step
1.
To
find
whatis
the
geometric modeltype
ofmolecule
S02
(Sulfur
Dioxide):
SELECT
MODEL. TYPE
FROM
MOLECULE
WHERE
MOLE. NAME
='S02';
The
result:MODEL. TYPE
TRIANGLE
step
2.
To
find
which atomlocate
on which position.JOIN
MOLECULE
AND
TRIANGLE
WHERE
MOLECULE. MOLE. NAME
=TRIANGLE. MOLE. NAME
GIVING
Rl;
The
result: relationRl
MOLE. NAME
MODEL. TYPE
ATOM.l
i
ATOM.
2
ATOM. 3
S02
TRIANGLE
0
0
step
3.
To
get radius and colorfar
first
atom:JOIN
Rl
AND
ATOM
TABLE
WHERE
Rl. ATOM.l
=ATOM
TABLE.
ATOM.
NAME
The
result: relationR2
!
MOLE. NAME
1
MODEL.
TYPE
ATOM. 1 !
ATOM. 2!
ATOM.
3!
RADIUS!
COLOR !
!
S02
!
TRIANGLE
S
!
0
!
0
i
1
.850
!
BLUE
:
step
4.
To
get rid of unrelated atom:SELECT
MOLE. NAME,
MODEL. TYPE,
ATOM.l,
RADIUS,
COLOR
FROM
R2
WHERE
MOLE. NAME
= 'S02'GIVING
R3;
The
result: relationR3
!
MOLE. NAME
!
MODEL. TYPE
ATOM.
1
!
RADIUS
COLOR
:
!
S02
!
TRIANGLE
S
1
1
.850
BLUE
!
step
5.
To
find
first
set of(x,
y, z) coordinate value!SELECT
GEOM
.NAME
,X
.1
,Y
.1
?Z
.1
FROM
GEOMETRY
WHERE
GEOM.NAME = 'TRIANGLE'GIVING
R4;
The
result: relationR4
GEOM.NAME
X.
1
Y.
1
Z. 1
page
29
step
6.
To
get completed messagefor
first
atom.JOIN
R3
AND
R4
WHERE
R3. MODEL. TYPE
=R4. GEOM.NAME
GIVING
R5;
The
result: relationR5
i
MOLE. NAME!
MODEL. TYPE!
ATOM. 1 !
RADIUS!
COLOR
i
X. 1
!
Y.
1
i
Z.I!
!
S02
!
TRIANGLE
!
S
!
1
.850
i
i
BLUE
1
0
.0 !
""
1
0.0!
0.0!
After
getting
relationR5?
the
data
ofthe
first
atomof molecule
302
willbe
sendto
graphic systemto
draw
first
sphere onVectrix.
then
the
CAMD
system repeatsthe
same retrieval
from
step
(3)
to
step
(6)
to
getthe
secondatom
data
and sendsto
graphic systemto
draw
the
secondatom on
Vectrix.
Again?
to
repeatthe
rest of atoms untilto
finish
drawing
this
molecule.Actually
the
CAMD
systemdoes
not supportthese
query
language
for
userto
codethese
commands.But
the
CAMD
system stillimplicitly
The
CAMD
system maintainstwo
supportfiles,
whichare relation.tbl and attribute.
tbl
."relat ion. tbl
"
is
aflat
file
treated
as atwo-dimensional
array
of elements.It
containsthe
name of all relations?the
name ofdata
file?
adescriptor
ofdata
file?
the
name ofindex
file?
adescriptor
ofindex
file,
the
number oftuple
in
data
file?
the
size of atuple
in
the
data
file?
the
starting
position of a
tuple
in
the
attribute.tbl
file
ofthe
relation?
the
number of attributein
the
relation?the
endof
file
positionin
the
index
file?
andthe
end offile
position
in
the
data
file.
The
length
of a relation.tbltuple
is
136
bytes
in
the
system.The
layout
ofthis
data
structure wasdesigned
in
C
language
asfollows:
relnameCMAXCHAR3
;
data_nameCMAXCHAR3?
data_f
d?
index__nameCMAXCHAR3 ;
index_f d?
d_tuple?
rec_size
att_lst_rec_pos;
att_nurnb;
ieof__pos"
deof_pos" >"
st r uc
t
rel_buchar
char
int
char
int
int
int
int
int
int
page
31
The
other system supportfile?
attribute.tbl
, alsocan
be
treated
like
a relation.It
containsthe
name ofall
the
relationsin
the
system?the
name of allthe
attributes
in
the
system?the
data
type
ofthe
attribute?the
length
ofthe
attribute? and aflag
indicating
whetherthe
attributeis
primary
key.
The
length
of aattribute.
tbl
tuple
is
80
bytes.
The
data
structure of attribute.tbl
wasdesigned
asf
ollows:
struct attjouf
C
char rel_nameCMAXCHAR3;
char att_nameCMAXCHAR3;
int
att_type;int
att_count;int
is_key?
2.2.
To
Display
aMolecule
in
Data
Pool.
The
CAMD
systemdefines
twelve
relations which areMOLECULE,
GEOMETRY,
ATOM
TABLE?
LINEAR,
TRIANGLE,
TRIGONAL
PLANE,
SQUARE
PLANE,
TRIGONAL
BI
-PYRAMID,TETRAHEDRAL,
TRIGONAL
PYRAMID,
SQUARE
PYRAMID,
andOCTAHEDRAL.
There
are no
duplicate
relational names.Also?
the
systemautomatically
defines
attributedata
types,
attributenames? and attribute numbers.
For
example,in
ATOM
TABLE,
there
arethree
attributefields
for
onetuple.
The
first
attribute
is
aprimary
key
to
store atom name.It
is
string
characterdata
type.
The
second attributeis
decimal
data
type
to
store radius of atom.The
third
attribute
is
string
characterdata
type
to
describe
atomcolor-
Each
relation wouldbe
described
in
moredetail
later.
Basically,
this
sectiondescribes
how
to
create amolecule,
how
to
display
a relationdata,
how
to
draw
amolecule, and
how
to
delete
a molecule.The
fallowing
areimplemented
in
a similarfashion:
defining
a relation,selecting
atuple,
deleting
atuple?
adding
atuple,
displaying
a wholetable
data,
anddescribing
a relationproperties
in
relationaldata
base
management system.To
create a relationaltable,
the
system writes someproperties of
the
relationto
"relation.tbl"file
and"attribute.
tbl"file
which arethe
system supportfiles.
page
33
system.
These
areinteger
type,
realtype,
and charactertype.
If
the
attributedata
type
is
acharacter string,
it
also needsto
be
assigned a character numberfor
that
field.
Each
relationis
associated with onedata
file
andone
index
file.
The
"relation.tbl"file
is
composed ofrelation names and
there
relative properties: name ofdata
file
andindex
file,
descriptor
ofdata
file
andindex
file?
total
tuples
in
data
file,
size of adata
record?and
field
of attribute number, end offile
positionin
data
file
andindex
file.
The
"attribute. tbl
"file
is
composed of relation name, all of attribute names, and
data
structure of each attribute.To
add atuple
ofdata
to
one ofthe
relations?the
system will check whether
this
relationalready
existsin
the
CAMD
system.Also?
it
will checkif
there
is
aduplicate
key
in
a relation.If
there
is
noduplicate?
then
it
stores atuple
ofdata
atthe
end ofthis
data
file.
If
the
user ofthe
CAMD
system wantsto
create anew molecule?
for
exampleSulfur
dioxide
(302),
the
systemwill go
to
the
"MOLECULE"relation
to
checkits
prirnary
key.
If
there
is
noduplicate
S02,
it
will storeS02
asits
newkey.
And
then
the
system asksthe
userto
input
another
item
ofdata
for
the
next attributefield
whichin
this
caseis
"TRIANGLE".
Also?
the
system will addto
the
relation "TRIANGLE" one