W&M ScholarWorks
Dissertations, Theses, and Masters Projects
1998
A probability programming language: Development and applications
Andrew Gordon Glen
College of William & Mary - Arts & Sciences
Follow this and additional works at: https://scholarworks.wm.edu/etd
Part of the Computer Sciences Commons, and the Statistics and Probability Commons
Recommended Citation
Glen, Andrew Gordon, "A probability programming language: Development and applications" (1998). Dissertations, Theses, and Masters Projects. Paper 1539623920.
https://dx.doi.org/doi:10.21220/s2-1tqv-w897
A Probability Programming Language:
Development and Applications
A Dissertation
Presented to
The Faculty of the Department of Applied Science
The College of William & Mary in Virginia
In Partial Fulfillment
Of the Requirements for the Degree of
Doctor of Philosophy
by
Andrew G. Glen
UMI Number: 9904264
Copyright 1999 by Glen, Andrew Gordon
All rights reserved.
UMI Microform 9904264
Copyright 1998, by UMI Company. All rights reserved. This microform edition is protected against unauthorized
copying under Title 17, United States Code.
UMI
300 North Zeeb Road Ann Arbor, MI 48103
APPROVAL SHEET
This Dissertation is submitted in partial fulfillment of
the requirements for the Degree of
Doctor of Philosophy
Andrew G. Glen
APPROVED, January 1998
Lawrence M. Leemis, Dissertation Advisor
John H. Drew
Sidney H. Lawrence
Rex K. Kincaid
Donald R. Barr, Outside Examiner

Contents
1 Introduction
    1.1 General
    1.2 Literature review
    1.3 Outline of the dissertation
    1.4 Notation and nomenclature
2 Software Development
    2.1 The common data structure
    2.2 Common continuous, univariate distributions
    2.3 The six representations of distributions
    2.4 VerifyPDF
    2.5 MedianRV
    2.6 DisplayRV
    2.7 PlotDist
    2.8 ExpectationRV
    2.9 Transform
    2.10 OrderStat
    2.11 ProductRV and ProductIID
    2.12 SumRV and SumIID
    2.13 MinimumRV
    2.14 MaximumRV
    2.15 Maximum likelihood estimation
3 Transformations of Univariate Random Variables
    3.1 Introduction
    3.2 Theorem
    3.3 Implementation
    3.4 Examples
    3.5 Conclusion
4 Products of Random Variables
    4.1 Introduction
    4.2 Theorem
    4.3 Implementation
    4.4 Examples
    4.5 Conclusion
5 Computing the CDF of the Kolmogorov-Smirnov Test Statistic
    5.1 Introduction
    5.3 Computing the distribution of $D_n$
        5.3.1 Phase 1: Partition the support of $D_n - \frac{1}{2n}$
        5.3.2 Phase 2: Define the A matrices
        5.3.3 Phase 3: Set limits on the appropriate integrals
        5.3.4 Phase 4: Shift the distribution
    5.4 Critical values and significance levels
    5.5 Conclusion
6 Goodness of Fit using Order Statistics
    6.1 Introduction
    6.2 The P-vector
    6.3 Improving computation of P
    6.4 Goodness-of-fit testing
    6.5 Conclusion
7 Other Applications and Examples
    7.1 Introduction
    7.2 Exactness in lieu of CLT approximations
    7.3 A mathematical resource generator
    7.4 Probabilistic model design: reliability block diagrams
    7.5 Modeling with hazard functions
    7.6 Outlier detection
8 Conclusion and Further Work
A The Arctangent Survival Distribution
    A.1 Introduction
    A.2 Development
    A.3 Probabilistic properties
    A.4 Statistical inference
    A.5 Conclusion
B Continuous Distributions
C Algorithm for 6 x 6 Conversions
D Algorithms for Various Procedures
    D.1 Algorithm for VerifyPDF
    D.2 Algorithm for ExpectationRV
    D.3 Algorithm for OrderStat
    D.4 Algorithm for ProdIID
    D.5 Algorithm for SumRV
    D.6 Algorithm for SumIID
    D.7 Algorithm for MinimumRV
    D.8 Algorithm for MaximumRV
    D.9 Algorithm for MLE
F Algorithm for ProductRV
G Algorithm for KSRV
H Simulation Code for MLEOS Approximations
Bibliography
Acknowledgement
I must gratefully acknowledge the assistance of many people who have helped in very many ways in the production of this dissertation and research. Professors Hank Krieger and Marina Kondratovitch gave help in specific parts of this research. A special note of personal thanks goes to Professors Donald Barr, John Drew, and especially Larry Leemis, who were tremendously patient, insightful, and supportive of the research. I thank for their patience and support my wife, Lisa Glen, and my children Andrea, Rebecca, and Mary. I also thank and praise almighty God, for it is only by His divine will that we are privileged to understand what we have learned in this
List of Figures
3.1 The transformation $Y = g(X) = X^2$ for $-1 < X < 2$.
3.2 The transformation $Y = g(X) = ||X - 3| - 1|$ for $0 < X < 7$.
3.3 The transformation $Y = g(X)$ has a discontinuity and is variously 1-to-1 and 2-to-1 on different subsets of the support of $X$.
3.4 The transformation $Y = g(X) = \sin^2(X)$ for $0 < X < 2\pi$.
4.1 The support of $X$ and $Y$ when $ad < bc$.
4.2 The mapping of $Z = X$ and $V = XY$ when $ad < bc$.
4.3 The PDF of $V = XY$ for $X \sim N(0, 1)$ and $Y \sim N(0, 1)$.
5.1 The CDF of the $D_6$ random variable.
5.2 The PDF of the $D_6$ random variable. Note the discontinuity at $y = 1/6$.
6.1 Transformations from iid observations $X_1, X_2, \ldots, X_n$ to the sorted P-vector elements $P_{(1)}, P_{(2)}, \ldots, P_{(n)}$.
6.2 Estimated power functions for testing $H_0$: $X \sim N(0, 1)$ versus $H_1$: $X \sim N(0, \sigma^2)$ using K-S, A-D, and two statistics based on the
7.1 Overlaid plots of $f_{Z^*}(x)$ and the standard normal PDF.
7.2 Overlaid plots of the standard normal and standardized IG(0.8) distributions.
7.3 RBD of a computer system with two processors and three memory units.
7.4 The SF of the hypothesized BT-shaped hazard function fit to the sample [1, 11, 14, 16, 17] overlaid on the empirical SF.
7.5 The PDF of the distribution having periodic hazard function $h_X$ with parameters $a = 1$, $b = 0.5$ and $c = 10$.
7.6 The PDFs of the four order statistics from an exponential distribution.
A.1 Examples of the arctangent probability density function.
A.2 Empirical, fitted arctangent, and fitted Weibull survivor functions for the ball bearing lifetimes.
List of Tables
5.1 Computational requirements for computing the $D_n$ CDF for small $n$.
5.2 Computational efficiency associated with using the F and V arrays.
5.3 CDFs of $D_n - \frac{1}{2n}$ for $n = 1, 2, \ldots, 6$.
6.1 Estimated critical values for $P_s$ at various sample sizes and levels of significance.
7.1 Fractiles for exact and approximated distributions.
7.2 $P(X_{(6)} > 10)$ for $n = 6$ for several population distributions.
7.3 The MSEs of the MLE and MLEOS and adjusted-for-bias MLE techniques of parameter estimation.
A.1 Kolmogorov-Smirnov Goodness-of-fit Statistics for the Ball Bearing Data.
B.1 Continuous distributions of random variables available in APPL.
Abstract
A probability programming language is developed and presented; applications illustrate its use. Algorithms and generalized theorems used in probability are encapsulated into a programming environment with the computer algebra system Maple to provide the applied community with automated probability capabilities. Algorithms of procedures are presented and explained, including detailed presentations on three of the most significant procedures. Applications that encompass a wide range of applied topics including goodness-of-fit testing, probabilistic modeling, central limit theorem augmentation, generation of mathematical resources, and estimation are presented.
Chapter 1
Introduction
1.1 General
Probability theory, as it exists today, is a vast collection of axioms and theorems that, in essence, provides the scientific community many contributions, including:
• the naming and description of random variables that occur frequently in applications,
• the theoretical results associated with these random variables, and,
• the applied results associated with these random variables for statistical applications.
No one volume categorizes its work in exactly these three ways, but the literature's comprehensive works accomplish these goals. Whether voluminous, such as the work of Johnson, Kotz, and Balakrishnan (1995), or succinct, such as that of Evans, Hastings, and Peacock (1993), one finds all three of these areas presented in chapters that are organized on the first contribution above, naming and describing the random variables. Works such as Hogg and Craig (1995), Port (1994), and David (1981) organize their efforts according to the second contribution, covering theoretical results that apply to random variables. Then there are the works such as Law and Kelton (1991), Lehmann (1986), and D'Agostino and Stephens (1986) who concentrate on the statistical applications of random variables, and tailor their explanations of probability theory to the portions of the field that have application in statistical analysis.
In all these works, as well as countless others, one stark omission is apparent. There is no mention of an ability to automate the naming, processing, or application of random variables. This omission is even more profound when one considers the tedious nature of the mathematics involved in the execution of many of these results for all but the simplest of examples. In practice, the level of tedium makes the actual execution untenable for many random variables. Automation of certain types of these procedures could eradicate this tedium. There is an abundance of statistical software packages that give the scientific community powerful tools to apply statistical procedures. But to date, there is no package that attempts to automate the more theoretical side of the probabilist's work. Even the simplest of tasks, plotting a fully specified probability density function, is not provided in many statistical packages. In order to plot new, ad-hoc densities or CDFs, one is often required to write an
A conceptual probability software package is now presented that begins to fill this existing gap. Before outlining this work's approach, let us present some examples to illustrate what should be included in such a probability software package.
Consider the following independent random variables: $W \sim$ gamma($\lambda$, $\kappa$), $Z \sim N(\mu, \sigma)$, $Y \sim$ Weibull($\lambda$, $\kappa$), $X$ and $R \sim$ arctan($\phi$, $\alpha$), and $D$, $T$, $U$, and $V$ as specified in the questions below. See Appendix A for more information on the arctan distribution.
• What is the distribution of $V = W + X + Y$?
• What is the distribution of $T = X \cdot \ln(W^2) + e^{YZ}$?
• What is the distribution of a random distance $D$, which is the sum of the products of random rates $R_1, R_2, \ldots, R_n$ and random times $T_1, T_2, \ldots, T_n$, i.e., $D = R_1 \cdot T_1 + R_2 \cdot T_2 + \cdots + R_n \cdot T_n$?
• What is the distribution of the system lifetime $U$ in a reliability block diagram containing two parallel blocks of two subsystems that consist of two components in series, i.e., $U = \max\{\min\{V, W\}, \min\{V, Z\}\}$?
• What is the exact upper tail probability for the statistic 6.124 associated with the distribution of the 4th order statistic out of a sample of 12 iid observations that have the same distribution as $U$?
• How could one employ the PDFs of order statistics of a population, instead of the PDF of the population itself, to develop an alternate approach to maximum likelihood estimation?
• What is the distribution of the maximum likelihood estimator of the inverse Gaussian random variable's first parameter $\mu$ when a sample size of $n$ is specified?
• Most importantly, if one could find these answers, what is their utility to the statistical and applied science community?
There is no implication that the previously cited authors are remiss in neglecting the automation of probability software. In fact it is only with the advent and maturing of computer algebra systems, such as Maple and Mathematica, that there now exists the ability to automate probabilistic modeling and research. This doctoral research and dissertation will take advantage of this relatively new technology by developing and presenting a software "engine" that contributes to the fields of probabilistic modeling and statistical applications. This research has concentrated on procedures in the symbolic language Maple V.
The specific contributions to the applied probability and statistics community of this research and dissertation include the following:
1. Detailed algorithms that comprise the conceptual software.
2. Generalized versions of theorems that comprise the software. [Note that while
appear difficult to implement in an automated environment.]
3. An algorithm that produces the exact CDF of the Kolmogorov-Smirnov test statistic.
4. Detailed explanations and examples of the software's capabilities.
5. Application examples that contribute, on their own, to various areas within probability and statistics.
6. Explorative examples of probabilistic quests that appear to be difficult to carry out without automation.
7. Extensions of testing statistical hypotheses, to include specific contributions in the areas of outlier detection, goodness-of-fit, and parameter estimation.
8. Demonstrations in the general area of probabilistic model design, to include specific contributions in the areas of survival distributions, reliability block diagrams, exact solutions to central limit theorem (CLT) applications, and estimation.
1.2 Literature review
While the current literature will be reviewed throughout the dissertation, there are a number of works that should be mentioned for their general applicability to this research. These include the works of Johnson, Kotz, and Balakrishnan (1995), Leemis
theory behind the algorithms and implementation. The review of the literature has discovered no publication on implementing probabilistic procedures in computer algebra languages, nor on the benefits of such a paradigm. Databases searched include FirstSearch, INNOPAC, INSPEC, DTIC, NTIC, Science Citation Index, Library of Congress, Swem Library at The College of William & Mary, and the USMA library at West Point, NY. Search strings included the following individual subject areas and pairs of subject areas, where appropriate: distributions, goodness-of-fit, life testing, Maple, modeling, order statistics, probability, reliability, and symbolic algebra. While there are many listings under these terms and pairings of these terms, no work was found about combining probabilistic results with computer algebra implementation. The negative result of this search indicates that there is a lack of archival material on the subject.
1.3 Outline of the dissertation
This dissertation is presented according to the following outline. In Chapter 2 the development, abilities, and examples of use of the software language are presented. Chapter 3 contains the development of the procedure that accommodates transformations of random variables to include a re-stated, general, implementable theorem for such work. In Chapter 4 a procedure for finding the distribution of the product of two independent continuous random variables is presented. In Chapter 5 a procedure that
specified sample size, is presented. Chapter 6 contains an application in which a new goodness-of-fit test procedure is presented and tested using the software. Chapter 7 is a collection of examples of explorations in the fields of probability and statistics that are now possible due to the software. Finally, in Chapter 8, conclusions and suggestions of further work are given. In the appendices are listed the algorithms for the software, as well as documentation of the early work in creating new probabilistic models.
1.4 Notation and nomenclature
This section reviews certain notation and nomenclature used here. Use is made of the following acronyms and functional notation for density representations:
• probability density function (PDF) $f_X(x)$,
• cumulative distribution function (CDF) $F_X(x) = \int_{-\infty}^{x} f_X(s)\,ds$,
• survivor function (SF) $S_X(x) = 1 - F_X(x)$,
• hazard function (HF) $h_X(x) = \frac{f_X(x)}{S_X(x)}$,
• cumulative hazard function (CHF) $H_X(x) = \int_{-\infty}^{x} h_X(s)\,ds$, and
• inverse distribution function (IDF) $F_X^{-1}(x)$.
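The relationships among these six representations can be illustrated with a short Maple session. This is a minimal sketch, not code from APPL: the exponential density with rate lambda is an assumed example, and the names f, F, S, h, H, and Finv are illustrative only. Because the support of this density begins at 0, the integrals start there rather than at negative infinity.

```maple
# Assumed example density: exponential with rate lambda, supported on x >= 0.
f := x -> lambda*exp(-lambda*x);          # PDF
F := unapply(int(f(s), s = 0..x), x);     # CDF: integrate the PDF up to x
S := unapply(1 - F(x), x);                # SF: complement of the CDF
h := unapply(simplify(f(x)/S(x)), x);     # HF: ratio of PDF to SF
H := unapply(int(h(s), s = 0..x), x);     # CHF: integrate the HF up to x
Finv := unapply(solve(F(x) = u, x), u);   # IDF: solve F(x) = u for x
```

For this density the hazard function should reduce to the constant lambda, the cumulative hazard function to lambda*x, and the inverse distribution function to -ln(1 - u)/lambda, although Maple may display algebraically equivalent forms.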
Throughout the dissertation, the proposed software is referred to as "a probability
are used to refer to PDFs (and other functions) that can be constructed by piecing together various standard functions, such as polynomials, logarithms, exponentials, and trigonometric functions, e.g., the triangular(1, 2, 3) distribution, which has two segments or two pieces, each of which is a linear function. The common abbreviation "$N(\mu, \sigma)$" is used to refer to the normal distribution. Note that the second parameter is the standard deviation, not the variance. Also, "$U(a, b)$" is used to represent the uniform distribution with parameters $a$ and $b$. Subscripts in parentheses represent order statistics, e.g. the $r$th order statistic associated with a random sample $X_1, X_2, \ldots, X_n$ is denoted by $X_{(r)}$. The abbreviation "iid" is used to denote independent and identically distributed random variables. The terms "fully-specified," "semi-specified," and "unspecified" are used to describe the degree to which parameters are specified as constants or fixed parameters in a distribution. For example, the exponential(1) distribution is a fully specified distribution. The Weibull(1, $\kappa$) and the $N(0, \sigma)$ are both semi-specified distributions. The triangular($a$, $b$, $c$) and exponential($\lambda$) distributions are both unspecified. Typewriter font is used to represent Maple language statements. For example "> X := UniformRV(0, 1);" is a Maple assignment statement. Note that the symbol ">" represents the Maple input
Chapter 2
Software Development
The notion of probability software is different from the notion of applied statistical software. Probability theory is rife with theorems and calculations that require symbolic, algebraic manipulations. Applied statistical calculations are usually numeric manipulations of data based on known formulas associated with distributions of common random variables. This section contains a discussion on several algorithms that contribute to the development of APPL. Availability of computer algebra systems such as Maple and Mathematica facilitates the development of software that will derive functions, as opposed to computing numbers.
Probability software must, at the most basic level, be a means of producing distributions of random variables. At the heart of the software must reside an "engine" that can compute new, useful representations of distributions.
The derivation of exact distribution functions of complex random variables is often untenable. In such cases, one had to be content with approximations and summary statistics of the unknown distributions, regardless of whether those approximations and summaries were adequate. For example, one often approximates distributions using Monte Carlo simulation or by invoking the central limit theorem. Statistics such as the sample mean and variance of the approximated distribution are then reported. If what was really needed was a certain percentile of the approximated distribution, oftentimes the entire simulation would need to be remodeled, re-validated, re-verified, and re-run to obtain the needed information. A result such as a fully-specified PDF would eradicate the need for such redundant efforts. One also would have analytical results to represent characteristics of certain complex random variables whose fully-specified functions are untenable. For example, renewal theory and compound Poisson process theory have results that derive the mean and variance of complex distributions, but fall short of actually determining the entire representation of complex distributions via a PDF, CDF, or some other form of the distribution. The proposed probabilistic software is designed to make a breakthrough into the area of completely describing complex distributions with PDFs, CDFs, and the like, thereby providing increased modeling capability to the analyst.
At the most general level, one could attempt to find distributions of intricate transformations of multivariate, dependent random variables. The software described here is limited to univariate, continuous, independent random variables, and the complex transformations and combinations that can result between independent random variables. A set of algorithms that derives functional representations of distributions
sented. Specifically, algorithms have been developed that will conduct the following operations:
• supply a common data structure for the distributions of continuous, univariate, random variables, including distribution functions that may be defined piecewise, e.g. the triangular distribution,
• convert any functional representation of a random variable into another functional representation using the common data structure, i.e. allowing conversion amongst the PDF, CDF, SF, HF, CHF, and IDF,
• verify that the area under a computed PDF is one,
• provide straightforward instantiation of well-known distributions, such as the exponential, normal, uniform, and Weibull distributions, with either numeric or symbolic parameters,
• determine the distribution of a simple transformation of a continuous random variable, $Y = g(X)$, including piecewise, continuous transformations,
• determine common summary characteristics of random variables, such as the mean, variance, other moments, and so forth,
• calculate the PDF of sums of independent random variables, i.e. $Y = X + Z$,
• calculate the PDF of products of independent random variables, i.e. $Y = XZ$,
• calculate the PDF of the minimum and maximum of independent random variables,
• calculate the PDF of the $r$th order statistic from a sample of $n$ iid random variables,
• calculate probabilities associated with a random variable,
• generate random variates associated with a random variable,
• plot any of the six functional forms of any distribution, e.g. the HF or CDF,
• provide basic statistical abilities, such as maximum likelihood estimation, for distributions defined on a single segment of support,
• complement the structured programming language that hosts the software (in this case Maple) so that all of the above mentioned procedures may be used in mathematical and computer programming in that language.
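As one illustration of the operations listed above, the PDF of the $r$th order statistic of $n$ iid continuous observations follows the standard formula $f_{(r)}(x) = \frac{n!}{(r-1)!\,(n-r)!}\, F(x)^{r-1}\, (1 - F(x))^{n-r}\, f(x)$. The Maple sketch below applies that formula directly. OrderStatPDF is a hypothetical helper written for this illustration, not the APPL OrderStat procedure, and it assumes the population PDF and CDF are supplied as ordinary single-segment Maple functions.

```maple
# Hypothetical helper (not an APPL procedure): PDF of the r-th order
# statistic of n iid observations with population PDF f and CDF F.
OrderStatPDF := proc(f, F, r, n)
    local c, x;
    # Combinatorial constant n!/((r-1)!(n-r)!)
    c := n!/((r - 1)!*(n - r)!);
    # Return the order-statistic density as a function of x
    unapply(c*F(x)^(r - 1)*(1 - F(x))^(n - r)*f(x), x);
end;

# Assumed example population: exponential(1) on x >= 0.
f := x -> exp(-x);
F := x -> 1 - exp(-x);
g := OrderStatPDF(f, F, 1, 4);    # minimum of four observations
simplify(g(x));
int(g(x), x = 0..infinity);       # area under the resulting density
```

For this example the minimum of four exponential(1) observations is itself exponential with rate 4, so the simplified density should be equivalent to 4*exp(-4*x) and the final integral should equal 1.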
2.1 The common data structure
Implicit in a probability software language is a common, succinct, intuitive, and manipulatable data structure for describing the distribution of a random variable. This implies there should be one data structure that applies to the CDF, PDF, SF, HF, CHF, and IDF. The common data structure used in this software is referred to as the "list-of-lists." Specifically, any functional representation of a random variable is presented in a list that contains three sub-lists, each with a specific purpose. The first sub-list contains the ordered functions that define the segments of the distribution.
two linear functions that comprise the two segments of its PDF for its first sub-list. Likewise, the CDF representation of the triangular distribution would have the two quadratic functions that comprise the two segments of its CDF for its first sub-list. The second sub-list is an ordered list of real numbers that delineate the end points of the segments for the functions in the first sub-list. The end point of each segment is automatically the start point of the succeeding segment. The third sub-list indicates what distribution form the functions in the first sub-list represent. The first element of the third sub-list is either the string Continuous for continuous distributions or Discrete for discrete distributions. The second element of the third sub-list shows which of the 6 functional forms is used in the first sub-list. The string PDF, for example, indicates the list-of-lists is currently a PDF list-of-lists. Likewise, CDF indicates that a CDF is being represented.
Examples:
• The following Maple statement assigns the variable X to a list-of-lists that
represents the PDF of a U(0, 1) random variable:
> X := [[x -> 1], [0, 1], ['Continuous', 'PDF']];
• The triangular distribution has a PDF with two pieces to its distribution. The
following statement defines a triangular(0, 1, 2) random variable X as a
list-of-lists:
> X := [[x -> x, x -> 2 - x], [0, 1, 2], ['Continuous', 'PDF']];
• An exponential random variable X with a mean of 2 can be defined in terms of
its hazard function with the statement:
> X := [[x -> 1 / 2], [0, infinity], ['Continuous', 'HF']];
• Unspecified parameters can be represented symbolically. A N(θ, 1) random
variable X can be defined with the statement:
> X := [[x -> exp(-(x - theta)^2 / 2) / sqrt(2 * Pi)], [-infinity, infinity], ['Continuous', 'PDF']];
• The parameter space can be specified by using the Maple assume function.
Consider the random variable T with HF
$$h_T(t) = \begin{cases} \lambda & 0 < t < 1 \\ \lambda t & t > 1 \end{cases}$$
for λ > 0. The random variable T can be defined by the statements:
> assume(lambda > 0);
> T := [[t -> lambda, t -> lambda * t], [0, 1, infinity], ['Continuous', 'HF']];
• The syntax allows for the endpoints of the segments associated with the support
of the random variable to be specified symbolically. A U(a, b) random variable
X is defined by:
> X := [[x -> 1 / (b - a)], [a, b], ['Continuous', 'PDF']];
• No error checking is performed when a distribution is defined. This means that
a statement such as the following is accepted even though it does not define a
legitimate PDF.
> X := [[x -> 6], [0, 5], ['Continuous', 'PDF']];
Some error checking will be performed by the procedure VerifyPDF, which is
presented in a subsequent section.
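The three-sub-list layout can be mirrored in other languages. The following is a minimal Python sketch, not APPL code: the names `triangular_pdf`, `exponential_hf`, and `evaluate` are hypothetical, and the sketch handles only numeric (fully specified) distributions.

```python
import math

# Hypothetical Python analogue of the list-of-lists (not APPL code):
# [segment functions, segment endpoints, [type, functional form]].
triangular_pdf = [
    [lambda x: x, lambda x: 2 - x],   # one function per segment
    [0, 1, 2],                        # k functions need k + 1 endpoints
    ["Continuous", "PDF"],
]

def evaluate(rv, x):
    """Evaluate the functional form stored in rv at the point x."""
    functions, endpoints, _ = rv
    if len(endpoints) != len(functions) + 1:
        raise ValueError("need exactly one more endpoint than functions")
    # Locate the first segment [endpoints[i], endpoints[i + 1]] containing x.
    for i, f in enumerate(functions):
        if endpoints[i] <= x <= endpoints[i + 1]:
            return f(x)
    return 0.0  # outside the support (appropriate for a PDF, not for a CDF)

# Exponential random variable with mean 2, stored through its hazard function.
exponential_hf = [[lambda x: 1 / 2], [0, math.inf], ["Continuous", "HF"]]

print(evaluate(triangular_pdf, 0.5))   # left segment
print(evaluate(triangular_pdf, 1.5))   # right segment
print(evaluate(exponential_hf, 10.0))  # constant hazard
```

The length check in `evaluate` is a small piece of the error checking that the list-of-lists itself does not enforce.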
2.2
Common continuous, univariate distributions
Syntax: The command
> X := RandomVariableNameRV(ParameterSequence);
assigns to the variable X a list-of-lists representation of the specified random variable.
The arguments in ParameterSequence may be real, integer, or string (for symbolic
parameters).
Purpose: Included in the prototype software is the ability to instantiate common
distributions. While the list-of-lists is a functional form that lends itself to the
mathematics of the software, it is not an instantly recognizable form for representing a
distribution. A number of simple procedures are provided that take relatively
common definitions of distributions and convert them to a list-of-lists format. The
included distributions are well-known ones, such as the normal, Weibull, exponential,
and gamma. A complete list of the distributions provided in APPL, including their
parameters, is presented in Appendix B.
Special Issues: The suffix RV is added to each name to make it readily identifiable
[...] normal and gamma. The first letter of each word is capitalized, which is the case
for all procedures in APPL. Also, there is no space between words in a procedure
call, e.g., an inverse Gaussian random variable may be defined by the command
InverseGaussianRV(Parameter1, Parameter2). Usually the format is returned as a
PDF, but in the case of the IDB distribution, a CDF is returned. The CDF of the
IDB distribution, it turns out, is easier for Maple to manipulate (e.g., integrate,
differentiate) than the PDF. Certain assumptions are made about unspecified parameters.
For example, an assignment of an unspecified exponential random variable (see the
second example below) will result in the assumption that λ > 0. This assumption,
like the assumptions for all other distributions, is applied only to unspecified parameters.
The assumptions allow Maple to carry out certain types of symbolic integration, such
as verifying that the area under the density is in fact one for a PDF (see Section 2.4).
Examples:
• The exponential(1) distribution may be created with the following statement:
> X := ExponentialRV(1);
• The exponential(λ) distribution, where λ > 0, may be created as follows:
> X := ExponentialRV(lambda);
• These procedures also allow a modeler to reparameterize a distribution. The
exponential(1/θ) distribution where θ > 0, for example, may be created as follows:
> assume(theta > 0);
[...]
• The semi-specified Weibull(λ, 1) distribution, where λ > 0, may be created as
follows:
> X := WeibullRV(lambda, 1);
Note that this is a special case where the Weibull distribution is equivalent to
an exponential distribution.
• The standard normal distribution may be created as follows:
> X := NormalRV(0, 1);
All distributions presently included in APPL and their parameterizations are listed
in Appendix B.
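A constructor procedure of this kind can be sketched in Python as a function that returns the list-of-lists. This is an illustration, not APPL's implementation: the names `exponential_rv` and `weibull_rv` are hypothetical, the Weibull survivor function is assumed to be exp(-(λx)^κ), and the parameter checks stand in for Maple's `assume` because symbolic parameters are not supported here.

```python
import math

def exponential_rv(lam):
    """PDF list-of-lists for an exponential(lam) random variable, lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    pdf = lambda x: lam * math.exp(-lam * x)
    return [[pdf], [0, math.inf], ["Continuous", "PDF"]]

def weibull_rv(lam, kappa):
    """PDF list-of-lists for Weibull(lam, kappa).

    Assumed parameterization: survivor function exp(-(lam * x) ** kappa).
    """
    if lam <= 0 or kappa <= 0:
        raise ValueError("lam and kappa must be positive")
    def pdf(x):
        return kappa * lam**kappa * x**(kappa - 1) * math.exp(-(lam * x)**kappa)
    return [[pdf], [0, math.inf], ["Continuous", "PDF"]]

# With kappa = 1 the Weibull PDF coincides with the exponential PDF.
w = weibull_rv(2.0, 1)[0][0]
e = exponential_rv(2.0)[0][0]
print(w(0.7), e(0.7))
```

Under the assumed parameterization, the two printed values agree, which matches the remark that Weibull(λ, 1) is an exponential distribution.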
2.3
The six representations of distributions
Syntax: The command
DesiredForm(RandomVariable [, Statistic]);
returns the list-of-lists format of the desired functional representation of the
distribution, where DesiredForm is one of the following: PDF, CDF, SF, HF, CHF, or IDF. The
single argument RandomVariable must be in the list-of-lists format. The optional
argument, Statistic, may be a constant or a string.
Purpose: The 6 × 6 distribution conversion ability, a variation of the matrix outlined
by Leemis (1995, p. 55), is provided so that the functional form of a distribution can
[...] CHF. This set of procedures will take one form of the distribution as an argument
and return the desired form of the distribution in the appropriate list-of-lists format.
For the one-parameter call, the functional representation will be returned. For the
two-parameter call, the actual value of the function at that point will be returned.
Special Issues: The procedures are fairly robust against non-specified parameters
for the distributions that will be converted (see the fourth example below).
Examples:
• To obtain the CDF form of a standard normal random variable:
> X := NormalRV(0, 1);
> X := CDF(X);
or, equivalently, in a single line,
> X := CDF(NormalRV(0, 1));
Since the CDF for a standard normal random variable is not closed form, APPL
returns the following:
$$X := \left[\left[x \to \tfrac{1}{2}\,\mathrm{erf}\!\left(\tfrac{1}{2}\,x\sqrt{2}\right) + \tfrac{1}{2}\right],\ [-\infty, \infty],\ [\text{Continuous}, \text{CDF}]\right]$$
• If X ~ N(0, 1), then the following statements can be used to find P(X < 1.96) =
0.975.
> X := NormalRV(0, 1);
> prob := CDF(X, 1.96);
• Should the hazard function of an exponential distribution be entered, its
associated PDF may be determined as follows:
> X := [[x -> 1], [0, infinity], ['Continuous', 'HF']];
> X := PDF(X);
• For the case of unspecified parameters, the following statements convert an
unspecified Weibull PDF to an unspecified Weibull SF:
> X := WeibullRV(lambda, kappa);
> X := SF(X);
which returns:
$$X := \left[\left[x \to e^{-(\lambda{\sim}\, x)^{\kappa{\sim}}}\right],\ [0, \infty],\ [\text{Continuous}, \text{SF}]\right]$$
Note that the tildes after the parameters indicate that assumptions have been
made concerning the parameters (i.e., λ > 0 and κ > 0) in the WeibullRV
procedure.
• Finding a quantile of a distribution requires the IDF procedure. If X ~
Weibull(1, 2), then the 0.975 quantile of the distribution can be found with
the statement
> quant := IDF(WeibullRV(1, 2), 0.975);
• The procedures can be nested so that if the random variable X has been defined
in terms of its PDF, then the statement
[...]
does nothing to the list-of-lists representation for X, assuming that all
transformations can be performed analytically.
Algorithm: The conversions are shown in a 6 × 6 matrix in Appendix C. Each
element of the matrix takes the 'row' and converts it to the type specified in the
'column' heading. Thus the first row, second element of the matrix shows a call to
the CDF procedure using the PDF representation of a random variable as an argument
which returns the CDF representation of a random variable.
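The conversions rest on the standard identities S(x) = 1 − F(x), h(x) = f(x)/S(x), and H(x) = −ln S(x). The following Python sketch evaluates them numerically at a single point for a one-segment PDF. It is an illustration under stated assumptions, not the symbolic conversion in Appendix C: the names `simpson`, `convert`, and `standard_normal_cdf` are hypothetical, and Simpson's rule replaces Maple's symbolic integration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def convert(pdf, lower, x):
    """Return the CDF, SF, HF and CHF values at x for a one-segment PDF."""
    cdf = simpson(pdf, lower, x)   # F(x) = integral of f from lower to x
    sf = 1.0 - cdf                 # S(x) = 1 - F(x)
    hf = pdf(x) / sf               # h(x) = f(x) / S(x)
    chf = -math.log(sf)            # H(x) = -ln S(x)
    return {"CDF": cdf, "SF": sf, "HF": hf, "CHF": chf}

def standard_normal_cdf(x):
    """Closed form quoted in the text: erf(x * sqrt(2) / 2) / 2 + 1 / 2."""
    return 0.5 * math.erf(0.5 * x * math.sqrt(2)) + 0.5

forms = convert(lambda t: math.exp(-t), 0.0, 1.0)   # exponential(1) at x = 1
print(forms)
print(standard_normal_cdf(1.96))
```

For the exponential(1) distribution the hazard and cumulative hazard at x = 1 should both be close to 1, which gives a quick sanity check on the identities.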
2.4
VerifyPDF
Syntax: The command
VerifyPDF(RandomVariable);
returns true or false, depending on whether or not the PDF integrates to one. The
single argument RandomVariable must be in the list-of-lists format described previously.
In addition, the procedure prints
'The area under the PDF is',
along with the area, and “true” if the area is 1.0 or “false” if the area is not 1.0.
Purpose: The purpose of this procedure is to help determine if a random variable in
the list-of-lists format is in fact a viable representation of a continuous distribution.
Specifically, the procedure converts the distribution to the PDF form and carries out
the definite integration of the PDF to see if the area under the PDF is 1. If so, it prints
The area under the PDF is, 1
and returns “true”; otherwise it returns the computed area and the string “false” if
the area is more than 0.0000001 away from 1. This procedure is primarily an indicator
tool to check if the list-of-lists format of a random variable has been input correctly.
Special issues: The procedure only integrates the area under each segment of the
PDF of the argument RandomVariable. It does not check for negative functional
values of f(x). The third example below shows that the continuous function
$$f(x) = 3|x| - 1, \qquad -1 < x < 1$$
integrates to one, yet is not a PDF since f(0) = −1.
For many well-known distributions, the procedure will carry out the symbolic
integration and verify that the area under the PDF is one, as illustrated in the
second example below. Not all of the distributions described in Section 2.2 have this
symbolic capability, but most do. For example, the unspecified log normal distribution
will integrate to one, but the unspecified inverse Gaussian distribution will not; see
the fourth example below.
Examples:
• The following Maple statements create an exponential random variable X with
a mean of 1, verify that the area under f(x) is one, and return true:
> X := ExponentialRV(1);
> VerifyPDF(X);
• Since assumptions are made internally in ExponentialRV about the parameter
space, the following two statements will also return true:
> X := ExponentialRV(lambda);
> VerifyPDF(X);
• The following code defines a function f(x) such that $\int_{-1}^{1} f(x)\,dx = 1$ and
f(0) = −1, so that VerifyPDF returns true even though this is not a legitimate PDF:
> X := [[x -> 3 * abs(x) - 1], [-1, 1], ['Continuous', 'PDF']];
> VerifyPDF(X);
• Maple is not able to conduct the integration for more complex distributions. In
this example, X is assigned the unspecified inverse Gaussian distribution, and
an attempt to integrate the area under the density is unsuccessful.
> X := InverseGaussianRV(p1, p2);
> VerifyPDF(X);
These statements return an error message indicating that the function does not
evaluate to numeric. The assumption is made that future releases of Maple will
be able to correctly integrate this PDF.
Algorithm: The algorithm first checks to see whether the distribution of interest
is continuous. Next, it checks to see if the distribution of the random variable is
represented by a PDF. If not, it converts a local distribution to a PDF form using the
PDF procedure. The area under the PDF is then calculated and printed. The returned
value from the procedure is “true” if the area is within 0.0000001 of 1, and “false”
otherwise. The algorithm is given in Appendix D.
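The area check can be sketched numerically in Python. This is a minimal illustration, not the algorithm in Appendix D: the names `simpson` and `verify_pdf` and the `cutoff` parameter are hypothetical, infinite endpoints are assumed to be safely truncated at ±`cutoff`, and the sketch rejects anything that is not already a continuous PDF instead of converting it.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def verify_pdf(rv, cutoff=60.0, tol=1e-7):
    """Integrate each segment of a PDF list-of-lists; True if the area is within tol of 1."""
    functions, endpoints, (kind, form) = rv
    if kind != "Continuous" or form != "PDF":
        raise ValueError("this sketch handles continuous PDFs only")
    area = 0.0
    for f, a, b in zip(functions, endpoints[:-1], endpoints[1:]):
        # Assumption: infinite endpoints are truncated at +/- cutoff.
        a = max(a, -cutoff)
        b = min(b, cutoff)
        area += simpson(f, a, b)
    print("The area under the PDF is", area)
    return abs(area - 1.0) < tol

exponential = [[lambda x: math.exp(-x)], [0, math.inf], ["Continuous", "PDF"]]
not_a_pdf = [[lambda x: 3 * abs(x) - 1], [-1, 1], ["Continuous", "PDF"]]
print(verify_pdf(exponential))
print(verify_pdf(not_a_pdf))   # passes the area test despite f(0) = -1
```

As in the procedure described above, only the area is tested, so the function with negative values still passes; a sign check on each segment would be needed to catch it.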
2.5
MedianRV
Syntax: The command
MedianRV(RandomVariable);
returns the median of a specified distribution.
Purpose: This procedure returns the median of a random variable.
Special Issues: It is fairly robust for use with distributions that have unspecified
parameters.
Examples:
• For the fully-specified Weibull distribution, the following statements will assign
the median of the distribution to the variable m.
> X := WeibullRV(1, 2);
> m := MedianRV(X);
• The following statements determine the median of an exponential random
variable with unspecified parameters:
> X := ExponentialRV(lambda);
> m := MedianRV(X);
Algorithm: The algorithm is a special case of the two-parameter IDF procedure call,
where the second parameter is 1/2.
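The reduction of the median to a quantile at 1/2 can be sketched in Python. This is a numerical stand-in, not APPL's symbolic IDF: the names `idf` and `median` are hypothetical, and the CDF is inverted by bisection on a bracketing interval supplied by the caller.

```python
import math

def idf(cdf, p, lo, hi, tol=1e-12):
    """Invert an increasing CDF on [lo, hi] by bisection: find x with cdf(x) = p."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def median(cdf, lo, hi):
    """The median is the quantile at p = 1/2."""
    return idf(cdf, 0.5, lo, hi)

lam = 2.0
m = median(lambda x: 1 - math.exp(-lam * x), 0.0, 100.0)
print(m, math.log(2) / lam)   # numerical median and the closed form ln(2) / lam
```

For the exponential(λ) distribution, the closed-form median ln(2)/λ gives a direct check on the numerical result.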
2.6
DisplayRV
Syntax: The command
DisplayRV(RandomVariable);
displays the list-of-lists format of the distribution in standard mathematical notation,
using the Maple piecewise procedure.
Purpose: The purpose of this procedure is to make the list-of-lists representation
of a distribution more readable. A long list-of-lists with several segments is not easy
to understand. This procedure converts a list-of-lists formatted distribution into
the Maple-syntaxed “piecewise” function. Such versions of segmented functions are
displayed in a more readable manner in Maple. It also states whether the current
representation is a PDF, CDF, etc. There is no computation in this procedure. The
procedure attempts to make the list-of-lists format more readable.
Special Issues: None.
Example:
• The piecewise triangular distribution could be displayed as follows:
> DisplayRV(TriangularRV(1, 2, 3));
This random variable is currently represented as follows:
'Continuous', 'PDF'
$$\begin{cases} 0 & x < 1 \\ x - 1 & x < 2 \\ 3 - x & x < 3 \end{cases}$$
Algorithm: This algorithm is a set of commands that creates a sequence of conditions
and functions in a manner that is usable by the piecewise command.
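Pairing each segment function with its condition can be sketched in Python. This is only an illustration of the idea: the name `display_rv` is hypothetical, segment functions are stored as strings so they can be printed, and explicit intervals are shown where Maple's piecewise output uses cumulative upper-bound conditions.

```python
def display_rv(rv):
    """Render a list-of-lists whose segment functions are given as strings."""
    expressions, endpoints, (kind, form) = rv
    lines = [f"{kind}, {form}"]
    # Pair each expression with the interval bounded by consecutive endpoints.
    for expr, a, b in zip(expressions, endpoints[:-1], endpoints[1:]):
        lines.append(f"  {expr:<10} {a} < x < {b}")
    return "\n".join(lines)

triangular = [["x - 1", "3 - x"], [1, 2, 3], ["Continuous", "PDF"]]
print(display_rv(triangular))
```

No computation is performed on the distribution; the function only rearranges what the list-of-lists already contains.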
2.7
PlotDist
Syntax: The command
PlotDist(RandomVariable, LowerLimit, UpperLimit);
plots the current list-of-lists defined distribution between LowerLimit and UpperLimit
on a coordinate axis.
Purpose: To give a graphical representation of any list-of-lists represented
distribution. The arguments LowerLimit and UpperLimit define the minimum and maximum
values desired on the horizontal axis.
Special Issues: A distribution function must be fully-specified for a plot to be
generated. The procedure is especially useful for plotting distributions that have [...]
Examples:
• The following statements will generate the plot of the PDF for the
triangular(1, 2, 3) distribution:
> X := TriangularRV(1, 2, 3);
> PlotDist(X, 1, 3);
• To plot the HF of the exponential(1) distribution for 0 < t < 10, enter the
statements:
> X := ExponentialRV(1);
> PlotDist(HF(X), 0, 10);
• To see a progression of the five PDFs of the order statistics (the procedure is
introduced in Section 2.10) for an exponential(1) distribution, one could enter
the following statements:
> X := ExponentialRV(1);
> n := 5;
> for i from 1 to n do PlotDist(OrderStat(X, n, i), 0, 10); od;
The result is five PDFs plotted sequentially. This sequence could be of use to
an instructor explaining the progressive nature of order statistics to first-year
probability students.
• Unspecified distributions produce “empty plot” warnings:
> X := ExponentialRV(lambda);
> PlotDist(X, 0, 10);
Algorithm: The algorithm is a nested set of Plot commands that combine to form
a single plot. This is standard Maple programming for plotting multiple functions on
a single set of axes. Since −∞ and ∞ are common endpoints of random variables, it
is necessary to specify the lower and upper endpoints of the horizontal axis.
2.8
ExpectationRV
Syntax: The command
ExpectationRV(RandomVariable, Function);
returns the expected value of a function of a random variable.
Purpose: To find the expected value of a function of a random variable.
Special Issues: Procedures MeanRV and VarianceRV are the special cases of the
ExpectationRV procedure, evident by their names.
Examples:
• In order to find the expected value of a standard normal random variable, type:
> X := NormalRV(0, 1);
> meanX := ExpectationRV(X, x -> x);
• Unspecified distributions may also be used. Here the mean of the exponential(λ)
random variable is calculated with the statements:
> X := ExponentialRV(lambda);
[...]
Algorithm: The algorithm is a straightforward implementation of the following
result. Let the continuous random variable X have PDF f_X(x). Let g(X) be a
continuous function of X. The expected value of g(X), when it exists, is given
by
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) \cdot f_X(x)\, dx.$$
The algorithm is in Appendix D.
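For a piecewise PDF, the integral splits into one integral per segment. The following Python sketch illustrates that sum for a distribution with finite support. It is not the algorithm in Appendix D: the names `simpson` and `expectation` are hypothetical, and Simpson's rule stands in for symbolic integration.

```python
def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def expectation(rv, g):
    """E[g(X)] as the sum over segments of the integral of g(x) * f(x)."""
    functions, endpoints, _ = rv
    total = 0.0
    for f, a, b in zip(functions, endpoints[:-1], endpoints[1:]):
        total += simpson(lambda x: g(x) * f(x), a, b)
    return total

# triangular(0, 1, 2) PDF in the list-of-lists layout
triangular = [[lambda x: x, lambda x: 2 - x], [0, 1, 2], ["Continuous", "PDF"]]
mean = expectation(triangular, lambda x: x)
variance = expectation(triangular, lambda x: (x - mean) ** 2)
print(mean, variance)
```

The mean and variance here are obtained by choosing g(x) = x and g(x) = (x − mean)², which mirrors the way MeanRV and VarianceRV are described as special cases of ExpectationRV.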
2.9
Transform
Syntax: The command
Transform(RandomVariable, Transformation);
returns the PDF of the transformed random variable in the list-of-lists format.
Purpose: To determine the PDF of the transformation of a random variable of the
form Y = g(X). As is the case for the random variable X, the transformation function
g(X) may be defined in a piecewise fashion (see Chapter 3).
Special Issues: The transformation function must also be defined in an altered
list-of-lists format. For this function, the modeler must break the transformation into
piecewise monotone segments. Details on why this must be the case, in addition to
other implementation issues, are given in Chapter 3.
Examples:
• Let X ~ U(0, 1) and Y = g(X) = 4X. The following statements will generate
the PDF of Y:
> X := [[x -> 1], [0, 1], ['Continuous', 'PDF']];
> g := [[x -> 4 * x], [-infinity, infinity]];
> Y := Transform(X, g);
• The following statements determine the distribution of the square of an inverse
Gaussian random variable with λ = 1 and μ = 2:
> X := InverseGaussianRV(1, 2);
> g := [[x -> x^2], [0, infinity]];
> Y := Transform(X, g);
• An example of finding the negative of a random variable is included in Section
2.12 on the command SumRV, used in finding differences of random variables.
• An example of finding the reciprocal of a random variable is included in Section
2.11 on the command ProductRV, used in finding ratios of random variables.
• An example of dividing a random variable by a constant is included in Section
2.15 on the command MLE, used to find the distributions of certain estimators.
• A number of other illustrative examples are given in Chapter 3.
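The first example can be checked by hand with the textbook change-of-variable result for a single monotone transformation. This is offered as an illustration only; the general piecewise theorem behind Transform is the one referred to in Chapter 3, and the notation g^{-1} is introduced here. With g(x) = 4x, the inverse is g^{-1}(y) = y/4, so:

```latex
f_Y(y) = f_X\!\left(g^{-1}(y)\right)\left|\frac{d}{dy}\,g^{-1}(y)\right|
       = 1 \cdot \frac{1}{4}
       = \frac{1}{4}, \qquad 0 < y < 4 .
```

Hence Y ~ U(0, 4), and the result would be expected to be equivalent to a list-of-lists with the single function y -> 1/4 on the endpoints [0, 4].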
Algorithm: The theorem which provides the basis for the algorithm and the details
associated with the algorithm are found in Chapter 3. The algorithm is in Appendix [...]
2.10
OrderStat
Syntax: The command
OrderStat(RandomVariable, n, r);
returns the PDF of the rth of n order statistics drawn from a population having the
same distribution as RandomVariable.
Purpose: This procedure is designed to return the marginal distribution of specified
order statistics. The procedure's arguments are defined as follows: the population
distribution is represented by the list-of-lists format, the integer sample size n, and
the integer r to denote the rth order statistic. The procedure returns the marginal
PDF for the rth order statistic in the list-of-lists format. The procedure is a direct
implementation of the widely-published theorem on the distribution of the order
statistics (e.g., Larsen and Marx, 1986, p. 145).
Special Issues: This procedure is robust for unspecified parameters in the population
distribution. It is also fairly robust at returning the appropriate PDF when either
n or r is unspecified. It is also robust when dealing with more than one segment in
a PDF. This procedure was a cornerstone procedure that allowed the goodness-of-fit
contributions discussed in Chapter 6 in this dissertation.
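The order statistic theorem referred to above is the standard result that the rth of n iid order statistics has PDF n!/((r − 1)!(n − r)!) · F(x)^(r−1) · (1 − F(x))^(n−r) · f(x). The following Python sketch evaluates that formula numerically. It is not APPL's symbolic procedure: the name `order_stat_pdf` is hypothetical, and both the PDF and the CDF of the population are assumed to be supplied by the caller.

```python
import math

def order_stat_pdf(pdf, cdf, n, r):
    """PDF of the r-th order statistic of n iid draws (textbook formula)."""
    if not 1 <= r <= n:
        raise ValueError("need 1 <= r <= n")
    coeff = math.factorial(n) // (math.factorial(r - 1) * math.factorial(n - r))
    def g(x):
        F = cdf(x)
        return coeff * F ** (r - 1) * (1 - F) ** (n - r) * pdf(x)
    return g

pdf = lambda x: math.exp(-x)        # exponential(1) PDF
cdf = lambda x: 1 - math.exp(-x)    # exponential(1) CDF
minimum = order_stat_pdf(pdf, cdf, 5, 1)
print(minimum(0.3), 5 * math.exp(-5 * 0.3))
```

The comparison value uses the known fact that the minimum of n iid exponential(1) random variables is exponential(n), so for n = 5 the two printed numbers should agree.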
Examples:
• The PDF of the third order statistic from a sample of five items distributed