INTRODUCTION TO MEDICAL STATISTICS:
THEORETICAL PROBABILITY
DISTRIBUTIONS
Mirjana Kujundžić Tiljak
• EMPIRICAL FREQUENCY DISTRIBUTION
– observed data
• THEORETICAL PROBABILITY DISTRIBUTION
– described by mathematical models
27.06.2006 THEORETICAL PROBABILITY DISTRIBUTIONS
• when an empirical distribution approximates a particular probability distribution, theoretical knowledge of that distribution can be used → to answer questions about the data
• evaluation of probabilities is required
PROBABILITY (P)
• measures uncertainty
• measures the chance of a given event occurring
0 ≤ P ≤ 1
– P = 0 → event cannot occur
– P = 1 → event must occur
– Q = 1 − P → probability of the complementary event (the event not occurring)
• Various approaches to calculating a probability:
– Subjective – personal degree of belief that the event will occur (e.g. that the world will come to an end in the year 2050)
– Frequentist – the proportion of times the event would occur if the experiment were repeated a large number of times (e.g. the proportion of times we would get a "head" in a long series of coin tosses)
– A priori – requires knowledge of the theoretical model, called the probability distribution, which describes the probabilities of all possible outcomes of the "experiment" (e.g. genetic theory allows us to describe the probability distribution for eye color in a baby born to a blue-eyed woman and a brown-eyed man by initially specifying all possible genotypes of eye color in the baby and their probabilities)
PROBABILITY (P)
• The addition rule: if two events (A and B) are mutually exclusive → the probability that either one or the other occurs (A or B) is equal to the sum of their probabilities
Prob (A or B) = Prob (A) + Prob (B)
• The multiplication rule: if two events (A and B) are independent → the probability that both events occur (A and B) is equal to the product of their probabilities
Prob (A and B) = Prob (A) × Prob (B)
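The two rules can be checked on a small example. The sketch below assumes a fair six-sided die (an illustration that is not part of the original slides) and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction

# Assumption for illustration: a fair six-sided die, each face has probability 1/6.
p_face = Fraction(1, 6)

# Addition rule: "roll a 1" and "roll a 2" are mutually exclusive events.
p_one_or_two = p_face + p_face

# Multiplication rule: two separate rolls are independent events.
p_two_sixes = p_face * p_face

# Cross-check the multiplication rule by listing all 36 equally likely pairs of rolls.
pairs = [(a, b) for a in range(1, 7) for b in range(1, 7)]
favourable = sum(1 for a, b in pairs if a == 6 and b == 6)
p_two_sixes_enumerated = Fraction(favourable, len(pairs))

print(p_one_or_two)            # 1/3
print(p_two_sixes)             # 1/36
print(p_two_sixes_enumerated)  # 1/36
```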
RANDOM VARIABLES
• random variable – a quantity that can take any one of a set of mutually exclusive values with a given probability
• discrete (discontinuous) random variable = numerical values are integers
– e.g. number of children in a family: 0, 1, 2, 3, …, k
• continuous random variable = numerical values are real numbers
– e.g. body weight 72.35 kg, blood glucose level 7.2 mmol/l
PROBABILITY DISTRIBUTION
• Probability distribution – shows the probabilities of all possible values of the random variable
– a theoretical distribution that is expressed mathematically
– has a mean and variance that are analogous to those of an empirical distribution
• parameters – summary measures (e.g. mean, variance) characterizing that distribution → are estimated in the sample by the relevant statistics
• depending on whether the random variable is discrete or continuous → the probability distribution can be either discrete or continuous
PROBABILITY DISTRIBUTIONS
• DISCRETE (Binomial, Poisson)
– a probability can be derived for every possible value of the random variable
– the sum of all such probabilities is 1
• CONTINUOUS (Normal, Chi-squared, t, F)
– only the probability of the random variable, x, taking values in a certain range can be derived
– if the horizontal axis represents the values of x → the curve given by the equation of the distribution can be drawn (= probability density function)
– total area under the curve = 1 → represents the probability of all possible events
– the probability that x lies between two limits is equal to the area under the curve between these values
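The "area under the curve" idea can be sketched numerically. The example below assumes the standard Normal density as the curve and a simple trapezoidal rule for the area; both choices are illustrative and not taken from the slides.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Normal distribution (used here only as an example curve)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_curve(f, a, b, steps=10_000):
    """Trapezoidal-rule approximation of the area under f between a and b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# The range -10..10 covers practically the whole curve, so the area is close to 1.
total_area = area_under_curve(normal_pdf, -10.0, 10.0)

# Probability that x lies between the limits -1 and 1.
p_between = area_under_curve(normal_pdf, -1.0, 1.0)

print(round(total_area, 4), round(p_between, 4))
```

For a discrete distribution the same role is played by a plain sum of the probabilities of the individual values.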
PROBABILITY DISTRIBUTIONS
• Probability that x lies between two limits?
THE NORMAL (GAUSSIAN) DISTRIBUTION
• one of the most important distributions in statistics
• named after the German mathematician C.F. Gauss
• most biological measurements approximately follow the Normal distribution
• it is used in many analytical models
• Probability density function:
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$
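As a rough illustration, the Normal density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$ can be evaluated directly. The values of µ and σ below are hypothetical, and `statistics.NormalDist` from the Python standard library is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Hypothetical parameters, e.g. a body height in cm.
mu, sigma = 170.0, 10.0

for x in (150.0, 170.0, 190.0):
    # The hand-written formula and the library value should agree.
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The density is highest at x = µ and takes equal values at equal distances on either side of µ.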
THE NORMAL (GAUSSIAN) DISTRIBUTION
Completely described by two parameters:
– mean (µ)
– variance (σ²)
X ~ N(µ, σ²)
• Normal distribution curve:
– area under the curve = 1
– bell-shaped (unimodal)
– symmetrical about its mean
– absolute maximum at x = µ
– shifted to the right if the mean is increased and to the left if the mean is decreased (assuming constant variance)
– flattened as the variance is increased but more peaked as the variance is decreased (for a fixed mean)
THE NORMAL (GAUSSIAN) DISTRIBUTION
• the mean, median and mode of a Normal distribution are equal
• the probability (P) that a Normally distributed random variable, x, with mean µ and standard deviation σ lies between:
(µ − σ) and (µ + σ) = 0.68
(µ − 1.96σ) and (µ + 1.96σ) = 0.95
(µ − 2.58σ) and (µ + 2.58σ) = 0.99
→ these intervals may be used to define reference intervals
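The three probabilities can be reproduced with the Normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the glucose mean and standard deviation in the last lines are hypothetical numbers chosen only to show how a 95% reference interval would be formed.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < x < mu + k*sigma) for a Normal(mu, sigma) variable."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1.0, 1.96, 2.58):
    # Expect values close to 0.68, 0.95 and 0.99.
    print(k, round(prob_within(k), 4))

# Hypothetical example: 95% reference interval for a measurement
# with mean 5.0 and standard deviation 0.5 (units as measured).
mu, sigma = 5.0, 0.5
reference_interval = (mu - 1.96 * sigma, mu + 1.96 * sigma)
print(reference_interval)
```

The result does not depend on the particular µ and σ, only on the multiplier k.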
THE NORMAL (GAUSSIAN) DISTRIBUTION
• changing µ, constant σ: [figure: Normal curves with different means and the same variance]
• changing σ, constant µ: [figure: Normal curves with the same mean and different variances]
THE STANDARD NORMAL DISTRIBUTION
• transformation of an original value (x_i) to a Standardized Normal Deviate (SND) (z_i):
$$ z_i = \frac{x_i - \mu}{\sigma} $$
= random variable that has a Standard Normal distribution
• sample:
$$ z_i = \frac{x_i - \bar{x}}{s} $$
• standardizing every observation: X₁ → Z₁, X₂ → Z₂, X₃ → Z₃, …, Xₙ → Zₙ
• mean and standard deviation of the standardized values: z̄ = 0, s_z = 1
Z ~ N(0, 1)
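The effect of standardization on a sample can be verified on a few numbers. The body weights below are hypothetical values invented for the illustration; the sample standard deviation s uses the usual n − 1 denominator.

```python
from statistics import mean, stdev

# Hypothetical sample of body weights in kg (illustrative values only).
weights = [61.2, 72.35, 68.0, 80.5, 75.1, 66.4]

x_bar = mean(weights)   # sample mean
s = stdev(weights)      # sample standard deviation (n - 1 denominator)

# Standardized values z_i = (x_i - x_bar) / s
z_scores = [(x - x_bar) / s for x in weights]

# Apart from floating-point rounding, the z values have mean 0 and standard deviation 1.
print(mean(z_scores), stdev(z_scores))
```

Standardization changes the location and scale of the data but not the shape of their distribution.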
THE t-DISTRIBUTION
• W.S. Gosset (pseudonym "Student")
• parameter that characterizes the t-distribution = the degrees of freedom
• similar in shape to the Normal distribution (more spread out, with longer tails) – as the degrees of freedom increase its shape approaches Normality
• useful for calculating confidence intervals for, and testing hypotheses about, one or two means
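The longer tails of the t-distribution, and its approach to Normality, can be seen by comparing densities far from the centre. The sketch below codes the standard textbook density of the t-distribution (it is not given in the slides) and evaluates it on the log scale to avoid overflow for large degrees of freedom.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log(1.0 + t * t / df))

def std_normal_pdf(z):
    """Density of the Standard Normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Compare the height of the tails at t = 3 for increasing degrees of freedom.
for df in (2, 10, 100, 1000):
    print(df, t_pdf(3.0, df), std_normal_pdf(3.0))
```

With few degrees of freedom the t density at 3 is several times larger than the Normal density; with many degrees of freedom the two are almost equal.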
THE CHI-SQUARED (χ²) DISTRIBUTION
• a right-skewed distribution taking positive values
• characterized by its degrees of freedom
• its shape depends on the degrees of freedom – it becomes more symmetrical and approaches Normality as they increase
• useful for analysing categorical data
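The dependence of the shape on the degrees of freedom can be sketched by simulation. The example relies on a standard fact that the slides do not state: a Chi-squared variable with k degrees of freedom is distributed as the sum of k squared independent Standard Normal variables. The seed, sample size and skewness measure are arbitrary choices made for the illustration.

```python
import random
from statistics import mean, pstdev

def chi_squared_sample(df, size, rng):
    """Simulate Chi-squared values as sums of df squared standard Normal draws."""
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(size)]

def skewness(values):
    """Moment-based skewness: positive for a right-skewed sample."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

rng = random.Random(2006)  # fixed seed so the run is repeatable
results = {}
for df in (2, 10, 50):
    draws = chi_squared_sample(df, 5_000, rng)
    results[df] = (min(draws), mean(draws), skewness(draws))
    print(df, results[df])
```

All simulated values are positive, and the skewness shrinks as the degrees of freedom grow, which matches the description of a curve that becomes more symmetrical.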
THE F-DISTRIBUTION
• skewed to the right
• defined by a ratio – the distribution of a ratio of two estimated variances calculated from Normal data approximates the F-distribution
• characterized by the degrees of freedom of the numerator and the denominator of the ratio
• useful for comparing two variances, and more than two means using the analysis of variance
THE LOGNORMAL DISTRIBUTION
• the probability distribution of a random variable whose log (to base 10 or e) follows the Normal distribution
• highly skewed to the right
• if the logs of raw data that are skewed to the right have an empirical distribution that is nearly Normal → the data approximate the Lognormal distribution
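The effect of the log transformation can be illustrated by simulation. The sketch below draws hypothetical Lognormal data with `random.lognormvariate` (the parameters 0 and 0.8 and the seed are arbitrary) and compares the skewness of the raw values with that of their natural logs.

```python
import math
import random
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: positive for right-skewed data, near 0 for symmetric data."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical raw data whose natural log is Normal with mean 0 and SD 0.8.
raw = [rng.lognormvariate(0.0, 0.8) for _ in range(5_000)]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)        # clearly positive: right-skewed
skew_logged = skewness(logged)  # close to 0: roughly symmetrical

print(round(skew_raw, 2), round(skew_logged, 2))
```

In practice the same check is applied the other way round: take logs of skewed data and inspect whether the transformed values look approximately Normal.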
THE BINOMIAL DISTRIBUTION
• theoretical distribution for a discrete random variable
• definition: Jacob Bernoulli, 1700
• two outcomes: "success" and "failure"
• n events
– e.g. n = 100 unrelated women undergoing IVF
• Two parameters describe the Binomial distribution:
n = number of individuals in the sample (or repetitions of a trial)
π = the true probability of success for each individual (or in each trial)
X ~ B(n, π)
THE BINOMIAL DISTRIBUTION
• Mean = nπ
(the value of the random variable that we expect if we look at n individuals, or repeat the trial n times)
• Variance = nπ(1 − π)
• small n
– the distribution is skewed to the right if π < 0.5
– the distribution is skewed to the left if π > 0.5
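The mean and variance formulas can be checked against the Binomial probabilities themselves. The sketch below uses the standard Binomial probability formula, C(n, k) π^k (1 − π)^(n − k), which is not written out in the slides, with hypothetical values n = 10 and π = 0.2.

```python
from math import comb

def binomial_pmf(k, n, pi):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

# Hypothetical parameters chosen for illustration.
n, pi = 10, 0.2

probs = [binomial_pmf(k, n, pi) for k in range(n + 1)]

total = sum(probs)                                             # should be 1
mean_x = sum(k * p for k, p in enumerate(probs))               # should equal n * pi
var_x = sum((k - mean_x) ** 2 * p for k, p in enumerate(probs))  # should equal n * pi * (1 - pi)

print(total, mean_x, var_x)
```

With π = 0.2 the largest probabilities sit at low counts and a long tail extends towards higher counts, i.e. the distribution is skewed to the right.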
THE BINOMIAL DISTRIBUTION
• the distribution becomes more symmetrical as the sample size increases and approximates the Normal distribution if both nπ and nπ(1 − π) are greater than 5
• the properties of the Binomial distribution can be used when making inferences about proportions
• the Normal approximation to the Binomial distribution is often used when analysing proportions
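A quick comparison shows how close the Normal approximation can be. The values n = 100 and π = 0.3 are hypothetical; the continuity correction (evaluating the Normal curve at 35.5 rather than 35) is a common refinement that is not mentioned in the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical parameters: n*pi = 30 and n*pi*(1 - pi) = 21, both greater than 5.
n, pi = 100, 0.3

def binomial_cdf(k, n, pi):
    """Exact P(X <= k) for X ~ B(n, pi)."""
    return sum(comb(n, j) * pi ** j * (1 - pi) ** (n - j) for j in range(k + 1))

exact = binomial_cdf(35, n, pi)

# Normal approximation with the same mean and variance, with a continuity correction.
approx = NormalDist(n * pi, sqrt(n * pi * (1 - pi))).cdf(35.5)

print(round(exact, 4), round(approx, 4))
```

The two probabilities agree to about two decimal places for these parameters; the agreement is typically worse when n is small or π is close to 0 or 1.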
THE BINOMIAL DISTRIBUTION
Example: gene recombination
Chromosomal locus with 2 alleles: A and a
p = probability of A
q = 1 − p = probability of a
conception → outcome space: {AA, Aa, aa}
P(AA) = P(A) × P(A) = p²
P(Aa) = P(A) × P(a) = pq
P(aA) = P(a) × P(A) = qp (Aa and aA together: 2pq)
P(aa) = P(a) × P(a) = q²
Total: p² + 2pq + q² = (p + q)² = 1² = 1
THE BINOMIAL DISTRIBUTION
– Example – probability of genotypes:
• frequency of gene A = 0.33
• frequency of gene a = 0.67
(p + q)² = (0.33 + 0.67)² = 0.33² + 2 × 0.33 × 0.67 + 0.67²
P(AA) = 0.33² = 0.1089
P(Aa) = 0.33 × 0.67 = 0.2211
P(aA) = 0.67 × 0.33 = 0.2211
P(aa) = 0.67² = 0.4489
(heterozygotes in total: P(Aa) + P(aA) = 2pq = 0.4422)
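The genotype probabilities follow directly from the expansion of (p + q)². A minimal sketch using the allele frequencies of the example (the variable names are chosen here for illustration):

```python
# Allele frequencies from the example: p for A, q for a.
p = 0.33
q = 1 - p

genotypes = {
    "AA": p * p,      # both alleles A
    "Aa": 2 * p * q,  # heterozygote: Aa and aA combined
    "aa": q * q,      # both alleles a
}

for name, prob in genotypes.items():
    print(name, round(prob, 4))

# The three probabilities cover the whole outcome space.
total = sum(genotypes.values())
print(round(total, 4))
```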
THE BINOMIAL DISTRIBUTION
[Figure: bar chart of the probabilities (P, vertical axis from 0 to 0.5) of the genotypes AA, Aa and aa]
– Example – death outcome as a Binomial distribution:
• case fatality of a disease: p = 0.30 (30/100)
• survival probability: q = 0.70
• n = 5
• Binomial expansion: (0.30 + 0.70)⁵

Number of deaths among examinees   Binomial term   Probability
5 (everybody)                      p⁵              0.00243
4                                  5p⁴q            0.02835
3                                  10p³q²          0.13230
2                                  10p²q³          0.30870
1                                  5pq⁴            0.36015
0 (nobody)                         q⁵              0.16807
Total                                              1.00000
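The probabilities for 0 to 5 deaths among n = 5 examinees can be recomputed from the Binomial terms. A minimal sketch with p = 0.30 as in the example (the variable names are illustrative):

```python
from math import comb

p_death = 0.30  # case fatality from the example
n = 5           # number of examinees

table = {}
for deaths in range(n, -1, -1):
    # Binomial term: C(n, deaths) * p^deaths * q^(n - deaths)
    table[deaths] = comb(n, deaths) * p_death ** deaths * (1 - p_death) ** (n - deaths)
    print(deaths, f"{table[deaths]:.5f}")

print("Total", f"{sum(table.values()):.5f}")
```

The coefficients 1, 5, 10, 10, 5, 1 are the numbers of ways of choosing which examinees die.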
THE POISSON DISTRIBUTION
• Poisson (beginning of the 19th century)
• the Poisson random variable = the count of events that occur independently and randomly in time or space at some average rate, µ (it takes the value 0 and all positive integers)
– example: the number of hospital admissions per day typically follows the Poisson distribution
→ the Poisson distribution can be used to calculate the probability of a certain number of admissions on any particular day
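The admissions example can be sketched with the standard Poisson probability formula, P(X = k) = e^(−µ) µ^k / k!, which is not written out in the slides. The average rate of 3 admissions per day is a hypothetical value chosen for the illustration.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events when the average rate is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

mu = 3.0  # hypothetical average number of admissions per day

# Probability of exactly 5 admissions on a particular day.
p_exactly_5 = poisson_pmf(5, mu)

# Probability of at most 2 admissions on a particular day.
p_at_most_2 = sum(poisson_pmf(k, mu) for k in range(3))

print(round(p_exactly_5, 4), round(p_at_most_2, 4))
```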
THE POISSON DISTRIBUTION
• Mean (average rate, µ) = the parameter that describes the Poisson distribution
• the mean equals the variance in the Poisson distribution
• unimodal curve, right-skewed if the mean is small, but more symmetrical as the mean increases, when it approximates a Normal distribution
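These properties can be checked numerically from the Poisson probabilities. The sketch below uses the standard formula P(X = k) = e^(−µ) µ^k / k! evaluated on the log scale, truncates the sum at k = 200 (an assumption that is harmless for the small rates used here), and computes a moment-based skewness; the three rates are arbitrary.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability, computed on the log scale to avoid overflow."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def moments(mu, k_max=200):
    """Mean, variance and skewness from the probabilities of 0..k_max events."""
    ks = range(k_max + 1)
    probs = [poisson_pmf(k, mu) for k in ks]
    mean = sum(k * p for k, p in zip(ks, probs))
    var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
    skew = sum((k - mean) ** 3 * p for k, p in zip(ks, probs)) / var ** 1.5
    return mean, var, skew

summary = {mu: moments(mu) for mu in (1.0, 4.0, 25.0)}
for mu, (mean, var, skew) in summary.items():
    print(mu, round(mean, 4), round(var, 4), round(skew, 4))
```

For each rate the mean and variance agree, and the skewness falls as µ rises, consistent with the curve becoming more symmetrical.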