**INTRODUCTION TO MEDICAL STATISTICS: THEORETICAL PROBABILITY DISTRIBUTIONS**

Mirjana Kujundžić Tiljak

### • EMPIRICAL FREQUENCY DISTRIBUTION

– observed data

### • THEORETICAL PROBABILITY DISTRIBUTION

– described by mathematical models

27.06.2006 THEORETICAL PROBABILITY DISTRIBUTIONS


### • when an empirical distribution approximates a particular probability distribution, theoretical knowledge of that distribution can be used to answer questions about the data

### • this requires the evaluation of probabilities

**PROBABILITY (P)**

### • measures uncertainty

### • measures the chance of a given event occurring

### • 0 ≤ P ≤ 1

– P = 0 → the event cannot occur

– P = 1 → the event must occur

– Q = 1 − P → probability of the complementary event (the event not occurring)

• Various approaches to calculating probability:

– Subjective – a personal degree of belief that the event will occur (e.g. that the world will come to an end in the year 2050)

– Frequentist – the proportion of times the event would occur if the experiment were repeated a large number of times (e.g. the proportion of coin tosses that give a “head”)

– A priori – requires knowledge of the theoretical model, the probability distribution, which describes the probabilities of all possible outcomes of the “experiment” (e.g. genetic theory allows us to describe the probability distribution for eye color in a baby born to a blue-eyed woman and a brown-eyed man by initially specifying all possible genotypes of eye color in the baby and their probabilities)
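The frequentist approach can be illustrated with a quick simulation. This is a minimal sketch assuming a fair coin (probability 0.5 of a “head”); the function name, the seed and the numbers of tosses are illustrative choices, not part of the original notes.

```python
import random

def relative_frequency_of_heads(n_tosses: int, seed: int = 1) -> float:
    """Proportion of simulated fair-coin tosses that land 'head'."""
    rng = random.Random(seed)  # seeded generator, so reruns give the same numbers
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses

# The relative frequency settles closer to 0.5 as the number of tosses grows
for n in (10, 100, 10_000, 200_000):
    print(n, relative_frequency_of_heads(n))
```

With only 10 tosses the proportion can be far from 0.5; with hundreds of thousands it stays very close to it.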

• The addition rule: if two events (A and B) are mutually exclusive → the probability that either one or the other occurs (A or B) is equal to the sum of their probabilities

Prob (A or B) = Prob (A) + Prob (B)

• The multiplication rule: if two events (A and B) are independent → the probability that both events occur (A and B) is equal to the product of the probability of each

Prob (A and B) = Prob (A) × Prob (B)
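The two rules, and the complementary probability Q = 1 − P, can be checked with exact fractions. The die example below is an assumption chosen for illustration, not an example from the notes; the last step confirms the multiplication rule by counting all equally likely outcomes of two dice.

```python
from fractions import Fraction
from itertools import product

# Illustrative example (not from the notes): a fair six-sided die, each face 1/6
p_face = Fraction(1, 6)

# Addition rule: "roll a 1" and "roll a 2" are mutually exclusive
p_1_or_2 = p_face + p_face        # 1/3

# Multiplication rule: the results of two separate dice are independent
p_two_sixes = p_face * p_face     # 1/36

# Complementary event: Q = 1 - P
p_not_six = 1 - p_face            # 5/6

# Check the multiplication rule by listing all 36 equally likely pairs
pairs = list(product(range(1, 7), repeat=2))
p_two_sixes_counted = Fraction(sum(1 for a, b in pairs if a == b == 6), len(pairs))
print(p_1_or_2, p_two_sixes, p_not_six, p_two_sixes_counted)
```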

**RANDOM VARIABLES**

### • random variable – a quantity that can take any one of a set of mutually exclusive values with a given probability

### • *discrete or discontinuous random variable* = numerical values are integers

• e.g. number of children in a family: 0, 1, 2, 3, … k

### • *continuous random variable* = numerical values are real numbers

• e.g. body weight 72.35 kg, blood glucose level 7.2 mmol/l

**PROBABILITY DISTRIBUTION**

• Probability distribution – shows the probabilities of all possible values of the random variable

– a theoretical distribution that is expressed mathematically; it has a mean and variance that are analogous to those of an empirical distribution

• parameters – summary measures (e.g. mean, variance) characterizing that distribution → they are estimated from the sample by the relevant statistics

• depending on whether the random variable is discrete or continuous → the probability distribution can be either discrete or continuous

**PROBABILITY DISTRIBUTIONS**

### • DISCRETE (Binomial, Poisson)

– a probability can be derived for every possible value of the random variable

– the sum of all such probabilities is 1

### • CONTINUOUS (Normal, Chi-squared, t, F)

– the probability of the random variable, x, taking values in certain ranges can be derived

– if the horizontal axis represents the values of x → the curve from the equation of the distribution can be drawn (= probability density function)

– total area under the curve = 1 → represents the probability of all possible events

• the probability that x lies between two limits is equal to the area under the curve between these values

### • Probability that x lies between two limits?
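The idea that a probability is an area under the density curve can be checked numerically. This sketch assumes the standard Normal density as the example curve and uses the midpoint rule, a simple numerical integration method swapped in for illustration (the notes do not prescribe a method); the function names are hypothetical.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal probability density function."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(pdf, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under pdf between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Probability that x lies between the limits -1 and 1
print(area_under_curve(normal_pdf, -1, 1))
# Area over a very wide range: practically the total area, close to 1
print(area_under_curve(normal_pdf, -10, 10))
```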

**THE NORMAL (GAUSSIAN) DISTRIBUTION**

### • one of the most important distributions in statistics

### • named after the German mathematician C. F. Gauss

### • many biological measurements approximately follow a Normal distribution

### • it is used in many analytical models

### • Probability density function:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

### Completely described by two parameters:

– mean (µ)

– variance (σ²)

$X \sim N(\mu, \sigma^2)$
• *normal distribution curve*:

– area under the curve = 1

– bell-shaped (unimodal)

– symmetrical about its mean

– absolute maximum at x = µ

– shifted to the right if the mean is increased and to the left if the mean is decreased (assuming constant variance)

– flattened as the variance is increased and more peaked as the variance is decreased (for a fixed mean)
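The curve properties listed above follow directly from the density formula. The sketch below codes that formula and checks symmetry, the maximum at µ, the shift with the mean and the flattening with larger σ; the parameter values (µ = 100, σ = 15) are hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)**2 / (2 * sigma**2))"""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 100.0, 15.0  # hypothetical parameters, for illustration only

# Symmetry about the mean: same height at mu - 10 and mu + 10
print(normal_pdf(mu - 10, mu, sigma), normal_pdf(mu + 10, mu, sigma))

# Absolute maximum at x = mu (checked over a grid of whole numbers)
print(max(range(40, 161), key=lambda x: normal_pdf(x, mu, sigma)))

# Doubling sigma (for the same mean) halves the peak height: a flatter curve
print(normal_pdf(mu, mu, sigma), normal_pdf(mu, mu, 2 * sigma))

# Increasing the mean shifts the peak to the right
print(max(range(40, 161), key=lambda x: normal_pdf(x, mu + 20, sigma)))
```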

• the mean, median and mode of a Normal distribution are equal

• the probability (P) that a Normally distributed random variable, x, with mean µ and standard deviation σ, lies between:

– (µ − σ) and (µ + σ) is 0.68

– (µ − 1.96σ) and (µ + 1.96σ) is 0.95

– (µ − 2.58σ) and (µ + 2.58σ) is 0.99

→ these intervals may be used to define reference intervals
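The probabilities 0.68, 0.95 and 0.99 can be reproduced from the error function, since P(µ − kσ < x < µ + kσ) = erf(k/√2) for any Normal variable. The reference-interval part uses a hypothetical measurement with mean 5.0 and SD 0.5, invented purely to show the arithmetic.

```python
import math

def prob_within(k: float) -> float:
    """P(mu - k*sigma < x < mu + k*sigma) for a Normal variable, = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2.58):
    print(k, round(prob_within(k), 4))

# Hypothetical example: a measurement with mean 5.0 and SD 0.5 in healthy people
mu, sigma = 5.0, 0.5
lower, upper = mu - 1.96 * sigma, mu + 1.96 * sigma
print(f"95% reference interval: {lower:.2f} to {upper:.2f}")
```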

### • changing µ, constant σ:

### • changing σ, constant µ:

**THE STANDARD NORMAL DISTRIBUTION**

### • transformation of the original value (x_i) to the Standardized Normal Deviate (SND) (z_i):

$$z_i = \frac{x_i - \mu}{\sigma}$$

= a random variable that has a Standard Normal distribution

### • sample:

$$z_i = \frac{x_i - \bar{x}}{s}$$

$$X_1 \to Z_1,\quad X_2 \to Z_2,\quad X_3 \to Z_3,\quad \dots,\quad X_n \to Z_n$$

$$\bar{z} = 0, \qquad s_z = 1$$

$$Z \sim N(0, 1)$$
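The sample version of the transformation is easy to apply directly. This sketch uses Python's `statistics` module and a hypothetical list of body weights (invented for illustration, not data from the notes); after standardizing, the z values have mean 0 and SD 1 up to floating-point rounding.

```python
import statistics

def standardize(sample):
    """Convert each x_i to z_i = (x_i - x_bar) / s, using the sample mean and SD."""
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD (n - 1 denominator)
    return [(x - x_bar) / s for x in sample]

weights = [68.0, 72.5, 80.1, 65.3, 77.9, 70.4]  # hypothetical body weights in kg
z = standardize(weights)
print([round(v, 2) for v in z])
# Mean of z is (numerically) 0 and SD of z is (numerically) 1
print(statistics.mean(z), statistics.stdev(z))
```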

**THE t-DISTRIBUTION**

• W. S. Gosset (pseudonym Student)

• parameter that characterizes the t-distribution = the degrees of freedom

• similar in shape to the Normal distribution but more spread out, with longer tails; as the degrees of freedom increase, its shape approaches Normality

• useful for calculating confidence intervals for, and testing hypotheses about, one or two means
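The “longer tails” and the approach to Normality can be seen by coding the t density and comparing it with the standard Normal density far out in the tail. The density formula is the standard one for Student's t; `math.lgamma` is used to avoid overflow for large degrees of freedom. The chosen point x = 3 and the degrees of freedom are illustrative assumptions.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work on the log scale (lgamma) so large df do not overflow
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x: float) -> float:
    """Standard Normal density, for comparison."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Height of each curve far out in the tail (x = 3): the fewer the degrees of
# freedom, the heavier the tail; with many degrees of freedom it nears the Normal
for df in (1, 5, 30, 1000):
    print(f"df = {df:4d}: f(3) = {t_pdf(3, df):.5f}")
print(f"Normal:     f(3) = {normal_pdf(3):.5f}")
```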

**THE CHI-SQUARE (χ²) DISTRIBUTION**

• a right-skewed distribution taking positive values

• characterized by its degrees of freedom

• its shape depends on the degrees of freedom: it becomes more symmetrical and approaches Normality as they increase

• useful for analysing categorical data
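The shape described above can be explored by simulation. This sketch relies on a standard fact not stated in the notes: a chi-squared variable with k degrees of freedom is the sum of k squared independent standard Normal deviates. The seed and number of draws are arbitrary illustrative choices.

```python
import random
import statistics

def simulate_chi_square(df: int, n_draws: int = 20_000, seed: int = 2):
    """Simulate chi-squared values as sums of df squared standard Normal deviates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n_draws)]

for df in (1, 4, 30):
    draws = simulate_chi_square(df)
    # Values are never negative; mean above median signals right skew, and the
    # gap shrinks relative to the spread as the degrees of freedom increase
    print(df, round(statistics.mean(draws), 2), round(statistics.median(draws), 2), round(min(draws), 3))
```

The simulated mean sits close to the degrees of freedom, which is the mean of the χ² distribution.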

**THE F-DISTRIBUTION**

• skewed to the right

• defined by a ratio: the distribution of a ratio of two estimated variances calculated from Normal data approximates the F-distribution

• characterized by the degrees of freedom of the numerator and of the denominator of the ratio

• useful for comparing two variances, and for comparing more than two means using the analysis of variance
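The ratio of two estimated variances, together with its two degrees of freedom, can be computed as follows. The two groups are hypothetical samples generated from Normal populations with a seeded random generator; they are not data from the notes, and the function name is illustrative.

```python
import random
import statistics

def variance_ratio(sample_1, sample_2):
    """F statistic: ratio of the two estimated (sample) variances."""
    return statistics.variance(sample_1) / statistics.variance(sample_2)

rng = random.Random(3)
# Hypothetical data: two samples drawn from Normal populations with the same variance
group_a = [rng.gauss(120, 10) for _ in range(12)]
group_b = [rng.gauss(125, 10) for _ in range(15)]

f = variance_ratio(group_a, group_b)
df_numerator, df_denominator = len(group_a) - 1, len(group_b) - 1
print(f"F = {f:.3f} on ({df_numerator}, {df_denominator}) degrees of freedom")
```

Under the assumption of equal population variances, repeated samples would give F values scattered around 1 with a right-skewed spread.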

**THE LOGNORMAL DISTRIBUTION**

• the probability distribution of a random variable whose log (to base 10 or e) follows the Normal distribution

• highly skewed to the right

• if taking logs of raw data that are skewed to the right produces an empirical distribution that is nearly Normal, the data approximate a Lognormal distribution
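Taking logs of right-skewed data can be sketched with Python's `random.lognormvariate`. The parameters (log-scale mean 1.0, SD 0.8) and the sample size are hypothetical; comparing mean and median is used here as a simple indicator of skewness.

```python
import math
import random
import statistics

rng = random.Random(4)
# Hypothetical right-skewed measurements whose natural log is Normal(mean 1.0, SD 0.8)
raw = [rng.lognormvariate(1.0, 0.8) for _ in range(10_000)]
logged = [math.log(x) for x in raw]

# Raw data: mean well above median (right skew)
print(round(statistics.mean(raw), 2), round(statistics.median(raw), 2))
# Logged data: mean and median almost equal, as for a Normal distribution
print(round(statistics.mean(logged), 2), round(statistics.median(logged), 2))
```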


**THE BINOMIAL DISTRIBUTION**

### • theoretical distribution for a discrete random variable

### • definition: Jacob Bernoulli, 1700

### • two outcomes: “success” and “failure”

### • n events

– e.g. n = 100 unrelated women undergoing IVF

• Two parameters describe the Binomial distribution:

– n = number of individuals in the sample (or repetitions of a trial)

– π = the true probability of success for each individual (or in each trial)

$X \sim B(n, \pi)$

• Mean = nπ (the value of the random variable that we expect if we look at n individuals, or repeat the trial n times)

• Variance = nπ(1 − π)

• small n:

– the distribution is skewed to the right if π < 0.5

– the distribution is skewed to the left if π > 0.5
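The formulas for the mean and variance can be checked by computing every Binomial probability and summing. This is a sketch using `math.comb`; the example values n = 10 and π = 0.2 are hypothetical.

```python
from math import comb

def binomial_pmf(k: int, n: int, pi: float) -> float:
    """P(X = k) for X ~ B(n, pi): C(n, k) * pi**k * (1 - pi)**(n - k)."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def binomial_summary(n: int, pi: float):
    """Return all probabilities P(X = 0..n) plus the mean and variance computed from them."""
    probs = [binomial_pmf(k, n, pi) for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return probs, mean, variance

# Hypothetical example: n = 10 trials, probability of success pi = 0.2
n, pi = 10, 0.2
probs, mean, variance = binomial_summary(n, pi)
print(f"sum of probabilities = {sum(probs):.6f}")
print(f"mean = {mean:.6f} (n*pi = {n * pi})")
print(f"variance = {variance:.6f} (n*pi*(1-pi) = {n * pi * (1 - pi):.6f})")
```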


• the distribution becomes more symmetrical as the sample size increases and approximates the Normal distribution if both nπ and n(1 − π) are greater than 5

• the properties of the Binomial distribution can be used when making inferences about proportions

• the Normal approximation to the Binomial distribution is often used when analysing proportions
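A short comparison shows how close the Normal approximation can be once the rule of thumb is met. The values n = 100, π = 0.3 and k = 25 are hypothetical; the +0.5 continuity correction is a common refinement added here and is not mentioned in the notes.

```python
import math
from math import comb

def binomial_cdf(k: int, n: int, pi: float) -> float:
    """Exact P(X <= k) for X ~ B(n, pi)."""
    return sum(comb(n, i) * pi ** i * (1 - pi) ** (n - i) for i in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for a Normal variable, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def normal_approx_ok(n: int, pi: float) -> bool:
    """Rule of thumb from the text: both n*pi and n*(1 - pi) greater than 5."""
    return n * pi > 5 and n * (1 - pi) > 5

n, pi, k = 100, 0.3, 25   # hypothetical example values
exact = binomial_cdf(k, n, pi)
mu, sigma = n * pi, math.sqrt(n * pi * (1 - pi))
approx = normal_cdf(k + 0.5, mu, sigma)   # +0.5 = continuity correction (not in the notes)
print(normal_approx_ok(n, pi), round(exact, 4), round(approx, 4))
```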

**THE BINOMIAL DISTRIBUTION**

Example: gene recombination

Chromosomal locus with 2 alleles: A and a

*p* = probability of A

*q* = 1 − p = probability of a

conception → outcome space: {AA, Aa, aa}

$$
\begin{aligned}
P(AA) &= P(A) \times P(A) = p^2\\
P(aa) &= P(a) \times P(a) = q^2\\
P(Aa) &= P(A) \times P(a) = pq\\
P(aA) &= P(a) \times P(A) = qp \quad (\text{together with } Aa\text{: } 2pq)
\end{aligned}
$$

$$p^2 + 2pq + q^2 = (p + q)^2 = 1^2 = 1$$

– Example – probability of genotypes:

• frequency of gene A = 0.33

• frequency of gene a = 0.67

$$(p + q)^2 = (0.33 + 0.67)^2 = 0.33^2 + 2 \times 0.33 \times 0.67 + 0.67^2$$

$$
\begin{aligned}
P(AA) &= 0.33^2 = 0.1089\\
P(Aa) &= 0.33 \times 0.67 = 0.2211\\
P(aA) &= 0.67 \times 0.33 = 0.2211\\
P(aa) &= 0.67^2 = 0.4489
\end{aligned}
$$

Graphical presentation – probabilities (P) of the different genotypes AA, Aa and aa
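The genotype calculation is the expansion of (p + q)². This minimal sketch reproduces it for p = 0.33, the allele frequency used in the example; the key "Aa" combines the two orders Aa and aA, i.e. the 2pq term, and the function name is illustrative.

```python
def genotype_probabilities(p: float) -> dict:
    """Genotype probabilities from (p + q)^2 = p^2 + 2pq + q^2, with q = 1 - p."""
    q = 1 - p
    return {
        "AA": p * p,
        "Aa": 2 * p * q,   # Aa and aA together
        "aa": q * q,
    }

probs = genotype_probabilities(0.33)
for genotype, prob in probs.items():
    print(genotype, round(prob, 4))
print("total", round(sum(probs.values()), 4))
```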

– Example – death outcome as a binomial distribution:

• Lethality of a disease = 0.30 (30/100)

• Survival probability = 0.70

• n = 5

• Binomial: $(p + q)^5 = (0.30 + 0.70)^5$

| Number of deaths among the examinees | Binomial term | Probability |
|---|---|---|
| 5 (everybody) | $p^5$ | 0.00243 |
| 4 | $5p^4q$ | 0.02835 |
| 3 | $10p^3q^2$ | 0.13230 |
| 2 | $10p^2q^3$ | 0.30870 |
| 1 | $5pq^4$ | 0.36015 |
| 0 (nobody) | $q^5$ | 0.16807 |
| Total | | 1.00000 |
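The table can be regenerated from the Binomial terms $\binom{n}{k} p^k q^{n-k}$, with p = 0.30 (death) and q = 0.70 (survival) as in the example. This is a sketch; the function name is illustrative.

```python
from math import comb

def deaths_distribution(n: int, lethality: float) -> dict:
    """P(k deaths among n patients) = C(n, k) * p**k * q**(n - k), with p = lethality."""
    q = 1 - lethality
    # Listed from k = n (everybody dies) down to k = 0 (nobody dies), as in the table
    return {k: comb(n, k) * lethality ** k * q ** (n - k) for k in range(n, -1, -1)}

table = deaths_distribution(5, 0.30)
for deaths, prob in table.items():
    print(deaths, f"{prob:.5f}")
print("Total", f"{sum(table.values()):.5f}")
```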

**THE POISSON DISTRIBUTION**

• Poisson (beginning of the 19th century)

• the Poisson random variable = the count of the number of events that occur independently and randomly in time or space at some average rate, µ (it takes the value 0 and all positive integers)

– example: the number of hospital admissions per day typically follows the Poisson distribution

→ the Poisson distribution can be used to calculate the probability of a certain number of admissions on any particular day

• Mean (average rate, µ) = the parameter that describes the Poisson distribution

• The mean equals the variance in the Poisson distribution

• Unimodal curve, right-skewed if the mean is small, but becoming more symmetrical as the mean increases, when it approximates a Normal distribution
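The hospital-admissions example can be sketched with the Poisson formula P(X = k) = e^(−µ) µ^k / k!. The average rate of 4 admissions per day is a hypothetical value; summing the first 60 terms is an assumption that is more than enough here, because the remaining tail is negligible for µ = 4.

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) = exp(-mu) * mu**k / k! for a Poisson count with average rate mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Hypothetical: on average 4 hospital admissions per day
mu = 4.0
probs = [poisson_pmf(k, mu) for k in range(60)]  # tail beyond 59 is negligible for mu = 4
mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))

print(f"P(exactly 2 admissions) = {poisson_pmf(2, mu):.4f}")
print(round(mean, 6), round(variance, 6))  # mean and variance are both (numerically) mu
```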