# EMPIRICAL FREQUENCY DISTRIBUTION


## INTRODUCTION TO MEDICAL STATISTICS: DISTRIBUTIONS

Mirjana Kujundžić Tiljak

### • EMPIRICAL FREQUENCY DISTRIBUTION

– observed data

### • THEORETICAL PROBABILITY DISTRIBUTION

– described by mathematical models

27.06.2006 THEORETICAL PROBABILITY DISTRIBUTIONS

### • evaluation of probabilities is required


## PROBABILITY (P)

• 0 ≤ P ≤ 1

– P = 0 → the event cannot occur

– P = 1 → the event must occur

– Q = 1 − P → the probability of the complementary event (the event not occurring)


• Various approaches to calculating probability:

– Subjective – a personal degree of belief that the event will occur (e.g. that the world will come to an end in the year 2050)

– Frequentist – the proportion of times the event would occur if the experiment were repeated a large number of times (e.g. the proportion of tosses of a coin that come up “heads”)

– A priori – requires knowledge of the theoretical model (the probability distribution), which describes the probabilities of all possible outcomes of the “experiment” (e.g. genetic theory allows us to describe the probability distribution for eye color in a baby born to a blue-eyed woman and a brown-eyed man by first specifying all possible genotypes of eye color in the baby and their probabilities)
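The frequentist idea can be illustrated with a short simulation. This is a minimal Python sketch, not part of the original material: the function `proportion_of_heads`, the fair-coin probability 0.5 and the fixed seed are illustrative assumptions. As the number of simulated tosses grows, the observed proportion of heads settles close to 0.5.

```python
import random

def proportion_of_heads(n_tosses: int, seed: int = 1) -> float:
    """Toss a simulated fair coin n_tosses times and return the proportion of heads."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The observed proportion approaches the probability 0.5 as repetitions increase.
for n in (10, 100, 10_000, 1_000_000):
    print(n, proportion_of_heads(n))
```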

## PROBABILITY (P)

• The addition rule: if two events (A and B) are mutually exclusive → the probability that either one or the other occurs (A or B) is equal to the sum of their probabilities

Prob (A or B) = Prob (A) + Prob (B)

• The multiplication rule: if two events (A and B) are independent → the probability that both events occur (A and B) is equal to the product of their individual probabilities

Prob (A and B) = Prob (A) × Prob (B)
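Both rules can be checked by brute force on a small sample space. The sketch below assumes a fair six-sided die (an example not taken from the source) and uses exact fractions, so the rule-based results can be compared exactly with a full enumeration of outcomes.

```python
from fractions import Fraction

# One fair six-sided die: each face has probability 1/6.
p_face = Fraction(1, 6)

# Addition rule: "1" and "2" cannot both occur on one roll (mutually exclusive).
p_one_or_two = p_face + p_face

# Multiplication rule: two separate rolls are independent.
p_two_sixes = p_face * p_face

# Brute-force check over all 36 equally likely outcomes of two rolls.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
assert Fraction(sum(a in (1, 2) for a, _ in outcomes), len(outcomes)) == p_one_or_two
assert Fraction(sum(a == b == 6 for a, b in outcomes), len(outcomes)) == p_two_sixes
print(p_one_or_two, p_two_sixes)  # 1/3 1/36
```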


### • Discrete random variable – numerical values are integers

• E.g. number of children in a family – 0, 1, 2, 3, … k

### • Continuous random variable – numerical values are real numbers

• E.g. body weight 72.35 kg, blood glucose level 7.2 mmol/l


### PROBABILITY DISTRIBUTION

• Probability distribution – shows the probabilities of all possible values of the random variable

– a theoretical distribution that is expressed mathematically

– has a mean and variance that are analogous to those of an empirical distribution

• parameters – summary measures (e.g. mean, variance) characterizing that distribution → they are estimated in the sample by the corresponding statistics

• depending on whether the random variable is discrete or continuous → the probability distribution can be either discrete or continuous
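As a rough illustration of parameters versus statistics, the sketch below draws a large sample from a hypothetical Normal population with µ = 5 and σ = 2 (values chosen only for the example) and computes the sample mean and sample variance, which estimate the corresponding parameters.

```python
import random
import statistics

# Hypothetical population parameters: Normal with mu = 5.0 and sigma = 2.0 (variance 4.0).
MU, SIGMA = 5.0, 2.0

rng = random.Random(42)  # fixed seed for reproducibility
sample = [rng.gauss(MU, SIGMA) for _ in range(10_000)]

# Statistics computed from the sample estimate the population parameters.
x_bar = statistics.mean(sample)   # estimates mu
s2 = statistics.variance(sample)  # sample variance (n - 1 divisor), estimates sigma**2
print(f"mean {x_bar:.3f} (mu = {MU}), variance {s2:.3f} (sigma^2 = {SIGMA**2})")
```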


### • DISCRETE (Binomial, Poisson)

– a probability can be derived for every possible value of the random variable

– the sum of all such probabilities is 1

### • CONTINUOUS (Normal, Chi-squared, t, F)

– only the probability of the random variable, x, taking values in certain ranges can be derived

– if the horizontal axis represents the values of x → the curve given by the equation of the distribution can be drawn (= the probability density function)

– the total area under the curve = 1 → it represents the probability of all possible events

• The probability that x lies between two limits is equal to the area under the curve between these values
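The statement that a probability equals an area can be checked numerically. This sketch assumes a Standard Normal variable (one of the continuous distributions listed above) and arbitrary limits −1 and 2; it compares the exact probability from the cumulative distribution function with a midpoint-rule approximation of the area under the density. `area_under_pdf` is a hypothetical helper written for this example.

```python
from statistics import NormalDist

def area_under_pdf(dist: NormalDist, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under dist's density between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Illustrative example: x ~ Normal(mean 0, SD 1), limits a = -1 and b = 2.
x = NormalDist(mu=0.0, sigma=1.0)
a, b = -1.0, 2.0

p_exact = x.cdf(b) - x.cdf(a)     # P(a < x < b) from the cumulative distribution function
p_area = area_under_pdf(x, a, b)  # the same probability as an area under the curve
print(round(p_exact, 4), round(p_area, 4))
print(round(area_under_pdf(x, -10.0, 10.0), 6))  # total area is (numerically) 1
```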

[Figures: Probability that x lies between two limits; Probability distributions]

### THE NORMAL (GAUSSIAN) DISTRIBUTION

• probability density function of a Normal random variable x with mean µ and standard deviation σ:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

• the Normal distribution curve:

– the area under the curve = 1

– bell-shaped (unimodal)

– symmetrical about its mean

– absolute maximum at x = µ

– shifted to the right if the mean is increased and to the left if the mean is decreased (assuming constant variance)

– flattened as the variance is increased and more peaked as the variance is decreased (for a fixed mean)
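The listed properties of the curve can be checked directly with Python's `statistics.NormalDist`. The means and standard deviations below are illustrative assumptions, and `location_of_peak` is a hypothetical helper that searches a grid of x values for the highest point of the density.

```python
from statistics import NormalDist

def location_of_peak(dist: NormalDist, lo: float = 0.0, hi: float = 20.0, steps: int = 2000) -> float:
    """Return the grid point in [lo, hi] where dist.pdf is largest."""
    xs = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return max(xs, key=dist.pdf)

mu = 10.0
narrow = NormalDist(mu, 1.0)         # mean 10, SD 1
wide = NormalDist(mu, 3.0)           # same mean, larger variance
shifted = NormalDist(mu + 2.0, 1.0)  # same variance, larger mean

print(location_of_peak(narrow), location_of_peak(shifted))  # peak at the mean; moves right with it
print(narrow.pdf(mu - 1.5), narrow.pdf(mu + 1.5))          # equal heights: symmetric about the mean
print(narrow.pdf(mu), wide.pdf(mu))                         # larger variance -> lower, flatter peak
```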

### THE NORMAL (GAUSSIAN) DISTRIBUTION

• the mean, median and mode of a Normal distribution are equal

• the probability (P) that a normally distributed random variable, x, with mean µ and standard deviation σ, lies between:

– (µ − σ) and (µ + σ) = 0.68

– (µ − 1.96σ) and (µ + 1.96σ) = 0.95

– (µ − 2.58σ) and (µ + 2.58σ) = 0.99

→ these intervals may be used to define reference intervals
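The three probabilities can be reproduced from the Standard Normal distribution, because they do not depend on µ or σ. The `reference_interval` helper and its inputs (mean 5.0, SD 0.5) are hypothetical and only show how a 95% reference interval would be computed as mean ± 1.96 SD.

```python
from statistics import NormalDist

z = NormalDist()  # Standard Normal: mean 0, SD 1

for k in (1.0, 1.96, 2.58):
    # P(mu - k*sigma < x < mu + k*sigma) is the same for every Normal distribution,
    # so it can be computed on the Standard Normal scale.
    print(k, round(z.cdf(k) - z.cdf(-k), 4))

def reference_interval(mean: float, sd: float, k: float = 1.96) -> tuple[float, float]:
    """Return (mean - k*sd, mean + k*sd); k = 1.96 gives the central 95% range."""
    return mean - k * sd, mean + k * sd

# Hypothetical values for illustration only: mean 5.0 and SD 0.5 (arbitrary units).
print(reference_interval(5.0, 0.5))
```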


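### THE STANDARD NORMAL DISTRIBUTION

• $z_i = \dfrac{x_i - \mu}{\sigma}$ = a random variable that has a Standard Normal distribution

• in a sample, µ and σ are replaced by the sample mean and standard deviation: $z_i = \dfrac{x_i - \bar{x}}{s}$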

• each observation is transformed: $x_1, x_2, x_3, \ldots, x_n \;\rightarrow\; z_1, z_2, z_3, \ldots, z_n$

• the transformed values have mean 0 and standard deviation 1: $Z \sim N(0, 1)$
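A minimal sketch of the standardization step, using the sample mean x̄ and sample standard deviation s. The data values are made up for illustration; after the transformation the z values have mean 0 and standard deviation 1 (up to rounding error).

```python
import statistics

def standardize(xs: list[float]) -> list[float]:
    """Return z_i = (x_i - x_bar) / s, using the sample mean and sample SD."""
    x_bar = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample SD (n - 1 divisor)
    return [(x - x_bar) / s for x in xs]

# Hypothetical measurements, e.g. body weights in kg.
weights = [61.0, 72.4, 68.3, 80.1, 75.6, 66.9]
z = standardize(weights)
print([round(v, 2) for v in z])
print(statistics.mean(z), statistics.stdev(z))  # approximately 0 and 1
```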


### THE t-DISTRIBUTION

• W.S. Gosset (pseudonym “Student”)

• the parameter that characterizes the t-distribution = the degrees of freedom

• similar in shape to the Normal distribution, but more spread out with longer tails – as the degrees of freedom increase, its shape approaches Normality

• useful for calculating confidence intervals and testing hypotheses about one or two means
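The longer tails and the approach to Normality can be seen by evaluating the t density. This sketch writes out the standard density formula of Student's t using only the `math` module (`t_pdf` is a hypothetical helper, not a library function) and compares it with the Standard Normal density at the centre (x = 0) and in the tail (x = 3).

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom.

    math.lgamma (log of the gamma function) is used so that large df do not overflow.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

normal = NormalDist()  # Standard Normal for comparison

for df in (1, 5, 30, 1000):
    # Lower peak at x = 0 and higher density at x = 3 than the Normal: longer tails.
    print(df, round(t_pdf(0, df), 4), round(t_pdf(3, df), 4))
print("N(0,1)", round(normal.pdf(0), 4), round(normal.pdf(3), 4))
```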

### THE CHI-SQUARED (χ²) DISTRIBUTION

• a right-skewed distribution taking positive values

• characterized by its degrees of freedom

• its shape depends on the degrees of freedom – it becomes more symmetrical and approaches Normality as they increase

• useful for analysing categorical data
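One standard construction, not stated in the source, may help here: a χ² variable with k degrees of freedom is the sum of k squared independent Standard Normal values, and its mean equals k. The simulation below is a sketch under that construction (`chi2_sample` is a hypothetical helper; the seed and number of draws are arbitrary). The mean lying above the median reflects the right skew.

```python
import random
import statistics

def chi2_sample(df: int, rng: random.Random) -> float:
    """One chi-squared draw: the sum of df squared independent Standard Normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

rng = random.Random(7)
for df in (1, 4, 30):
    draws = [chi2_sample(df, rng) for _ in range(20_000)]
    mean, median = statistics.mean(draws), statistics.median(draws)
    # All values are non-negative; right skew shows up as mean > median.
    print(df, round(mean, 2), round(median, 2), min(draws) >= 0)
```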

### THE F-DISTRIBUTION

• skewed to the right

• defined by a ratio – the distribution of the ratio of two estimated variances calculated from Normal data approximates the F-distribution

• characterized by the degrees of freedom of the numerator and of the denominator of the ratio

• useful for comparing two variances, and for comparing more than two means using analysis of variance
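The ratio definition can be simulated. The sketch assumes two independent samples of size 10 drawn from the same Normal population, so the ratio of their sample variances follows an F-distribution with 9 and 9 degrees of freedom; `variance_ratio` is a hypothetical helper. All ratios are positive, and the mean lies above the median because of the right skew.

```python
import random
import statistics

def variance_ratio(n1: int, n2: int, rng: random.Random) -> float:
    """Ratio of the sample variances of two independent Normal samples with the same SD."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n1)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n2)]
    return statistics.variance(a) / statistics.variance(b)

rng = random.Random(3)
# n1 = n2 = 10: numerator and denominator each have 10 - 1 = 9 degrees of freedom.
ratios = [variance_ratio(10, 10, rng) for _ in range(10_000)]
print(round(statistics.median(ratios), 2), round(statistics.mean(ratios), 2))
```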

### THE LOGNORMAL DISTRIBUTION

• the probability distribution of a random variable whose log (to base 10 or e) follows the Normal distribution

• highly skewed to the right

• if raw data are skewed to the right and their logs have an empirical distribution that is nearly Normal → the data approximate a Lognormal distribution
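A short simulation shows the effect of the log transformation. The data are hypothetical: values generated with `random.lognormvariate` so that their natural log is Normal with mean 1.0 and SD 0.8. On the raw scale the mean lies well above the median (right skew); on the log scale the two nearly coincide.

```python
import math
import random
import statistics

rng = random.Random(11)
# Hypothetical right-skewed data whose natural log is Normal(mean 1.0, SD 0.8).
raw = [rng.lognormvariate(1.0, 0.8) for _ in range(20_000)]
logs = [math.log(v) for v in raw]

for label, data in (("raw", raw), ("log", logs)):
    mean, median = statistics.mean(data), statistics.median(data)
    # Right skew pulls the mean above the median; for near-Normal data they agree.
    print(label, round(mean, 3), round(median, 3))
```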

### THE BINOMIAL DISTRIBUTION

• n events

– E.g. n = 100 unrelated women undergoing IVF

• Two parameters describe the Binomial distribution:

– n = the number of individuals in the sample (or repetitions of a trial)

– π = the true probability of success for each individual (or in each trial)

X ~ B(n, π)

### THE BINOMIAL DISTRIBUTION

• Mean = nπ (the value of the random variable that we expect if we look at n individuals, or repeat the trial n times)

• Variance = nπ(1 − π)

• when n is small:

– the distribution is skewed to the right if π < 0.5

– the distribution is skewed to the left if π > 0.5
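The mean and variance formulas, and the direction of skew for small n, can be verified from the exact Binomial probabilities. In this sketch n = 10 and the values of π are illustrative; `binomial_pmf` and `moments` are hypothetical helpers. A positive skewness corresponds to skew to the right, a negative one to skew to the left.

```python
import math

def binomial_pmf(n: int, pi: float) -> list[float]:
    """P(X = k) for k = 0, 1, ..., n when X ~ B(n, pi)."""
    return [math.comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)]

def moments(pmf: list[float]) -> tuple[float, float, float]:
    """Mean, variance and skewness of a distribution on 0, 1, ..., len(pmf) - 1."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    skew = sum((k - mean) ** 3 * p for k, p in enumerate(pmf)) / var**1.5
    return mean, var, skew

n = 10  # a small, illustrative n
for pi in (0.2, 0.5, 0.8):
    mean, var, skew = moments(binomial_pmf(n, pi))
    # Compare with the formulas: mean = n*pi, variance = n*pi*(1 - pi).
    print(pi, round(mean, 6), round(n * pi, 6), round(var, 6), round(n * pi * (1 - pi), 6), round(skew, 3))
```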


### THE BINOMIAL DISTRIBUTION

• the distribution becomes more symmetrical as the sample size increases and approximates the Normal distribution if both nπ and n(1 − π) are greater than 5

• the properties of the Binomial distribution can be used when making inferences about proportions

• the Normal approximation to the Binomial distribution is often used when analysing proportions
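The quality of the Normal approximation can be inspected by comparing exact and approximate cumulative probabilities. The sketch borrows n = 100 from the IVF example and assumes π = 0.3 (so nπ = 30, n(1 − π) = 70 and nπ(1 − π) = 21, all greater than 5). It also applies a 0.5 continuity correction, a common refinement that the source does not mention; `binomial_cdf` and `normal_approx_cdf` are hypothetical helpers.

```python
import math
from statistics import NormalDist

def binomial_cdf(x: int, n: int, pi: float) -> float:
    """Exact P(X <= x) for X ~ B(n, pi)."""
    return sum(math.comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(x + 1))

def normal_approx_cdf(x: int, n: int, pi: float) -> float:
    """P(X <= x) from a Normal with mean n*pi and variance n*pi*(1 - pi), continuity-corrected."""
    approx = NormalDist(mu=n * pi, sigma=math.sqrt(n * pi * (1 - pi)))
    return approx.cdf(x + 0.5)

# Hypothetical example: n = 100 women, assumed true success probability pi = 0.3.
n, pi = 100, 0.3
for x in (20, 25, 30, 35):
    print(x, round(binomial_cdf(x, n, pi), 4), round(normal_approx_cdf(x, n, pi), 4))
```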


### THE BINOMIAL DISTRIBUTION

Example: gene recombination

Chromosomal locus with 2 alleles: A and a

p = probability of A

q = 1 − p = probability of a


At conception → outcome space: {AA, Aa, aa}

$P(AA) = P(A) \cdot P(A) = p^2$

$P(aa) = P(a) \cdot P(a) = q^2$

$P(Aa) = P(A) \cdot P(a) = pq$

$P(aA) = P(a) \cdot P(A) = qp$ → heterozygotes (Aa or aA) together: $2pq$

Total: $p^2 + 2pq + q^2 = (p + q)^2 = 1^2 = 1$


### THE BINOMIAL DISTRIBUTION

– Example – probability of genotypes:

• frequency of allele A = 0.33

• frequency of allele a = 0.67

$(p + q)^2 = (0.33 + 0.67)^2 = 0.33^2 + 2 \cdot 0.33 \cdot 0.67 + 0.67^2$

$P(AA) = 0.33^2 = 0.1089$

$P(Aa) = 0.33 \cdot 0.67 = 0.2211$

$P(aA) = 0.67 \cdot 0.33 = 0.2211$

$P(aa) = 0.67^2 = 0.4489$
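The genotype probabilities in this example can be reproduced in a few lines. `genotype_probabilities` is a hypothetical helper; as in the text, it assumes the two alleles combine independently, so each genotype probability is a product of allele frequencies.

```python
def genotype_probabilities(p: float) -> dict[str, float]:
    """Genotype probabilities from allele frequencies p (A) and q = 1 - p (a),
    assuming the two alleles are inherited independently."""
    q = 1 - p
    return {"AA": p * p, "Aa": p * q, "aA": q * p, "aa": q * q}

probs = genotype_probabilities(0.33)
for genotype, prob in probs.items():
    print(genotype, round(prob, 4))
print("heterozygotes (Aa + aA) = 2pq =", round(probs["Aa"] + probs["aA"], 4))
print("total =", round(sum(probs.values()), 4))
```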

### THE BINOMIAL DISTRIBUTION

[Figure: Graphical presentation – probabilities (P) of the genotypes AA, Aa and aa]


– Example – death outcome as a binomial distribution:

• Lethality of a disease, p = 0.30 (30/100)

• Survival probability, q = 0.70

• n = 5

• Binomial expansion: $(p + q)^5 = (0.30 + 0.70)^5$

| Number of deaths among examinees | Binomial term | Probability |
|---|---|---|
| 5 (everybody) | $p^5$ | 0.00243 |
| 4 | $5p^4q$ | 0.02835 |
| 3 | $10p^3q^2$ | 0.13230 |
| 2 | $10p^2q^3$ | 0.30870 |
| 1 | $5pq^4$ | 0.36015 |
| 0 (nobody) | $q^5$ | 0.16807 |
| Total | | 1.00000 |
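The probabilities in the table are the terms of the Binomial expansion with n = 5 and p = 0.30, so they can be regenerated with `math.comb` supplying the binomial coefficients. `binomial_table` is a hypothetical helper written for this example.

```python
import math

def binomial_table(n: int, p: float) -> list[tuple[int, float]]:
    """(number of deaths k, P(k deaths)) for k = n down to 0, with lethality p."""
    q = 1 - p  # survival probability
    return [(k, math.comb(n, k) * p**k * q ** (n - k)) for k in range(n, -1, -1)]

rows = binomial_table(n=5, p=0.30)
for deaths, prob in rows:
    print(f"{deaths} deaths: {prob:.5f}")
print(f"total: {sum(prob for _, prob in rows):.5f}")
```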

### THE POISSON DISTRIBUTION

• Poisson (beginning of the 19th century)

• the Poisson random variable = the count of events that occur independently and randomly in time or space at some average rate, µ (possible values: 0 and all positive integers)

– example: the number of hospital admissions per day typically follows the Poisson distribution

→ the Poisson distribution can be used to calculate the probability of a certain number of admissions on any particular day


### THE POISSON DISTRIBUTION

• Mean (average rate, µ) = the parameter that describes the Poisson distribution

• The mean equals the variance in the Poisson distribution

• Unimodal curve, right-skewed if the mean is small, but it becomes more symmetrical as the mean increases, when it approximates a Normal distribution
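A sketch of the Poisson probabilities for the hospital-admissions example, assuming a hypothetical average of µ = 4 admissions per day. The infinite set of possible counts is truncated at 59, where the remaining probability is negligible; the mean and variance computed from the probabilities both come out equal to µ. `poisson_pmf` is a hypothetical helper.

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for a Poisson count with average rate mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# Hypothetical average of mu = 4 hospital admissions per day.
mu = 4.0
ks = range(60)  # truncated support; the probability beyond 59 is negligible here
probs = [poisson_pmf(k, mu) for k in ks]

mean = sum(k * p for k, p in zip(ks, probs))
variance = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
print(round(mean, 6), round(variance, 6))  # mean and variance both equal mu
print("P(exactly 6 admissions) =", round(poisson_pmf(6, mu), 4))
print("P(more than 8 admissions) =", round(1 - sum(probs[:9]), 4))
```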
