
INTRODUCTION TO MEDICAL STATISTICS:
THEORETICAL PROBABILITY DISTRIBUTIONS

Mirjana Kujundžić Tiljak

• EMPIRICAL FREQUENCY DISTRIBUTION

– observed data

• THEORETICAL PROBABILITY DISTRIBUTION

– described by mathematical models


27.06.2006 THEORETICAL PROBABILITY DISTRIBUTIONS


• when an empirical distribution approximates a particular probability distribution, theoretical knowledge of that distribution can be used to answer questions about the data

• evaluation of probabilities is required


PROBABILITY (P)

• measures uncertainty

• measures the chance of a given event occurring

$0 \le P \le 1$

– P = 0 → the event cannot occur
– P = 1 → the event must occur

– Q = 1-P →probability of the complementary event (the event not occurring)


• Various approaches to calculating probabilities:

– Subjective – a personal degree of belief that the event will occur (e.g. the world will come to an end in the year 2050)

– Frequentist – the proportion of times the event would occur if the experiment were repeated a large number of times (e.g. the proportion of times we would get a “head” when tossing a coin)

– A priori – requires knowledge of the theoretical model, the probability distribution, which describes the probabilities of all possible outcomes of the “experiment” (e.g. genetic theory allows us to describe the probability distribution for eye color in a baby born to a blue-eyed woman and a brown-eyed man by first specifying all possible genotypes of eye color in the baby and their probabilities)
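A minimal sketch of the frequentist approach, assuming a fair coin with P(head) = 0.5 simulated with Python's `random` module; `proportion_of_heads` is a hypothetical helper written for illustration. The observed proportion of heads fluctuates for few tosses and settles near 0.5 as the number of repetitions grows.

```python
import random

def proportion_of_heads(n_tosses, seed=None):
    """Simulate n_tosses of a fair coin and return the observed proportion of heads."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses

# The proportion fluctuates for small n and settles near P(head) = 0.5 for large n
for n in (10, 100, 10_000, 1_000_000):
    print(n, proportion_of_heads(n, seed=1))
```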

PROBABILITY (P)

• The addition rule: if two events (A and B) are mutually exclusive → the probability that either one or the other occurs (A or B) is equal to the sum of their probabilities

Prob (A or B) = Prob (A) + Prob (B)

• The multiplication rule: if two events (A and B) are independent → the probability that both events occur (A and B) is equal to the product of their probabilities

Prob (A and B) = Prob (A) × Prob (B)
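The rules can be checked with exact fractions. This sketch assumes a fair six-sided die (each face has probability 1/6), an illustration not taken from the slides; the variable names are hypothetical.

```python
from fractions import Fraction

# Fair six-sided die: every face has probability 1/6 (assumption for illustration)
p_face = Fraction(1, 6)

# Addition rule: "roll a 1" and "roll a 2" are mutually exclusive on a single roll
p_1_or_2 = p_face + p_face
print(p_1_or_2)  # 1/3

# Multiplication rule: two separate rolls are independent
p_six_and_six = p_face * p_face
print(p_six_and_six)  # 1/36

# Complementary event: Q = 1 - P
p_not_six = 1 - p_face
print(p_not_six)  # 5/6
```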


RANDOM VARIABLES

• random variable
– a quantity that can take any one of a set of mutually exclusive values with a given probability

• discrete (or discontinuous) random variable = numerical values are integers
– e.g. number of children in a family: 0, 1, 2, 3, … k

• continuous random variable = numerical values are real numbers
– e.g. body weight 72.35 kg, blood glucose level 7.2 mmol/l


PROBABILITY DISTRIBUTION

• Probability distribution – shows the probabilities of all possible values of the random variable
– a theoretical distribution that is expressed mathematically; it has a mean and variance that are analogous to those of an empirical distribution

• parameters – summary measures (e.g. mean, variance) characterizing that distribution → they are estimated in the sample by the relevant statistics

• depending on whether the random variable is discrete or continuous → the probability distribution can be either discrete or continuous
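A sketch of estimating parameters by sample statistics, assuming a hypothetical population X ~ N(80, 10²) simulated with `random.gauss`; the parameter values and the seed are arbitrary choices. The sample mean estimates µ, and the sample variance (computed by `statistics.variance` with an n − 1 denominator) estimates σ².

```python
import random
import statistics

# Hypothetical population: X ~ N(mu=80, sigma=10); these parameter values are assumptions
MU, SIGMA = 80.0, 10.0

rng = random.Random(2006)
sample = [rng.gauss(MU, SIGMA) for _ in range(5_000)]

# Sample statistics estimate the population parameters
x_bar = statistics.mean(sample)     # estimates mu
s2 = statistics.variance(sample)    # estimates sigma^2 (n - 1 denominator)
print(round(x_bar, 2), round(s2, 2))
```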


PROBABILITY DISTRIBUTIONS

• DISCRETE (Binomial, Poisson)
– a probability can be derived for every possible value of the random variable
– the sum of all such probabilities is 1

• CONTINUOUS (Normal, Chi-squared, t, F)
– the probability of the random variable, x, taking values in certain ranges can be derived
– if the horizontal axis represents the values of x → the curve given by the equation of the distribution can be drawn (= probability density function)
– total area under the curve = 1 → represents the probability of all possible events

• the probability that x lies between two limits is equal to the area under the curve between these values
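A sketch of both cases, assuming a fair die as the discrete example and the standard Normal, N(0, 1), as the continuous one (both chosen for illustration). `area_under` is a hypothetical helper that approximates the area under a density with the trapezoidal rule; `statistics.NormalDist` supplies the density (`pdf`) and cumulative probability (`cdf`).

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: a fair die assigns 1/6 to each value 1..6, and the probabilities sum to 1
die = {value: Fraction(1, 6) for value in range(1, 7)}
print(sum(die.values()))  # 1

# Continuous: P(a < x < b) is the area under the density between a and b
dist = NormalDist(mu=0.0, sigma=1.0)

def area_under(pdf, a, b, steps=10_000):
    """Trapezoidal-rule approximation of the area under pdf between a and b."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h

a, b = -1.0, 1.0
print(round(area_under(dist.pdf, a, b), 4))  # 0.6827 (numerical area)
print(round(dist.cdf(b) - dist.cdf(a), 4))   # 0.6827 (same probability from the CDF)
```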


PROBABILITY DISTRIBUTIONS

• Probability that x lies between two limits?


THE NORMAL (GAUSSIAN) DISTRIBUTION

• one of the most important distributions in statistics
• named after the German mathematician C.F. Gauss
• many biological measurements approximately follow the Normal distribution
• it is used in many analytical models

• Probability density function:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
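The density formula can be written out directly. This sketch uses illustrative values µ = 5 and σ = 2 (not from the slides); `normal_pdf` is a hypothetical helper, cross-checked against the standard library's `statistics.NormalDist.pdf`. It also shows the maximum at x = µ, the symmetry about the mean, and the lower, flatter peak for a larger σ.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

mu, sigma = 5.0, 2.0  # illustrative parameter values

# Agrees with the standard library's implementation
print(math.isclose(normal_pdf(6.3, mu, sigma), NormalDist(mu, sigma).pdf(6.3)))  # True

# Maximum at x = mu, with height 1 / (sigma * sqrt(2*pi))
print(normal_pdf(mu, mu, sigma))

# Symmetry about the mean: f(mu - d) == f(mu + d)
print(math.isclose(normal_pdf(mu - 1.5, mu, sigma), normal_pdf(mu + 1.5, mu, sigma)))  # True

# A larger sigma gives a flatter curve with a lower peak
print(normal_pdf(0.0, 0.0, 1.0) > normal_pdf(0.0, 0.0, 2.0))  # True
```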


THE NORMAL (GAUSSIAN) DISTRIBUTION

Completely described by two parameters:
– mean ($\mu$)
– variance ($\sigma^2$)

$X \sim N(\mu, \sigma^2)$


• normal distribution curve:
– area under the curve = 1
– bell-shaped (unimodal)
– symmetrical about its mean
– absolute maximum at x = µ
– shifted to the right if the mean is increased and to the left if the mean is decreased (assuming constant variance)
– flattened as the variance is increased, but more peaked as the variance is decreased (for a fixed mean)

THE NORMAL (GAUSSIAN) DISTRIBUTION

• the mean, median and mode of a Normal distribution are equal

• the probability (P) that a normally distributed random variable, x, with mean µ and standard deviation σ, lies between:
– (µ − σ) and (µ + σ) is 0.68
– (µ − 1.96σ) and (µ + 1.96σ) is 0.95
– (µ − 2.58σ) and (µ + 2.58σ) is 0.99

→ these intervals may be used to define reference intervals
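The three probabilities can be recomputed from the standard Normal cumulative distribution with `statistics.NormalDist`. The second part is a sketch of a 95% reference interval for a hypothetical analyte with mean 5.0 and SD 0.5; these numbers are invented for illustration and are not real laboratory data.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

for k in (1.0, 1.96, 2.58):
    # P(mu - k*sigma < x < mu + k*sigma) is the same for every Normal distribution
    p = z.cdf(k) - z.cdf(-k)
    print(k, round(p, 4))

# Hypothetical analyte with mean 5.0 and SD 0.5 (illustrative values, not real data):
# the central 95% reference interval is mean +/- 1.96 SD
analyte = NormalDist(mu=5.0, sigma=0.5)
lower = analyte.mean - 1.96 * analyte.stdev
upper = analyte.mean + 1.96 * analyte.stdev
print(round(lower, 2), round(upper, 2))  # 4.02 5.98
```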


THE NORMAL (GAUSSIAN) DISTRIBUTION

• changing µ, constant σ:


• changing σ, constant µ:


THE STANDARD NORMAL DISTRIBUTION

• transformation of an original value ($x_i$) to the Standardized Normal Deviate (SND), $z_i$:

$z_i = \frac{x_i - \mu}{\sigma}$

= a random variable that has the Standard Normal distribution

• in a sample:

$z_i = \frac{x_i - \bar{x}}{s}$


$X_1 \to Z_1,\; X_2 \to Z_2,\; X_3 \to Z_3,\; \dots,\; X_n \to Z_n$: $\bar{z} = ?$, $s_z = ?$


$X_1 \to Z_1,\; X_2 \to Z_2,\; X_3 \to Z_3,\; \dots,\; X_n \to Z_n$: $\bar{z} = 0$, $s_z = 1$

$Z \sim N(0, 1)$
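A sketch of the sample transformation $z_i = (x_i - \bar{x})/s$, using made-up body weights (not data from the slides); `standardize` is a hypothetical helper. Standardizing always gives a mean of 0 and an SD of 1; the z-values follow N(0, 1) only if the original values are themselves Normally distributed.

```python
import math
import statistics

def standardize(values):
    """Return z_i = (x_i - x_bar) / s for each value, using the sample mean and sample SD."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD with n - 1 in the denominator
    return [(x - x_bar) / s for x in values]

# Illustrative body weights in kg (made-up numbers, not data from the slides)
weights = [61.0, 72.4, 68.3, 80.1, 75.6, 58.9, 70.2]
z_scores = standardize(weights)

print([round(z, 2) for z in z_scores])
print(abs(statistics.mean(z_scores)) < 1e-12)         # True: mean of z is 0
print(math.isclose(statistics.stdev(z_scores), 1.0))  # True: SD of z is 1
```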


THE t-DISTRIBUTION

• W.S. Gosset (pseudonym “Student”)

• the parameter that characterizes the t-distribution = the degrees of freedom

• similar in shape to the Normal distribution but more spread out, with longer tails; as the degrees of freedom increase, its shape approaches Normality

• useful for calculating confidence intervals for, and testing hypotheses about, one or two means
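A sketch of the t density, written out from its standard formula (not given in the slides) and evaluated on the log scale with `math.lgamma` to avoid overflow for large degrees of freedom; `t_pdf` is a hypothetical helper. Comparing the height of each curve at t = 3 with the standard Normal density shows the longer tails for few degrees of freedom and the approach to Normality as they increase.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coef) * (1 + t * t / df) ** (-(df + 1) / 2)

std_normal = NormalDist()  # N(0, 1)
for df in (2, 5, 30, 1000):
    # Height of each curve at t = 3: larger for t with few df (longer tails)
    print(df, round(t_pdf(3.0, df), 5), round(std_normal.pdf(3.0), 5))
```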


THE CHI-SQUARE ($\chi^2$) DISTRIBUTION

• a right-skewed distribution taking positive values
• characterized by its degrees of freedom
• its shape depends on the degrees of freedom – it becomes more symmetrical and approaches Normality as they increase
• useful for analysing categorical data
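A simulation sketch relying on a standard result not stated in the slides: a chi-squared variable with k degrees of freedom is the sum of k squared independent standard Normal values, so its mean is k. `chi_square_draws` is a hypothetical helper and the seeds are arbitrary. A mean larger than the median indicates the right skew, which becomes less pronounced relative to the spread as the degrees of freedom increase.

```python
import random
import statistics

def chi_square_draws(df, n_draws, seed=0):
    """Draw from chi-squared(df) as a sum of df squared standard Normal values."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n_draws)]

for df in (1, 4, 30):
    draws = chi_square_draws(df, 20_000)
    # Right skew shows up as mean > median; all values are positive
    print(df, round(statistics.mean(draws), 2), round(statistics.median(draws), 2))
```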


THE F-DISTRIBUTION

• skewed to the right

• defined by a ratio – the distribution of the ratio of two estimated variances calculated from Normal data approximates the F-distribution

• characterized by the degrees of freedom of the numerator and the denominator of the ratio

• useful for comparing two variances, and for comparing more than two means using the analysis of variance
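A simulation sketch of the variance ratio, assuming equal population variances (σ = 1) and sample sizes n1 = 10 and n2 = 21, so the ratio has 9 and 20 degrees of freedom; `variance_ratio` is a hypothetical helper and the seed is arbitrary. The simulated mean should be close to d2/(d2 − 2), the standard mean of an F-distribution for d2 > 2 (not stated in the slides), and the mean exceeding the median reflects the right skew.

```python
import random
import statistics

def variance_ratio(n1, n2, rng, sigma=1.0):
    """Ratio s1^2 / s2^2 of two sample variances from Normal samples with equal sigma."""
    s1_sq = statistics.variance([rng.gauss(0.0, sigma) for _ in range(n1)])
    s2_sq = statistics.variance([rng.gauss(0.0, sigma) for _ in range(n2)])
    return s1_sq / s2_sq

rng = random.Random(37)
n1, n2 = 10, 21  # numerator df = 9, denominator df = 20
ratios = [variance_ratio(n1, n2, rng) for _ in range(10_000)]

d2 = n2 - 1
print(round(statistics.mean(ratios), 3), round(d2 / (d2 - 2), 3))  # simulated vs theoretical mean
print(statistics.mean(ratios) > statistics.median(ratios))         # right skew
```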

THE LOGNORMAL DISTRIBUTION

• the probability distribution of a random variable whose log (to base 10 or e) follows the Normal distribution
• highly skewed to the right

• if the raw data are skewed to the right and their logs give an empirical distribution that is nearly Normal → the data approximate a Lognormal distribution
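A sketch of the log transformation, assuming hypothetical right-skewed data with ln(X) ~ N(1.0, 0.8²) generated by `random.Random.lognormvariate`; the natural log is used here, and the parameter values and seed are arbitrary. The raw values have a mean well above the median, while the logged values are roughly symmetric with mean close to median.

```python
import math
import random
import statistics

rng = random.Random(11)
# Hypothetical right-skewed measurement: ln(X) ~ N(mu=1.0, sigma=0.8) (assumed values)
raw = [rng.lognormvariate(1.0, 0.8) for _ in range(20_000)]
logged = [math.log(x) for x in raw]

# Raw data: long right tail, so the mean is pulled above the median
print(round(statistics.mean(raw), 2), round(statistics.median(raw), 2))
# Log-transformed data: roughly symmetric, mean close to median (both near 1.0)
print(round(statistics.mean(logged), 2), round(statistics.median(logged), 2))
```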


THE BINOMIAL DISTRIBUTION

• theoretical distribution for a discrete random variable

• definition: Jacob Bernoulli, 1700

• two outcomes: “success” and “failure”

• n events (trials)
– e.g. n = 100 unrelated women undergoing IVF


THE BINOMIAL DISTRIBUTION

• Two parameters describe the Binomial distribution:
– n = number of individuals in the sample (or repetitions of a trial)
– π = the true probability of success for each individual (or in each trial)

$X \sim B(n, \pi)$

• Mean = nπ (the value of the random variable that we expect if we look at n individuals, or repeat the trial n times)

• Variance = nπ(1 − π)

• small n:
– the distribution is skewed to the right if π < 0.5
– the distribution is skewed to the left if π > 0.5
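A sketch that computes the Binomial probabilities with `math.comb` and checks the mean and variance formulas directly from them. The values n = 10 and π = 0.2 are illustrative (a small n with π < 0.5), and `binomial_pmf` is a hypothetical helper. A positive third central moment confirms the right skew.

```python
from math import comb

def binomial_pmf(k, n, pi):
    """P(X = k) for X ~ B(n, pi): comb(n, k) * pi^k * (1 - pi)^(n - k)."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

n, pi = 10, 0.2  # illustrative values: small n and pi < 0.5
probs = [binomial_pmf(k, n, pi) for k in range(n + 1)]

mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
third_moment = sum((k - mean) ** 3 * p for k, p in enumerate(probs))

print(round(sum(probs), 10))                   # probabilities sum to 1
print(round(mean, 10), n * pi)                 # mean = n*pi
print(round(variance, 10), n * pi * (1 - pi))  # variance = n*pi*(1 - pi)
print(third_moment > 0)                        # True: skewed to the right for pi < 0.5
```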


THE BINOMIAL DISTRIBUTION

• the distribution becomes more symmetrical as the sample size increases and approximates the Normal distribution if both nπ and n(1 − π) are greater than 5

• the properties of the Binomial distribution can be used when making inferences about proportions

• the Normal approximation to the Binomial distribution is often used when analysing proportions
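A sketch comparing an exact Binomial probability with its Normal approximation N(nπ, nπ(1 − π)), using illustrative values n = 100, π = 0.3 (so nπ = 30 and n(1 − π) = 70 are both above 5). The 0.5 continuity correction is a common refinement added here, not something the slides specify; `binomial_cdf` and `normal_approx_cdf` are hypothetical helpers.

```python
from math import comb
from statistics import NormalDist

def binomial_cdf(k, n, pi):
    """Exact P(X <= k) for X ~ B(n, pi)."""
    return sum(comb(n, i) * pi ** i * (1 - pi) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, pi):
    """Normal approximation to P(X <= k), with a continuity correction of 0.5."""
    approx = NormalDist(mu=n * pi, sigma=(n * pi * (1 - pi)) ** 0.5)
    return approx.cdf(k + 0.5)

n, pi, k = 100, 0.3, 25  # illustrative; n*pi = 30 and n*(1 - pi) = 70 are both > 5
print(round(binomial_cdf(k, n, pi), 4), round(normal_approx_cdf(k, n, pi), 4))
```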


THE BINOMIAL DISTRIBUTION

Example: gene recombination

Chromosomal locus with 2 alleles: A and a

p = probability of A

q = 1 − p = probability of a


conception → outcome space: {AA, Aa, aa}

$P(AA) = P(A) \cdot P(A) = p^2$

$P(aa) = P(a) \cdot P(a) = q^2$

$P(Aa) = P(A) \cdot P(a) = pq$

$P(aA) = P(a) \cdot P(A) = qp$ (the two heterozygote orders together: $2pq$)

Total: $p^2 + 2pq + q^2 = (p + q)^2 = 1^2 = 1$


THE BINOMIAL DISTRIBUTION

– Example – probability of genotypes:

• frequency of allele A = 0.33
• frequency of allele a = 0.67

$(p+q)^2 = (0.33 + 0.67)^2 = 0.33^2 + 2 \cdot 0.33 \cdot 0.67 + 0.67^2$

$P(AA) = 0.33^2 = 0.1089$
$P(Aa) = 0.33 \cdot 0.67 = 0.2211$
$P(aA) = 0.67 \cdot 0.33 = 0.2211$
$P(aa) = 0.67^2 = 0.4489$
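The genotype probabilities above can be reproduced in a few lines, using the allele frequency p = 0.33 from the example; the dictionary layout is just an illustrative choice. The two heterozygote orders add up to 2pq, and the four probabilities sum to (p + q)² = 1.

```python
p = 0.33   # frequency of allele A (value from the example above)
q = 1 - p  # frequency of allele a

genotypes = {
    "AA": p * p,
    "Aa": p * q,  # A from the first parent, a from the second
    "aA": q * p,  # a from the first parent, A from the second
    "aa": q * q,
}
for name, prob in genotypes.items():
    print(name, round(prob, 4))

heterozygote = genotypes["Aa"] + genotypes["aA"]  # 2pq
print(round(heterozygote, 4))                     # 0.4422
print(round(sum(genotypes.values()), 10))         # (p + q)^2 = 1
```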

THE BINOMIAL DISTRIBUTION

Graphical presentation – probabilities of different genotypes

[Figure: bar chart of P for genotypes AA, Aa and aa]


– Example – death outcome as a binomial distribution:

• lethality of a disease = 0.30 (30/100)
• survival probability = 0.70
• n = 5
• binomial: $(p + q)^5 = (0.30 + 0.70)^5$, where p = 0.30 (death) and q = 0.70 (survival)

Number of deaths    Binomial term    Probability
5 (everybody)       p⁵               0.00243
4                   5p⁴q             0.02835
3                   10p³q²           0.13230
2                   10p²q³           0.30870
1                   5pq⁴             0.36015
0 (nobody)          q⁵               0.16807
Total                                1.00000
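The table can be regenerated from the binomial expansion with `math.comb`, using the lethality 0.30, survival 0.70 and n = 5 from the example; the variable names are illustrative.

```python
from math import comb

p_death = 0.30    # lethality (from the example above)
q_survive = 0.70  # survival probability
n = 5

# Each row is one term of (p + q)^n: comb(n, d) * p^d * q^(n - d)
table = {d: comb(n, d) * p_death ** d * q_survive ** (n - d) for d in range(n, -1, -1)}
for deaths, prob in table.items():
    print(deaths, comb(n, deaths), f"{prob:.5f}")
print(f"Total {sum(table.values()):.5f}")  # Total 1.00000
```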

THE POISSON DISTRIBUTION

• Poisson (beginning of the 19th century)

• the Poisson random variable = the count of events that occur independently and randomly in time or space at some average rate, µ (it takes the value 0 and all positive integers)

– example: the number of hospital admissions per day typically follows the Poisson distribution

→ the Poisson distribution can be used to calculate the probability of a certain number of admissions on any particular day
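A sketch of that calculation, assuming a hypothetical average of 4 admissions per day (the slides give no rate); `poisson_pmf` is a hypothetical helper implementing the standard Poisson formula P(X = k) = e^(−µ) µ^k / k!.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson count with average rate mu: exp(-mu) * mu**k / k!"""
    return math.exp(-mu) * mu ** k / math.factorial(k)

MU = 4.0  # hypothetical average number of admissions per day (not from the slides)

for k in range(9):
    print(k, round(poisson_pmf(k, MU), 4))

# Probability of more than 8 admissions on a given day (complement of 0..8)
p_more_than_8 = 1 - sum(poisson_pmf(k, MU) for k in range(9))
print(round(p_more_than_8, 4))
```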


THE POISSON DISTRIBUTION

• Mean (average rate, µ) = the parameter that describes the Poisson distribution

• the mean equals the variance in the Poisson distribution

• unimodal curve, right-skewed if the mean is small, but it becomes more symmetrical as the mean increases, when it approximates a Normal distribution
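A sketch that checks the mean-equals-variance property numerically for a few illustrative rates, computing both moments directly from the probabilities. The log-scale evaluation with `math.lgamma` is an implementation choice to avoid overflow for large k, and `poisson_moments` is a hypothetical helper; probabilities beyond k_max are ignored.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) = exp(-mu) * mu**k / k!, evaluated on the log scale to avoid overflow."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def poisson_moments(mu, k_max=500):
    """Mean and variance computed directly from the probabilities 0..k_max."""
    probs = [poisson_pmf(k, mu) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, variance

for mu in (0.5, 3.0, 25.0):
    mean, variance = poisson_moments(mu)
    print(mu, round(mean, 6), round(variance, 6))  # mean and variance both equal mu
```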
