ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

(1)

Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections 5.1-5.2] The joint probability mass function of two discrete random quantities X, Y is

p x y

 

,  P



X x and Y  y



The marginal probability mass functions are

( ) ( , ) X y p x 



p x y  P



X x



_Y( ) ( , ) x p y 



p x y Example 7.01

Find the marginal p.m.f.s for the following joint p.m.f.

 

, p x y y = 3 y = 4 y = 5 pX

 

x x = 0 .30 .10 .20 .60 x = 1 .20 .05 .15 .40

 

Y p y .50 .15 .35 1

[Check that both marginal p.m.f.s, (row and column totals), each add up to 1.]

Like any other probability mass function,

a joint p.m.f. must be non-negative for all (x, y) and be coherent,

 

, 1 x y

p x y

  



.

It is easy to check that both conditions are satisfied in example 7.01. The random quantities X and Y are independent if and only if

 

, _X

 

_Y

 

, p x y  p x p y  x y

(2)

In example 7.01, p_X

 

0 p_Y

 

4 .60 .15 .09  , but p

 

0, 4 .10. Therefore X and Y are dependent,

[despite p x

 

, 3  p_X

 

x p_Y

 

3 for x = 0 and x = 1 !].

For any two possible events A and B , conditional probability is defined by

] [ P ] [ P ] | [ P B B A B

A   , which leads to the conditional probability mass functions

) ( ) , ( ) | ( | _p _x y x p x y X Y p X  and ) ( ) , ( ) | ( | _p _y y x p y x Y X p Y  . In example 7.01,

 

, p x y y = 3 y = 4 y = 5 pX

 

x x = 0 .30 .10 .20 .60 x = 1 .20 .05 .15 .40

 

Y p y .50 .15 .35 1 “p_{Y X}_|



5 | 0



” means “P[Y = 5 | X = 0]”. “ p

 

0, 5 ” means “P[X = 0 and Y = 5]” 3 1 60 . 20 . ) 0 ( ) 5 , 0 ( ) 0 | 5 ( |    X X Y p p p “p_X

 

0 ” means “P[X = 0]” Compare with P[Y = 5] = .35 :

(3)

Joint Probability Density Functions [for bonus questions only]

The continuous analogue to the discrete joint probability mass function is the joint probability density function, f (x, y).

It must satisfy the conditions f x y

 

, 0 

 

x y, and f x y dy dx

 

, 1  

  

 

.

Joint density functions are related to probability statements by integration over intervals:



 



 

P , d b c a aX b c Y d  f x y dx dy    

 

Marginal p.d.f.s are defined by

 

, X f x f x y dy   



and f_Y

 

y f x y dx

 

,   



Conditional p.d.f.s are defined by





 

_{ }

| , | Y X X f x y f y x f x  and





 

| , | X Y Y f x y f x y f y  .

Two continuous distributions are independent if and only if

 

, _X

 

_Y

 

,

f x y  f x f y  x y

Example 7.02 (Navidi, exercises 2.6, page 156, question 20) [bonus question only] Let X denote the amount of shrinkage (in %) undergone by a randomly chosen fibre of a certain type when heated to a temperature of 120C. Let Y represent the additional shrinkage (in %) when the fibre is heated to 140C. The joint probability density function of X and Y is given by

 









2 48 3 4 and 0.5 1 , 49 0 otherwise x y x y f x y         

(a) Find P



X 3.25 and Y 0.8



.

(b) Find the marginal probability density functions f_X

 

x and f_Y

 

y . (c) Are X and Y independent? Explain.

(a)





 

3.25 0.8 P X 3.25 and Y 0.8 f x y dx dy,     

 

2 1 3.25 ₂ 0.8 3 3.25 1 3 0.8 48 48 49 49 x y dx dy x dx y dy 

 









2 3.25 3 1 3 0.8 48 48 10.5625 9 1 0.512 61 49 2 3 49 2 3 490 x y           _{ } _{ }  _ _ _          .124

(4)

Example 7.02 (continued) (b)

 

1 2 3 1 0.5 0.5 48 48 16 1 2 , 1 49 49 3 49 8 7 X x y y x f x f x y dy dy x x          _ _  _  _     



(for 3 < x < 4 only) and

 

2 2 2 2





2 4 4 3 3 48 48 24 24 , 16 9 49 49 2 49 7 Y x y x y f y f x y dx dx y y        _{ }     



(for 0.5 < y < 1 only).

(c) For all (x, y) such that 3 < x < 4 and 0.5 < y < 1,

 

2 24 2 48 2

 

, 7 7 49 X Y x y xy f x f y     f x y

Therefore yes, X and Y are independent.

Whenever f x y

 

, g x h y

     

  x y, on a rectangular domain aligned with the coordinate axes (or on all of 2), the random quantities X and Y are independent.

These concepts of joint probability distributions (both discrete and continuous) can be extended to the cases of three or more random quantities.

(5)

Expected Value E[ ( , )] ( , ) ( , ) x y h X Y 



h x y p x y or h x y f x y dx dy

   

, ,    

 

A measure of linear dependence is the covariance of X and Y :













Cov[ , ]X Y E X E[ ]X Y E[ ]Y x _X y _Y p x y( , ) x y

 

 _   _ 



 

or, in the continuous case,











  

Cov[ , ]X Y E X E[ ]X Y E[ ]Y x X y Y f x y dx dy,  

 

 _   _ 

 

 

Manipulate the double summation:





Cov[ , ]X Y xy _Xy x _Y _X _Y p x y( , ) x y     



  

 

, _X

 

, _Y

 

, _X _Y

 

, xy p x y y p x y x p x y p x y x y y x x y x y     



 

 

 

 

 



 

E X Y _XE Y _YE X  _X _Y     Therefore

     

Cov[ , ]X Y  E X Y E X E Y for both discrete and continuous random quantities.

Note that V[ X ] = Cov[ X, X ] .

In Example 7.01, find the covariance of X and Y :

 

, p x y y = 3 y = 4 y = 5 pX

 

x x = 0 .30 .10 .20 .60 x = 1 .20 .05 .15 .40

 

Y p y .50 .15 .35 1

(6)

Example 7.01 (continued)

 

1

 

0 0 .60 .40 E _X 1 x X x p x        



0.40

 

5

 

3 3 .50 4 .15 5 .3 E 5 Y y Y y p y          



3.85

 

1 5

 

0 3 E , x y x y p x y X Y    



x  y 3 4 5 0 0  3  .30 0  4  .10 0  5  .20 1 1  3  .20 1  4  .05 1  5  .15 = 0 + 0 + 0 + .60 + .20 + .75 = 1.55 Cov



X Y,



 E

     

X Y E X E Y = 1.55 − 0.403.85 = 0.01

Note that the covariance depends on the units of measurement. If X is re-scaled by a factor c and Y by a factor k , then







    

 

Cov cX kY,  E cX kY E cX E kY  ckE X Y cE X kE Y

 

   



E E E



Cov



,



ck X Y X Y ck X Y    

A special case is V

 

cX  Cov



cX cX,



 c2V

 

X .

This dependence on the units of measurement of the random quantities can be eliminated by dividing the covariance by the product of the standard deviations of the two random quantities.

(7)

The correlation coefficient of X and Y is Corr



X Y,



 _{X Y}_,  Cov[ , ] E[ ] E[ ] E[ ] V[ ] V[ ] X Y X Y X Y X Y X Y          In Example 7.01,

 

1 2 2 2 2 0 0 .60 1 4 E _X . 0 x x p X x        



    0.40 V[ X ] = E_X2 _



E

 

X



2  0.40



0.40



2  0.24

 

5 2 2 3 E _Y y y p y Y       



  15.65 V[ Y ] = 15.65



3.85



2  0.827 5 0.01 0.022 4 0.24 0.8275    

 [See the Excel file

"www.engr.mun.ca/~ggeorge/4421/demos/jointpmf.xls"].

Example 7.02 For a joint uniform probability distribution:

(and noting



_x



_y



x y

 



  ): n possible points, p(x, y) = 1/n for each.

[When  1 exactly, then Y a X b exactly, with sign

 

a .]

In general, for constants a, b, c, d, with a and c both positive or both negative,









Corr aX b cY, d  Corr X Y, Also: 1    +1.

(8)

Rule of thumb:

 .8  strong correlation .5  . 8  moderate correlation

 .5  weak correlation

In the example above,  = 0.0224  very weak correlation (almost uncorrelated). X , Y are independent  p(x, y) = p_X

 

x p_Y

 

y

 

   

 

   

E X Y x y p x p y x p x y p y E X E Y

 



  



 



  

 Cov[X , Y ] = E[X Y]E[ ] E[ ]X  Y  0. It then follows that

X , Y are independent  X , Y are uncorrelated ( = 0) , but X , Y are uncorrelated  X , Y are independent .

Counterexample (7.03):

Let the points shown be equally likely. Then the value of Y is completely determined by the value of X . The two random quantities are thus highly dependent. Yet they are uncorrelated! 2 Y X (dependent)

 

E X 0 (by symmetry)

 

3 E X Y E_X _0 (sym.)





Cov X Y,  E[X Y]E[ ] E[ ]X  Y   0 0 E

 

Y  0 Therefore  0 !

(9)

Linear Combinations of Random Quantities

Let the random quantity Y be a linear combination of n random quantities X_i :

1 i n i i Y a X 





[Note: lower case a_i is constant; upper case X_i is random]

 

1 1 E then E E i i n i i n i i a X Y a X              







(linear function) 1 Y i n i i a     



But it can be shown that

1 1 V[ ] Cov[ , ] i j n n i j i j Y a a X X   



 

X_i independent  2 1 V[ ] V i n i i Y a X  



 _{ } Special case: n2, a₁1, a₂  1:



1 2



E X  X  ₁₂ and



1 2



V X  X  2 2

 

2 2 2 2 1 2 1 2 1  1    

[“The variance of a difference is the sum of the variances”]

[It cannot be a difference - that would lead to negative variance when ₂ ₁ .]

Example 7.04

Two runners’ times in a race are independent random quantities T₁ and T₂ , with 1 40   2 1 4   , 2 42   2 2 4   ,

Find E_T₁T₂_ and V_T₁T₂_. [



T₁T₂



is runner 2’s victory margin]

1 2 1 2 E_T T _     2 2 2 1 2 1 2 V_T T _     4 4  8

(10)

Distribution of the Sample Mean

If a random sample of size n is taken and the observed values are



X₁,X₂,X₃, ,X_n



then the X_i are independent and identically distributed (iid) (each with population mean  and population variance 2

) and two more random quantities can be defined: Sample total:



 i i X T Sample mean:



  i i X n n T X 1

 

1 E E _i 1 E i i X n X n X n n   _         



 Therefore E  _{ }X  2 Also V V 1 _i 1 V _i i i X n X X n   _           _    



 





2 1 V are independent i i i X X n 



 _{ } 2 2 2 2 2 2 1 1 i i n n n n    







  Therefore 2 V X n       as n , X .    

and the “standard error of the mean” (“SE Mean” in Minitab) is defined to be

n



(11)

Unbiased estimator A for Biased estimator B for some unknown parameter  : the unknown parameter  :

E[A] =  E[B]  

Which estimator should we choose to estimate  ?

A is unbiased

BUT

B is more efficient

(accurate)

If P["B is closer than A to θ"] is high, then choose B ,

else choose A .

A minimum variance unbiased estimator is ideal. [See also Problem Set 7 Question 8]

(12)

Accuracy and Precision (Example 7.05) [Navidi section 3.1; Devore section 6.1] An archer fires several arrows at the same target. [The fable of William Tell, forced to

use a cross-bow to shoot an apple off his son’s head]

Precise

but [William is precise, but his son dies every time!] biased

Unbiased

[William hits his son occasionally, but often misses both son and apple, but,

on average, is centred on the apple!] not precise

Unbiased and

precise [William hits the apple most of the time, to the relief of his son!]

= accurate

Error = Systematic error + Random error (error)2 = (bias)2 + V[estimator] Estimator A for  is consistent iff

 

E A  and V

 

A 0 (as n )

(13)

Sample Mean

A random sample of n values



X₁,X₂,X₃, ,X_n



is drawn from a population of mean  and standard deviation  .

Then E  _{ }X_i , V  _{ }X_i 2 and the sample mean

1 1 i n i X X n _ 



. . estimates  X

 

n X X 2 V , E     

But, if  is unknown, then 2 is unknown (usually). Sample Variance

and the sample standard deviation is 2 S S  n  1 = number of degrees of freedom for S 2 . Justification for the divisor (n  1) [not examinable]: Using

 

2



 



2

V Y E _{ }Y  E Y for all random quantities Y ,

 



 



2 2 2 2 E  _{ }Y V Y  E Y  _Y _Y





2 2 2 1 E _i E _i _i i i i X X X X n          _  _  _ _  _ _  



 _



 



 _ 2 2 2 2 set set 1 1 E _i E _i E _i E _i i i i i i i Y X Y X X X X X n n  _         _ _    _ _   _ _   _ _  _ _    _   _ _  _ 



 



 







2 2 1 V i E i V i E i i i i X X X X n  _ _           _  _  _ _ _  _ _ __ _ 



 _ 



  



 _







₂ ₂



1



₂

 

2







. . . n n i i d n     



   2 2 n n   2 2 n    





2 1 n   



 

2



2





2 2 1 2 ... 1 n X X X X X X S n        





2 2 ( 1) i i n X X n n   



(14)

Therefore





2 2 2 2 E E 1 i i X X S n    _          _  _{ }    



but





2 2 2 1 1 E _i i n X X n n     _  _   _     _ _ 



 - biased! 2

S is the minimum variance unbiased estimator of 2 and X is the minimum variance unbiased estimator of . Both estimators are also consistent.

Inference – Some Initial Considerations

Is a null hypothesis Ho true (our “default belief”), or do we have sufficient evidence to reject _H_o in favour of the alternative hypothesis _H_A?

o

H could be “defendant is not guilty” or “  o” , etc.

The corresponding _H_A could be “defendant is guilty” or “  _o”, etc.

The burden of proof is on _H_A. [innocent person convicted]

[guilty person goes free]

Bayesian analysis:  is treated as a random quantity. Data are used to modify prior belief about . Conclusions are drawn using both old and new information.

Classical analysis: Data are used to draw conclusions about , without using any prior information.