• No results found

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

N/A
N/A
Protected

Academic year: 2021

Share "ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections"

Copied!
14
0
0

Loading.... (view fulltext now)

Full text

(1)

Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections 5.1-5.2] The joint probability mass function of two discrete random quantities X, Y is

p x y

 

,  P

Xx and Yy

The marginal probability mass functions are

( ) ( , ) X y p x

p x y  P

Xx

Y( ) ( , ) x p y

p x y Example 7.01

Find the marginal p.m.f.s for the following joint p.m.f.

 

, p x y y = 3 y = 4 y = 5 pX

 

x x = 0 .30 .10 .20 .60 x = 1 .20 .05 .15 .40

 

Y p y .50 .15 .35 1

[Check that both marginal p.m.f.s, (row and column totals), each add up to 1.]

Like any other probability mass function,

a joint p.m.f. must be non-negative for all (x, y) and be coherent,

 

, 1 x y

p x y

  



.

It is easy to check that both conditions are satisfied in example 7.01. The random quantities X and Y are independent if and only if

 

, X

 

Y

 

 

, p x yp xp yx y

(2)

In example 7.01, pX

 

0 pY

 

4 .60 .15 .09  , but p

 

0, 4 .10. Therefore X and Y are dependent,

[despite p x

 

, 3  pX

 

xpY

 

3 for x = 0 and x = 1 !].

For any two possible events A and B , conditional probability is defined by

] [ P ] [ P ] | [ P B B A B

A   , which leads to the conditional probability mass functions

) ( ) , ( ) | ( | p x y x p x y X Y p X  and ) ( ) , ( ) | ( | p y y x p y x Y X p Y  . In example 7.01,

 

, p x y y = 3 y = 4 y = 5 pX

 

x x = 0 .30 .10 .20 .60 x = 1 .20 .05 .15 .40

 

Y p y .50 .15 .35 1 pY X|

5 | 0

” means “P[Y = 5 | X = 0]”. p

 

0, 5 ” means “P[X = 0 and Y = 5]” 3 1 60 . 20 . ) 0 ( ) 5 , 0 ( ) 0 | 5 ( |    X X Y p p p pX

 

0 ” means “P[X = 0]” Compare with P[Y = 5] = .35 :

(3)

Joint Probability Density Functions [for bonus questions only]

The continuous analogue to the discrete joint probability mass function is the joint probability density function, f (x, y).

It must satisfy the conditions f x y

 

, 0 

 

x y, and f x y dy dx

 

, 1  

  

 

.

Joint density functions are related to probability statements by integration over intervals:

 

 

P , d b c a aXb c Y df x y dx dy    

 

Marginal p.d.f.s are defined by

 

 

, X f x f x y dy   

and fY

 

y f x y dx

 

,   

Conditional p.d.f.s are defined by

 

 

| , | Y X X f x y f y x f x  and

 

 

| , | X Y Y f x y f x y f y  .

Two continuous distributions are independent if and only if

 

, X

 

Y

 

 

,

f x yf xf yx y

Example 7.02 (Navidi, exercises 2.6, page 156, question 20) [bonus question only] Let X denote the amount of shrinkage (in %) undergone by a randomly chosen fibre of a certain type when heated to a temperature of 120C. Let Y represent the additional shrinkage (in %) when the fibre is heated to 140C. The joint probability density function of X and Y is given by

 

2 48 3 4 and 0.5 1 , 49 0 otherwise x y x y f x y         

(a) Find P

X 3.25 and Y 0.8

.

(b) Find the marginal probability density functions fX

 

x and fY

 

y . (c) Are X and Y independent? Explain.

(a)

 

3.25 0.8 P X 3.25 and Y 0.8 f x y dx dy,     

 

2 1 3.25 2 0.8 3 3.25 1 3 0.8 48 48 49 49 x y dx dy x dx y dy

 

2 3.25 3 1 3 0.8 48 48 10.5625 9 1 0.512 61 49 2 3 49 2 3 490 x y                         .124

(4)

Example 7.02 (continued) (b)

 

 

1 2 3 1 0.5 0.5 48 48 16 1 2 , 1 49 49 3 49 8 7 X x y y x f x f x y dy dy x x              

(for 3 < x < 4 only) and

 

 

2 2 2 2

2 4 4 3 3 48 48 24 24 , 16 9 49 49 2 49 7 Y x y x y f y f x y dx dx y y              

(for 0.5 < y < 1 only).

(c) For all (x, y) such that 3 < x < 4 and 0.5 < y < 1,

 

 

2 24 2 48 2

 

, 7 7 49 X Y x y xy f xf y     f x y

Therefore yes, X and Y are independent.

Whenever f x y

 

, g x h y

     

  x y, on a rectangular domain aligned with the coordinate axes (or on all of 2), the random quantities X and Y are independent.

These concepts of joint probability distributions (both discrete and continuous) can be extended to the cases of three or more random quantities.

(5)

Expected Value E[ ( , )] ( , ) ( , ) x y h X Y



h x yp x y or h x y f x y dx dy

   

, ,    

 

A measure of linear dependence is the covariance of X and Y :





Cov[ , ]X Y E X E[ ]X Y E[ ]Y x X y Y p x y( , ) x y

 

    



 

or, in the continuous case,





  

Cov[ , ]X Y E X E[ ]X Y E[ ]Y xX yY f x y dx dy,  

 

    

 

 

Manipulate the double summation:

Cov[ , ]X Y xy Xy x Y X Y p x y( , ) x y     



  

 

, X

 

, Y

 

, X Y

 

, xy p x y y p x y x p x y p x y x y y x x y x y     



 

 

 

 

 



 

 

 

E X YXE YYE X  X Y     Therefore

     

Cov[ , ]X Y  E X Y E X E Y for both discrete and continuous random quantities.

Note that V[ X ] = Cov[ X, X ] .

In Example 7.01, find the covariance of X and Y :

 

, p x y y = 3 y = 4 y = 5 pX

 

x x = 0 .30 .10 .20 .60 x = 1 .20 .05 .15 .40

 

Y p y .50 .15 .35 1

(6)

Example 7.01 (continued)

 

1

 

0 0 .60 .40 E X 1 x X x p x        

0.40

 

5

 

3 3 .50 4 .15 5 .3 E 5 Y y Y y p y          

3.85

 

1 5

 

0 3 E , x y x y p x y X Y    



x y 3 4 5 0 0  3  .30 0  4  .10 0  5  .20 1 1  3  .20 1  4  .05 1  5  .15 = 0 + 0 + 0 + .60 + .20 + .75 = 1.55 Cov

X Y,

 E

     

X Y E X E Y = 1.55 − 0.403.85 = 0.01

Note that the covariance depends on the units of measurement. If X is re-scaled by a factor c and Y by a factor k , then

    

 

 

 

Cov cX kY,  E cX kY E cX E kYckE X YcE XkE Y

 

   

E E E

Cov

,

ck X Y X Y ck X Y    

A special case is V

 

cX  Cov

cX cX,

c2V

 

X .

This dependence on the units of measurement of the random quantities can be eliminated by dividing the covariance by the product of the standard deviations of the two random quantities.

(7)

The correlation coefficient of X and Y is Corr

X Y,

 X Y,  Cov[ , ] E[ ] E[ ] E[ ] V[ ] V[ ] X Y X Y X Y X Y X Y          In Example 7.01,

 

1 2 2 2 2 0 0 .60 1 4 E X . 0 x x p X x        

    0.40 V[ X ] = EX2 

E

 

X

2  0.40

0.40

2  0.24

 

5 2 2 3 E Y y y p y Y       

  15.65 V[ Y ] = 15.65

3.85

2  0.827 5 0.01 0.022 4 0.24 0.8275    

[See the Excel file

"www.engr.mun.ca/~ggeorge/4421/demos/jointpmf.xls"].

Example 7.02 For a joint uniform probability distribution:

(and noting

x



y

x y

x y

 



  ): n possible points, p(x, y) = 1/n for each.

[When  1 exactly, then Ya Xb exactly, with sign

 

a .]

In general, for constants a, b, c, d, with a and c both positive or both negative,

Corr aXb cY, d  Corr X Y, Also: 1    +1.

(8)

Rule of thumb:

 .8  strong correlation .5  . 8  moderate correlation

 .5  weak correlation

In the example above,  = 0.0224  very weak correlation (almost uncorrelated). X , Y are independent  p(x, y) = pX

 

xpY

 

y

 

   

 

 

   

E X Y x y p x p y x p x y p y E X E Y

 



  

 

  

 Cov[X , Y ] = E[X Y]E[ ] E[ ]XY0. It then follows that

X , Y are independent  X , Y are uncorrelated ( = 0) , but X , Y are uncorrelated  X , Y are independent .

Counterexample (7.03):

Let the points shown be equally likely. Then the value of Y is completely determined by the value of X . The two random quantities are thus highly dependent. Yet they are uncorrelated! 2 YX (dependent)

 

E X 0 (by symmetry)

 

3 E X Y EX0 (sym.)

Cov X Y,  E[X Y]E[ ] E[ ]XY   0 0 E

 

Y  0 Therefore  0 !

(9)

Linear Combinations of Random Quantities

Let the random quantity Y be a linear combination of n random quantities Xi :

1 i n i i Y a X

[Note: lower case ai is constant; upper case Xi is random]

 

1 1 E then E E i i n i i n i i a X Y a X              

(linear function) 1 Y i n i i a     

But it can be shown that

1 1 V[ ] Cov[ , ] i j n n i j i j Y a a X X   



 

Xi independent  2 1 V[ ] V i n i i Y a X  

   Special case: n2, a11, a2  1:

1 2

E XX  12 and

1 2

V XX  2 2

 

2 2 2 2 1 2 1 2 1  1    

[“The variance of a difference is the sum of the variances”]

[It cannot be a difference - that would lead to negative variance when 2 1 .]

Example 7.04

Two runners’ times in a race are independent random quantities T1 and T2 , with 1 40   2 1 4   , 2 42   2 2 4   ,

Find ET1T2 and VT1T2. [

T1T2

is runner 2’s victory margin]

1 2 1 2 ETT     2 2 2 1 2 1 2 VTT      4 4  8

(10)

Distribution of the Sample Mean

If a random sample of size n is taken and the observed values are

X1,X2,X3, ,Xn

then the Xi are independent and identically distributed (iid) (each with population mean  and population variance 2

) and two more random quantities can be defined: Sample total:

i i X T Sample mean:

  i i X n n T X 1

 

 

1 E E i 1 E i i X n X n X n n           

 Therefore E   X  2 Also V V 1 i 1 V i i i X n X X n                

 

2 1 V are independent i i i X X n

   2 2 2 2 2 2 1 1 i i n n n n    

  Therefore 2 V X n       as n , X .    

and the “standard error of the mean” (“SE Mean” in Minitab) is defined to be

n

(11)

Unbiased estimator A for Biased estimator B for some unknown parameter  : the unknown parameter  :

E[A] =  E[B]  

Which estimator should we choose to estimate  ?

A is unbiased

BUT

B is more efficient

(accurate)

If P["B is closer than A to θ"] is high, then choose B ,

else choose A .

A minimum variance unbiased estimator is ideal. [See also Problem Set 7 Question 8]

(12)

Accuracy and Precision (Example 7.05) [Navidi section 3.1; Devore section 6.1] An archer fires several arrows at the same target. [The fable of William Tell, forced to

use a cross-bow to shoot an apple off his son’s head]

Precise

but [William is precise, but his son dies every time!] biased

Unbiased

[William hits his son occasionally, but often misses both son and apple, but,

on average, is centred on the apple!] not precise

Unbiased and

precise [William hits the apple most of the time, to the relief of his son!]

= accurate

Error = Systematic error + Random error (error)2 = (bias)2 + V[estimator] Estimator A for  is consistent iff

 

E A  and V

 

A 0 (as n )

(13)

Sample Mean

A random sample of n values

X1,X2,X3, ,Xn

is drawn from a population of mean  and standard deviation  .

Then E   Xi , V   Xi 2 and the sample mean

1 1 i n i X X n

. . estimates  X

 

 

n X X 2 V , E     

But, if  is unknown, then 2 is unknown (usually). Sample Variance

and the sample standard deviation is 2 S Sn  1 = number of degrees of freedom for S 2 . Justification for the divisor (n  1) [not examinable]: Using

 

2

 

2

V Y E  Y  E Y for all random quantities Y ,

 

 

2 2 2 2 E   Y V Y  E Y  Y Y

2 2 2 1 E i E i i i i i X X X X n             

 

 

  2 2 2 2 set set 1 1 E i E i E i E i i i i i i i Y X Y X X X X X n n                          

 

 

2 2 1 V i E i V i E i i i i X X X X n          

 

  

 

2 2

1

2

 

2

. . . n n i i d n     

   2 2 nn   2 2 n    

2 1 n   

 

2

2

2 2 1 2 ... 1 n X X X X X X S n        

2 2 ( 1) i i n X X n n   

(14)

Therefore

2 2 2 2 E E 1 i i X X S n                  

but

2 2 2 1 1 E i i n X X n n          

- biased! 2

S is the minimum variance unbiased estimator of 2 and X is the minimum variance unbiased estimator of . Both estimators are also consistent.

Inference – Some Initial Considerations

Is a null hypothesis Ho true (our “default belief”), or do we have sufficient evidence to reject Ho in favour of the alternative hypothesis HA?

o

H could be “defendant is not guilty” or “  o” , etc.

The corresponding HA could be “defendant is guilty” or “  o”, etc.

The burden of proof is on HA. [innocent person convicted]

[guilty person goes free]

Bayesian analysis:  is treated as a random quantity. Data are used to modify prior belief about . Conclusions are drawn using both old and new information.

Classical analysis: Data are used to draw conclusions about , without using any prior information.

References

Related documents

A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in

• Follow up with your employer each reporting period to ensure your hours are reported on a regular basis?. • Discuss your progress with

The uniaxial compressive strengths and tensile strengths of individual shale samples after four hours exposure to water, 2.85x10 -3 M cationic surfactant

As part of the Big Data Audit, we will perform an audit of your existing Hadoop installation and provide a list of recommendations to get your Big Data solution up to best

Biological control is the use of living organisms, such as predators, parasitoids, and pathogens, to control pest insects, weeds, or diseases.. Other items addressed

National Conference on Technical Vocational Education, Training and Skills Development: A Roadmap for Empowerment (Dec. 2008): Ministry of Human Resource Development, Department

Small molecule developmental screens reveal the logic and timing of vertebrate development... Small molecule developmental screens reveal the logic and timing of

• General clerical skills or experience (filing, telephones, copying, faxing, organizing) • Computer aptitude a plus (Microsoft Office Suite- Word, Excel; ability to use email)