Probability and Random Variable Primer

B. Maddah INDE 504 Simulation 09/02/17

• Sample space and events

➢ Suppose that an experiment with an uncertain outcome is performed (e.g., rolling a die).

➢ While the outcome of the experiment is not known in advance, the set of all possible outcomes is known. This set is the sample space, Ω.

➢ For example, when rolling a die, Ω = {1, 2, 3, 4, 5, 6}. When tossing a coin, Ω = {H, T}. When measuring the lifetime of a machine in whole years, Ω = {1, 2, 3, …}.

➢ A subset E ⊆ Ω is known as an event.

➢ E.g., when rolling a die, E = {1} is the event that a one appears and F = {1, 3, 5} is the event that an odd number appears.

• Probability of an event

➢ If an experiment is repeated a large number of times, the fraction of repetitions in which event E occurs approaches the probability that E occurs, P{E}.

➢ E.g., when rolling a fair die, P{1} = 1/6 and P{1, 3, 5} = 3/6 = 1/2. When tossing a fair coin, P{H} = P{T} = 1/2.

➢ In some cases, an experiment cannot be repeated many times.

➢ For such cases, probabilities can be a measure of belief (subjective probability).
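The frequency interpretation can be illustrated with a short simulation. The Python sketch below assumes only the standard `random` module; the function name `relative_frequency` and the seed are illustrative choices, not part of these notes. It rolls a simulated fair die and reports the fraction of rolls that land in the event F = {1, 3, 5}.

```python
import random

def relative_frequency(event, n_rolls, seed=0):
    """Fraction of n_rolls simulated fair-die rolls whose outcome lies in `event`."""
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) in event)
    return hits / n_rolls

odd = {1, 3, 5}  # the event F from the text
for n in (100, 1_000, 100_000):
    # The printed fraction should drift toward P{F} = 1/2 as n grows.
    print(n, relative_frequency(odd, n))
```

With few rolls the fraction can be noticeably off 1/2; the approach to P{F} is a long-run statement.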

• Axioms of probability

(1) For E ⊆ Ω, 0 ≤ P{E} ≤ 1;

(2) P{Ω} = 1;

(3) For events E₁, E₂, …, E_i, …, with E_i ⊆ Ω and E_i ∩ E_j = ∅ for all i ≠ j,

$$P\left\{\bigcup_{i=1}^{\infty} E_i\right\} = \sum_{i=1}^{\infty} P\{E_i\}.$$

• Implications

➢ The axioms of probability imply the following results:

o For E, F ⊆ Ω, P{E “or” F} = P{E ∪ F} = P{E} + P{F} − P{E ∩ F};

o If E and F are mutually exclusive (i.e., E ∩ F = ∅), then P{E ∪ F} = P{E} + P{F};

o For E ⊆ Ω, let E^c be the complement of E (i.e., E ∪ E^c = Ω and E ∩ E^c = ∅); then P{E^c} = 1 − P{E};

o P{∅} = 0.

• Conditional probability

➢ The probability that event E occurs given that event F has already occurred (with P{F} > 0) is

$$P\{E \mid F\} = \frac{P\{E \cap F\}}{P\{F\}}.$$

• Independent events

➢ For E, F ⊆ Ω, P{E ∩ F} = P{E|F}P{F}.

➢ Two events are independent if and only if P{E ∩ F} = P{E}P{F}. That is (when P{F} > 0), P{E|F} = P{E}.

• Example 1

➢ Suppose that two fair coins are tossed. What is the probability that either the first or the second coin falls heads?

➢ In this example, Ω = {(H, H), (H, T), (T, H), (T, T)}. Let E (F) be the event that the first (second) coin falls heads, so E = {(H, H), (H, T)}, F = {(H, H), (T, H)}, and E ∩ F = {(H, H)}. The desired probability is

$$P\{E \cup F\} = P\{E\} + P\{F\} - P\{E \cap F\} = \frac{1}{2} + \frac{1}{2} - \frac{1}{4} = \frac{3}{4}.$$
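Because the four outcomes are equally likely, Example 1 can be checked by brute-force enumeration. This is a small Python sketch using `fractions.Fraction` for exact arithmetic; the helper `prob` and the event names are illustrative only.

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))  # the 4 equally likely outcomes

def prob(event):
    """P{event} under equally likely outcomes: |event| / |omega|."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def first_heads(w):   # event E
    return w[0] == "H"

def second_heads(w):  # event F
    return w[1] == "H"

direct = prob(lambda w: first_heads(w) or second_heads(w))
via_formula = (prob(first_heads) + prob(second_heads)
               - prob(lambda w: first_heads(w) and second_heads(w)))
print(direct, via_formula)  # 3/4 3/4
```

Counting E ∪ F directly and applying the inclusion-exclusion formula give the same answer.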

• Example 2

➢ When rolling two fair dice, suppose the first die is 3. What is the probability that the sum of the two dice is 7?

➢ Let E be the event that the sum of the two dice is 7, E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, and F be the event that the first die is 3, F = {(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)}. Then,

$$P\{E \mid F\} = \frac{P\{E \cap F\}}{P\{F\}} = \frac{P\{(3,4)\}}{P\{(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)\}} = \frac{1/36}{6/36} = \frac{1}{6}.$$
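The conditional probability in Example 2 can be verified by enumerating the 36 equally likely outcomes. A minimal Python sketch; `prob` and `cond_prob` are hypothetical helper names that just implement the definition P{E|F} = P{E ∩ F}/P{F}.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # 36 equally likely (die1, die2) pairs

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(event, given):
    """P{event | given} = P{event and given} / P{given}."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

def sum_is_7(w):    # event E
    return w[0] + w[1] == 7

def first_is_3(w):  # event F
    return w[0] == 3

print(cond_prob(sum_is_7, first_is_3))  # 1/6
print(prob(sum_is_7))                   # 1/6
```

Since P{E} is also 1/6, E and F happen to be independent in this particular example.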

• Finding Probability by Conditioning

➢ Suppose that we know the probability of event B once event A is realized (or not). We also know P{A}. That is, we know P{B|A}, P{B|A^c}, and P{A}. What is P{B}?

➢ Note that

B = (A ∩ B) ∪ (A^c ∩ B) ⇒ P{B} = P{A ∩ B} + P{A^c ∩ B}, since A ∩ B and A^c ∩ B are mutually exclusive.

➢ Therefore,

P{B} = P{B|A}P{A} + P{B|A^c}P{A^c} = P{B|A}P{A} + P{B|A^c}(1 − P{A}).

➢ Here we are finding P{B} by “conditioning” on A.

➢ In general, if the realization of B depends on a partition A_1, A_2, …, A_n of Ω (i.e., A_1 ∪ A_2 ∪ … ∪ A_n = Ω and A_i ∩ A_j = ∅ for i ≠ j), then

$$P\{B\} = \sum_{i=1}^{n} P\{B \mid A_i\}\,P\{A_i\}.$$

• Bayes’ Formula

➢ This follows from conditional probabilities. For two events,

$$P\{A \mid B\} = \frac{P\{A \cap B\}}{P\{B\}} = \frac{P\{B \mid A\}\,P\{A\}}{P\{B \mid A\}\,P\{A\} + P\{B \mid A^c\}\,P\{A^c\}}.$$

➢ With a partition A_1, A_2, …, A_n,

$$P\{A_j \mid B\} = \frac{P\{A_j \cap B\}}{P\{B\}} = \frac{P\{B \mid A_j\}\,P\{A_j\}}{\sum_{i=1}^{n} P\{B \mid A_i\}\,P\{A_i\}}.$$

• Example 3

➢ Consider two urns. The first urn contains three white and seven black balls, and the second contains five white and five black balls. We flip a coin and then draw a ball from the first urn or the second urn depending on whether the outcome was heads or tails.

➢ What is the probability that a white ball is selected?

P{W} = P{W|H}P{H} + P{W|T}P{T} = (3/10)(1/2) + (5/10)(1/2) = 2/5.

➢ What is the probability that a black ball is selected?

P{B} = 1 − P{W} = 3/5.

➢ What is the probability that the coin has landed heads given that a white ball is selected?

From Bayes’ formula,

$$P\{H \mid W\} = \frac{P\{W \mid H\}\,P\{H\}}{P\{W\}} = \frac{(3/10)(1/2)}{2/5} = \frac{3}{8}.$$
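Example 3 combines the law of total probability with Bayes’ formula. The Python sketch below computes both exactly with `Fraction` and then checks them with a seeded Monte Carlo run of the flip-then-draw experiment. The dictionary layout and the function `simulate` are illustrative assumptions, not part of the notes.

```python
import random
from fractions import Fraction

p_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
p_white_given = {"H": Fraction(3, 10), "T": Fraction(5, 10)}  # urn 1 on heads, urn 2 on tails

# Law of total probability: condition on the coin outcome.
p_white = sum(p_white_given[c] * p_coin[c] for c in p_coin)
# Bayes' formula for P{H | W}.
p_heads_given_white = p_white_given["H"] * p_coin["H"] / p_white
print(p_white, 1 - p_white, p_heads_given_white)  # 2/5 3/5 3/8

def simulate(n, seed=1):
    """Monte Carlo check: flip, draw, and return (P{W} estimate, P{H|W} estimate)."""
    rng = random.Random(seed)
    whites = heads_and_white = 0
    for _ in range(n):
        coin = "H" if rng.random() < 0.5 else "T"
        white = rng.random() < float(p_white_given[coin])
        whites += white
        heads_and_white += white and coin == "H"
    return whites / n, heads_and_white / whites

print(simulate(100_000))  # roughly (0.4, 0.375)
```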

• Random Variables

➢ Consider a function that assigns a real number to each outcome in Ω. Such a real-valued function is a random variable.

➢ E.g., when rolling two fair dice, define X as the sum of the two dice. Then, X is a random variable with P{X = 2} = P{(1, 1)} = 1/36, P{X = 3} = P{(1, 2), (2, 1)} = 2/36 = 1/18, etc.

➢ E.g., the salvage value of a machine, S, is $1,500 if the market goes up (with probability 0.4) and $1,000 if the market goes down (with probability 0.6). Then, S is a random variable with P{S = 1500} = 0.4 and P{S = 1000} = 0.6.

➢ If the random variable can take on only a finite or countably infinite number of values, then it is a discrete random variable. E.g., the random variable X representing the sum of two dice.

➢ If the random variable can take on an uncountable number of values (e.g., any value in an interval), then it is a continuous random variable. E.g., the random variable H representing the height of an AUB student.

➢ If X is a discrete random variable, the function f_X(x) = P{X = x} is the probability mass function (pmf) of X.

➢ The function

$$F_X(x) = P\{X \le x\} = \sum_{x_i \le x} f_X(x_i)$$

is the cumulative distribution function (cdf) of X.

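➢ E.g., for the random variable S representing the salvage value of a machine above,

$$f_S(s) = \begin{cases} 0.6 & \text{if } s = 1000, \\ 0.4 & \text{if } s = 1500, \\ 0 & \text{otherwise,} \end{cases} \qquad F_S(s) = \begin{cases} 0 & \text{if } s < 1000, \\ 0.6 & \text{if } 1000 \le s < 1500, \\ 1 & \text{if } s \ge 1500. \end{cases}$$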
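The pmf and cdf of the salvage value S translate directly into code. A minimal Python sketch, assuming the pmf is stored as a dictionary from support points to probabilities (a representation chosen here for illustration); `cdf` sums f_S(x_i) over support points x_i ≤ s, as in the definition of F_X above.

```python
pmf_S = {1000: 0.6, 1500: 0.4}  # f_S(s); zero for any s not listed

def pmf(s):
    """f_S(s) = P{S = s}."""
    return pmf_S.get(s, 0.0)

def cdf(s):
    """F_S(s) = sum of f_S(x_i) over support points x_i <= s."""
    return sum(p for x, p in pmf_S.items() if x <= s)

for s in (999, 1000, 1200, 1500, 2000):
    print(s, pmf(s), cdf(s))
```

The cdf is a step function: it jumps by 0.6 at s = 1000 and by 0.4 at s = 1500, and is flat in between.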

➢ For a continuous random variable, X, the cdf is defined based on a function f_X(x) called the probability density function (pdf), where

$$P\{X \le x\} = F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt.$$

Fact. For a discrete random variable, $\sum_i f_X(x_i) = 1$. For a continuous random variable, $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

• Independent Random variables

➢ Two random variables X and Y are said to be independent if, for all x and y,

$$P\{X \le x, Y \le y\} = P\{X \le x\}\,P\{Y \le y\} = F_X(x)\,F_Y(y).$$

• Expectation of a random variable

➢ The expectation of a discrete random variable X is

$$E[X] = \sum_i x_i\,P\{X = x_i\} = \sum_i x_i\,f_X(x_i).$$

➢ The expectation of a continuous random variable X is

$$E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx.$$

➢ The expectation of a random variable is approximately the average of the values obtained when the underlying experiment is repeated a large number of times.

➢ The expectation is “linear.” That is, for two random variables X and Y and constants a and b, E[aX + bY] = aE[X] + bE[Y].

➢ The expectation of a function g(X) of a continuous random variable X is

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$$

(for a discrete X, the integral is replaced by the sum $\sum_i g(x_i)\,f_X(x_i)$).

➢ An important measure is the n^th moment of X, n = 1, 2, …,

$$E[X^n] = \int_{-\infty}^{\infty} x^n\,f_X(x)\,dx.$$

• Measures of variability

➢ The variance of a random variable X is

$$\mathrm{Var}[X] = E\big[(X - E[X])^2\big] = E[X^2] - \big(E[X]\big)^2.$$

➢ The standard deviation of a random variable X is $\sigma_X = \sqrt{\mathrm{Var}[X]}$.

➢ The coefficient of variation of X is CV[X] = σ_X/E[X].

➢ The variance (standard deviation) measures the spread of the random variable around its expectation.

➢ The coefficient of variation is useful when comparing the variability of different alternatives.

➢ Note that Var[aX + b] = a² Var[X], for any real numbers a and b and random variable X.
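As a worked illustration, the measures above can be computed for the salvage value S from the discrete definitions. This Python sketch uses a generic helper `expect` (a hypothetical name) for E[g(X)] = Σ g(x_i) f_X(x_i); the constants a and b in the last check are arbitrary values chosen only to test Var[aX + b] = a² Var[X].

```python
import math

pmf_S = {1000: 0.6, 1500: 0.4}  # salvage value S: $1,000 w.p. 0.6, $1,500 w.p. 0.4

def expect(g, pmf):
    """E[g(X)] = sum of g(x) f_X(x) over the support (discrete case)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda s: s, pmf_S)         # E[S]
second = expect(lambda s: s ** 2, pmf_S)  # E[S^2]
var = second - mean ** 2                  # Var[S] = E[S^2] - (E[S])^2
sd = math.sqrt(var)                       # sigma_S
cv = sd / mean                            # CV[S] = sigma_S / E[S]
print(mean, var, sd, cv)

# Var[aS + b] should equal a^2 Var[S]: the shift b does not change the spread.
a, b = 2.0, 300.0                         # hypothetical constants for the check
pmf_T = {a * s + b: p for s, p in pmf_S.items()}
var_T = expect(lambda t: t ** 2, pmf_T) - expect(lambda t: t, pmf_T) ** 2
print(var_T, a ** 2 * var)
```

For S this gives E[S] = 1200, Var[S] = 60,000, σ_S ≈ 244.9 and CV[S] ≈ 0.204, up to floating-point rounding.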

• Joint distribution

➢ The joint distribution function of two random variables is

$$F_{X,Y}(x, y) = P\{X \le x, Y \le y\}.$$

➢ If X and Y are discrete random variables, then

$$F_{X,Y}(x, y) = \sum_{i \le x} \sum_{j \le y} P\{X = i, Y = j\} = \sum_{i \le x} \sum_{j \le y} f_{X,Y}(i, j),$$

where f_{X,Y}(·,·) is the joint pmf of X and Y.

➢ If X and Y are continuous random variables, then

$$F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(s, t)\,ds\,dt,$$

where f_{X,Y}(·,·) is the joint pdf of X and Y.

Fact. F_{X,Y}(x, y) = F_X(x)F_Y(y) for all x and y if and only if (iff) X and Y are independent.

• Covariance

➢ The covariance measures the (linear) dependence of two random variables. For two random variables X and Y,

$$\mathrm{Cov}[X, Y] = \sigma_{XY} = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]E[Y],$$

where, for continuous X and Y,

$$E[XY] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\,y\,f_{X,Y}(x, y)\,dx\,dy.$$

➢ If σ_XY > 0 (< 0), X and Y are said to be positively (negatively) correlated.

➢ If X and Y are independent, then σ_XY = 0. The converse does not hold in general: uncorrelated random variables (σ_XY = 0) need not be independent.

➢ Properties of covariance

$$\begin{aligned} &\mathrm{Cov}[X, X] = \mathrm{Var}[X], \quad \mathrm{Cov}[X, Y] = \mathrm{Cov}[Y, X], \quad \mathrm{Cov}[aX, Y] = a\,\mathrm{Cov}[X, Y], \\ &\mathrm{Cov}[X, Y + Z] = \mathrm{Cov}[X, Y] + \mathrm{Cov}[X, Z], \quad |\mathrm{Cov}[X, Y]| = |\sigma_{XY}| \le \sigma_X \sigma_Y. \end{aligned}$$

➢ The coefficient of correlation is defined as $\rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$.

➢ Note that |ρ_XY| ≤ 1.

➢ Note that Var[X + Y] = Var[X] + 2Cov[X, Y] + Var[Y].

➢ If X and Y are independent, Var[X + Y] = Var[X] + Var[Y].
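A standard counterexample shows why zero covariance does not imply independence. In the Python sketch below, the joint pmf (X uniform on {−1, 0, 1} and Y = X²) is a hypothetical example chosen for illustration; the code computes Cov[X, Y] exactly, shows that the product rule for independence fails, and checks the identity Var[X + Y] = Var[X] + 2Cov[X, Y] + Var[Y].

```python
from fractions import Fraction

# Hypothetical joint pmf: X is uniform on {-1, 0, 1} and Y = X**2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def E(g):
    """E[g(X, Y)] computed from the joint pmf above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def var(g):
    """Var[g(X, Y)] = E[g^2] - (E[g])^2."""
    return E(lambda x, y: g(x, y) ** 2) - E(g) ** 2

cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
print("Cov[X, Y] =", cov)  # Cov[X, Y] = 0

# Independence would need P{X = 0, Y = 0} = P{X = 0} P{Y = 0}.
p_joint = joint.get((0, 0), Fraction(0))
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
print(p_joint, p_x0 * p_y0)  # 1/3 1/9

# Var[X + Y] = Var[X] + 2 Cov[X, Y] + Var[Y] holds exactly.
lhs = var(lambda x, y: x + y)
rhs = var(lambda x, y: x) + 2 * cov + var(lambda x, y: y)
print(lhs == rhs)  # True
```

Y is completely determined by X, yet σ_XY = 0, because the dependence is not linear.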

• The Bernoulli Random Variable

➢ Suppose an experiment can result in success with probability p and failure with probability (w.p.) 1−p. We define a Bernoulli random variable X as X = 1 if the experiment outcome is a success and X = 0 otherwise.

➢ The pmf of X is

$$f_X(x) = P\{X = x\} = \begin{cases} 1 - p & \text{if } x = 0, \\ p & \text{if } x = 1. \end{cases}$$

➢ The expected value of X is E[X] = 0(1−p) + 1(p) = p.

➢ The second moment of X is E[X²] = 0²(1−p) + 1²(p) = p.

➢ The variance of X is Var[X] = E[X²] − (E[X])² = p − p² = p(1−p).

• The Binomial Random Variable

➢ Consider n independent trials, each of which can result in a success w.p. p and a failure w.p. 1−p.

➢ We define a Binomial random variable, X, as the number of successes in the n trials.

➢ The pmf of X is defined as

$$f_X(i) = P\{X = i\} = \binom{n}{i} p^i (1 - p)^{n - i}, \quad i = 0, 1, \ldots, n,$$

where $\binom{n}{i} = \dfrac{n!}{(n - i)!\, i!}$.

[Figure: pmf f_X(i) of a binomial random variable, plotted for i = 0, 1, …, 5.]

➢ Note that, by the binomial theorem,

$$\sum_{i=0}^{n} f_X(i) = \sum_{i=0}^{n} \binom{n}{i} p^i (1 - p)^{n - i} = \big(p + (1 - p)\big)^n = 1.$$

Fact. Let X_i = 1, i = 1, …, n, if the i^th trial results in success and X_i = 0 otherwise. Then

$$X = \sum_{i=1}^{n} X_i.$$

➢ Note that the X_i are independent and identically distributed (iid) Bernoulli random variables with parameter p.

➢ Therefore,

$$E[X] = \sum_{i=1}^{n} E[X_i] = np, \qquad \mathrm{Var}[X] = \sum_{i=1}^{n} \mathrm{Var}[X_i] = np(1 - p),$$

where the variance of the sum equals the sum of the variances because the X_i are independent.
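The Fact above suggests a direct way to generate binomial samples: add up n Bernoulli indicators. This Python sketch assumes the standard `random` and `statistics` modules; the function name `binomial_sample` and the values n = 5, p = 0.3 are illustrative choices. It compares the sample mean and variance with np and np(1 − p).

```python
import random
import statistics

def binomial_sample(n, p, rng):
    """One Binomial(n, p) draw as the sum of n iid Bernoulli(p) indicators X_i."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))

rng = random.Random(42)
n, p = 5, 0.3
draws = [binomial_sample(n, p, rng) for _ in range(50_000)]
mean_hat = statistics.fmean(draws)
var_hat = statistics.pvariance(draws)
print(mean_hat, n * p)            # sample mean vs np = 1.5
print(var_hat, n * p * (1 - p))   # sample variance vs np(1 - p) = 1.05
```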

• Example 4

➢ A fair coin is flipped 5 times.

➢ What is the probability that exactly two heads are obtained?

➢ The number of heads, X, is a binomial random variable with parameters n = 5 and p = 0.5. Then, the desired probability is P{X = 2} = [5!/(2! × 3!)] × (0.5)²(0.5)³ = 10/32 ≈ 0.313.
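The arithmetic in Example 4 can be reproduced with `math.comb` (available in Python 3.8+). The helper name `binomial_pmf` is illustrative; the second line also confirms numerically that the pmf sums to 1.

```python
from math import comb

def binomial_pmf(i, n, p):
    """f_X(i) = C(n, i) p^i (1 - p)^(n - i)."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

print(binomial_pmf(2, 5, 0.5))                          # 0.3125
print(sum(binomial_pmf(i, 5, 0.5) for i in range(6)))   # 1.0
```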

• The Geometric Random Variable

➢ Suppose independent trials, each having a probability p of being a success, are performed.

➢ We define the geometric random variable (rv) X as the number of trials until the first success occurs (counting the successful trial).

➢ The pmf of X is defined as

$$f_X(i) = P\{X = i\} = (1 - p)^{i - 1} p, \quad i = 1, 2, \ldots$$

[Figure: pmf f_X(i) of a geometric random variable, plotted for i up to 20.]

➢ Note that f_X(i) defines a pmf since

$$\sum_{i=1}^{\infty} f_X(i) = \sum_{i=1}^{\infty} (1 - p)^{i - 1} p = p \sum_{i=0}^{\infty} (1 - p)^i = \frac{p}{1 - (1 - p)} = 1.$$

➢ Let q = 1−p. The first two moments and variance of X are

$$\begin{aligned} E[X] &= \sum_{i=1}^{\infty} i\,q^{i-1} p = \frac{1}{p}, \\ E[X^2] &= \sum_{i=1}^{\infty} i^2 q^{i-1} p = \frac{2 - p}{p^2}, \\ \mathrm{Var}[X] &= E[X^2] - (E[X])^2 = \frac{1 - p}{p^2}. \end{aligned}$$

Example 5.

➢ When rolling a die repeatedly, what is the probability that the first 6 appears on the sixth roll?

➢ Let X be the number of rolls until a 6 appears. Then, X is a geometric rv with parameter p = 1/6, and the desired probability is P{X = 6} = (5/6)⁵(1/6) ≈ 0.0670.

➢ What is the expected number of rolls until a 6 appears?

➢ E[X] = 1/p = 6.
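Example 5 can be checked both from the geometric pmf and by simulating the experiment as described (roll until a 6 appears). A minimal Python sketch; `geometric_pmf`, `rolls_until_six`, and the seed are illustrative names and choices.

```python
import random
import statistics

p = 1 / 6

def geometric_pmf(i, p):
    """f_X(i) = (1 - p)^(i - 1) p for i = 1, 2, ..."""
    return (1 - p) ** (i - 1) * p

def rolls_until_six(rng):
    """Roll a fair die until a 6 appears; return the number of rolls used."""
    rolls = 1
    while rng.randint(1, 6) != 6:
        rolls += 1
    return rolls

rng = random.Random(7)
samples = [rolls_until_six(rng) for _ in range(100_000)]
freq6 = sum(s == 6 for s in samples) / len(samples)
mean_hat = statistics.fmean(samples)
print(round(geometric_pmf(6, p), 4))  # 0.067
print(freq6)                          # empirical P{X = 6}
print(mean_hat, 1 / p)                # sample mean vs E[X] = 1/p
```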

• The Poisson Random Variable

➢ An rv taking on values 0, 1, … is said to be a Poisson random variable with parameter λ > 0 if

$$f_X(i) = P\{X = i\} = e^{-\lambda} \frac{\lambda^i}{i!}, \quad i = 0, 1, \ldots$$

[Figure: pmf f_X(i) of a Poisson random variable, plotted for i = 0, …, 10.]

➢ f_X(i) defines a pmf since

$$\sum_{i=0}^{\infty} f_X(i) = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = e^{-\lambda} e^{\lambda} = 1.$$

➢ The Poisson rv is a good model for demand, arrivals, and certain rare events.

➢ The first two moments and the variance of X are

$$E[X] = \lambda, \qquad E[X^2] = \lambda^2 + \lambda, \qquad \mathrm{Var}[X] = E[X^2] - (E[X])^2 = \lambda.$$

➢ Let X₁ and X₂ be two independent Poisson rv’s with means λ₁ and λ₂. Then, Z = X₁ + X₂ is a Poisson rv with mean λ₁ + λ₂.

Example 6.

➢ The monthly demand for a certain airplane spare part for the Fly High Airlines (FHA) fleet at Beirut airport is estimated to be a Poisson random variable with mean 0.5. Suppose that FHA will stock one spare part at the beginning of March. Once the part is used, a new part is ordered. The delivery lead time for a part is 2 months.

➢ What is the probability that the spare part will be used during March?

➢ Let X be the demand for the spare part in March. The desired probability is P{X ≥ 1} = 1 − e^−λ = 1 − e^−0.5 ≈ 0.393.

➢ What is the probability that FHA will face a shortage on this part in March?

➢ Since a replacement ordered in March takes 2 months to arrive, a shortage occurs if more than one part is demanded in March. The desired probability is P{X > 1} = 1 − P{X = 0} − P{X = 1} = 1 − e^−0.5 − 0.5e^−0.5 ≈ 0.090.
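The two answers in Example 6 follow from the Poisson pmf with λ = 0.5. The Python sketch below (helper name `poisson_pmf` is illustrative) reproduces them and, as a side check, recovers E[X] = Var[X] = λ from a truncated sum over the pmf; the truncation at i = 49 is an assumption that is harmless for λ = 0.5.

```python
import math

def poisson_pmf(i, lam):
    """f_X(i) = e^(-lam) lam^i / i!"""
    return math.exp(-lam) * lam ** i / math.factorial(i)

lam = 0.5  # mean monthly demand
p_used = 1 - poisson_pmf(0, lam)                            # P{X >= 1}
p_shortage = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)  # P{X > 1}
print(round(p_used, 3), round(p_shortage, 3))               # 0.393 0.09

# Moments from the pmf (series truncated at i = 49).
mean = sum(i * poisson_pmf(i, lam) for i in range(50))
var = sum(i * i * poisson_pmf(i, lam) for i in range(50)) - mean ** 2
print(mean, var)  # both approximately lam
```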

• The Uniform Random Variable

➢ An rv X that is equally likely to be “near” any point of an interval (a, b) is said to have a uniform distribution.

➢ The pdf of X is

$$f_X(x) = \begin{cases} \dfrac{1}{b - a}, & \text{if } a < x < b, \\ 0, & \text{otherwise.} \end{cases}$$

➢ Note that f_X(x) defines a pdf since $\int_a^b f_X(x)\,dx = \int_a^b \frac{1}{b - a}\,dx = 1$.

[Figure: pdf f_X(x) of a uniform random variable, plotted for x between 0 and 6.]

➢ The cdf of X is

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt = \begin{cases} 0, & \text{if } x < a, \\ \dfrac{x - a}{b - a}, & \text{if } a \le x \le b, \\ 1, & \text{otherwise.} \end{cases}$$

➢ The first two moments of X are

$$\begin{aligned} E[X] &= \int_a^b x\,f_X(x)\,dx = \int_a^b \frac{x}{b - a}\,dx = \frac{b^2 - a^2}{2(b - a)} = \frac{a + b}{2}, \\ E[X^2] &= \int_a^b x^2 f_X(x)\,dx = \int_a^b \frac{x^2}{b - a}\,dx = \frac{b^3 - a^3}{3(b - a)} = \frac{a^2 + ab + b^2}{3}. \end{aligned}$$

➢ The variance of X is E[X²] − (E[X])² = (b − a)²/12.
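The uniform formulas can be sanity-checked by sampling with `random.Random.uniform`. In this Python sketch the interval (a, b) = (2, 5) and the seed are arbitrary illustrative choices; the sample mean, sample variance, and empirical cdf at x = 3 are compared with (a + b)/2, (b − a)²/12, and (x − a)/(b − a).

```python
import random
import statistics

a, b = 2.0, 5.0  # hypothetical interval for the check
rng = random.Random(3)
xs = [rng.uniform(a, b) for _ in range(200_000)]

mean_hat = statistics.fmean(xs)
var_hat = statistics.pvariance(xs)
x = 3.0
cdf_hat = sum(v <= x for v in xs) / len(xs)

print(mean_hat, (a + b) / 2)          # sample mean vs 3.5
print(var_hat, (b - a) ** 2 / 12)     # sample variance vs 0.75
print(cdf_hat, (x - a) / (b - a))     # empirical F_X(3) vs 1/3
```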

• The Exponential Random Variable

➢ An exponential rv with parameter λ > 0 is an rv whose pdf is

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$

[Figure: pdf f_X(x) of an exponential random variable, plotted for x between 0 and 20.]

➢ Note that f_X(x) defines a pdf since

$$\int_0^{\infty} f_X(x)\,dx = \int_0^{\infty} \lambda e^{-\lambda x}\,dx = \Big[-e^{-\lambda x}\Big]_0^{\infty} = 1.$$

➢ The exponential rv is a good model for the time between arrivals or the time to failure of certain equipment.

➢ The cdf of X is

$$F_X(x) = \int_0^x f_X(t)\,dt = \int_0^x \lambda e^{-\lambda t}\,dt = \Big[-e^{-\lambda t}\Big]_0^x = 1 - e^{-\lambda x}, \quad x \ge 0.$$

➢ A useful property of the exponential distribution is that P{X > x} = 1 − F_X(x) = e^−λx, x ≥ 0.

➢ The first two moments and the variance of X are

$$E[X] = \frac{1}{\lambda}, \qquad E[X^2] = \frac{2}{\lambda^2}, \qquad \mathrm{Var}[X] = E[X^2] - (E[X])^2 = \frac{1}{\lambda^2}.$$

Proposition. The exponential distribution has the memoryless property, i.e., P{X > t + u | X > t} = P{X > u} for all t, u ≥ 0.

Proof.

$$P\{X > t + u \mid X > t\} = \frac{P\{X > t + u, X > t\}}{P\{X > t\}} = \frac{P\{X > t + u\}}{P\{X > t\}} = \frac{e^{-\lambda(t + u)}}{e^{-\lambda t}} = e^{-\lambda u} = P\{X > u\}.$$

➢ The memoryless property allows developing tractable analytical models with the exponential distribution. It makes the exponential distribution very popular in modeling.

Proposition. Let X₁ and X₂ be two independent exponential random variables with parameters λ₁ and λ₂. Let X = min(X₁, X₂). Then, X is an exponential random variable with parameter λ₁ + λ₂.

Proof. P{X > x} = P{X₁ > x, X₂ > x} = P{X₁ > x}P{X₂ > x} = e^−λ₁x e^−λ₂x = e^−(λ₁+λ₂)x.
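The minimum-of-exponentials result can be checked by simulation. This Python sketch uses `random.Random.expovariate`, whose argument is the rate λ (so the mean is 1/λ); the rates λ₁ = 0.5, λ₂ = 1.5 and the test point x = 0.7 are illustrative choices. It compares the sample mean and the empirical tail P{X > x} of the minimum with 1/(λ₁ + λ₂) and e^−(λ₁+λ₂)x.

```python
import math
import random
import statistics

lam1, lam2 = 0.5, 1.5  # hypothetical rates
rng = random.Random(11)
mins = [min(rng.expovariate(lam1), rng.expovariate(lam2)) for _ in range(200_000)]

x = 0.7
mean_hat = statistics.fmean(mins)
tail_hat = sum(m > x for m in mins) / len(mins)
print(mean_hat, 1 / (lam1 + lam2))              # sample mean vs 1/(lam1 + lam2)
print(tail_hat, math.exp(-(lam1 + lam2) * x))   # empirical P{X > x} vs e^{-(lam1+lam2)x}
```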

A Poisson process N(t) with rate λ is a stochastic process defined over time, t, such that

(i) The number of events (typically arrivals) over a time t, N(t), is a Poisson random variable with mean λt,

Example 7.

➢ The amount of time one spends in the bank is exponentially distributed with mean 10 minutes.

➢ A customer arrives at 1:00 PM. What is the probability that the customer will be in the bank at 1:15 PM?

➢ Let X be the time the customer spends in the bank (in minutes). Then, X is exponentially distributed with parameter λ = 1/10. The desired probability is P{X > 15} = e^−15λ = e^−15/10 ≈ 0.223.

➢ It is now 1:20 PM and the customer is still in the bank. What is the probability that the customer will be in the bank at 1:35 PM?

➢ 0.223 (by the memoryless property, P{X > 35 | X > 20} = P{X > 15}).
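Example 7 relies on the memoryless property. The Python sketch below computes P{X > 15} directly and then estimates P{X > 35 | X > 20} from simulated service times (via `random.Random.expovariate` with rate 1/10), keeping only customers still present at minute 20. The seed and sample size are illustrative choices.

```python
import math
import random

lam = 1 / 10  # rate per minute (mean 10 minutes)
print(round(math.exp(-15 * lam), 3))  # P{X > 15} -> 0.223

rng = random.Random(5)
times = [rng.expovariate(lam) for _ in range(500_000)]
still_there_at_20 = [t for t in times if t > 20]
p_cond = sum(t > 35 for t in still_there_at_20) / len(still_there_at_20)
print(p_cond)  # estimate of P{X > 35 | X > 20}, close to P{X > 15}
```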

• The Normal Random Variable

➢ We say that a random variable X is a normal rv with parameters μ and σ > 0 if it has the following pdf:

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/(2\sigma^2)}, \quad x \in (-\infty, \infty).$$

[Figure: pdf f_X(x) of a normal random variable, plotted against x.]

➢ Note that f_X(x) defines a pdf. With the change of variable z = (x − μ)/σ and using the fact that $\int_{-\infty}^{\infty} e^{-z^2/2}\,dz = \sqrt{2\pi}$,

$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/(2\sigma^2)}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2/2}\,dz = 1.$$

➢ The normal rv is a good model for quantities that can be seen as sums or averages of a large number of rv’s.

➢ The cdf of X,

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt,$$

has no closed form.

➢ The first two moments and the variance of X are

$$E[X] = \mu, \qquad E[X^2] = \sigma^2 + \mu^2, \qquad \mathrm{Var}[X] = \sigma^2.$$

Fact. If X is a normal rv, then Z = (X − μ)/σ is a “standard normal rv” with parameters 0 and 1.

Proof. Note that

$$P\{Z < z\} = P\left\{\frac{X - \mu}{\sigma} < z\right\} = P\{X < \mu + \sigma z\} = \int_{-\infty}^{\mu + \sigma z} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(t - \mu)^2/(2\sigma^2)}\,dt.$$

Let u = (t − μ)/σ; then

$$P\{Z < z\} = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\,du,$$

which is the cdf of the standard normal.

➢ This fact implies that X can be written as X = μ + σZ.

➢ The cdf of X, F_X(x), is evaluated through the cdf of Z, F_Z(z), which is often tabulated:

$$P\{X < x\} = P\left\{Z < \frac{x - \mu}{\sigma}\right\} \;\Rightarrow\; F_X(x) = F_Z\!\left(\frac{x - \mu}{\sigma}\right).$$

Proposition. If X₁ and X₂ are two independent normal rvs with means μᵢ and variances σᵢ², i = 1, 2, then Z = X₁ + X₂ is normal with mean μ₁ + μ₂ and variance σ₁² + σ₂².

Theorem (central limit theorem). If X_i, i = 1, 2, …, n, are iid rv’s with mean μ and variance σ², then, for n large enough, $\sum_{i=1}^{n} X_i$ is approximately normally distributed with mean nμ and variance nσ².
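A quick simulation illustrates the central limit theorem. This Python sketch sums n = 12 iid Uniform(0, 1) rvs (mean 1/2, variance 1/12, so the sum has mean 6 and variance 1) and compares the empirical P{sum ≤ 7} with the normal approximation from `statistics.NormalDist`. The choice of uniform summands, n = 12, and the point x = 7 are illustrative assumptions.

```python
import random
import statistics
from statistics import NormalDist

n = 12                  # number of iid Uniform(0, 1) terms per sum
mu, var = 0.5, 1 / 12   # mean and variance of one Uniform(0, 1)
rng = random.Random(2)
sums = [sum(rng.random() for _ in range(n)) for _ in range(100_000)]

approx = NormalDist(mu=n * mu, sigma=(n * var) ** 0.5)  # CLT normal approximation
x = 7.0
empirical = sum(s <= x for s in sums) / len(sums)
print(empirical, approx.cdf(x))  # both near 0.84
print(statistics.fmean(sums), n * mu)
```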

Example 8.

➢ The height of an AUB male student is a normal rv with mean 170 cm and standard deviation 8 cm.

➢ What is the probability that the height of an AUB male student is less than 180 cm?

➢ Let X be the height of the student. Then, the desired probability is P{X < 180} = P{Z < (180 − 170)/8} = P{Z < 1.25} ≈ 0.894.
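Instead of a printed table, F_Z can be evaluated with `statistics.NormalDist` (Python 3.8+). This sketch reproduces Example 8 both directly through F_X(180) and through the standardized value z = 1.25, illustrating F_X(x) = F_Z((x − μ)/σ).

```python
from statistics import NormalDist

height = NormalDist(mu=170, sigma=8)   # X ~ normal with mean 170 cm, sd 8 cm
z = (180 - 170) / 8                    # standardize: z = (x - mu)/sigma = 1.25
p_direct = height.cdf(180)             # F_X(180)
p_standard = NormalDist().cdf(z)       # F_Z(1.25)
print(round(p_direct, 3))              # 0.894
print(round(p_standard, 3))            # 0.894
```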