Lecture9

(1)

Lecture 9

Conditional PMF and PDF. Basic probability laws of

continuous random variables.

Plan of the lecture:

1. Conditioning

1.1 Conditioning a Random Variable on an Event (discrete RV)

1.2 Conditioning one Random Variable on Another

1.3 Conditioning on a event (continuous RV)

2. Basic probability laws of continuous random variables

2.1Unifrom Random Variable

2.1.1 Continuous Uniform Random Variable

2.1.2 Discrete Uniform Random Variable

2.2Exponential Random Variable

2.2.1 Exponential PDF and CDF

2.2.2 Memoryless property

2.2.3 Occurrence and applications

2.3Normal random variables

(2)

1 Conditionig

If we have a probabilistic model and we are also told that a certain event 𝐴 has occurred,

we can capture this knowledge by employing the conditional instead of the original

(unconditional) probabilities. As discussed earlier, conditional probabilities are like ordinary

probabilities (satisfy the three axioms) except that they refer to a new universe in which event A

is known to have occurred. In the same spirit, we can talk about conditional PMFs which provide

the probabilities of the possible values of a random variable, conditioned on the occurrence of

some event.

1.1 Conditioning a Random Variable on an Event (discrete RV)

The conditional PMF of a random variable 𝑋, conditioned on a particular event 𝐴 with

𝑃(𝐴) > 0, is defined by

𝑝_𝑋|𝐴(𝑥) = 𝑃(𝑋 = 𝑥|𝐴) = 𝑃 {𝑋=𝑥}∩𝐴 _𝑃(𝐴) .

Note that the events {𝑋 = 𝑥} ∩ 𝐴 are disjoint for different values of 𝑥, their union is 𝐴,

and, therefore,

𝑃(𝐴) = 𝑃 {𝑋 = 𝑥} ∩ 𝐴 _𝑥 .

Combining the above two formulas, we see that

𝑝_𝑥 _𝑋|𝐴(𝑥)= 1,

so 𝑝_𝑋|𝐴is a legitimate PMF.

As an example, let 𝑋 be the roll of a die and let 𝐴 be the event that the roll is an even number. Then, by applying the preceding formula, we obtain

𝑝𝑋|𝐴 𝑥 = 𝑃 𝑋 = 𝑥 𝑟𝑜𝑙𝑙 𝑖𝑠 𝑒𝑣𝑒𝑛) =

𝑃 𝑋 = 𝑥 𝑎𝑛𝑑 𝑋 𝑖𝑠 𝑒𝑣𝑒𝑛

(3)

The conditional PMF is calculated similar to its unconditional counterpart: to obtain

𝑝𝑋|𝐴 𝑥 , we add the probabilities of the outcomes that give rise to 𝑋 = 𝑥 and belong to the

conditioning event 𝐴, and then normalize by dividing with 𝑃(𝐴) (see Fig. 1).

Figure 1: Visualization and calculation of the conditional PMF 𝑝𝑋|𝐴 𝑥 . For each 𝑥, we add the

probabilities of the outcomes in the intersection {𝑋 = 𝑥} ∩ 𝐴and normalize by diving with

𝑃(𝐴).

1.2 Conditioning one Random Variable on Another

Let 𝑋 and 𝑌 be two random variables associated with the same experiment. If we know

that the experimental value of 𝑌 is some particular 𝑦 (with 𝑝𝑌(𝑦) > 0), this provides partial knowledge about the value of 𝑋. This knowledge is captured by the conditional PMF 𝑝_𝑋|𝑌 of 𝑋

given 𝑌, which is defined by specializing the definition of 𝑝𝑋|𝐴 to events 𝐴of the form {𝑌 = 𝑦}:

𝑝𝑋|𝑌(𝑥|𝑦) = 𝑃(𝑋 = 𝑥|𝑌 = 𝑦).

Using the definition of conditional probabilities, we have

𝑝𝑋|𝑌 𝑥 𝑦 =𝑃 𝑋=𝑥,𝑌=𝑦 _{𝑃 𝑌=𝑦} =𝑝𝑋 ,𝑌_𝑝 (𝑥,𝑦)

𝑌(𝑦) .

Let us fix some 𝑦, with 𝑝_𝑌(𝑦) > 0 and consider 𝑝_𝑋|𝑌 𝑥 𝑦 as a function of 𝑥. This

function is a valid PMF for 𝑋: it assigns nonnegative values to each possible 𝑥, and these values

add to 1. Furthermore, this function of 𝑥, has the same shape as 𝑝_𝑋,𝑌(𝑥, 𝑦) except that it is

normalized by dividing with 𝑝_𝑌(𝑦), which enforces the normalization property

(4)

Figure 2 provides a visualization of the conditional PMF.

Figure 2: Visualization of the conditional PMF 𝑝_𝑋|𝑌 𝑥 𝑦 . For each 𝑦, we view the joint PMF

along the slice 𝑌 = 𝑦and renormalize so that 𝑝_𝑥 _𝑋|𝑌 𝑥 𝑦 = 1.

The conditional PMF is often convenient for the calculation of the joint PMF, using a

sequential approach and the formula

𝑝𝑋,𝑌 𝑥, 𝑦 = 𝑝𝑌(𝑦)𝑝𝑋|𝑌 𝑥 𝑦 ,

or its counterpart

𝑝𝑋,𝑌 𝑥, 𝑦 = 𝑝𝑋(𝑥)𝑝𝑌|𝑋 𝑦 𝑥 .

This method is entirely similar to the use of the multiplication rule.

Conditional Expectation

A conditional PMF can be thought of as an ordinary PMF over a new universe

determined by the conditioning event. In the same spirit, a conditional expectation is the same as

an ordinary expectation, except that it refers to the new universe, and all probabilities and PMFs

are replaced by their conditional counterparts. We list the main definitions and relevant facts

(5)

Summary of Facts About Conditional Expectations

Let 𝑋and 𝑌be random variables associated with the same experiment.

 The conditional expectation of 𝑋given an event 𝐴with 𝑃(𝐴) > 0, is defined by

𝐸[𝑋|𝐴] = 𝑥𝑝_𝑥 _𝑋|𝐴(𝑥|𝐴).

For a function 𝑔(𝑋), it is given by

𝐸 𝑔(𝑋)|𝐴 = 𝑔(𝑥)𝑝𝑥 𝑋|𝐴(𝑥|𝐴).

 The conditional expectation of 𝑋given a value 𝑦of 𝑌is defined by

𝐸[𝑋|𝑌 = 𝑦] = 𝑥𝑝𝑥 𝑋|𝑌(𝑥|𝑦).

 We have

𝐸[𝑋] = 𝑝_𝑦 _𝑌(𝑦)𝐸[𝑋|𝑌 = 𝑦].

This is the total expectation theorem.

 Let 𝐴₁, 𝐴₂, . . . , 𝐴_𝑛 be disjoint events that form a partition of the sample space, and

assume that 𝑃(𝐴_𝑖) > 0 for all 𝑖. Then,

𝐸[𝑋] = 𝑛 𝑃(𝐴_𝑖)𝐸[𝑋|𝐴_𝑖]

𝑖=1 .

1.3 Conditioning on a event (continuous RV)

The conditional PDF of a continuous random variable 𝑋, conditioned on a particular

event 𝐴with 𝑃(𝐴) > 0, is a function 𝑓_𝑋|𝐴 that satisfies

𝑃(𝑋 ∈ 𝐵|𝐴) = 𝑓_𝐵 𝑋|𝐴(𝑥)𝑑𝑥,

for any subset 𝐵of the real line. It is the same as an ordinary PDF, except that it now refers to a

(6)

An important special case arises when we condition on 𝑋 belonging to a subset 𝐴 of the

real line, with 𝑃(𝑋 ∈ 𝐴) > 0. We then have

𝑃(𝑋 ∈ 𝐵|𝑋 ∈ 𝐴) = 𝑃(𝑋∈𝐵 𝑎𝑛𝑑 𝑋∈𝐴)_{𝑃(𝑋∈𝐴)} = 𝐴∩𝐵𝑓𝑋(𝑥)𝑑𝑥

𝑃(𝑋∈𝐴) .

This formula must agree with the earlier one, and therefore,

𝑓_𝑋|𝐴(𝑥|𝐴) =

𝑓𝑋(𝑥)

𝑃(𝑋 ∈ 𝐴) 𝑖𝑓 𝑥 ∈ 𝐴, 0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒.

As in the discrete case, the conditional PDF is zero outside the conditioning set. Within

the conditioning set, the conditional PDF has exactly the same shape as the unconditional one,

except that it is scaled by the constant factor 1/𝑃(𝑋 ∈ 𝐴). This normalization ensures that 𝑓_𝑋|𝐴

integrates to 1, which makes it a legitimate PDF; see Fig. 3.

Figure 3: The unconditional PDF 𝑓_𝑋 and the conditional PDF 𝑓_𝑋|𝐴, where A is the interval [𝑎, 𝑏].

Note that within the conditioning event 𝐴, 𝑓_𝑋|𝐴 retains the same shape as 𝑓_𝑋, except that it is

scaled along the vertical axis.

Conditional PDF and Expectation Given an Event

 The corresponding conditional expectation is defined by

𝐸[𝑋|𝐴] = 𝑥𝑓_−∞∞ 𝑋|𝐴(𝑥) 𝑑𝑥.

 The expected value rule remains valid:

(7)

 If 𝐴1, 𝐴2, . . . , 𝐴𝑛 are disjoint events with 𝑃(𝐴𝑖) > 0 for each 𝑖, that form a partition

of the sample space, then

𝑓_𝑋(𝑥) = 𝑃(𝐴_𝑖)𝑓_𝑋|𝐴_𝑖(𝑥)

𝑛

𝑖=1

(a version of the total probability theorem), and

𝐸[𝑋] = 𝑃(𝐴𝑖)𝐸[𝑋|𝐴𝑖] 𝑛

𝑖=1

(the total expectation theorem). Similarly,

𝐸 𝑔(𝑋) = 𝑛𝑖=1𝑃(𝐴𝑖)𝐸 𝑔(𝑋)|𝐴𝑖 .

2 Basic probability laws of continuous random variables

2.1 Unifrom Random Variable

2.1.1 Continuous Uniform Random Variable

A gambler spins a wheel of fortune, continuously calibrated between 0 and 1, and

observes the resulting number. Assuming that all subintervals of [0,1] of the same length are equally likely, this experiment can be modeled in terms a random variable 𝑋 with PDF

𝑓𝑋(𝑥) = 𝑐 𝑖𝑓 0 ≤ 𝑥 ≤ 1,_{0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒,}

for some constant 𝑐. This constant can be determined by using the normalization property

1 = 𝑓∞ _𝑋(𝑥)𝑑𝑥

−∞

= 𝑐𝑑𝑥1

0

= 𝑐 𝑑𝑥1

0

(8)

so that 𝑐 = 1.

More generally, we can consider a random variable 𝑋 that takes values in an interval

[𝑎, 𝑏], and again assume that all subintervals of the same length are equally likely. We refer to

this type of random variable as uniform or uniformly distributed. Its PDF has the form

𝑓_𝑋(𝑥) = 𝑐 𝑖𝑓 𝑎 ≤ 𝑥 ≤ 𝑏, 0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒,

where 𝑐 is a constant. This is the continuous analog of the discrete uniform random variable. For

𝑓𝑋 to satisfy the normalization property, we must have (cf. Fig. 2)

1 = 𝑐𝑑𝑥𝑏

𝑎

= 𝑐 𝑑𝑥𝑏

𝑎

= 𝑐 𝑏 − 𝑎

so that 𝑐 =_𝑏−𝑎1 .

Note that the probability 𝑃(𝑋 ∈ 𝐼) that 𝑋 takes value in a set 𝐼 is

𝑃 𝑋 ∈ 𝐼 = _{[𝑎,𝑏]∩𝐼}_𝑏−𝑎1 𝑑𝑥= _𝑏−𝑎1 _{[𝑎,𝑏]∩𝐼}𝑑𝑥 =𝑙𝑒𝑛𝑔𝑡 𝑕 𝑜𝑓 [𝑎,𝑏]∩𝐼 _{𝑙𝑒𝑛𝑔𝑡 𝑕 𝑜𝑓 [𝑎,𝑏]} .

The uniform random variable bears a relation to the discrete uniform law, which involves

a sample space with a finite number of equally likely outcomes. The difference is that to obtain

the probability of various events, we must now calculate the “length” of various subsets of the

real line instead of counting the number of outcomes contained in various events.

The CDF is

𝐹_𝑋 𝑥 =

0 𝑓𝑜𝑟 𝑥 < 𝑎, 𝑥 − 𝑎

(9)

a) b)

Figure 4: Uniform Probability density function (a) and Cumulative distribution function (b)

Mean and Variance of the Uniform Random Variable

Consider the case of a uniform PDF over an interval [𝑎, 𝑏]. We have

𝐸 𝑋 = 𝑥𝑓_−∞∞ 𝑋 𝑥 𝑑𝑥= 𝑥 ∙_𝑎𝑏 _𝑏−𝑎1 𝑑𝑥= _𝑏−𝑎1 ∙1₂𝑥2 𝑏_𝑎 =_𝑏−𝑎1 ∙𝑏

2_−𝑎2

2 =

𝑎+𝑏 2 ,

as one expects based on the symmetry of the PDF around (𝑎 + 𝑏)/2.

To obtain the variance, we first calculate the second moment. We have

𝐸 𝑋2₌ 𝑥2 𝑏−𝑎𝑑𝑥 𝑏

𝑎 =

1 𝑏−𝑎 𝑥

2_𝑑𝑥 𝑏 𝑎 = 1 𝑏−𝑎∙ 1 3𝑥

3_𝑏

𝑎 =

𝑏3_−𝑎3

3(𝑏−𝑎)=

𝑎2_{+𝑎𝑏 +𝑏}2

3 .

Thus, the variance is obtained as

var 𝑋 = 𝐸 𝑋2_{− 𝐸 𝑋}2 ₌ 𝑎2+𝑎𝑏 +𝑏2

3 −

𝑎+𝑏 2

4 =

𝑏−𝑎 2

12 ,

after some calculation.

2.1.2 Discrete Uniform Random Variable

What is the mean and variance of the roll of a fair six-sided die? If we view the result of

(10)

𝑝_𝑋(𝑘) = 1/6 𝑖𝑓 𝑘 = 1, 2, 3, 4, 5, 6, 0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒.

Since the PMF is symmetric around 3.5, we conclude that 𝐸[𝑋] = 3.5. Regarding the variance, we have

var 𝑋 = 𝐸 𝑋2_{− 𝐸 𝑋}2 ₌1 6(1

2_{+ 2}2 _{+ 3}2_{+ 4}2 _{+ 5}2_{+ 6}2_{) − (3.5)}2_,

which yields var(𝑋) = 35/12.

The above random variable is a special case of a discrete uniformly distributed random

variable (or discrete uniform for short), which by definition, takes one out of a range of

contiguous integer values, with equal probability. More precisely, this random variable has a

PMF of the form

𝑝𝑋(𝑘) = 1

𝑏−𝑎+1 𝑖𝑓 𝑘 = 𝑎, 𝑎 + 1, … , 𝑏,

0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒, ,

where 𝑎and 𝑏are two integers with 𝑎 < 𝑏. The mean is

𝐸[𝑋] =𝑎+𝑏₂ ,

as can be seen by inspection, since the PMF is symmetric around (𝑎 + 𝑏)/2. To calculate the

variance of 𝑋, we first consider the simpler case where 𝑎 = 1 and 𝑏 = 𝑛. It can be verified by induction on 𝑛that

𝐸[𝑋2_{] =} 1

𝑛 𝑘

2 𝑛

𝑘=1 =1₆(𝑛 + 1)(2𝑛 + 1).

We leave the verification of this as an exercise for the reader. The variance can now be

obtained in terms of the first and second moments

var(𝑋) = 𝐸[𝑋2_{] − 𝐸[𝑋]}2 ₌1

6(𝑛 + 1)(2𝑛 + 1) − 1

4(𝑛 + 1) 2 ₌ 1

12(𝑛 + 1)(4𝑛 + 2 −

(11)

Figure 5: PMF of the discrete random variable that is uniformly distributed between two

integers 𝑎and 𝑏. Its mean and variance are

𝐸[𝑋] =𝑎+𝑏₂ , var[𝑋] = 𝑏−𝑎 (𝑏−𝑎+2)₁₂ .

For the case of general integers 𝑎 and 𝑏, we note that the uniformly distributed random variable over [𝑎, 𝑏] has the same variance as the uniformly distributed random variable over the

interval [1, 𝑏 − 𝑎 + 1], since these two random variables differ by the constant 𝑎 − 1. Therefore,

the desired variance is given by the above formula with 𝑛 = 𝑏 − 𝑎 + 1, which yields

var 𝑋 = 𝑏−𝑎+1 ₁₂ 2−1 = 𝑏−𝑎 (𝑏−𝑎+2)₁₂ .

a) b)

Figure 6: Discrete Uniform PDF (a) and CDF (b)

2.2 Exponential Random Variable

2.2.1 Exponential PDF and CDF

(12)

𝑓𝑋 𝑥 = 𝜆𝑒

−𝜆𝑥_{𝑖𝑓 𝑥 ≥ 0,}

0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒,

where 𝜆 is a positive parameter characterizing the PDF (see Fig. 4). This is a legitimate PDF because

𝑓_−∞∞ 𝑋(𝑥)𝑑𝑥 = 𝜆𝑒₀∞ −𝜆𝑥𝑑𝑥= −𝑒−𝜆𝑥|0∞ = 1.

Note that the probability that 𝑋 exceeds a certain value falls exponentially. Indeed, for any 𝑎 ≥ 0, we have

𝑃(𝑋 ≥ 𝑎) = 𝜆𝑒∞ −𝜆𝑥_𝑑𝑥

𝑎 = −𝑒−𝜆𝑥|𝑎∞ = 𝑒−𝜆𝑎.

An exponential random variable can be a very good model for the amount of time until a

piece of equipment breaks down, until a light bulb burns out, or until an accident occurs. It will

play a major role in study of random processes.

Figure 7: The PDF λe−λx of an exponential random variable.

The mean and the variance can be calculated to be

𝐸[𝑋] =1_𝜆, var(𝑋) =_𝜆1₂.

These formulas can be verified by straightforward calculation. We have, using integration

by parts,

𝐸[𝑋] = 𝑥𝜆𝑒∞ −𝜆𝑥_𝑑𝑥

0 = (−𝑥𝑒−𝜆𝑥)|0∞ + 𝑒−𝜆𝑥𝑑𝑥

∞

0 = 0 −

𝑒−𝜆𝑥 𝜆 |0

∞ ₌1 𝜆.

(13)

𝐸[𝑋2_{] = 𝑥}∞ 2_𝜆𝑒−𝜆𝑥_𝑑𝑥

0 = (−𝑥2𝑒−𝜆𝑥)|0∞+ 2𝑥𝑒−𝜆𝑥𝑑𝑥

∞

0 = 0 +

2

𝜆𝐸[𝑋] = 2 𝜆2.

Finally, using the formula var(𝑋) = 𝐸[𝑋2] − 𝐸[𝑋] 2, we obtain

var 𝑋 = _𝜆2₂−_𝜆1₂ = _𝜆1₂.

The cumulative distribution function is given by:

𝐹𝑋 𝑥 = 1 − 𝑒

−𝜆𝑥_{𝑖𝑓 𝑥 ≥ 0,}

0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒.

Other moments:

Median: ln ⁡(2) 𝜆 .

Mode: 0. Skewness: 2.

Kurtosis: 6.

2.2.2 Memoryless property

An important property of the exponential distribution is that it is memoryless. This

means that if a random variable 𝑇 is exponentially distributed, its conditional probability obeys:

𝑃 𝑇 > 𝑠 + 𝑡 | 𝑇 > 𝑠 = 𝑃 𝑇 > 𝑡 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑠, 𝑡 ≥ 0.

This says that the conditional probability that we need to wait, for example, more than

another 10 seconds before the first arrival, given that the first arrival has not yet happened after

30 seconds, is no different from the initial probability that we need to wait more than 10 seconds

for the first arrival. This is often misunderstood by students taking courses on probability: the

fact tha 𝑃(𝑇 > 40 | 𝑇 > 30) = 𝑃(𝑇 > 10)t does not mean that the events 𝑇 > 40 and 𝑇 > 30

are independent. To summarize: "memorylessness" of the probability distribution of the waiting

time 𝑇 until the first arrival means

(14)

It does NOT MEAN

𝑊𝑅𝑂𝑁𝐺 𝑃(𝑇 > 40 | 𝑇 > 30) = 𝑃(𝑇 > 40).

(That would be independence. These two events are NOT independent.)

The exponential distributions and the geometric distributions are the only memoryless

probability distributions.

a) b)

Figure 8: Exponential Probability density function (a) and Cumulative distribution function (b)

2.2.3 Occurrence and applications

The exponential distribution occurs naturally when describing the lengths of the

inter-arrival times in a homogeneous Poisson process.

The exponential distribution may be viewed as a continuous counterpart of the geometric

distribution, which describes the number of Bernoulli trials necessary for a discrete process to

change state. In contrast, the exponential distribution describes the time for a continuous process

to change state.

In real-world scenarios, the assumption of a constant rate (or probability per unit time) is

rarely satisfied. For example, the rate of incoming phone calls differs according to the time of

day. But if we focus on a time interval during which the rate is roughly constant, such as from 2

to 4 p.m. during work days, the exponential distribution can be used as a good approximate

(15)

In queuing theory, the service times of agents in a system (e.g. how long it takes for a

bank teller etc. to serve a customer) are often modeled as exponentially distributed variables.

Reliability theory and reliability engineering also make extensive use of the exponential

distribution. Because of the memoryless property of this distribution, it is well-suited to model

the constant hazard rate portion of the bathtub curve used in reliability theory. It is also very

convenient because it is so easy to add failure rates in a reliability model. The exponential

distribution is however not appropriate to model the overall lifetime of organisms or technical

devices, because the "failure rates" here are not constant: more failures occur for very “young”

and for very “old” systems.

2.3 Normal random variables

A continuous random variable 𝑋is said to be normal or Gaussian if it has a PDF of the form (see Fig. 9)

𝑓_𝑋(𝑥) = 1

2𝜋𝜎𝑒

−(𝑥−𝜇)2/2𝜎2_,

where 𝜇and 𝜎are two scalar parameters characterizing the PDF, with 𝜎assumed nonnegative. It can be verified that the normalization property

1

2𝜋𝜎 𝑒

−(𝑥−𝜇)2/2𝜎2_𝑑𝑥 ∞

−∞

= 1

holds.

Figure 9: A normal PDF and CDF, with 𝜇 = 1 and 𝜎2 = 1. We observe that the PDF is

symmetric around its mean 𝜇, and has a characteristic bell-shape. As 𝑥gets further from 𝜇, the

(16)

The mean and the variance can be calculated to be

𝐸[𝑋] = 𝜇, var(𝑋) = 𝜎2.

To see this, note that the PDF is symmetric around 𝜇, so its mean must be 𝜇.

Furthermore, the variance is given by

var(𝑋) =_2𝜋𝜎1 (𝑥 − 𝜇)2𝑒−(𝑥−𝜇)2_/2𝜎2

𝑑𝑥

∞

−∞ .

Using the change of variables 𝑦 = (𝑥 − 𝜇)/𝜎and integration by parts, we have

var 𝑋 = 𝜎2

2𝜋 𝑦

2_𝑒−𝑦 2₂_𝑑𝑦 ∞

−∞ = var 𝑋 =

𝜎2

2𝜋 −𝑦𝑒

−𝑦 2₂ ∞

−∞ +

𝜎2

2𝜋 𝑒

−𝑦 2₂_𝑑𝑦 ∞

−∞ =

𝜎2

2𝜋 𝑒

−𝑦 2₂_𝑑𝑦 ∞

−∞ = 𝜎2.

The last equality above is obtained by using the fact

1

2𝜋 𝑒

−𝑦 2₂_𝑑𝑦 ∞

−∞ = 1,

which is just the normalization property of the normal PDF for the case where 𝜇 = 0 and 𝜎 = 1. The normal random variable has several special properties.

Normality is Preserved by Linear Transformations

If 𝑋 is a normal random variable with mean 𝜇 and variance 𝜎2, and if 𝑎, 𝑏 are scalars, then the random variable

𝑌 = 𝑎𝑋 + 𝑏

is also normal, with mean and variance

𝐸[𝑌] = 𝑎𝜇 + 𝑏, var(𝑌) = 𝑎2𝜎2.

(17)

A normal random variable 𝑌 with zero mean (𝜇 = 0) and unit variance (𝜎2 = 1) is said

to be a standard normal. Its CDF is denoted by Φ,

Φ(𝑦) = 𝑃(𝑌 ≤ 𝑦) = 𝑃(𝑌 < 𝑦) = 1

2𝜋 𝑒

−𝑡2₂_𝑑𝑡 𝑦

−∞ .

It is recorded in a table, and is a very useful tool for calculating various probabilities

involving normal random variables; see also Fig. 10.

Note that the table only provides the values of Φ(𝑦) for 𝑦 ≥ 0, because the omitted values can be found using the symmetry of the PDF. For example, if 𝑌 is a standard normal

random variable, we have

Φ(−0.5) = 𝑃(𝑌 ≤ −0.5) = 𝑃(𝑌 ≥ 0.5) = 1 − 𝑃(𝑌 < 0.5) = 1 − Φ(0.5) = 1 − 0.6915 = 0.3085.

Let 𝑋be a normal random variable with mean 𝜇and variance 𝜎2. We “standardize” 𝑋by defining a new random variable 𝑌given by

𝑌 =𝑋−𝜇_𝜎 .

Since 𝑌is a linear transformation of 𝑋, it is normal. Furthermore,

𝐸[𝑌] =𝐸[𝑋]−𝜇_𝜎 = 0, var(𝑌) =var (𝑋)_𝜎₂ = 1.

Thus, 𝑌 is a standard normal random variable. This fact allows us to calculate the

(18)

Figure 10: The PDF 𝑓_𝑌(𝑦) = 1 2𝜋𝑒

−𝑦2_/2

of the standard normal random variable. Its

corresponding CDF, which is denoted by Φ(𝑦), is recorded in a table.

Generalizing the approach, we have the following procedure.

CDF Calculation of the Normal Random Variable

The CDF of a normal random variable 𝑋 with mean 𝜇 and variance 𝜎2 is obtained using the standard normal table as

𝑃(𝑋 ≤ 𝑥) = 𝑃 𝑋−𝜇_𝜎 ≤𝑥−𝜇_𝜎 = 𝑃 𝑌 ≤ 𝑥−𝜇_𝜎 = Φ 𝑥−𝜇_𝜎 ,

where 𝑌is a standard normal random variable.

The normal random variable is often used in signal processing and communications

engineering to model noise and unpredictable distortions of signals.

The normal random variable plays an important role in a broad range of probabilistic

models. The main reason is that, generally speaking, it models well the additive effect of many

independent factors, in a variety of engineering, physical, and statistical contexts.

Mathematically, the key fact is that the sum of a large number of independent and identically

distributed (not necessarily normal) random variables has an approximately normal CDF,

regardless of the CDF of the individual random variables. This property is captured in the

celebrated central limit theorem.

The standard normal CDF can be expressed in terms of a special function called the error

function, as

Φ(𝑦) =1₂ 1 + erf 𝑦

2 .

Probability Content from

−∞

to

𝒁

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

(19)

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545

1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633

1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706

1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767

2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817

2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857

2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890

2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916

2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936

2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952

2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964

2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974

2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981

2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986

3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990

Examples

A professor's exam scores are approximately distributed normally with mean 80 and

standard deviation 5.

What is the probability that a student scores an 82 or less?

𝑃(𝑋 ≤ 82) = 𝑃(𝑍 ≤ (82 − 80)/5) = 𝑃(𝑍 ≤ 0.40) = 0.6554.

(20)

𝑃(𝑋 ≥ 90) = 𝑃(𝑍 ≥ (90 − 80)/5) = 𝑃(𝑍 ≥ 2.00) = 1 − 𝑃(𝑍 ≤ 2.00) = 1 − 0.9772 = 0.0228.

What is the probability that a student scores a 74 or less?

𝑃(𝑋 ≤ 74) = 𝑃(𝑍 ≤ (74 − 80)/5) = 𝑃(𝑍 ≤ −1.2) = 0.1151.

If your table does not have negatives, use 𝑃(𝑍 ≤ −1.2) = 𝑃(𝑍 ≥ 1.2) = 1 − 0.8849 =

0.1151.

What is the probability that a student scores between 78 and 88?

𝑃(78 ≤ 𝑋 ≤ 88) = 𝑃((78 − 80)/5 ≤ 𝑍 ≤ (88 − 80)/5) = 𝑃(−0.4 ≤ 𝑍 ≤ 1.6) = 𝑃(𝑍 ≤ 1.6) − 𝑃(𝑍 ≤ −0.4) = 0.9452 − 0.3446 = 0.6006.

What is the probability that an average of three scores is 82 or less?

𝑃(𝑋 ≤ 82) = 𝑃(𝑍 ≤ (82 − 80)/(5/ 3)) = 𝑃(𝑍 ≤ 0.69) = 0.7549.

Order Raw moment Central moment Cumulant

1 0

2

3 0 0

4 0

5 0 0

(21)

7 0 0

8 0

About 68% of values drawn from a normal distribution are within one standard deviation

σ > 0 away from the mean μ; about 95% of the values are within two standard deviations and

about 99.7% lie within three standard deviations. This is known as the 68-95-99.7 rule, or the

empirical rule, or the 3-sigma rule.

Figure 4 Dark blue is less than one standard deviation from the mean. For the normal

distribution, this accounts for about 68% of the set (dark blue), while two standard deviations

from the mean (medium and dark blue) account for about 95%, and three standard deviations

(light, medium, and dark blue) account for about 99.7%.

In statistics, the 68-95-99.7 rule, or three-sigma rule, or empirical rule, states that for a

normal distribution, nearly all values lie within 3 standard deviations of the mean.

About 68% of the values lie within 1 standard deviation of the mean (or between the

mean minus 1 times the standard deviation, and the mean plus 1 times the standard deviation). In

statistical notation, this is represented as: 𝜇 ± 𝜎.

About 95% of the values lie within 2 standard deviations of the mean (or between the

mean minus 2 times the standard deviation, and the mean plus 2 times the standard deviation).

The statistical notation for this is: 𝜇 ± 2𝜎.

Nearly all (99.7%) of the values lie within 3 standard deviations of the mean (or between

the mean minus 3 times the standard deviation and the mean plus 3 times the standard deviation).

(22)

This rule is often used to quickly get a rough estimate of something's probability, given

its standard deviation, if the population is assumed normal, thus also as a simple test for outliers

(if the population is assumed normal), and as a normality test (if the population is potentially not