• No results found

Lecture9

N/A
N/A
Protected

Academic year: 2020

Share "Lecture9"

Copied!
22
0
0

Loading.... (view fulltext now)

Full text

(1)

Lecture 9

Conditional PMF and PDF. Basic probability laws of

continuous random variables.

Plan of the lecture:

1. Conditioning

1.1 Conditioning a Random Variable on an Event (discrete RV)

1.2 Conditioning one Random Variable on Another

1.3 Conditioning on a event (continuous RV)

2. Basic probability laws of continuous random variables

2.1Unifrom Random Variable

2.1.1 Continuous Uniform Random Variable

2.1.2 Discrete Uniform Random Variable

2.2Exponential Random Variable

2.2.1 Exponential PDF and CDF

2.2.2 Memoryless property

2.2.3 Occurrence and applications

2.3Normal random variables

(2)

1 Conditionig

If we have a probabilistic model and we are also told that a certain event 𝐴 has occurred,

we can capture this knowledge by employing the conditional instead of the original

(unconditional) probabilities. As discussed earlier, conditional probabilities are like ordinary

probabilities (satisfy the three axioms) except that they refer to a new universe in which event A

is known to have occurred. In the same spirit, we can talk about conditional PMFs which provide

the probabilities of the possible values of a random variable, conditioned on the occurrence of

some event.

1.1 Conditioning a Random Variable on an Event (discrete RV)

The conditional PMF of a random variable 𝑋, conditioned on a particular event 𝐴 with

𝑃(𝐴) > 0, is defined by

𝑝𝑋|𝐴(𝑥) = 𝑃(𝑋 = 𝑥|𝐴) = 𝑃 {𝑋=𝑥}∩𝐴 𝑃(𝐴) .

Note that the events {𝑋 = 𝑥} ∩ 𝐴 are disjoint for different values of 𝑥, their union is 𝐴,

and, therefore,

𝑃(𝐴) = 𝑃 {𝑋 = 𝑥} ∩ 𝐴 𝑥 .

Combining the above two formulas, we see that

𝑝𝑥 𝑋|𝐴(𝑥)= 1,

so 𝑝𝑋|𝐴is a legitimate PMF.

As an example, let 𝑋 be the roll of a die and let 𝐴 be the event that the roll is an even number. Then, by applying the preceding formula, we obtain

𝑝𝑋|𝐴 𝑥 = 𝑃 𝑋 = 𝑥 𝑟𝑜𝑙𝑙 𝑖𝑠 𝑒𝑣𝑒𝑛) =

𝑃 𝑋 = 𝑥 𝑎𝑛𝑑 𝑋 𝑖𝑠 𝑒𝑣𝑒𝑛

(3)

The conditional PMF is calculated similar to its unconditional counterpart: to obtain

𝑝𝑋|𝐴 𝑥 , we add the probabilities of the outcomes that give rise to 𝑋 = 𝑥 and belong to the

conditioning event 𝐴, and then normalize by dividing with 𝑃(𝐴) (see Fig. 1).

Figure 1: Visualization and calculation of the conditional PMF 𝑝𝑋|𝐴 𝑥 . For each 𝑥, we add the

probabilities of the outcomes in the intersection {𝑋 = 𝑥} ∩ 𝐴and normalize by diving with

𝑃(𝐴).

1.2 Conditioning one Random Variable on Another

Let 𝑋 and 𝑌 be two random variables associated with the same experiment. If we know

that the experimental value of 𝑌 is some particular 𝑦 (with 𝑝𝑌(𝑦) > 0), this provides partial knowledge about the value of 𝑋. This knowledge is captured by the conditional PMF 𝑝𝑋|𝑌 of 𝑋

given 𝑌, which is defined by specializing the definition of 𝑝𝑋|𝐴 to events 𝐴of the form {𝑌 = 𝑦}:

𝑝𝑋|𝑌(𝑥|𝑦) = 𝑃(𝑋 = 𝑥|𝑌 = 𝑦).

Using the definition of conditional probabilities, we have

𝑝𝑋|𝑌 𝑥 𝑦 =𝑃 𝑋=𝑥,𝑌=𝑦 𝑃 𝑌=𝑦 =𝑝𝑋 ,𝑌𝑝 (𝑥,𝑦)

𝑌(𝑦) .

Let us fix some 𝑦, with 𝑝𝑌(𝑦) > 0 and consider 𝑝𝑋|𝑌 𝑥 𝑦 as a function of 𝑥. This

function is a valid PMF for 𝑋: it assigns nonnegative values to each possible 𝑥, and these values

add to 1. Furthermore, this function of 𝑥, has the same shape as 𝑝𝑋,𝑌(𝑥, 𝑦) except that it is

normalized by dividing with 𝑝𝑌(𝑦), which enforces the normalization property

(4)

Figure 2 provides a visualization of the conditional PMF.

Figure 2: Visualization of the conditional PMF 𝑝𝑋|𝑌 𝑥 𝑦 . For each 𝑦, we view the joint PMF

along the slice 𝑌 = 𝑦and renormalize so that 𝑝𝑥 𝑋|𝑌 𝑥 𝑦 = 1.

The conditional PMF is often convenient for the calculation of the joint PMF, using a

sequential approach and the formula

𝑝𝑋,𝑌 𝑥, 𝑦 = 𝑝𝑌(𝑦)𝑝𝑋|𝑌 𝑥 𝑦 ,

or its counterpart

𝑝𝑋,𝑌 𝑥, 𝑦 = 𝑝𝑋(𝑥)𝑝𝑌|𝑋 𝑦 𝑥 .

This method is entirely similar to the use of the multiplication rule.

Conditional Expectation

A conditional PMF can be thought of as an ordinary PMF over a new universe

determined by the conditioning event. In the same spirit, a conditional expectation is the same as

an ordinary expectation, except that it refers to the new universe, and all probabilities and PMFs

are replaced by their conditional counterparts. We list the main definitions and relevant facts

(5)

Summary of Facts About Conditional Expectations

Let 𝑋and 𝑌be random variables associated with the same experiment.

 The conditional expectation of 𝑋given an event 𝐴with 𝑃(𝐴) > 0, is defined by

𝐸[𝑋|𝐴] = 𝑥𝑝𝑥 𝑋|𝐴(𝑥|𝐴).

For a function 𝑔(𝑋), it is given by

𝐸 𝑔(𝑋)|𝐴 = 𝑔(𝑥)𝑝𝑥 𝑋|𝐴(𝑥|𝐴).

 The conditional expectation of 𝑋given a value 𝑦of 𝑌is defined by

𝐸[𝑋|𝑌 = 𝑦] = 𝑥𝑝𝑥 𝑋|𝑌(𝑥|𝑦).

 We have

𝐸[𝑋] = 𝑝𝑦 𝑌(𝑦)𝐸[𝑋|𝑌 = 𝑦].

This is the total expectation theorem.

 Let 𝐴1, 𝐴2, . . . , 𝐴𝑛 be disjoint events that form a partition of the sample space, and

assume that 𝑃(𝐴𝑖) > 0 for all 𝑖. Then,

𝐸[𝑋] = 𝑛 𝑃(𝐴𝑖)𝐸[𝑋|𝐴𝑖]

𝑖=1 .

1.3 Conditioning on a event (continuous RV)

The conditional PDF of a continuous random variable 𝑋, conditioned on a particular

event 𝐴with 𝑃(𝐴) > 0, is a function 𝑓𝑋|𝐴 that satisfies

𝑃(𝑋 ∈ 𝐵|𝐴) = 𝑓𝐵 𝑋|𝐴(𝑥)𝑑𝑥,

for any subset 𝐵of the real line. It is the same as an ordinary PDF, except that it now refers to a

(6)

An important special case arises when we condition on 𝑋 belonging to a subset 𝐴 of the

real line, with 𝑃(𝑋 ∈ 𝐴) > 0. We then have

𝑃(𝑋 ∈ 𝐵|𝑋 ∈ 𝐴) = 𝑃(𝑋∈𝐵 𝑎𝑛𝑑 𝑋∈𝐴)𝑃(𝑋∈𝐴) = 𝐴∩𝐵𝑓𝑋(𝑥)𝑑𝑥

𝑃(𝑋∈𝐴) .

This formula must agree with the earlier one, and therefore,

𝑓𝑋|𝐴(𝑥|𝐴) =

𝑓𝑋(𝑥)

𝑃(𝑋 ∈ 𝐴) 𝑖𝑓 𝑥 ∈ 𝐴, 0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒.

As in the discrete case, the conditional PDF is zero outside the conditioning set. Within

the conditioning set, the conditional PDF has exactly the same shape as the unconditional one,

except that it is scaled by the constant factor 1/𝑃(𝑋 ∈ 𝐴). This normalization ensures that 𝑓𝑋|𝐴

integrates to 1, which makes it a legitimate PDF; see Fig. 3.

Figure 3: The unconditional PDF 𝑓𝑋 and the conditional PDF 𝑓𝑋|𝐴, where A is the interval [𝑎, 𝑏].

Note that within the conditioning event 𝐴, 𝑓𝑋|𝐴 retains the same shape as 𝑓𝑋, except that it is

scaled along the vertical axis.

Conditional PDF and Expectation Given an Event

 The corresponding conditional expectation is defined by

𝐸[𝑋|𝐴] = 𝑥𝑓−∞∞ 𝑋|𝐴(𝑥) 𝑑𝑥.

 The expected value rule remains valid:

(7)

 If 𝐴1, 𝐴2, . . . , 𝐴𝑛 are disjoint events with 𝑃(𝐴𝑖) > 0 for each 𝑖, that form a partition

of the sample space, then

𝑓𝑋(𝑥) = 𝑃(𝐴𝑖)𝑓𝑋|𝐴𝑖(𝑥)

𝑛

𝑖=1

(a version of the total probability theorem), and

𝐸[𝑋] = 𝑃(𝐴𝑖)𝐸[𝑋|𝐴𝑖] 𝑛

𝑖=1

(the total expectation theorem). Similarly,

𝐸 𝑔(𝑋) = 𝑛𝑖=1𝑃(𝐴𝑖)𝐸 𝑔(𝑋)|𝐴𝑖 .

2 Basic probability laws of continuous random variables

2.1 Unifrom Random Variable

2.1.1 Continuous Uniform Random Variable

A gambler spins a wheel of fortune, continuously calibrated between 0 and 1, and

observes the resulting number. Assuming that all subintervals of [0,1] of the same length are equally likely, this experiment can be modeled in terms a random variable 𝑋 with PDF

𝑓𝑋(𝑥) = 𝑐 𝑖𝑓 0 ≤ 𝑥 ≤ 1,0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒,

for some constant 𝑐. This constant can be determined by using the normalization property

1 = 𝑓∞ 𝑋(𝑥)𝑑𝑥

−∞

= 𝑐𝑑𝑥1

0

= 𝑐 𝑑𝑥1

0

(8)

so that 𝑐 = 1.

More generally, we can consider a random variable 𝑋 that takes values in an interval

[𝑎, 𝑏], and again assume that all subintervals of the same length are equally likely. We refer to

this type of random variable as uniform or uniformly distributed. Its PDF has the form

𝑓𝑋(𝑥) = 𝑐 𝑖𝑓 𝑎 ≤ 𝑥 ≤ 𝑏, 0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒,

where 𝑐 is a constant. This is the continuous analog of the discrete uniform random variable. For

𝑓𝑋 to satisfy the normalization property, we must have (cf. Fig. 2)

1 = 𝑐𝑑𝑥𝑏

𝑎

= 𝑐 𝑑𝑥𝑏

𝑎

= 𝑐 𝑏 − 𝑎

so that 𝑐 =𝑏−𝑎1 .

Note that the probability 𝑃(𝑋 ∈ 𝐼) that 𝑋 takes value in a set 𝐼 is

𝑃 𝑋 ∈ 𝐼 = [𝑎,𝑏]∩𝐼𝑏−𝑎1 𝑑𝑥= 𝑏−𝑎1 [𝑎,𝑏]∩𝐼𝑑𝑥 =𝑙𝑒𝑛𝑔𝑡 𝑕 𝑜𝑓 [𝑎,𝑏]∩𝐼 𝑙𝑒𝑛𝑔𝑡 𝑕 𝑜𝑓 [𝑎,𝑏] .

The uniform random variable bears a relation to the discrete uniform law, which involves

a sample space with a finite number of equally likely outcomes. The difference is that to obtain

the probability of various events, we must now calculate the “length” of various subsets of the

real line instead of counting the number of outcomes contained in various events.

The CDF is

𝐹𝑋 𝑥 =

0 𝑓𝑜𝑟 𝑥 < 𝑎, 𝑥 − 𝑎

(9)

a) b)

Figure 4: Uniform Probability density function (a) and Cumulative distribution function (b)

Mean and Variance of the Uniform Random Variable

Consider the case of a uniform PDF over an interval [𝑎, 𝑏]. We have

𝐸 𝑋 = 𝑥𝑓−∞∞ 𝑋 𝑥 𝑑𝑥= 𝑥 ∙𝑎𝑏 𝑏−𝑎1 𝑑𝑥= 𝑏−𝑎1 ∙12𝑥2 𝑏𝑎 =𝑏−𝑎1 ∙𝑏

2−𝑎2

2 =

𝑎+𝑏 2 ,

as one expects based on the symmetry of the PDF around (𝑎 + 𝑏)/2.

To obtain the variance, we first calculate the second moment. We have

𝐸 𝑋2 = 𝑥2 𝑏−𝑎𝑑𝑥 𝑏

𝑎 =

1 𝑏−𝑎 𝑥

2𝑑𝑥 𝑏 𝑎 = 1 𝑏−𝑎∙ 1 3𝑥

3 𝑏

𝑎 =

𝑏3−𝑎3

3(𝑏−𝑎)=

𝑎2+𝑎𝑏 +𝑏2

3 .

Thus, the variance is obtained as

var 𝑋 = 𝐸 𝑋2 − 𝐸 𝑋 2 = 𝑎2+𝑎𝑏 +𝑏2

3 −

𝑎+𝑏 2

4 =

𝑏−𝑎 2

12 ,

after some calculation.

2.1.2 Discrete Uniform Random Variable

What is the mean and variance of the roll of a fair six-sided die? If we view the result of

(10)

𝑝𝑋(𝑘) = 1/6 𝑖𝑓 𝑘 = 1, 2, 3, 4, 5, 6, 0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒.

Since the PMF is symmetric around 3.5, we conclude that 𝐸[𝑋] = 3.5. Regarding the variance, we have

var 𝑋 = 𝐸 𝑋2 − 𝐸 𝑋 2 =1 6(1

2+ 22 + 32+ 42 + 52+ 62) − (3.5)2,

which yields var(𝑋) = 35/12.

The above random variable is a special case of a discrete uniformly distributed random

variable (or discrete uniform for short), which by definition, takes one out of a range of

contiguous integer values, with equal probability. More precisely, this random variable has a

PMF of the form

𝑝𝑋(𝑘) = 1

𝑏−𝑎+1 𝑖𝑓 𝑘 = 𝑎, 𝑎 + 1, … , 𝑏,

0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒, ,

where 𝑎and 𝑏are two integers with 𝑎 < 𝑏. The mean is

𝐸[𝑋] =𝑎+𝑏2 ,

as can be seen by inspection, since the PMF is symmetric around (𝑎 + 𝑏)/2. To calculate the

variance of 𝑋, we first consider the simpler case where 𝑎 = 1 and 𝑏 = 𝑛. It can be verified by induction on 𝑛that

𝐸[𝑋2] = 1

𝑛 𝑘

2 𝑛

𝑘=1 =16(𝑛 + 1)(2𝑛 + 1).

We leave the verification of this as an exercise for the reader. The variance can now be

obtained in terms of the first and second moments

var(𝑋) = 𝐸[𝑋2] − 𝐸[𝑋] 2 =1

6(𝑛 + 1)(2𝑛 + 1) − 1

4(𝑛 + 1) 2 = 1

12(𝑛 + 1)(4𝑛 + 2 −

(11)

Figure 5: PMF of the discrete random variable that is uniformly distributed between two

integers 𝑎and 𝑏. Its mean and variance are

𝐸[𝑋] =𝑎+𝑏2 , var[𝑋] = 𝑏−𝑎 (𝑏−𝑎+2)12 .

For the case of general integers 𝑎 and 𝑏, we note that the uniformly distributed random variable over [𝑎, 𝑏] has the same variance as the uniformly distributed random variable over the

interval [1, 𝑏 − 𝑎 + 1], since these two random variables differ by the constant 𝑎 − 1. Therefore,

the desired variance is given by the above formula with 𝑛 = 𝑏 − 𝑎 + 1, which yields

var 𝑋 = 𝑏−𝑎+1 12 2−1 = 𝑏−𝑎 (𝑏−𝑎+2)12 .

a) b)

Figure 6: Discrete Uniform PDF (a) and CDF (b)

2.2 Exponential Random Variable

2.2.1 Exponential PDF and CDF

(12)

𝑓𝑋 𝑥 = 𝜆𝑒

−𝜆𝑥 𝑖𝑓 𝑥 ≥ 0,

0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒,

where 𝜆 is a positive parameter characterizing the PDF (see Fig. 4). This is a legitimate PDF because

𝑓−∞∞ 𝑋(𝑥)𝑑𝑥 = 𝜆𝑒0∞ −𝜆𝑥𝑑𝑥= −𝑒−𝜆𝑥|0∞ = 1.

Note that the probability that 𝑋 exceeds a certain value falls exponentially. Indeed, for any 𝑎 ≥ 0, we have

𝑃(𝑋 ≥ 𝑎) = 𝜆𝑒∞ −𝜆𝑥𝑑𝑥

𝑎 = −𝑒−𝜆𝑥|𝑎∞ = 𝑒−𝜆𝑎.

An exponential random variable can be a very good model for the amount of time until a

piece of equipment breaks down, until a light bulb burns out, or until an accident occurs. It will

play a major role in study of random processes.

Figure 7: The PDF λe−λx of an exponential random variable.

The mean and the variance can be calculated to be

𝐸[𝑋] =1𝜆, var(𝑋) =𝜆12.

These formulas can be verified by straightforward calculation. We have, using integration

by parts,

𝐸[𝑋] = 𝑥𝜆𝑒∞ −𝜆𝑥𝑑𝑥

0 = (−𝑥𝑒−𝜆𝑥)|0∞ + 𝑒−𝜆𝑥𝑑𝑥

0 = 0 −

𝑒−𝜆𝑥 𝜆 |0

=1 𝜆.

(13)

𝐸[𝑋2] = 𝑥∞ 2𝜆𝑒−𝜆𝑥𝑑𝑥

0 = (−𝑥2𝑒−𝜆𝑥)|0∞+ 2𝑥𝑒−𝜆𝑥𝑑𝑥

0 = 0 +

2

𝜆𝐸[𝑋] = 2 𝜆2.

Finally, using the formula var(𝑋) = 𝐸[𝑋2] − 𝐸[𝑋] 2, we obtain

var 𝑋 = 𝜆22𝜆12 = 𝜆12.

The cumulative distribution function is given by:

𝐹𝑋 𝑥 = 1 − 𝑒

−𝜆𝑥 𝑖𝑓 𝑥 ≥ 0,

0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒.

Other moments:

Median: ln ⁡(2) 𝜆 .

Mode: 0. Skewness: 2.

Kurtosis: 6.

2.2.2 Memoryless property

An important property of the exponential distribution is that it is memoryless. This

means that if a random variable 𝑇 is exponentially distributed, its conditional probability obeys:

𝑃 𝑇 > 𝑠 + 𝑡 | 𝑇 > 𝑠 = 𝑃 𝑇 > 𝑡 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑠, 𝑡 ≥ 0.

This says that the conditional probability that we need to wait, for example, more than

another 10 seconds before the first arrival, given that the first arrival has not yet happened after

30 seconds, is no different from the initial probability that we need to wait more than 10 seconds

for the first arrival. This is often misunderstood by students taking courses on probability: the

fact tha 𝑃(𝑇 > 40 | 𝑇 > 30) = 𝑃(𝑇 > 10)t does not mean that the events 𝑇 > 40 and 𝑇 > 30

are independent. To summarize: "memorylessness" of the probability distribution of the waiting

time 𝑇 until the first arrival means

(14)

It does NOT MEAN

𝑊𝑅𝑂𝑁𝐺 𝑃(𝑇 > 40 | 𝑇 > 30) = 𝑃(𝑇 > 40).

(That would be independence. These two events are NOT independent.)

The exponential distributions and the geometric distributions are the only memoryless

probability distributions.

a) b)

Figure 8: Exponential Probability density function (a) and Cumulative distribution function (b)

2.2.3 Occurrence and applications

The exponential distribution occurs naturally when describing the lengths of the

inter-arrival times in a homogeneous Poisson process.

The exponential distribution may be viewed as a continuous counterpart of the geometric

distribution, which describes the number of Bernoulli trials necessary for a discrete process to

change state. In contrast, the exponential distribution describes the time for a continuous process

to change state.

In real-world scenarios, the assumption of a constant rate (or probability per unit time) is

rarely satisfied. For example, the rate of incoming phone calls differs according to the time of

day. But if we focus on a time interval during which the rate is roughly constant, such as from 2

to 4 p.m. during work days, the exponential distribution can be used as a good approximate

(15)

In queuing theory, the service times of agents in a system (e.g. how long it takes for a

bank teller etc. to serve a customer) are often modeled as exponentially distributed variables.

Reliability theory and reliability engineering also make extensive use of the exponential

distribution. Because of the memoryless property of this distribution, it is well-suited to model

the constant hazard rate portion of the bathtub curve used in reliability theory. It is also very

convenient because it is so easy to add failure rates in a reliability model. The exponential

distribution is however not appropriate to model the overall lifetime of organisms or technical

devices, because the "failure rates" here are not constant: more failures occur for very “young”

and for very “old” systems.

2.3 Normal random variables

A continuous random variable 𝑋is said to be normal or Gaussian if it has a PDF of the form (see Fig. 9)

𝑓𝑋(𝑥) = 1

2𝜋𝜎𝑒

−(𝑥−𝜇)2/2𝜎2,

where 𝜇and 𝜎are two scalar parameters characterizing the PDF, with 𝜎assumed nonnegative. It can be verified that the normalization property

1

2𝜋𝜎 𝑒

−(𝑥−𝜇)2/2𝜎2𝑑𝑥

−∞

= 1

holds.

Figure 9: A normal PDF and CDF, with 𝜇 = 1 and 𝜎2 = 1. We observe that the PDF is

symmetric around its mean 𝜇, and has a characteristic bell-shape. As 𝑥gets further from 𝜇, the

(16)

The mean and the variance can be calculated to be

𝐸[𝑋] = 𝜇, var(𝑋) = 𝜎2.

To see this, note that the PDF is symmetric around 𝜇, so its mean must be 𝜇.

Furthermore, the variance is given by

var(𝑋) = 2𝜋𝜎1 (𝑥 − 𝜇)2𝑒−(𝑥−𝜇)2/2𝜎2

𝑑𝑥

−∞ .

Using the change of variables 𝑦 = (𝑥 − 𝜇)/𝜎and integration by parts, we have

var 𝑋 = 𝜎2

2𝜋 𝑦

2𝑒−𝑦 22𝑑𝑦

−∞ = var 𝑋 =

𝜎2

2𝜋 −𝑦𝑒

−𝑦 22

−∞ +

𝜎2

2𝜋 𝑒

−𝑦 22𝑑𝑦

−∞ =

𝜎2

2𝜋 𝑒

−𝑦 22𝑑𝑦

−∞ = 𝜎2.

The last equality above is obtained by using the fact

1

2𝜋 𝑒

−𝑦 22𝑑𝑦

−∞ = 1,

which is just the normalization property of the normal PDF for the case where 𝜇 = 0 and 𝜎 = 1. The normal random variable has several special properties.

Normality is Preserved by Linear Transformations

If 𝑋 is a normal random variable with mean 𝜇 and variance 𝜎2, and if 𝑎, 𝑏 are scalars, then the random variable

𝑌 = 𝑎𝑋 + 𝑏

is also normal, with mean and variance

𝐸[𝑌] = 𝑎𝜇 + 𝑏, var(𝑌) = 𝑎2𝜎2.

(17)

A normal random variable 𝑌 with zero mean (𝜇 = 0) and unit variance (𝜎2 = 1) is said

to be a standard normal. Its CDF is denoted by Φ,

Φ(𝑦) = 𝑃(𝑌 ≤ 𝑦) = 𝑃(𝑌 < 𝑦) = 1

2𝜋 𝑒

−𝑡22𝑑𝑡 𝑦

−∞ .

It is recorded in a table, and is a very useful tool for calculating various probabilities

involving normal random variables; see also Fig. 10.

Note that the table only provides the values of Φ(𝑦) for 𝑦 ≥ 0, because the omitted values can be found using the symmetry of the PDF. For example, if 𝑌 is a standard normal

random variable, we have

Φ(−0.5) = 𝑃(𝑌 ≤ −0.5) = 𝑃(𝑌 ≥ 0.5) = 1 − 𝑃(𝑌 < 0.5) = 1 − Φ(0.5) = 1 − 0.6915 = 0.3085.

Let 𝑋be a normal random variable with mean 𝜇and variance 𝜎2. We “standardize” 𝑋by defining a new random variable 𝑌given by

𝑌 =𝑋−𝜇𝜎 .

Since 𝑌is a linear transformation of 𝑋, it is normal. Furthermore,

𝐸[𝑌] =𝐸[𝑋]−𝜇𝜎 = 0, var(𝑌) =var (𝑋)𝜎2 = 1.

Thus, 𝑌 is a standard normal random variable. This fact allows us to calculate the

(18)

Figure 10: The PDF 𝑓𝑌(𝑦) = 1 2𝜋𝑒

−𝑦2/2

of the standard normal random variable. Its

corresponding CDF, which is denoted by Φ(𝑦), is recorded in a table.

Generalizing the approach, we have the following procedure.

CDF Calculation of the Normal Random Variable

The CDF of a normal random variable 𝑋 with mean 𝜇 and variance 𝜎2 is obtained using the standard normal table as

𝑃(𝑋 ≤ 𝑥) = 𝑃 𝑋−𝜇𝜎 ≤𝑥−𝜇𝜎 = 𝑃 𝑌 ≤ 𝑥−𝜇𝜎 = Φ 𝑥−𝜇𝜎 ,

where 𝑌is a standard normal random variable.

The normal random variable is often used in signal processing and communications

engineering to model noise and unpredictable distortions of signals.

The normal random variable plays an important role in a broad range of probabilistic

models. The main reason is that, generally speaking, it models well the additive effect of many

independent factors, in a variety of engineering, physical, and statistical contexts.

Mathematically, the key fact is that the sum of a large number of independent and identically

distributed (not necessarily normal) random variables has an approximately normal CDF,

regardless of the CDF of the individual random variables. This property is captured in the

celebrated central limit theorem.

The standard normal CDF can be expressed in terms of a special function called the error

function, as

Φ(𝑦) =12 1 + erf 𝑦

2 .

Probability Content from

−∞

to

𝒁

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

(19)

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545

1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633

1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706

1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767

2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817

2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857

2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890

2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916

2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936

2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952

2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964

2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974

2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981

2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986

3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990

Examples

A professor's exam scores are approximately distributed normally with mean 80 and

standard deviation 5.

What is the probability that a student scores an 82 or less?

𝑃(𝑋 ≤ 82) = 𝑃(𝑍 ≤ (82 − 80)/5) = 𝑃(𝑍 ≤ 0.40) = 0.6554.

(20)

𝑃(𝑋 ≥ 90) = 𝑃(𝑍 ≥ (90 − 80)/5) = 𝑃(𝑍 ≥ 2.00) = 1 − 𝑃(𝑍 ≤ 2.00) = 1 − 0.9772 = 0.0228.

What is the probability that a student scores a 74 or less?

𝑃(𝑋 ≤ 74) = 𝑃(𝑍 ≤ (74 − 80)/5) = 𝑃(𝑍 ≤ −1.2) = 0.1151.

If your table does not have negatives, use 𝑃(𝑍 ≤ −1.2) = 𝑃(𝑍 ≥ 1.2) = 1 − 0.8849 =

0.1151.

What is the probability that a student scores between 78 and 88?

𝑃(78 ≤ 𝑋 ≤ 88) = 𝑃((78 − 80)/5 ≤ 𝑍 ≤ (88 − 80)/5) = 𝑃(−0.4 ≤ 𝑍 ≤ 1.6) = 𝑃(𝑍 ≤ 1.6) − 𝑃(𝑍 ≤ −0.4) = 0.9452 − 0.3446 = 0.6006.

What is the probability that an average of three scores is 82 or less?

𝑃(𝑋 ≤ 82) = 𝑃(𝑍 ≤ (82 − 80)/(5/ 3)) = 𝑃(𝑍 ≤ 0.69) = 0.7549.

Order Raw moment Central moment Cumulant

1 0

2

3 0 0

4 0

5 0 0

(21)

7 0 0

8 0

About 68% of values drawn from a normal distribution are within one standard deviation

σ > 0 away from the mean μ; about 95% of the values are within two standard deviations and

about 99.7% lie within three standard deviations. This is known as the 68-95-99.7 rule, or the

empirical rule, or the 3-sigma rule.

Figure 4 Dark blue is less than one standard deviation from the mean. For the normal

distribution, this accounts for about 68% of the set (dark blue), while two standard deviations

from the mean (medium and dark blue) account for about 95%, and three standard deviations

(light, medium, and dark blue) account for about 99.7%.

In statistics, the 68-95-99.7 rule, or three-sigma rule, or empirical rule, states that for a

normal distribution, nearly all values lie within 3 standard deviations of the mean.

About 68% of the values lie within 1 standard deviation of the mean (or between the

mean minus 1 times the standard deviation, and the mean plus 1 times the standard deviation). In

statistical notation, this is represented as: 𝜇 ± 𝜎.

About 95% of the values lie within 2 standard deviations of the mean (or between the

mean minus 2 times the standard deviation, and the mean plus 2 times the standard deviation).

The statistical notation for this is: 𝜇 ± 2𝜎.

Nearly all (99.7%) of the values lie within 3 standard deviations of the mean (or between

the mean minus 3 times the standard deviation and the mean plus 3 times the standard deviation).

(22)

This rule is often used to quickly get a rough estimate of something's probability, given

its standard deviation, if the population is assumed normal, thus also as a simple test for outliers

(if the population is assumed normal), and as a normality test (if the population is potentially not

Figure

Figure 1: Visualization and calculation of the conditional PMF
Figure 2 provides a visualization of the conditional PMF.
Figure 3: The unconditional PDF
Figure 4: Uniform Probability density function (a) and Cumulative distribution function (b)
+6

References

Related documents

It interrogates what value municipal services, framed in the language and form of welfare but within a commodification milieu and in the context of shifting

Effects are measured relative to: quarters Q1-Q2 2011 for loans and Q2 2012 for overdrafts; sales: £49,999 or less; risk rating: minimal; loan default: 0-4 number of times unable

Development of the curriculum of the Master of Advanced Industrial Management European Academy on Industrial Management (AIM) Industrial enterprises &amp; Organizations.

Think of the 4th grade math question: &#34;If 12 apples cost $1.20, how much does 1 apple cost?&#34; If you know the red cell count and you know how much volume these cells take

The table contains data for all EU-27 countries on temporary employment (in thousands) for NACE sectors L, M and N (rev 1.) and O, P and Q (rev.. Data are presented separately

The results of hedonic test showed that average value of panellist’s favourite level on the overall attribute of high antioxidant cake ranging from 4.44 to 5.26 (usual).. The

2 Percentage endorsement rates for items from the DISCO PDA measure stratified by group ( “substantial” PDA features, “some” PDA features and the rest of the sample).. N

Given the high degree of vertical integration in the industry, the existence of prices, which are close to the competitive level, can only be provided by increase