ST 371 (IV): Discrete Random Variables

1 Random Variables

A random variable (rv) is a function that is defined on the sample space of the experiment and that assigns a numerical value to each possible outcome of the experiment. We denote random variables by uppercase letters, often X, Y or Z.

Examples for random variables (rv).

• Toss a coin. The sample space S = {H, T }. Define an rv X such that X({H}) = 1 and X({T }) = 0.

X is called a Bernoulli random variable.

• Toss a coin until a head is observed. The sample space S = {H, TH, TTH, · · · }.

Define X = number of tosses needed until a head is observed. Then X({TH}) = 2, X({TTTTH}) = 5.

• Roll a pair of dice. Define

– X = the sum of the numbers on the dice

– Y = the difference between the two numbers on the dice (first minus second)

– Z = the maximum of the two numbers on the dice

Consider outcome ω = (2, 3). Then

X(ω) = 5, Y (ω) = −1, Z(ω) = 3.

• Select a location in the US at random. Define Y = the height above sea level at the selected location.

Discrete and continuous random variables. A random variable that can take on a finite or at most countably infinite number of values is said to be discrete (countably infinite means that the members in a set can be listed in an infinite sequence in which there is a first element, second element and so on). Examples include:

• the gender of a randomly selected student in class

• the total number of coin tosses required for observing two heads

• the number of students who are absent on the first day of class or the number of people arriving for treatment at an emergency room.

A random variable that can take on values in an interval of real numbers is said to be continuous. Examples include:

• the depth at randomly chosen locations of a lake

• the amount of gas needed to drive to work on a given day

• the survival time of a cancer patient

We will focus on discrete random variables in Chapter 3 and consider continuous random variables in Chapter 4.

2 Probability Mass Function

Associated with each discrete random variable X is a probability mass function (pmf) that gives the probability that X equals x:

p(x) = P({X = x}) = P({all s ∈ S : X(s) = x}).

Example 1 Consider whether the next customer buying a laptop at a university bookstore buys a Mac or a PC model. Let

X = \begin{cases} 1 & \text{if a customer purchases a Mac} \\ 0 & \text{if a customer purchases a PC} \end{cases}

If 20% of all customers during that week select a Mac, what is the pmf of the rv X?

Example 2 Suppose two fair dice are tossed.

Let X be the random variable that is the sum of the two upturned faces.

X is a discrete random variable since it has finitely many possible values (the 11 integers 2, 3, ..., 12).

The probability mass function of X is

x      2     3     4     5     6     7     8     9     10    11    12

p(x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
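The table can be reproduced by listing the 36 equally likely outcomes and counting how many of them give each sum, which is exactly the definition p(x) = P({all s ∈ S : X(s) = x}). The Python sketch below does this; the helper name pmf_of_sum is chosen here for illustration and is not part of the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def pmf_of_sum(sides=6):
    """Return {x: p(x)} for the sum of two fair dice (illustrative helper)."""
    # The sample space S: all ordered pairs (i, j), each equally likely.
    outcomes = list(product(range(1, sides + 1), repeat=2))
    # Count the outcomes s with X(s) = i + j equal to each possible value x.
    counts = Counter(i + j for i, j in outcomes)
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}


pmf = pmf_of_sum()
for x, p in pmf.items():
    print(x, p)  # Fraction reduces automatically, e.g. 2/36 prints as 1/18
```

Exact fractions are used instead of floats so that the probabilities can be compared with the table without rounding error.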

It is often instructive to present the probability mass function in a graphical format, plotting p(x_i) on the y-axis against x_i on the x-axis.

[Figure: Probability Mass Function of the two-dice sum; x-axis: X (2 to 12), y-axis: p(x).]

Remarks: So far, we have been defining probability functions in terms of the elementary outcomes making up an experiment’s sample space. Thus, if two fair dice were tossed, a probability was assigned to each of the 36 possible pairs of upturned faces.

We have seen that in certain situations some attribute of an outcome may hold more interest for the experimenter than the outcome itself. A craps player, for example, may be concerned only that he throws a 7, not whether the 7 was the result of a 5 and a 2, a 4 and a 3 or a 6 and a 1. That being the case, it makes sense to replace the 36-member sample space

S = {(i, j) : i = 1, · · · , 6; j = 1, · · · , 6}

with the more relevant (and simpler) 11-member sample space of all possible two-dice sums,

S′ = {x = i + j : i + j = 2, 3, · · · , 12}.

This redefinition of the sample space not only changes the number of outcomes in the space (from 36 to 11) but also changes the probability structure. In the original sample space, all 36 outcomes are equally likely. In the revised sample space, the 11 outcomes are not equally likely.

Example 3 Three balls are to be randomly selected without replacement from an urn containing balls numbered 1 through 20. Let X denote the largest number selected. X is a random variable taking on values 3, 4, ..., 20. Since we select the balls randomly, each of the \binom{20}{3} combinations of the balls is equally likely to be chosen.

The probability mass function of X is

P(X = i) = \binom{i-1}{2} / \binom{20}{3}, i = 3, · · · , 20.

This equation follows because the number of selections that result in the event {X = i} is just the number of selections that result in the ball numbered i and two of the balls numbered 1 through i − 1 being chosen.
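As a sanity check on the counting argument, the formula can be compared with a brute-force enumeration of all three-ball selections. The Python sketch below is one way to do that; the names N, K, pmf_formula and pmf_enumerated are illustrative and not from the notes.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, K = 20, 3  # balls in the urn, balls drawn


def pmf_formula(i):
    """P(X = i) = C(i-1, 2) / C(20, 3) for i = 3, ..., 20."""
    return Fraction(comb(i - 1, 2), comb(N, K))


# Brute force: every 3-ball selection is equally likely; tally the maximum.
counts = {}
for draw in combinations(range(1, N + 1), K):
    counts[max(draw)] = counts.get(max(draw), 0) + 1
pmf_enumerated = {i: Fraction(c, comb(N, K)) for i, c in counts.items()}

# The two computations agree for every possible value of X.
agree = all(pmf_enumerated[i] == pmf_formula(i) for i in range(3, N + 1))
print(agree, float(pmf_formula(20)))
```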

[Figure: Probability Mass Function of the largest number selected; x-axis: X (3 to 20), y-axis: p(x).]

Suppose the random variable X can take on values {x₁, x₂, · · · }. Since the probability mass function is a probability function on the redefined sample space that considers values of X, we have that

\sum_{i=1}^{\infty} P(X = x_i) = 1.

This follows from

1 = P(S)

= P( \bigcup_{i=1}^{\infty} \{X = x_i\} )

= \sum_{i=1}^{\infty} P(X = x_i).

Example 4 Independent trials, consisting of the flipping of a coin having probability p of coming up heads, are continually performed until a head occurs. Let X be the random variable that denotes the number of times the coin is flipped. The probability mass function for X is

P\{X = 1\} = P\{H\} = p

P\{X = 2\} = P\{(T, H)\} = (1 - p) p

P\{X = 3\} = P\{(T, T, H)\} = (1 - p)^2 p

· · ·

P\{X = n - 1\} = P\{(\underbrace{T, T, \ldots, T}_{n-2}, H)\} = (1 - p)^{n-2} p

P\{X = n\} = P\{(\underbrace{T, T, \ldots, T}_{n-1}, H)\} = (1 - p)^{n-1} p

· · ·
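These probabilities form a geometric series, so for 0 < p ≤ 1 they sum to p / (1 − (1 − p)) = 1, as any pmf must. The short Python sketch below checks this numerically; the value p = 0.3 and the cutoff of 100 terms are arbitrary choices made here for illustration.

```python
def geometric_pmf(n, p):
    """P(X = n) = (1 - p)**(n - 1) * p for n = 1, 2, ..."""
    return (1 - p) ** (n - 1) * p


p = 0.3  # assumed head probability, chosen only for illustration
# The partial sum over n = 1..100 equals 1 - (1 - p)**100, very close to 1.
partial = sum(geometric_pmf(n, p) for n in range(1, 101))
print(partial)
```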

3 Cumulative Distribution Function

The cumulative distribution function (CDF) of a random variable X is the function

F(x) = P(X \le x) = \sum_{y : y \le x} p(y).

Example 5 The pmf of a random variable X is given by

x 1 2 3 4 5

p(x) 0.3 0.3 0.2 0.1 c

• What is c?

• What is the cdf of X?

• Calculate P (2 ≤ X ≤ 4).

All probability questions about X can be answered in terms of the cdf F . Specifically for discrete random variables,

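P(a < X ≤ b) = F(b) − F(a)

for all a < b, and, when X takes only integer values and a is an integer,

P(a ≤ X ≤ b) = F(b) − F(a − 1).

The first identity can be seen by writing the event {X ≤ b} as the union of the mutually exclusive events {X ≤ a} and {a < X ≤ b}. That is, {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b}. Therefore, we have P{X ≤ b} = P{X ≤ a} + P{a < X ≤ b} and the result follows. The second identity follows in the same way, because for integer-valued X the event {X < a} is the same as {X ≤ a − 1}.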
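To make the two interval formulas concrete, the Python sketch below applies them to the two-dice sum of Example 2 and compares the results with direct summation of the pmf. The endpoints a = 4 and b = 9 and the function name cdf are arbitrary choices made here for illustration.

```python
from fractions import Fraction

# pmf of the two-dice sum from Example 2: p(x) = (6 - |x - 7|) / 36.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}


def cdf(x):
    """F(x) = P(X <= x): sum of p(y) over all y <= x."""
    return sum(p for y, p in pmf.items() if y <= x)


a, b = 4, 9
half_open = cdf(b) - cdf(a)    # P(a < X <= b)
closed = cdf(b) - cdf(a - 1)   # P(a <= X <= b); relies on X being integer valued

# Direct summation of the pmf over the same ranges, for comparison.
half_open_direct = sum(pmf[x] for x in range(a + 1, b + 1))
closed_direct = sum(pmf[x] for x in range(a, b + 1))
print(half_open, closed)
```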

Example 6 Consider selecting at random a student who is among the 15,000 registered for the current semester at NCSU. Let X=the number of courses for which the selected student is registered, and suppose that X has the following pmf:

x 1 2 3 4 5 6 7

p(x) .01 .03 .13 .25 .39 .17 .02

What is the probability that the selected student is registered for three or more courses?

4 Expected Value

Probability mass functions provide a global overview of a random variable’s behavior. Detail that explicit, though, is not always necessary, or even helpful. Often we want to condense the information contained in the pmf by summarizing some of its features with single numbers.

The first feature of a pmf that we will examine is central tendency, a term referring to the “average” value of a random variable. The most frequently used measure for describing central tendency is the expected value.

For a discrete random variable X, the expected value is a weighted average of the possible values X can take on, each value being weighted by the probability that X assumes it:

E(X) = \sum_{x : p(x) > 0} x p(x)

A simple fact:

E(X + Y ) = E(X) + E(Y ).

Example 7 Consider the experiment of rolling a die. Let X be the number on the face.

• Compute E(X).

• Consider rolling a pair of dice. Let Y be the sum of the numbers.

Compute E(Y ).
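One way to check an answer to Example 7 is to evaluate the weighted average directly. The Python sketch below does so for a single fair die and for the sum of two fair dice, and also confirms the fact E(X + Y) = E(X) + E(Y) in this case; the function name expected_value is illustrative, and fairness of the dice is assumed.

```python
from fractions import Fraction
from itertools import product


def expected_value(pmf):
    """E(X) = sum of x * p(x) over the values x with p(x) > 0."""
    return sum(x * p for x, p in pmf.items())


# One fair die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Sum of two fair dice: accumulate 1/36 for each of the 36 ordered pairs.
two_dice = {}
for i, j in product(range(1, 7), repeat=2):
    two_dice[i + j] = two_dice.get(i + j, 0) + Fraction(1, 36)

e_x = expected_value(die)
e_y = expected_value(two_dice)
print(e_x, e_y)  # 7/2 and 7
```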

Example 8 Consider Example 6. What is the average number of courses per student at NCSU?

5 Expectation of Function of a Random Variable

Suppose we are given a discrete random variable X along with its pmf and that we want to compute the expected value of some function of X, say g(X).

One approach is to directly determine the pmf of g(X).

Example 9 Let X denote a random variable that takes on the values −1, 0, 1 with respective probabilities

P(X = −1) = .2, P(X = 0) = .5, P(X = 1) = .3.

Compute E(X²).

Although the procedure we used in the previous example will always enable us to compute the expected value of g(X) from knowledge of the pmf of X, there is another way of thinking about E[g(X)]. Noting that g(X) will equal g(x) whenever X is equal to x, it seems reasonable that E[g(X)] should just be a weighted average of the values g(x), with g(x) being weighted by the probability that X is equal to x.

Proposition 1 If X is a discrete random variable that takes on one of the values x_i, i ≥ 1 with respective probabilities p(x_i), then for any real valued function g,

E[g(X)] = \sum_i g(x_i) p(x_i).

Applying the proposition to Example 9,

E(X²) = (−1)²(.2) + 0²(.5) + 1²(.3) = .5.
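The two routes to E(X²) in Example 9 can be compared side by side. The Python sketch below first builds the pmf of g(X) = X² and averages over it, then applies Proposition 1 directly to the pmf of X; the variable names are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction

# pmf of X from Example 9.
pmf_x = {-1: Fraction(2, 10), 0: Fraction(5, 10), 1: Fraction(3, 10)}


def g(x):
    return x ** 2


# Approach 1: determine the pmf of g(X) by pooling the x values with equal g(x).
pmf_gx = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_gx[g(x)] += p
direct = sum(y * p for y, p in pmf_gx.items())

# Approach 2 (Proposition 1): weight g(x) by p(x) without finding the pmf of g(X).
via_proposition = sum(g(x) * p for x, p in pmf_x.items())

print(dict(pmf_gx), direct, via_proposition)
```

Here g(X) takes the value 1 with probability .2 + .3 = .5 and the value 0 with probability .5, so both approaches give .5.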

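Proof of Proposition 1. Let y_j, j ≥ 1, denote the distinct values of g(x_i), i ≥ 1. Grouping together all the x_i that have the same value of g(x_i) gives

\sum_i g(x_i) p(x_i) = \sum_j \sum_{i : g(x_i) = y_j} g(x_i) p(x_i)

= \sum_j y_j \sum_{i : g(x_i) = y_j} p(x_i)

= \sum_j y_j P\{g(X) = y_j\}

= E[g(X)]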

Corollary 1 (The Rule of expected value.) If a and b are constants, then E(aX + b) = aE(X) + b.

Proof of Corollary:

E(aX + b) = \sum_x (ax + b) p(x)

= a \sum_x x p(x) + b \sum_x p(x)

= aE(X) + b,

since \sum_x p(x) = 1.

Special cases of Corollary 1:

• E(aX) = aE(X).

• E(X + b) = E(X) + b.

Example 10 A computer store has purchased three computers of a certain type at $500 apiece. It will sell them for $1000 apiece. The manufacturer has agreed to repurchase any computers still unsold after a certain period at $200 apiece. Let X denote the number of computers sold, and suppose that P(X = 0) = 0.1, P(X = 1) = 0.2, P(X = 2) = 0.3 and P(X = 3) = 0.4.

Let h(X) denote the profit associated with selling X units. What is the expected profit?
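A possible way to set up Example 10 is sketched below in Python, under the assumption that profit means sales revenue plus buyback revenue minus the purchase cost of all three computers. With that reading, h(x) = 1000x + 200(3 − x) − 1500 = 800x − 900, which agrees with the profit function stated in Example 11. The function and parameter names are illustrative.

```python
# pmf of X, the number of computers sold (from Example 10).
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}


def profit(x, bought=3, cost=500, price=1000, buyback=200):
    """Assumed profit model: sales + manufacturer buyback - purchase cost."""
    return price * x + buyback * (bought - x) - cost * bought


# Proposition 1: E[h(X)] = sum of h(x) * p(x).
expected_profit = sum(profit(x) * p for x, p in pmf.items())

# Corollary 1: E(800X - 900) = 800 E(X) - 900 gives the same number.
e_x = sum(x * p for x, p in pmf.items())
expected_profit_linear = 800 * e_x - 900

print(expected_profit, expected_profit_linear)  # both approximately 700
```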

6 Variance

Another useful summary of a random variable’s pmf besides its central tendency is its “spread”. This is a very important concept in real life. For example, in the quality control of hard disk lifetimes, we want not only a long lifetime but also lifetimes that are not too variable. Another example is in finance, where investors want not only investments with good returns (i.e., a high expected value) but also investments that are not too risky (i.e., a low spread).

A commonly used measure of spread is the variance of a random variable, which is the expected squared deviation of the random variable from its expected value. Specifically, let X have pmf p(x) and expected value µ, then the variance of X, denoted by V (X), or just σ_X² , is

V(X) = E[(X − µ)²]

= \sum_{x \in D} (x − µ)² p(x),

where D denotes the set of possible values of X.

The second equality holds by applying Proposition 1.

Explanations and intuitions for variance:

• (X − µ)² is the squared deviation of X from its mean

• The variance is the weighted average of squared deviations, where the weights are probabilities from the distribution.

• If most values of x are close to µ, then σ² will be relatively small.

• If most values of x are far away from µ, then σ² will be relatively large.

Definition: the standard deviation (SD) of X is \sigma_X = \sqrt{V(X)} = \sqrt{\sigma_X^2}.

Consider the following situations:

• The following three random variables have expected value 0 but very different spreads:

– X = 0 with probability 1

– Y = −1 with probability 0.5, 1 with probability 0.5.

– Z = −100 with probability 0.5, 100 with probability 0.5.

Compare V (X), V (Y ) and V (Z).

• Suppose that the rate of return on stock A takes on the values of 30%, 10% and −10% with respective probabilities 0.25, 0.50 and 0.25 and on stock B the values of 50%, 10% and −30% with the same probabilities 0.25, 0.50 and 0.25. Each stock then has the expected rate of return of 10%. Obviously stock A has less spread in its rate of return. Compare V (A) and V (B).
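The comparisons above can be carried out with a few lines of Python applying the definition V(X) = E[(X − µ)²] to each pmf. This is only a sketch: the helper names mean and variance are illustrative, and the stock returns are entered in percent, so their variances are in squared percent.

```python
def mean(pmf):
    """E(X) = sum of x * p(x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """V(X) = sum of (x - mu)**2 * p(x), with mu = E(X)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


X = {0: 1.0}
Y = {-1: 0.5, 1: 0.5}
Z = {-100: 0.5, 100: 0.5}
A = {30: 0.25, 10: 0.50, -10: 0.25}   # stock A returns, in percent
B = {50: 0.25, 10: 0.50, -30: 0.25}   # stock B returns, in percent

for name, pmf in [("X", X), ("Y", Y), ("Z", Z), ("A", A), ("B", B)]:
    print(name, mean(pmf), variance(pmf))
```

The output shows equal means within each group but variances of 0, 1 and 10000 for X, Y and Z, and 200 versus 800 for stocks A and B.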

An alternative formula for variance. V(X) = E(X²) − [E(X)]².

Proof. Let E(X) = µ. Then

V(X) = E[(X − µ)²]

= \sum_x (x − µ)² p(x)

= \sum_x (x² − 2µx + µ²) p(x)

= \sum_x x² p(x) − 2µ \sum_x x p(x) + µ² \sum_x p(x)

= E(X²) − 2µ² + µ²

= E(X²) − µ²

= E(X²) − [E(X)]².

The variance of a linear function. Let a, b be two constants, then V (aX + b) = a² · V (X).

Proof. Note that from Corollary 1, we have

E(aX + b) = aE(X) + b.

Let E(X) = µ. Then

V(aX + b) = E[{(aX + b) − E(aX + b)}²]

= E[(aX + b − aµ − b)²]

= E[a²(X − µ)²]

= a² E[(X − µ)²]

= a² V(X)

Example 11 Let X denote the number of computers sold, and suppose that the pmf of X is

P (X = 0) = 0.1, P (X = 1) = 0.2, P (X = 2) = 0.3, P (X = 3) = 0.4.

The profit is a function of the number of computers sold:

h(X) = 800X − 900.

What are the variance and SD of the profit h(X)?
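A numerical check for Example 11 can combine the shortcut V(X) = E(X²) − [E(X)]² with the rule V(aX + b) = a² V(X). The Python sketch below does this and also recomputes the variance of h(X) from the definition for comparison; the variable names are illustrative.

```python
from math import sqrt

# pmf of X, the number of computers sold (from Example 11).
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
a, b = 800, -900  # h(X) = 800X - 900

e_x = sum(x * p for x, p in pmf.items())
e_x2 = sum(x ** 2 * p for x, p in pmf.items())
var_x = e_x2 - e_x ** 2          # shortcut formula for V(X)

var_h = a ** 2 * var_x           # V(aX + b) = a^2 V(X); b does not matter
sd_h = abs(a) * sqrt(var_x)      # SD scales by |a|

# Cross-check using the definition E[(h(X) - E h(X))^2].
mu_h = a * e_x + b
var_h_direct = sum((a * x + b - mu_h) ** 2 * p for x, p in pmf.items())

print(var_x, var_h, sd_h, var_h_direct)
```

With these numbers E(X) = 2 and E(X²) = 5, so V(X) = 1, and the profit has variance of about 640000 (in squared dollars) and SD of about $800, up to floating-point rounding.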