ST 371 (IV): Discrete Random Variables

Academic year: 2021

1 Random Variables

A random variable (rv) is a function that is defined on the sample space of the experiment and that assigns a numerical value to each possible outcome of the experiment. We denote random variables by uppercase letters, often X, Y or Z.

Examples for random variables (rv).

• Toss a coin. The sample space is S = {H, T}. Define an rv X such that X(H) = 1 and X(T) = 0.

X is called a Bernoulli random variable.

• Toss a coin until a head is observed. The sample space is S = {H, TH, TTH, · · · }.

Define X = number of tosses needed until a head is observed. Then X(TH) = 2 and X(TTTTH) = 5.

• Roll a pair of dice. Define

– X = the sum of the numbers on the dice

– Y = the difference between the two numbers on the dice (first number minus second number)

– Z = the maximum of the two numbers on the dice

Consider outcome ω = (2, 3). Then

X(ω) = 5, Y (ω) = −1, Z(ω) = 3.

• Let Y = the height above sea level at a randomly selected location in the US.
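A minimal Python sketch of the dice example, to make concrete the idea that a random variable is just a function on the sample space. It assumes outcomes are represented as ordered pairs (i, j) and that Y is the first number minus the second (the reading consistent with Y(ω) = −1 for ω = (2, 3)); the names S, X, Y and Z are illustrative.

```python
from itertools import product

# Sample space for rolling a pair of dice: all 36 ordered pairs (i, j).
S = list(product(range(1, 7), repeat=2))

# Each random variable is an ordinary function from an outcome to a number.
def X(omega):
    i, j = omega
    return i + j          # sum of the two numbers

def Y(omega):
    i, j = omega
    return i - j          # first number minus second number (assumed reading)

def Z(omega):
    i, j = omega
    return max(i, j)      # larger of the two numbers

omega = (2, 3)
print(X(omega), Y(omega), Z(omega))  # 5 -1 3
```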

Discrete and continuous random variables. A random variable that can take on a finite or at most countably infinite number of values is said to be discrete (countably infinite means that the members in a set can be listed in an infinite sequence in which there is a first element, second element and so on). Examples include:

• the gender of a randomly selected student in class

• the total number of coin tosses required for observing two heads

• the number of students who are absent on the first day of class or the number of people arriving for treatment at an emergency room.

A random variable that can take on values in an interval of real numbers is said to be continuous. Examples include:

• the depth of a lake at a randomly chosen location

• the amount of gas needed to drive to work on a given day

• the survival time of a cancer patient

We will focus on discrete random variables in Chapter 3 and consider continuous random variables in Chapter 4.

2 Probability Mass Function

Associated with each discrete random variable X is a probability mass function (pmf) that gives the probability that X equals x:

p(x) = P ({X = x}) = P ({all s ∈ S : X(s) = x}).

Example 1 Consider whether the next customer buying a laptop at a university bookstore buys a Mac or a PC model. Let

$$X = \begin{cases} 1 & \text{if the customer purchases a Mac} \\ 0 & \text{if the customer purchases a PC} \end{cases}$$

If 20% of all customers during a given week select a Mac, what is the pmf of the rv X?

Example 2 Suppose two fair dice are tossed.

Let X be the random variable that is the sum of the two upturned faces.

X is a discrete random variable since it has finitely many possible values (the 11 integers 2, 3, ..., 12).

The probability mass function of X is

x        2     3     4     5     6     7     8     9    10    11    12
p(x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
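The table can be checked by enumeration. The sketch below (illustrative Python, not part of the original notes) counts the 36 equally likely outcomes by their sum and uses exact fractions; note that `Fraction` prints in lowest terms, so 2/36 appears as 1/18. The closed form p(x) = (6 − |x − 7|)/36 in the final check is a compact restatement of the table added here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each sum.
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))

# Convert the counts to exact probabilities, keyed by the sum x = 2, ..., 12.
pmf = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)

# The table also follows the closed form p(x) = (6 - |x - 7|) / 36.
assert all(p == Fraction(6 - abs(x - 7), 36) for x, p in pmf.items())
```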

It is often instructive to present the probability mass function in a graphical format plotting p(xi) on the y-axis against xi on the x-axis.

[Figure: Probability Mass Function (horizontal axis: X)]

Remarks: So far, we have been defining probability functions in terms of the elementary outcomes making up an experiment’s sample space. Thus, if two fair dice were tossed, a probability was assigned to each of the 36 possible pairs of upturned faces.

We have seen that in certain situations some attribute of an outcome may hold more interest for the experimenter than the outcome itself. A craps player, for example, may be concerned only that he throws a 7, not whether the 7 was the result of a 5 and a 2, a 4 and a 3 or a 6 and a 1. That being the case, it makes sense to replace the 36-member sample space

S = {(i, j) : i = 1, · · · , 6; j = 1, · · · , 6}

with the more relevant (and simpler) 11-member sample space of all possible two-dice sums,

S′ = {x = i + j : i + j = 2, 3, · · · , 12}.

This redefinition of the sample space not only changes the number of outcomes in the space (from 36 to 11) but also changes the probability structure. In the original sample space, all 36 outcomes are equally likely. In the revised sample space, the 11 outcomes are not equally likely.

Example 3 Three balls are to be randomly selected without replacement from an urn containing balls numbered 1 through 20. Let X denote the largest number selected. X is a random variable taking on values 3, 4, ..., 20. Since we select the balls randomly, each of the C(20, 3) combinations of the balls is equally likely to be chosen.

The probability mass function of X is

$$P(X = i) = \frac{\binom{i-1}{2}}{\binom{20}{3}}, \qquad i = 3, \dots, 20.$$

This equation follows because the number of selections that result in the event {X = i} is just the number of selections that result in the ball numbered i and two of the balls numbered 1 through i − 1 being chosen.
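A quick way to sanity-check the formula for Example 3 is to compare it with brute-force enumeration of all C(20, 3) = 1140 selections. This is a hypothetical Python sketch using `math.comb` and `itertools.combinations` from the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, k = 20, 3  # balls numbered 1..20, choose 3 without replacement

# Formula from the text: P(X = i) = C(i-1, 2) / C(20, 3), i = 3, ..., 20
pmf = {i: Fraction(comb(i - 1, 2), comb(N, k)) for i in range(3, N + 1)}

# Brute force: enumerate every equally likely selection and record its maximum.
total = comb(N, k)
brute = {}
for sel in combinations(range(1, N + 1), k):
    m = max(sel)
    brute[m] = brute.get(m, 0) + 1
brute = {i: Fraction(n, total) for i, n in brute.items()}

assert pmf == brute               # formula agrees with enumeration
assert sum(pmf.values()) == 1     # a valid pmf
print(pmf[20])  # 3/20
```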

[Figure: Probability Mass Function (horizontal axis: X)]

Suppose the random variable X can take on values {x1, x2, · · · }. Since the probability mass function is a probability function on the redefined sample space that considers values of X, we have that

$$\sum_{i=1}^{\infty} P(X = x_i) = 1.$$

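This follows from

$$1 = P(S) = P\Big(\bigcup_{i=1}^{\infty} \{X = x_i\}\Big) = \sum_{i=1}^{\infty} P(X = x_i),$$

where the last equality holds because the events {X = xi} are mutually exclusive.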

Example 4 Independent trials, consisting of the flipping of a coin having probability p of coming up heads, are continually performed until a head occurs. Let X be the random variable that denotes the number of times the coin is flipped. The probability mass function for X is

$$\begin{aligned}
P\{X = 1\} &= P\{H\} = p \\
P\{X = 2\} &= P\{(T, H)\} = (1-p)p \\
P\{X = 3\} &= P\{(T, T, H)\} = (1-p)^2 p \\
&\;\;\vdots \\
P\{X = n-1\} &= P\{(\underbrace{T, T, \dots, T}_{n-2}, H)\} = (1-p)^{n-2} p \\
P\{X = n\} &= P\{(\underbrace{T, T, \dots, T}_{n-1}, H)\} = (1-p)^{n-1} p \\
&\;\;\vdots
\end{aligned}$$
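The pattern above can be written as one formula, P{X = n} = (1 − p)^(n−1) p for n = 1, 2, . . . The sketch below evaluates it and checks numerically that the probabilities sum to (essentially) 1; the helper name `geometric_pmf` and the value p = 0.3 are illustrative assumptions, since the text leaves p general.

```python
def geometric_pmf(n, p):
    """P(X = n) for X = number of flips needed to see the first head,
    where each independent flip lands heads with probability p."""
    if n < 1:
        return 0.0
    return (1 - p) ** (n - 1) * p

p = 0.3  # illustrative value; the text leaves p general
probs = [geometric_pmf(n, p) for n in range(1, 6)]
print(probs)

# The probabilities over n = 1, 2, ... form a geometric series summing to 1;
# the partial sum up to n = 200 equals 1 to within floating-point error.
partial = sum(geometric_pmf(n, p) for n in range(1, 201))
assert abs(partial - 1) < 1e-12
```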

3 Cumulative Distribution Function

The cumulative distribution function (CDF) of a random variable X is the function

$$F(x) = P(X \le x) = \sum_{y:\, y \le x} p(y).$$

Example 5 The pmf of a random variable X is given by

x 1 2 3 4 5

p(x) 0.3 0.3 0.2 0.1 c

• What is c?

• What is the cdf of X?

• Calculate P (2 ≤ X ≤ 4).
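One way to work through Example 5 in code, assuming the pmf is stored as a Python dict from values to exact `Fraction` probabilities (an illustrative choice, not from the notes). The value c follows because the probabilities must sum to 1, and since X is integer-valued, the last line uses P(2 ≤ X ≤ 4) = F(4) − F(1).

```python
from fractions import Fraction

known = {1: Fraction(3, 10), 2: Fraction(3, 10), 3: Fraction(2, 10), 4: Fraction(1, 10)}

# The probabilities must sum to 1, which pins down c = p(5).
c = 1 - sum(known.values())
pmf = {**known, 5: c}

def cdf(x, pmf):
    """F(x) = P(X <= x): add up p(y) over all values y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

print("c =", c)                                       # c = 1/10
print([str(cdf(x, pmf)) for x in range(1, 6)])        # ['3/10', '3/5', '4/5', '9/10', '1']
print("P(2 <= X <= 4) =", cdf(4, pmf) - cdf(1, pmf))  # P(2 <= X <= 4) = 3/5
```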

All probability questions about X can be answered in terms of the cdf F. Specifically, for a discrete random variable,

$$P(a < X \le b) = F(b) - F(a),$$

and, when X takes only integer values and a, b are integers,

$$P(a \le X \le b) = F(b) - F(a - 1)$$

for all a < b. This can be seen by writing the event {X ≤ b} as the union of the mutually exclusive events {X ≤ a} and {a < X ≤ b}. That is, {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b}. Therefore, we have P {X ≤ b} = P {X ≤ a} + P {a < X ≤ b} and the result follows.

Example 6 Consider selecting at random a student who is among the 15,000 registered for the current semester at NCSU. Let X=the number of courses for which the selected student is registered, and suppose that X has the following pmf:

x 1 2 3 4 5 6 7

p(x) .01 .03 .13 .25 .39 .17 .02

What is the probability that a student registers for three or more courses?
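A short sketch of Example 6, computing P(X ≥ 3) through the complement 1 − F(2). The dict-based pmf and the helper `cdf` are illustrative, not from the original notes; exact fractions avoid floating-point rounding.

```python
from fractions import Fraction

# pmf of X = number of courses (Example 6), entered as exact hundredths
pmf = {x: Fraction(n, 100) for x, n in zip(range(1, 8), [1, 3, 13, 25, 39, 17, 2])}

def cdf(x):
    """F(x) = P(X <= x) for the pmf above."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

# P(X >= 3) = 1 - P(X <= 2) = 1 - F(2)
answer = 1 - cdf(2)
print(answer, float(answer))   # 24/25 0.96
```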

4 Expected Value

Probability mass functions provide a global overview of a random variable’s behavior. Detail that explicit, though, is not always necessary, or even helpful. Often, we want to focus the information contained in the pmf by summarizing certain of its features with single numbers.

The first feature of a pmf that we will examine is central tendency, a term referring to the “average” value of a random variable. The most frequently used measure for describing central tendency is the expected value.

Generally, for a discrete random variable, the expected value of a random variable X is a weighted average of the possible values X can take on, each value being weighted by the probability that X assumes it:

$$E(X) = \sum_{x:\, p(x) > 0} x\, p(x).$$

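A simple but important fact (linearity of expectation): for any two random variables X and Y defined on the same sample space and having finite expected values,

E(X + Y) = E(X) + E(Y).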

Example 7 Consider the experiment of rolling a die. Let X be the number on the face.

• Compute E(X).

• Consider rolling a pair of dice. Let Y be the sum of the numbers.

Compute E(Y ).
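Example 7 can be checked with a small helper that implements the weighted-average definition E(X) = Σ x p(x). In this hypothetical sketch, `expectation` takes a pmf stored as a dict; the two-dice pmf is built by enumeration, and the final assertion illustrates E(X1 + X2) = E(X1) + E(X2) for two dice that each have mean E(X).

```python
from fractions import Fraction
from itertools import product

def expectation(pmf):
    """E(X) = sum over x of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

# One fair die: each face 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
EX = expectation(die)

# Sum of two fair dice: build the pmf from the 36 equally likely pairs.
two_dice = {}
for i, j in product(range(1, 7), repeat=2):
    two_dice[i + j] = two_dice.get(i + j, 0) + Fraction(1, 36)
EY = expectation(two_dice)

print(EX, EY)          # 7/2 7
assert EY == EX + EX   # each die contributes E(X) = 7/2
```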

Example 8 Consider Example 6. What is the average number of courses per student at NCSU?
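Using the pmf from Example 6, the weighted average can be computed directly. This is an illustrative floating-point sketch; rounding to two decimals hides tiny binary representation errors.

```python
# pmf of X = number of courses, from Example 6
pmf = {1: .01, 2: .03, 3: .13, 4: .25, 5: .39, 6: .17, 7: .02}

# E(X) = sum of x * p(x)
mean = sum(x * p for x, p in pmf.items())
print(round(mean, 2))  # 4.57
```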

5 Expectation of a Function of a Random Variable

Suppose we are given a discrete random variable X along with its pmf and that we want to compute the expected value of some function of X, say g(X).

One approach is to directly determine the pmf of g(X).

Example 9 Let X denote a random variable that takes on the values −1, 0, 1 with respective probabilities

P(X = −1) = .2, P(X = 0) = .5, P(X = 1) = .3.

Compute E(X²).

Although the procedure we used in the previous example will always enable us to compute the expected value of g(X) from knowledge of the pmf of X, there is another way of thinking about E[g(X)]. Noting that g(X) will equal g(x) whenever X is equal to x, it seems reasonable that E[g(X)] should just be a weighted average of the values g(x), with g(x) being weighted by the probability that X is equal to x.

Proposition 1 If X is a discrete random variable that takes on one of the values xi, i ≥ 1, with respective probabilities p(xi), then for any real-valued function g,

$$E[g(X)] = \sum_i g(x_i)\, p(x_i).$$

Applying the proposition to Example 9,

$$E(X^2) = (-1)^2(.2) + 0^2(.5) + 1^2(.3) = .5.$$
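To illustrate that both routes give the same answer, the sketch below computes E(X²) for Example 9 first by deriving the pmf of g(X) = X² (the approach of Example 9) and then by Proposition 1. The names `pmf_X`, `pmf_Y` and `g` are illustrative.

```python
from fractions import Fraction

# pmf of X from Example 9
pmf_X = {-1: Fraction(2, 10), 0: Fraction(5, 10), 1: Fraction(3, 10)}

def g(x):
    return x ** 2

# Approach 1: derive the pmf of Y = g(X) by pooling the probabilities
# of all x values that map to the same y, then average over y.
pmf_Y = {}
for x, p in pmf_X.items():
    pmf_Y[g(x)] = pmf_Y.get(g(x), 0) + p
direct = sum(y * p for y, p in pmf_Y.items())

# Approach 2 (Proposition 1): weight each g(x) by p(x) directly.
via_prop = sum(g(x) * p for x, p in pmf_X.items())

print(direct, via_prop)  # 1/2 1/2
```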

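Proof of Proposition 1. Let y1, y2, . . . denote the distinct values of g(xi), i ≥ 1. Grouping together all the xi that share the same value of g(xi) gives

$$\begin{aligned}
\sum_i g(x_i)\, p(x_i) &= \sum_j \sum_{i:\, g(x_i) = y_j} g(x_i)\, p(x_i) \\
&= \sum_j y_j \sum_{i:\, g(x_i) = y_j} p(x_i) \\
&= \sum_j y_j\, P\{g(X) = y_j\} \\
&= E[g(X)].
\end{aligned}$$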

Corollary 1 (The Rule of expected value.) If a and b are constants, then E(aX + b) = aE(X) + b.

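Proof of Corollary:

$$\begin{aligned}
E(aX + b) &= \sum_x (ax + b) \cdot p(x) \\
&= a \sum_x x \cdot p(x) + b \sum_x p(x) \\
&= aE(X) + b,
\end{aligned}$$

since the first sum is E(X) and the probabilities p(x) sum to 1.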

Special cases of Corollary 1:

• E(aX) = aE(X).

• E(X + b) = E(X) + b.

Example 10 A computer store has purchased three computers of a certain type at $500 apiece. It will sell them for $1000 apiece. The manufacturer has agreed to repurchase any computers still unsold after a certain period at $200 apiece. Let X denote the number of computers sold, and suppose that P(X = 0) = 0.1, P(X = 1) = 0.2, P(X = 2) = 0.3 and P(X = 3) = 0.4.

Let h(X) denote the profit associated with selling X units. What is the expected profit?
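A possible worked version of Example 10 in Python. It assumes the profit is the revenue from sales plus the $200 buyback on unsold units minus the $1500 purchase cost, which simplifies to h(X) = 800X − 900; the expected profit is then computed both by Proposition 1 and by Corollary 1. The helper name `h` mirrors the text; the rest is illustrative.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}

def h(x):
    """Profit when x of the 3 computers sell (assumed cost breakdown)."""
    cost = 3 * 500            # bought 3 computers at $500 each
    revenue = 1000 * x        # sold x computers at $1000 each
    buyback = 200 * (3 - x)   # unsold units repurchased at $200 each
    return revenue + buyback - cost   # simplifies to 800x - 900

# Proposition 1: E[h(X)] = sum of h(x) p(x)
expected_profit = sum(h(x) * p for x, p in pmf.items())

# Corollary 1 gives the same answer: E(800X - 900) = 800 E(X) - 900
EX = sum(x * p for x, p in pmf.items())
assert expected_profit == 800 * EX - 900
print(expected_profit)   # 700
```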

6 Variance

Another useful summary of a random variable’s pmf besides its central tendency is its “spread”. This is a very important concept in real life. For example, in quality control of hard disks, we want not only the lifetimes to be long but also the lifetimes not to be too variable. Another example is in finance, where investors want investments not only with good returns (i.e., a high expected value) but also with low risk (i.e., a low spread).

A commonly used measure of spread is the variance of a random variable, which is the expected squared deviation of the random variable from its expected value. Specifically, let X have pmf p(x), set of possible values D, and expected value µ; then the variance of X, denoted by V(X) or just σ_X², is

$$V(X) = E[(X - \mu)^2] = \sum_{x \in D} (x - \mu)^2 \cdot p(x).$$

The second equality holds by applying Proposition 1.

Explanations and intuitions for variance:

• (X − µ)² is the squared deviation of X from its mean

• The variance is the weighted average of squared deviations, where the weights are probabilities from the distribution.

• If most of the probability is on values x close to µ, then σ² will be relatively small.

• If most of the probability is on values x far from µ, then σ² will be relatively large.

Definition: the standard deviation (SD) of X is

$$\sigma_X = \sqrt{V(X)} = \sqrt{\sigma_X^2}.$$

Consider the following situations:

• The following three random variables have expected value 0 but very different spreads:

– X = 0 with probability 1

– Y = −1 with probability 0.5, 1 with probability 0.5.

– Z = −100 with probability 0.5, 100 with probability 0.5.

Compare V (X), V (Y ) and V (Z).

• Suppose that the rate of return on stock A takes on the values 30%, 10% and −10% with respective probabilities 0.25, 0.50 and 0.25, and on stock B the values 50%, 10% and −30% with the same probabilities 0.25, 0.50 and 0.25. Each stock then has an expected rate of return of 10%. Obviously, stock A has less spread in its rate of return. Compare V(A) and V(B).
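The comparisons above can be computed with a small helper that follows the definition V(X) = Σ (x − µ)² p(x). This is an illustrative sketch; the stock returns are entered in percentage points, so their variances come out in squared percent (200 and 800, i.e. 0.02 and 0.08 when returns are written as decimals).

```python
import math
from fractions import Fraction

def variance(pmf):
    """V(X) = sum over x of (x - mu)^2 p(x), with mu = E(X)."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

half = Fraction(1, 2)
X = {0: Fraction(1)}
Y = {-1: half, 1: half}
Z = {-100: half, 100: half}
print(variance(X), variance(Y), variance(Z))   # 0 1 10000

# Stock returns in percentage points, probabilities 0.25, 0.50, 0.25.
q = Fraction(1, 4)
A = {30: q, 10: 2 * q, -10: q}
B = {50: q, 10: 2 * q, -30: q}
print(variance(A), variance(B))   # 200 800

# Standard deviations, in percentage points
print(math.sqrt(float(variance(A))), math.sqrt(float(variance(B))))
```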

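An alternative formula for variance. V(X) = E(X²) − [E(X)]².

Proof. Let E(X) = µ. Then

$$\begin{aligned}
V(X) &= E[(X - \mu)^2] \\
&= \sum_x (x - \mu)^2 p(x) \\
&= \sum_x (x^2 - 2\mu x + \mu^2) p(x) \\
&= \sum_x x^2 p(x) - 2\mu \sum_x x\, p(x) + \mu^2 \sum_x p(x) \\
&= E(X^2) - 2\mu^2 + \mu^2 \\
&= E(X^2) - \mu^2 \\
&= E(X^2) - [E(X)]^2,
\end{aligned}$$

where the fifth line uses Σx x p(x) = µ and Σx p(x) = 1.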
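A quick numerical check of the shortcut formula, using the pmf of Example 9 (illustrative sketch with exact fractions): the definition and E(X²) − [E(X)]² agree.

```python
from fractions import Fraction

pmf = {-1: Fraction(2, 10), 0: Fraction(5, 10), 1: Fraction(3, 10)}  # Example 9

mu = sum(x * p for x, p in pmf.items())          # E(X)
ex2 = sum(x ** 2 * p for x, p in pmf.items())    # E(X^2)

by_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())
by_shortcut = ex2 - mu ** 2

print(mu, ex2, by_shortcut)   # 1/10 1/2 49/100
assert by_definition == by_shortcut
```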

The variance of a linear function. Let a, b be two constants; then V(aX + b) = a² · V(X).

Proof. Note that from Corollary 1, we have

$$E(aX + b) = aE(X) + b.$$

Let E(X) = µ. Then

$$\begin{aligned}
V(aX + b) &= E[\{(aX + b) - E(aX + b)\}^2] \\
&= E[(aX + b - a\mu - b)^2] \\
&= E[a^2 (X - \mu)^2] \\
&= a^2 E[(X - \mu)^2] \\
&= a^2 V(X).
\end{aligned}$$

Example 11 Let X denote the number of computers sold, and suppose that the pmf of X is

P (X = 0) = 0.1, P (X = 1) = 0.2, P (X = 2) = 0.3, P (X = 3) = 0.4.

The profit is a function of the number of computers sold:

h(X) = 800X − 900.

What are the variance and SD of the profit h(X)?
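One way to answer Example 11 is to compute V(X) with the shortcut formula and then apply V(aX + b) = a² V(X) with a = 800 and b = −900. The sketch below (hypothetical helper `E`, exact fractions) also computes V(h(X)) directly as a cross-check.

```python
import math
from fractions import Fraction

pmf = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}

def E(f):
    """Expected value of f(X) via Proposition 1."""
    return sum(f(x) * p for x, p in pmf.items())

var_X = E(lambda x: x ** 2) - E(lambda x: x) ** 2   # shortcut formula
var_h = E(lambda x: (800 * x - 900) ** 2) - E(lambda x: 800 * x - 900) ** 2

assert var_h == 800 ** 2 * var_X   # V(aX + b) = a^2 V(X)
sd_h = math.sqrt(float(var_h))
print(var_X, var_h, sd_h)          # 1 640000 800.0
```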
