An Introduction to Probability


4. An Introduction to Probability

In the first scene of Tom Stoppard’s play Rosencrantz and Guildenstern Are Dead, the principal characters are on the road to Elsinore where they have been summoned by the King to attend to his son Hamlet whose mind, it seems, has come undone. It is several days’ journey, and they while away the time playing “pitchpenney” – flipping coins. Rosencrantz has just tossed “heads” for the 48th time in a row, and continues to do so throughout the scene. This turn of events is making both characters profoundly uneasy. They’re right about that: the fix is in. Stoppard or Shakespeare, they will not escape their fate. If you were playing this game, you might be uneasy too. One doesn’t often get to see a law of nature come apart before one’s eyes.¹

Why do we consider these events so very unlikely? After all, a flipped coin could come up “heads” 48 times in a row; no physical law of nature prevents it. And yet, most of us would consider this improbable in the extreme. This chapter is about probability, the branch of mathematics that seeks to understand and quantify such intuitive concepts as “randomness” and “likelihood,” including questions like the one just posed. After an initial discussion of what “likelihood” and “probability” might mean, we will develop the mathematical theory that clarifies the concepts of probability and allows us to make precise quantitative predictions.

As a first step into these issues, let us return to the coin-flipping game.

When a fair coin is tossed, we all have some intuitive idea what will happen, embodied in statements like

• There is a 50% chance that the coin will come up “heads.”

• “Heads” will turn up “with probability 0.50.”

¹ Here’s what a mathematician would say about it. The probability that a flipped coin will turn up “heads” 48 times in a row is p = 3.553 × 10⁻¹⁵: we might expect this to happen about 3 times in a quadrillion (10¹⁵) runs of a game in which a coin is flipped 48 times. If we could execute one of these 48-flip games in a minute, that means we would expect to wait something like 500 million years before observing an “all heads” outcome, although it is possible for this to happen the first time we play the game.
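The figures in this footnote can be checked with a few lines of Python. This is only a sketch: it assumes a fair coin, independent flips, and a 365-day year, and the variable names are ours.

```python
from fractions import Fraction

# Probability that a fair coin shows heads 48 times in a row.
p = Fraction(1, 2) ** 48
print(float(p))                      # about 3.553e-15

# On average one all-heads game turns up in every 1/p games played.
games = 1 / p                        # exactly 2**48
minutes_per_year = 60 * 24 * 365     # assumes one game per minute, nonstop
years = float(games) / minutes_per_year
print(f"{years:.3e} years")          # several hundred million years
```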


In truth, statements like this attempt to predict the future, but in an elusive sort of way. They certainly do not say that “heads” will come up every other time in a series of coin tosses, or that exactly 5 out of 10 tosses will result in “heads.” In fact, if we toss a single coin just once and never play the game again, they tell us nothing. To illustrate, suppose I offered to make a bet with you – my $5 against your $5 – on the outcome of a single coin toss; what would you bet on, “heads” or “tails”? Your intuition insists there is something to the statements made above, but the information they embody cannot be applied to a solitary event. The Laws of Probability begin to make themselves felt only when we consider a large number of repeatable experiments. If we were to repeatedly flip a coin – say a million times – then the fraction of the trials in which “heads” appeared would be very nearly equal to 1/2,

$$\text{(Fraction, heads)} = \frac{\#(\text{heads})}{\#(\text{trials})} = \frac{\#(\text{heads})}{N} \approx 0.5000 \tag{1}$$

as the number of trials N becomes very large. That is how “probabilistic” statements are interpreted. They tell us what must happen in the long run, as we conduct more and more trials.
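A computer can stand in for the coin. The sketch below uses Python's pseudo-random number generator as an idealized fair coin (an assumption, since no real coin is involved); the function name and the seed are arbitrary choices of ours. It prints the fraction of heads in (1) for runs of increasing length.

```python
import random

def fraction_heads(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times; return #(heads)/N."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses

rng = random.Random(2024)   # arbitrary seed, used only for reproducibility
for n in (10, 100, 10_000, 1_000_000):
    print(n, fraction_heads(n, rng))
```

Short runs can stray noticeably from 0.5; the longer runs should settle close to it, though the exact figures depend on the seed.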

Some may object that this explanation of what “probability 1/2” means begs the question, and just shoves the conceptual difficulty from one place to another. What is meant by “in the long run”? Will (fraction, heads) get really close to 1/2 in 100 trials? In a thousand? In a million? How long does it take for the expected long run behavior to emerge? Real world experiments and extensive computer simulations show that a lopsided proportion of “heads” is quite unlikely in 1000 trials, and beyond plausibility in a million trials. For instance, in the eighteenth century the French mathematician Buffon reported the results of his own experiment involving N = 4040 coin tosses; around 1915 the British probabilist Karl Pearson assembled a larger data set that eventually encompassed 24,000 trials. The outcomes are shown in Table 4.1.¹

              Number N       Heads Ratio      Deviation from
              of Tosses      #(Heads)/N       Expected Value 1/2

Buffon          4040           0.5080            0.0080
K. Pearson     12000           0.5016            0.0016
K. Pearson     24000           0.5005            0.0005

Table 4.1.

And yet, it is always possible that 48 or 1000 consecutive tosses will come up “heads,” or that 3/4 of the tosses will, etc. The Laws of Chance differ in a fundamental way from the laws of physics. They can tell us that something is virtually certain to happen, but can never exclude the possibility of a freak occurrence at odds with their “long run” predictions.

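¹ You might wonder what would possess anyone to conduct such a tedious experiment. The best-known story of this kind concerns the mathematician John Kerrich rather than Pearson: interned in Denmark during World War II, Kerrich had a lot of time on his hands and tossed a coin 10,000 times.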


Nevertheless, many enterprises are based on mathematical predictions of what will happen in the long run, relying on extensive real world experience to decide how many trials suffice to produce results in accord with the “long run” predictions. For example, life insurance companies rely on mortality statistics – medical data that yield numerical probabilities such as this one:

The probability that a healthy 30-year-old female U.S. citizen will live to age 75 or more is 0.521.

The company may have thousands of female customers in their early 30’s. The course of each person’s life is regarded as a separate “trial,” just as each coin-flip constituted a “trial” in the previous commentary. And a cohort of several thousand individuals does seem large enough that insurance companies confidently base their rates on the presumption that the fraction of survivors at age 75 in this cohort will be very close to 52%. Despite misgivings insurance executives may have about the possibility of freak occurrences, insurance companies have done quite well for themselves over the centuries since the concept of probability-based insurance was introduced.

The laws of probability seem to operate quite forcefully in the real world.

The complex enterprises of our present civilization – manufacturing, science, medicine, finance, government – require a deep mathematical understanding of probability, the foundation of all statistical analyses. They could hardly exist without it.

Randomness and probability are slippery subjects. For one thing, once you start speaking about what will happen “in the long run” you are edging your way toward a confrontation with the notion of infinity – no one can conduct an infinite series of trials to see if the idea in (1) holds up. For another, our intuition about likelihood or probability is sometimes confused, or just plain wrong. The mathematical theory of probability and random processes, developed over the span of two centuries, reflects a lot of thought about these issues (including the meaning of infinity), and provides clear, computable answers to most questions.

4.1 Counting Problems and Probabilities

The mathematical theory of probability is based on serious counting. To apply it one must consider real world situations that are repeatable – for example: toss a coin 10 times, pull five cards from a shuffled deck, or throw a pair of dice. Then we identify all the possible outcomes, and count them. In any experiment certain outcomes are regarded as successful – for instance if a coin is flipped 10 times a successful outcome might be one in which #(heads) is greater than #(tails); once the meaning of “success” has been specified, we count the successful outcomes too. The mathematical theory interprets the fraction of successful outcomes

$$\Pr(\text{success}) = (\text{fraction of successful outcomes}) = \frac{\#(\text{successful outcomes})}{\#(\text{possible outcomes})} \tag{2}$$

as the mathematical probability of finding a successful outcome when the situation is allowed to play itself out repeatedly. Note that (2) always yields a fraction whose value lies between 0 and 1. If one of your probability calculations yields a result like Pr(success) = 2.14, you know you made a mistake.

At this point the mathematical probability Pr(success) is just some number obtained by counting, without reference to anything that might actually happen in real world experiments. That connection is established through the following frequency interpretation of mathematical probabilities.

Frequency Interpretation of Probabilities. The mathematical probability Pr(success) computed in (2) has the following interpretation. If an experiment is repeated many times, we expect that the observed frequency of successful outcomes

$$(\text{Success Frequency}) = \frac{\#(\text{successes in } N \text{ trials})}{N}$$

will approach the limiting value Pr(success) as N becomes very large:

$$\Pr(\text{success}) = \lim_{N \to \infty} \frac{\#(\text{successes in } N \text{ trials})}{N} \tag{3}$$

The left side of (3) is a purely mathematical construct, while the right side is concerned with the outcomes of real experiments.

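To put it differently, once the number N of trials is very large we expect to find

$$\frac{\#(\text{successes in } N \text{ trials})}{N} \approx \Pr(\text{success}) \qquad \text{or} \qquad \#(\text{successes in } N \text{ trials}) \approx N \times \Pr(\text{success}) \tag{4}$$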

Example 4.1. Later on we will show how to compute the mathematical probability of drawing exactly three Aces when we are dealt five cards from a shuffled deck of 52 cards,

$$\Pr(\#\text{Aces} = 3) = 0.001736$$

What does this mean in practical terms?

(a) How many times would you expect to see this outcome in 5000 dealt hands? In 10,000 hands?

(b) Roughly how many hands of cards would you expect to play before seeing a hand containing exactly three Aces?


Solution: Given the computed probability Pr(#Aces = 3), equation (4) tells us that in 5000 trials we would expect to see something like

N × Pr(#Aces = 3) = 5000(0.001736) ≈ 8.68

such hands. In 10,000 trials we would expect to see twice as many, somewhere between 16 and 20.

The question in (b) is essentially the reverse of that in (a). Now we use (4) to solve for the value of N such that

$$1 = \#(\text{successes in } N \text{ trials}) \approx N \cdot \Pr(\#\text{Aces} = 3)$$

Obviously that happens when

$$N \approx \frac{1}{\Pr(\#\text{Aces} = 3)} = \frac{1}{0.001736} = 576.04$$

so we might expect to receive between 550 and 600 hands before seeing our first hand with exactly three aces.

Of course it is always possible that the very first hand you are dealt contains three aces, or that such a hand will not be seen in the first 2000 deals.
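The arithmetic of Example 4.1 can be reproduced in Python. The counting formula used for Pr(#Aces = 3) below is an assumption on our part (the text defers its derivation to a later section): choose 3 of the 4 aces and 2 of the 48 other cards, out of all possible five-card hands.

```python
from math import comb

# Assumed counting formula: 3 of the 4 aces, 2 of the 48 non-aces,
# divided by the number of all five-card hands.
p_three_aces = comb(4, 3) * comb(48, 2) / comb(52, 5)
print(round(p_three_aces, 6))                         # 0.001736

# Part (a): expected number of such hands is N x Pr, as in (4).
for n_hands in (5000, 10_000):
    print(n_hands, round(n_hands * p_three_aces, 2))

# Part (b): roughly 1/Pr hands are dealt per three-ace hand.
print(round(1 / p_three_aces))                        # 576
```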

Example 4.2. (Probability and Sampling). Suppose you have a large jar containing 5000 plastic beads, 800 of which are red and the rest black.

(a) What is the probability Pr(red) that you will draw a red bead if you reach into the jar and pick one bead at random? What is the probability Pr(black) of picking a black bead?

(b) Consider an experiment in which you pick a bead at random, read its color, and toss it back into the jar before the next drawing. If you repeat this experiment 10,000 times, what fraction of the time do you expect to get a red bead?

(c) If you select 150 beads one-by-one, as in (b), how many red beads would you expect to find in this sample?

What has all this to do with the Basic Sampling Principle discussed in Chapter 1?

Solution: On one try you might pick any of the 5000 beads, therefore #(outcomes) = 5000. Of these 800 are successes, so

$$\Pr(\text{red}) = \frac{\#(\text{successes})}{\#(\text{outcomes})} = \frac{800}{5000} = 0.160$$

(a 16% chance of drawing red). Similarly,

$$\Pr(\text{black}) = 0.840$$

Notice that Pr(red) + Pr(black) = 1. We will have more to say about this “addition law” later on.

In (b) we apply the frequency interpretations (3) and (4) to see that the number of red outcomes in N = 10,000 trials should be about

$$N \times \Pr(\text{red}) = 10{,}000\,(0.160) = 1600$$

We expect to draw a red about 16% of the time in the long run.

In (c) we expect

$$\#(\text{red out of } N = 150) \approx N \times \Pr(\text{red}) = 150\,(0.160) = 24 \text{ beads}$$

On the other hand, the Basic Sampling Principle of Chapter 1 says that if we successively examine N beads using the protocol in (b), the fraction of these that are red

$$(\text{sample fraction red}) = \frac{\#(\text{red in } N \text{ tries})}{N}$$

should be approximately equal to

$$(\text{population fraction red}) = \frac{\#(\text{red in population})}{5000} = \frac{800}{5000} = \Pr(\text{red})$$

The frequency interpretation (4) asserts that these are nearly the same once N becomes sufficiently large. (Here the sample size is fairly large, N = 150). Thus the Sampling Principle is closely tied to the frequency interpretation of probabilities.
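The bead jar of Example 4.2 is easy to imitate on a computer. In the sketch below the beads are numbered 0 to 4999 and the first 800 are taken to be the red ones; that labeling, the function name, and the seed are our own illustrative choices.

```python
import random
from fractions import Fraction

RED, TOTAL = 800, 5000
p_red = Fraction(RED, TOTAL)          # 4/25 = 0.16
p_black = 1 - p_red                   # 21/25 = 0.84

def count_red(n_draws, rng):
    """Draw n_draws beads with replacement; beads 0..799 play the red ones."""
    return sum(1 for _ in range(n_draws) if rng.randrange(TOTAL) < RED)

rng = random.Random(7)                # arbitrary seed for reproducibility
print(float(p_red), float(p_black))   # 0.16 0.84
print(count_red(150, rng), "red in 150 draws; expected about", 150 * p_red)
print(count_red(10_000, rng), "red in 10,000 draws; expected about", 10_000 * p_red)
```

The simulated counts will not hit 24 and 1600 exactly, but the sample fraction in the longer run should land close to the population fraction 0.16.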

We illustrate the calculation of mathematical probabilities by analyzing a simple game of chance, a favorite of sidewalk hustlers. The frequency interpretation (3) yields a strategy for playing (or not playing) the game.

Example 4.3. (The Hustler’s Game). A die (singular of dice) is a small cube with spots marked on its faces to represent the integers {1, 2, . . . , 6}. The game consists of throwing a pair of dice, which we shall regard as the same as throwing a single die twice in succession. The hustler offers you a wager: if the sum of the face values is 7 or 11, he will pay you $3; otherwise you pay him $1. Should you play the game?

Discussion: We examine two questions:

• What is the “probability” of a successful (winning) outcome?

• If you played the game repeatedly, how often would you win? Would you make or lose money in the long run?

To answer the first we have to describe and count all the possible outcomes of the game, and then count the number of successful outcomes among them. The diagram below shows all the possibilities, listed as pairs of numbers (m, n) where the first entry is the face value of die #1, and the second is the face value of die #2. The successful outcomes are marked by “boxes,” e.g. 5 + 2 = 7 and 6 + 5 = 11. It is evident that there are 36 possible outcomes. Note that we regard (1,4) as a different outcome from (4,1); this is a subtle point we shall examine later.

(The boxed outcomes, those with sum 7 or 11, are shown here in [brackets].)

 1,1     1,2     1,3     1,4     1,5    [1,6]
 2,1     2,2     2,3     2,4    [2,5]    2,6
 3,1     3,2     3,3    [3,4]    3,5     3,6
 4,1     4,2    [4,3]    4,4     4,5     4,6
 5,1    [5,2]    5,3     5,4     5,5    [5,6]
[6,1]    6,2     6,3     6,4    [6,5]    6,6

By counting boxes we see that there are 8 successful outcomes, so the “probability” of a “successful” outcome should be assigned in the following way

$$\Pr(\text{success}) = \frac{\#(\text{successful outcomes})}{\#(\text{possible outcomes})} = \frac{8}{36} = 0.2222$$

Our interpretation of this is: If we played this game many times, we would expect to see a successful outcome roughly

8 times out of 36 plays,

which is the same as saying we expect a successful outcome 22.22 times out of 100, or 222.2 times out of 1000, etc. That answers the question about probabilities. You can see that listing and counting outcomes is the heart of the matter.

What about the wager? One might analyze that as follows. In the long run you will win $3 in 8 out of 36 plays; in the other 28 plays you lose $1. Thus, on the average, in every 36 plays you will win 8 × $3 = $24 and lose 28 × $1 = $28. The net result is a consistent loss of about $4 in every thirty-six rounds of the game, or about $4/36 ≈ $0.11 per game.

That shouldn’t surprise you; hustlers are out to make money.
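The listing and counting in Example 4.3 can be handed to a computer. The sketch below enumerates the 36 ordered outcomes, counts the winners, and averages the payoffs; the variable names are ours.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (m, n) of throwing two dice.
outcomes = list(product(range(1, 7), repeat=2))
wins = [(m, n) for m, n in outcomes if m + n in (7, 11)]

p_win = Fraction(len(wins), len(outcomes))
print(len(outcomes), len(wins), p_win)       # 36 8 2/9

# Average result per game: win $3 with probability p_win, lose $1 otherwise.
net_per_game = 3 * p_win - 1 * (1 - p_win)
print(net_per_game, float(net_per_game))     # -1/9, about -0.11
```

Because (1,4) and (4,1) are listed separately, every one of the 36 entries is equally likely, which is what makes the simple ratio 8/36 legitimate.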

Here’s something you might ask yourself about this game.

Example 4.4. How much would the hustler have to offer you to make this a fair game, one in which you would break even in the long run?

Solution: It’s not hard to answer this by re-examining what we did above. Suppose the amount the hustler puts up is $A; you still put up $1. Then in the long run you win $A in 8 out of 36 games and lose $1 in the other 28. Averaged over many games, your net winnings in every 36 games will be about 8(A) − 28(1) dollars. You will neither win nor lose money in the long run if 8A − 28 = 0 or A = 28/8 = $3.50.
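As a quick check on this solution, the break-even condition 8A − 28 = 0 can be evaluated exactly. The helper function and its parameter names below are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

def net_per_36_games(payout, wins=8, losses=28, stake=1):
    """Average net winnings over 36 games: win `payout` 8 times, lose `stake` 28 times."""
    return wins * payout - losses * stake

fair_payout = Fraction(28, 8)            # solves 8A - 28 = 0
print(float(fair_payout))                # 3.5
print(net_per_36_games(3))               # -4 at the hustler's $3 offer
print(net_per_36_games(fair_payout))     # 0 at the break-even payout
```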


A Discussion Problem

A deck of 52 playing cards contains four aces. If I stand in front of a class with a shuffled deck in hand, I might pose the following questions to illustrate the counting aspects of probability as well as the possible meanings of an “outcome.”

• If I turn up the top card, what is the probability that card will be the ace of hearts A♥?

• If I pull a card from the middle of the deck, what is the probability that card will be the ace of hearts A♥?

• If I pull a card from the middle of the deck, what is the probability that card will be the queen of diamonds Q♦?

• If I pull a card from anywhere in the deck, what is the probability that card will be one of the four aces?

The answer in the first three cases is Pr = 1/52 = 0.01923. There are 52 possible card choices, only one of which is deemed a success. In the last case four of the possible choices are “successes,” so Pr(an ace) = 4/52 = 0.07692.

It doesn’t matter where in the deck the card came from.

• I turn over the top card, showing it to the class. It is the 10 of clubs 10♣. If I set it aside and then reveal the top card from the remaining deck, what is the probability it will be the A♥? What is the probability it will be one of the 13 hearts A♥, K♥, Q♥, J♥, 10♥, . . . , 3♥, 2♥? What is the probability it will be a club?

Answer: Pr(A♥) = 1/51 = 0.019608, because there are 51 cards in the remaining deck, just one of which is the A♥. Since the revealed card was not a heart, there are 13 hearts in the remaining deck of 51 cards, so Pr(any heart) = 13/51 = 0.25490. In the last case, there are only 12 clubs in the remaining deck, so Pr(any club) = 12/51 = 0.23529.
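The answers so far all come from the same recipe, count the successes and divide by the number of possible cards, so they can be checked mechanically. The representation of a card as a (rank, suit) pair and the helper `pr` are our own choices for this sketch.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(RANKS, SUITS))        # 52 (rank, suit) pairs

def pr(success, cards):
    """Pr(success) = #(successful outcomes) / #(possible outcomes)."""
    return Fraction(sum(1 for c in cards if success(c)), len(cards))

print(pr(lambda c: c == ("A", "hearts"), deck))        # 1/52
print(pr(lambda c: c[0] == "A", deck))                 # 1/13, i.e. 4/52

# The 10 of clubs has been revealed and set aside: 51 cards remain.
remaining = [c for c in deck if c != ("10", "clubs")]
print(pr(lambda c: c == ("A", "hearts"), remaining))   # 1/51
print(pr(lambda c: c[1] == "hearts", remaining))       # 13/51
print(pr(lambda c: c[1] == "clubs", remaining))        # 4/17, i.e. 12/51
```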

• This time I take the top card from the deck without revealing its face value, and set it aside on the table. Then I turn up the top card from the remaining deck. What is the probability that the revealed card is the A♥?

A common response is that Pr(A♥) = 1/51, but is it? Others claim that Pr = 1/52 as with the unaltered deck. You decide.¹

¹ Here’s a suggestion. Suppose I put the top card on the table and place the remaining deck right next to it on the table top. Would that change Pr(A♥)? Suppose I set aside the top card and place the remaining deck on top of it. Would that change the probability?

Counting Techniques, Induction Arguments, and the Divide and Conquer Process

We now turn to a systematic development of the ideas sketched above. Since counting is the basis of everything, we begin by examining some basic counting problems. Each is simple enough by itself, but taking them all together it’s not so easy to recognize the underlying pattern. In the rest of Section 4.1 we will show that most counting problems reduce to a small number of prototypes. Then in Section 4.2 we will see how these counting problems become the basis of all probability calculations. We start with some notation common to all discussions of probability and statistics.

Notation

The following notation will be used throughout this chapter. If n is a nonnegative integer we make the following definitions.

1. The “n factorial” symbol n! is defined to be the product of all integers beginning with 1 and ending with n:

(5) n! = (n)(n − 1)(n − 2) · . . . · (3) · (2) · (1)

On a Sharp calculator the keystrokes n 2ndF 4 = produce n!. Note the fine print “n!” above the 4 key.

Thus, for example, 5! = (5)(4)(3)(2)(1) = 120. In some formulas it is very handy to allow n = 0. Our formula (5) doesn’t make sense then, so 0! must be defined by decree:

0! = 1 (by definition)

All calculators are equipped to evaluate n! if n is not too large, but the values of n! increase in size very rapidly. Mine will just manage 69!; a call for 70! produces an error message due to “overflow.” The keystrokes for n = 12 are

12 2ndF 4 =

and the outcome is 12! = 479,001,600.

Using the notation n! we define the following symbol that combines two integers n and k with 0 ≤ k ≤ n:

2. The “n step k symbol” nPk is defined to be the product of the first k integers in descending order, starting from n:

$$ {}_{n}P_{k} = (n)(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!} \tag{6}$$

This expression appears quite often in counting permutations, and we will see a lot of it in this chapter. The Sharp calculator produces nPk via the keystrokes n 2ndF 6 k =. Note that the fine print above the 6 key reads “nPr”.


As an example, the product of the first 15 integers descending from n = 100 is 100P15 = (100)(99) · · · (87)(86) = 3.3128 × 10²⁹. Do you understand why the 15th term in this product is 86 and not 85?
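For readers who prefer a computer to a calculator, Python's standard `math` module provides `factorial` and (from Python 3.8 on) `perm`, which play the roles of n! and nPk. The sketch below is one way to check the values quoted so far; Python integers are not limited in size, so 70! and beyond cause no overflow.

```python
from math import factorial, perm

print(factorial(5))        # 120
print(factorial(0))        # 1, matching the convention 0! = 1
print(factorial(12))       # 479001600

# nPk two ways: math.perm, and the quotient n!/(n-k)! from definition (6).
n, k = 100, 15
print(perm(n, k) == factorial(n) // factorial(n - k))   # True
print(f"{perm(n, k):.4e}")                              # scientific notation

# The k factors run from n down to n-k+1, so the 15th factor is 86.
factors = list(range(n, n - k, -1))
print(len(factors), factors[-1])                        # 15 86
```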

Finally we define a symbol that pervades all discussions of probability.

3. The “n choose k symbol” $\binom{n}{k}$ is defined to be:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} \tag{7}$$

In some accounts this symbol is written nCk instead of $\binom{n}{k}$, but we will always use $\binom{n}{k}$ in these notes. The keystrokes for finding nCr on the calculator are n 2ndF 5 k =. Note the “nCr” above the 5 key.

These symbols are collectively known as “binomial coefficients.” Generally you will find $\binom{n}{k}$ using the calculator, but sometimes in algebraic computations you must deal with definition (7). Then it is useful to note that there is an enormous amount of cancellation between terms, as in the following example, where the factors (3)(2)(1) cancel from top and bottom:

$$\binom{5}{3} = \frac{5!}{3!\,2!} = \frac{(5)(4)(3)(2)(1)}{(3)(2)(1)\cdot(2)(1)} = \frac{5\cdot 4}{2\cdot 1} = \frac{20}{2} = 10$$

Some calculators take advantage of this to avoid dividing two humongous numbers, which might overload the memory. Nevertheless, the original definition (7) reflects the way these numbers arise in practice. It is also useful in algebra. For instance, a basic symmetry property of binomial coefficients

$$\binom{n}{k} = \binom{n}{n-k}$$

is obvious from (7), but less so if we compare these numbers after doing the cancellations.

Example 4.5. Can your calculator find the value of $\binom{1000}{998}$? Mine crashes. Can you instead find the value of this binomial coefficient using cancellation?

Solution: By hand, cancelling the factors (998)(997) · · · (2)(1) from top and bottom,

$$\binom{1000}{998} = \frac{(1000)(999)(998)\cdots(2)(1)}{(2)(1)\cdot(998)\cdots(2)(1)} = \frac{999{,}000}{2} = 499{,}500$$

Later on we will explain why $\binom{n}{k}$ is referred to as the “n choose k symbol.” For the moment you only need to know how the symbols nPk and nCk = $\binom{n}{k}$ are defined, and be able to calculate them.
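The same values can be checked in Python, whose `math.comb` (Python 3.8 and later) computes binomial coefficients with exact integer arithmetic. The helper `choose` below is our own direct transcription of definition (7), included only for comparison.

```python
from math import comb, factorial

def choose(n, k):
    """Binomial coefficient straight from definition (7): n! / (k! (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(choose(5, 3), comb(5, 3))          # 10 10
print(comb(1000, 998))                   # 499500
print(comb(1000, 998) == comb(1000, 2))  # True: the symmetry property
```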

The symbols n!, nPk, and $\binom{n}{k}$ = nCk are standard in mathematics; the next bit of shorthand notation is also common, and will be used systematically in this chapter. If (· · · ) indicates some set of objects, the symbol #(· · · ) stands for the number of objects in this set. Thus,

#(even integers between 1 and 9) = #{2, 4, 6, 8} = 4.

That’s it for notation.

Mathematical Induction: “Bootstrap Logic”

As we will see, many counting methods are based on “inductive arguments” in which we use what we have learned at level n to determine what must happen at the next level n + 1, pulling ourselves up by our bootstraps from one level to the next. Here we provide an informal introduction to inductive arguments, which will turn up repeatedly throughout this Section. Let’s start with a simple counting problem.

How many five-letter words are there?

This problem is the prototype for a host of related questions. For example, once we see how to count five-letter words little extra effort is required to count ten-letter words, etc.

We shall view this word-counting problem as a game with the following rules. We start with an “empty list”: five empty slots

[ ]  [ ]  [ ]  [ ]  [ ]
 1    2    3    4    5

To form a “code word” we may use any letter of the alphabet {A, B, C, · · · , Z} in filling each slot. The same letter may be used more than once; the resulting five-letter “word” need not make sense in English, or even be pronounceable.

Thus, the following “words” are allowed under our rules

water mxpcz tttut

The order in which letters appear is important, so the following symbol strings represent different words

water artew

even though they contain the same letters.

Instead of attacking the five-letter word problem directly, we start with the simplest version of this problem and work our way up through words of increasing length.

1. How many one-letter words ∗ are there?

That’s easy. Twenty six letters, one slot to fill: there are 26 possible “words” of length one.

2. How many two-letter words ∗ ∗ are there?

There are 26 ways to fill the first slot. Suppose we have placed an “A” there, so our partially filled list looks like this:

a ∗

There are 26 ways to fill the second slot, so we get 26 words beginning with “A.” But, obviously, there are also 26 words beginning with “B,” etc. Thus we can employ a “divide and conquer” strategy to count all two-letter words by dividing them into 26 groups:

Group 1:    a ∗    26 words in this group
Group 2:    b ∗    26 words in this group
  ...       ...    ...
Group 26:   z ∗    26 words in this group

There are 26 groups in all, with 26 words in each group, so the total number of two-letter words is 26 × 26 = 26² = 676.

3. How many three-letter words are there?

We have to fill three slots. But we have just counted all the ways to fill the last two slots: there are 26² = 676 ways to do that. So, we apply the divide-and-conquer principle again, breaking the set of words into groups according to their first letter. For instance, if the word has the general form a ∗ ∗, there are 26² ways to fill in the last two spaces, and thus

There are 26² words of the form a ∗ ∗

However, the same is true for any choice of the first letter: we get 26² words starting with that letter. Thus we can group all three-letter words into 26 groups as follows:

a ∗ ∗    26² words like this
b ∗ ∗    26² words like this
 ...     ...
z ∗ ∗    26² words like this

Once we know how many groups there are (26), and how many words in each group (26² = 676), we know how many words there are in all:

#(3-letter words) = #(groups) × #(words per group)
                  = 26 × 26²
                  = 26³
                  = 17,576

Figure 4.1. The general counting problem. There are k slots to fill; entries for each slot are selected from different bags containing markers. The number of markers, or their type, may vary from one bag to another.

A pattern is beginning to emerge. It’s not hard to apply this reasoning to 4-letter words, 5-letter words, etc. to see that

Word Length     Number of Words

    1           26¹ = 26
    2           26² = 676
    3           26³ = 17,576
    4           26⁴ = 456,976
   ...          ...
    k           26^k
   ...          ...

At each step we figure out what happens using the answer in the previous step. Once such an “inductive” process gets started, there is no stopping it; the formula

#(words of length k) = 26^k

is valid for all integers k = 1, 2, 3, . . . In our original problem, the word length was k = 5 so

#(5-letter words) = 26⁵ = 11,881,376.

We have solved not only that problem, but many others as well. Our solution may seem roundabout, but this step-by-step “inductive” approach, pulling ourselves up by our bootstraps at each step, does the job quite nicely.
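The word counts above are small enough, at least for short words, to verify by brute force. The sketch below compares the formula 26^k with an explicit listing produced by `itertools.product`, and checks the inductive step directly; the function name is ours.

```python
from itertools import product
from string import ascii_lowercase

def count_words(length, alphabet=ascii_lowercase):
    """Number of words of the given length: (alphabet size) ** length."""
    return len(alphabet) ** length

for k in range(1, 6):
    print(k, count_words(k))       # 26, 676, 17576, 456976, 11881376

# Brute-force check for short words: list every word and count them.
for k in (1, 2, 3):
    assert sum(1 for _ in product(ascii_lowercase, repeat=k)) == count_words(k)

# The inductive step: 26 groups (one per first letter), each holding
# as many words as there are words one letter shorter.
assert count_words(5) == 26 * count_words(4)
```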

The Basic Counting Principle

The same technique can be used to solve a more general counting problem, illustrated in Figure 4.1. There we have k slots to fill. Entries for each slot are drawn from bags containing markers. The new wrinkle to the game is that each bag may contain a different number of objects:

N₁ markers available to place in Slot 1
N₂ markers available to place in Slot 2
 ...
Nₖ markers available to place in Slot k,

as indicated in the figure. The bags may even contain different types of objects, say numbers in Bag #1, names in Bag #2, etc.

Question: How many different lists can be formed this way?

The inductive process used above again provides the answer. Take a look at the initial steps shown in Figure 4.2. Can you see for yourself that the pattern emerges as shown there? Do you understand why the divide-and-conquer strategy works, and yields the answers shown?

Figure 4.2. Counting the solutions to the general fill-the-slots game. First we see what happens when there is just one slot (k = 1), then when there are two slots (k = 2), etc. If we know the answer for lists of a certain size, we can use this to find the answer for lists containing one more slot.

List Length      Number of markers per slot      Number of Possible Lists

One slot         N₁                              N₁
Two slots        N₁, N₂                          N₁ · N₂
Three slots      N₁, N₂, N₃                      N₁ · N₂ · N₃
Four slots       N₁, N₂, N₃, N₄                  N₁ · N₂ · N₃ · N₄
...              ...                             ...

We can illustrate by examining what happens next, when there are five slots. Just as with the word counting, we have

#(lists; 5 slots) = #(ways to fill last slot) × #(ways to fill first 4 slots)

We already know that there are N₁ · N₂ · N₃ · N₄ ways to fill the first four slots (Figure 4.2), so we get

#(lists; 5 slots) = (N₁ · N₂ · N₃ · N₄) × (N₅)
                  = N₁ · N₂ · N₃ · N₄ · N₅

Continuing this way we may count the lists no matter how many slots there are. We end up with the Basic Counting Principle stated below.

Basic Counting Principle. If k slots are to be filled using the protocol shown in Figure 4.1:

slot #:        1     2     3     · · ·    k
# markers:     N₁    N₂    N₃    · · ·    Nₖ

then the number of distinct lists we can create is

(8)    #(lists; k slots) = N₁ · N₂ · N₃ · . . . · Nₖ
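The principle amounts to a single multiplication, which the sketch below compares against an explicit listing of every possible list. The three bags (digits, names, colors) are made-up examples, and `count_lists` is a hypothetical helper name; `math.prod` requires Python 3.8 or later.

```python
from itertools import product
from math import prod

def count_lists(bag_sizes):
    """Basic Counting Principle: multiply the number of markers per slot."""
    return prod(bag_sizes)

# Hypothetical bags for a three-slot list: a digit, a name, a color.
bags = [range(10), ["Ann", "Bob", "Cy"], ["red", "blue"]]
sizes = [len(bag) for bag in bags]

print(count_lists(sizes))                   # 10 * 3 * 2 = 60
print(sum(1 for _ in product(*bags)))       # 60 again, by listing every list
```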

Here is a simple illustration.

Example 4.6. An old-style California license plate, circa 1964, looked like this:

NDG 854 ,

three letters followed by three numbers. How many such license plates could there be? How many automobiles could have been registered in California in 1964?

Solution: A license plate is a list with 6 consecutive entries, the first three being letters and the last three numerals 0, 1, 2, . . . , 9. Repeated letters or numerals are allowed. Here we have an example of the game described in the Basic Counting Principle. The number of choices for symbols in each slot is indicated below:

slot #:        1     2     3     4     5     6
# choices:     26    26    26    10    10    10

The number of lists (license plates) we can create is therefore

#(license plates) = 26 × 26 × 26 × 10 × 10 × 10
                  = 17,576,000

There cannot be more registered automobiles than there are plate numbers, so this is an upper bound on the number of registered vehicles in 1964. The State Division of Motor Vehicles (DMV) in its infinite wisdom forbade the use of certain embarrassing letter combinations in the first three slots. Each excluded combination knocked out 1000 possible plate numbers. You might enjoy making a list of all the three-letter prefixes that could offend blue-noses in the DMV. Then figure out how many possible plate numbers this would exclude, and use this to revise the estimate of the number of registered vehicles in 1964.

Our count of plates gives an estimate for the number of registered autos. You might want to know that California plate numbers are issued forever and are not reassigned even if the original auto is junked. When the State finally ran out of license plates to assign in the late 1960’s, they issued a new series of license plates of the form 854 NDG. How many new plates became available? This series was exhausted by 1980. What do you think they did then?

Making Lists: Describing Outcomes as Lists

A list is created by drawing entries from a fixed supply of markers or objects. In most of this section we will consider lists whose entries are drawn from a supply consisting of n distinct objects; the nature of the objects does not matter as long as they can be distinguished.¹ If the objects in the “alphabet” used to fill list entries are not already numerals, we can still strip the creation of lists to its essentials by thinking of the objects as markers labeled with the numerals 1, 2, 3, . . . , n:

1   2   3   · · ·   n      (n distinct numbered markers)

To create a list we start with several blank slots

        [ ]   [ ]   [ ]   · · ·   [ ]
slot:   #1    #2    #3            #k ,

the length of the list being denoted by k. Fill the slots from left to right, selecting markers one-by-one and recording the number on the marker in the appropriate spot in the list. Here is a typical list of length k = 5 that might be drawn from a supply of n = 20 markers:

3   17   6   11   18 .

There are two basic ways to draw markers from the pool:

(i) select markers with replacement (repeats allowed) (ii) select markers without replacement (no repeats)

In selection with replacement we choose a marker, record its value, and throw the marker back into the pool before making the next choice; all markers remain available each time we select an entry. In creating lists without replacement we choose a marker, record its value, and set the marker aside; it cannot be used again. Thus we can create two types of lists:

ordered lists, selected with replacement

ordered lists, selected without replacement
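The two ways of drawing markers can be imitated in a few lines of Python. This is only an illustrative sketch: the pool size n = 20 and list length k = 5 match the sample list shown earlier, while the seed and the variable names are assumptions made for the example.

```python
import random

# Hypothetical pool: markers labeled 1..n (n = 20, k = 5 as in the sample list).
n, k = 20, 5
pool = list(range(1, n + 1))

rng = random.Random(0)  # fixed seed only so that reruns give the same lists

# With replacement: every marker stays available, so repeats may occur.
with_replacement = rng.choices(pool, k=k)

# Without replacement: a drawn marker is set aside, so all entries are distinct.
without_replacement = rng.sample(pool, k)

print(with_replacement)
print(without_replacement)
```

A list drawn with replacement may or may not contain a repeat on any given run; a list drawn without replacement never does.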

¹ We will say more about the issue of “distinct objects” later on. For the moment we just give an example of “objects not distinct.” Suppose a jar contains 100 balls, 94 of them red and 6 black. Draw 5 different balls in succession and list their colors. Almost all lists produced in this game will read rrrrr; because the 94 red balls are indistinguishable the number of outcomes (color lists) in this game will be very small, far fewer than if the balls were distinguishable (say by having id numbers from 1 to 100 written on them). Different counting methods are needed to deal with indistinguishable objects.

We call them ordered lists because the order in which the entries appear makes a difference. The following lists (of length k = 4) have the same entries in different order

1 5 3 9        1 9 5 3

and are regarded as different “ordered lists.” We will discuss “unordered lists” in a later paragraph.

Our problem is to count how many lists we can make, selecting with (or without) replacement. Here are some examples. The Basic Counting Principle (8) will be used repeatedly. In each example we must decide what type of list we are talking about, and count the possibilities.

Example 4.7. Why would it be hard to guess a valid Credit Card number? How many possible credit cards could be issued? If you wrote down a 16-digit number at random, what would be the probability that it is one of the ≈ 50 billion valid credit card numbers ever issued?

Solution: A typical credit card number

0 2 3 0 - 1 7 3 9 - 9 2 8 6 - 7 3 1 5

is a list of length 16 with entries selected from the set of ten numerals {0, 1, 2, . . . , 9}. Any numeral can be placed in the first slot; the same integers are available to fill the second slot, etc. Therefore these lists are filled by selection with replacement from the pool of ten numerals. The order of the entries obviously makes a difference. Thus we are dealing with ordered lists of length k = 16, selected with replacement from a pool of n = 10 objects.

We count the possibilities using the Basic Counting Principle (8).

The situation is shown below:

slot# : 1 2 3 4 · · · 15 16

↑ ↑ ↑ ↑ ↑ ↑

#choices: 10 10 10 10 10 10

We have 16 slots and N1 = N2 = · · · = N16 = 10 possible entries in each slot, so the number of possible lists is

\[ \#(\text{lists}) = \underbrace{10 \times 10 \times \cdots \times 10}_{16\ \text{copies}} = 1 \times 10^{16} = 10{,}000 \text{ trillion card numbers.} \]

That’s a lot of potential card numbers. If you pick a number at random there are 10¹⁶ possible outcomes. The successful outcomes are the roughly 50 × 10⁹ cards that have been issued so far, so the probability of a successful guess is

\[ \Pr(\text{success}) = \frac{\#(\text{successes})}{\#(\text{outcomes})} = \frac{50 \times 10^{9}}{1 \times 10^{16}} = 5 \times 10^{-6} \]

By our frequency interpretation (4) you would have to make something like 200,000 attempts to come up with a valid card number. And even then, you’d have no way to know it was a valid number.
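The arithmetic of Example 4.7 can be checked with exact fractions. The sketch below assumes only the figures quoted in the example (ten numerals, sixteen slots, roughly 50 billion issued cards); the variable names are illustrative.

```python
from fractions import Fraction

digits = 10           # numerals 0-9 available in every slot
length = 16           # slots in a card number
issued = 50 * 10**9   # the example's rough figure for cards issued so far

possible = digits ** length              # ordered lists, selected with replacement
p_success = Fraction(issued, possible)   # #(successes) / #(outcomes)
expected_attempts = 1 / p_success        # reciprocal of the success probability

print(possible)           # 10000000000000000
print(float(p_success))   # 5e-06
print(expected_attempts)  # 200000
```

Working with Fraction avoids floating-point rounding, so the reciprocal comes out as exactly 200,000.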

Earlier we discussed ways of counting all the 5-letter words, or in fact words of any length.

#(words of length k) = 26^k for k = 1, 2, . . .

If we vary the problem, allowing either a letter A,...,Z or a numeral 0, 1, 2, . . . , 9 in each list entry, then 36 symbols would be available at each spot and the number of lists we could make with this enlarged “alphabet” would be 36^k.

These are all examples of ordered lists whose entries are selected with replacement because each of the letters in our alphabet of symbols is available at each spot in the list. Using the Basic Counting Principle in the same way we get a general formula for counting lists of this type.

Ordered Lists: Selected with Replacement. If we form ordered lists of length k, selecting entries with replacement from a pool of n distinct objects, then the number of lists we can form this way is

\[ \#\begin{pmatrix}\text{ordered lists; with replacement}\\ \text{length} = k;\ \text{object pool} = n\end{pmatrix} = n^k \tag{9} \]

To see this, consider the number of ways each slot can be filled:

slot# :       1    2    3    · · ·    k
              ↑    ↑    ↑             ↑
# entries:    n    n    n    · · ·    n

By the Basic Counting Principle, the number of lists is

\[ \#(\text{lists}) = \underbrace{n \times n \times n \times \cdots \times n}_{k\ \text{times}} = n^k , \]

as stated in (9).
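For small pools, formula (9) can be confirmed by brute force: write out every list and count. The following is a minimal sketch; the helper name count_with_replacement and the test cases are assumptions chosen for illustration.

```python
from itertools import product

def count_with_replacement(n, k):
    """Count ordered lists of length k over markers 1..n by listing them all."""
    markers = range(1, n + 1)
    return sum(1 for _ in product(markers, repeat=k))

# Compare the brute-force count with n**k for a few small cases.
for n, k in [(2, 3), (3, 2), (10, 4), (26, 3)]:
    assert count_with_replacement(n, k) == n ** k

# For larger cases the formula is the practical route:
print(26 ** 5)  # 11881376 five-letter words
print(36 ** 5)  # 60466176 five-symbol lists over letters and numerals
```

Enumeration becomes impractical quickly, which is exactly why a closed formula such as n^k is useful.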

We now examine a problem in which list entries are selected without replacement.

Example 4.8. Here we consider four-letter words created subject to certain restrictions.

(a) How many four-letter words have no repeated letters?

(b) How prevalent are the four-letter words without repeats? If all possible four-letter words were written on slips of paper and placed in a box and you pick one at random, what would be the probability of drawing a word without repeats?

(c) How many four-letter words do not contain the letter “Z”? How many contain at least one copy of this letter? What would be the probability of drawing a word containing at least one copy of the letter “Z”?

Solution: We are dealing with lists such that

type: ordered

length: k = 4

object pool: n = 26 (the letters of the alphabet)

If words were formed without any restrictions, we have seen that there would be 26⁴ = 456,976 of them. If repeats are forbidden, then once a letter has been selected to fill a slot it is no longer available to fill any later slot. Therefore, the four-letter words without repeats are precisely the ordered lists obtained by selecting letters without replacement.

\[ \#(\text{four-letter words, no repeats}) = \#\begin{pmatrix}\text{ordered lists selected without replacement;}\\ \text{length: } k = 4,\ \text{object pool: } n = 26\end{pmatrix} \]

To determine how many such lists there are, consider a blank list and apply the Basic Counting Principle. There are n = 26 ways to fill the first slot:

∗

↑

26 ways to select

At the second step we have only n − 1 = 25 letters left to choose from for the second spot.

∗ ∗

↑ ↑

26 25 ways to select

At the third step we have n − 2 = 24 letters left to choose from:

∗ ∗ ∗

↑ ↑ ↑

26 25 24 ways to select

Continuing one more step we see that the number of ways to fill each slot is

∗ ∗ ∗ ∗

↑ ↑ ↑ ↑

26 25 24 23 ways to select

By the Basic Counting Principle, the number of words having no repeated letters is

#(four-letter words, no repeats) = (26)(25)(24)(23) = 358,800 .

That answers (a).

The fraction of all words that are free of repeated letters is

\[ \text{Fraction} = \frac{\#(\text{words, no repeats})}{\#(\text{all four-letter words})} = \frac{358{,}800}{456{,}976} = 0.7852 \]

This is precisely the probability of drawing a word without repeats, so there is a 78.5% chance of getting a repeat-free word; the other 21.5% have at least one repeated letter.

If a word contains no copies of the letter “Z” its entries must be drawn from the reduced alphabet of 25 letters A, ... ,Y. There are 25⁴ = 390,625 such words. Every other four-letter word contains at least one “Z”, so

#(words, at least one Z) = 26⁴ − 25⁴ = 66,351

The probability of drawing such a word is therefore

\[ \Pr(\#Z \ge 1) = \frac{\#(\text{successes})}{\#(\text{outcomes})} = \frac{26^4 - 25^4}{26^4} \approx 0.1452 \]
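Since there are fewer than half a million four-letter words, all three counts in Example 4.8 can be checked by listing every word. This is a sketch under the example's own assumptions (a 26-letter alphabet, every four-letter string counted as a word); the variable names are illustrative.

```python
from itertools import product
from string import ascii_uppercase

total = no_repeats = with_z = 0
for word in product(ascii_uppercase, repeat=4):  # every ordered list of 4 letters
    total += 1
    if len(set(word)) == 4:   # four distinct letters, so no repeats
        no_repeats += 1
    if "Z" in word:           # at least one copy of Z
        with_z += 1

print(total)                         # 456976
print(no_repeats)                    # 358800
print(with_z)                        # 66351
print(round(no_repeats / total, 4))  # 0.7852
print(round(with_z / total, 4))      # 0.1452
```

The brute-force counts agree with (26)(25)(24)(23) and with 26⁴ − 25⁴.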

The same idea gives a general formula for counting lists whose entries are selected without replacement.

Ordered Lists: Selected without Replacement.

If we form ordered lists of length k, selecting entries without replacement from a pool of n distinct objects, the total number of lists we can form is

\[ \#\begin{pmatrix}\text{ordered; without replacement}\\ \text{length} = k;\ \text{object pool} = n\end{pmatrix} = (n)\cdot(n-1)\cdots(n-k+1) \tag{10} \]

Notice that we must have k ≤ n, otherwise we would run out of markers before all list entries have been filled. Notice also that the count (10) is just the “n step k” symbol introduced in (6), and can be written as nPk.

Formula (10) also follows from the Basic Counting Principle. The number of objects available to fill each slot decreases steadily, as shown below:

slot# :       1    2        3        · · ·    k
              ↑    ↑        ↑                 ↑
# choices:    n    n − 1    n − 2    · · ·    n − k + 1

Do you understand why the number of objects available to fill the last slot is n − k + 1, rather than n − k?

Remarks: Getting this right. If you don’t understand why this happens, you might first try it yourself with lists of length k = 5, and n = 10 markers; how many objects are left to fill the fifth slot? One way to deal with general lists is to note that the following pattern is valid for each slot:

(slot #) + #(markers available for this slot) = n + 1

For example, at the first slot this sum is n + 1; at the second slot it is 2 + (n − 1) = n + 1; at the third slot it is 3 + (n − 2) = n + 1; etc. This pattern tells us what must happen at the last slot:

k + #(markers available for k-th slot) = n + 1 .

Subtracting k from both sides of this equality, we get

#(markers available for k-th slot) = (n + 1) − k = n − k + 1

Once we have the diagram worked out, the Basic Counting Principle tells us that

#(lists) = (n)(n − 1)(n − 2) · · · (n − k + 1) ,

and this is the counting formula (10). It is useful to notice that we can rewrite (10) in terms of factorials:

\[ \#\begin{pmatrix}\text{ordered lists,}\\ \text{no repeats}\end{pmatrix} = (n)(n-1)(n-2)\cdots(n-k+1) = \frac{n!}{(n-k)!} \]

You can verify that the two expressions on the right are equal by writing out the factorials and making the obvious cancellations:

\[ \frac{n!}{(n-k)!} = \frac{(n)(n-1)\cdots(n-k+1)\cdot\cancel{(n-k)}\cdot\cancel{(n-k-1)}\cdots\cancel{(3)}\cdot\cancel{(2)}\cdot\cancel{(1)}}{\cancel{(n-k)}\cdot\cancel{(n-k-1)}\cdots\cancel{(3)}\cdot\cancel{(2)}\cdot\cancel{(1)}} = (n)(n-1)\cdots(n-k+1) = {}_nP_k . \]
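The agreement between the slot-by-slot product in (10) and the factorial quotient can also be checked numerically. The sketch below assumes Python 3.8 or later, where math.perm and math.prod are available; the helper name falling_product is hypothetical.

```python
from math import factorial, perm, prod

def falling_product(n, k):
    """(n)(n-1)...(n-k+1): slot number s has n - s + 1 markers available."""
    return prod(n - slot + 1 for slot in range(1, k + 1))

# The product, the factorial quotient, and the library's nPk agree for all k <= n.
for n in range(12):
    for k in range(n + 1):
        assert falling_product(n, k) == factorial(n) // factorial(n - k) == perm(n, k)

print(falling_product(10, 5))  # 30240
```

The factor n − slot + 1 is the same pattern as (slot #) + #(markers available) = n + 1, which is why the last factor is n − k + 1 rather than n − k.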

As mentioned earlier, the expression n!/(n − k)! = nPk can be evaluated on most Sharp calculators with the keystrokes

n   2ndF   6 [nPr]   k   =

We will use the calculator in the next example.

Example 4.9. A tennis tournament has 20 entrants. At the end prizes are awarded to the five top-ranking players. How many ways could the prizes be awarded?

Solution: A ranking of the top five players is an ordered list of k = 5 names selected from a pool of n = 20 players. List entries correspond to first prize, second prize, etc. so the order in which the names appear makes a big difference. Clearly names must be selected without replacement, since names cannot be repeated. Thus

\[ \#(\text{rankings}) = \#\begin{pmatrix}\text{ordered lists selected without repeats;}\\ \text{length } k = 5;\ \text{object pool } n = 20\end{pmatrix} = \frac{n!}{(n-k)!} = \frac{20!}{15!} = (20)(19)(18)(17)(16) = 1{,}860{,}480 \]

In this example we are choosing 5 players as well as ranking them, so we must use ordered lists. If our objective was merely to pick 5 players in no particular order, then we would have to describe the possible outcomes using an entirely different kind of list.
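The difference between ranking five players and merely picking five can be seen numerically. This is a sketch, not part of the example's solution: the unordered count uses Python's math.comb, which anticipates the counting of unordered lists taken up later in the chapter, and the variable names are illustrative.

```python
from math import comb, perm

entrants, prizes = 20, 5

rankings = perm(entrants, prizes)  # ordered: who takes 1st, 2nd, ..., 5th prize
groups = comb(entrants, prizes)    # unordered: which five players, rank ignored

print(rankings)            # 1860480
print(groups)              # 15504
print(rankings // groups)  # 120
```

Each group of five players can be ranked in 5! = 120 ways, which accounts for the ratio between the two counts.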

Unordered Lists

An unordered list is a bunch of objects arranged in no particular order, selected with or without replacement from a supply of distinct items. If you are at a lunch counter and tell me what items you bought, you are presenting me with an unordered list; if you tell me the order in which those items were selected, then you have produced an ordered list. An ordered list provides extra information that is not available in an unordered list.

We will sometimes use braces {. . .} to indicate unordered lists. Thus,

{a,l,p,h,a} (unordered list of 5 letters)

{3, 5, 7, 9, 11} (unordered list of 5 numerals)

{fries, big Mac, banana, soda, salad} (unordered lunch list)

signify unordered lists. The list {a,l,p,h,a} must have been selected with replacement (repeats allowed), since the letter “a” appears twice. We will continue to write ordered lists in the form a l p h a , using boxes to indicate the available slots.

Ordered vs Unordered Lists. An ordered list such as f o x “collapses” to an unordered list {f,o,x} of the same length if we simply ignore the order of the items in it. (Think of stripping the contents out of the ordered list and dumping them in a paper bag, where they reside in no particular order.) Many ordered lists collapse to the same unordered list; the following diagram shows the six ordered lists that collapse to {f,o,x}

f o x      o x f      x f o

f x o      o f x      x o f

Are you convinced that no possibilities have been overlooked? We will return to the issue of counting unordered lists; right now we just want to provide an example of the “collapsing” process.
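One way to convince yourself is to let a program generate every ordering and then "collapse" each one. In this sketch, sorting the entries stands in for ignoring their order; that representation is an assumption made for the illustration, not notation used in the text.

```python
from itertools import permutations

# Every ordered list that uses each of the letters f, o, x exactly once.
ordered = ["".join(p) for p in permutations("fox")]

# Collapse: forget the order by putting the entries in a canonical (sorted) order.
collapsed = {tuple(sorted(word)) for word in ordered}

print(ordered)    # ['fox', 'fxo', 'ofx', 'oxf', 'xfo', 'xof']
print(collapsed)  # {('f', 'o', 'x')}
```

All six ordered lists land on the single unordered list {f,o,x}.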

Whether a set of objects should be regarded as an ordered or unordered list depends on what you want to do with the information. For example, in many card games you are given five cards drawn from a deck of 52. The cards you get might be regarded in two ways, as either an ordered or an unordered list. To distinguish the two points of view we define:

• A deal: an ordered list of five cards. (We record the cards and the order in which we received them.)

• A hand: an unordered list of five cards. (Which five cards did we get? Ignore the order in which the cards were received.)

In the game of poker, all you care about are the contents of your hand; it doesn’t matter whether the ace came in first or last. In contrast, blackjack players are quite concerned about the order of the cards, because their betting strategy depends on this information.

Once we understand the distinction between ordered and unordered lists, and the two ways we can choose their entries (with or without replacement), it is clear that there are four basic procedures for forming lists by drawing entries from a supply of distinct objects.

ordered list, selected with replacement

ordered list, selected without replacement

unordered list, selected with replacement

unordered list, selected without replacement

For each procedure we want to know how many lists of length k can be drawn from a supply of n distinct objects. The answers are shown in Figure 4.3. We have already dealt with ordered lists, in formulas (9) and (10). The count of unordered lists is more subtle, and will be explained after we give a few more examples. (In these examples we will use the formulas shown in Figure 4.3.)
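For a very small pool the four procedures can be compared by simply listing every outcome. The sketch below assumes a hypothetical pool of n = 4 markers and lists of length k = 2; it counts by enumeration and does not rely on any formula.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

markers = [1, 2, 3, 4]  # hypothetical pool, n = 4
k = 2                   # list length

procedures = {
    "ordered, with replacement": product(markers, repeat=k),
    "ordered, without replacement": permutations(markers, k),
    "unordered, with replacement": combinations_with_replacement(markers, k),
    "unordered, without replacement": combinations(markers, k),
}

# Materialize each enumeration and count its outcomes.
counts = {name: len(list(lists)) for name, lists in procedures.items()}
for name, count in counts.items():
    print(f"{name}: {count}")
```

For this pool the counts come out as 16, 12, 10 and 6; the ordered counts agree with formulas (9) and (10).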

Unordered lists whose entries are selected without repeats turn up all the time in probability, as in the card games just discussed. Here is another familiar game whose outcomes are described by such lists.

Example 4.10. If you buy a Lotto ticket you get to pick 5 numbers from among the integers {1, 2, . . . , 54}.

(a) Should you regard a Lotto entry as a list that is (i) ordered or unordered? (ii) selected with or without replacement?

(b) How many different Lotto tickets are there?

Solution: In Lotto, list entries must be chosen without replacement from among the integers {1, 2, 3, · · · , 54}; you don’t pick the same number twice. The order in which you write down your choices makes no difference to the managers of the lottery, so you may regard the outcome as an unordered list of length k = 5.

The formula for unordered lists given in Figure 4.3 says

\[ \#(\text{unordered lists, selected without replacement}) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \]

List length is k = 5 and there are n = 54 objects to choose from, so the number of possible Lotto tickets is

\[ \#(\text{entries}) = \binom{54}{5} = 3{,}162{,}510 \]
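The Lotto count can be reproduced in a few lines. This sketch assumes Python 3.8 or later for math.comb and math.perm, and also checks the count against the factorial formula and against the number of ordered lists.

```python
from math import comb, factorial, perm

n, k = 54, 5  # 54 numbers to choose from, 5 picked per ticket

tickets = comb(n, k)
print(tickets)  # 3162510

# Same value from the factorial formula n! / (k! (n - k)!).
assert tickets == factorial(n) // (factorial(k) * factorial(n - k))

# Each unordered ticket corresponds to k! = 120 ordered lists without repeats.
assert perm(n, k) == tickets * factorial(k)
```

The last check ties the unordered count back to formula (10): dividing the number of ordered lists by 5! gives the number of tickets.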