### Introduction to Probability

Given a fair coin, what can we expect to be the frequency of tails in a sequence of 10 coin tosses? Tossing a coin is an example of a chance experiment, namely a process which results in one and only one outcome from a set of mutually exclusive outcomes, where the outcomes cannot be predicted with certainty. A chance experiment can be real or conceptual.

Other examples of chance experiments are: throwing a fair die 10 times and recording the number of times a prime number (namely 2, 3 or 5) is obtained; selecting 5 students at random and recording whether they are male or female; or randomly drawing a sample of voters from the U.S. population.

3.1 SAMPLE SPACES AND EVENTS

The most basic outcomes of a chance experiment are called elementary outcomes or sample points. Any theory involves idealizations, and our first idealization concerns the elementary outcomes of an experiment. For example, when a coin is tossed, it does not necessarily fall head ($H$) or tail ($T$), for it can stand on its edge or roll away. Still we agree that $H$ and $T$ are the only elementary outcomes.

The sample space is the set of all elementary outcomes of a chance experiment. An outcome that can be decomposed into a set of elementary outcomes is called an event. The simplest kind of sample spaces are the ones that are finite, that is, consist only of a finite number of points. If the number of points is small, then these spaces are easy to visualize.

Example 3.1 Consider the chance experiment of tossing 3 coins or, equivalently, tossing the same coin 3 times. The sample space of this experiment is easily constructed by noticing that the first coin toss has two possible outcomes, $H$ and $T$. Given the result of the first coin toss, the second also has $H$ and $T$ as possible outcomes. Given the results of the first two coin tosses, the third also has $H$ and $T$ as possible outcomes. The outcome tree of this experiment and its sample points are listed in Table 4. Taken together, these sample points comprise the sample space.

The event "at least 2 heads" consists of the following sample points:

$$HHH, \quad HHT, \quad HTH, \quad THH.$$

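Table 4 Outcome tree and sample space of the chance experiment of tossing 3 coins.

| First toss | Second toss | Third toss | Sample point |
|---|---|---|---|
| H | H | H | HHH |
| H | H | T | HHT |
| H | T | H | HTH |
| H | T | T | HTT |
| T | H | H | THH |
| T | H | T | THT |
| T | T | H | TTH |
| T | T | T | TTT |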
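The construction of this sample space can be sketched in a few lines of Python. This is only an illustration: the names `sample_space` and `at_least_two_heads` are our own and do not come from the text.

```python
from itertools import product

# Each of the 3 tosses has the outcomes H or T; a sample point is an
# ordered triple of outcomes, written as a string such as "HTH".
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# The event "at least 2 heads" is the subset of sample points
# that contain 2 or 3 H's.
at_least_two_heads = [point for point in sample_space if point.count("H") >= 2]

print(sample_space)
print(at_least_two_heads)
```

The list comprehension plays the role of the outcome tree: every branch of the tree corresponds to one tuple produced by `product`.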

Many important sample spaces are not finite. Some of them contain countably many points, and some of them may even contain uncountably many points.

Example 3.2 Consider the chance experiment of tossing a coin until a head turns up. The points of this sample space are:

$$H, \quad TH, \quad TTH, \quad TTTH, \quad \ldots$$

This sample space contains countably many points. □

Example 3.3 Consider the chance experiment of picking a real number from the interval $(0, 1)$. This sample space contains uncountably many points. □

3.2 RELATIONS AMONG EVENTS

Let $S$ be a sample space, $e$ an elementary outcome and $E$ an event, that is, a set of elementary outcomes. Because the notions of elementary outcome and event are the same as those of point and point set in set theory, standard concepts and results from set theory also apply to probability theory.

Thus, $\emptyset$ denotes the impossible event, that is, the event that contains no sample point. Given an event $E$, $E^c$ denotes the complement of $E$, that is, the event consisting of all points of $S$ that are not contained in $E$. Clearly, $S^c = \emptyset$ and $\emptyset^c = S$.

Given two events $A$ and $B$, we say that $A$ is contained in $B$, written $A \subseteq B$, if all points in $A$ are also in $B$. In the language of probability, we say that "$B$ occurs whenever $A$ occurs". Clearly, for any event $E$, we have that $\emptyset \subseteq E$ and $E \subseteq S$. We say that $A$ and $B$ are equal, written $A = B$, if $A \subseteq B$ and $B \subseteq A$. We say that $A$ is strictly contained in $B$, written $A \subset B$, if $A \subseteq B$ but $A$ is not equal to $B$.

Given two events $A$ and $B$, the event $A \cup B$ (called the union of $A$ and $B$) corresponds to the occurrence of either $A$ or $B$, that is, it consists of all sample points that are either in $A$ or in $B$, or in both. Clearly,

$$A \cup B = B \cup A, \qquad A \subseteq (A \cup B), \qquad B \subseteq (A \cup B).$$

Given any event $E$, we also have

$$E \cup E^c = S, \qquad E \cup S = S, \qquad E \cup \emptyset = E. \tag{3.1}$$

Given two events $A$ and $B$, the event $A \cap B$ (called the intersection of $A$ and $B$) corresponds to the occurrence of both $A$ and $B$, that is, it consists of all sample points that are in both $A$ and $B$. When $A \cap B = \emptyset$, we say that the events $A$ and $B$ are mutually exclusive, that is, they cannot occur at once. Clearly,

$$A \cap B = B \cap A, \qquad (A \cap B) \subseteq A, \qquad (A \cap B) \subseteq B.$$

Further,

$$(A \cap B) \subseteq (A \cup B).$$

Given any event $E$, we also have

$$E \cap E^c = \emptyset, \qquad E \cap S = E, \qquad E \cap \emptyset = \emptyset. \tag{3.2}$$

In fact, the relationship between (3.1) and (3.2) is a special case of the following results, known as de Morgan's laws. Given two events $A$ and $B$,

$$(A \cap B)^c = A^c \cup B^c, \qquad (A \cup B)^c = A^c \cap B^c.$$

De Morgan's laws show that complementation, union and intersection are not independent operations.

Given two events $A$ and $B$, the event $E = A - B$ (called the difference of $A$ and $B$) corresponds to all sample points in $A$ that are not in $B$. Clearly, $A - B = A \cap B^c$. Notice that $A - B$ and $B - A$ are different events, that $(A - B) \cap (B - A) = \emptyset$ and that $(A \cap B) \cup (A - B) = A$.
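Because events are just sets of sample points, the relations above can be checked with Python's built-in `set` type. The sketch below uses the three-coin sample space and two illustrative events of our own choosing; the helper `complement` is a hypothetical name, not notation from the text.

```python
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}   # sample space
A = {p for p in S if p.count("H") >= 2}             # "at least 2 heads"
B = {p for p in S if "T" in p}                      # "at least 1 tail"

def complement(E):
    """Complement of the event E relative to the sample space S."""
    return S - E

# de Morgan's laws
de_morgan_1 = complement(A & B) == (complement(A) | complement(B))
de_morgan_2 = complement(A | B) == (complement(A) & complement(B))

# The difference A - B equals A intersected with the complement of B.
difference_ok = (A - B) == (A & complement(B))

# A - B and B - A are mutually exclusive, and (A & B) | (A - B) gives back A.
disjoint_ok = ((A - B) & (B - A)) == set()
rebuild_ok = ((A & B) | (A - B)) == A

print(de_morgan_1, de_morgan_2, difference_ok, disjoint_ok, rebuild_ok)
```

Checking the identities on one pair of events is of course not a proof, but it is a quick way to catch a misremembered formula.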

Venn diagrams.

3.3 PROBABILITIES IN SIMPLE SAMPLE SPACES

Probabilities are just numbers assigned to events. These numbers have the same nature as lengths, areas and volumes in geometry. How are probability numbers assigned?

In the experiment of tossing a fair coin, where $S = \{H, T\}$, we do not hesitate to assign probability 1/2 to each of the two elementary outcomes $H$ and $T$. From the theoretical point of view this is merely a convention, which can however be justified on the basis of actually tossing a fair coin a large number of times. In this case, the probability 1/2 assigned to the event "$H$ occurred" can be interpreted as the limiting relative frequency of heads in the experiment of tossing a fair coin $n$ times as $n \to \infty$. The view of probabilities as the limit of relative frequencies is called the frequentist interpretation of probabilities. This is not the only interpretation, however. Another important one is the subjectivist interpretation, where probabilities are essentially viewed as representing degrees of belief about the likelihood of an event.
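The frequentist interpretation can be illustrated with a small simulation. The sketch below is not part of the original argument: it assumes Python's pseudo-random generator is an adequate stand-in for a fair coin, and the function name `relative_frequency_of_heads` and the fixed seed are our own choices.

```python
import random

def relative_frequency_of_heads(n, seed=0):
    """Toss a simulated fair coin n times and return the share of heads."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

# The relative frequency should settle near 1/2 as n grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

For small $n$ the relative frequency can be far from 1/2; only for large $n$ does it stabilize, which is the sense in which 1/2 is a limiting frequency.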

A sample space consisting of a finite number of points, where each point is equally probable, that is, receives the same probability, is called simple.

Example 3.4 The sample space corresponding to the chance experiment of tossing a fair coin 3 times is a simple sample space where each sample point receives the same probability 1/8. □

Given a simple sample space $S$, the probability of an event $E \subseteq S$ is

$$\Pr(E) = \frac{\text{number of sample points in } E}{\text{total number of sample points}}.$$

Several important properties of probabilities follow immediately from this definition:

(i) $0 \le \Pr(E) \le 1$;

(ii) $\Pr(S) = 1$;

(iii) $\Pr(\emptyset) = 0$.

These three properties hold for general sample spaces as well.

Other properties are easy to understand using Venn diagrams. If $A \subseteq B$, then

$$\Pr(A) \le \Pr(B).$$

If $E = A \cup B$, then

$$\begin{aligned}
\Pr(E) &= \text{sum of the probabilities of all sample points in } A \cup B \\
&= \Pr(A) + \Pr(B) - \Pr(A \cap B) \le \Pr(A) + \Pr(B).
\end{aligned}$$

Clearly,

$$\Pr(E) = \Pr(A) + \Pr(B)$$

if and only if $\Pr(A \cap B) = 0$, that is, $A$ and $B$ are mutually exclusive events.

For the complement $E^c$ of $E$, since $E \cup E^c = S$ and $E \cap E^c = \emptyset$, we have $\Pr(E) + \Pr(E^c) = \Pr(S) = 1$ and so

$$\Pr(E^c) = 1 - \Pr(E).$$
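The counting definition of probability and the properties above can be checked mechanically on a small simple sample space. The sketch below uses the three-coin sample space with exact fractions; the function name `pr` and the two events are illustrative choices of ours.

```python
from fractions import Fraction
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}   # simple sample space

def pr(E):
    """Probability of the event E: points in E over points in S."""
    return Fraction(len(E), len(S))

A = {p for p in S if p.count("H") >= 2}   # "at least 2 heads"
B = {p for p in S if "T" in p}            # "at least 1 tail"

# Properties (ii) and (iii), the addition rule and the complement rule.
assert pr(S) == 1 and pr(set()) == 0
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
assert pr(S - B) == 1 - pr(B)

print(pr(A), pr(B), pr(A & B), pr(A | B))   # 1/2 7/8 3/8 1
```

Using `Fraction` rather than floating-point numbers keeps the identities exact, so the equalities can be tested with `==`.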

Example 3.5 Consider the simple sample space corresponding to the experiment of tossing a fair coin 3 times. The event "at least 2 heads" corresponds to the set of elementary outcomes

$$A = \{HHH, HHT, HTH, THH\}.$$

Therefore, its probability is

$$\Pr(A) = \frac{4}{8} = \frac{1}{2}.$$

The event "at least 1 tail" corresponds to the set of elementary outcomes

$$B = \{HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$

Because $B$ is the complement of the event "no tails", its probability is

$$\Pr(B) = 1 - \Pr(\{HHH\}) = \frac{7}{8}.$$

The intersection of $A$ and $B$ is the event

$$A \cap B = \{HHT, HTH, THH\},$$

whose probability is equal to 3/8. The probability of the union of $A$ and $B$ is therefore equal to

$$\Pr(A) + \Pr(B) - \Pr(A \cap B) = \frac{1}{2} + \frac{7}{8} - \frac{3}{8} = 1,$$

which ought not be surprising since $A \cup B = S$ in this case. □

3.4 COUNTING RULES

The calculation of probabilities for simple sample spaces is facilitated by the systematic use of a few counting rules.

3.4.1 MULTIPLICATION RULE

The experiment of tossing a fair coin twice has 4 possible outcomes: $HH$, $HT$, $TH$ and $TT$. This is an example of a chance experiment with the following characteristics:

1. The experiment is performed in 2 parts.
2. The first part has $n$ possible outcomes, say $x_1, \ldots, x_n$. Regardless of which of these outcomes occurred, the second part has $m$ possible outcomes, say $y_1, \ldots, y_m$.

Each point of the sample space $S$ is therefore a pair $e = (x_i, y_j)$, where $i = 1, \ldots, n$ and $j = 1, \ldots, m$, and $S$ consists of the $mn$ pairs

$$\begin{array}{cccc}
(x_1, y_1) & (x_1, y_2) & \cdots & (x_1, y_m) \\
(x_2, y_1) & (x_2, y_2) & \cdots & (x_2, y_m) \\
\vdots & \vdots & & \vdots \\
(x_n, y_1) & (x_n, y_2) & \cdots & (x_n, y_m).
\end{array}$$

The generalization to the case of an experiment with more than 2 parts is straightforward. Consider an experiment that is performed in $k$ parts ($k \ge 2$), where the $h$th part of the experiment has $n_h$ possible outcomes ($h = 1, \ldots, k$) and each of the outcomes in any part of the experiment can occur regardless of which specific outcome occurred in any of the other parts. Then each sample point in $S$ will be a $k$-tuple $e = (u_1, \ldots, u_k)$, where $u_h$ is one of the $n_h$ possible outcomes in the $h$th part of the experiment. The total number of sample points in $S$ is therefore equal to

$$n_1 \times n_2 \times \cdots \times n_k.$$

Example 3.6 Suppose one can choose between 10 speaker types, 5 receivers and 3 CD players. The number of different stereo systems that can be put together this way is $10 \times 5 \times 3 = 150$. □
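The multiplication rule corresponds to forming a Cartesian product, which `itertools.product` does directly. In the sketch below the component labels (`speaker_1`, `receiver_1`, and so on) are hypothetical placeholders; only the counts 10, 5 and 3 come from Example 3.6.

```python
from itertools import product

# Hypothetical labels for the components of Example 3.6.
speakers = [f"speaker_{i}" for i in range(1, 11)]    # 10 speaker types
receivers = [f"receiver_{i}" for i in range(1, 6)]   # 5 receivers
cd_players = [f"cd_{i}" for i in range(1, 4)]        # 3 CD players

# Every stereo system is a triple (speaker, receiver, CD player).
systems = list(product(speakers, receivers, cd_players))
n_systems = len(systems)

print(n_systems)   # 150
```

Enumerating the triples is only practical for small experiments; the point of the multiplication rule is that the count can be obtained without listing them.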

The next two subsections provide important examples of application of the multiplication rule.

3.4.2 SAMPLING WITH REPLACEMENT

Consider a chance experiment which consists of $k$ repetitions of the same basic experiment or trial. If each trial has the same number $n$ of possible outcomes, then the total number of sample points in $S$ is equal to $n^k$.

Example 3.7 Consider tossing a coin 4 times. The total number of outcomes is $2^4 = 16$. □

Example 3.8 Consider a box containing 10 balls numbered $1, 2, \ldots, 10$. Suppose that we repeat 5 times the basic experiment of selecting one ball at random, recording its number and then putting the ball back in the box. Since the number of possible outcomes in each trial is equal to 10, the total number of possible outcomes of the experiment is equal to $10^5 = 100{,}000$. This experiment is an example of sampling with replacement from a finite population. □

3.4.3 SAMPLING WITHOUT REPLACEMENT

Sampling without replacement corresponds to successive random draws, without replacement, of a single population unit. In the example of drawing balls from a box (Example 3.8), after a ball is selected, it is left out of the box.

Example 3.9 Consider a deck of 52 cards. If we select 3 cards in succession, then there are 52 possible outcomes at the first selection, 51 at the second, and 50 at the third. This is an example of sampling without replacement from a finite population. The total number of possible outcomes is therefore $52 \times 51 \times 50 = 132{,}600$. □

If $k$ elements have to be selected from a set of $n$ elements, then the total number of possible outcomes is

$$P_{n,k} = n(n-1)(n-2) \cdots (n-k+1),$$

called the number of permutations of $n$ elements taken $k$ at a time. If $k = n$, then the number of possible outcomes is the number of permutations of all $n$ elements,

$$P_{n,n} = n(n-1)(n-2) \cdots 2 \cdot 1,$$

called $n$ factorial and denoted by $n!$. By convention $0! = 1$. Thus

$$P_{n,k} = \frac{n(n-1)(n-2) \cdots (n-k+1)(n-k) \cdots 2 \cdot 1}{(n-k)(n-k-1) \cdots 2 \cdot 1} = \frac{n!}{(n-k)!}.$$
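The two sampling schemes can be compared side by side. The following sketch computes $n^k$ and $P_{n,k} = n!/(n-k)!$ and cross-checks both formulas by brute-force enumeration on a small case; the function names are our own.

```python
import math
from itertools import permutations, product

def n_with_replacement(n, k):
    """Number of ordered samples of size k drawn with replacement."""
    return n ** k

def n_without_replacement(n, k):
    """Number of permutations P(n, k) = n! / (n - k)!."""
    return math.factorial(n) // math.factorial(n - k)

# Brute-force check on a small case (n = 5, k = 3).
n, k = 5, 3
assert n_with_replacement(n, k) == len(list(product(range(n), repeat=k)))
assert n_without_replacement(n, k) == len(list(permutations(range(n), k)))

print(n_with_replacement(10, 5))      # Example 3.8: 100000
print(n_without_replacement(52, 3))   # Example 3.9: 132600
```

Integer division `//` is safe here because $(n-k)!$ always divides $n!$ exactly.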

Example 3.10 Given a group of $k$ people ($2 \le k \le 365$), what is the probability that at least 2 people in the group have the same birthday? To simplify the problem, assume that birthdays are unrelated (there are no twins) and that each of the 365 days of the year is equally likely to be the birthday of any person. The sample space $S$ then consists of $365^k$ possible outcomes. The number of outcomes in $S$ for which all $k$ birthdays are different is $P_{365,k}$. Therefore, if $E$ denotes the event "all $k$ people have different birthdays", then

$$\Pr(E) = \frac{P_{365,k}}{365^k}.$$

Because the event "at least 2 people have the same birthday" is just the complement of $E$, we get

$$\Pr(E^c) = 1 - \frac{P_{365,k}}{365^k}.$$

We denote this probability by $p(k)$. The table below summarizes the value of $p(k)$ for different values of $k$:

| $k$ | $p(k)$ |
|---|---|
| 5 | .027 |
| 10 | .117 |
| 20 | .411 |
| 40 | .891 |
| 60 | .994 |

Notice that, in a class of 100 people, the event that at least 2 people have the same birthday is almost certain. □
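The values of $p(k)$ are easy to reproduce numerically. The sketch below multiplies the factors $(365 - i)/365$ one at a time instead of forming $P_{365,k}$ and $365^k$ separately, which is our own choice to keep the intermediate numbers small; the function name `p_shared_birthday` is likewise ours.

```python
def p_shared_birthday(k, days=365):
    """Probability that at least 2 of k people share a birthday."""
    pr_all_different = 1.0
    for i in range(k):
        # The (i + 1)-th person must avoid the i birthdays already taken.
        pr_all_different *= (days - i) / days
    return 1.0 - pr_all_different

for k in (5, 10, 20, 40, 60):
    print(k, round(p_shared_birthday(k), 3))
```

Calling `p_shared_birthday(23)` shows that the probability already exceeds 1/2 for a group of 23 people.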

3.4.4 COMBINATIONS

As a motivation, consider the following example.

Example 3.11 Consider combining 4 elements $a$, $b$, $c$ and $d$, taken 2 at a time. The total number of possible outcomes is equal to the number of permutations of 4 objects taken 2 at a time, namely

$$P_{4,2} = 4 \cdot 3 = 12.$$

If the order of the elements of each pair is irrelevant, the table below shows that 6 different combinations are obtained:

| 12 permutations | 6 combinations |
|---|---|
| $(a, b)$, $(b, a)$ | $\{a, b\}$ |
| $(a, c)$, $(c, a)$ | $\{a, c\}$ |
| $(a, d)$, $(d, a)$ | $\{a, d\}$ |
| $(b, c)$, $(c, b)$ | $\{b, c\}$ |
| $(b, d)$, $(d, b)$ | $\{b, d\}$ |
| $(c, d)$, $(d, c)$ | $\{c, d\}$ |

□

Let $C_{n,k}$ denote the number of different combinations of $n$ objects taken $k$ at a time. To determine $C_{n,k}$, notice that the list of $P_{n,k}$ permutations may be constructed as follows. First select a particular combination of $k$ objects. Then notice that this particular combination can produce $k!$ permutations. Hence

$$P_{n,k} = C_{n,k} \cdot k!,$$

from which we get

$$C_{n,k} = \frac{P_{n,k}}{k!} = \frac{n!}{(n-k)!\,k!}.$$

The number $C_{n,k}$ is also called the binomial coefficient and denoted

$$C_{n,k} = \binom{n}{k}.$$

Clearly,

$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!} = \binom{n}{n-k}.$$
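The relation $P_{n,k} = C_{n,k} \cdot k!$ can be checked by enumeration for the four elements of Example 3.11. This is a minimal sketch; the helper name `n_combinations` is ours.

```python
import math
from itertools import combinations, permutations

elements = ["a", "b", "c", "d"]

perms = list(permutations(elements, 2))    # ordered pairs
combs = list(combinations(elements, 2))    # unordered pairs

def n_combinations(n, k):
    """Binomial coefficient C(n, k) = n! / ((n - k)! k!)."""
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

# Each combination of k = 2 objects produces k! = 2 permutations.
assert len(perms) == len(combs) * math.factorial(2)

print(len(perms), len(combs), n_combinations(4, 2))   # 12 6 6
```

The symmetry $\binom{n}{k} = \binom{n}{n-k}$ can be seen the same way: `n_combinations(52, 5)` and `n_combinations(52, 47)` return the same number.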

Example 3.12 In Example 3.11, $n = 4$, $k = 2$ and so $C_{4,2} = 12/2 = 6$. □

Example 3.13 Given a hand of 5 cards, randomly drawn from a deck of 52, the probability of a "straight flush" is

$$p = \Pr(\text{"straight flush"}) = \frac{\text{no. of different straight flushes}}{\text{no. of different hands}}.$$

The number of different hands is equal to

$$\binom{52}{5} = \frac{52!}{5!\,47!} = 2{,}598{,}960.$$

Because there are 10 straight flushes for each suit, the total number of straight flushes is $10 \cdot 4 = 40$. Therefore, the desired probability is

$$p = \frac{40}{2{,}598{,}960} = .000015.$$

Not a high one! □
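The arithmetic of Example 3.13 can be reproduced as follows. The sketch takes the count of 10 straight flushes per suit from the text as given and uses `math.comb` (available in Python 3.8 and later) for the number of hands; the variable names are ours.

```python
import math
from fractions import Fraction

n_hands = math.comb(52, 5)        # number of different 5-card hands
n_straight_flushes = 10 * 4       # 10 per suit, 4 suits (from the text)

p_straight_flush = Fraction(n_straight_flushes, n_hands)

print(n_hands)                    # 2598960
print(float(p_straight_flush))
```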

When a set contains only elements of 2 distinct types, a binomial coefficient may be used to represent the number of different arrangements of all the elements in the set.

Example 3.14 Suppose that $k$ red balls and $n - k$ green balls are to be arranged in a row. Since the red balls occupy $k$ positions, the number of different arrangements of the $n$ balls corresponds to the number $C_{n,k}$ of combinations of $n$ objects taken $k$ at a time. □

Example 3.15 Given a hand of 5 cards, randomly drawn from a deck of 52, the probability of a "poker" (four cards of the same rank) is

$$p = \Pr(\text{"poker"}) = \frac{\text{no. of different pokers}}{\text{no. of different hands}},$$

where the denominator is the same as in Example 3.13. To compute the numerator, notice that 13 types of poker are possible: A, K, Q, ..., 2. Once the rank is chosen, all 4 cards of that rank belong to the hand, and the fifth card can be any of the remaining $52 - 4 = 48$ cards. Because hands are counted without regard to the order of the cards, the number of possible pokers in one hand of 5 cards is $13 \cdot 48 = 624$ and so

$$p = \frac{624}{2{,}598{,}960} = .00024,$$

which is higher than the probability of a straight flush. □

3.5 CONDITIONAL PROBABILITIES

Suppose that we have a sample space $S$ where probabilities have been assigned to all events. If we know that the event $B \subset S$ occurred, then it seems intuitively obvious that this ought to modify our assignment of probabilities to any other event $A \subset S$, because the only sample points in $A$ that are now possible are the ones that are also contained in $B$. This new probability assigned to $A$ is called the conditional probability of the event $A$ given that the event $B$ has occurred, or simply the conditional probability of $A$ given $B$, and denoted by $\Pr(A \mid B)$.

Example 3.16 Consider again the experiment of tossing a fair coin 3 times. Let $A$ = "at least one $T$" and $B$ = "$H$ in the first trial". Clearly

$$\Pr(A) = \frac{7}{8}, \qquad \Pr(B) = \frac{1}{2}, \qquad \Pr(A \cap B) = \frac{3}{8}.$$

If we know that $B$ occurred, then the relevant sample space becomes

$$S' = \{HHH, HHT, HTH, HTT\}.$$

Therefore

$$\Pr(A \mid B) = \frac{3}{4} = \frac{3/8}{1/2} = \frac{\Pr(A \cap B)}{\Pr(B)}.$$

Notice that $\Pr(A \mid B) < \Pr(A)$ in this case. □

Definition 3.1 If $A$ and $B$ are any two events, then the conditional probability of $A$ given $B$ is

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$$

if $\Pr(B) > 0$, and $\Pr(A \mid B) = 0$ otherwise. □

The conditional probability of $B$ given $A$ is similarly defined as

$$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)},$$

provided that $\Pr(A) > 0$.

The frequentist interpretation of conditional probabilities is as follows. If a chance experiment is repeated a large number of times, then the proportion of trials in which the event $B$ occurs is approximately equal to $\Pr(B)$, whereas the proportion of trials in which both $A$ and $B$ occur is approximately equal to $\Pr(A \cap B)$. Therefore, among those trials in which $B$ occurs, the proportion in which $A$ also occurs is approximately equal to $\Pr(A \cap B)/\Pr(B)$.

Definition 3.1 may be re-expressed as

$$\Pr(A \cap B) = \Pr(A \mid B) \Pr(B). \tag{3.3}$$

This result, called the multiplication law, provides a convenient way of finding $\Pr(A \cap B)$ whenever $\Pr(A \mid B)$ and $\Pr(B)$ are easy to find.
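The definition of conditional probability and the multiplication law can be checked by enumeration on the three-coin sample space of Example 3.16. This is a sketch with names of our own (`pr`, `pr_given`), using exact fractions.

```python
from fractions import Fraction
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}   # simple sample space

def pr(E):
    """Probability of the event E in the simple sample space S."""
    return Fraction(len(E), len(S))

def pr_given(A, B):
    """Pr(A | B), with the convention Pr(A | B) = 0 when Pr(B) = 0."""
    return pr(A & B) / pr(B) if pr(B) > 0 else Fraction(0)

A = {p for p in S if "T" in p}       # "at least one T"
B = {p for p in S if p[0] == "H"}    # "H in the first trial"

print(pr_given(A, B))                # 3/4

# Multiplication law (3.3): Pr(A and B) = Pr(A | B) Pr(B).
assert pr(A & B) == pr_given(A, B) * pr(B)
```

Restricting attention to the points of `B` and counting those that also lie in `A` gives the same answer, which is exactly the "relevant sample space" argument of the example.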

Example 3.17 Consider a hand of 2 cards randomly drawn from a deck of 52. Let $A$ = "second card is a king" and $B$ = "first card is an ace". Then $\Pr(B) = 4/52$ and $\Pr(A \mid B) = 4/51$. Hence

$$\Pr(A \cap B) = \Pr(\text{"ace and then king"}) = \Pr(A \mid B) \Pr(B) = \frac{4}{51} \cdot \frac{4}{52} = .0060.$$

□

We now consider a useful application of the multiplication law (3.3). Notice that

$$A = (A \cap B) \cup (A \cap B^c),$$

where $A \cap B$ and $A \cap B^c$ are disjoint events because $B$ and its complement $B^c$ are disjoint. Hence

$$\Pr(A) = \Pr(A \cap B) + \Pr(A \cap B^c),$$

where, by the multiplication law,

$$\Pr(A \cap B) = \Pr(A \mid B) \Pr(B)$$

and

$$\Pr(A \cap B^c) = \Pr(A \mid B^c) \Pr(B^c).$$

Therefore

$$\Pr(A) = \Pr(A \mid B) \Pr(B) + \Pr(A \mid B^c) \Pr(B^c), \tag{3.4}$$

which is sometimes called the law of total probabilities.

Example 3.18 Consider a hand of 2 cards randomly drawn from a deck of 52. Let $A$ = "second card is a king" and $B$ = "first card is a king". We have $\Pr(B) = 4/52$, $\Pr(B^c) = 48/52$ and

$$\Pr(A \mid B) = 3/51, \qquad \Pr(A \mid B^c) = 4/51.$$

Hence, by the law of total probabilities,

$$\Pr(A) = \frac{3}{51} \cdot \frac{4}{52} + \frac{4}{51} \cdot \frac{48}{52} = \frac{4}{52}.$$

Thus $\Pr(A)$ and $\Pr(B)$ are the same. □
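The calculation in Example 3.18 can be verified in two ways: by applying the law of total probabilities with exact fractions, and by enumerating all ordered pairs of cards. In the sketch below the encoding of the deck (ranks 1 to 13, with 13 standing for the king) is an assumption of ours made purely for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Law of total probabilities with the numbers of Example 3.18.
pr_B = Fraction(4, 52)               # first card is a king
pr_Bc = 1 - pr_B                     # first card is not a king
pr_A_given_B = Fraction(3, 51)
pr_A_given_Bc = Fraction(4, 51)
pr_A = pr_A_given_B * pr_B + pr_A_given_Bc * pr_Bc

# Brute-force check: all ordered draws of 2 cards without replacement.
KING = 13                            # hypothetical rank code for a king
deck = [(rank, suit) for rank in range(1, 14) for suit in range(4)]
draws = list(permutations(deck, 2))
n_second_king = sum(second[0] == KING for _, second in draws)
pr_A_brute = Fraction(n_second_king, len(draws))

print(pr_A, pr_A_brute)              # 1/13 1/13
```

Both routes give $4/52 = 1/13$, confirming that the second card is a king with the same probability as the first.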

3.6 STATISTICAL INDEPENDENCE

Let $A$ and $B$ be two events with non-zero probability. If knowing that $B$ occurred gives no information about whether or not $A$ occurred, then the probability assigned to $A$ should not be modified by the knowledge that $B$ occurred, that is, $\Pr(A \mid B) = \Pr(A)$. Hence, by the multiplication law,

$$\Pr(A \cap B) = \Pr(A) \Pr(B).$$

We take this as our formal definition of statistical independence.

Definition 3.2 Two events $A$ and $B$ are said to be statistically independent if

$$\Pr(A \cap B) = \Pr(A) \Pr(B).$$

□

Notice that this definition of independence is symmetric in $A$ and $B$, and also covers the case when $\Pr(A) = 0$ or $\Pr(B) = 0$. It is easy to show that if $A$ and $B$ are independent, then $A$ and $B^c$ as well as $A^c$ and $B^c$ are independent.

It is clear from Definition 3.2 that mutually exclusive events with non-zero probabilities cannot be independent, because in that case $\Pr(A \cap B) = 0$ whereas $\Pr(A) \Pr(B) > 0$. The concept of statistical independence is different from other concepts of independence (logical, mathematical, political, etc.). When there is no ambiguity, the term independence will be taken to mean statistical independence.

Example 3.19 The sample space associated with the experiment of tossing a fair coin twice is a simple sample space consisting of $2^2 = 4$ points. Define the events $A$ = "$H$ in the first toss" and $B$ = "$T$ in the second toss". Because $A \cap B = \{HT\}$, we have

$$\Pr(A \cap B) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = \Pr(A) \Pr(B).$$

This result seems fairly intuitive, because the occurrence of $H$ in the first coin toss has no relation to, and no influence on, the occurrence of $T$ in the second coin toss, and vice versa. □

It is natural to assume that events that are physically unrelated (such as successive coin tosses) are also statistically independent. However, physically related events may also satisfy the definition of statistical independence.

Example 3.20 Consider the chance experiment consisting of throwing a fair die. The sample space of this experiment is the simple sample space

$$S = \{1, 2, 3, 4, 5, 6\}.$$

Let $A$ = "an even number is obtained" and $B$ = "the number 1, 2, 3 or 4 is obtained". It is easy to verify that $\Pr(A) = 1/2$ and $\Pr(B) = 2/3$. Further,

$$\Pr(A \cap B) = \Pr(\text{"2 or 4"}) = 1/3 = \Pr(A) \Pr(B).$$

Hence, $A$ and $B$ are independent even though their occurrence depends on the same roll of a die. □
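The product condition of Definition 3.2 is simple to test by direct counting. The sketch below checks the two events of Example 3.20 and, for contrast, a third event `C` that is our own addition and does not appear in the text.

```python
from fractions import Fraction

S = set(range(1, 7))                 # outcomes of one roll of a fair die

def pr(E):
    """Probability of the event E in the simple sample space S."""
    return Fraction(len(E), len(S))

def independent(A, B):
    """True if Pr(A and B) = Pr(A) Pr(B)."""
    return pr(A & B) == pr(A) * pr(B)

A = {2, 4, 6}                        # "an even number is obtained"
B = {1, 2, 3, 4}                     # "the number 1, 2, 3 or 4 is obtained"
C = {1, 2, 3}                        # hypothetical event, for contrast

print(independent(A, B))             # True
print(independent(A, C))             # False
print(independent(A, S - B))         # True: A and the complement of B
```

The last line illustrates the remark that independence of $A$ and $B$ carries over to $A$ and $B^c$.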

3.7 BAYES LAW

Suppose that you want to determine whether a coin is fair ($F$) or unfair ($U$). You have no information on the coin, and so you are willing to believe that $F$ and $U$ are equally likely, that is,

$$\Pr(F) = \Pr(U) = 1/2.$$

If the coin is fair, then

$$\Pr(H \mid F) = 1/2.$$

Further suppose that you know that, if the coin is unfair, then $H$ is more likely than $T$, say

$$\Pr(H \mid U) = .9.$$

Assume that tossing the coin once gives you $H$. What is now the probability that the coin is fair? This is called the posterior probability of $F$ given $H$, and denoted by $\Pr(F \mid H)$. Intuitively, the occurrence of $H$ (the most likely event if the coin is unfair) should modify your initial beliefs, leading you to view the event that the coin is fair as less likely than initially thought, whereas the occurrence of $T$ should lead you to view the event that the coin is fair as more likely than initially thought.

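One way of computing the posterior probabilities $\Pr(F \mid H)$ and $\Pr(F \mid T)$ is to draw the outcome tree for this problem.

| Coin | Toss | Joint probability |
|---|---|---|
| $F$ | $H$ | $\Pr(H \cap F) = .25$ |
| $F$ | $T$ | $\Pr(T \cap F) = .25$ |
| $U$ | $H$ | $\Pr(H \cap U) = .45$ |
| $U$ | $T$ | $\Pr(T \cap U) = .05$ |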

It is then clear that the events $U$ and $F$ are mutually exclusive and that the event $H$ is the union of the two disjoint events $H \cap F$ and $H \cap U$. Hence

$$\Pr(H) = \Pr(H \cap F) + \Pr(H \cap U) = .25 + .45 = .70.$$

Therefore

$$\Pr(F \mid H) = \frac{\Pr(H \cap F)}{\Pr(H)} = \frac{.25}{.70} = .357,$$

which is indeed less than the original assignment of probability to $F$, namely $\Pr(F) = 1/2$. By a similar argument we have

$$\Pr(F \mid T) = \frac{\Pr(T \cap F)}{\Pr(T)} = \frac{.25}{.30} = .833.$$

We can also compute the posterior probability $\Pr(F \mid H)$ without the need of a tree diagram, by using the fact that

$$\Pr(H \cap F) = \Pr(H \mid F) \Pr(F)$$

by the multiplication law, and

$$\Pr(H) = \Pr(H \mid F) \Pr(F) + \Pr(H \mid U) \Pr(U)$$

by the law of total probabilities. Hence,

$$\Pr(F \mid H) = \frac{\Pr(H \mid F) \Pr(F)}{\Pr(H \mid F) \Pr(F) + \Pr(H \mid U) \Pr(U)}.$$

This formula is known as Bayes law. For $\Pr(F \mid T)$, Bayes law gives

$$\Pr(F \mid T) = \frac{\Pr(T \mid F) \Pr(F)}{\Pr(T \mid F) \Pr(F) + \Pr(T \mid U) \Pr(U)},$$

where $\Pr(T \mid F) = 1 - \Pr(H \mid F)$ and $\Pr(T \mid U) = 1 - \Pr(H \mid U)$.

Notice that we can regard $\Pr(F)$ as our prior information about whether the coin is fair. Bayes law then gives us a way of updating this information in the light of the new information contained in the fact that $H$ was obtained.
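The updating step can be written as a small function. The sketch below is restricted to the two-hypothesis case of this section; the function name `posterior` and its argument names are ours, and the value $\Pr(H \mid U) = 9/10$ is the one implied by the joint probability $\Pr(H \cap U) = .45$ in the outcome tree together with $\Pr(U) = 1/2$.

```python
from fractions import Fraction

def posterior(prior, likelihood, likelihood_alt):
    """Posterior probability of a hypothesis by Bayes law.

    prior          -- prior probability of the hypothesis, e.g. Pr(F)
    likelihood     -- probability of the data under it, e.g. Pr(H | F)
    likelihood_alt -- probability of the data under the alternative,
                      e.g. Pr(H | U)
    """
    numerator = likelihood * prior
    return numerator / (numerator + likelihood_alt * (1 - prior))

pr_F = Fraction(1, 2)
pr_H_given_F = Fraction(1, 2)
pr_H_given_U = Fraction(9, 10)       # implied by Pr(H and U) = .45

pr_F_given_H = posterior(pr_F, pr_H_given_F, pr_H_given_U)
pr_F_given_T = posterior(pr_F, 1 - pr_H_given_F, 1 - pr_H_given_U)

print(pr_F_given_H, round(float(pr_F_given_H), 3))   # 5/14 0.357
print(pr_F_given_T, round(float(pr_F_given_T), 3))   # 5/6 0.833
```

Feeding a posterior back in as the next prior, for instance `posterior(pr_F_given_H, pr_H_given_F, pr_H_given_U)`, gives the probability that the coin is fair after two heads in a row, under the added assumption that tosses are independent given the type of coin.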