Introduction to Probability


Given a fair coin, what can we expect to be the frequency of tails in a sequence of 10 coin tosses? Tossing a coin is an example of a chance experiment, namely a process which results in one and only one outcome from a set of mutually exclusive outcomes, where the outcomes cannot be predicted with certainty. A chance experiment can be real or conceptual.

Other examples of chance experiments are: throwing a fair die 10 times and recording the number of times a prime number (namely 2, 3 or 5) is obtained; selecting 5 students at random and recording whether they are male or female; or randomly drawing a sample of voters from the U.S. population.


The most basic outcomes of a chance experiment are called elementary outcomes or sample points. Any theory involves idealizations, and our first idealization concerns the elementary outcomes of an experiment. For example, when a coin is tossed, it does not necessarily fall head (H) or tail (T), for it can stand on its edge or roll away. Still we agree that H and T are the only elementary outcomes.

The sample space is the set of all elementary outcomes of a chance experiment. An outcome that can be decomposed into a set of elementary outcomes is called an event. The simplest sample spaces are the finite ones, that is, those consisting of only a finite number of points. If the number of points is small, then these spaces are easy to visualize.

Example 3.1 Consider the chance experiment of tossing 3 coins or, equivalently, tossing the same coin 3 times. The sample space of this experiment is easily constructed by noticing that the first coin toss has two possible outcomes, H and T. Given the result of the first coin toss, the second also has H and T as possible outcomes. Given the results of the first two coin tosses, the third also has H and T as possible outcomes. The outcome tree of this experiment and its sample points are listed in Table 4. Taken together, these sample points comprise the sample space.

The event "at least 2 heads" consists of the following sample points:

\[ \{HHH,\ HHT,\ HTH,\ THH\}. \]

Table 4 Outcome tree and sample space of the chance experiment of tossing 3 coins.

    1st toss   2nd toss   3rd toss   Sample point
    H          H          H          HHH
    H          H          T          HHT
    H          T          H          HTH
    H          T          T          HTT
    T          H          H          THH
    T          H          T          THT
    T          T          H          TTH
    T          T          T          TTT
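The construction of the outcome tree can be mirrored in a few lines of code. The following is only a minimal sketch: the names `sample_space` and `at_least_two_heads` are illustrative choices, not notation from the text.

```python
from itertools import product

# Every ordered sequence of 3 tosses, each toss being H or T.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# The event "at least 2 heads" is the subset of sample points
# containing two or more H's.
at_least_two_heads = [e for e in sample_space if e.count("H") >= 2]

print(sample_space)        # the 8 sample points of Table 4
print(at_least_two_heads)  # the 4 points making up the event
```

Enumerating the product of the per-toss outcomes follows the tree level by level, which is why the points come out in the same order as in Table 4.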

Many important sample spaces are not finite. Some of them contain countably many points, and some of them may even contain uncountably many points.

Example 3.2 Consider the chance experiment of tossing a coin until a head turns up. The points of this sample space are:

\[ H,\ TH,\ TTH,\ TTTH,\ \ldots \]

This sample space contains countably many points. □

Example 3.3 Consider the chance experiment of picking a real number from the interval $(0, 1)$. This sample space contains uncountably many points. □


Let $S$ be a sample space, $e$ an elementary outcome and $E$ an event, that is, a set of elementary outcomes. Because the notions of elementary outcome and event are the same as those of point and point set in set theory, standard concepts and results from set theory also apply to probability theory.

Thus, $\emptyset$ denotes the impossible event, that is, the event that contains no sample point. Given an event $E$, $E^c$ denotes the complement of $E$, that is, the event consisting of all points of $S$ that are not contained in $E$. Clearly, $S^c = \emptyset$ and $\emptyset^c = S$.

Given two events $A$ and $B$, we say that $A$ is contained in $B$, written $A \subseteq B$, if all points in $A$ are also in $B$. In the language of probability, we say that "$B$ occurs whenever $A$ occurs". Clearly, for any event $E$, we have that $\emptyset \subseteq E$ and $E \subseteq S$. We say that $A$ and $B$ are equal, written $A = B$, if $A \subseteq B$ and $B \subseteq A$. We say that $A$ is strictly contained in $B$, written $A \subset B$, if $A \subseteq B$ but $A$ is not equal to $B$.

Given two events $A$ and $B$, the event $A \cup B$ (called the union of $A$ and $B$) corresponds to the occurrence of either $A$ or $B$, that is, it consists of all sample points that are either in $A$ or in $B$, or in both. Clearly,

\[ A \cup B = B \cup A, \qquad A \subseteq (A \cup B), \qquad B \subseteq (A \cup B). \]

Given any event $E$, we also have

\[ E \cup E^c = S, \qquad E \cup S = S, \qquad E \cup \emptyset = E. \tag{3.1} \]

Given two events $A$ and $B$, the event $A \cap B$ (called the intersection of $A$ and $B$) corresponds to the occurrence of both $A$ and $B$, that is, it consists of all sample points that are in both $A$ and $B$. When $A \cap B = \emptyset$, we say that the events $A$ and $B$ are mutually exclusive, that is, they cannot occur at once. Clearly,

\[ A \cap B = B \cap A, \qquad (A \cap B) \subseteq A, \qquad (A \cap B) \subseteq B. \]



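Given any event $E$, we also have

\[ E \cap E^c = \emptyset, \qquad E \cap S = E, \qquad E \cap \emptyset = \emptyset. \tag{3.2} \]

In fact, the relationship between (3.1) and (3.2) is a special case of the following results, known as de Morgan's laws. Given two events $A$ and $B$,

\[ (A \cap B)^c = A^c \cup B^c, \qquad (A \cup B)^c = A^c \cap B^c. \]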

De Morgan's laws show that complementation, union and intersection are not independent operations.

Given two events $A$ and $B$, the event $E = A - B$ (called the difference of $A$ and $B$) corresponds to all sample points in $A$ that are not in $B$. Clearly, $A - B = A \cap B^c$. Notice that $A - B$ and $B - A$ are different events, that $(A - B) \cap (B - A) = \emptyset$ and that $(A \cap B) \cup (A - B) = A$.
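Because events are just sets of sample points, the identities above can be checked mechanically on a small sample space. The sketch below uses the 3-coin sample space and two illustrative events `A` and `B`, chosen here only for the check; the helper `complement` is likewise a hypothetical name.

```python
from itertools import product

# Sample space of 3 coin tosses and two illustrative events.
S = frozenset("".join(t) for t in product("HT", repeat=3))
A = frozenset(e for e in S if e.count("H") >= 2)  # at least 2 heads
B = frozenset(e for e in S if "T" in e)           # at least 1 tail
EMPTY = frozenset()

def complement(E):
    """All points of S that are not contained in E."""
    return S - E

# Identities (3.1) and (3.2) for the event A.
assert A | complement(A) == S and A | S == S and A | EMPTY == A
assert A & complement(A) == EMPTY and A & S == A and A & EMPTY == EMPTY

# De Morgan's laws.
assert complement(A & B) == complement(A) | complement(B)
assert complement(A | B) == complement(A) & complement(B)

# Properties of the difference A - B.
assert A - B == A & complement(B)
assert (A - B) & (B - A) == EMPTY
assert (A & B) | (A - B) == A
```

A check on one pair of events is of course not a proof, but it is a quick way to catch a mis-stated identity.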

Venn diagrams.


Probabilities are just numbers assigned to events. These numbers have the same nature as lengths, areas and volumes in geometry. How are probability numbers assigned?

In the experiment of tossing a fair coin, where $S = \{H, T\}$, we do not hesitate to assign probability 1/2 to each of the two elementary outcomes H and T. From the theoretical point of view this is merely a convention, which can however be justified on the basis of actually tossing a fair coin a large number of times. In this case, the probability 1/2 assigned to the event "H occurred" can be interpreted as the limiting relative frequency of heads in the experiment of tossing a fair coin $n$ times as $n \to \infty$. The view of probabilities as the limit of relative frequencies is called the frequentist interpretation of probabilities. This is not the only interpretation, however. Another important one is the subjectivist interpretation, where probabilities are essentially viewed as representing degrees of belief about the likelihood of an event.
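The frequentist reading can be illustrated with a simulated coin. This is a rough sketch only: it relies on a pseudo-random generator standing in for a fair coin, and the function name and the fixed seed are assumptions made for reproducibility.

```python
import random

def relative_frequency_of_heads(n, seed=0):
    """Toss a simulated fair coin n times; return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

# The relative frequency tends to settle near 1/2 as n grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

For small $n$ the relative frequency can be far from 1/2; only for large $n$ does it stabilize, which is the content of the limiting statement.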

A sample space consisting of a finite number of points, where each point is equally probable, that is, receives the same probability, is called simple.

Example 3.4 The sample space corresponding to the chance experiment of tossing a fair coin 3 times is a simple sample space where each sample point receives the same probability 1/8. □

Given a simple sample space $S$, the probability of an event $E \subseteq S$ is

\[ \Pr(E) = \frac{\text{number of sample points in } E}{\text{total number of sample points}}. \]

Several important properties of probabilities follow immediately from this definition:

(i) $0 \le \Pr(E) \le 1$;

(ii) $\Pr(S) = 1$;

(iii) $\Pr(\emptyset) = 0$.

These three properties hold for general sample spaces as well.

Other properties are easy to understand using Venn diagrams. If $A \subseteq B$, then

\[ \Pr(A) \le \Pr(B). \]

If $E = A \cup B$, then

\[ \Pr(E) = \text{sum of the probabilities of all sample points in } A \cup B = \Pr(A) + \Pr(B) - \Pr(A \cap B) \le \Pr(A) + \Pr(B). \]

In particular,

\[ \Pr(E) = \Pr(A) + \Pr(B) \]

if and only if $\Pr(A \cap B) = 0$, that is, $A$ and $B$ are mutually exclusive events.

For the complement $E^c$ of $E$, since $E \cup E^c = S$ and $E \cap E^c = \emptyset$, we have $\Pr(E) + \Pr(E^c) = \Pr(S) = 1$ and so

\[ \Pr(E^c) = 1 - \Pr(E). \]
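For a simple sample space, the counting definition of $\Pr(E)$ and the rules for unions and complements translate directly into code. The following is a minimal sketch on the 3-coin sample space; the function name `pr` and the two events are illustrative choices, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Simple sample space: 3 tosses of a fair coin, 8 equally likely points.
S = {"".join(t) for t in product("HT", repeat=3)}

def pr(event):
    """Pr(E) = (number of sample points in E) / (total number of points)."""
    return Fraction(len(event & S), len(S))

A = {e for e in S if e.count("H") >= 2}  # at least 2 heads
B = {e for e in S if "T" in e}           # at least 1 tail

print(pr(A), pr(B), pr(A & B), pr(A | B))

# Addition rule and complement rule.
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
assert pr(S - A) == 1 - pr(A)
```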

Example 3.5 Consider the simple sample space corresponding to the experiment of tossing a fair coin 3 times. The event "at least 2 heads" corresponds to the set of elementary outcomes

\[ A = \{HHH,\ HHT,\ HTH,\ THH\}. \]

Therefore, its probability is

\[ \Pr(A) = 4/8 = 1/2. \]

The event "at least 1 tail" corresponds to the set of elementary outcomes

\[ B = \{HHT,\ HTH,\ HTT,\ THH,\ THT,\ TTH,\ TTT\}. \]

Because $B$ is the complement of the event "no tails", its probability is

\[ \Pr(B) = 7/8 = 1 - \Pr(HHH). \]

The intersection of $A$ and $B$ is the event

\[ A \cap B = \{HHT,\ HTH,\ THH\}, \]

whose probability is equal to 3/8. The probability of the union of $A$ and $B$ is therefore equal to

\[ \Pr(A) + \Pr(B) - \Pr(A \cap B) = \frac{1}{2} + \frac{7}{8} - \frac{3}{8} = 1, \]

which ought not be surprising since $A \cup B = S$ in this case. □

3.4 COUNTING RULES

The calculation of probabilities for simple sample spaces is facilitated by the systematic use of a few counting rules.


The experiment of tossing a fair coin twice has 4 possible outcomes: HH, HT, TH and TT. This is an example of a chance experiment with the following characteristics:

1. The experiment is performed in 2 parts.

2. The first part has $n$ possible outcomes, say $x_1, \ldots, x_n$. Regardless of which of these outcomes occurred, the second part has $m$ possible outcomes, say $y_1, \ldots, y_m$.

Each point of the sample space $S$ is therefore a pair $e = (x_i, y_j)$, where $i = 1, \ldots, n$ and $j = 1, \ldots, m$, and $S$ consists of the $mn$ pairs

\[
\begin{array}{cccc}
(x_1, y_1) & (x_1, y_2) & \cdots & (x_1, y_m) \\
(x_2, y_1) & (x_2, y_2) & \cdots & (x_2, y_m) \\
\vdots & \vdots & & \vdots \\
(x_n, y_1) & (x_n, y_2) & \cdots & (x_n, y_m).
\end{array}
\]

The generalization to the case of an experiment with more than 2 parts is straightforward. Consider an experiment that is performed in $k$ parts ($k \ge 2$), where the $h$th part of the experiment has $n_h$ possible outcomes ($h = 1, \ldots, k$) and each of the outcomes in any part of the experiment can occur regardless of which specific outcome occurred in any of the other parts. Then each sample point in $S$ will be a $k$-tuple $e = (u_1, \ldots, u_k)$, where $u_h$ is one of the $n_h$ possible outcomes in the $h$th part of the experiment. The total number of sample points in $S$ is therefore equal to

\[ n_1 \times n_2 \times \cdots \times n_k. \]

This is the multiplication rule.


Example 3.6 Suppose one can choose between 10 speaker types, 5 receivers and 3 CD players. The number of different stereo systems that can be put together this way is $10 \times 5 \times 3 = 150$. □
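The count in Example 3.6 can be cross-checked by listing every system explicitly. In this sketch only the counts (10, 5 and 3) come from the example; the component labels are hypothetical placeholders.

```python
from itertools import product
from math import prod

# Hypothetical labels; only the number of options per part is from the text.
speakers = [f"speaker{i}" for i in range(1, 11)]
receivers = [f"receiver{i}" for i in range(1, 6)]
cd_players = [f"cd_player{i}" for i in range(1, 4)]

# Explicit enumeration of all (speaker, receiver, CD player) triples.
systems = list(product(speakers, receivers, cd_players))

# Product of the numbers of outcomes of each part.
count_by_rule = prod(len(part) for part in (speakers, receivers, cd_players))

print(len(systems), count_by_rule)
```

Enumeration is only feasible for small experiments, which is exactly why the counting rule is useful.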

The next two subsections provide important applications of the multiplication rule.


Consider a chance experiment which consists of $k$ repetitions of the same basic experiment or trial. If each trial has the same number $n$ of possible outcomes, then the total number of sample points in $S$ is equal to $n^k$.

Example 3.7 Consider tossing a coin 4 times. The total number of outcomes is $2^4 = 16$. □

Example 3.8 Consider a box containing 10 balls numbered $1, 2, \ldots, 10$. Suppose that we repeat 5 times the basic experiment of selecting one ball at random, recording its number and then putting the ball back in the box. Since the number of possible outcomes in each trial is equal to 10, the total number of possible outcomes of the experiment is equal to $10^5 = 100{,}000$. This experiment is an example of sampling with replacement from a finite population. □


Sampling without replacement corresponds to successive random draws, without replacement, of a single population unit. In the example of drawing balls from a box (Example 3.8), after a ball is selected, it is left out of the box.

Example 3.9 Consider a deck of 52 cards. If we select 3 cards in succession, then there are 52 possible outcomes at the first selection, 51 at the second, and 50 at the third. This is an example of sampling without replacement from a finite population. The total number of possible outcomes is therefore $52 \times 51 \times 50 = 132{,}600$. □

If $k$ elements have to be selected in succession, without replacement, from a set of $n$ elements, then the total number of possible outcomes is

\[ P_{n,k} = n(n-1)(n-2)\cdots(n-k+1), \]

called the number of permutations of $n$ elements taken $k$ at a time. If $k = n$, then the number of possible outcomes is the number of permutations of all $n$ elements

\[ P_{n,n} = n(n-1)(n-2)\cdots 2 \cdot 1, \]

called $n$ factorial and denoted by $n!$. By convention $0! = 1$. Thus

\[ P_{n,k} = \frac{n(n-1)(n-2)\cdots(n-k+1)(n-k)\cdots 2 \cdot 1}{(n-k)(n-k-1)\cdots 2 \cdot 1} = \frac{n!}{(n-k)!}. \]
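The formula $P_{n,k} = n!/(n-k)!$ can be compared against both a library routine and a brute-force enumeration. This is a small sketch; the function name `P` simply mirrors the notation of the text, and `math.perm` requires Python 3.8 or later.

```python
from itertools import permutations
from math import factorial, perm

def P(n, k):
    """Number of permutations of n elements taken k at a time."""
    return factorial(n) // factorial(n - k)

# Example 3.9: 3 cards drawn in succession from a deck of 52.
print(P(52, 3))
assert P(52, 3) == perm(52, 3) == 52 * 51 * 50

# Brute-force check on a small case: ordered pairs from 4 elements.
assert len(list(permutations("abcd", 2))) == P(4, 2)

# P(n, n) is n factorial.
assert P(5, 5) == factorial(5)
```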


Example 3.10 Given a group of $k$ people ($2 \le k \le 365$), what is the probability that at least 2 people in the group have the same birthday? To simplify the problem, assume that birthdays are unrelated (there are no twins) and that each of the 365 days of the year is equally likely to be the birthday of any person. The sample space $S$ then consists of $365^k$ possible outcomes. The number of outcomes in $S$ for which all $k$ birthdays are different is $P_{365,k}$. Therefore, if $E$ denotes the event "all $k$ people have different birthdays", then

\[ \Pr(E) = \frac{P_{365,k}}{365^k}. \]

Because the event "at least 2 people have the same birthday" is just the complement of $E$, we get

\[ \Pr(E^c) = 1 - \frac{P_{365,k}}{365^k}. \]

We denote this probability by $p(k)$. The table below summarizes the value of $p(k)$ for different values of $k$:

    k     p(k)
    5     .027
    10    .117
    20    .411
    40    .891
    60    .994

Notice that, in a class of 100 people, the event that at least 2 people have the same birthday is almost certain. □
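The values of $p(k)$ can be reproduced with a short computation. This is a minimal sketch under the same simplifying assumptions as the example (365 equally likely days, unrelated birthdays); the function name `p` mirrors the text and the parameter `days` is an added convenience.

```python
from math import perm

def p(k, days=365):
    """Probability that at least 2 of k people share a birthday."""
    # perm(days, k) counts the outcomes in which all k birthdays differ.
    return 1 - perm(days, k) / days**k

for k in (5, 10, 20, 40, 60):
    print(k, round(p(k), 3))
```

The same function can be used to explore other group sizes, for instance to locate the smallest $k$ for which $p(k)$ exceeds 1/2.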


As a motivation, consider the following example.

Example 3.11 Consider combining 4 elements $a$, $b$, $c$ and $d$, taken 2 at a time. The total number of possible outcomes is equal to the number of permutations of 4 objects taken 2 at a time, namely

\[ P_{4,2} = 4 \cdot 3 = 12. \]

If the order of the elements of each pair is irrelevant, the table below shows that 6 different combinations are obtained:

    12 permutations      6 combinations
    (a, b), (b, a)       {a, b}
    (a, c), (c, a)       {a, c}
    (a, d), (d, a)       {a, d}
    (b, c), (c, b)       {b, c}
    (b, d), (d, b)       {b, d}
    (c, d), (d, c)       {c, d}


Let $C_{n,k}$ denote the number of different combinations of $n$ objects taken $k$ at a time. To determine $C_{n,k}$ notice that the list of $P_{n,k}$ permutations may be constructed as follows. First select a particular combination of $k$ objects. Then notice that this particular combination can produce $k!$ permutations. Hence

\[ P_{n,k} = C_{n,k} \cdot k!, \]

from which we get

\[ C_{n,k} = \frac{P_{n,k}}{k!} = \frac{n!}{(n-k)!\,k!}. \]

The number $C_{n,k}$ is also called a binomial coefficient and denoted

\[ \binom{n}{k}. \]

Clearly

\[ \binom{n}{k} = \frac{n!}{(n-k)!\,k!} = \binom{n}{n-k}. \]
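The relation between permutations and combinations can be checked by enumeration on the 4-element case. The sketch below uses the standard-library helpers `itertools.permutations`, `itertools.combinations` and `math.comb`; the variable names are illustrative.

```python
from itertools import combinations, permutations
from math import comb, factorial

elements = "abcd"
k = 2

perms = list(permutations(elements, k))  # ordered pairs
combs = list(combinations(elements, k))  # unordered pairs
print(len(perms), len(combs))

# P_{n,k} = C_{n,k} * k!
assert len(perms) == len(combs) * factorial(k)

# The closed form agrees with the enumeration.
assert comb(len(elements), k) == len(combs)

# Symmetry of the binomial coefficient, checked here for n = 10.
assert all(comb(10, j) == comb(10, 10 - j) for j in range(11))
```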


Example 3.12 In Example 3.11, $n = 4$, $k = 2$ and so $C_{4,2} = 12/2 = 6$. □

Example 3.13 Given a hand of 5 cards, randomly drawn from a deck of 52, the probability of a "straight flush" is

\[ p = \Pr(\text{"straight flush"}) = \frac{\text{no. of different straight flushes}}{\text{no. of different hands}}. \]

The number of different hands is equal to

\[ \binom{52}{5} = \frac{52!}{5!\,47!} = 2{,}598{,}960. \]

Because there are 10 straight flushes for each suit, the total number of straight flushes is $10 \cdot 4 = 40$. Therefore, the desired probability is

\[ p = \frac{40}{2{,}598{,}960} = .000015. \]

Not a high one! □
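The two counts in Example 3.13 can be reproduced as follows. This is a sketch under the usual convention that the ace may count as either the lowest or the highest card of a straight, which is what gives 10 straight flushes per suit; the rank and suit labels are illustrative.

```python
from fractions import Fraction
from math import comb

# Ace appears at both ends because it may be low (A-2-3-4-5)
# or high (10-J-Q-K-A).
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "diamonds", "hearts", "spades"]

# One straight flush per suit and per run of 5 consecutive ranks.
straight_flushes = [
    tuple((rank, suit) for rank in ranks[start:start + 5])
    for suit in suits
    for start in range(len(ranks) - 4)
]

hands = comb(52, 5)  # number of different 5-card hands
p = Fraction(len(straight_flushes), hands)
print(len(straight_flushes), hands, float(p))
```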

When a set contains only elements of 2 distinct types, a binomial coefficient may be used to represent the number of different arrangements of all the elements in the set.

Example 3.14 Suppose that $k$ red balls and $n - k$ green balls are to be arranged in a row. Since the red balls occupy $k$ positions, the number of different arrangements of the $n$ balls corresponds to the number $C_{n,k}$ of combinations of $n$ objects taken $k$ at a time. □

Example 3.15 Given a hand of 5 cards, randomly drawn from a deck of 52, the probability of a "poker" (four cards of the same rank) is

\[ p = \Pr(\text{"poker"}) = \frac{\text{no. of different pokers}}{\text{no. of different hands}}, \]

where the denominator is the same as in Example 3.13. To compute the numerator, notice that 13 types of poker are possible: A, K, Q, . . . , 2, and that, once the 4 cards of the chosen rank are in the hand, the fifth card can be chosen among the remaining 48 cards in

\[ \binom{48}{1} = \frac{48!}{1!\,47!} = 48 \]

possible ways. Therefore, the number of possible pokers in one hand of 5 cards is $13 \cdot 48 = 624$ and so

\[ p = \frac{624}{2{,}598{,}960} = .00024, \]

which is higher than the probability of a straight flush. □


Suppose that we have a sample space $S$ where probabilities have been assigned to all events. If we know that the event $B \subset S$ occurred, then it seems intuitively obvious that this ought to modify our assignment of probabilities to any other event $A \subset S$, because the only sample points in $A$ that are now possible are the ones that are also contained in $B$. This new probability assigned to $A$ is called the conditional probability of the event $A$ given that the event $B$ has occurred, or simply the conditional probability of $A$ given $B$, and denoted by $\Pr(A \mid B)$.

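Example 3.16 Consider again the experiment of tossing a fair coin 3 times. Let $A$ = "at least one T" and $B$ = "H in the first trial". Clearly

\[ \Pr(A) = 7/8, \qquad \Pr(B) = 1/2. \]

If we know that $B$ occurred, then the relevant sample space becomes

\[ S' = \{HHH,\ HHT,\ HTH,\ HTT\}. \]

Because $A$ contains 3 of these 4 equally likely points,

\[ \Pr(A \mid B) = \frac{3}{4} = \frac{3/8}{1/2} = \frac{\Pr(A \cap B)}{\Pr(B)}. \]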

Notice that $\Pr(A \mid B) < \Pr(A)$ in this case. □

Definition 3.1 If $A$ and $B$ are any two events, then the conditional probability of $A$ given $B$ is

\[ \Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} \]

if $\Pr(B) > 0$, and $\Pr(A \mid B) = 0$ otherwise. □
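Definition 3.1 can be applied directly to a simple sample space by counting points. The following is a minimal sketch for the 3-coin experiment; the function names `pr` and `pr_given` are illustrative, and the events are those of Example 3.16.

```python
from fractions import Fraction
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}

def pr(event):
    """Probability of an event in the simple sample space S."""
    return Fraction(len(event & S), len(S))

def pr_given(a, b):
    """Pr(A | B) as in Definition 3.1, taken to be 0 when Pr(B) = 0."""
    if pr(b) == 0:
        return Fraction(0)
    return pr(a & b) / pr(b)

A = {e for e in S if "T" in e}     # at least one T
B = {e for e in S if e[0] == "H"}  # H in the first trial

print(pr(A), pr(B), pr_given(A, B))
```

Conditioning on $B$ amounts to restricting attention to the points of $B$ and renormalizing, which is what the ratio in `pr_given` does.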

The conditional probability of $B$ given $A$ is similarly defined as

\[ \Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)} \]

provided that $\Pr(A) > 0$.

The frequentist interpretation of conditional probabilities is as follows. If a chance experiment is repeated a large number of times, then the proportion of trials in which the event $B$ occurs is approximately equal to $\Pr(B)$, whereas the proportion of trials in which both $A$ and $B$ occur is approximately equal to $\Pr(A \cap B)$. Therefore, among those trials in which $B$ occurs, the proportion in which $A$ also occurs is approximately equal to $\Pr(A \cap B)/\Pr(B)$.

Definition 3.1 may be re-expressed as

\[ \Pr(A \cap B) = \Pr(A \mid B) \Pr(B). \tag{3.3} \]

This result, called the multiplication law, provides a convenient way of finding $\Pr(A \cap B)$ whenever $\Pr(A \mid B)$ and $\Pr(B)$ are easy to find.

Example 3.17 Consider a hand of 2 cards randomly drawn from a deck of 52. Let $A$ = "second card is a king" and $B$ = "first card is an ace". Then $\Pr(B) = 4/52$ and $\Pr(A \mid B) = 4/51$. Hence

\[ \Pr(A \cap B) = \Pr(\text{"ace and then king"}) = \Pr(A \mid B) \Pr(B) = \frac{4}{51} \cdot \frac{4}{52} = .0060. \]

□

We now consider a useful application of the multiplication law (3.3). Notice that

\[ A = (A \cap B) \cup (A \cap B^c), \]

where $A \cap B$ and $A \cap B^c$ are disjoint events because $B$ and its complement $B^c$ are disjoint. Hence

\[ \Pr(A) = \Pr(A \cap B) + \Pr(A \cap B^c), \]

where, by the multiplication law,

\[ \Pr(A \cap B) = \Pr(A \mid B) \Pr(B) \]

and

\[ \Pr(A \cap B^c) = \Pr(A \mid B^c) \Pr(B^c). \]

Therefore,

\[ \Pr(A) = \Pr(A \mid B) \Pr(B) + \Pr(A \mid B^c) \Pr(B^c), \tag{3.4} \]

which is sometimes called the law of total probabilities.

Example 3.18 Consider a hand of 2 cards randomly drawn from a deck of 52. Let $A$ = "second card is a king" and $B$ = "first card is a king". We have $\Pr(B) = 4/52$, $\Pr(B^c) = 48/52$ and

\[ \Pr(A \mid B) = 3/51, \qquad \Pr(A \mid B^c) = 4/51. \]

Hence, by the law of total probabilities,

\[ \Pr(A) = \frac{3}{51} \cdot \frac{4}{52} + \frac{4}{51} \cdot \frac{48}{52} = \frac{4}{52}. \]

Thus $\Pr(A)$ and $\Pr(B)$ are the same. □
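The result of Example 3.18 can be confirmed in two ways: by plugging the conditional probabilities into the law of total probabilities, and by brute-force enumeration of all ordered pairs of cards. In this sketch the encoding of a card as a `(rank, suit)` pair of integers, with rank 13 standing for a king, is an assumption made purely for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Law of total probabilities, with the values from Example 3.18.
pr_B = Fraction(4, 52)            # first card is a king
pr_A_given_B = Fraction(3, 51)    # second is a king, given first was
pr_A_given_Bc = Fraction(4, 51)   # second is a king, given first was not
pr_A = pr_A_given_B * pr_B + pr_A_given_Bc * (1 - pr_B)

# Brute force: all ordered draws of 2 distinct cards from the deck.
KING = 13  # hypothetical encoding: ranks 1..13, with 13 meaning king
deck = [(rank, suit) for rank in range(1, 14) for suit in range(4)]
draws = list(permutations(deck, 2))
second_is_king = sum(1 for _, second in draws if second[0] == KING)
pr_A_direct = Fraction(second_is_king, len(draws))

print(pr_A, pr_A_direct)
```

Both routes give the same value, which equals $\Pr(B)$: without information on the first card, the second card is as likely to be a king as the first.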


Let $A$ and $B$ be two events with non-zero probability. If knowing that $B$ occurred gives no information about whether or not $A$ occurred, then the probability assigned to $A$ should not be modified by the knowledge that $B$ occurred, that is, $\Pr(A \mid B) = \Pr(A)$. Hence, by the multiplication law,

\[ \Pr(A \cap B) = \Pr(A) \Pr(B). \]

We take this as our formal definition of statistical independence.

Definition 3.2 Two events $A$ and $B$ are said to be statistically independent if

\[ \Pr(A \cap B) = \Pr(A) \Pr(B). \]

□

Notice that this definition of independence is symmetric in $A$ and $B$, and also covers the case when $\Pr(A) = 0$ or $\Pr(B) = 0$. It is easy to show that if $A$ and $B$ are independent, then $A$ and $B^c$ as well as $A^c$ and $B^c$ are independent.

It is clear from Definition 3.2 that mutually exclusive events with non-zero probability cannot be independent, since in that case $\Pr(A \cap B) = 0$ while $\Pr(A) \Pr(B) > 0$. The concept of statistical independence is different from other concepts of independence (logical, mathematical, political, etc.). When there is no ambiguity, the term independence will be taken to mean statistical independence.


Example 3.19 The sample space associated with the experiment of tossing a fair coin twice is a simple sample space consisting of $2^2 = 4$ points. Define the events $A$ = "H in the first toss" and $B$ = "T in the second toss". Because $A \cap B = \{HT\}$ we have

\[ \Pr(A \cap B) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = \Pr(A) \Pr(B). \]

This result seems fairly intuitive, because the occurrence of H in the first coin toss has no relation to, and no influence on, the occurrence of T in the second coin toss, and vice versa. □

It is natural to assume that events that are physically unrelated (such as successive coin tosses) are also statistically independent. However, physically related events may also satisfy the de¯nition of statistical independence.

Example 3.20 Consider the chance experiment consisting of throwing a fair die. The sample space of this experiment is the simple sample space

\[ S = \{1,\ 2,\ 3,\ 4,\ 5,\ 6\}. \]

Let $A$ = "an even number is obtained" and $B$ = "the number 1, 2, 3 or 4 is obtained". It is easy to verify that $\Pr(A) = 1/2$ and $\Pr(B) = 2/3$. Further

\[ \Pr(A \cap B) = \Pr(\text{"2 or 4"}) = 1/3 = \Pr(A) \Pr(B). \]

Hence, $A$ and $B$ are independent even though their occurrence depends on the same roll of a die. □
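Definition 3.2 gives a direct test for independence on a simple sample space. The sketch below checks the events of Example 3.20 and, for contrast, an extra event `C` that is not in the text and is introduced here only as a counterexample.

```python
from fractions import Fraction

S = set(range(1, 7))  # outcomes of one roll of a fair die

def pr(event):
    """Probability of an event in the simple sample space S."""
    return Fraction(len(event & S), len(S))

def independent(a, b):
    """Definition 3.2: Pr(A and B) == Pr(A) * Pr(B)."""
    return pr(a & b) == pr(a) * pr(b)

A = {2, 4, 6}      # an even number is obtained
B = {1, 2, 3, 4}   # the number 1, 2, 3 or 4 is obtained
C = {1, 2, 3}      # hypothetical extra event, not from the text

print(independent(A, B))  # True
print(independent(A, C))  # False

# Independence carries over to complements.
assert independent(A, S - B) and independent(S - A, S - B)
```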


Suppose that you want to determine whether a coin is fair ($F$) or unfair ($U$). You have no information on the coin, and so you are willing to believe that $F$ and $U$ are equally likely, that is,

\[ \Pr(F) = \Pr(U) = 1/2. \]

If the coin is fair, then

\[ \Pr(H \mid F) = 1/2. \]

Further suppose that you know that, if the coin is unfair, then $H$ is more likely than $T$, say

\[ \Pr(H \mid U) = .9. \]

Assume that tossing the coin once gives you $H$. What is now the probability that the coin is fair? This is called the posterior probability of $F$ given $H$, and denoted by $\Pr(F \mid H)$. Intuitively, the occurrence of $H$ (the most likely event if the coin is unfair) should modify your initial beliefs, leading you to view the event that the coin is fair as less likely than initially thought, whereas the occurrence of $T$ should lead you to view the event that the coin is fair as more likely than initially thought.

One way of computing the posterior probabilities $\Pr(F \mid H)$ and $\Pr(F \mid T)$ is to draw the outcome tree for this problem.

    Coin   Toss   Joint probability
    F      H      $\Pr(H \cap F) = .25$
    F      T      $\Pr(T \cap F) = .25$
    U      H      $\Pr(H \cap U) = .45$
    U      T      $\Pr(T \cap U) = .05$

It is then clear that the events $U$ and $F$ are mutually exclusive and that the event $H$ is the union of the two disjoint events $H \cap F$ and $H \cap U$. Hence

\[ \Pr(H) = \Pr(H \cap F) + \Pr(H \cap U) = .25 + .45 = .70. \]

Therefore,

\[ \Pr(F \mid H) = \frac{\Pr(H \cap F)}{\Pr(H)} = \frac{.25}{.70} = .357, \]

which is indeed less than the original assignment of probability to $F$, namely $\Pr(F) = 1/2$. By a similar argument we have

\[ \Pr(F \mid T) = \frac{\Pr(T \cap F)}{\Pr(T)} = \frac{.25}{.30} = .833. \]

We can also compute the posterior probability $\Pr(F \mid H)$ without the need of a tree diagram, by using the fact that

\[ \Pr(H \cap F) = \Pr(H \mid F) \Pr(F) \]

by the multiplication law, and

\[ \Pr(H) = \Pr(H \mid F) \Pr(F) + \Pr(H \mid U) \Pr(U) \]

by the law of total probabilities. Hence,

\[ \Pr(F \mid H) = \frac{\Pr(H \mid F) \Pr(F)}{\Pr(H \mid F) \Pr(F) + \Pr(H \mid U) \Pr(U)}. \]

This formula is known as Bayes law. For $\Pr(F \mid T)$, Bayes law gives

\[ \Pr(F \mid T) = \frac{\Pr(T \mid F) \Pr(F)}{\Pr(T \mid F) \Pr(F) + \Pr(T \mid U) \Pr(U)}, \]

where $\Pr(T \mid F) = 1 - \Pr(H \mid F)$ and $\Pr(T \mid U) = 1 - \Pr(H \mid U)$.

Notice that we can regard $\Pr(F)$ as our prior information about whether the coin is fair. Bayes law then gives us a way of updating this information in the light of the new information contained in the fact that $H$ was obtained.
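The updating step can be written as a small function. This is a sketch, not a general-purpose implementation: the function name `posterior` and the dictionary layout are illustrative, and the value 9/10 for the probability of heads with an unfair coin is inferred from the tree entry $\Pr(H \cap U) = .45$ together with $\Pr(U) = 1/2$. The function accepts any finite set of mutually exclusive and exhaustive hypotheses, a slight generalization of the two-hypothesis case discussed in the text.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes law over mutually exclusive, exhaustive hypotheses.

    prior[h] is Pr(h); likelihood[h] is Pr(data | h).
    Returns a dict mapping each hypothesis h to Pr(h | data).
    """
    # Multiplication law: Pr(data and h) = Pr(data | h) * Pr(h).
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # Law of total probabilities: Pr(data) is the sum of the joints.
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

prior = {"F": Fraction(1, 2), "U": Fraction(1, 2)}
pr_heads = {"F": Fraction(1, 2), "U": Fraction(9, 10)}  # 9/10 inferred from the tree
pr_tails = {h: 1 - q for h, q in pr_heads.items()}

print(float(posterior(prior, pr_heads)["F"]))  # about .357
print(float(posterior(prior, pr_tails)["F"]))  # about .833
```

Using exact fractions keeps the arithmetic transparent: the two posteriors come out as 5/14 and 5/6.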

