Lecture 4

Probability of random event. Representation of probability.

Plan of the lecture:

1. Introduction
1.1 Definition of probability
2. Probability axioms (axiomatic approach to probability)
3. Classical definition of probability
4. Frequency probability, statistical probability
5. Geometric probability

1 Introduction

Probability is a way of expressing knowledge or belief that an event will occur or has occurred. In mathematics the concept has been given an exact meaning in probability theory, which is used extensively in such areas of study as mathematics, statistics, finance, gambling, science, and philosophy to draw conclusions about the likelihood of potential events and the underlying mechanics of complex systems.

The word “probability” has been used in a variety of ways since it was first coined in relation to games of chance. Does probability measure the real, physical tendency of something to occur, or is it just a measure of how strongly one believes it will occur? In answering such questions, we interpret the probability values of probability theory.

There are two broad categories of probability interpretations, which can be called “physical” and “evidential” probabilities. Physical probabilities, which are also called objective or frequency probabilities, are associated with random physical systems such as roulette wheels, rolling dice and radioactive atoms. In such systems, a given type of event (such as a die yielding a six) tends to occur at a persistent rate, or “relative frequency”, in a long run of trials. Physical probabilities either explain, or are invoked to explain, these stable frequencies. Thus talk about physical probability makes sense only when dealing with well-defined random experiments. The two main kinds of theory of physical probability are frequentist accounts (such as those of Venn, Reichenbach and von Mises) and propensity accounts (such as those of Popper, Miller, Giere and Fetzer).

Evidential probability, also called Bayesian probability, can be assigned to any statement whatsoever, even when no random process is involved, as a way to represent its subjective plausibility, or the degree to which the statement is supported by the available evidence. On most accounts, evidential probabilities are considered to be degrees of belief, defined in terms of dispositions to gamble at certain odds. The four main evidential interpretations are the classical (e.g. Laplace's) interpretation, the subjective interpretation (de Finetti and Savage), the epistemic or inductive interpretation (Ramsey, Cox) and the logical interpretation (Keynes and Carnap).

Some interpretations of probability are associated with approaches to statistical inference, including theories of estimation and hypothesis testing. The physical interpretation, for example, is taken by followers of “frequentist” statistical methods, such as R. A. Fisher, Jerzy Neyman and Egon Pearson. Statisticians of the opposing Bayesian school typically accept

1.1 Definition of probability

What is probability? Mathematicians attempted to define this seemingly simple term without much success in reaching a consensus for a long time until Kolmogorov presented his celebrated theory referred to as the “axiomatic approach.” The power of the axiomatic approach is in its simplicity.

First, consider the debate that went on before Kolmogorov. A probability was defined as a frequency of occurrence. Consider 1000 trials in the coin throwing experiment. If the head shows up 400 times, it is concluded that the “probability” of a head is 0.4. The dilemma of this definition of probability is that unless the coin is thrown many times and the outcomes are observed, there is no way of telling the probability.

Some would say that the probability of a head should be 0.5, but then others would argue that, unless the coin is minted “perfectly” with identical sides, no one can say that its probability is 0.5 even though it may be “close,” and so on. Mathematicians had difficulty overcoming arguments such as this and, as a result, probability theory could not be developed into a useful discipline that could be applied to practical problems.

Most reasonable persons could agree, deep in their hearts, that it should be good enough to take the probability of, for example, a particular face in die throwing to be 1/6 and move on to solve other probability problems associated with die throwing. If the 1/6 probability for a face is accepted, then one can find, for example, the probability of a face with an even number of spots, which would be 0.5. With the frequency definition of probability, this simple solution would not be possible. Such an approach is possible because human beings are given this innate capability of a priori reasoning.

Kolmogorov presented this simple idea based on a priori reasoning that freed everyone interested in probability from the endless arguments. His approach is referred to as the “axiomatic probability theory” and is based on set theory and measure theory. His idea was that there was no need to determine whether a coin was minted perfectly to discuss its probability. He simply turned the table around and asserted that one could “assign” probabilities to the outcomes based on the a priori knowledge of the outcomes and let the probabilities initially assigned be the starting point for developing more complex probability theory, just like accepting 1/6 as the probability of a face in die throwing.

The key concept is in the word “assign”. In this approach, probability “begins” with the

die-throwing experiment. Once this initial assignment of probability is “accepted” (as an axiom, so to speak), it is now possible to solve all kinds of complex and interesting probability problems associated with die-throwing.

For example, what is the probability of getting an even number of spots? Since the 1/6 probability is “accepted,” one can proceed to find its answer, which is 0.5. What is the probability of getting a face with more than four spots? Since either five or six spots would make this event happen, the answer would be 2/6.
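The two die-throwing answers above can be reproduced with a minimal sketch. It assumes a fair six-sided die with 1/6 assigned to every face; the names `p`, `prob`, `even` and `more_than_four` are illustrative and do not come from the lecture.

```python
from fractions import Fraction

# Assumption for illustration: a fair six-sided die, each face assigned 1/6.
faces = range(1, 7)
p = {face: Fraction(1, 6) for face in faces}

def prob(event):
    """Sum the assigned probabilities of the faces that make the event happen."""
    return sum((p[face] for face in event), Fraction(0))

even = {f for f in faces if f % 2 == 0}       # {2, 4, 6}
more_than_four = {f for f in faces if f > 4}  # {5, 6}

print(prob(even))            # 1/2
print(prob(more_than_four))  # 1/3
```

Exact fractions are used instead of floats so that 2/6 and 1/3 compare equal without rounding.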

2 Probability axioms (axiomatic approach to probability)

A mathematical system, e.g., linear algebra, set theory, and group theory, is simply an artifact that is useful because it provides a structure for drawing meaningful inferences. The axiomatic probability theory is such a mathematical system.

Consider a random experiment with $n$ possible outcomes, $\omega_1, \omega_2, \ldots, \omega_n$. The probability space $\Omega$ is defined as the set of all possible random outcomes of a random experiment as follows:

$$\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}. \qquad (1)$$

A “measure” is “assigned” to each outcome, $\omega_i$. This measure is referred to as “probability”. Denote this measure by $p_i$. The measure chosen is a real number between 0 and 1 as follows:

$$0 \le p_i \le 1, \qquad (2)$$

$$p_i = P(\omega_i) \quad \text{(the probability of random outcome } \omega_i\text{)}. \qquad (3)$$

The word “probability” was difficult to define because of the attempts to define its meaning semantically and in some instances philosophically. In the axiomatic probability theory, its definition is simply a “measure” that is assigned to an outcome. In fact, this measure does not have to be a number between 0 and 1. It is conventional, though, to use a number between 0 and 1 as a probability measure.

experiment of (3), the axiomatic probability theory is based on the following three simple axioms:

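$$\text{Axiom I:} \quad P(A) \ge 0 \quad \text{(nonnegativity)}; \qquad (4)$$

$$\text{Axiom II:} \quad P(\Omega) = 1 \quad \text{(normalization)}; \qquad (5)$$

$$\text{Axiom III:} \quad \text{if } A \cap B = \emptyset, \text{ then } P(A \cup B) = P(A) + P(B) \quad \text{(additivity)}. \qquad (6)$$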

In the above equations, $\Omega$ is a set referred to as the probability space defined earlier. $A$ and $B$ are subsets of $\Omega$ and define the random events of interest. Since $A$ and $B$ define the events, they are sometimes simply referred to as “events.” $\Omega$ is also a set and, as such, also an event. Since $\Omega$ includes all possible outcomes, any outcome will make $\Omega$ happen, and so $\Omega$ is referred to as a certain event (sure event). Similarly, $\emptyset$ is a set that contains no element. No outcome will make $\emptyset$ happen, and so $\emptyset$ is referred to as an impossible event. Two set operations are used in these axioms. $A \cap B$ is the intersection of $A$ and $B$, the set of elements belonging to both $A$ and $B$. $A \cup B$ is the union of $A$ and $B$, the set of elements belonging to either $A$ or $B$.
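As a rough illustration of how events, intersections and unions interact with the three axioms, the sketch below builds a small finite probability space and checks each axiom over every possible event. The three outcomes and their weights are hypothetical and chosen only so that they sum to one.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical three-outcome experiment; the names and weights are illustrative only.
weights = {"w1": Fraction(1, 2), "w2": Fraction(1, 3), "w3": Fraction(1, 6)}
omega = frozenset(weights)

def P(event):
    """Probability of an event (a subset of omega) as the sum of its outcomes' weights."""
    return sum((weights[w] for w in event), Fraction(0))

def all_events(space):
    """Every subset of the space, from the empty set up to the space itself."""
    items = list(space)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

events = all_events(omega)

axiom_1 = all(P(A) >= 0 for A in events)             # nonnegativity
axiom_2 = P(omega) == 1                              # normalization
axiom_3 = all(P(A | B) == P(A) + P(B)                # additivity for disjoint events
              for A in events for B in events if not (A & B))

print(axiom_1, axiom_2, axiom_3)  # True True True
```

Python's `&` and `|` on sets play the roles of intersection and union, so the additivity check reads almost exactly like Axiom III.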

Axiom I states that any event defined in the probability space is assigned a non-negative measure or probability. Axiom I defines the starting point of development of a probabilistic framework of a random experiment under consideration. First, define the elementary events $\{\omega_i\}$ and assign probabilities to them, $P(\{\omega_i\})$. Note the distinction between $P(\{\omega_i\})$ and $P(\omega_i)$. The former is the probability of the elementary event $\{\omega_i\}$ and the latter, that of a random outcome $\omega_i$. It is important to note that the starting point of the axiomatic framework, i.e., Axiom I, is $P(\{\omega_i\})$ and not $P(\omega_i)$.

Axiom II states that the probability of the space $\Omega$ is one. The space $\Omega$ is a set that contains all possible outcomes under consideration, and it would be reasonable to accept as a basic truth that the probability of all possible outcomes is one.

In effect, Axiom II simply states that the probability of certainty is one. One may then ask about the probability of impossibility, i.e., a null event. Don’t we need an axiom, say Axiom IIa, that states $P(\emptyset) = 0$? It can be shown that the three axioms cover this axiom and adding it would be superfluous because it can be derived from Axioms II and III as follows. From set theory, the union of the space $\Omega$ and the null set $\emptyset$ is the space $\Omega$, and the intersection of the space $\Omega$ and the null set $\emptyset$ is the null set $\emptyset$:

$$\Omega \cup \emptyset = \Omega; \qquad (7)$$

$$\Omega \cap \emptyset = \emptyset. \qquad (8)$$

From Equation (7), it follows that:

$$P(\Omega) = P(\Omega \cup \emptyset). \qquad (9)$$

Equation (8) satisfies the condition for Axiom III. Hence, from Axiom III and Equation (9), it follows that:

$$P(\Omega) = P(\Omega \cup \emptyset) = P(\Omega) + P(\emptyset). \qquad (10)$$

From Axiom II and Equation (10), it follows that:

$$P(\Omega) = P(\Omega \cup \emptyset) = P(\Omega) + P(\emptyset) = 1. \qquad (11)$$

Finally, from Equation (11), it follows that:

$$P(\emptyset) = 1 - P(\Omega) = 0. \qquad (12)$$

Note that Axiom I states $P(A) \ge 0$ but does not include $P(A) \le 1$. Once again, the reason is that the upper bound can be derived from the other axioms, so including $P(A) \le 1$ would be superfluous.
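One way to see where the bound $P(A) \le 1$ comes from is sketched below. The complement $A^{c} = \Omega \setminus A$ is notation introduced here for the sketch; it is not used elsewhere in the lecture.

```latex
\begin{align*}
A \cup A^{c} &= \Omega, \qquad A \cap A^{c} = \emptyset
  && \text{(set theory)} \\
P(A) + P(A^{c}) &= P(A \cup A^{c}) = P(\Omega) = 1
  && \text{(Axioms III and II)} \\
P(A) &= 1 - P(A^{c}) \le 1
  && \text{(Axiom I applied to } A^{c}\text{)}
\end{align*}
```

The argument follows the same pattern as the derivation of $P(\emptyset) = 0$: split $\Omega$ into two disjoint pieces, apply additivity, then normalization.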

Example 1

A box contains a total of 10 balls of different colors as follows: two white balls, three red balls and five black balls. A player is to withdraw a ball, and, if the ball withdrawn is either red or black, the player wins a piece of candy. What is the probability of winning a piece of candy by playing this game?

Solution

There are eight red or black balls out of a total of 10 balls, and so the probability of winning a piece of candy is 0.8. This is a simple problem and one can get the answer quickly in the head without going through the rigor of axiomatic formulation.

complex problems, the disciplined way of dealing with the problem using the axiomatic approach is helpful.

First define the random experiment. There are two alternative ways of defining the space and random outcomes for this problem. Either method should yield the same answer.

Formulation 1. A more direct way of formulation is to define the outcomes of ball drawing like the outcomes of die throwing. Imagine that the individual balls can be distinguished (e.g., by numbering them) as the faces of a die are distinguished. Then there are ten possible outcomes with an equal probability as follows:

$$\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \omega_6, \omega_7, \omega_8, \omega_9, \omega_{10}\}; \qquad (13)$$

$$p_i = P(\{\omega_i\}) = \frac{1}{10}; \quad i = 1, \ldots, 10, \qquad (14)$$

where $\omega_1$ and $\omega_2$ are drawing a white ball, $\omega_3$, $\omega_4$ and $\omega_5$ a red ball and $\omega_6$ through $\omega_{10}$ a black ball.

The next step is to define the event. The event of interest is “winning a candy” and is defined as a set denoted by $W$. In set theory, a set is defined by its members; an outcome “qualifies” for inclusion in the event set if it makes that event happen. $W$ in turn depends on the following two events:

$$R = \{\text{ball withdrawn is red}\} = \{\omega_3, \omega_4, \omega_5\}; \qquad (15)$$

$$B = \{\text{ball withdrawn is black}\} = \{\omega_6, \omega_7, \omega_8, \omega_9, \omega_{10}\}. \qquad (16)$$

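Since the elementary events $\{\omega_i\}$ are mutually exclusive, i.e., $\{\omega_i\} \cap \{\omega_j\} = \emptyset$ for $i \ne j$; $i, j = 3, \ldots, 10$, it follows that:

$$R = \{\omega_3, \omega_4, \omega_5\} = \{\omega_3\} \cup \{\omega_4\} \cup \{\omega_5\} = (\{\omega_3\} \cup \{\omega_4\}) \cup \{\omega_5\}. \qquad (17)$$

Applying Axiom III twice, it follows that:

$$P(R) = P(\{\omega_3, \omega_4, \omega_5\}) = P(\{\omega_3\} \cup \{\omega_4\}) + P(\{\omega_5\}) = P(\{\omega_3\}) + P(\{\omega_4\}) + P(\{\omega_5\}) = 0.3. \qquad (18)$$

Similarly:

$$P(B) = P(\{\omega_6, \omega_7, \omega_8, \omega_9, \omega_{10}\}) = 0.5. \qquad (19)$$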

$W$ would occur if the ball withdrawn is either red or black; that is, $W$ would occur if either $R$ or $B$ occurs. Since $R$ and $B$ are mutually exclusive events, it follows that:

$$R \cap B = \emptyset; \qquad (20)$$

$$W = R \cup B. \qquad (21)$$

Hence, from Axiom III, it follows that:

$$P(W) = P(R \cup B) = P(R) + P(B) = 0.3 + 0.5 = 0.8. \qquad (22)$$
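Formulation 1 can be traced step by step in a short sketch. The integer labels 1 to 10 stand in for the outcomes of the ten distinguishable balls and are an illustrative choice, as are the names `p` and `P`.

```python
from fractions import Fraction

# Formulation 1: ten distinguishable balls, numbered 1..10 for illustration.
# Balls 1-2 are white, 3-5 red, 6-10 black; each elementary event has probability 1/10.
omega = set(range(1, 11))
p = {i: Fraction(1, 10) for i in omega}

def P(event):
    """Additivity (Axiom III) over disjoint elementary events: sum their probabilities."""
    return sum((p[i] for i in event), Fraction(0))

R = {3, 4, 5}          # ball withdrawn is red
B = {6, 7, 8, 9, 10}   # ball withdrawn is black

assert R & B == set()  # R and B are mutually exclusive
W = R | B              # winning a piece of candy

print(P(R), P(B), P(W))  # 3/10 1/2 4/5
```

The printed fractions are the reduced forms of 0.3, 0.5 and 0.8.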

Formulation 2. As long as the axiomatic approach is followed, different definitions of outcomes are possible. The above formulation can be simplified by defining the experimental outcomes as the colors of the balls as follows:

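$$\Omega = \{\omega_w, \omega_r, \omega_b\}, \qquad (23)$$

where $\omega_w$, $\omega_r$ and $\omega_b$ are random outcomes of white, red and black color.

Then from the problem, the probabilities of the random outcomes can be assigned as follows:

$$P(\{\omega_w\}) = 0.2; \quad P(\{\omega_r\}) = 0.3; \quad P(\{\omega_b\}) = 0.5. \qquad (24)$$

$W$ would occur if $\omega_r$ or $\omega_b$ shows up. Hence,

$$W = \{\omega_r, \omega_b\} = \{\omega_r\} \cup \{\omega_b\}. \qquad (25)$$

Since $\{\omega_r\} \cap \{\omega_b\} = \emptyset$, from Axiom III and Equations (24) and (25), it follows that:

$$P(W) = P(\{\omega_r, \omega_b\}) = P(\{\omega_r\} \cup \{\omega_b\}) = P(\{\omega_r\}) + P(\{\omega_b\}) = 0.3 + 0.5 = 0.8. \qquad (26)$$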
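Formulation 2 is even shorter in code. The sketch below assumes the color names as outcome labels, which is an illustrative choice, and takes the assigned probabilities from the ball counts in the problem.

```python
from fractions import Fraction

# Formulation 2: outcomes are colors, with probabilities assigned from the ball counts.
p = {"white": Fraction(2, 10), "red": Fraction(3, 10), "black": Fraction(5, 10)}

def P(event):
    """Probability of an event given as a set of color outcomes."""
    return sum((p[color] for color in event), Fraction(0))

W = {"red", "black"}   # winning a piece of candy
print(P(W))            # 4/5, i.e. 0.8
print(P(set(p)) == 1)  # True: normalization (Axiom II) holds
```

A coarser set of outcomes gives the same answer as long as the assigned probabilities still satisfy the axioms.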

3 Classical definition of probability

The classical definition of probability is identified with the works of Pierre Simon Laplace. As stated in his Théorie analytique des probabilités:

The probability of an event is the ratio of the number of cases favorable to it, to the number of all cases possible when nothing leads us to expect that any one of these cases should occur more than any other, which renders them, for us, equally possible.

This can be represented mathematically as follows:

If a random experiment can result in $N$ mutually exclusive and equally likely outcomes and if $N_A$ of these outcomes result in the occurrence of the event $A$, the probability of $A$ is defined by

$$P(A) = \frac{N_A}{N}.$$
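The ratio $N_A / N$ is a counting exercise, which a small helper can make concrete. This is only a sketch: the function name `classical_probability` and the two-dice experiment are illustrative and are not taken from the lecture.

```python
from fractions import Fraction
from itertools import product

def classical_probability(outcomes, is_favorable):
    """P(A) = N_A / N for N equally likely, mutually exclusive outcomes."""
    outcomes = list(outcomes)
    n_a = sum(1 for o in outcomes if is_favorable(o))
    return Fraction(n_a, len(outcomes))

# Illustrative experiment (not from the lecture): two fair dice, event "sum equals 7".
two_dice = product(range(1, 7), repeat=2)
print(classical_probability(two_dice, lambda roll: sum(roll) == 7))  # 1/6
```

The helper only works when the outcomes can be enumerated and are taken to be equally likely.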

There are two clear limitations to the classical definition. Firstly, it is applicable only to situations in which there is a “finite” number of possible outcomes. But some important random experiments, such as tossing a coin until it comes up heads, give rise to an infinite set of outcomes. Secondly, to avoid circularity, one needs to determine in advance that all the possible outcomes are equally likely without relying on the notion of probability, for instance by symmetry considerations.

This definition is essentially a consequence of the principle of indifference. If elementary events are assigned equal probabilities, then the probability of a disjunction of elementary events is just the number of events in the disjunction divided by the total number of elementary events.

The classical definition of probability was called into question by several writers of the nineteenth century, including John Venn and George Boole. The frequentist definition of probability became widely accepted as a result of their criticism, and especially through the works of R.A. Fisher. The classical definition enjoyed a revival of sorts due to the general interest in Bayesian probability.

4 Frequency probability

"Statistical probability" is a term sometimes used informally as a synonym for frequency probability, which identifies probability with relative frequency over a long series of events or the proportion of an event in a large population.

account overcomes some of the problems of the previously dominant viewpoint, the classical interpretation.

Frequentists talk about probabilities only when dealing with well-defined random experiments. The set of all possible outcomes of a random experiment is called the sample space of the experiment. An event is defined as a particular subset of the sample space that you want to consider. For any event only one of two possibilities can happen; it occurs or it does not occur. The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.

Thus, if $N_t$ is the total number of trials and $N_A$ is the number of trials where the event $A$ occurred, the probability $P(A)$ of the event occurring will be approximated by the relative frequency as follows:

$$P(A) \approx \frac{N_A}{N_t}.$$

A further and more controversial claim is that in the “long run”, as the number of trials approaches infinity, the relative frequency will converge exactly to the probability:

$$P(A) = \lim_{N_t \to \infty} \frac{N_A}{N_t}.$$
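The behaviour of the relative frequency $N_A / N_t$ as the number of trials grows can be explored with a simulation. The sketch below assumes a fair die simulated with Python's pseudo-random generator and takes the event $A$ to be "a six is rolled"; both choices are illustrative.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Fraction of simulated fair-die rolls that show a six (N_A / N_t)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_a = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return n_a / n_trials

# Print the relative frequency for increasing numbers of trials.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

For larger numbers of trials the printed values should tend to settle near 1/6, although any finite run only approximates it.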

One objection to this is that we can only ever observe a finite sequence, and thus the extrapolation to the infinite involves unwarranted metaphysical assumptions. This conflicts with the standard claim that the frequency interpretation is somehow more “objective” than other theories of probability.

Empirical probability, also known as relative frequency or experimental probability, is the ratio of the number of favorable outcomes to the total number of trials, not in a sample space but in an actual sequence of experiments. In a more general sense, empirical probability estimates probabilities from experience and observation. The phrase a posteriori probability has also been used as an alternative to empirical probability or relative frequency.

5 Geometric probability

"favorable" outcomes, meaning what you want to happen, by the number of total outcomes, meaning all of the things that might happen.

$$\text{probability} = \frac{\text{number of favorable outcomes}}{\text{number of possible outcomes}}.$$

For example, if we want to know the probability of rolling a 5 with a standard six-sided die, we know that there is only one favorable outcome (5), while there are 6 total outcomes (one for each side of the die). This means that the probability of rolling a 5 is 1/6. If we wanted to know the probability of rolling an odd number, we first count the favorable outcomes: 1, 3, or 5. There are 3. There are still 6 total possible outcomes, so the probability of rolling an odd number is 3/6, or 1/2.

"Geometric probability", which is what our target problem is about, is exactly the same idea, except that we are dealing with the areas of regions instead of the "number" of outcomes. The equation becomes

$$\text{probability} = \frac{\text{area of favorable region}}{\text{area of total region}}.$$

A typical problem might be this: If you are throwing a dart at the rectangular target below and are equally likely to hit any point on the target, what is the probability that you will hit the small square?

Fig. 1

To solve this, we need to find the area of the favorable region, which is the small square, and the area of the total region, which is the rectangle.

The area of the square is $(5\ \text{cm})^2$, or $25\ \text{cm}^2$.

$$\text{probability} = \frac{\text{area of favorable region}}{\text{area of total region}} = \frac{25\ \text{cm}^2}{250\ \text{cm}^2} = \frac{1}{10}.$$
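The area ratio can be cross-checked by simulating dart throws. The text here gives only the two areas, so the layout in this sketch is an assumption: a 25 cm by 10 cm rectangle with the 5 cm square placed in one corner. Any rectangle of area 250 cm² containing the square would give the same ratio.

```python
import random

# Assumed dimensions: the lecture gives only the areas (25 cm^2 and 250 cm^2).
# A 25 cm x 10 cm rectangle with a 5 cm square in one corner is a hypothetical layout.
RECT_W, RECT_H = 25.0, 10.0
SQUARE_SIDE = 5.0

exact = (SQUARE_SIDE ** 2) / (RECT_W * RECT_H)

def estimate_hit_probability(n_throws, seed=0):
    """Throw darts uniformly at the rectangle and count hits inside the square."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_throws):
        x, y = rng.uniform(0, RECT_W), rng.uniform(0, RECT_H)
        if x < SQUARE_SIDE and y < SQUARE_SIDE:
            hits += 1
    return hits / n_throws

print(exact)                              # 0.1
print(estimate_hit_probability(100_000))  # Monte Carlo estimate, close to 0.1
```

Sampling points uniformly over the rectangle is the code counterpart of "equally likely to hit any point on the target".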

This means that there is a 1 in 10 chance that a dart thrown at the rectangle will hit the