Introduction to CEE 3030 page 1


Uncertainty in Engineering Analysis – An Introduction

By Gilberto E. Urroz, August 2003/updated August 2005

This course on uncertainty in engineering analysis can also be referred to as probability and statistics for engineers. In particular, we will deal with the applications of probability and statistics to problems related to civil and environmental engineering.

Where does this class fit in the curriculum?

To understand and apply concepts of probability and statistics you need to have taken basic math classes such as Algebra and Calculus (uni-variate differential and integral Calculus is essential, multi-variate Calculus is useful). This course belongs in the intermediate level of the curriculum for both Civil and Environmental Engineering. We will use, as much as we can, problems taken from those disciplines to illustrate the applications of probability and statistics concepts.

Systems

Civil engineers and environmental engineers deal with a number of physical, biological, chemical, and social systems in their practice. For example, a civil engineer may be involved in determining the steady-state water discharge (Q) passing through a pressurized water pipeline of given length (L) and diameter (D) made of a specific material under a specified pressure difference (Δp) between the inlet and outlet of the pipeline. The pipeline, in this case, constitutes a physical system. Inputs to this system will be the pipeline material, the diameter, the length, and the pressure difference. Typically, the engineer will use a well-known equation or formula to calculate the discharge. The equation or formula used can be thought of as the system’s response to the inputs, and the discharge as the output of the system of interest. These ideas are illustrated in the following diagram.

Deterministic and stochastic (random) systems

You have solved many problems of a similar nature in your physics, chemistry, and engineering mechanics classes, among others. In most of those problems the inputs are well defined (e.g., … the pipeline is made out of cast iron, and has a length of 100 feet and a diameter of 5 inches, while subjected to a pressure difference of 200 psi between its inlet and outlet …), the system


roughness of the pipe (a function of the material), and ν = kinematic viscosity of water. These values are readily available in textbooks and engineering tables].

A system like the one just described is known as a deterministic system. Thus, a rough definition of a deterministic system is: a system whose input and system response are well known and accurately determined. In such a deterministic system, the output is uniquely determined by the values of the inputs and by the system response. You will notice that most of the problems you solve in assignments and tests related to physical sciences and technologies are of this nature, i.e., most of your assignments are deterministic.

Many systems in civil and environmental engineering, however, incorporate uncertainty in either the input or the system response. Furthermore, you may want the uncertainty associated with a certain quantity to be the output of a system of interest. In such cases we will refer to the system as a stochastic system or random system.

For example, in studying the internal stresses in a loaded beam, uncertainty may be incorporated if the beam is subject to an earthquake. The study of earthquakes involves at least two types of uncertainty: (1) uncertainty in the timing of the earthquake, i.e., how likely is it that an earthquake of a certain magnitude will hit the area where the beam is located within, say, the next year?, and (2) uncertainty in the characteristics of the earthquake, i.e., what is the predominant direction of seismic accelerations? what is the maximum acceleration? what is the dominant frequency of oscillation? etc.

Examples from Civil Engineering

Our Civil Engineering program is divided into four sub-specialties: structural engineering, geotechnical engineering, water engineering, and transportation engineering. I will next provide examples of the application of the concepts of probability and statistics to those sub-specialties.

Structural Engineering

The example given above on earthquake response of structural elements is a clear case of uncertain behavior in a system. Although the beam response is well known if the load is given, the presence of earthquake accelerations will introduce uncertainty in the system analysis, since we don’t know, a priori, what the magnitude of the maximum load will be. Also, the oscillatory nature of the load caused by an earthquake will require the vibrational analysis of the beam to better understand its response. Thus, it may be necessary to break down the oscillatory load into its many components, identifying the dominant frequencies of vibration, a technique where statistical analysis is relevant.

While pouring concrete into forms, for example, it is necessary to monitor the quality of the concrete so that it will satisfy the requirements in terms of the tension and compression loads for which it was designed. Thus, cylindrical specimens of concrete will be cured separately from the concrete poured into the forms and tested, at the appropriate time, in a tri-axial machine to verify their compressive strength. The values thus obtained can then be summarized using statistical techniques.

Geotechnical Engineering

Geotechnical engineers use their knowledge to analyze soils to use them as the basis of structures. Through analysis of soil specimens they can determine certain soil properties that can be used to predict, for example, the bearing capacity of the soil (i.e., the maximum load that a soil can take). Thus, to better understand the properties of a soil formation, the properties of those soil specimens can be determined in the laboratory and their values summarized using statistical techniques.


Earthquakes, sources of much uncertainty, are of interest to geotechnical engineers because seismic waves going through soil formations may alter the bearing capacity of the soil, at times making the soil lose all bearing capacity through a phenomenon known as liquefaction (in liquefaction, the sudden shaking of the soil due to a seismic wave passing through may loosen the soil particles, allowing the soil matrix to temporarily behave as a liquid formation).

Water Engineering

Our Water Engineering sub-specialty is divided into four programs, namely, hydraulics and fluid mechanics, hydrology, groundwater, and water resources. Examples of uncertainty analysis from these four programs are presented below.

Hydraulics and fluid mechanics

The pipeline system described earlier is a clear example of a deterministic system. Uncertainty, however, may be included in such a system if the pipeline is used, for example, to supply water to a small city, since the water demand varies randomly from minute to minute, depending on the time of the day and on emergencies such as a domestic fire or a broken distribution line.

Most flows in pipelines and open channels are turbulent flows, meaning that the flow velocity varies randomly about an average value at a very high frequency. For most practical applications we are only interested in the mean flow. However, in some cases we may be interested in the instantaneous velocity values, which need to be measured at a high frequency using specialized instruments. The enormous amount of data thus generated can be summarized (or reduced) by using statistical analysis with computers.

Hydrology

Hydrology deals with the sources and fate of water on the surface of the ground, and underground. Precipitation of atmospheric water on the ground, in the form of rain, hail, or snow, is the main source of water supply to hydrologic systems. The prediction of precipitation, i.e., the prediction of weather, is, by nature, highly uncertain. Meteorologists use sophisticated computer models, a complex network of weather stations and satellites, as well as probability algorithms to produce an estimate of the probability of precipitation on a daily basis. Even with state-of-the-art hardware and software, there is a lot of uncertainty involved in such predictions.

Precipitation on the ground finds its way into channels in the form of water runoff and it also infiltrates into the groundwater. The runoff discharges in streams and rivers are highly uncertain because they depend on the precipitation values. Institutions such as the US Geological Survey (USGS) keep track of discharge values in streams that, when processed through statistical techniques, provide data relevant to specific water bodies. These data can then be used by engineers, water users, etc., to make decisions related to the use of those water bodies. For example, the people who operate Glen Canyon Dam on the Colorado River need to know the current Lake Powell levels to determine how much water needs to be released into the downstream section of the river based on different criteria, e.g., how much water is needed for hydroelectric power production, for water supply, for recreation (in Lake Powell, or for rafting downstream of Glen Canyon Dam), for sustaining fish populations, etc.


Hydrologists may be interested in cataloging the discharges that produce floods in certain reaches of rivers or streams. They may be interested in the flow that produces a flood, say, once every 50 years on the average. This discharge is known as the 50-year flood, and it is said that the period of return (or return period) of such a flow is 50 years. The probability of occurrence of such a flow, on a yearly basis, will be 1/50. They may also be interested in the 100-year and the 500-year flood values (typically, the larger the period of return of a flood, the larger the discharge needed to produce such a flood).
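The relation between the period of return, T, and the yearly probability of occurrence, P = 1/T, can be sketched in a few lines of Python. This is only an illustration: the function name annual_exceedance_probability is an assumption made for this example, and the three return periods are the ones mentioned in the paragraph above.

```python
from fractions import Fraction

def annual_exceedance_probability(return_period_years):
    """Yearly probability of the T-year flood, P = 1/T (hypothetical helper)."""
    if return_period_years <= 0:
        raise ValueError("the return period must be positive")
    return Fraction(1, return_period_years)

# Return periods mentioned in the text: 50, 100 and 500 years.
for T in (50, 100, 500):
    p = annual_exceedance_probability(T)
    print(f"{T}-year flood: P = {p} = {float(p):.3f} per year")
```

Using exact fractions keeps the result in the same form as the text (1/50) while still allowing a decimal value to be printed.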

Groundwater

Although the water that flows underground is part of the general hydrologic cycle, the analysis of groundwater may be a specialty of its own because of the peculiar nature of groundwater flows. These are very slow flows occurring in the pores in the ground. Groundwater is an important resource in some areas where atmospheric precipitation is limited, such as in our own Cache Valley. Wells drilled through the water-bearing formations (known as aquifers) are used to supply water to cities and industries, as well as to agricultural operations.

The supply of water to underground aquifers is subject to uncertainties similar to those of surface water, namely, availability of precipitation or the levels of water bodies that serve as sources of groundwater flows. Other uncertainties of interest in the analysis of groundwater flows are the variations in aquifer properties, because such variations affect the local water flows in the aquifers.

Contamination of the groundwater is another source of uncertainty in the operation of groundwater aquifers. The analysis of groundwater contaminant transport must account for the uncertainties inherent to the aquifer as well as those associated with the source and amount of contaminant in the ground.

Transportation Engineering

Transportation engineering deals with the physical and systemic analysis of transportation systems, i.e., highways, airports, trains, ships, harbors, rivers, etc. Many of the physical systems involved can be designed by structural engineers (e.g., airport buildings), geotechnical engineers (e.g., retaining walls in highways), or hydraulic engineers (e.g., drainage culverts in highways). There are some design procedures, such as the geometric design of highways, that are a specialty of their own. Uncertainties, however, are more likely to show up in the systemic analysis of transportation systems, for example, in determining the number of users of a toll booth, or the number of passengers being served at an airport terminal, or the number of trains stopping at a given subway station on a given day, or the volume of commercial trucks utilizing a section of a highway on a given day or season. The systemic analysis of transportation systems, thus, utilizes the concepts of probability and statistics on a daily basis. It is virtually impossible to analyze transportation systems without taking uncertainty into account.

Examples from Environmental Engineering

Uncertainty in measurements is typical in natural systems (e.g., rivers, lakes, groundwater aquifers), which are, many a time, the subject of study of environmental engineers. Thus an environmental engineer monitoring the coliform levels in a segment of the Logan River may find the concentration of such microbes varying somewhat randomly during the different seasons and even with the hour of the day.

From the point of view of its water quality, a river is a very complicated system, since the origin of the contaminant loads may not be easy to pinpoint, nor may it be possible to know, a priori, the amount of contaminant provided by a given source, even if the source location is known. Thus, we may not have a clear picture of the river as a system in reference to its water quality. This lack of knowledge introduces uncertainty in understanding the contamination of the river. Continuous monitoring of the water quality and the use of proper statistical analysis can, however, provide the information needed to improve our understanding of the river contamination patterns and of how to provide solutions, if possible, to such problems.

Uncertainties also abound in the monitoring and analysis of air quality in cities or industries. Air contaminants can be released accidentally in an enclosed area or in open areas. The prediction of such accidental releases involves statistical and probability analysis.

Environmental engineers and scientists may also be interested in monitoring animal populations in ranges and forests, or fish populations in lakes and rivers, as well as in determining the effects of air-borne or water-borne contaminants in those habitats. There is a large number of uncertainties involved in such analyses, such as uncertainties in the population counts, uncertainties in food supplies, uncertainties in the distribution of diseases in the population, uncertainties in the migration of such populations, etc.

Random variables

The description of systems typically involves the use of variables that describe measured or predicted quantities of the systems. If there is uncertainty associated with the variables in a system, the variables are referred to as random variables. Thus, the amount of precipitation in a hydrological basin, the number of cars utilizing a ramp in a highway per unit time, the strength of concrete cylinders, the intensity of earthquakes, etc., are examples of random variables.

Summary on examples

Uncertainties in the input and system response of systems are present in all sub-specializations of Civil Engineering, particularly in hydrology and transportation engineering, and in Environmental Engineering. Hence the need for civil and environmental engineering students to learn the basics of probability and statistics. This course is aimed at introducing you to those concepts, at applying them to engineering systems, and at showing you the techniques and software that can help you accomplish such applications.

What is statistics?

Statistics can be roughly defined as the science and technology of data analysis. Statistical techniques are aimed at processing large amounts of data and reducing the data to a more manageable set of measures or graphics of easy interpretation. The data thus reduced can be used by decision-makers in providing solutions to specific problems.

The use of statistics originated with the needs of city governments to keep track of population sizes for the purposes of taxation and health services.

What is probability?

A probability, as defined in subsequent lectures, is a number between 0 and 1 that provides an estimate of how likely an event is to occur. Roughly defined, therefore, probability theory is the science of prediction of the behavior of random systems. In this course we will present the concepts of probability and present commonly used probability distributions applied to engineering systems.


collection of all possible values of a measurement or the collection of all individuals of interest in a study. A sample is a sub-set of the population used with the purpose of obtaining statistical measurements. Statistical inference is the process by which statistical measurements from a sample are used to infer information about the population from which the sample was taken.

Some basic techniques of statistical inference will be presented in this class, including:

• Estimation: use of sample measurements to estimate population parameters

• Hypothesis testing: use of sample measurements to verify statements about the population

• Regression: ways to relate two variables involved in a process based on statistical analysis.


Sample space and events, set theory

by Gilberto E. Urroz, January 2003/updated August 2005

Sample space and events

A sample space, Ω, is the set of all possible outcomes of an experiment (the term experiment is used here to mean occurrences of any event, and not necessarily a scientific experiment).

For example, tossing a coin can produce only the outcomes “head” (H) or “tail” (T); therefore, the sample space corresponding to this experiment will be Ω = {H, T}. Casting a die, on the other hand, produces 6 possible outcomes; thus, its sample space is the set Ω = {1, 2, 3, 4, 5, 6}.

An event is a sub-set of the sample space. For example, the event described as “obtaining an even number while casting a die” can be described as the set A = {2, 4, 6}. On the other hand, the event described as “obtaining an odd number while casting a die” can be written as B = {1, 3, 5}.

Sets

In terms of mathematical set theory, we say, for example, that element 2 belongs to set A, and write it as 2∈A. To indicate, for example, that element 3 does not belong to A, we use the notation: 3∉A. We also say that set A is a sub-set of the sample space Ω, and write this statement as A ⊂ Ω. For the case under consideration we can also write B ⊂ Ω.

The sample space, Ω, which contains all possible outcomes of an experiment, is also referred to as the “universe” or “universal set”. A set that contains no elements is referred to as the empty set, ∅ = { }. The definition of the empty set implies that it is a subset of all other sets, i.e., ∅ ⊂ Ω, ∅ ⊂ A, and ∅ ⊂ B.

Given a set A ⊂ Ω, A ≠ Ω, we define the complement of set A as the set A’ such that the elements of A’ are those elements of Ω not contained in A. For example, for the case of the sample space for casting a die, the complement of set A is A’ = {1, 3, 5} = B. For that particular experiment we can also write B’ = A.

By definition Ω’ = ∅, and ∅’ = Ω.
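The set notation introduced so far maps directly onto Python's built-in set type. The following minimal sketch uses the die example from the text; the helper name complement and the variable names are assumptions made for this illustration. Each print statement shows whether the corresponding statement from the text holds.

```python
# Sample space for casting a die, and the two events defined in the text.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "obtaining an even number"
B = {1, 3, 5}   # "obtaining an odd number"
empty = set()   # the empty set

def complement(event, universe):
    """Elements of the universe that are not contained in the event."""
    return universe - event

print(2 in A)                             # membership: 2 ∈ A
print(3 not in A)                         # non-membership: 3 ∉ A
print(A <= omega and B <= omega)          # A and B are sub-sets of Ω
print(empty <= A)                         # the empty set is a sub-set of A
print(complement(A, omega) == B)          # A’ = B
print(complement(B, omega) == A)          # B’ = A
print(complement(omega, omega) == empty)  # Ω’ = ∅
print(complement(empty, omega) == omega)  # ∅’ = Ω
```

Note that Python's `<=` operator tests "is a sub-set of, or equal to", which is how the symbol ⊂ is used in these notes.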

Set operations

The union of two sets is the set that results from incorporating the elements that belong to A or B, or to both. For example, consider as the universe the set of digits in the decimal system, i.e., Ω = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, and consider the sets A = {0, 1, 2, 3, 4}, and B = { 3, 4, 5, 6}. The union of A and B, written as A ∪ B, is the set, C = A∪B = {0, 1, 2, 3, 4, 5, 6}. Notice that the elements that are common to both A and B, i.e., 3,4, are included in the union only


Sample spaces and event – Page 2

A = set of natural numbers (i.e., positive integers) that are multiples of 2 = {2, 4, 6, 8, 10, …}

A is the set of the even numbers, and it is an infinite set. Also,

B = set of natural numbers that are multiples of 3 = {3, 6, 9, 12, 15, ….}.

Now, define the event C as follows:

C = set of natural numbers that are multiples of 2 or multiples of 3 = {2, 3, 4, 6, 8, 9, 10, 12, 14, 15, …}. The logical particle “or” in this case suggests a union, i.e., C = A∪B.

The event D is defined as

D = set of natural numbers that are multiples of 2 and 3, simultaneously = {6, 12, 18, 24, …}.

In this case, the logical particle “and” indicates an intersection.

From the definition of the universal set, Ω, and the empty set, ∅, for any set A ⊂ Ω, the following properties hold true:

A∪Ω = Ω, A∩Ω = A, A∪∅=A, A∩∅=∅ , A∪A’ = Ω, A∩A’ = ∅.
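The events C and D, and the properties listed above, can be checked numerically. Since the sets of multiples are infinite, the sketch below truncates the natural numbers at an assumed cut-off N = 30; that cut-off, and all variable names, are choices made only for this illustration.

```python
N = 30  # assumed cut-off: the sets in the text are infinite
omega = set(range(1, N + 1))              # universe: natural numbers 1..N

A = {k for k in omega if k % 2 == 0}      # multiples of 2
B = {k for k in omega if k % 3 == 0}      # multiples of 3

C = A | B   # "multiples of 2 or multiples of 3" -> union
D = A & B   # "multiples of 2 and 3, simultaneously" -> intersection

print("C =", sorted(C))
print("D =", sorted(D))

# Properties involving the universal set and the empty set.
A_c = omega - A                           # complement of A
checks = {
    "A ∪ Ω = Ω": (A | omega) == omega,
    "A ∩ Ω = A": (A & omega) == A,
    "A ∪ ∅ = A": (A | set()) == A,
    "A ∩ ∅ = ∅": (A & set()) == set(),
    "A ∪ A’ = Ω": (A | A_c) == omega,
    "A ∩ A’ = ∅": (A & A_c) == set(),
}
for statement, holds in checks.items():
    print(statement, "->", holds)
```

Within the truncated universe, D contains exactly the multiples of 6, as expected for numbers divisible by both 2 and 3.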

Venn diagrams

Venn diagrams are geometrical figures used to represent mathematical sets. The universal set, Ω, is typically represented as a rectangle, while its subsets are represented as circles within the rectangle. The figures below illustrate the operations of union and intersection of sets A and B using Venn diagrams.

The complements of sets A and B are shown as shaded areas in the following Venn diagrams:

Venn diagrams can be used to obtain general results for set operations. For example, you can use Venn Diagrams to prove the following set identity: (A∪B)’ = A’∩B’. The set (A∪B)’ is the complement of A∪B, represented by the shadowed area in the following Venn diagram:


The same area represents the intersection of the complements of A and B as follows from superimposing the corresponding Venn diagrams for A’ and B’.
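As a complement to the Venn-diagram argument, the identity (A∪B)’ = A’∩B’ can be checked by brute force over every pair of subsets of a small universe. This is a numerical check on one finite universe, not a proof; the universe {0, …, 5} and the function names are assumptions made for this sketch.

```python
from itertools import combinations

def all_subsets(universe):
    """Generate every subset of a finite universe, including ∅ and the universe."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)

def de_morgan_holds(universe):
    """Check (A ∪ B)’ = A’ ∩ B’ for every pair of subsets A, B of the universe."""
    subsets = list(all_subsets(universe))
    for A in subsets:
        for B in subsets:
            left = universe - (A | B)                # (A ∪ B)’
            right = (universe - A) & (universe - B)  # A’ ∩ B’
            if left != right:
                return False
    return True

omega = set(range(6))  # small assumed universe: 64 subsets, 4096 pairs
print(de_morgan_holds(omega))
```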

Examples

Example 1 – Work hours in hydraulics repair shop. Suppose that you are analyzing the number of hours per day that a technician spends fixing a standard centrifugal pump in a hydraulics outfit. Since you have to pay the technician by the hour, you are only interested in an integer number of hours. And since you do not pay overtime, you allow only a maximum of 8 hours per day. The sample space for these observations is, therefore, given by the following set:

S = {0, 1, 2, 3, 4, 5, 6, 7, 8}

Suppose that you define the following events:

A = {the technician spends an even number of hours} = {2, 4, 6, 8}

B = {the technician spends an odd number of hours } = {1, 3, 5, 7}

C = {the technician spends from 3 to 6 hours per day} = {3, 4, 5, 6}

D = {the technician spends from 1 to 4 hours per day} = {1, 2, 3, 4}

E = {the technician spends 7 or 8 hours per day} = {7, 8}

Venn Diagram. The following Venn diagram illustrates the sample space and the events shown above:


0 ∈ S, 1 ∈ B, 1 ∈ D, 2 ∈ A, 2 ∈ D, 3 ∈ B, 3 ∈ D, 4 ∈ A, 4 ∈ D, 5 ∈ B, 5 ∈ C, 6 ∈ A, 6 ∈ C, 7 ∈ B, 7 ∈ E, 8 ∈ A, and 8 ∈ E.

Unions - The union of events A and B is A ∪ B = {1, 2, 3, 4, 5, 6, 7, 8}. In words, this new set can be described as A ∪ B = {the technician works an odd or an even number of hours per day}.

This particular result is illustrated in the figure below. Other union operations possible are the following:

A ∪ E = {2, 4, 6, 7, 8}, A ∪ D = {1, 2, 3, 4, 6, 8}, A ∪ C = {2, 3, 4, 5, 6, 8}, E ∪ D = {1, 2, 3, 4, 7, 8}, E ∪ C = {3, 4, 5, 6, 7, 8}, D ∪ C = {1, 2, 3, 4, 5, 6}, B ∪ E = {1, 3, 5, 7, 8}, B ∪ D = {1, 2, 3, 4, 5, 7}, B ∪ C = {1, 3, 4, 5, 6, 7}

Figure illustrating the operation A ∪ B.

Intersections. Sets A and B have no common elements, therefore, their intersection is the null event (or empty set), i.e., A ∩ B = ∅. Sets A and C have common elements {4, 6}, therefore, A ∩ C = {4, 6}, i.e., elements that belong to A and C, simultaneously. This operation is illustrated in the figure below.

Figure illustrating the operation A ∩ C

Other intersection operations are shown below:

A ∩ E = {8}, A ∩ D = {2, 4}, A ∩ C = {4, 6}, E ∩ D = ∅, E ∩ C = ∅, D ∩ C = {3, 4}, B ∩ E = {7}, B ∩ D = {1, 3}, B ∩ C = {3, 5}

Because the intersection of sets A and B is empty, A and B are said to be “mutually exclusive events”. In other words, the number of hours cannot be odd and even at the same time. Events E and D, and E and C are also mutually exclusive.

Subtractions. The set (B-C) is represented by the shaded area below. It consists of the elements in B that are not contained in C, i.e., B - C = {1, 7}.

Other subtraction operations are shown below:

A - E = {2, 4, 6}, A - D = {6, 8}, A - C = {2, 8}, B - E = {1, 3, 5}, B - D = {5, 7}, C - D = {5, 6}, A - B = A = {2, 4, 6, 8}, E - D = {7, 8}, E - A = {7}, D - A = {1, 3}, C - A = {3, 5}, E - B = {8}, D - B = {2, 4}, C - B = {4, 6}, D - C = {1, 2}, B - A = B = {1, 3, 5, 7}, D - E = D = {1, 2, 3, 4}.

Complements. The complement of set A is A’, consisting of elements of S not contained in A, i.e., A’ = {0, 1, 3, 5, 7}. This is illustrated in the figure below.

Here are other complement operations for this example:

B’ = {0, 2, 4, 6, 8}, C’ = {0, 1, 2, 7, 8}, D’ = {0, 5, 6, 7, 8}, E’ = {0, 1, 2, 3, 4, 5, 6}.
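The operations of Example 1 can be reproduced, and checked, with a short Python script. The sketch below uses the sample space S and the events A through E exactly as defined in the example; the helper show and the variable names are assumptions made for this illustration.

```python
from itertools import combinations

S = set(range(9))  # sample space: whole hours from 0 to 8
events = {
    "A": {2, 4, 6, 8},   # even number of hours
    "B": {1, 3, 5, 7},   # odd number of hours
    "C": {3, 4, 5, 6},   # from 3 to 6 hours
    "D": {1, 2, 3, 4},   # from 1 to 4 hours
    "E": {7, 8},         # 7 or 8 hours
}

def show(s):
    """Format a set with its elements sorted, or as ∅ when it is empty."""
    return "{" + ", ".join(map(str, sorted(s))) + "}" if s else "∅"

# Complements with respect to the sample space S.
for name, ev in events.items():
    print(f"{name}’ = {show(S - ev)}")

# Unions, intersections and subtractions for every pair of events.
mutually_exclusive = []
for (n1, e1), (n2, e2) in combinations(events.items(), 2):
    print(f"{n1} ∪ {n2} = {show(e1 | e2)}, {n1} ∩ {n2} = {show(e1 & e2)}, "
          f"{n1} - {n2} = {show(e1 - e2)}, {n2} - {n1} = {show(e2 - e1)}")
    if not (e1 & e2):  # an empty intersection means mutually exclusive events
        mutually_exclusive.append((n1, n2))

print("Mutually exclusive pairs:", mutually_exclusive)
```

Running a script like this is a convenient way to check hand-written lists of set operations for slips.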

Example 2 – Service station load. A service station for the state department of transportation includes 6 service bays for trucks. A transportation engineer analyzing the service station performance is asked the following questions:

(a) Describe the sample space in terms of the number of trucks being serviced at the service station at any given time.

(b) Describe the event A = {there are 3 or more trucks being serviced}.

(c) Describe the event B = {there is an even number of trucks being serviced}.

(d) Describe the event A’ (complement of A).

(e) Describe the event B’ (complement of B).

(f) Describe the event A∪B.

(g) Describe the event A∩B.

Solution

(a) Since the service station has a maximum of 6 truck service bays, the number of trucks that can be serviced at any given time is 0, 1, 2, 3, 4, 5, or 6. Thus, the sample space, S, is given by S = {0, 1, 2, 3, 4, 5, 6}.

(b) The event A = {there are 3 or more trucks being serviced} = {3, 4, 5, 6}.

(c) The event B = {there is an even number of trucks being serviced} = {2, 4, 6}.

The sample space, and the events A and B, described above, can be illustrated in the following Venn diagram for mathematical sets:

(d) The event A’ is composed of the elements of the sample space not contained in A, i.e., A’ = {0, 1, 2}.

(e) The event B’ is given by B’ = {0, 1, 3, 5}.

Events A’ and B’ are illustrated, respectively, by the shaded areas in the following figures:


(f) The event A∪B consists of those elements in A or B or in both, thus, A∪B = {2, 3, 4, 5, 6}.

(g) The event A∩B consists of those elements contained simultaneously in both A and B, i.e., A∩B = {4, 6}.

Events A∪B and A∩B are illustrated, respectively, by the shaded areas in the following figures:

Example 3. Design of an auditorium balcony. A structural engineer is preparing a design for an auditorium balcony under the assumption that a maximum of 10 persons will be allowed, and that the weight of a person is either 50 kg or 100 kg. (a) Describe the sample space in terms of the number of people occupying the balcony at any given time. (b) Describe the event A = {there are more than 6 people in the balcony}. (c) Describe the event B = {the total load on the balcony is 750 kg}.

Let x be the number of people weighing 50 kg and let y be the number of people weighing 100 kg. The design problem requires that x + y ≤ 10, x ≥ 0, y ≥ 0. Also, for a given set of values x, y, the weight on the balcony is given by W = 50x + 100y. (a) The sample space can be illustrated using an x-y graph as follows:


Each circle in the graph represents a pair of integer values (x, y) such that x + y ≤ 10, x ≥ 0, y ≥ 0. The sample space could be written explicitly as follows:

{(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (0, 10), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (3, 7), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 0), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (6, 0), (6, 1), (6, 2), (6, 3), (6, 4), (7, 0), (7, 1), (7, 2), (7, 3), (8, 0), (8, 1), (8, 2), (9, 0), (9, 1), (10, 0)}

(b) To describe the event A = {there are more than 6 people in the balcony}, we can use the following inequalities A = {x + y ≤ 10, x + y > 6, x ≥ 0, y ≥ 0}. This is equivalent to the following graph:

Alternatively, A = {(0, 7), (0, 8), (0, 9), (0, 10), (1, 6), (1, 7), (1, 8), (1, 9), (2, 5), (2, 6), (2, 7), (2, 8), (3, 4), (3, 5), (3, 6), (3, 7), (4, 3), (4, 4), (4, 5), (4, 6), (5, 2), (5, 3), (5, 4), (5, 5), (6, 1), (6, 2), (6, 3), (6, 4), (7, 0), (7, 1), (7, 2), (7, 3), (8, 0), (8, 1), (8, 2), (9, 0), (9, 1), (10, 0)}.

(c) To describe the event B = {the total load on the balcony is 750 kg}, we can write B = {x + y ≤ 10, 50x + 100y = 750, x ≥ 0, y ≥ 0}, since the weight is calculated by using W = 50x + 100y. Checking every point in the original sample space [see part (a)], it turns out that B = {(1, 7), (3, 6), (5, 5)}, corresponding to weights of 750 kg. Event B is illustrated in the following graph:
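The step "checking every point in the original sample space" lends itself to a short enumeration. The sketch below builds the sample space from the constraints x + y ≤ 10, x ≥ 0, y ≥ 0 and filters it to obtain events A and B; the constant and function names are assumptions made for this illustration.

```python
MAX_PEOPLE = 10  # maximum number of persons allowed on the balcony

# Sample space: integer pairs (x, y) with x + y <= 10, x >= 0, y >= 0,
# where x people weigh 50 kg and y people weigh 100 kg.
sample_space = [(x, y)
                for x in range(MAX_PEOPLE + 1)
                for y in range(MAX_PEOPLE + 1 - x)]

def load(x, y):
    """Total weight on the balcony in kg, W = 50x + 100y."""
    return 50 * x + 100 * y

# Event A: more than 6 people in the balcony.
A = [(x, y) for (x, y) in sample_space if x + y > 6]

# Event B: total load of exactly 750 kg.
B = [(x, y) for (x, y) in sample_space if load(x, y) == 750]

print("Number of points in the sample space:", len(sample_space))
print("Number of points in A:", len(A))
print("B =", B)
```

Note that the pair (7, 4) also gives 750 kg, but it is excluded because it would require 11 people, which violates x + y ≤ 10.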


NOTE: Examples 2 and 3 are taken from Kottegoda, N. T. and R. Rosso, 1997, Statistics, Probability, and Reliability for Civil and Environmental Engineers, The McGraw-Hill Companies, Inc., New York

Exercises

[1]. Referring to the sample space for the number of hours per day used in fixing a centrifugal pump defined above (Example 1), suppose that you define the following events:

A = {2, 4, 6, 8} (even numbers)

B = {1, 3, 5, 7} (odd numbers)

C = {3, 6} (multiples of 3)

D = {4, 8} (multiples of 4)

(a) Draw a Venn diagram showing the four events within the sample space

(b) Determine the complements of each of the events A, B, C, and D, i.e., A’, B’, C’, D’

(c) Determine the following union operations: A∪B, A∪C, A∪D, B∪C, B∪D, C∪D

(d) Determine the following intersection operations: A∩B, A∩C, A∩D, B∩C, B∩D, C∩D

(e) Determine the following subtraction operations: A - B, A - C, A - D, D - A, C - A, B - A

(f) Determine the following union operations: A∪B’, A’∪C, B∪D’, B’∪C

(g) Determine the following intersection operations: A’∩B, A∩C’, B’∩D, B∩C’

(h) Verify that (A∪B)’ = A’∩B’, and (A∩B)’ = A’∪B’

(i) Check whether the following statements are true or false: 4∈D, 4∉A, 2∈A, 2∉D, 6∉A, 6∈C, 3∈C, 3∉B, D ⊂ A, C ⊂ A, B ⊄ A

[2]. A small computer room includes only 7 computers. The computer room monitor is performing a study of computer usage and is asked to determine the following:

(a) Describe the sample space in terms of the number of computer users at any given time.

(b) Describe the event A = {there are 4 or fewer computer users in the computer room}.

(c) Describe the event B = {there is an odd number of computer users}.

(d) Describe the event A’ (complement of A).

(e) Describe the event B’ (complement of B).

(f) Describe the event A∪B.

(g) Describe the event A∩B.

[3]. A structural engineer is preparing a design for a ramp at a harbor under the assumption that a maximum of 6 container-trucks will be allowed on the ramp at any given time. It is also assumed that the weight of a container-truck is either 5 kips or 10 kips. (a) Describe the sample space in terms of the number of container-trucks occupying the ramp at any given time. (b) Describe the event A = {there are more than 3 container-trucks on the ramp}. (c) Describe the event B = {the total load on the ramp is 35 kips}.


Probability Definitions – Page 1

Definitions, axioms and theorems of probability

By Gilberto E. Urroz, January 2003

Classical approach to probability. The classical approach to probability requires that the sample space, S, of the experiment or observation being performed be known. The total number of outcomes of the experiment or observation, n, is the number of elements in the sample space. If an event A, with h outcomes, is defined, the probability of event A is defined as

P(A) = h/n.

Please notice that this definition assumes that each outcome in S or A is equally likely to occur.

Example 1. Throwing a coin. Defining the sample space is relatively easy for experiments related to games of chance. For example, throwing a coin has only two possible outcomes, S = {H, T}, or n = 2. The event “head” is given by {H}, having only one element, or h = 1. Thus, the probability of “head” is P(“head”) = h/n = ½ = 0.5. Similarly, the probability of “tail” is P(“tail”) = 0.5.

In this experiment, if we are interested in the number of “tails” the sample space can be described as S = {0, 1}. We can, thus, write P(0) = ½ and P(1) = ½ .

Example 2. Throwing a single die. The experiment consisting of throwing an unloaded die has 6 possible outcomes, i.e., S = {1, 2, 3, 4, 5, 6}. Thus, n = 6. Suppose you define the following events:

• A = “I get an even number” = {2, 4, 6}

• B = “I get an odd number” = {1, 3, 5}

• C = “I get a number that is a multiple of 3” = {3, 6}

The corresponding number of outcomes for these events are hA = hB = 3, and hC = 2. Thus, the corresponding probabilities are:

• P(A) = hA/n = 3/6 = ½ = 0.5

• P(B) = hB/n = 3/6 = ½ = 0.5

• P(C) = hC/n = 2/6 = 1/3 = 0.3333…

These results are interpreted by saying “the probability of getting an even number when throwing an unloaded die is one half”, “the probability of getting an odd number when throwing an unloaded die is one half”, and “the probability of getting a number that is a multiple of 3 when throwing an unloaded die is one third”.
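The classical definition P(A) = h/n can be checked by listing the outcomes explicitly. The following is a minimal Python sketch, assuming equally likely outcomes; the helper name classical_probability is ours and is used only for illustration.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Return h/n, assuming every outcome in sample_space is equally likely."""
    favorable = event & sample_space          # outcomes of the event that lie in S
    return Fraction(len(favorable), len(sample_space))

S = set(range(1, 7))                          # one unloaded die: {1, 2, 3, 4, 5, 6}
A = {x for x in S if x % 2 == 0}              # even number
B = {x for x in S if x % 2 == 1}              # odd number
C = {x for x in S if x % 3 == 0}              # multiple of 3

p_A = classical_probability(A, S)
p_B = classical_probability(B, S)
p_C = classical_probability(C, S)
print(p_A, p_B, p_C)
```

Exact fractions are used so that the results can be compared directly with the hand calculation (1/2, 1/2, and 1/3).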

Example 3. Throwing two coins. The experiment of throwing two coins has the following sample space, S = {(H,H), (H,T), (T,H), (T,T)}. This result can be developed by putting together the following tree diagram of the experiment. The first branching of the tree corresponds to throwing the first coin, producing two possible outcomes, H and T. Each of those branches splits into two further outcomes corresponding to throwing the second coin. The combinations of outcomes are shown to the right of the tree diagram.

Since each of the possible outcomes is equally likely (unless the coins are loaded), we can write:

P[(H,H)] = P[(H,T)] = P[(T,H)] = P[(T,T)] = 1/4

Define now the following events:

• A = “we get one tail when throwing two coins” = {(H,T), (T,H)}

• B = “we get two heads when throwing two coins” = {(H,H)}

• C = “we get two tails when throwing two coins” = {(T,T)}

• D = “we get two tails or two heads when throwing two coins” = {(H,H),(T,T)}

The number of equally likely outcomes in the sample space is n = 4, while the number of equally likely outcomes in events A, B, C, and D, are hA = 2, hB = 1, hC = 1, and hD = 2, respectively. The corresponding probabilities for the four events listed above are:

P(A) = hA/n = 2/4 = 0.50

P(B) = hB/n = 1/4 = 0.25

P(C) = hC/n = 1/4 = 0.25

P(D) = hD/n = 2/4 = 0.50
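The two-coin sample space and the four event probabilities can also be generated by enumeration. This is a small sketch in Python, assuming fair coins; the helper prob and the variable names are ours.

```python
from fractions import Fraction
from itertools import product

# Sample space of ordered pairs: (H,H), (H,T), (T,H), (T,T)
S = list(product("HT", repeat=2))

def prob(predicate):
    """h/n for the event made of the outcomes that satisfy predicate."""
    h = sum(1 for outcome in S if predicate(outcome))
    return Fraction(h, len(S))

p_one_tail = prob(lambda o: o.count("T") == 1)    # event A
p_two_heads = prob(lambda o: o.count("H") == 2)   # event B
p_two_tails = prob(lambda o: o.count("T") == 2)   # event C
p_same_face = prob(lambda o: o[0] == o[1])        # event D

print(p_one_tail, p_two_heads, p_two_tails, p_same_face)
```

The same pattern extends to three or more coins by changing the repeat argument.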

Example 4. Throwing two dice. The following table illustrates the possible outcomes when throwing two fair dice. We describe the outcome as the sum of the values obtained in the two dice, i.e., if you get a 2 in the first die and a 4 in the second die, the outcome is the value 6 (= 2+4). The sample space for throwing two dice can be represented then as S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

                    First die
Second die     1    2    3    4    5    6
     1         2    3    4    5    6    7
     2         3    4    5    6    7    8
     3         4    5    6    7    8    9
     4         5    6    7    8    9   10
     5         6    7    8    9   10   11
     6         7    8    9   10   11   12

Counting equal entries in the table gives the number of outcomes corresponding to each value in S, i.e., h₂ = h₁₂ = 1, h₃ = h₁₁ = 2, h₄ = h₁₀ = 3, h₅ = h₉ = 4, h₆ = h₈ = 5, and h₇ = 6, out of n = 36 equally likely outcomes. The corresponding probabilities are:

P(2) = P(12) = 1/36 = 0.02777…

P(3) = P(11) = 2/36 = 1/18 = 0.05555…

P(4) = P(10) = 3/36 = 1/12 = 0.08333…

P(5) = P(9) = 4/36 = 1/9 = 0.1111…

P(6) = P(8) = 5/36 = 0.13888…

P(7) = 6/36 = 1/6 = 0.1666….

Getting a 7 has the highest probability of all outcomes. That is the reason why most people will bet on 7 in the game of craps.

The probability results can be summarized in the following table:

 X     P(X)
 2     0.0278
 3     0.0556
 4     0.0833
 5     0.1111
 6     0.1389
 7     0.1667
 8     0.1389
 9     0.1111
10     0.0833
11     0.0556
12     0.0278

In a subsequent chapter we’ll learn that the different outcomes of throwing two dice can be treated as the different values of a discrete random variable (i.e., a variable representing the outcome of a random experiment that can take only a limited number of values).
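The probabilities for the sum of two dice can be reproduced by counting the 36 equally likely ordered pairs. The following Python sketch assumes fair six-sided dice; the names counts and pmf are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 ordered pairs (first die, second die) give each sum
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
n = sum(counts.values())                      # total number of outcomes, 36

# Probability of each sum, h/n, kept as an exact fraction
pmf = {total: Fraction(h, n) for total, h in sorted(counts.items())}

for total, p in pmf.items():
    print(f"{total:2d}  {float(p):.4f}")
```

Each printed line pairs a value of X with its probability rounded to four decimals, in the same layout as the summary table.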

Example 5. Computer system. To set up a computer room in a small chemistry lab the director is considering buying either 1, 2, or 3 Apple Macintosh desktop computers and 1 or 2 PC desktop computers. All possible combinations of computers are to be considered. The following diagram shows the sample space of the computer combinations under consideration:

We can also write out the sample space as a set of ordered pairs,

S = {(1M,1P), (1M,2P), (2M,1P), (2M,2P), (3M,1P), (3M,2P)}

Assuming that each element in the sample space is equally likely to be selected before final approval, then the total number of outcomes in the sample space is n = 3x2 = 6.

We now define the following events:

M1 = “1 Mac is selected” = {(1M,1P),(1M,2P)}

M2 = “2 Macs are selected” = {(2M,1P),(2M,2P)}

M3 = “3 Macs are selected” = {(3M,1P),(3M,2P)}

PC1 = “1 PC is selected” = {(1M,1P),(2M,1P),(3M,1P)}

PC2 = “2 PCs are selected” = {(1M,2P),(2M,2P),(3M,2P)}

The number of outcomes in each of these events is given by h(M1) = h(M2) = h(M3) = 2, h(PC1) = h(PC2) = 3. The corresponding probabilities are:

P(M1) = P(M2) = P(M3) = 2/6 = 1/3 P(PC1) = P(PC2) = 3/6 = 1/2

Frequency approach to probability. In many instances it is not possible to enumerate all possible outcomes of an experiment or observation. For example, suppose that we are studying the occurrence of floods during the spring snowmelt in a low-lying area of a river basin. The outcome of this observation each year is either “flood” or “no flood”. The sample space for this observation basically entails an infinite number of these outcomes, including those recorded in previous years as well as those in the future, of which we know nothing. In this case it is impossible to construct a sample space; instead, we rely on records to come up with an estimate of the probability of flood.

The frequency approach to calculate the probability of an event A entails determining the number of times in which event A occurs, h, out of the total number of recorded outcomes available, n. In this case, the probability is approximated by

P(A) = h/n.

Example 6. River flooding. You are hired by a construction company to determine the probability of flood in the low-lying areas of the Blacksmith Fork River near Providence. After charging them a hefty amount of money, you request the records of flood levels in the area from the Utah Division of Water Resources and find that, out of the flood observations kept for the last 150 years (n = 150) there has been a flood recorded at the location of interest during 20 of those years (h = 20). The event of interest is A = “a flood occurs at the low-lying areas of the Blacksmith Fork River near Providence”. The probability of this event is determined as

P(A) = h/n = 20/150 = 2/15 = 0.1333…

i.e., there is a flood at the location of interest approximately 13 out of 100 years.

25). The event of interest is A = “a car turns left at the intersection of Main Street and Center Street in Richmond, Utah, during the work day hours of 8 am and 4 pm on a Wednesday”. The corresponding probability is:

P(A) = h/n = 25/250 = 1/10 = 0.1

i.e., one out of every 10 cars turned left during the period of observation.

NOTES:

(1) The larger the number of records available, the better the approximation to the probability of the event of interest. In many cases, the availability of records is limited (e.g., reliable weather records for the western U.S. go back only 100 to 150 years). In these cases, we have to work with what is available and use the best approximation to the probability of interest.

(2) Frequency approximations to the probability of an event for which a sample space exists can be used to verify the probability calculated by the classical approach. For example, when throwing a die, the probability of getting any of the 6 numbers in the die faces is 1/6 (see Example 2, above). However, a die may be loaded so that one of the numbers is more likely to show up than the others. You can verify the fairness of the die by throwing it a large number of times and recording the numbers that come up in every throw. If, after throwing the die, say, 1000 times, you find that all numbers come up with about the same frequency, you may conclude that the die is fair.
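The fairness check described in note (2) can be imitated with a simulated die. The sketch below is only an illustration: it uses Python's pseudo-random generator in place of a physical die, and the function name relative_frequencies, the seed, and the number of throws are our assumptions.

```python
import random
from collections import Counter

def relative_frequencies(n_throws, seed=0):
    """Simulate n_throws of a fair die and return the relative frequency h/n of each face."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_throws))
    return {face: counts[face] / n_throws for face in range(1, 7)}

freqs = relative_frequencies(60_000)
for face, f in freqs.items():
    print(face, round(f, 4), "classical value:", round(1 / 6, 4))
```

With more throws the relative frequencies tend to settle closer to 1/6, which is the behavior note (1) describes for larger record lengths.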

Personal probability. On a personal level we use the concept of probability on a daily basis when making decisions about every little thing in life. For example, we choose to attend a movie because we believe there is a high probability that we will be entertained in the process. Or one may decide against sending a job application for a particular position because he or she judges the probability of getting hired to be small. These types of probability estimates (personal probability) will not be very reliable in engineering applications because they are very difficult to quantify.

In some instances, however, personal probability is the only value we have to work with. For example, suppose that you are doing a risk analysis of failure of a system of small dams where no records have been kept. Suppose, however, that you have access to the people who have been operating the dams or who live in the nearby area. You could ask them for their estimates of the frequency of overtopping of a dam (i.e., how often the water level rose above the height of the dam), and use those estimates as the probability of overtopping. The source of this information should be stated in the risk analysis report, so that the decision makers who will use it can apply their own sense of personal probability when deciding whether or not to accept these results.

Probability axioms and theorems. This is a summary of the axioms and theorems typically used in calculations of probabilities. In these formulas S represents the sample space, ∅ is the null event or empty set, and A, B, C, etc., are events of interest.

• 0 ≤ P(A) ≤ 1, P(S) = 1, P(∅) = 0

• P(A’) = 1 - P(A)

• If A∩B = ∅ (mutually exclusive events), P(A or B) = P(A∪B) = P(A) + P(B)

• For any events A and B, P(A or B) = P(A∪B) = P(A) + P(B) - P(A∩B)

• Since A = (A∩B)∪(A∩B’), and (A∩B) and (A∩B’) are mutually exclusive, P(A) = P(A∩B)+ P(A∩B’)

Example 8. Erosion control testing. Suppose that we conduct an experiment in which we evaluate how effective an erosion control product is in controlling erosion in a hill slope of a given slope and subject to a specific rainfall event. In a small rainfall simulator we run 100 tests of the same soil built into a simulated hill slope, subjecting it to 2 in/hr of rain for 1 hour. During the tests we are able to identify three mechanisms of failure, which we describe as the following events (also included is an event indicating that no failure was detected):

• A = hill slope fails by sliding (15 tests out of 100) P(A) = 15/100 = 0.15

• B = hill slope fails by rilling (20 tests out of 100) P(B) = 20/100 = 0.20

• C = hill slope fails by sinkage (5 tests out of 100) P(C) = 5/100 = 0.05

• D = hill slope does not fail (60 tests out of 100) P(D) = 60/100 = 0.60

The experiments are conducted so that only one mechanism is identified as responsible for the hill slope failure. This makes events A, B, and C mutually exclusive (i.e., if a hill slope fails because of sliding it did not fail because of rilling nor because of sinkage). Event D is also mutually exclusive with respect to events A, B, and C, since the experiment either shows failure (A, B, or C) or not (D). Some probabilities that can be calculated are the following:

P(D’) = P(failure) = 1 - P(D) = 1 - 0.60 = 0.40.

Also, failure = (A or B or C) = A∪B∪C, and since A, B, and C are mutually exclusive, then

P(failure) = P(A∪B∪C) = P(A) + P(B) + P(C) = 0.15 + 0.20 + 0.05 = 0.40

P(hill slope fails by sliding or rilling) = P(A or B) = P(A∪B) = P(A) + P(B) = 0.15 + 0.20 = 0.35

P(hill slope fails by sliding or sinkage) = P(A or C) = P(A∪C) = P(A) + P(C) = 0.15 + 0.05 = 0.20

P(hill slope fails by rilling or sinkage) = P(B or C) = P(B∪C) = P(B) + P(C) = 0.20 + 0.05 = 0.25
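The addition rule for mutually exclusive events used in Example 8 amounts to summing relative frequencies. The following Python sketch uses the test counts of the example; the dictionary layout and the helper p_union_exclusive are ours and are valid only because the four events cannot occur together.

```python
from fractions import Fraction

# Test counts from Example 8: A = sliding, B = second failure mechanism, C = sinkage, D = no failure
counts = {"A": 15, "B": 20, "C": 5, "D": 60}
n = sum(counts.values())                      # 100 tests in total
p = {event: Fraction(h, n) for event, h in counts.items()}

def p_union_exclusive(*events):
    """P of a union of mutually exclusive events: the sum of their probabilities."""
    return sum(p[e] for e in events)

p_failure = p_union_exclusive("A", "B", "C")
p_failure_from_complement = 1 - p["D"]        # P(D') = 1 - P(D)

print(float(p_failure), float(p_failure_from_complement))
print(float(p_union_exclusive("A", "B")),
      float(p_union_exclusive("A", "C")),
      float(p_union_exclusive("B", "C")))
```

Both routes to P(failure), the sum over A, B, C and the complement of D, give the same value, which is a useful consistency check on the recorded counts.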

Example 9. Water quality sampling. Water specimens are taken, for 100 consecutive days, at the outlet of a sewage processing plant in a factory right at the point in which the outlet discharges on a small stream. The water specimens are then taken to the chemistry laboratory to determine the level of concentrations of two particular metals, Cadmium (Cd) and Mercury (Hg). The following events are defined:

A = a significant concentration of Cd was detected (5 out of 100), P(A) = 0.05

B = a significant concentration of Hg was detected (10 out of 100), P(B) = 0.10

A∩B = significant concentrations of Cd and Hg were detected (3 out of 100), P(A∩B) = 0.03

Since P(A∩B) ≠ 0, A∩B ≠ ∅, i.e., events A and B are not mutually exclusive (this is interpreted as saying that finding a significant level of Cd does not preclude finding a significant level of Hg). Thus, if we were to calculate the probability that a sample contains significant levels of Cd or of Hg, we write:

P(A∪B) = P(A) + P(B) - P(A∩B) = 0.05 + 0.10 - 0.03 = 0.12

P(A∩B’) = P(A) - P(A∩B) = 0.05 - 0.03 = 0.02 = P(significant Cd but not significant Hg)

Since (A∩B’)’ = A’∪(B’)’ = A’∪B, [here we use the fact that (B’)’ = B] then

P(A’∪ B) = P[(A∩B’)’] = 1 - P(A∩B’) = 1 - 0.02 = 0.98 = P(not significant Cd or significant Hg)

Also, P(A’∪ B) = P(A’) + P(B) - P(A’∩B), thus,

P(A’∩B) = P(A’) + P(B) - P(A’∪ B) = 0.95 + 0.10 - 0.98 = 0.07 = P(not significant Cd but significant Hg)

Since (A’∩B)’ = (A’)’ ∪B’ = A∪B’ [Using (A’)’ = A], then

P(A∪B’) = P[(A’∩B)’] = 1 - P(A’∩B) = 1 - 0.07 = 0.93 = P(significant Cd or not significant Hg)
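The set identities of Example 9 can be verified by building sets of sampling days and counting. In this Python sketch the day indices are hypothetical; they are chosen only so that 5 days show Cd, 10 days show Hg, and 3 days show both, as in the example.

```python
from fractions import Fraction

days = set(range(100))        # 100 consecutive sampling days
A = set(range(0, 5))          # 5 days with significant Cd (hypothetical indices)
B = set(range(2, 12))         # 10 days with significant Hg; days 2, 3, 4 overlap with A

def P(event):
    """Relative frequency of an event over the 100 sampling days."""
    return Fraction(len(event), len(days))

not_A = days - A              # complement A'
not_B = days - B              # complement B'

p_A_or_B = P(A | B)               # Cd or Hg
p_A_and_notB = P(A & not_B)       # Cd but not Hg
p_notA_or_B = P(not_A | B)        # not Cd, or Hg
p_notA_and_B = P(not_A & B)       # Hg but not Cd
p_A_or_notB = P(A | not_B)        # Cd, or not Hg

for name, value in [("P(A or B)", p_A_or_B), ("P(A and B')", p_A_and_notB),
                    ("P(A' or B)", p_notA_or_B), ("P(A' and B)", p_notA_and_B),
                    ("P(A or B')", p_A_or_notB)]:
    print(name, float(value))
```

Counting set members directly avoids the algebra, so it serves as an independent check on the values obtained with the theorems.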

Exercises

[1]. A game of chance consists of throwing a fair coin three times.

(a) Draw a tree diagram showing all possible outcomes of this experiment.

(b) Write out the sample space as a set of ordered triplets [An ordered triplet will be something like (H,T,H)].

(c) Write out the following events:

• A = “there is exactly one head (H) in the outcome”

• B = “there are exactly two heads (HH) in the outcome”

• C = “there are exactly three heads (HHH) in the outcome”

(d) Determine the probabilities of events A, B, and C.

(e) Suppose that the variable X represents the number of tails (T) in the outcomes from this game. What are the possible values that X can take?

(f) Determine the probabilities associated with the different values of X from this game.

[2]. A special type of die is shaped as a tetrahedron (a four-faced pyramid) as shown below.

Suppose that you design a game of chance in which two tetrahedral dice are thrown.

(a) Construct a table similar to that of Example 4 if the outcome of interest is the sum of the outcomes of each die.

(b) Determine the probabilities of all possible outcomes of the game.

(c) Suppose that the game is changed so that the outcome of interest is the product of the two numbers shown in the tetrahedral dice. Construct a table similar to that of Example 4 for this case.

(d) Determine the probabilities for the outcomes found in (c).

[3]. In the process of building a pumping station you are faced with the choices of including one, or two centrifugal pumps, and one, two or three axial pumps.

(a) Construct a diagram similar to that of Example 5, indicating all possible combinations of pumps.

(b) Write out the sample space for this pumping station from the diagram in (a).

(c) Write out the sets of ordered pairs describing the following events:

C1 = “1 centrifugal pump is selected”

C2 = “2 centrifugal pumps are selected”

A1 = “1 axial pump is selected”

A2 = “2 axial pumps are selected”

A3 = “3 axial pumps are selected”

(d) Determine the probabilities corresponding to each of the events in (c).

[4]. The water content in a plot next to a stream is monitored on a daily basis to determine if irrigation is needed to produce a crop. The observations are recorded as “above requirements” or “below requirements”, indicating whether the water content is satisfactory or not for the crop under consideration. Records from an entire year of observations indicate that the water content was “below requirements” for 120 days. Let A = “water content is above requirements”. Determine,

(a) P(A)

(b) P(“water content is below requirements”)

[5]. To prepare the management policy of a 30-ft-high hydroelectric dam the water level behind the dam is monitored on a daily basis. The 30-ft range of water depths is divided into three regions: A = “depth between 0 and 10 ft”, B = “depth between 10 and 20 ft”, and C = “depth between 20 and 30 ft”, as illustrated in the figure below.

Records maintained for the last 10 years (3650 values) indicate that water was recorded in

(e) What is the probability that the water level on a given day is recorded in a region other than A?

(f) What is the probability that the water level on a given day is recorded in a region other than B?

(g) What is the probability that the water level on a given day is recorded in a region other than C?

[6]. While monitoring a small basin in the Cascade Mountains, you find that, in a typical year of 365 days, rain only (i.e., rain and no snow) is observed during 150 days, snow only (i.e., snow and no rain) is observed during 50 days, and snow and rain together are observed during 30 days of the year. Define the following events: A = “rain was observed” (regardless of whether snow was observed), B = “snow was observed” (regardless of whether rain was observed), C = “no precipitation observed”.

(a) Determine the probabilities P(A), P(B)

(b) Determine the probability that “rain was observed or snow was observed”

(c) Determine the probability that “no rain was observed”

(d) Determine the probability that “no snow was observed”

(e) Determine the probability that “rain was not observed or snow was not observed”

(f) Determine the probability that “no precipitation was observed”

Conditional probability/Bayes theorem/Combinatorial analysis

By Gilberto E. Urroz, January 2003

Conditional probability. Conditional probability is the probability associated with an event, say A, given the occurrence of a related event, say B. For example, when throwing a fair die you may be interested in determining the probability of getting a 3 given that the number obtained is odd. In this example of conditional probability, the event of interest is A = “getting a 3”, and the condition is B = “the number is odd”. The notation for conditional probability is P(A|B), interpreted as “the probability of A given B” or “the probability of the occurrence of A given that B has occurred.” The corresponding definition is

P(A|B) = P(A∩B)/P(B).

For the events A and B defined above, we have P(A∩B) = P(“3 and odd”) = 1/6, P(B) = P(“odd number”) = 3/6, thus,

P(A|B) = P(“3 given odd”) = (1/6)/(3/6) = 1/3.

Suppose that events A and B belong to a sample space S. The probabilities P(A∩B) and P(B) are calculated by the classical approach, i.e., P(A∩B) = hA∩B/n, and P(B) = hB/n, where n is the total number of outcomes in S, and hA∩B and hB are the number of outcomes in events A∩B and B, respectively. Thus, the conditional probability can also be written as

P(A|B) = (hA∩B/n)/(hB/n) = hA∩B/hB.

Notice that this latter result is similar to the classical definition of probability if we think of B as a “reduced” sample space, i.e., the condition (B) becomes the new sample space in the definition of conditional probability.
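Both forms of the conditional probability, the ratio P(A∩B)/P(B) and the count over the “reduced” sample space B, can be compared on the die example. This is a minimal Python sketch assuming a fair die; the function names P and conditional are ours.

```python
from fractions import Fraction

S = set(range(1, 7))                     # fair die
A = {3}                                  # getting a 3
B = {x for x in S if x % 2 == 1}         # the number is odd

def P(event):
    """Classical probability h/n over the full sample space S."""
    return Fraction(len(event & S), len(S))

def conditional(a, b):
    """P(a|b) = P(a and b) / P(b); requires P(b) > 0."""
    return P(a & b) / P(b)

p_definition = conditional(A, B)                 # (1/6) / (3/6)
p_reduced = Fraction(len(A & B), len(B))         # count within the reduced sample space B

print(p_definition, p_reduced)
```

The two values agree because the factor 1/n cancels in the ratio, which is the cancellation shown in the formula above.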

Example 1. Highway closing under snow conditions. Suppose we are interested in determining the probability that a high-elevation highway will remain open under snow conditions. Our records indicate that for the 3-month winter period (n = 90 days), snow in amounts significant enough to affect the traffic conditions in the highway of interest is recorded during 30 days in a typical year. Also, records show that during 15 days in the winter period snow conditions produce closure of the highway. If we define the events B = “significant snow observed in the winter months” and A = “highway closure”, the data indicates that

P(B) = P(“significant snow observed in the winter months”) = 30/90 = 1/3,

Conditional probability – Page 2

Theorems on conditional probability. The following are two important theorems related to conditional probability:

(a) For any three events A1, A2, and A3 the following relationship holds true:

P(A₁∩A₂∩A₃) = P(A₁)P(A₂|A₁)P(A₃|A₁∩A₂).

Example 2. Defective computer chips. Suppose you are in the process of fixing a computer by replacing three identical computer chips and you have a container with 20 computer chips from which to select the replacements. The chips are selected at random.

Unbeknownst to you, 5 of the computer chips in the container are defective. What is the probability that you would select three defective chips for your computer repair?

Solution. Let A1, A2, A3 be the events that you select a defective computer chip in the 1st, 2nd, and 3rd picks out of the container. Thus, you are interested in calculating

P(A1∩A2∩A3) = P(A1)P(A2|A1)P(A3|A1∩A2)

In this formula, P(A1) is the probability that a defective chip is selected in the first trial. Since there are 5 defective chips out of 20 chips, P(A1) = 5/20 = ¼ = 0.25. P(A2|A1) is the probability that the second chip is defective given that the first was defective. Now, if the first chip was defective then there remain 4 defective chips out of 19 chips in the container, thus, P(A2|A1) = 4/19. Finally, P(A3|A1∩A2) is the probability that you get a defective chip in the third pick given that the first two chips were defective. If chips 1 and 2 were defective, there remain 3 defective chips out of 18 in the container, thus, P(A3|A1∩A2) = 3/18 = 1/6. Therefore,

P(A1∩A2∩A3) = P(A1)P(A2|A1)P(A3|A1∩A2) = (1/4)(4/19)(1/6) = (1x4x1)/(4x19x6) = 1/114 = 0.00877
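The product of conditional probabilities in Example 2 can be written as a short loop over the picks. The Python sketch below assumes chips are drawn at random without replacement; the function name all_defective is ours, and the second calculation (counting combinations) is included only as a cross-check.

```python
from fractions import Fraction
from math import comb

def all_defective(defective, total, picks):
    """P(every pick is defective) when drawing without replacement."""
    p = Fraction(1)
    for k in range(picks):
        # After k defective picks, (defective - k) defective chips remain out of (total - k)
        p *= Fraction(defective - k, total - k)
    return p

p_three = all_defective(defective=5, total=20, picks=3)

# Cross-check: ways to choose 3 of the 5 defective chips over ways to choose any 3 of 20
p_check = Fraction(comb(5, 3), comb(20, 3))

print(p_three, float(p_three), p_check)
```

The loop multiplies 5/20, 4/19, and 3/18 in turn, which are the three factors derived in the solution.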

(b) If an event A must result in one of the mutually exclusive events A1, A2, …, An, then

P(A) = P(A1)P(A|A1) + P(A2)P(A|A2) + … + P(An)P(A|An)

The event A and its relation to the mutually exclusive events A1, A2, …, An, is illustrated in the following Venn diagram:

Notice the equation for P(A) is equivalent to

P(A) = P(A∩A1) + P(A∩A2) + … + P(A∩An).

Example 3. Irrigation methods. While conducting a study on the effects of different irrigation methods on a given crop, you define the following events:

• A1 = sprinkler irrigation

• A2 = steady furrow irrigation

• A3 = surge furrow irrigation

• A4 = drip irrigation

Based on your evaluation of 50 experimental plots, you find that 20 plots use sprinkler irrigation, 15 plots use steady furrow irrigation, 8 plots use surge furrow irrigation, and 7 plots use drip irrigation to irrigate the same type of crop. Thus,

P(A1) = 20/50 = 2/5, P(A2) = 15/50 = 3/10, P(A3) = 8/50, and P(A4) = 7/50.

You also find that the crop is successful if using sprinkler irrigation 85% of the time, if using steady furrow irrigation 90% of the time, if using surge furrow irrigation 70% of the time, and if using drip irrigation 60% of the time. Thus, if event A represents “a successful crop”, then we have that

P(A|A1) = 0.85, P(A|A2) = 0.90, P(A|A3) = 0.70, and P(A|A4) = 0.60.

Thus, the probability of a successful crop (event A) in the experimental plots that use four different irrigation systems is

P(A) = P(A1)P(A|A1) + P(A2)P(A|A2) + P(A3)P(A|A3) + P(A4)P(A|A4) = (2/5)x0.85 + (3/10)x0.90 + (8/50)x0.70 + (7/50)x0.60 = 0.806
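The sum in Example 3 is a weighted average of the success rates, with the fraction of plots under each method as the weights. The Python sketch below uses the numbers of the example; the dictionary names are ours.

```python
from fractions import Fraction

# P(Ai): fraction of the 50 experimental plots using each irrigation method
prior = {
    "sprinkler": Fraction(20, 50),
    "steady furrow": Fraction(15, 50),
    "surge furrow": Fraction(8, 50),
    "drip": Fraction(7, 50),
}

# P(A|Ai): probability of a successful crop under each method
success_given_method = {
    "sprinkler": Fraction(85, 100),
    "steady furrow": Fraction(90, 100),
    "surge furrow": Fraction(70, 100),
    "drip": Fraction(60, 100),
}

# P(A) = sum over methods of P(Ai) * P(A|Ai)
p_success = sum(prior[m] * success_given_method[m] for m in prior)
print(float(p_success))
```

The priors must add up to 1 for the formula to apply, since the four methods are mutually exclusive and cover all plots.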

Example 4. Highway traveling. To reach Grenoble (France) from Turin (Italy) one can follow either of two routes. The first connects Turin and Grenoble directly, whereas the second passes through Chambery (France), i.e., the second route is Turin-Chambery-Grenoble. During extreme weather conditions in winter, travel between Turin and Grenoble is not always possible because some parts of the highway may not be open to traffic.

Define the following events:

• A = the highway from Turin to Grenoble is open

• B = the highway from Turin to Chambery is open

• C = the highway from Chambery to Grenoble is open

In anticipation of driving from Turin to Grenoble, a traveler listens to the next day’s weather forecast. If snow is forecast for the next day over the southern Alps, one can assume (on the basis of past records) that P(A) = 0.6, P(B) = 0.7, P(C) = 0.4, P(C|A) = 0.5, and P(A|B∩C) = 0.4.

(a) What is the probability that the traveler will be able to reach Grenoble from Turin?

(b) What is the probability the traveler will be able to drive from Turin to Grenoble by way of Chambery?

(c) Which route should be taken in order to maximize the chance of reaching Grenoble?

Solution. Given P(A) = 0.6, P(B) = 0.7, P(C) = 0.4, P(C|A) = 0.5, and P(A|B and C) = 0.4, we need to find:

(a) P(A ∪ (B ∩ C)) = P(A) + P(B ∩ C), treating A and (B ∩ C) as mutually exclusive

(b) P(B ∩ C)

(c) Is P(A) > P(B ∩ C) or vice versa?

First, we solve (b), i.e., find P(B ∩ C) using the data given. By definition,

P(C|A) = P(A ∩ C)/P(A),

from which

P(A ∩ C) = P(A)⋅P(C|A) = 0.6 x 0.5 = 0.3.

Next, use the following equation, which treats C∩A and C∩B as mutually exclusive events that together make up C:

P(C) = P(C∩A) + P(C∩B) = P(A)⋅P(C|A) + P(B)⋅P(C|B),

resulting in

0.4 = 0.6 x 0.5 + 0.7 x P(C|B),

from which

P(C|B) = (0.4 - 0.6 x 0.5)/0.7 = 1/7.

Then,

P(B ∩ C) = P(B)⋅P(C|B) = 0.7 x (1/7) = 0.1.

Thus, (b) P(B ∩ C) = 0.1

(a) P(A ∪ (B ∩ C)) = P(A) + P(B ∩ C) = 0.6 + 0.1 = 0.7

(c) Since P(A) > P(B ∩ C) [0.6 > 0.1], to maximize the probability of getting to Grenoble, take the direct Turin-Grenoble route (event A).

Independent events. From the definition of conditional probability, namely, P(A|B) = P(A∩B)/P(B), it follows that

P(A∩B) = P(B)⋅ P(A|B).

This statement is valid for any two events A and B.

If events A and B are independent, the occurrence of B has no effect on the occurrence of A and P(A|B) = P(A). Replacing this result in the equation above we find that, for independent events A and B,

P(A∩B) = P(B)⋅ P(A).

It turns out that the converse statement is also true, i.e., if P(A∩B) = P(B)⋅ P(A), then events A and B are independent. These two statements can be summarized by saying that events A and B are independent if and only if P(A∩B) = P(B)⋅ P(A).

Example 5. Highway construction. Highway construction in a remote area is dependent on the availability of construction workers in the area and on the weather conditions. Suppose that event A represents “availability of construction workers” and event B represents “favorable weather conditions”, and that these events are independent of each other. Previous data from the area indicates that P(A) = 0.8 and P(B) = 0.75. Suppose that we need to determine the probability of “availability of construction workers or favorable weather conditions”, i.e., P(A∪B). We will use the formula

P(A∪B) = P(A) + P(B) - P(A∩B),

and since events A and B are independent, we can also write

P(A∩B) = P(A)P(B) = 0.8x0.75 = 0.60.

Thus,

P(A∪B) = P(A) + P(B) - P(A)P(B) = 0.8 + 0.75 - 0.8x0.75 = 0.95.

Notice that the probability P(A∩B) = P(“availability of construction workers AND favorable weather conditions”) is only 60%, i.e., construction will be possible only 60% of the time.

The result P(A∪B) = 0.95 simply indicates that at least one of the two conditions for highway construction (availability of construction workers OR favorable weather conditions, but not necessarily both) is found 95% of the time. From the point of view of predicting the ability to carry out construction activities, the probability P(A∩B) = 0.60 is more important.
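The two quantities of Example 5 follow from the product rule for independent events. The short Python sketch below assumes independence, as the example does; the function name union_and_intersection is ours.

```python
def union_and_intersection(p_a, p_b):
    """For independent events: P(A and B) = P(A)P(B), P(A or B) = P(A) + P(B) - P(A)P(B)."""
    p_both = p_a * p_b
    p_either = p_a + p_b - p_both
    return p_either, p_both

p_either, p_both = union_and_intersection(0.8, 0.75)
print(round(p_either, 4), round(p_both, 4))

# Equivalent route: at least one condition holds unless both fail
p_either_alt = 1 - (1 - 0.8) * (1 - 0.75)
print(round(p_either_alt, 4))
```

The alternative route, one minus the probability that neither condition holds, gives the same union and is often easier to extend to more than two independent events.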

Example 6. Wave heights in a lake. In the process of re-designing a harbor in a lake, data is collected on wind velocity in the area as well as water temperature to check what effect these two variables have on wave height in the harbor. Of interest for the designer are the conditions A = “strong wind velocity” (registered when wind velocity is greater than 15 mph) and B = “warm waters” (registered when water temperature is greater than 70 °F). Records indicate that P(A) = 0.350, P(B) = 0.150, and P(A∩B) = 0.052. Are the events A and B, i.e., “strong wind velocity” and “warm waters”, independent?

Solution. To check for independence, we need to check that P(A∩B) = P(A)P(B). We know that P(A∩B) = 0.052, and we find that P(A)P(B) = 0.350x0.150 = 0.0525. Since the probabilities are estimated from records, we do not expect exact equality. We notice that P(A∩B) ≈ P(A)P(B), and we may conclude that the events A and B can be treated as independent.
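Because the probabilities in Example 6 come from records, the independence check is a comparison within some tolerance rather than an exact equality. The Python sketch below makes that tolerance explicit; the function name and the 5% relative tolerance are our assumptions, not values given in the example.

```python
import math

def approximately_independent(p_a, p_b, p_ab, rel_tol=0.05):
    """True if P(A and B) is within rel_tol (relative) of P(A)P(B)."""
    return math.isclose(p_ab, p_a * p_b, rel_tol=rel_tol)

# Data of Example 6: product is 0.0525, observed intersection is 0.052
print(approximately_independent(0.350, 0.150, 0.052))

# A hypothetical contrasting case: the intersection is far from the product
print(approximately_independent(0.350, 0.150, 0.100))
```

A stricter or looser tolerance may be appropriate depending on how many records support the estimates.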

Bayes’ theorem or Bayes’ rule. Suppose that A1, A2, …, An, are mutually exclusive events whose union is the sample space, S (i.e., one of the events must occur). Then, if A is any event,

P(Ak|A) = P(Ak)P(A|Ak) / [P(A1)P(A|A1) + P(A2)P(A|A2) + … + P(An)P(A|An)],

for k = 1, 2, …, n. This formula allows us to find the probabilities of the different events, A1, A2, …, An, that can cause event A to occur. This fact explains the alternative name given to Bayes’ rule: the theorem on the probability of causes.

Example 7. Irrigation methods (part 2). The events A1, A2, A3, and A4 for an irrigation study were defined in Example 3 as follows: A1 = sprinkler irrigation, A2 = steady furrow irrigation, A3 = surge furrow irrigation, and A4 = drip irrigation. The corresponding probabilities found were

P(A1) = 20/50 = 2/5, P(A2) = 15/50 = 3/10, P(A3) = 8/50, and P(A4) = 7/50.

Also, if event A represents “a successful crop”, we found that

P(A|A1) = 0.85, P(A|A2) = 0.90, P(A|A3) = 0.70, and P(A|A4) = 0.60.

Bayes’ rule can be used to determine the probability that a given irrigation method was used given that a successful crop was detected, e.g., for sprinkler irrigation we can calculate

P(A1|A) = P(A1)P(A|A1) / [P(A1)P(A|A1) + P(A2)P(A|A2) + P(A3)P(A|A3) + P(A4)P(A|A4)]

= (2/5)×0.85 / [(2/5)×0.85 + (3/10)×0.90 + (8/50)×0.70 + (7/50)×0.60] = 0.34/0.806 = 0.4218.

Also,

P(A2|A) = 0.3350, P(A3|A) = 0.1390, and P(A4|A) = 0.1042.

These results indicate that, given that a successful crop was observed, the probability that sprinkler irrigation was used is 42.18%, etc.
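Bayes’ rule for Example 7 can be evaluated for all four irrigation methods at once. The Python sketch below uses the probabilities of the example; the variable names prior, likelihood, and posterior are ours.

```python
from fractions import Fraction

methods = ["sprinkler", "steady furrow", "surge furrow", "drip"]

# P(Ak): fraction of plots using each method
prior = dict(zip(methods, [Fraction(20, 50), Fraction(15, 50),
                           Fraction(8, 50), Fraction(7, 50)]))

# P(A|Ak): probability of a successful crop under each method
likelihood = dict(zip(methods, [Fraction(85, 100), Fraction(90, 100),
                                Fraction(70, 100), Fraction(60, 100)]))

# Numerators P(Ak)P(A|Ak) and their sum, which is P(A)
joint = {m: prior[m] * likelihood[m] for m in methods}
p_success = sum(joint.values())

# P(Ak|A): probability of each method given that the crop was successful
posterior = {m: joint[m] / p_success for m in methods}

for m in methods:
    print(f"{m:14s} {float(posterior[m]):.4f}")
```

The four posterior probabilities add up to 1, because a successful crop must have come from one of the four methods.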

Combinatorial analysis. In the evaluation of probabilities it is often necessary to determine the number of outcomes of a given event as well as that of the sample space. Combinatorial analysis provides mathematical approaches for determining those numbers, particularly when the number of outcomes is large. Some simple techniques of combinatorial analysis are shown next.

Tree diagrams and the fundamental principle of counting. Tree diagrams are sketches of choices of elements showing all possible combinations. The following example illustrates this idea and introduces the fundamental principle of counting, which is used to determine the total number of configurations depicted in a tree diagram.

Example 8. Putting together a computer system. Suppose that you are putting together a computer system and you can choose three different motherboards (M1, M2, M3), two different hard disks (H1, H2), and three different CD-R drives (C1, C2, C3). A tree diagram can be used to illustrate all possible configurations of the computer system. Start by showing the three branches corresponding to the motherboards as the first step in the tree. Then, branch out of each tip (M1, M2, M3) with two branches corresponding to the two hard disks (H1, H2). Finally,