Constructing Box-and-Whisker Plot - Course Notes Statistics

1. Obtain the 5-number summary of the batch and the fourth-spread (midspread), dF. The cut-offs are the two points located 1.5 fourth-spreads beyond the two fourths, FL (lower fourth) and FU (upper fourth):

dF = FU – FL

lower cut-off = FL – 1.5dF

upper cut-off = FU + 1.5dF

2. Construct the box. The left (bottom) end of the box is located at the lower fourth, and the right (top) end is located at the upper fourth. These ends are called the hinges. The median is drawn as a line inside the box, between the two fourths.

3. The tips/caps of the whiskers are lines located at the points in the batch farthest from the hinges, but within the defined cut-offs.

4. Any cases beyond these marks are marked individually:

- outliers are points between 1.5 and 3 midspreads from the hinges, denoted by an x-mark.

- extremes are points beyond 3 midspreads from the hinges, denoted by a circle.

Example:

Population of the 15 largest US cities in 1960

City             Pop'n (in 10,000s)    City              Pop'n (in 10,000s)
New York         778                   Washington D.C.   76
Chicago          355                   St. Louis         75
Los Angeles      248                   Milwaukee         74
Philadelphia     200                   San Francisco     74
Detroit          167                   Boston            70
Baltimore        94                    Dallas            68
Houston          94                    New Orleans       63
Cleveland        88

# 15     Population of 15 Largest US Cities in 1960

M 8           88

F 4h     74        183.5

1        63        778

dF = FU – FL = 183.5 – 74 = 109.5

Outside cut-offs: (-90.25, 347.75)

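Values outside the cut-offs: Chicago (355), an outlier, and New York (778), an extreme, since 778 lies more than 3 fourth-spreads above the upper fourth (183.5 + 3(109.5) = 512).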
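The computations in this example can be checked with a short script. The sketch below is only illustrative: the function names are hypothetical, and it assumes the usual depth rule for letter values, which the notes do not state explicitly (depth of median = (n + 1)/2, depth of fourth = (whole part of the median depth + 1)/2, with a half depth meaning the average of the two neighbouring values).

```python
import math

# Populations (in 10,000s) copied from the table in the example.
populations = {
    "New York": 778, "Chicago": 355, "Los Angeles": 248, "Philadelphia": 200,
    "Detroit": 167, "Baltimore": 94, "Houston": 94, "Cleveland": 88,
    "Washington D.C.": 76, "St. Louis": 75, "Milwaukee": 74,
    "San Francisco": 74, "Boston": 70, "Dallas": 68, "New Orleans": 63,
}

def letter_values(batch):
    """Return (FL, median, FU), counting depths in from each end of the sorted batch."""
    xs = sorted(batch)
    n = len(xs)
    median_depth = (n + 1) / 2
    # Assumed rule (not stated in the notes): fourths sit halfway to the median.
    fourth_depth = (math.floor(median_depth) + 1) / 2

    def at_depth(depth, from_top):
        # A half-integer depth averages the two neighbouring order statistics.
        a, b = math.floor(depth), math.ceil(depth)
        if from_top:
            return (xs[n - a] + xs[n - b]) / 2
        return (xs[a - 1] + xs[b - 1]) / 2

    return (at_depth(fourth_depth, False),
            at_depth(median_depth, False),
            at_depth(fourth_depth, True))

def boxplot_summary(data):
    """Compute the quantities needed to draw the plot from a dict of name -> value."""
    fl, median, fu = letter_values(data.values())
    d_f = fu - fl
    lower_cut, upper_cut = fl - 1.5 * d_f, fu + 1.5 * d_f   # 1.5 fourth-spreads out
    lower_far, upper_far = fl - 3 * d_f, fu + 3 * d_f       # 3 fourth-spreads out
    inside = [v for v in data.values() if lower_cut <= v <= upper_cut]
    return {
        "FL": fl, "median": median, "FU": fu, "dF": d_f,
        "cutoffs": (lower_cut, upper_cut),
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(k for k, v in data.items()
                           if lower_far <= v < lower_cut or upper_cut < v <= upper_far),
        "extremes": sorted(k for k, v in data.items()
                           if v < lower_far or v > upper_far),
    }

summary = boxplot_summary(populations)
for name, value in summary.items():
    print(name, value)
```

Under these assumptions the script reproduces the fourths 74 and 183.5, the fourth-spread 109.5, and the cut-offs (-90.25, 347.75). The whiskers end at 63 and 248, Chicago falls between 1.5 and 3 fourth-spreads from the upper hinge, and New York lies beyond 3 fourth-spreads.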

An abstract model is a description of the essential properties of a phenomenon.

A deterministic model is a type of abstract model that describes a phenomenon through known relationships among the states and events, in which a given input will always produce the same output.

The development of probability theory was not originally intended for solving inferential problems. It was first developed to answer professional gamblers’ questions about systematic patterns in the outcomes of games involving dice or cards, patterns that would allow them to adjust their bets to the “odds” of success. This is why most of the basic examples in probability theory involve throwing dice or drawing cards from a deck.

Today, many important phenomena that are of interest to humankind share something in common with these games of chance. It is impossible to predict with certainty when such a phenomenon will occur. By studying patterns, we can learn more about the behavior of the phenomenon of interest and then be able to predict an occurrence of a phenomenon with a certain degree of confidence.

The use of abstract models is actually not new to many of us. We apply the mathematical formula provided by an abstract model to come up with an approximation of reality.

The deterministic model is one that we commonly encounter in the application problems of elementary mathematics. One example is computing the area of a rectangular piece of land: the area you get is the same every time you compute it.

6.1 Probabilistic Models

A probabilistic/stochastic model is a type of abstract model that describes a phenomenon by assigning a likelihood of occurrence to the different possible outcomes of the process.

A random experiment is a process that can be repeated under similar conditions but whose outcome cannot be predicted with certainty beforehand.

The sample space, denoted by Ω (the Greek letter omega), is the collection of all possible outcomes of a random experiment. An element of the sample space is called a sample point.

An example of a stochastic model is the game of tossing a coin. The result of a toss is not certain even if the coin is loaded (unfair). In fact, no matter how many times we repeat the process, it is impossible to predict with certainty what the next outcome will be.
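A small simulation can make this concrete. The sketch below is illustrative only: the function name and the probability of heads (0.8) are assumptions chosen for the demonstration, not values from the notes.

```python
import random

def toss_loaded_coin(n_tosses, p_head=0.8):
    """Simulate n_tosses of a coin that lands heads with probability p_head."""
    return ["H" if random.random() < p_head else "T" for _ in range(n_tosses)]

# Even a heavily loaded coin gives a sequence we cannot predict toss by toss.
tosses = toss_loaded_coin(20)
print("".join(tosses))
print("proportion of heads:", tosses.count("H") / len(tosses))
```

Running the script several times gives different sequences. Heads appear more often than tails, but the result of any single toss remains uncertain.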

In inferential statistics, the process of selecting a sample of size n from a population of size N using probability sampling is one of the random experiments of interest. It is just like selecting n cards at random from a deck of N=52 cards.

Even if we use exactly the same sample selection procedure, there is no way we can predict, without any error, what the composition of the next sample will be.
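The card analogy can be sketched in code. The example below is a minimal illustration: the helper name draw_sample and the choice of n = 5 are assumptions, and Python's random.sample stands in for the selection procedure (a draw without replacement in which every subset of size n is equally likely).

```python
import random

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]

# The population: a standard deck of N = 52 cards.
deck = [(rank, suit) for suit in SUITS for rank in RANKS]

def draw_sample(population, n):
    """Select n distinct units at random, without replacement."""
    return random.sample(population, n)

# The same procedure applied twice generally yields two different samples.
first = draw_sample(deck, 5)
second = draw_sample(deck, 5)
print(first)
print(second)
```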

We can show the sample space by using any of the various methods of listing. One example is the roster method, where we list all the possible outcomes of the experiment.

Example: Specify the sample space of the experiment of tossing a coin twice.

First, we use H to denote getting a head in a toss and T to denote getting a tail. There are then just four possible results, so:

Ω = {HH, HT, TH, TT}

6.2 Basic Concepts of Probability

An event is a subset of the sample space whose probability is defined. We say that an event occurred if the outcome of the experiment is one of the sample points belonging to the event; otherwise, the event did not occur.

Aside from the roster method, we can specify the sample space using the rule method, which is usually preferred when the experiment has too many possible results to list in full.

Example: Specify the sample space of the experiment of tossing a coin 1000 times.

Again, we use H to denote getting a head in a toss and T to denote getting a tail. There are 2¹⁰⁰⁰ ≈ 1.0715 × 10³⁰¹ possible results, far too many to list, so we use the rule method:

Ω = {(x1, x2, …, x1000) | xi ∈ {H, T} for all i = 1, 2, …, 1000}

In set theory, any subset of the universal set is itself a set. Since an event is a subset of the sample space (our universal set), we can denote an event of interest the same way we denote a set: by a capital Latin letter.

Example: Consider the experiment of tossing a pair of colored dice, one red and the other green. Let Ω = {(x, y) | x ∈ {1, 2, …, 6} and y ∈ {1, 2, …, 6}}, where x is the number of dots on the red die and y is the number of dots on the green die. By the rules of counting, this sample space contains 6 × 6 = 36 sample points.

Some examples of events are listed below:

A = event of having the same number of dots on both dice = { (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6) }

B = event of 3 dots on the red die = { (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6) }

C = event of getting a sum of 5 dots on both dice = { (1, 4), (2, 3), (3, 2), (4, 1) }

D = event of 7 dots on the green die = ∅
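These events can be built directly from their defining rules. The sketch below represents the sample space and each event as a Python set of (red, green) pairs. The variable names mirror the letters used in the example.

```python
faces = range(1, 7)

# Sample space: every (red, green) pair of faces.
omega = {(x, y) for x in faces for y in faces}

A = {(x, y) for (x, y) in omega if x == y}      # same number of dots on both dice
B = {(x, y) for (x, y) in omega if x == 3}      # 3 dots on the red die
C = {(x, y) for (x, y) in omega if x + y == 5}  # sum of 5 dots
D = {(x, y) for (x, y) in omega if y == 7}      # 7 dots on the green die

print(len(omega))   # 36
print(sorted(A))
print(sorted(B))
print(sorted(C))
print(D)            # set(), the empty set: no sample point satisfies the rule
```

Each event is a subset of the sample space, which can be confirmed with expressions such as `A <= omega`.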

The impossible event is the empty set, ∅. The sure event is the sample space, Ω.

Two events A and B are mutually exclusive events if and only if A ∩ B = ∅; that is, A and B have no common elements.

The event D is an example of an impossible event: we know that it can never happen. Some pairs of events can occur simultaneously, while others cannot. The easiest way to check is to compare the sample points of the two events. If they have at least one sample point in common, the two events can occur simultaneously. If they have no sample points in common, the events cannot occur simultaneously, and they are called mutually exclusive events.

The concept of mutually exclusive events can be extended to more than two events.

For example, three events A, B, and C are mutually exclusive whenever it is impossible for any pair of them to occur simultaneously. Mathematically, A ∩ B = ∅, A ∩ C = ∅, and B ∩ C = ∅ must all be true.
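The pairwise condition translates into a short check. The sketch below is illustrative: the function name mutually_exclusive is hypothetical, and the event E (a sum of 7 dots) is introduced only for this demonstration and does not appear in the notes.

```python
from itertools import combinations

faces = range(1, 7)
omega = {(x, y) for x in faces for y in faces}   # (red, green) pairs

A = {(x, y) for (x, y) in omega if x == y}       # same number on both dice
B = {(x, y) for (x, y) in omega if x == 3}       # 3 dots on the red die
C = {(x, y) for (x, y) in omega if x + y == 5}   # sum of 5 dots
E = {(x, y) for (x, y) in omega if x + y == 7}   # sum of 7 dots (hypothetical extra event)

def mutually_exclusive(*events):
    """True when every pair of the given events has an empty intersection."""
    return all(e1.isdisjoint(e2) for e1, e2 in combinations(events, 2))

print(mutually_exclusive(A, C, E))  # True: no pair shares a sample point
print(mutually_exclusive(A, B))     # False: both contain (3, 3)
print(mutually_exclusive(B, C))     # False: both contain (3, 2)
```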
