
Unit 1: Data and Descriptive Statistics

QBA 201 – Summer 2013

Instructor: Michael Malcolm

1.1: Statistical inference and data

1.2: Measures of central tendency

1.3: Percentiles and quartiles

1.4: Measures of variability

1.5: Distribution of data

1.6: Measures of association


1.1: Statistical inference and data

Broadly, statistics is about collecting, analyzing and interpreting data.

In a narrower sense, the classical statement is that the objective of statistics is to make an inference about a population based on information contained in a sample. An inference, then, is a generalization about a population based on a sample. The large body of data that we are interested in is called the population and the particular elements that we measure from it are called the sample. For example, a politician might want to make a guess about his level of support by taking a survey of 1000 voters from his district. In this case, the population is all voters from the district, but the sample is the 1000 voters who are surveyed.

Data are the pieces of information that are collected and analyzed, and the collection of data for an entire study is called the data set. The elements are the entities on which data are collected, that is, the members of the sample. For our politician above, the elements are individual voters. If you were doing a study about corporate profits, your elements might be companies.

A variable is some feature of interest for the elements. For example, our politician might ask a voter whether he is going to vote for the Democrat or for the Republican. In a study about corporate profits, you might be interested in the CEO’s salary and the level of profits that are earned by the company. The set of information collected from each member of the sample is called an observation.

Data fall into three varieties.

• Quantitative data are real numbers. Some authors call this interval data to emphasize the fact that there is a natural ordering and the amount of distance between data points is meaningful in a real sense.

o EXAMPLES: SAT score; stock price; income; height

• Qualitative data are not numerical, but rather fall into categories. Some authors call this nominal data or categorical data.

o EXAMPLES: which candidate a voter supported in the last election; whether a person is single, married, divorced or widowed

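• Ordinal data fall into categories that have a natural ordering, but the distance between categories cannot be measured in any meaningful sense.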

o EXAMPLES: whether a customer orders a small, medium or large soda; whether a student rates a professor as poor, fair, good or excellent.

Another important distinction is the manner in which data are collected.

• Cross-section data are collected over different elements at approximately the same point in time. The observations are multiple elements at approximately the same time.

o EXAMPLES: a survey asking 100 different students their hours of sleep and their GPA; a study of different countries’ immigration rates and their rates of economic growth

• Time-series data are collected for the same entity at different points in time. The observations are multiple observations of a single element.

o EXAMPLES: daily data on the closing price of Microsoft stock; a study of a single country’s immigration rate and its rate of economic growth over many years

• Panel data are collected for multiple entities at different points in time. There are multiple observations of multiple elements.


EXERCISES

1. A travel magazine gathered the following data for each of nine hotels in Europe:

COUNTRY

ROOM RATE (in EUR)

NUMBER OF ROOMS

QUALITY (EXCELLENT, HIGH, MODERATE or LOW)

a. How many elements are in the data set?

b. How many variables are in the data set?

c. Which variables are quantitative, qualitative and ordinal?

d. Is this time-series or cross-section data?

2. A firm in Denver, Colorado is testing the advertising effectiveness of its new television commercials. As part of the test, the commercial is shown after a news broadcast. Two days later, a market research firm calls 100 customers to ask them if they remember the commercial. What are the population and sample for this study?


1.2: Measures of central tendency

For the next several sections, we will be dealing with quantitative data sets involving a single variable. In section 1.6 we will deal with data sets that contain multiple variables. It is important to be able to succinctly describe various features of the data set. For notation, we will assume that there are \(n\) observations in our sample. The observation of our variable for entity \(i\) will be denoted \(x_i\), so our sample is \(\{x_1, x_2, \ldots, x_n\}\).

One of the most important features of a data set is an idea of the “average” level of some variable in a data set. We call this “central tendency”, but there is more than one way to think about central tendency.

The most common measure of average is the mean, which is defined as follows.

\[ \bar{x} = \frac{\sum x_i}{n} \]

For example, suppose you sampled 5 workers on their monthly starting salaries, and obtained the following sample, in USD: {4500, 6000, 3250, 2000, 5300}. The mean starting salary is:

\[ \bar{x} = \frac{4500 + 6000 + 3250 + 2000 + 5300}{5} = 4210 \]

A second measure of central tendency is the median, which is calculated as follows.

Arrange the data in ascending order from smallest to largest. If the number of observations is odd, then the median is the middle value. If the number of observations is even, then the median is the mean of the two middle values.

For our salary data given above, we first arrange in ascending order as such:

{2000, 3250, 4500, 5300, 6000}

The middle value is 4500, so this is the median starting salary.

Which is a more reliable measure of central tendency? One important difference between the mean and the median is that the mean is more susceptible to outliers than the median is. An outlier is an observation with an unusually small or large value.

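For example, suppose that we add a sixth observation to our salary sample: a CEO whose monthly salary is $400,000. The mean is now:

\[ \bar{x} = \frac{4500 + 6000 + 3250 + 2000 + 5300 + 400000}{6} = 70175 \]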

But $70,175 is rather misleading as a measure of central tendency for workers’ salaries in this sample, since five out of six workers make $6000 or less! Including one extreme value in the calculation of a mean can significantly distort the answer.

But what about the median? In ascending order, the data are:

{2000, 3250, 4500, 5300, 6000, 400000}

Since there are six data points, the median is the mean of the third and the fourth data points. So the median salary for this data set is:

\[ \frac{4500 + 5300}{2} = 4900 \]

This example illustrates the point that the mean is extremely susceptible to outliers, but the median is not. Adding the CEO to our sample increased the mean salary from $4210 to $70,175. But it increased the median salary only from $4500 to $4900.
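The comparison can be checked with a few lines of Python. This is only a sketch using the standard library's `statistics` module; the variable names are illustrative and are not part of the original notes.

```python
from statistics import mean, median

# Monthly starting salaries (USD) for the five workers in the example
salaries = [4500, 6000, 3250, 2000, 5300]

# The same sample with the CEO's $400,000 monthly salary added
salaries_with_ceo = salaries + [400000]

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(salaries_with_ceo), median(salaries_with_ceo)

print("without CEO:", mean_before, median_before)
print("with CEO:   ", mean_after, median_after)
```

The single extreme value moves the mean by tens of thousands of dollars, while the median moves by only a few hundred.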

We can go a little bit deeper. We will first define the skew of a data set.

• A data set displays a right skew if the data are bunched together, but with a few extremely high values. Our data set including the CEO salary displayed right skew since the other workers earned between $2000 and $6000 per month, but the CEO’s salary was much higher. Income distributions, generally, display right skew. Most workers have salaries in a moderate range, but then a few workers have extremely high salaries.

• A data set displays a left skew if the data are bunched together, but with a few extremely low values. Data on babies’ birth weights have a left skew. Most babies’ birth weights are within the healthy range, but there are a few babies born prematurely with very low birth weights.

The relationship between the mean and the median tells us something about the skew of the data set.

• Data sets with right skew feature a mean that is larger than the median. A few extremely high observations pull the mean up by a large amount but only increase the median slightly.

EXAMPLE: Although the “middle” family had only $110,000 in assets (the median), a few very wealthy families with high asset holdings pull the mean up to $507,000.

• Data sets with left skew feature a mean that is lower than the median. A few extremely low observations pull the mean down by a large amount but only decrease the median slightly.

EXAMPLE: The median birth weight for newborn boys in Canada is 3.46 KG, but the mean birth weight is 3.42 KG. These are close, but although the “middle” newborn boy in Canada weighs 3.46 KG, a few premature births with extremely low birth weight pull the mean birth weight down slightly to 3.42 KG.


EXERCISES

1. The table below shows the number of unique monthly visitors (in millions) to the top 20 most-visited websites in September, 2012.

a. Compute the mean.

b. Compute the median.

c. Compute the mode (or modes).

d. Comment on the relationship between the mean and the median. Which do you think is the most accurate measure of central tendency in this case?

Website            Visitors (millions)
Google             215
Yahoo              189
Facebook           179
MSN                170
AOL                136
Amazon             135
Wikipedia          104
Glam               102
Apple              102
CBS                101
Turner Digital     99
Ask                91
EBay               90
Demand Media       90
New York Times     88
Comcast            83
Federated Media    82
Viacom             79
ESPN               65


1.3: Percentiles and quartiles

The previous section dealt with methods for summarizing the central tendency of data. But we might be interested in other locations besides central location. For example, I might want to know the IQ score such that only 10% of students in a school have IQs higher than this score.

The pth percentile is a value such that at least p percent of observations are less than or equal to this value and at least (100 – p) percent of observations are greater than or equal to this value.

The definition seems difficult because it is so precise, but the idea is easy to understand. When I say that the 90th percentile of IQ scores is 119, what this means is that approximately 90% of students have an IQ lower than 119 and approximately 10% of students have an IQ higher than 119.

The following is the recipe for computing percentiles. Suppose that you have 𝑛 observations and that you want to compute the pth percentile.

1. Arrange the data in ascending order.

2. Compute the index \( i = \left(\frac{p}{100}\right) n \).

3. If \(i\) is not an integer, then round up. The next integer greater than \(i\) gives the position of the pth percentile.

4. If 𝑖 is an integer, then the pth percentile is the mean of the values in positions 𝑖 and 𝑖 + 1.

Let us use the salary data from the previous section to compute the 25th percentile. Using the steps outlined above:

1. {2000, 3250, 4500, 5300, 6000}

2. \( i = \left(\frac{25}{100}\right) 5 = 1.25 \)

3. Because 𝑖 is not an integer, we round up to 2. The 25th percentile is in the second position.

Thus, for this data set, the 25th percentile is $3250. This is the salary such that at least 25% of salaries are less than or equal to $3250 and at least 75% of salaries are greater than or equal to $3250.

What about the 80th percentile? Using the steps outlined above:

1. {2000, 3250, 4500, 5300, 6000}

2. \( i = \left(\frac{80}{100}\right) 5 = 4 \)

3. Because 𝑖 is an integer, the 80th percentile is the mean of the values in the 4th and 5th positions.

Thus, for this data set, the 80th percentile is \( \frac{5300 + 6000}{2} = 5650 \). This is the salary such that at least 80% of salaries are less than or equal to $5650 and at least 20% of salaries are greater than or equal to $5650. Notice that any value between $5300 and $6000 would satisfy this definition as well, but the convention is to report the midpoint between the two.
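The four-step recipe can be sketched in Python as follows. The function name `percentile` is an illustrative choice, and the sketch assumes that p is strictly between 0 and 100 and that the data set is not empty. Statistical software often uses other interpolation conventions, so its answers may differ slightly from this recipe.

```python
import math

def percentile(data, p):
    """pth percentile by the round-up / average-two-positions recipe.

    Sketch only: assumes 0 < p < 100 and a non-empty list of numbers.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)                # step 1: ascending order
    n = len(ordered)
    i = p * n / 100                       # step 2: index i = (p/100) * n
    if i != int(i):                       # step 3: not an integer, so round up
        return ordered[math.ceil(i) - 1]  # positions in the notes are 1-based
    k = int(i)                            # step 4: mean of positions i and i + 1
    return (ordered[k - 1] + ordered[k]) / 2

salaries = [4500, 6000, 3250, 2000, 5300]
p25 = percentile(salaries, 25)
p80 = percentile(salaries, 80)
print(p25, p80)
```

Computing `p * n / 100` rather than `(p / 100) * n` keeps the index exact for whole-number percentiles, which matters because step 3 tests whether the index is an integer.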

There are three particularly important percentiles that are often reported.

• The first quartile or Q1 is the 25th percentile.

• The second quartile or Q2 is the 50th percentile.

• The third quartile or Q3 is the 75th percentile.

Roughly speaking, the first quartile is a number such that about 25% of the data lie below this value and about 75% lie above. The second quartile is the median. The third quartile is a number such that about 75% of the data lie below this value and about 25% lie above.


EXERCISES

1. The table below shows the number of unique monthly visitors (in millions) to the top 20 most-visited websites in September, 2012.

a. Compute and interpret the first quartile.

b. Compute and interpret the third quartile.

c. Compute and interpret the 85th percentile.

d. Compute and interpret the 87th percentile.

Website            Visitors (millions)
Google             215
Yahoo              189
Facebook           179
MSN                170
AOL                136
Amazon             135
Wikipedia          104
Glam               102
Apple              102
CBS                101
Turner Digital     99
Ask                91
EBay               90
Demand Media       90
New York Times     88
Comcast            83
Federated Media    82
Viacom             79
ESPN               65


1.4: Measures of variability

The previous two sections were about describing the location of data. Section 1.2 was about the central tendency of the data and section 1.3 was about measuring location other than center. But another question concerns the level of variability in data. That is, are the data very spread out or are the data close together?

The simplest measure of variability is the range.

The range of a dataset is the difference between the largest and the smallest value.

Recall that our monthly salary data was {2000, 3250, 4500, 5300, 6000}. For these data, the range is 6000 − 2000 = 4000.

One obvious concern with the range is that it might be affected by one extremely high or extremely low value. A measure of variability that reduces this susceptibility to extreme values is the interquartile range.

The interquartile range (IQR) of a dataset is the difference between the third quartile and the first quartile: \( IQR = Q_3 - Q_1 \).

Using the techniques of the previous section, the third quartile (the 75th percentile) for this dataset is in the fourth position, so \(Q_3 = 5300\). The first quartile (the 25th percentile) is in the second position, so \(Q_1 = 3250\). Thus, for this data set the interquartile range is \(IQR = 5300 - 3250 = 2050\). This is less susceptible to extreme values because it uses only the first and third quartiles, not values at the very top and bottom.
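As a rough check, the range and the interquartile range for the salary data can be computed in Python. The helper `percentile` below is an illustrative restatement of the recipe from section 1.3 (assuming 0 < p < 100); it is not a library function.

```python
import math

def percentile(data, p):
    # Recipe from section 1.3: round up, or average two adjacent positions
    ordered = sorted(data)
    i = p * len(ordered) / 100
    if i != int(i):
        return ordered[math.ceil(i) - 1]
    return (ordered[int(i) - 1] + ordered[int(i)]) / 2

salaries = [4500, 6000, 3250, 2000, 5300]

data_range = max(salaries) - min(salaries)                  # largest minus smallest
iqr = percentile(salaries, 75) - percentile(salaries, 25)   # Q3 minus Q1

print(data_range, iqr)
```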

Another way to measure the level of variability in a dataset is to figure out how far away each observation is from the mean and then average up these distances. The difference between an observation and the sample mean \((x_i - \bar{x})\) does not work well because the positive and negative values cancel each other out. But we are interested in distance from the mean, regardless of whether the observation is higher or lower than the mean. Thus, the proper place to start is the squared difference \((x_i - \bar{x})^2\). The idea is to average up these squared differences across all members of the sample.

The variance \(s^2\) of a dataset is defined as follows:

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]


Let us compute the variance for our salary data given above. Recall we calculated in section 1.2 that the mean is \(\bar{x} = 4210\).

\[ s^2 = \frac{(2000 - 4210)^2 + (3250 - 4210)^2 + (4500 - 4210)^2 + (5300 - 4210)^2 + (6000 - 4210)^2}{5 - 1} = 2{,}570{,}500 \]

This is somewhat difficult to interpret in a simple way because the differences are squared. But from here we can derive the most frequently used measure of variability in a dataset, which is the standard deviation. The standard deviation \(s\) is the square root of the variance:

\[ s = \sqrt{s^2} \]

For our salary data, the standard deviation is:

\[ s = \sqrt{2{,}570{,}500} = 1603.28 \]

By “cancelling out” the square terms used in the variance calculation, the standard deviation can be interpreted in the same units as the original measurements. Informally, the standard deviation can be thought of as the average distance from the mean. So for our dataset, the typical salary is $1603.28 away from the mean. As such, a higher standard deviation indicates more “spread” in the data in the sense that the typical observation is farther away from the mean.
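The variance and standard deviation of the salary data can be reproduced with the sketch below. It computes the variance once directly from the definition and once with `statistics.variance`, which also divides by n − 1; the variable names are illustrative.

```python
from statistics import mean, variance, stdev

salaries = [4500, 6000, 3250, 2000, 5300]
x_bar = mean(salaries)

# Sample variance straight from the definition (note the n - 1 denominator)
s2_manual = sum((x - x_bar) ** 2 for x in salaries) / (len(salaries) - 1)

s2 = variance(salaries)  # sample variance from the standard library
s = stdev(salaries)      # square root of the sample variance

print(s2_manual, s2, round(s, 2))
```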

One issue is that the standard deviation is unit-dependent, meaning that how large or small it is depends on the units chosen. A unit-free measure of variability is the coefficient of variation, which measures the magnitude of the standard deviation relative to the mean:

\[ \text{Coefficient of Variation} = \left( \frac{\text{Standard Deviation}}{\text{Mean}} \right) \cdot 100 \]
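Applied to the salary data from this section, the formula can be sketched in Python as follows. The function name and the conversion to cents are illustrative assumptions, included only to show that rescaling the units leaves the coefficient of variation unchanged.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(data) / mean(data) * 100

salaries_usd = [4500, 6000, 3250, 2000, 5300]

# Hypothetical change of units: the same salaries measured in cents
salaries_cents = [100 * x for x in salaries_usd]

cv_usd = coefficient_of_variation(salaries_usd)
cv_cents = coefficient_of_variation(salaries_cents)

print(round(cv_usd, 2), round(cv_cents, 2))
```

The standard deviation in cents is 100 times larger than in dollars, but so is the mean, so the ratio does not change.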

For our data, the coefficient of variation is \( \left( \frac{1603.28}{4210} \right) \cdot 100 = 38.08 \). The way to interpret this is that the standard deviation is about 38.08% of the mean.

EXERCISES

1. The police take a sample of the number of crimes reported over ten different days in the winter and in the summer. The data are given below.

a. Compute the range for each season.

b. Compute the interquartile range for each season.

c. Compute the variance for each season.

d. Compute the standard deviation for each season.

e. Compute the coefficient of variation for each season.

f. Compare the variability of the number of crime reports in the winter and in the summer.

WINTER: 18, 20, 15, 16, 21, 20, 12, 16, 19, 20

SUMMER: 28, 18, 24, 32, 18, 29, 23, 38, 28, 18

2. The government collects the price of the same statistics book at five different retailers, and obtains the following: $45, $52, $63, $21, $68.

a. Compute the standard deviation.

b. Compute the coefficient of variation.

c. Suppose now that the measurements were taken in UAE dirhams ($1 = 3.67 dirhams). Compute the standard deviation now.


1.5: Distribution of data

Knowing the mean and the standard deviation, it turns out that we are in a position to say something about the way in which data are distributed.

Central to understanding the distribution of data is the z-score. In words, the z-score of observation \(i\) measures the number of standard deviations that observation \(i\) is from the mean \(\bar{x}\). We can compute the z-score for observation \(i\) as such:

\[ z_i = \frac{x_i - \bar{x}}{s} \]

For example, if the mean IQ is \(\bar{x} = 100\) and the standard deviation is \(s = 15\), then for an individual whose IQ is \(x_i = 130\), we can compute:

\[ z_i = \frac{130 - 100}{15} = 2 \]

The way to interpret this is that this individual’s IQ is 2 standard deviations above the mean.
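The z-score calculation is a one-line function in Python. The sketch below uses the IQ figures from the example; the function and parameter names are illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z = z_score(130, 100, 15)    # IQ of 130
z_low = z_score(85, 100, 15) # a hypothetical IQ of 85, below the mean

print(z, z_low)
```

A negative z-score, as for the hypothetical IQ of 85, simply means that the observation lies below the mean.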

Using this foundation, there are two main sets of results that we can use to describe the distribution of datasets. Chebyshev’s Theorem holds for all datasets, regardless of what the distribution looks like. The Empirical Rule gives tighter predictions, but it is accurate only for a certain class of distributions. Let us consider each in turn.

Chebyshev’s Theorem states that at least \( 1 - \frac{1}{a^2} \) of the observations in any dataset lie within \(a\) standard deviations of the mean.

For \(a = 1\), the bound is \( 1 - \frac{1}{1^2} = 0 \), so the theorem tells us nothing, but we can substitute in higher values of \(a\) to derive implications of Chebyshev’s Theorem:

• At least 75% of observations must lie within \(a = 2\) standard deviations of the mean.

• At least 88.9% of observations must lie within \(a = 3\) standard deviations of the mean.

• At least 93.8% of observations must lie within \(a = 4\) standard deviations of the mean, etc.
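The percentages in this list come from substituting values of a into the Chebyshev bound, which can be sketched in Python as follows. The function name is illustrative, and returning 0 for a ≤ 1 reflects the fact that the bound carries no information there.

```python
def chebyshev_lower_bound(a):
    """Minimum share of observations within a standard deviations of the mean."""
    if a <= 1:
        return 0.0  # the bound is uninformative for a <= 1
    return 1 - 1 / a ** 2

for a in (2, 3, 4):
    print(a, round(100 * chebyshev_lower_bound(a), 1))
```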


By contrast, the empirical rule applies only to data with a bell-shaped distribution, meaning that the distribution is approximately symmetric and mound-shaped, without large skew or severe outliers.

The empirical rule states the following:

• Approximately 68% of observations will lie within 1 standard deviation of the mean.

• Approximately 95% of observations will lie within 2 standard deviations of the mean.

• Approximately 99.7% of observations (almost all) will lie within 3 standard deviations of the mean.

Now we can see the tradeoff between the two. The empirical rule gives much “tighter” predictions. Chebyshev’s Theorem tells us nothing about how many observations lie within 1 standard deviation of the mean, but the empirical rule tells us that about 68% of observations will lie there. And while Chebyshev’s Theorem tells us only that at least 75% of observations will lie within 2 standard deviations of the mean, the empirical rule tells us that about 95% of observations will lie in this interval.
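One way to see both results at work is to count how many observations actually fall within a given number of standard deviations of the mean. The sketch below does this for the salary data from section 1.4; the function name is illustrative. With only five observations, the empirical rule should not be expected to hold closely, but Chebyshev’s guarantee must hold.

```python
from statistics import mean, stdev

def fraction_within(data, k):
    """Share of observations lying within k standard deviations of the mean."""
    x_bar, s = mean(data), stdev(data)
    inside = [x for x in data if abs(x - x_bar) <= k * s]
    return len(inside) / len(data)

salaries = [4500, 6000, 3250, 2000, 5300]

within_1 = fraction_within(salaries, 1)
within_2 = fraction_within(salaries, 2)

print(within_1, within_2)
```

Here three of the five salaries lie within 1 standard deviation of the mean and all five lie within 2, which is consistent with Chebyshev’s lower bound of 75% for a = 2.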


EXERCISES

1. A professor’s midterm exam has a mean of 70 and a standard deviation of 5.

a. What fraction of students had exam scores between 60 and 80?

b. How would your answer to (a) change if you also assumed that the distribution of test scores was bell-shaped?

c. What fraction of students had exam scores between 58 and 82?

d. Assume again that the distribution of test scores is bell-shaped. What fraction of students had exam scores above 80?


1.6: Measures of association

The previous sections have dealt with observations on one variable and how we can summarize the data on this variable. But sometimes we are interested in the relationship between two variables. For example, suppose we collect data from five students about the number of hours that they sleep each night and their GPA. The data are below.

Hours of Sleep   GPA
6                3.0
7                2.8
10               3.9
4                2.2
7                3.5

On average, it looks like the students who sleep more hours have a higher GPA. But how can we describe this relationship precisely?

Now we have two variables, \(x\) and \(y\), for each member of our sample. There are \(n\) observations, but now with \(x\) and \(y\) observed for all observations: \(\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}\).

The covariance \(s_{xy}\) for sample data is defined as follows:

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]

For our above sample, let us take \(x\) to be hours of sleep and \(y\) to be GPA. Standard calculations give that the sample means are \(\bar{x} = 6.8\) and \(\bar{y} = 3.08\). We can now compute the covariance:

\[ s_{xy} = \frac{(6 - 6.8)(3.0 - 3.08) + (7 - 6.8)(2.8 - 3.08) + (10 - 6.8)(3.9 - 3.08) + (4 - 6.8)(2.2 - 3.08) + (7 - 6.8)(3.5 - 3.08)}{4} \]

Doing the calculation gives \(s_{xy} = 1.295\).
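The covariance calculation can be reproduced with a short Python sketch. The function `sample_covariance` is an illustrative helper written from the definition above, not a library routine.

```python
sleep = [6, 7, 10, 4, 7]           # hours of sleep (x)
gpa = [3.0, 2.8, 3.9, 2.2, 3.5]    # GPA (y)

def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

s_xy = sample_covariance(sleep, gpa)
print(round(s_xy, 3))
```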

The way to think about this formula is that, when \(x\) and \(y\) move together, the products \((x_i - \bar{x})(y_i - \bar{y})\) will mostly be positive, because when \(x_i\) is above the mean \(\bar{x}\), then most of the time \(y_i\) will also be above the mean \(\bar{y}\). Similarly, when \(x_i\) is below \(\bar{x}\), at the same time \(y_i\) will tend to be below \(\bar{y}\). In our example, students who sleep more than the average number of hours tend to have a GPA higher than average, and students who sleep fewer than the average number of hours tend to have a GPA lower than average. Thus, most of the products \((x_i - \bar{x})(y_i - \bar{y})\) are positive. Another example is rainfall and crop yield. When rainfall is high, crop yield tends to be high and when rainfall is low, crop yield tends to be low. Thus, a positive covariance is indicative of two variables that tend to move together: on average, when one is high then the other is high.

On the other hand, consider the relationship between temperature and the number of heaters sold. When the temperature is higher than average, the number of heater sales will typically be lower than average, so the products \((x_i - \bar{x})(y_i - \bar{y})\) will tend to be negative. Thus, a negative covariance is indicative of two variables that tend to move in opposite directions: on average, when one is high then the other is low.

It is useful to know whether the covariance is positive or negative. However, like standard deviation, the magnitude of the covariance depends on the units chosen. By contrast, the correlation is a measure of the association between two variables that is unit free.

The correlation \(r_{xy}\) for sample data is defined as follows:

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]

For our data above, standard calculations give that the standard deviation for hours of sleep is \(s_x = 2.1679\) and the standard deviation for GPA is \(s_y = 0.6535\). Thus, the correlation is:

\[ r_{xy} = \frac{1.295}{2.1679 \cdot 0.6535} = 0.9141 \]

Correlations are easier to interpret than covariances because they are unit-free. Correlations always lie between −1 and +1.
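For the sleep and GPA data, the correlation can be computed in Python as sketched below. The function `sample_correlation` is an illustrative helper that divides the sample covariance by the product of the two sample standard deviations.

```python
from statistics import mean, stdev

sleep = [6, 7, 10, 4, 7]           # hours of sleep (x)
gpa = [3.0, 2.8, 3.9, 2.2, 3.5]    # GPA (y)

def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    n = len(xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return s_xy / (stdev(xs) * stdev(ys))

r = sample_correlation(sleep, gpa)
print(round(r, 4))
```

Because the units of the covariance cancel against the units of the two standard deviations, the result would be the same if sleep were measured in minutes instead of hours.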

• Positive correlations indicate two variables that tend to move in the same direction. The closer to +1 the correlation is, the stronger the relationship. The relationship between rainfall and crop yield and the relationship between education and salary are examples.

• Negative correlations indicate two variables that tend to move in opposite directions. The closer to −1 the correlation is, the stronger the relationship. The relationship between temperature and heaters sold and the relationship between alcohol consumption and GPA are examples.


EXERCISES

1. The two most commonly reported stock indexes are the Dow Jones Industrial Average (DJIA) and the Standard & Poor’s 500 (S&P). The table below gives the closing values of these two stock indexes for 20 successive Fridays in 2013.

a. Compute the covariance between the DJIA and S&P closing prices.

b. Compute and interpret the correlation.

Date           DJIA Closing   S&P Closing
January 18     13650          1486
January 25     13896          1503
February 1     14010          1513
February 8     13993          1518
February 15    13982          1520
February 22    14001          1516
March 1        14090          1518
March 8        14397          1551
March 15       14514          1561
March 22       14512          1557
March 28       14579          1569
April 5        14565          1553
April 12       14865          1589
April 19       14548          1555
April 26       14713          1582
May 3          14974          1614
May 10         15118          1634
May 17         15354          1667
May 24         15303          1650


1.7: Alternative formulations of the mean

The mean that we defined in section 1.2 is more precisely called the arithmetic mean. However, it is often just called the “mean” in common-language usage because it is by far the most common type of mean. Nevertheless, it is worth briefly considering other formulations of the mean that are important in some settings.

The weighted mean is just an arithmetic mean, but weighted for multiple occurrences of some observations. Here is an example. Suppose you hold a hamburger-eating contest with 100 contestants and you record the number of hamburgers they can eat in ten minutes. The table below records the number of hamburgers eaten.

Number of Hamburgers   Number of Contestants
3                      10
4                      16
5                      46
6                      24
7                      3
8                      1

What is the average number of hamburgers eaten by each contestant? Taking the mean of {3, 4, 5, 6, 7, 8} is not correct, because there are multiple occurrences. Rather, we use the arithmetic mean, but count each data point the number of times that it is recorded. In this context, we call the resulting answer a weighted mean because each possible number of hamburgers eaten is “weighted” by the number of contestants who ate this number of hamburgers.

\[ \bar{x} = \frac{10 \cdot 3 + 16 \cdot 4 + 46 \cdot 5 + 24 \cdot 6 + 3 \cdot 7 + 1 \cdot 8}{100} = 4.97 \]

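As you can see, this is just the arithmetic mean, counting the number of occurrences properly. The general formula for the weighted mean is:

\[ \bar{x} = \frac{\sum w_i x_i}{n} \]

Here \(w_i\) is the number of times that the value \(x_i\) occurs, and \(n = \sum w_i\) is the total number of observations.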
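The hamburger example can be worked through in Python as sketched below. Storing the table as a dictionary from value to count is an illustrative choice, and the unweighted mean is computed only to show why it gives the wrong answer.

```python
# Number of hamburgers eaten -> number of contestants who ate that many
counts = {3: 10, 4: 16, 5: 46, 6: 24, 7: 3, 8: 1}

n = sum(counts.values())  # total number of contestants

# Each value is weighted by the number of times it occurs
weighted_mean = sum(x * w for x, w in counts.items()) / n

# Unweighted mean of the distinct values, which ignores the counts
naive_mean = sum(counts) / len(counts)

print(n, weighted_mean, naive_mean)
```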


Another type of mean that appears in some contexts is the geometric mean, which is defined as follows:

\[ G = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n} \]

So, while the arithmetic mean adds all the observations and divides by the number of observations, the geometric mean multiplies all the observations and then takes the nth root of the resulting product. The geometric mean makes sense only for positive numbers.

The most important application of the geometric mean for economics is that it is the appropriate way to average growth rates. As an example, suppose that you leave your money in a bank account for three years. The first year, you earn 10% interest, the second year you earn 20% interest and the third year, you earn 30% interest. You might be tempted to say that your average return over the three years is 20% (the arithmetic mean), but that is not exactly correct. To see why, suppose that you start with $100, and compute your bank balance at the end of three years:

100(1 + 0.1)(1 + 0.2)(1 + 0.3) = 171.60

Now, the geometric mean of these returns is:

\[ G = [(1 + 0.1)(1 + 0.2)(1 + 0.3)]^{1/3} = 1.19722 \]

It turns out that 19.72% is actually the correct way to think about the “average” rate of return. What this means is that if we invested our $100 in an account paying 19.72% interest annually, and left it there for three years, at the end of three years, the balance on the account would be:

100(1 + 0.19722)(1 + 0.19722)(1 + 0.19722) = 171.60

So that is what we mean by the “average” of these growth rates. Earning a return of 10% then 20% then 30% is equivalent to earning a return of 19.72% in each of the three years. A yearly interest rate of 20% (the arithmetic mean) would actually overshoot; intuitively, this is because of compounding.

Summarizing, for a sequence of growth rates \(\{r_1, r_2, \ldots, r_n\}\), the average growth rate is:

\[ [(1 + r_1)(1 + r_2) \cdots (1 + r_n)]^{1/n} - 1 \]
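The bank account example can be checked in Python with the sketch below. The function name is illustrative. It takes the geometric mean of the growth factors (1 + r) and then subtracts 1 to convert the average factor of 1.19722 back into a rate of 19.72%. `math.prod` requires Python 3.8 or later.

```python
import math

def average_growth_rate(rates):
    """Geometric-mean growth rate for per-period rates such as 0.10 for 10%."""
    growth_factor = math.prod(1 + r for r in rates)  # total growth over all periods
    return growth_factor ** (1 / len(rates)) - 1     # nth root, then back to a rate

rates = [0.10, 0.20, 0.30]
g = average_growth_rate(rates)

# Growing $100 at the average rate for three years reproduces the final balance
final_balance = 100 * (1 + g) ** len(rates)

print(round(g, 4), round(final_balance, 2))
```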


A final type of mean is the harmonic mean, which is defined as follows:

\[ H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \left( \frac{1}{n} \sum \frac{1}{x_i} \right)^{-1} \]

In words, the harmonic mean takes the arithmetic mean of the reciprocals of the observations and then takes the reciprocal of the result.

The harmonic mean is used for averaging together ratios. For example, suppose that a driver drives from point A to point B along a 60-mile road. For the trip from A to B, he drives at 30 miles per hour, and for the return trip from B to A, he drives at 20 miles per hour. Your first guess might be that his average speed is 25 miles per hour, but this is not correct.

Think about it: the trip from A to B took 2 hours since the road is 60 miles and he drives at 30 miles per hour. The return trip from B to A took 3 hours since the road is 60 miles and he drives at 20 miles per hour. Overall, he drove 120 miles in 5 hours, so evidently his average speed is:

\[ \frac{120 \text{ miles}}{5 \text{ hours}} = 24 \text{ miles per hour} \]

But this is just the harmonic mean:

\[ H = \frac{2}{\frac{1}{30} + \frac{1}{20}} = \left[ \frac{1}{2} \left( \frac{1}{30} + \frac{1}{20} \right) \right]^{-1} = 24 \]

Basically, the idea is that speed is a ratio of distance to time, and the amount of time appears in the denominator rather than the numerator. Thus, the proper way to average up these ratios is the harmonic mean, not the arithmetic mean.

Other applications of the harmonic mean include averaging price-earnings ratios in finance and averaging the fuel economy of different automobiles. Basically, the harmonic mean is appropriate when averaging ratios.
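The driving example can be verified in Python as sketched below. It computes the harmonic mean with `statistics.harmonic_mean` and compares it with total distance divided by total time; the variable names are illustrative.

```python
import math
from statistics import harmonic_mean, mean

speeds = [30, 20]        # miles per hour on the outbound and return legs
distance_per_leg = 60    # miles

h = harmonic_mean(speeds)

# Direct calculation: total distance divided by total time
total_time = sum(distance_per_leg / v for v in speeds)
average_speed = distance_per_leg * len(speeds) / total_time

print(h, average_speed, mean(speeds))
```

The two legs cover the same distance, which is why the plain harmonic mean applies. If the legs had different lengths, a weighted version would be needed.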


EXERCISES

1. You run a farm, and you bought 1200 pounds of feed for $3.00 per pound. Later, you bought 500 pounds of feed for $3.40 per pound. Finally, you bought 2500 pounds of feed for $2.80 per pound. What is the average price per pound that you paid for the feed?

2. You are traveling from point A to point B along a 100-mile road. You traveled at 25 miles per hour for the trip from A to B. How fast do you need to travel for the return trip if you want your average speed to be 50 miles per hour?

3. A country’s GDP grows at 10% every year for five years.

a. Compute the average growth rate.