Comparing Means Between Groups

Michael Ash

Summary of Main Points

• Comparing means between groups is an important method for program evaluation by policy analysts and public administrators.

• The question “Does a program work?” is often answered in terms of the program’s effect on the mean of an important outcome variable, by comparing the mean outcome in a treated group with the mean outcome in a comparison group.

• Comparing means between groups is also an important method for identifying discrimination and other social problems. Examples: income by race (white or non-white); drop-out risk by household type (single-parent or two-parent); body mass index (BMI) by residence (urban or suburban).

• The treated group and the comparison group are samples from two different populations. Sampling variation (rather than true underlying differences between the populations) may account for differences in the sample means between groups.

Caveats

Outcome: The outcome must measure something worth knowing.

Confounding factors and selection into treatment: The treatment and comparison groups may differ in ways other than receipt of the treatment.

Mean: The population mean does not fully describe the distribution of outcomes. For example, two groups with equal population mean income could have different probabilities of extreme poverty.

Means for Different Populations

Example of two populations:

1. population of women recently graduated from college, with mean earnings \(\mu_w\)

2. population of men recently graduated from college, with mean earnings \(\mu_m\)

Hypothesis Test for the Difference Between Two Means

The null hypothesis is that the difference equals some amount \(d_0\) specified by the researcher:

\[ H_0: \mu_m - \mu_w = d_0 \]

\[ H_1: \mu_m - \mu_w \neq d_0 \]

For example, \(d_0 = 0\) sets up the test of the null that there is no difference in mean earnings between recent male and female college graduates.

Procedure to Test a Null about Differences

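1. \(\bar{Y}_w\) is a good estimate of \(\mu_w\), and \(\bar{Y}_m\) is a good estimate of \(\mu_m\).

2. \(\bar{Y}_m - \bar{Y}_w\) is a good estimate of the difference in population means, \(\mu_m - \mu_w\).

3. \(\bar{Y}_w\) and \(\bar{Y}_m\) are subject to sampling variation, as is the difference \(\bar{Y}_m - \bar{Y}_w\). We will need an estimate of the standard deviation of \(\bar{Y}_m - \bar{Y}_w\).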

4. We want to know whether, under the null hypothesis, the random variable \((\bar{Y}_m - \bar{Y}_w) - d_0\), the gap between the difference in sample means and the null-hypothesized difference in population means, is likely to be as large as the gap actually observed in our particular sample.

The Hypothesis Test

• A test statistic for the gap between the difference in sample means and the null-hypothesized difference in population means:

\[ t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} \]

• Under the null hypothesis, this test statistic is approximately distributed N(0,1) if both samples are reasonably large. If the test statistic is “large” in absolute value (bigger than 1.96), we reject the null hypothesis at the 5 percent significance level. Why? If the null were true, a difference in sample means this far from \(d_0\) would be unlikely.
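A minimal Python sketch of the decision rule, assuming the large-sample N(0,1) approximation stated above. The helper names t_statistic and reject_null are hypothetical; the standard library's statistics.NormalDist supplies the 97.5th percentile of N(0,1), which is where the two-sided 5 percent cutoff of 1.96 comes from.

```python
from statistics import NormalDist


def t_statistic(diff_in_means: float, d0: float, se: float) -> float:
    """Gap between the observed difference and the null value, in SE units."""
    return (diff_in_means - d0) / se


def reject_null(t: float, alpha: float = 0.05) -> bool:
    """Two-sided test: reject when |t| exceeds the N(0,1) critical value."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(t) > critical


print(round(NormalDist().inv_cdf(0.975), 2))    # 1.96
print(reject_null(t_statistic(2.0, 0.0, 1.0)))  # True, since |t| = 2.0 > 1.96
print(reject_null(t_statistic(1.0, 0.0, 1.0)))  # False
```

Passing a different alpha (for example 0.01) moves the cutoff to the corresponding N(0,1) percentile rather than 1.96.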

Standard error of the difference in sample means

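\[ SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}} \]

• \(s_m^2\), the sample variance of men’s earnings:

\[ s_m^2 = \frac{1}{n_m - 1} \sum_{i=1}^{n_m} (Y_i - \bar{Y}_m)^2 \]

• \(s_w^2\), the sample variance of women’s earnings:

\[ s_w^2 = \frac{1}{n_w - 1} \sum_{j=1}^{n_w} (Y_j - \bar{Y}_w)^2 \]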
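The standard error formula can be computed directly from raw observations. This is a minimal Python sketch, assuming two independent samples stored in lists; the function name se_diff_from_data and the toy numbers are hypothetical, not from the data set. The standard library's statistics.variance divides by n − 1, matching \(s_m^2\) and \(s_w^2\) above.

```python
import math
from statistics import mean, variance


def se_diff_from_data(y_m: list[float], y_w: list[float]) -> float:
    """SE of (mean of y_m - mean of y_w) for two independent samples."""
    s2_m = variance(y_m)  # sample variance of group m, divides by n_m - 1
    s2_w = variance(y_w)  # sample variance of group w, divides by n_w - 1
    return math.sqrt(s2_m / len(y_m) + s2_w / len(y_w))


# Hypothetical toy earnings, for illustration only
men = [1.0, 2.0, 3.0]
women = [2.0, 4.0]
print(variance(men), variance(women))            # 1.0 2.0
print(mean(men) - mean(women))                   # -1.0
print(round(se_diff_from_data(men, women), 4))   # 1.1547
```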

Real-world data

• Table 3.1 presents summary statistics from real-world data.

• It is a useful exercise to think about the underlying data:

  • What is the unit of observation?

  • What variables are reported for each observation?

    . use cps_ch3
    . list in 1/7

         +--------------------------+
         | a_sex   year       ahe98 |
         |--------------------------|
      1. |     1   1992    12.99912 |
      2. |     1   1992    11.61796 |
      3. |     1   1992    17.37729 |
      4. |     2   1992    10.06127 |
      5. |     1   1992    16.75668 |
         |--------------------------|
      6. |     2   1992    9.216171 |
      7. |     2   1992    15.95874 |
         +--------------------------+

Comparing means with Stata

Stata can tabulate and summarize data for us.

    . tabulate a_sex if year==1992, summarize(ahe98)

                |      Summary of ahe98
          a_sex |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
              1 |   17.574572   7.4964888        1591
              2 |   15.220472   5.9732026        1371
    ------------+------------------------------------
          Total |   16.484946    6.932766        2962

With just one command, we have moved from “raw” individual data to the summary statistics in the first line of Table 3.1. Here a_sex = 1 identifies men and a_sex = 2 identifies women. (Think about how long this would take in Excel, or by hand.)
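For readers without Stata, a rough Python analogue of the grouped summary can be sketched with the standard library. It uses only the seven observations shown by `list in 1/7` above, so its numbers are a toy illustration and will not match the full-sample table of 2962 observations. The function name summarize_by_group is hypothetical, and the standard deviation uses the n − 1 divisor, as in the sample variance formulas.

```python
from statistics import mean, stdev

# The seven observations listed above: (a_sex, year, ahe98)
rows = [
    (1, 1992, 12.99912), (1, 1992, 11.61796), (1, 1992, 17.37729),
    (2, 1992, 10.06127), (1, 1992, 16.75668), (2, 1992, 9.216171),
    (2, 1992, 15.95874),
]


def summarize_by_group(data, year):
    """Mean, sample std. dev. (n - 1 divisor), and count of ahe98 by a_sex."""
    groups = {}
    for sex, yr, ahe in data:
        if yr == year:  # mimics the `if year==1992` qualifier
            groups.setdefault(sex, []).append(ahe)
    return {sex: (mean(v), stdev(v), len(v)) for sex, v in sorted(groups.items())}


for sex, (m, sd, n) in summarize_by_group(rows, 1992).items():
    print(sex, round(m, 4), round(sd, 4), n)
```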

Comparing means with Stata

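In fact, we can now test a null of equality (\(d_0 = 0\)) of mean hourly earnings for men and women in 1992, that is, \(H_0: \mu_m - \mu_w = 0\):

\[ t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{17.57 - 15.22 - 0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{2.35}{SE(\bar{Y}_m - \bar{Y}_w)} \]

\[ SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}} = \sqrt{\frac{7.50^2}{1591} + \frac{5.97^2}{1371}} \approx 0.25 \]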

Aside: is this SE, $0.25, plausible?

The SE of men’s mean earnings is \(s_m/\sqrt{n_m} = 7.50/\sqrt{1591} \approx 0.19\).

The SE of women’s mean earnings is \(s_w/\sqrt{n_w} = 5.97/\sqrt{1371} \approx 0.16\).

The SE for the difference should not be tremendously different from the SE for each group. (If you computed an SE of 7, you should be worried.)
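The plausibility check can be made exact with a short Python sketch, assuming the rounded 1992 figures used on these slides. Because the two samples are independent, the squared SEs of the group means add, so the SE of the difference is the square root of the sum of the squared group SEs. It must therefore lie between the larger group SE and the sum of the two.

```python
import math

# Rounded 1992 summary statistics used on the slides
s_m, n_m = 7.50, 1591  # men
s_w, n_w = 5.97, 1371  # women

se_m = s_m / math.sqrt(n_m)  # SE of men's sample mean
se_w = s_w / math.sqrt(n_w)  # SE of women's sample mean

# SE of the difference, from the two-sample formula
se_diff = math.sqrt(s_m**2 / n_m + s_w**2 / n_w)

print(round(se_m, 2), round(se_w, 2), round(se_diff, 2))  # 0.19 0.16 0.25
# Same number: for independent samples the squared SEs add
print(math.isclose(se_diff, math.hypot(se_m, se_w)))  # True
```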

Returning to our test statistic

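\[ t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{17.57 - 15.22 - 0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{2.35}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{2.35}{0.25} = 9.4 \]

This is a very large t-statistic (a t-statistic of 1.96, or roughly 2, is all that is required to reject the null hypothesis at the 5 percent level). So we reject the null hypothesis of equal wages with very high confidence: if the population means were equal, a difference in sample means this large would almost never arise from sampling variation alone.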

Applying the method to a different null

Is the difference between male and female earnings $1.50?

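\[ H_0: \mu_m - \mu_w = 1.50 \]

\[ t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{(17.57 - 15.22) - 1.50}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{0.85}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{0.85}{0.25} = 3.4 \]

Since 3.4 > 1.96, we also reject this null hypothesis at the 5 percent level.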
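An equivalent way to see both results: a two-sided 5 percent test rejects \(H_0\) exactly when \(d_0\) falls outside the 95 percent confidence interval \((\bar{Y}_m - \bar{Y}_w) \pm 1.96\,SE\). A minimal Python sketch, assuming the rounded figures 2.35 and 0.25 from the slides:

```python
diff, se = 17.57 - 15.22, 0.25  # rounded 1992 figures from the slides
z = 1.96                        # two-sided 5 percent critical value

lower, upper = diff - z * se, diff + z * se
print(round(lower, 2), round(upper, 2))  # 1.86 2.84

for d0 in (0.0, 1.50):
    t = (diff - d0) / se
    outside = not (lower <= d0 <= upper)  # outside the CI <=> |t| > 1.96
    print(d0, round(t, 2), outside)
# 0.0 9.4 True
# 1.5 3.4 True
```

Any null value between roughly 1.86 and 2.84 dollars would not be rejected with these data.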

Young men’s earnings over time

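\[ H_0: \mu_{m,1998} - \mu_{m,1992} = 0 \]

\[ t = \frac{(\bar{Y}_{m,1998} - \bar{Y}_{m,1992}) - d_0}{SE(\bar{Y}_{m,1998} - \bar{Y}_{m,1992})} = \frac{(17.94 - 17.57) - 0}{SE(\bar{Y}_{m,1998} - \bar{Y}_{m,1992})} = \frac{0.37}{SE(\bar{Y}_{m,1998} - \bar{Y}_{m,1992})} = \frac{0.37}{0.28} \approx 1.31 \]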

N.B. We are looking at two different samples of young men from two different cohorts.

Young men’s earnings over time

1. This t-statistic is well below 1.96.

2. \(\Pr(|t| > 1.31) = 0.19\): if there were no true difference in the population means, the sample means would differ by at least this much about 19 percent of the time.

3. We cannot reject the null hypothesis at the 5 percent significance level: there is no statistically significant evidence that the wages of recent male college graduates were higher in the late 1990s than in the early 1990s.
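The probability in point 2 is a two-sided p-value from the standard normal distribution. A minimal Python sketch, assuming the large-sample N(0,1) approximation; the function name two_sided_p_value is hypothetical, while statistics.NormalDist is part of the Python standard library.

```python
from statistics import NormalDist


def two_sided_p_value(t: float) -> float:
    """Pr(|Z| > |t|) for Z ~ N(0,1): the large-sample p-value."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


print(round(two_sided_p_value(1.31), 2))  # 0.19
print(round(two_sided_p_value(1.96), 3))  # 0.05
```

The second line shows why 1.96 is the 5 percent cutoff: a t-statistic of exactly 1.96 has a two-sided p-value of 0.05.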

Bernoulli outcomes

A very common application.

1. What is the percent of positive outcomes (Y = 1) in the population?

2. Does the percent of positive outcomes (Y = 1) differ between two groups?

The methods are the same as for continuous variables, but the interpretation and computations differ slightly.

Bernoulli outcomes

An individual’s response is yes (\(Y_i = 1\)) or no (\(Y_i = 0\)).

Call \(p_x\) the population mean approval of President x and \(\hat{p}_x\) the sample mean approval of President x. Note that

\[ \hat{p}_x = \frac{1}{n} \sum_{i=1}^{n} Y_i \]

                   Sample size    Proportion “yes”
    President I        250              0.54
    President II       300              0.44

(Think about the underlying data.)
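A small Python sketch of what “the underlying data” look like, assuming a hypothetical list of 0/1 responses constructed to reproduce President I’s summary (135 yes out of 250). The sample proportion is just the sample mean of the 0/1 variable, and the variance dividing by n equals \(\hat{p}(1-\hat{p})\); dividing by n − 1 instead would inflate it by a factor of n/(n − 1), which is negligible at n = 250.

```python
from statistics import mean, pvariance

# Hypothetical 0/1 responses built to match President I: 135 of 250 say yes
responses = [1] * 135 + [0] * 115

p_hat = mean(responses)       # sample proportion = sample mean of a 0/1 variable
var_y = pvariance(responses)  # variance with divisor n
print(p_hat)                                           # 0.54
print(round(var_y, 4), round(p_hat * (1 - p_hat), 4))  # 0.2484 0.2484
```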

Is approval different from 50 percent?

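\[ H_0: p_I = 0.5 \]

\[ t = \frac{\hat{p}_I - p_{I,0}}{SE(\hat{p}_I)} = \frac{\hat{p}_I - 0.5}{SE(\hat{p}_I)} = \frac{0.54 - 0.5}{SE(\hat{p}_I)} \]

\[ SE(\hat{p}_I) = \sqrt{\frac{s_Y^2}{n}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.54 \cdot 0.46}{250}} \approx 0.0315 \]

The first expression for the SE is no different from the continuous case; the second uses the special form of \(s_Y^2\) for a Bernoulli variable, \(\hat{p}(1-\hat{p})\).

\[ t = \frac{0.54 - 0.5}{0.0315} \approx 1.27 \]

The t-statistic is smaller than 1.96, so we cannot reject the null hypothesis at the 5 percent level.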
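The same arithmetic in a minimal Python sketch, assuming the President I figures from the table (\(\hat{p} = 0.54\), n = 250) and the null value 0.5:

```python
import math

p_hat, n, p0 = 0.54, 250, 0.5  # President I; null of 50 percent approval

se = math.sqrt(p_hat * (1 - p_hat) / n)  # Bernoulli SE uses p_hat * (1 - p_hat)
t = (p_hat - p0) / se
print(round(se, 4), round(t, 2))  # 0.0315 1.27
print(abs(t) > 1.96)              # False: cannot reject at the 5 percent level
```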

Polling: margin of error

By the way, poll results are often expressed with a “margin of error”, which is in fact the half-width of the 95 percent confidence interval:

\[ \Pr(\hat{p} - 1.96\,SE(\hat{p}) \le p \le \hat{p} + 1.96\,SE(\hat{p})) = 0.95 \]

\[ \Pr(0.54 - 1.96 \times 0.0315 \le p \le 0.54 + 1.96 \times 0.0315) = 0.95 \]

\[ \Pr(0.54 - 0.06 \le p \le 0.54 + 0.06) = 0.95 \]

\[ \Pr(0.48 \le p \le 0.60) = 0.95 \]

The margin of error would be reported as \(\pm 1.96 \times SE(\hat{p}) = \pm 0.06\). Note the importance of sample size for determining the standard error and the margin of error of a poll:

\[ SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

You can push down the SE, and the margin of error, by increasing the sample size.
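Because n sits under a square root, the margin of error shrinks with \(\sqrt{n}\): quadrupling the sample size only halves it. A minimal Python sketch, assuming \(\hat{p} = 0.54\) throughout; the function name margin_of_error and the larger sample sizes are hypothetical, chosen for illustration.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the 95 percent confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


for n in (250, 1000, 4000):
    print(n, round(margin_of_error(0.54, n), 3))
# 250 0.062
# 1000 0.031
# 4000 0.015
```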

Approval rating for two presidents

Is approval for President I different from approval for President II?

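\[ H_0: p_I - p_{II} = 0 \]

\[ t = \frac{(\hat{p}_I - \hat{p}_{II}) - d_0}{SE(\hat{p}_I - \hat{p}_{II})} = \frac{(\hat{p}_I - \hat{p}_{II}) - 0}{SE(\hat{p}_I - \hat{p}_{II})} = \frac{0.54 - 0.44}{SE(\hat{p}_I - \hat{p}_{II})} \]

\[ SE(\hat{p}_I - \hat{p}_{II}) = \sqrt{\frac{s_{Y_I}^2}{n_I} + \frac{s_{Y_{II}}^2}{n_{II}}} = \sqrt{\frac{\hat{p}_I(1-\hat{p}_I)}{n_I} + \frac{\hat{p}_{II}(1-\hat{p}_{II})}{n_{II}}} = \sqrt{\frac{0.54 \cdot 0.46}{250} + \frac{0.44 \cdot 0.56}{300}} \approx 0.0426 \]

\[ t = \frac{0.54 - 0.44}{0.0426} \approx 2.35 \]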

Since 2.35 > 1.96, we can reject (at the 5 percent level) the null that approval for the two presidents is equal.
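A minimal Python sketch of the two-group comparison, assuming the table’s figures and the unpooled standard error used on this slide. Many textbooks instead pool the two samples to estimate a common p under \(H_0: p_I = p_{II}\); that variant is not shown here.

```python
import math

p1, n1 = 0.54, 250  # President I
p2, n2 = 0.44, 300  # President II
d0 = 0.0            # null: no difference in approval

# Unpooled SE, as on the slide: each group contributes p_hat * (1 - p_hat) / n
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
t = ((p1 - p2) - d0) / se
print(round(se, 4), round(t, 2))  # 0.0426 2.35
print(abs(t) > 1.96)              # True: reject at the 5 percent level
```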