Comparing Means Between Groups
Michael Ash
Summary of Main Points
- Comparing means between groups is an important method for program evaluation by policy analysts and public administrators.
- The question “Does a program work?” is often answered in terms of the program’s effect on the mean of an important outcome variable, by comparing the mean of a treated group with the mean of a comparison group.
- Comparing means between groups is an important method for identifying discrimination and other social problems. Examples: income by white or non-white; drop-out risk by single-parent or two-parent household; body mass index (BMI) by urban or suburban residence.
- The treated group and the comparison group are samples from two different populations. Sampling variation (rather than true underlying differences in the populations) may account for differences in the sample means between groups.
Caveats
Outcome: The outcome must measure something worth knowing.
Confounding factors and selection into treatment: The treatment and comparison groups may differ in ways other than the receipt of treatment.
Mean: The population mean does not fully describe the distribution of outcomes. For example, two groups with equal population mean income could have different probabilities of extreme poverty.
Means for Different Populations
Example of two populations
1. population of women recently graduated from college, mean earnings µw
2. population of men recently graduated from college, mean earnings µm
Hypothesis Test for the Difference Between Two Means
The null hypothesis is that the difference is some amount d0 specified by the researcher.
H0: µm − µw = d0
H1: µm − µw ≠ d0
For example, d0 = 0 would set up the test that there is no difference in mean earnings between recent male and female college graduates.
Procedure to Test a Null about Differences
1. $\bar{Y}_w$ is a good estimate of $\mu_w$, and $\bar{Y}_m$ is a good estimate of $\mu_m$.
2. $\bar{Y}_m - \bar{Y}_w$ is a good estimate of the difference in population means, $\mu_m - \mu_w$.
3. $\bar{Y}_w$ and $\bar{Y}_m$ are subject to sampling variation, as is the difference $\bar{Y}_m - \bar{Y}_w$. We will need an estimate of the standard deviation of $\bar{Y}_m - \bar{Y}_w$.
4. We want to know how likely it is, under the null hypothesis, that the random variable $(\bar{Y}_m - \bar{Y}_w) - d_0$ (the gap between the difference in sample means and the null-hypothesized difference in population means) would be as large as the gap actually observed in our particular sample.
The Hypothesis Test
- A test statistic for the difference between the difference in sample means and the null-hypothesized difference in population means:
\[ t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} \]
- This test statistic is distributed approximately $N(0,1)$ if the two samples are reasonably large. If the test statistic is “large” (bigger than 1.96 in absolute value), then we reject the null hypothesis at the 5 percent level. Why? If the null were true, a difference in sample means this far from $d_0$ would be unlikely to arise from sampling variation alone.
Standard error of the difference in sample means
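\[ SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}} \]
- $s_m^2$, sample variance for men’s earnings:
\[ s_m^2 = \frac{1}{n_m - 1} \sum_{i=1}^{n_m} \left( Y_i - \bar{Y}_m \right)^2 \]
- $s_w^2$, sample variance for women’s earnings:
\[ s_w^2 = \frac{1}{n_w - 1} \sum_{j=1}^{n_w} \left( Y_j - \bar{Y}_w \right)^2 \]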
Real-world data
- Table 3.1 presents summary statistics from real-world data.
- Useful exercise to think about the underlying data:
  - What is the unit of observation?
  - What variables are reported for each observation?
. use cps_ch3
. list in 1/7

     +-------------------------+
     | a_sex   year      ahe98 |
     |-------------------------|
  1. |     1   1992   12.99912 |
  2. |     1   1992   11.61796 |
  3. |     1   1992   17.37729 |
  4. |     2   1992   10.06127 |
  5. |     1   1992   16.75668 |
     |-------------------------|
  6. |     2   1992   9.216171 |
  7. |     2   1992   15.95874 |
     +-------------------------+
Comparing means with Stata
Stata can tabulate and summarize data for us.
. tabulate a_sex if year==1992, summarize(ahe98)

            |          Summary of ahe98
      a_sex |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   17.574572   7.4964888        1591
          2 |   15.220472   5.9732026        1371
------------+------------------------------------
      Total |   16.484946    6.932766        2962
With just one command, we have moved from “raw” individual data to the summary statistics in the first line of Table 3.1. (Think about how long it would take to do this in Excel, or by hand.)
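To make the step from raw observations to group summaries concrete, here is a minimal Python sketch that groups the seven rows shown in the listing above by a_sex and computes the mean, sample standard deviation, and count for each group. It uses only those seven rows, not the full cps_ch3 file, so its numbers will not match the Stata table; the function name summarize_by_group is an illustrative choice, not something from the original material.

```python
from statistics import mean, stdev

# The seven observations from the listing above, as (a_sex, ahe98) pairs.
rows = [
    (1, 12.99912), (1, 11.61796), (1, 17.37729), (2, 10.06127),
    (1, 16.75668), (2, 9.216171), (2, 15.95874),
]

def summarize_by_group(rows):
    """Return {group: (mean, sample std. dev., count)} for (group, value) pairs."""
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    # stdev() divides by n - 1, matching the sample variance formula.
    return {g: (mean(v), stdev(v), len(v)) for g, v in groups.items()}

summary = summarize_by_group(rows)
for group in sorted(summary):
    m, s, n = summary[group]
    print(f"a_sex={group}: mean={m:.4f} sd={s:.4f} n={n}")
```

With so few observations the group means are very noisy, which is exactly why the full sample of almost 3,000 observations is used for the test.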
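In fact, we can now test a null of equality ($d_0 = 0$) of mean hourly earnings for men and women in 1992, or $H_0: \mu_m - \mu_w = 0$.
\[ t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{17.57 - 15.22 - 0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{2.35}{SE(\bar{Y}_m - \bar{Y}_w)} \]
\[ SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}} = \sqrt{\frac{7.50^2}{1591} + \frac{5.97^2}{1371}} \approx 0.25 \]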
Aside: is this SE, $0.25, plausible?
The SE for men’s earnings is
\[ \frac{s_m}{\sqrt{n_m}} = \frac{7.50}{\sqrt{1591}} \approx 0.19 \]
The SE for women’s earnings is
\[ \frac{s_w}{\sqrt{n_w}} = \frac{5.97}{\sqrt{1371}} \approx 0.16 \]
The SE for the difference should not be tremendously different from the SE for each group. (If you computed an SE of 7, you should be worried.)
Returning to our test statistic
\[ t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{17.57 - 15.22 - 0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{2.35}{0.25} = 9.4 \]
This is a very large t-statistic (a t-statistic of about 2 is all that is required to reject the null hypothesis). So we reject the null hypothesis of equal wages with very high confidence (there is a very low probability that the difference in sample means is due only to sampling variation).
Applying the method to a different null
Is the difference between male and female earnings $1.50?
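$H_0: \mu_m - \mu_w = 1.50$
\[ t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{(17.57 - 15.22) - 1.50}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{0.85}{0.25} = 3.4 \]
Since 3.4 is larger than 1.96, we also reject the null hypothesis that the difference is exactly $1.50.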
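The two tests above can be reproduced from the summary statistics alone. The following is a minimal Python sketch, assuming the rounded 1992 means, standard deviations, and sample sizes from the Stata table; the function name two_sample_t is a hypothetical helper, not part of the original material. Because it does not round the SE to 0.25, its t-statistics will differ slightly from the hand calculations.

```python
from math import sqrt

def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b, d0=0.0):
    """t-statistic and SE for H0: mu_a - mu_b = d0, from summary statistics.

    Uses the unpooled standard error sqrt(s_a^2/n_a + s_b^2/n_b).
    """
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    t = ((mean_a - mean_b) - d0) / se
    return t, se

# 1992 hourly earnings: men (group a) versus women (group b).
men = (17.57, 7.50, 1591)
women = (15.22, 5.97, 1371)

t_equal, se = two_sample_t(*men, *women, d0=0.0)   # null: no difference
t_gap, _ = two_sample_t(*men, *women, d0=1.50)     # null: difference is $1.50

print(f"SE = {se:.4f}")
print(f"t (d0 = 0)    = {t_equal:.2f}")
print(f"t (d0 = 1.50) = {t_gap:.2f}")
```

Passing d0 as a parameter makes it clear that the same procedure serves any null-hypothesized difference, not only the null of equality.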
Young men’s earnings over time
$H_0: \mu_{m,1998} - \mu_{m,1992} = 0$
\[ t = \frac{(\bar{Y}_{m,1998} - \bar{Y}_{m,1992}) - d_0}{SE(\bar{Y}_{m,1998} - \bar{Y}_{m,1992})} = \frac{(17.94 - 17.57) - 0}{SE(\bar{Y}_{m,1998} - \bar{Y}_{m,1992})} = \frac{0.37}{0.28} \approx 1.31 \]
N.B. We are looking at two different samples of young men from two different cohorts.
1. This t-statistic is well below 1.96.
2. Pr(|t| > 1.31) = 0.19: 19 percent of the time, the sample means will differ by at least this much if there is no true difference in the population means.
3. We cannot reject the null hypothesis with 95 percent confidence: there is no statistically significant evidence that the wages of recent male college graduates were higher in the late 1990s than they had been in the early 1990s.
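The probability quoted in step 2 comes from the standard normal distribution. As a minimal sketch, assuming the large-sample N(0,1) approximation used throughout these slides, the two-sided p-value can be computed with Python's standard library; the function name two_sided_p_value is illustrative.

```python
from math import erfc, sqrt

def two_sided_p_value(t):
    """Pr(|Z| > |t|) for a standard normal Z.

    Uses the identity 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    """
    return erfc(abs(t) / sqrt(2))

p = two_sided_p_value(1.31)
print(f"Pr(|t| > 1.31) = {p:.2f}")

# The 1.96 cutoff corresponds to a two-sided p-value of about 0.05.
print(f"Pr(|t| > 1.96) = {two_sided_p_value(1.96):.3f}")
```

Comparing the p-value with 0.05 is equivalent to comparing |t| with 1.96.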
Bernoulli outcomes
Very common application.
1. What is the percent of positive outcomes (Y = 1) in the population?
2. Does the percent of positive outcomes (Y = 1) differ between two groups?
Methods are identical to the method for continuous variables, but the interpretation and computations differ slightly.
An individual’s response is yes ($Y_i = 1$) or no ($Y_i = 0$).
Call $p_x$ the mean population approval of President x and $\hat{p}_x$ the mean sample approval of President x. Note that
\[ \hat{p}_x = \frac{1}{n} \sum_{i=1}^{n} Y_i \]

                Sample size   Share “yes”
President I         250          0.54
President II        300          0.44

(Think about the underlying data.)
Is approval different from 50 percent?
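$H_0: p_I = 0.5$
\[ t = \frac{\hat{p}_I - p_{I,0}}{SE(\hat{p}_I)} = \frac{\hat{p}_I - 0.5}{SE(\hat{p}_I)} = \frac{0.54 - 0.5}{SE(\hat{p}_I)} \]
\[ SE(\hat{p}_I) = \sqrt{\frac{s_Y^2}{n}} \quad \text{(no difference so far)} \]
\[ = \sqrt{\frac{\hat{p}_I (1 - \hat{p}_I)}{n}} \quad \text{(special } s_Y^2 \text{ for a Bernoulli variable)} \]
\[ = \sqrt{\frac{0.54 \cdot 0.46}{250}} \approx 0.0315 \]
\[ t = \frac{0.54 - 0.5}{0.0315} \approx 1.27 \]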
The t statistic is smaller than 1.96; so we cannot reject the null hypothesis.
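The calculation above can be checked with a few lines of Python. This is a minimal sketch, assuming the poll figures for President I (250 respondents, 54 percent approval); the function name proportion_t is a hypothetical helper.

```python
from math import sqrt

def proportion_t(p_hat, n, p0):
    """t-statistic and SE for H0: p = p0, given a sample proportion p_hat.

    For a Bernoulli variable the sample variance is approximately
    p_hat * (1 - p_hat), so SE = sqrt(p_hat * (1 - p_hat) / n).
    """
    se = sqrt(p_hat * (1 - p_hat) / n)
    t = (p_hat - p0) / se
    return t, se

t, se = proportion_t(p_hat=0.54, n=250, p0=0.5)
print(f"SE = {se:.4f}, t = {t:.2f}")
print("reject H0" if abs(t) > 1.96 else "cannot reject H0")
```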
Polling: margin of error
By the way, poll results are often expressed with a “margin of error,” which is in fact the half-width of the 95 percent confidence interval.
\[ \Pr\left(\hat{p} - 1.96\,SE(\hat{p}) \le p \le \hat{p} + 1.96\,SE(\hat{p})\right) = 0.95 \]
\[ \Pr(0.54 - 1.96 \times 0.0315 \le p \le 0.54 + 1.96 \times 0.0315) = 0.95 \]
\[ \Pr(0.54 - 0.06 \le p \le 0.54 + 0.06) = 0.95 \]
\[ \Pr(0.48 \le p \le 0.60) = 0.95 \]
The margin of error would be reported as $\pm 1.96 \times SE(\hat{p}) = \pm 0.06$.
Note the importance of sample size for determining the standard error and the margin of error of a poll:
\[ SE(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
You can push down the SE, and the margin of error, by increasing the sample size.
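To see how quickly the margin of error shrinks with sample size, here is a minimal Python sketch. It assumes the same 54 percent approval figure; the sample sizes beyond 250 are hypothetical values chosen for illustration, and margin_of_error is an illustrative function name.

```python
from math import sqrt

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the 95 percent confidence interval for a proportion."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# n = 250 is the poll above; 1000 and 4000 are hypothetical larger polls.
for n in (250, 1000, 4000):
    print(f"n = {n:5d}: margin of error = +/- {margin_of_error(0.54, n):.3f}")
```

Because n enters under a square root, quadrupling the sample size only halves the margin of error.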
Approval rating for two presidents
Is approval for President I different from approval for President II?
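$H_0: p_I - p_{II} = 0$
\[ t = \frac{(\hat{p}_I - \hat{p}_{II}) - d_0}{SE(\hat{p}_I - \hat{p}_{II})} = \frac{(\hat{p}_I - \hat{p}_{II}) - 0}{SE(\hat{p}_I - \hat{p}_{II})} = \frac{0.54 - 0.44}{SE(\hat{p}_I - \hat{p}_{II})} \]
\[ SE(\hat{p}_I - \hat{p}_{II}) = \sqrt{\frac{s_{Y_I}^2}{n_I} + \frac{s_{Y_{II}}^2}{n_{II}}} = \sqrt{\frac{\hat{p}_I (1 - \hat{p}_I)}{n_I} + \frac{\hat{p}_{II} (1 - \hat{p}_{II})}{n_{II}}} = \sqrt{\frac{0.54 \cdot 0.46}{250} + \frac{0.44 \cdot 0.56}{300}} \approx 0.0426 \]
\[ t = \frac{0.54 - 0.44}{0.0426} \approx 2.35 \]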
We can reject the null hypothesis that approval for the two presidents is equal, since 2.35 is larger than 1.96.
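The two-president comparison follows the same pattern as the earnings comparison, with the Bernoulli variance formula in place of the sample variances. Here is a minimal Python sketch, assuming the poll figures from the table above and the unpooled standard error used on this slide; two_proportion_t is a hypothetical helper name.

```python
from math import sqrt

def two_proportion_t(p1, n1, p2, n2, d0=0.0):
    """t-statistic and SE for H0: p1 - p2 = d0, from two sample proportions.

    Each group's variance is estimated as p * (1 - p), the Bernoulli case
    of the two-sample standard error formula.
    """
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    t = ((p1 - p2) - d0) / se
    return t, se

t, se = two_proportion_t(p1=0.54, n1=250, p2=0.44, n2=300)
print(f"SE = {se:.4f}, t = {t:.2f}")
print("reject H0" if abs(t) > 1.96 else "cannot reject H0")
```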