4 Significance testing
4.5 Testing the difference between means or proportions
In this section, sample estimates are indexed by 1 or 2 if there are two samples (e.g. $n_1$ and $n_2$ are the sizes of the first and second samples respectively).
4.5.1 Difference between two independent means
The test compares the difference between mean values $\bar{x}_1$ and $\bar{x}_2$ of two independent samples with an ‘expected’ difference $e$. The number $e$ is not supplied by the test and should come from an ‘outside’ source of information. If there is no evidence about what could be expected, it may be reasonable to take $e = 0$ but the user should consider carefully whether the ‘no difference’ hypothesis is really what needs to be tested.
The sampling standard deviations $S_1$ and $S_2$ are assumed to be known. The standard error s.e. of the difference $\bar{x}_1 - \bar{x}_2$ is then estimated as
$$\text{s.e.} = \sqrt{\frac{S_1^2}{n_{1,e}} + \frac{S_2^2}{n_{2,e}}},$$
where $n_{1,e}$ and $n_{2,e}$ are the effective sample sizes in the first and second samples, respectively.
As with the test for single values, the Z-test is recommended for large samples. If the null hypothesis is the ‘no difference’ one, it is reasonable sometimes to assume that the two populations have the same variance. In this case, it may be better to compute the standard deviation as the square root of the pooled variance estimate. However, in pooling the estimates they must not be weighted in proportion to their respective sums of weights. The weighting schemes, and the mean weights applied, may be arbitrarily different between two different samples, and even between two subsamples. The effective sample sizes $n_{1,e}$, $n_{2,e}$ of the two (sub)samples should be used instead:
$$S = \sqrt{\frac{n_{1,e}S_1^2 + n_{2,e}S_2^2}{n_{1,e} + n_{2,e}}}.$$
Then the standard error s.e. can be computed as $S\sqrt{1/n_{1,e} + 1/n_{2,e}}$. But the pooled variance cannot be used if the expected difference $e$ is non-zero. Let
$$d = \bar{x}_1 - \bar{x}_2 - e$$
be the ‘discrepancy’ between the ‘observed’ and the ‘expected’ difference. Then the test is the following:
null hypothesis          $\mu_1 - \mu_2 = e$      $\mu_1 - \mu_2 \le e$    $\mu_1 - \mu_2 \ge e$
alternative hypothesis   $\mu_1 - \mu_2 \ne e$    $\mu_1 - \mu_2 > e$      $\mu_1 - \mu_2 < e$
                         (two-tailed)             (one-tailed)             (one-tailed)
statistic to compute     $Z = |d|/\text{s.e.}$    $Z = d/\text{s.e.}$      $Z = -d/\text{s.e.}$
table to find P-value    large sample: Table H1
                         small sample, normal population: Table H3 with $n_1 + n_2 - 2$ degrees of freedom
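The large-sample branch of the test above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function name `z_test_means`, its parameter names and the `alternative` labels are assumptions introduced here, and the P-value is computed from the standard normal distribution instead of being read from Table H1. The small-sample case (Table H3) is not covered.

```python
import math


def z_test_means(mean1, mean2, sd1, sd2, n1_eff, n2_eff,
                 e=0.0, alternative="two-sided", pooled=False):
    """Large-sample Z-test for the difference between two independent means.

    n1_eff and n2_eff are effective sample sizes; e is the expected difference.
    Returns (Z, P-value, standard error of the difference).
    """
    if pooled:
        # The pooled variance is only meaningful under the 'no difference' hypothesis.
        if e != 0:
            raise ValueError("pooled variance cannot be used when e is non-zero")
        # Pool the variances using effective sample sizes, not sums of weights.
        s = math.sqrt((n1_eff * sd1 ** 2 + n2_eff * sd2 ** 2) / (n1_eff + n2_eff))
        se = s * math.sqrt(1 / n1_eff + 1 / n2_eff)
    else:
        se = math.sqrt(sd1 ** 2 / n1_eff + sd2 ** 2 / n2_eff)

    d = mean1 - mean2 - e  # discrepancy between observed and expected difference

    if alternative == "two-sided":    # alternative: mu1 - mu2 != e
        z = abs(d) / se
        p = math.erfc(z / math.sqrt(2))          # both tails of the normal curve
    elif alternative == "greater":    # alternative: mu1 - mu2 > e
        z = d / se
        p = 0.5 * math.erfc(z / math.sqrt(2))    # upper tail only
    elif alternative == "less":       # alternative: mu1 - mu2 < e
        z = -d / se
        p = 0.5 * math.erfc(z / math.sqrt(2))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p, se
```

Returning the standard error alongside Z and the P-value makes it easy to check the intermediate figure against a hand calculation.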
Example 4.3 Because of the result of the test in Example 4.1 a second sample was selected, supposedly in the same way, with a calibrated sample size of 150. This yielded a mean height of 1.785 m with a standard deviation of 0.05 m. Is it reasonable to conclude that these two estimates are compatible? What is the likelihood that the difference between the two means arose by chance?
The null hypothesis is that the population from which these samples were drawn is the same and that therefore there should be no difference between the means. The standard deviations of the two samples are markedly different so that we should not pool the variances in this case.
$\text{s.e.}_1 = 0.0073$ m (from Example 4.1).
$\text{s.e.}_2 = 0.05/\sqrt{150} = 0.0041$ m. Difference $d = 0.015$ m.
Standard error of the difference $\text{s.e.} = \sqrt{0.08^2/120 + 0.05^2/150} = 0.0084$ m.
$Z = d/\text{s.e.} = 1.79$.
As we have no prior expectation of the direction of any difference between the two samples a two-tailed test is appropriate. From Table H1, with Z = 1.79,
$P = 0.073$, so the odds are about 13:1 against this happening by chance and we [...] level of confidence is sufficient is of course a matter for judgement in the individual circumstances.
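The arithmetic of Example 4.3 can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, the first sample's standard deviation (0.08 m) and effective sample size (120) are taken from the standard error calculation in the example, and the two-tailed P-value is computed from the normal distribution rather than read from Table H1.

```python
import math

# Figures used in Example 4.3 (first sample values as they appear in the s.e. formula).
sd1, n1_eff = 0.08, 120
sd2, n2_eff = 0.05, 150
d = 0.015  # difference between the two sample means, in metres (e = 0)

se1 = sd1 / math.sqrt(n1_eff)        # standard error of the first mean
se2 = sd2 / math.sqrt(n2_eff)        # standard error of the second mean
se = math.sqrt(se1 ** 2 + se2 ** 2)  # standard error of the difference (unpooled)

z = d / se
p = math.erfc(z / math.sqrt(2))      # two-tailed P-value
odds_against = (1 - p) / p           # odds against a chance result

print(f"s.e.1 = {se1:.4f} m, s.e.2 = {se2:.4f} m, s.e. = {se:.4f} m")
print(f"Z = {z:.2f}, P = {p:.3f}, odds against chance about {odds_against:.0f}:1")
```

The printed values should agree with the figures in the example to the precision quoted there.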
4.5.2 Difference between two independent proportions
An initial word of warning is necessary. ‘Independent proportions’ refers to two estimates of the incidence of some attribute within two samples (or two subsets of one sample) which do not overlap. There is thus no possible correlation between the estimates. However, if we want to examine the proportions with two attributes within the same sample or subsample these are ‘correlated attributes’ even if no two sample members can possess both attributes. Correlated proportions are dealt with in section 4.5.3.
The test is very similar to the previous one, the number $e$ being now the ‘expected’ difference between two proportions. The standard error of the difference $\hat{p}_1 - \hat{p}_2$ is given by the formula
$$\text{s.e.} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_{1,e}} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_{2,e}}}. \qquad (4.1)$$
As with the difference between means, it is possible to use the pooled estimate when the null hypothesis is $p_1 = p_2$. The common proportion $\hat{p}$ is then estimated as the weighted average of $\hat{p}_1$ and $\hat{p}_2$. As in section 4.5.1, this weighted average must use the respective effective sample sizes, not the (arbitrary) sums of weights, of the (sub)samples within which $\hat{p}_1$ and $\hat{p}_2$ are calculated, so that
$$\hat{p} = \frac{n_{1,e}\hat{p}_1 + n_{2,e}\hat{p}_2}{n_{1,e} + n_{2,e}}.$$
The standard error will then be
$$\text{s.e.} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_{1,e}} + \frac{1}{n_{2,e}}\right)}.$$
But remember that this is not applicable when the expected difference between proportions is non-zero.
Again, denote by d the difference between observed and expected figures:
$$d = \hat{p}_1 - \hat{p}_2 - e.$$
Then the test details are the following:
null hypothesis          $p_1 - p_2 = e$          $p_1 - p_2 \le e$        $p_1 - p_2 \ge e$
alternative hypothesis   $p_1 - p_2 \ne e$        $p_1 - p_2 > e$          $p_1 - p_2 < e$
                         (two-tailed)             (one-tailed)             (one-tailed)
statistic to compute     $Z = |d|/\text{s.e.}$    $Z = d/\text{s.e.}$      $Z = -d/\text{s.e.}$
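The two ways of computing the standard error for independent proportions, formula (4.1) and the pooled version, can be compared in a short Python sketch. The function names `se_diff_proportions` and `z_statistic` are assumptions made for this illustration, not part of any library.

```python
import math


def se_diff_proportions(p1, p2, n1_eff, n2_eff, pooled=False):
    """Standard error of p1 - p2 for two independent (sub)samples.

    n1_eff and n2_eff are effective sample sizes. The pooled form should only
    be used when the null hypothesis is p1 = p2 (expected difference e = 0).
    """
    if pooled:
        # Common proportion: average weighted by effective sample sizes.
        p = (n1_eff * p1 + n2_eff * p2) / (n1_eff + n2_eff)
        return math.sqrt(p * (1 - p) * (1 / n1_eff + 1 / n2_eff))
    # Unpooled standard error, formula (4.1).
    return math.sqrt(p1 * (1 - p1) / n1_eff + p2 * (1 - p2) / n2_eff)


def z_statistic(p1, p2, se, e=0.0, alternative="two-sided"):
    """Z statistic for the three forms of the test listed above."""
    d = p1 - p2 - e  # discrepancy between observed and expected difference
    if alternative == "two-sided":   # alternative: p1 - p2 != e
        return abs(d) / se
    if alternative == "greater":     # alternative: p1 - p2 > e
        return d / se
    if alternative == "less":        # alternative: p1 - p2 < e
        return -d / se
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
```

Keeping the standard error separate from the statistic lets the caller choose the pooled form explicitly, which is appropriate only when the expected difference is zero.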
Example 4.4 In Example 4.2 we had one region where dog ownership was estimated at 40%. In an adjoining region the corresponding estimate was only 30%, though the sample size was smaller (effective sample size 100). Is this difference likely to have occurred by chance or can we reasonably conclude that there is a real difference between these two regions?
$\text{s.e.}_1$ (from Example 4.2) $= 0.035$ (3.5%).
$\text{s.e.}_2 = \sqrt{0.3(1 - 0.3)/100} = 0.046$. Difference $d = 0.1$.
Standard error of the difference $\text{s.e.} = 0.057$ (5.7%) from formula (4.1).
$Z = d/\text{s.e.} = 1.75$.
From Table H1, assuming a two-tailed test, we find that there is only an 8% probability of a difference of that magnitude or greater arising by chance.
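Example 4.4 can be reproduced with the short Python sketch below. The variable names are illustrative, the first region's standard error is taken as the rounded 0.035 quoted from Example 4.2, and the P-value comes from the normal distribution rather than from Table H1.

```python
import math

p1, se1 = 0.40, 0.035   # first region: estimate and s.e. quoted from Example 4.2
p2, n2_eff = 0.30, 100  # adjoining region: estimate and effective sample size

se2 = math.sqrt(p2 * (1 - p2) / n2_eff)  # standard error of the second proportion
se = math.sqrt(se1 ** 2 + se2 ** 2)      # standard error of the difference
d = p1 - p2                              # observed difference (expected difference e = 0)

z = d / se
p_value = math.erfc(z / math.sqrt(2))    # two-tailed P-value

print(f"s.e.2 = {se2:.3f}, s.e. = {se:.4f}, Z = {z:.2f}, P = {p_value:.3f}")
```

Carrying more decimal places than the text gives a standard error closer to 0.058 and Z of about 1.73, slightly below the 1.75 obtained from the rounded figure 0.057. The conclusion of roughly an 8% two-tailed probability is unchanged.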
4.5.3 Difference between correlated proportions
It is necessary sometimes to test the difference between two proportions which do not come from independent samples. For simplicity, assume that we deal with ‘large’ samples. The main difficulty in this case is the fact that the two proportion estimates $\hat{p}_1$ and $\hat{p}_2$ are not independent so that the standard error of their difference cannot be computed by formula (4.1) any more. The formula should now incorporate the covariance between $\hat{p}_1$ and $\hat{p}_2$:
$$\operatorname{var}(\hat{p}_1 - \hat{p}_2) = \operatorname{var}(\hat{p}_1) + \operatorname{var}(\hat{p}_2) - 2\operatorname{cov}(\hat{p}_1, \hat{p}_2).$$
In general, calculating the covariance is a very difficult problem, especially if the sampling is not simple random; it is certainly no easier than computing the variance.
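The role of the covariance term in the variance identity can be illustrated with a minimal Python sketch. It assumes the two variances and the covariance are already available from some other calculation; the function name `se_difference` and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def se_difference(var1, var2, cov12=0.0):
    """Standard error of p1 - p2 from var(p1), var(p2) and cov(p1, p2).

    With cov12 = 0 this reduces to the independent-samples case.
    """
    var_diff = var1 + var2 - 2 * cov12
    if var_diff < 0:
        # Can only happen if the inputs are mutually inconsistent.
        raise ValueError("inconsistent variance and covariance inputs")
    return math.sqrt(var_diff)


# Hypothetical inputs: the same variances with and without a covariance term.
se_independent = se_difference(0.0012, 0.0021)
se_negative_cov = se_difference(0.0012, 0.0021, cov12=-0.0005)
se_positive_cov = se_difference(0.0012, 0.0021, cov12=0.0005)
```

A negative covariance makes the standard error of the difference larger than in the independent case and a positive covariance makes it smaller, so ignoring the covariance can mislead in either direction.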
The approach we take, although not strictly mathematical, is nevertheless practical, and it does give a good approximation in most cases. The covariance is first computed for simple random sampling and then, when sampling is not simple random, all estimates in the formula become weighted with the number of respondents being replaced by the effective sample size.
We consider only two very common situations where there is a relatively simple formula for the standard error.
Case 1: The two proportions are related to different categories of the same question and are