McNemar’s Test - Nonparametric Statistical Methods Using R

2.7 χ² Tests

2.7.4 McNemar’s Test

McNemar’s test for significant change is used in many applications. The data are generally placed in a contingency table, but the analysis is not one of the χ² goodness-of-fit tests discussed earlier. A simple example motivates the test.

Suppose A and B are two candidates for a political office who are having a debate. Before and after the debate, the preference, A or B, of each member of the audience is recorded. Given a change in preference of candidate, we are interested in the difference in the change from B to A minus the change from A to B. If the estimate of this difference is significantly greater than 0, we might conclude that A won the debate.

For notation, assume we are observing a pair of discrete random variables X and Y. In most applications, the ranges of X and Y have two values, say, {0, 1}.⁸ In our simple debate example, the common range can be written as {For A, For B}. Note that there are four categories: (0, 0), (0, 1), (1, 0), (1, 1).

Let pij, i, j = 0, 1, denote the respective probabilities of these categories.

Consider the hypothesis

$$H_0: p_{01} - p_{10} = 0 \quad \text{versus} \quad H_A: p_{01} \neq p_{10}. \tag{2.31}$$

One-sided tests are also of interest; for example, in the debate situation, the claim that B wins the debate is expressed by the alternative HA: p01 > p10. Let (X1, Y1), . . . , (Xn, Yn) denote a random sample on (X, Y).

⁸ See Hettmansperger and McKean (1973) for generalizations to more than two categories.

Let Nij, i, j = 0, 1, denote the respective frequencies of the categories (0, 0), (0, 1), (1, 0), (1, 1). For convenience, the data can be written in a 2 × 2 contingency table. The estimate of p01 − p10 is p̂01 − p̂10 = (N01 − N10)/n, which is a difference in two proportions in a multinomial setting; hence, the standard error of this estimate is given in expression (2.26). We repeat it here in the current notation.

$$SE(\hat{p}_{01} - \hat{p}_{10}) = \sqrt{\frac{\hat{p}_{01} + \hat{p}_{10} - (\hat{p}_{01} - \hat{p}_{10})^2}{n}}. \tag{2.32}$$

The Wald test statistic is the z-statistic, which is the ratio of p̂01 − p̂10 to its standard error. Usually, though, a scores test is used. In this case the squared difference in the numerator of the standard error is replaced by 0, its parametric value under the null hypothesis. Then the square of the z-scores test statistic reduces to

$$\chi^2 = \frac{(N_{01} - N_{10})^2}{N_{01} + N_{10}}. \tag{2.33}$$
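The reduction to (2.33) can be checked directly. As a sketch, write p̂ij = Nij/n and set the squared difference in (2.32) to 0, its value under H0:

$$z = \frac{\hat{p}_{01} - \hat{p}_{10}}{\sqrt{(\hat{p}_{01} + \hat{p}_{10})/n}} = \frac{(N_{01} - N_{10})/n}{\sqrt{(N_{01} + N_{10})/n^2}} = \frac{N_{01} - N_{10}}{\sqrt{N_{01} + N_{10}}}, \qquad z^2 = \frac{(N_{01} - N_{10})^2}{N_{01} + N_{10}}.$$

Note that the sample size n cancels, so the statistic depends only on the two off-diagonal counts.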

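Under H0, this test statistic has an asymptotic χ²-distribution with 1 degree of freedom. Letting χ²₀ be the realized value of the test statistic once the sample is drawn, the p-value of this test is 1 − F(χ²₀; 1), where F(·; 1) denotes the cdf of the χ²-distribution with 1 degree of freedom. For a one-sided test, simply divide this p-value by 2, provided the observed difference N01 − N10 lies in the direction of the alternative.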

Actually, an exact test is easily formulated. Note that this test is conditioned on the categories (0, 1) and (1, 0). Furthermore, the null hypothesis says that these two categories are equilikely. Hence under the null hypothesis, the statistic N01 has a binomial distribution with probability of success 1/2 and N01 + N10 trials. So the exact p-value can be determined from this binomial distribution. While either the exact or the asymptotic p-value is easily calculated by R, we recommend the exact p-value.
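As a worked form of the exact p-value, let n01 and n10 be the observed counts and write m = n01 + n10 for the number of pairs that changed (the symbol m is introduced here only for illustration). Assuming the one-sided alternative HA: p01 > p10 (use the lower tail for the opposite direction), conditioning on m gives

$$p\text{-value} = P\left[N_{01} \ge n_{01} \mid N_{01} + N_{10} = m\right] = \sum_{k = n_{01}}^{m} \binom{m}{k} \left(\frac{1}{2}\right)^{m}.$$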

Example 2.7.4 (Hodgkin’s Disease and Tonsillectomy). Hollander and Wolfe (1999) report on a study concerning Hodgkin’s disease and tonsillectomy. A theory purports that tonsils offer protection against Hodgkin’s disease. The data in the study consist of 85 paired observations of siblings. In each pair, one sibling has Hodgkin’s disease and the other does not. Whether or not each had a tonsillectomy was also reported. The data are:

                                        Sibling
Hodgkin’s Patients      Tonsillectomy (0)   No Tonsillectomy (1)
Tonsillectomy (0)              26                    15
No Tonsillectomy (1)            7                    37

If the medical theory is correct, then p01 > p10, so we are interested in a one-sided test. The following R calculations show how easily the test statistic and p-value (including the exact) are calculated:

> teststat <- (15-7)^2/(15+7)
> pvalue <- (1 - pchisq(teststat,1))/2
> pexact <- 1 - pbinom(14,(15+7),.5)
> c(teststat,pvalue,pexact)
[1] 2.90909091 0.04404076 0.06690025

If the level of significance is set at 0.05, the conclusion depends on which p-value is used: the asymptotic p-value (0.044) rejects H0, while the exact p-value (0.067) does not.
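The hand calculations in this example can be cross-checked with the base R functions mcnemar.test and binom.test. The following is a minimal sketch, not the authors' code; the object names (tab, mc, p_exact, and so on) are chosen here for illustration only. It also computes the Wald z-statistic based on the standard error (2.32) for comparison.

```r
# Counts from Example 2.7.4: rows = Hodgkin's patient, columns = sibling;
# category "0" = tonsillectomy, "1" = no tonsillectomy.
tab <- matrix(c(26, 15,
                 7, 37), nrow = 2, byrow = TRUE,
              dimnames = list(patient = c("0", "1"), sibling = c("0", "1")))
n01 <- tab["0", "1"]
n10 <- tab["1", "0"]
n   <- sum(tab)

# Scores (McNemar) statistic without continuity correction.
# mcnemar.test reports a two-sided p-value, so halve it for HA: p01 > p10
# (appropriate here because n01 > n10).
mc <- mcnemar.test(tab, correct = FALSE)
p_scores_one_sided <- mc$p.value / 2

# Exact conditional test: under H0, N01 ~ binomial(n01 + n10, 1/2).
p_exact <- binom.test(n01, n01 + n10, p = 0.5,
                      alternative = "greater")$p.value

# Wald z-statistic using the standard error in (2.32).
d_hat  <- (n01 - n10) / n
se     <- sqrt(((n01 + n10) / n - d_hat^2) / n)
z_wald <- d_hat / se
p_wald <- pnorm(z_wald, lower.tail = FALSE)

c(statistic = unname(mc$statistic), scores = p_scores_one_sided,
  exact = p_exact, wald = p_wald)
```

Setting correct = FALSE matters: by default mcnemar.test applies a continuity correction, which would not reproduce the statistic in (2.33).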

Remark 2.7.1. In practice, the p-values for the χ²-tests discussed in this section are often the asymptotic p-values based on the χ²-distribution. For McNemar’s test we have the option of an exact p-value based on a binomial distribution. There are other situations where an exact p-value is an option.

One such case concerns contingency tables where both margins are fixed. For such cases, Fisher’s exact test can be used; see, for example, Chapter 2 of Agresti (1996) for discussion. The R function for the analysis is fisher.test.

One nonparametric example of this test concerns Mood’s two-sample median test (e.g. Hettmansperger and McKean 2011: Chapter 2). In this case, Fisher’s exact test is based on a hypergeometric distribution.
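As an illustration of the remark, the sketch below runs fisher.test on a small 2 × 2 table and checks that its one-sided p-value agrees with a hypergeometric tail probability. The counts are hypothetical and chosen only for illustration; they do not come from the text, and the object names are likewise illustrative.

```r
# Hypothetical 2 x 2 table (illustrative counts only, not from the text).
tab <- matrix(c(8, 2,
                1, 5), nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))

# Two-sided Fisher's exact test (the default alternative).
ft <- fisher.test(tab)
ft$p.value

# One-sided version: large counts in cell (1, 1) favor the alternative.
p_fisher_greater <- fisher.test(tab, alternative = "greater")$p.value

# The same p-value from the hypergeometric distribution: with both margins
# fixed, the (1, 1) count is hypergeometric under the null hypothesis.
x11 <- tab[1, 1]
p_hyper <- phyper(x11 - 1,
                  m = sum(tab[, 1]),   # first-column total
                  n = sum(tab[, 2]),   # second-column total
                  k = sum(tab[1, ]),   # first-row total
                  lower.tail = FALSE)  # P(X >= x11)

c(fisher = p_fisher_greater, hypergeometric = p_hyper)
```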

2.8 Exercises

2.8.1. Verify, via simulation, the level of the wilcox.test when sampling from a standard normal distribution. Use n = 30 and levels of α = 0.1, 0.05, 0.01. Based on the resulting estimate of α, the empirical level, obtain a 95% confidence interval for α.

2.8.2. Redo Exercise 2.8.1 for a t-distribution using 1, 2, 3, 5, 10 degrees of freedom.

2.8.3. Redo Example 2.4.1 without a for loop and using the apply function.

2.8.4. Redo Example 2.3.2 without a for loop and using the apply function.

2.8.5. Suppose in a poll of 500 registered voters, 269 responded that they would vote for candidate P. Obtain a 90% percentile bootstrap confidence interval for the true proportion of registered voters who plan to vote for P.

2.8.6. For Example 2.3.1 obtain a 90% two-sided confidence interval for the treatment effect.

2.8.7. Write an R function which computes the sign analysis. For example, the following commands compute the statistic S⁺, assuming that the sample is in the vector x.

xt <- x[x != 0]; nt <- length(xt); ind <- rep(0, nt)
ind[xt > 0] <- 1; splus <- sum(ind)

2.8.8. Calculate the sign test for the nursery school example, Example 2.3.1. Show that the p-value for the one-sided sign test is 0.1445.

2.8.9. The data for the nursery school study were drawn from page 79 of Siegel (1956). In the data table, there is an obvious typographical error. In the 8th set of twins, the score for the twin that stayed at home is typed as 82 when it should be 62. Rerun the signed-rank Wilcoxon and t-analyses using the typographical error value of 82.

2.8.10. The contaminated normal distribution is frequently used in simulation studies. A standardized variable, X, having this distribution can be written as

$$X = (1 - I_\epsilon) Z + c I_\epsilon Z,$$

where 0 ≤ ε < 1, I_ε has a binomial distribution with n = 1 and probability of success ε, Z has a standard normal distribution, c > 1, and I_ε and Z are independent random variables. When sampling from the distribution of X, (1 − ε)100% of the time the observations are drawn from a N(0, 1) distribution, but ε100% of the time the observations are drawn from a N(0, c²) distribution. These latter observations are often outliers. The distribution of X is a mixture distribution; see, for example, Section 3.4.1 of Hogg et al. (2013). We say that X has a CN(c, ε) distribution.

1. Using the R functions rbinom and rnorm, write an R function which obtains a random sample of size n from a contaminated normal distribution CN(c, ε).

2. Obtain samples of size 100 from a N(0, 1) distribution and a CN(16, 0.25) distribution. Form histograms and comparison boxplots of the samples. Discuss the results.

2.8.11. Perform the simulation study of Example 2.3.2 when the population has a CN (16, 0.25) distribution. For the alternatives, select values of θ so the spread in empirical powers of the signed-rank Wilcoxon test ranges from approximately 0.05 to 0.90.

2.8.12. The ratio of the expected squared lengths of confidence intervals is a measure of efficiency between two estimators. Based on a simulation of size 10,000, estimate this ratio between the Hodges–Lehmann estimator and the sample mean for n = 30 when the population has a standard normal distribution. Use 95% confidence intervals. Repeat the study when the population has a t-distribution with 2 degrees of freedom.

2.8.13. Suppose the cure rate for the standard treatment of a disease is 0.60. A new drug has been developed for the disease and it is thought that the cure rate for patients using it will exceed 0.60. In a small clinical trial, 48 patients having the disease were treated with the new drug and 34 were cured.

(a) Let p be the probability that a patient having the disease is cured by the new drug. Write the hypotheses of interest in terms of p.

(b) Determine the p-value for the clinical study. What is the decision for a nominal level of 0.05?

2.8.14. Let p be the probability of success. Suppose it is of interest to test H0: p = 0.30 versus HA: p < 0.30.

Let S be the number of successes out of 75 trials. Suppose we reject H0, if S ≤ 16.

(a) Determine the significance level of the test.

(b) Determine the power of the test if the true p is 0.25.

(c) Evaluate the power function of the test over the probabilities of success in the set {0.02, 0.03, . . . , 0.35}. Then obtain a plot of the power curve.

2.8.15. For the situation of Exercise 2.8.13, a larger clinical study was run. In this study, patients were randomly assigned to either the standard drug or the new drug. Let p1 and p2 denote the cure rates for patients under the new drug and the standard drug, respectively. The hypotheses of interest are:

H0: p1 = p2 versus HA: p1 > p2.

The results of the study are:

Treatment        No. of Patients   No. Cured
New Drug               200            135
Standard Drug          210            130

(a) Determine the p-value of the scores test (2.21). Conclude at the 5% level of significance.

(b) Obtain the 95% confidence interval for p1 − p2.

2.8.16. Simulate the power of the Wald and scores type two-sample proportions tests for the hypotheses

H0: p1 = p2 versus HA: p1 > p2

for the following situation. Assume that population 1 is Bernoulli with p1 = 0.6; population 2 is Bernoulli with p2 = 0.5; the level is α = 0.05; and n1 = n2 = 50. Recall that the call rbinom(m,n,p) returns m binomial variates with distribution bin(n, p).

2.8.17. In a large city, four candidates (Smith, Jones, Martinelli, and Wagner) are running for Mayor. A poll was conducted by random dialing with the following results:

Smith   Jones   Martinelli   Wagner   Others
 442     208       460         180      205

Using a 95% confidence interval, determine if there is a significant difference between the two front runners.

2.8.18. In Example 2.7.1 we tested whether or not a dataset was drawn from a binomial distribution. For this exercise, generate a sample of size n = 500 from a truncated Poisson distribution as illustrated with the following R code:

x <- rpois(500,3)
x[x >= 8] <- 7

(a) Obtain a plot of the histogram of the sample.

(b) Obtain an estimate of the sample proportion (phat<-mean(x/7)).

2.8.19. Rasmussen (1992) presents the following data on a survey of workers in a large factory on two variables: their feelings concerning a smoking ban (Approve, Do not approve, Not sure) and Smoking status (Never smoked, Ex-smoker, Current smoker). Use the χ²-test to test the independence of these two variables. Using a post-test analysis, determine what categories contributed heavily to the dependence.

                        Approval of the smoking ban
Smoking status     Approve   Do not approve   Not sure
Never smoked         237           3             10
Ex-smoker            106           4              7
Current smoker        24          32             11

2.8.20. The following data are drawn from Agresti (1996). It concerns the approval ratings of a Canadian prime minister in two surveys. In the first survey, ratings were obtained on 1600 citizens and then in a second survey, six months later, the same citizens were resurveyed. The data are tabled below.

Use McNemar’s test to see if, given a change in attitude toward the prime minister, the probability of going from approval to disapproval is higher than the probability of going from disapproval to approval. Also determine a 95% confidence interval for the difference of these two probabilities.

                    Second survey
First survey     Approve   Disapprove
Approve            794        150
Disapprove          86        570

2.8.21. Even though the χ²-tests of homogeneity and independence are the same, they are based on different sampling schemes. The scheme for the test of independence is one sample of bivariate data, while the scheme for the test of homogeneity consists of one sample from each population. Let C be a contingency table with r rows and c columns. Assume for the test of homogeneity that the rows contain the samples from the r populations. Determine the (large sample) confidence intervals for each of the following parameters under both schemes, where pij is the probability of cell (i, j) occurring. Write R code to obtain these confidence intervals assuming the input is a contingency table.

1. p11.

2. p11 − p12.

2.8.22. Mendel’s early work on heredity in peas is well known. Briefly, he conducted experiments in which the peas could be either round or wrinkled and either yellow or green. So there are four possible combinations: RY, RG, WY, WG. If his theory were correct, the peas would be observed in a 9:3:3:1 ratio. Suppose the outcome of the experiment yielded the following observed data:

 RY    RG    WY    WG
315   108   101    32

Calculate a p-value and comment on the results.

2.8.23. Suppose there are two ways of making widgets: process A and process B. Assume there is a reliable way in which to measure the overall quality of widgets made from either process such that higher value can be measured with some accuracy.

Suppose that a plant has 25 operators and each operator then makes a widget of each type in random order. The results are such that process A has more value than process B for 20 operators, B has more value than A for 3, and the measurements were not different for 2 operators. These data present pretty convincing evidence in favor of Process A. How likely is such a result due to chance if the processes were actually equal in terms of quality?

2.8.24. Conduct a Monte Carlo simulation to approximate the power of the test discussed in Example 2.4.3 when the true θ = 1.5.

2.8.25. Let 0 < α < 1. Suppose I1 and I2 are respective confidence intervals for two parameters θ1 and θ2, both with confidence coefficient 1 − (α/2); that is,

$$P_{\theta_i}[\theta_i \in I_i] = 1 - \frac{\alpha}{2}, \quad i = 1, 2.$$

Show that the simultaneous confidence for both intervals is at least 1 − α, i.e.,

$$P_{\theta_1, \theta_2}[\{\theta_1 \in I_1\} \cap \{\theta_2 \in I_2\}] \ge 1 - \alpha.$$

Hint: Use the method of complements and Boole’s inequality, P[A ∪ B] ≤ P(A) + P(B). Extend the argument to m intervals, each with confidence coefficient 1 − (α/m), to obtain a set of m simultaneous Bonferroni confidence intervals.
