Chapter 25
Paired Data
Data are paired when the observations are collected in
pairs or the observations in one group are naturally related to observations in the other group.
Paired data arise in a number of ways. Perhaps the most
common is to compare subjects with themselves before and after a treatment.
When pairs arise from an experiment, the pairing is a
type of blocking.
When they arise from an observational study, it is a
Paired Data (cont.)
If you know the data are paired, you can (and must!) take
advantage of it.
To decide if the data are paired, consider how they
were collected and what they mean (check the W’s).
There is no test to determine whether the data are
paired.
Once we know the data are paired, we can examine the
pairwise differences.
Because it is the differences we care about, we treat
Paired Data (cont.)
Now that we have only one set of data to
consider, we can return to the simple one-sample
t-test.
Mechanically, a
paired
t
-test
is just a one-sample
t-test for the mean of the pairwise differences.
Assumptions and Conditions
Paired Data Assumption:
The data must be paired.
Independence Assumption:
The differences must
be independent of each other. Check the:
Randomization Condition
Normal Population Assumption:
We need to
assume that the population of
differences
follows a
Normal model.
Nearly Normal Condition:
Check this with a
The Paired
t
-Test
When the conditions are met, we are ready to test
whether the paired differences differ significantly
from zero.
We test the hypothesis H
0
:
d=
0, where the d’s
The Paired
t
-Test (cont.)
We use the statistic
where n is the number of pairs.
is the ordinary standard error for the mean
applied to the differences.
When the conditions are met and the null hypothesis is true, this
statistic follows a Student’s t-model on n – 1 degrees of freedom, so we can use that model to obtain a P-value.
01 n
d
t
SE d
sdSE d
n
Confidence Intervals for Matched Pairs
When the conditions are met, we are ready to find the confidence
interval for the mean of the paired differences.
The confidence interval is
where the standard error of the mean difference is
The critical value t* depends on the particular confidence level, C, that you specify and on the degrees of freedom, n – 1, which is based on the number of pairs, n.
s
dSE d
n
1
n
Example 1: Pulse Rates Once Again
One indicator of physical fitness is resting pulse
rate. Ten men volunteered to test an exercise
device advertised on television by using it three
times a week for 20 minutes. Their resting pulse
rates were measured before the test began, and
then again after six weeks. Results are shown in
the table. Is there evidence that this kind of
Example 1: Table of Data
Pulse Rates(bpm)
Subject Before After
Allen 73 73
Brandon 83 79
Carlos 85 81
David 87 86
Edwin 91 87
Franco 99 91
Graeme 87 84
Hans 85 83
Example 1: continued
The before and after measurements are not
independent – we CAN’T use a two sample t-test.
When this happens we will look at the paired
Example 1: continued
Hypothesis:
The null hypothesis is that there will be no difference in the pulse rates before and after using the exercise device. The alternative hypothesis is that the exercise device will cause a decrease the resting pulse rate. The difference will be final pulse rate – beginning pulse rate.
Example 1: continued
Model:
1.
The men are not selected randomly, but we assume
that they are a representative sample of adult men.
2.
Independence: The pulse rate of one man does not
influence the pulse rate of another.
3.
10 is certainly less than 10% of all men.
4.
Since n<30, we will do a boxplot. Since the graph of
the paired differences is unimodal and
Example 1: continued
The P value of .0034 is much lower than our level of
significance of .05. Therefore, using the matched
pairs t-test, we reject the null hypothesis. There
is strong evidence that this form of exercise can
reduce resting pulse rates.
90% confidence interval: (-4.266,-1.334)
I am 90% confident that mean decrease in the
Blocking
Consider estimating the
mean difference in age between husbands and wives.
The following display is
worthless. It does no good to compare all the wives as a group with all the
Blocking (cont.)
In this case, we have paired data—each husband is
paired with his respective wife. The display we are interested in is the difference in ages:
Blocking (cont.)
Pairing removes the extra variation that we saw in
the side-by-side boxplots and allows us to
concentrate on the variation associated with the
difference in age for each pair.
*The Sign Test Again?
Because we have paired data, we’ve been using a simple
t-test for the paired differences.
This suggests that if we want a distribution-free method, a
sign test on the paired differences testing whether the
median of the differences is 0 is appropriate.
The advantage of the sign test for matched pairs is that
we don’t require the Nearly Normal Condition for the paired differences.
However, if the assumptions of a paired t-test are met,
What Can Go Wrong?
Don’t use a two-sample
t-test for paired data.
Don’t use a paired-t method when the samples
aren’t paired.
Don’t forget outliers—the outliers we care about
now are in the differences.
Don’t look for the difference in side-by-side
What have we learned?
Pairing can be a very effective strategy.
Because pairing can help control variability between
individual subjects, paired methods are usually more powerful than methods that compare individual groups.
Analyzing data from matched pairs requires different
inference procedures.
Paired t-methods look at pairwise differences.
We test hypotheses and generate confidence intervals based
on these differences.
We learned to Think about the design of the study that