167
Type II Error Problems in the Use
of Moderated Multiple Regression
for the Detection of Moderating Effects
of Dichotomous Variables
Eugene
F. Stone-RomeroUniversity at Albany, State University of New York
George
M.Alliger
University at Albany, State University of New York Herman
Aguinis
University of Colorado at Denver
Monte Carlo simulation
procedures
were used to assessthe power of
moderatedmultiple
regression( MMR )
to detect the effects of adichotomous moderator variable under conditions of:
( 1 )
between-group
differences
inwithin-group relationships
between two variables(
i.e., |ρ XY (1) -ρ XY (2) |=
.20,.40, .60 ) ; ( 2 ) different
combinedsample
sizes
for
the two groups( N 1
+ N2= NT= 30, 60, 90, 180,300);
and(
3
) differing proportions of
cases( P -i )
in the two groups( i.e.,
P1 = .10, .30,.50).
Results showed that, consistent with our a prioripredictions, the power of MMR increased as: ( 1 ) total sample size ( NT )
increased;( 2 )
thedifference
betweenwithin-group
correlationcoefficients
increased; and( 3 )
thedifference
between theproportion of
cases in each group decreased. Moreover, the simulation showed that these three variables had interactiveeffects
on power. The majorimplication of our findings
is that in cases where testsof moderating effects
are conducted with MMR and the proportionof
cases in eachgroup
differs greatly, inferences of
nomoderating effect
may beerroneous: Such
inferences
may be the resultof
low statistical power rather than the absenceof
amoderating effect.
Researchers in industrial and
organizational psychology, organizational behavior,
human resources management, and a host of otherdisciplines
areoften interested in
testing
for the existence ofmoderating effects, i.e.,
interactive effects of two variables(cf. Stone, 1988; Zedeck,1971).
In recent years,attempts
to detect such effects have relied
increasingly
on moderatedmultiple regression (MMR;
Cohen &Cohen, 1983; Stone, 1988; Zedeck, 1971).
MMR has beenDirect all correspondence to: Eugene F. Stone-Romero, Department of Psychology, University at Albany,
Stae University of New York, Albany, NY 12222.
Copyright @ 1994 by JAI Press Inc. 0149-2063
used to detect
moderating
effects for moderator variables that are measuredon both continuous
(e.g., age)
and dichotomous(e.g., gender)
scales. Inpersonnel psychology,
forinstance,
researchers have used MMR to determineif pre-employment
selection testsdifferentially predict job performance
for twogroups of
applicants (e.g., male, female; white, non-white). Unfortunately,
testsof
moderating
effects for dichotomous moderator variables may notalways provide
valid information about the existence of such effects. One reason for the failure to find effects in studies thatattempt
to show differences inpredictor-
criterion
relationships (e.g.,
correlationcoefficients, regression slopes)
for twogroups
(e.g., male, female)
is that the relative sizes of thesamples
of the twogroups
(N1, N2),
and thus theproportions (pi)
of individuals in the groups(pi, p2)
may differmarkedly
from one another. As a consequence of differences between pi and p2 there will be limits on themagnitudes
ofrelationships (e.g.,
zero-order correlation
coefficients)
between the moderator(Z)
and(1) X,
theother
predictor
variable(e.g.,
testscores);
and(2) Y,
thecontinuously
measuredcriterion variable
(e.g., job performance) (cf.
Cohen &Cohen, 1983; Nunnally, 1978).
The effect
of p i - P2
differences onrelationships
can be deduced from thefollowing
formula for thepoint-biserial
correlation coefficient(cf.
Cohen &Cohen, 1983; Nunnally, 1978):
where:
M1
= mean score ofGroup
I on the continuous variable M2 = mean score ofGroup
2 on the continuous variablepi =
proportion
of cases inGroup
1p2 =
proportion
of cases inGroup
2a = standard deviation of the continuous variable
As
Equation
1illustrates, assuming
a fixed difference between the meansof
Group
1 andGroup
2 on a continuous variable and a constant standard deviation for thisvariable,
the greater thediscrepancy
between pi and p2, the lower will be the valueof rpb (Cohen
&Cohen, 1983,
pp.66-67; Nunnally, 1978,
pp.
145-146).
In thelimiting
case, when either pi or p2equals 0,
rpb willequal
zero.
Moreover,
to the extent that pi - p2 differences affect estimates of themagnitudes
of zero-order correlations between(a)
a moderator variable(Z)
andanother continuous
predictor (X),
and(b)
a moderator variable and a criterionvariable
(1),
the results of MMRanalyses involving
the dichotomouspredictor
variable will be affected. The reason for this is that there are upper limits on
the
magnitudes
ofsquared semi-partial
correlationsinvolving
dichotomous moderator variables(cf.
Cohen &Cohen, 1983,
pp.89-90).
Theimportant implication
of theforegoing
is that such limits will influence theability
of MMRto detect
moderating
effects: Thegreater
the pi - p2difference,
the lower willbe the power of an MMR-based test of a
moderating
effect for Z. Stateddifferently,
thegreater
the pi - p2discrepancy,
thegreater
will be the risk ofthe researcher
committing
aType
II statistical error in the search formoderating
effects
(i.e., falsely concluding
that there is nomoderating effect).
A recent
study by Hattrup
and Schmitt(1990) provides
an illustration of thisproblem.
In thestudy,
MMR was used to test for differentialpredictor-
criterion
relationships
in groups formed on the basis ofsubject gender (male
vs.
female)
and race(white
vs.non-white).
Their MMRanalyses
used datafrom
samples
that differedsubstantially
from one another in terms of the relative sizes of eachsubgroup, leading
to substantial pi - p2 differences.More
specifically,
in tests for race-basedmoderating
effects therespective proportions
of whites and nonwhites were .831 and.169,
and in tests forgender-based moderating
effects therespective proportions
of males and females were .899 and .101. Their MMRanalyses
failed to showmoderating
effects for either race or
gender, leading
them to conclude that neither race norgender
were moderators of thepredictor-criterion relationships
considered
by
theirstudy.
Another recent
study
thatprovides
an illustration of the low statistical powerproblem
in tests of the effects of dichotomous moderator variables isCortina, Doherty, Schmitt, Kaufman,
and Smith(1992).
In thestudy
MMRwas used to test for the
moderating
effects of race onrelationships
betweenpersonality predictors
and several criterion variables(e.g., performance ratings).
Minority
group memberscomprised only 31 %
of thesample.
Results of 12separate
MMRanalyses
showed that there was nomoderating
effect for race,leading
the researchers to conclude that the tests were fair to bothmajority
andminority
groups.As a consequence of statistical power
problems,
we believe that the conclusions ofHattrup
and Schmitt(1990),
Cortina et al.(1992),
and a hostof others may not be valid: More
specifically,
we believe that the inference ofno
moderating
effect that stems from the use of MMR with data sets for which there is alarge
difference between pi and p2 levels may be a function ofType
II error. At present,
however,
there is no direct evidence on the extent to which differencesin pi andp2 levels
influence the power of MMR to detectmoderating
effects.
Therefore,
we conducted a Monte Carlo simulation to assess the power of MMR to detect the effects of a dichotomous moderator variable under conditions ofdiffering
pi and p2 levels. Threeindependent
variables weremanipulated
in our simulation:(1)
the overall size of thesample (NT
=~Vi + N2)
upon which the MMRanalysis
wasbased; (2)
theproportions
of individuals in two groups for which differentialprediction
was tested(pl, p2);
and(3)
thezero-order correlations between the continuous
predictor (X)
and the criterion(Y)
in thepopulation
from which cases weresampled.
Method Simulation
Design
and ProceduresMonte Carlo simulation
procedures
were used to determine the power of MMR to detect themoderating
effects of a dichotomous moderatorvariable,
Z,
on therelationship
between two othervariables,
X and Y. AQuickBASIC
4.5 program was used for the simulation. Prior to
using
the program to conduct thesimulation,
we conducted a number of checks to insure its accuracy.The simulation
design
was a 4(p~y<t) = population
correlation between variables X and Y inGroup 1)
X 4(PXY(2) = population
correlation between variables X and Y inGroup 2)
X 5(NT
= totalsample size)
X 3(Ni /
NT =proportion
of cases inGroup 1): (1)
TheGroup
1(Gt)
correlation between the continuouspredictor (X)
and the criterion(Y)
in thepopulation (pxYc,~)
wasset at values of
.2, .4, .6,
and.8; (2)
TheGroup
2(G2)
correlation between the continuouspredictor (X)
and the criterion( Y)
in thepopulation (pXY(2)
wasset at levels of
.2, .4, .6,
and.8; (3)
The totalsample
size(NT)
was set at levelsof
30, 60, 90, 180,
and300;
and(4)
Theproportions
of individuals in the two groups(pi
andp2)
for which differentialprediction
was tested were variedby setting p 1 at .1, . .3,
and .5.For each of the 240
resulting
cells of thedesign 2,000 samples
were drawnand for each such
sample
MMR was used to test for themoderating
effect ofZ on the
relationship
between X and Y. The existence of such an effect wasassessed
by testing
the statisticalsignificance
of theregression weight
forQ3
inthe
following
model:where: X. Z is a
product
term that carries information about the interaction between X and Z. These MMRanalyses
and the associated statistical testsyielded
information about the power of MMR to detect true(population) moderating
effects forsamples
that differed in terms of the threemanipulated
variables.
Results
Tables 1-5 show results of the simulation. More
specifically,
the tables show the percentagerejection
rates of the nullhypothesis
thatQ3
= 0.Rejection
ofthis
hypothesis signals
the existence of amoderating
effect of Z on therelationship
between X and Y.The results
presented
in Tables 1-5 showthat,
consistent with our apriori predictions,
the power of MMR to detect true,between-group
differences inwithin-group
correlation coefficients increased as theproportion
of cases in eachof the groups
(pi
andp2) approached
.5. In many cases powerdropped substantially
as the pi - p2 difference increased. Forexample,
for NT =90,
and a .60 difference between the
population
correlationcoefficients,
power reached .925 for pi =.50,
butdropped
to .341 for pi = .10. Themagnitude
of these substantial
changes
in power is illustrated inFigure
1 which showsrejection
rates of the nullhypotheses
ofQ3
= 0 for NT = 90 anddiffering
levelsof the other two
manipulated
variables.The results in Tables 1-5 also show that our other a
priori expectations
were confirmed
by
thefindings
of the simulation. Morespecifically,
the results showed that power increased as a function of both:(1)
themagnitude
of theNote: pxyii) a = zero-order correlation between Y and X for Group I and PXY(2) = zero-order correlation between Y and X for Group 2.
Note: pxy<i> = zero-order correlation between Y and X for Group I and PXY(2) = zero-order correlation between Y and X for Group 2.
Note: pxy<i> = zero-order correlation between Y and X for Group I and PXY(2) = zero-order correlation between Y and X for Group 2.
Note: pxy<i> = zero-order correlation between Y and X for Group I and PXY(2) = zero-order correlation between Y and X for Group 2.
Note: PXY(1) = zero-order correlation between Y and X for Group I and PXY(2) = zero-order correlation between Y and X for Group 2.
difference between the
population
correlations(I
PXY(L) -PXY(2) );
and(2)
theoverall size of the
sample (NT).
These effects can beclearly
seen inFigure
1.The same
figure
also illustrates how themanipulated
variables interacted withone another in
influencing
power.In order to better determine the nature of the main and interactive effects of the
manipulated variables,
we conducted a MMRanalysis
in which weregressed
statistical power estimates derived from our simulation(i.e.,
theproportion
of times that the nullhypothesis
ofQ3
= 0 wascorrectly rejected)
on variables
representing
the main and interactive effects of themanipulated
parameters (i.e.,
differences inproportions,
overallsample size,
and the absolute difference between the Fisher’s Zequivalents
of thepopulation
correlationcoefficients).
Because thedependent
variable was statistical power, data fromonly
the 180 cells of thedesign
for which themanipulated moderating
effectwas not
equal
to zero(i.e., PXY(L) 54 /~XY(2))
were used in this MMRanalysis.
At step one of the
analysis,
the main effects of the threepredictors (manipulated variables)
were entered into theequation.
Atstep
two, the three two-way interaction terms were entered.Finally,
atstep three,
thethree-way
interactionterm was entered. Because
linearity
wasassumed,
the actual values of the variables rather thandummy
codes were used in the MMRanalysis.
Table 6 shows the results of this
analysis.
As can be seen in thetable,
boththe two-and
three-way
interaction terms werestatistically significant.
The meanrejection
rates(i.e.,
powerlevels)
showed thefollowing pattern: (1)
In a numberFigure
1. Effects of differences in(a)
correlation coefficients for two groups and(b) proportions
of cases in the groups on the power of moderatedmultiple regression
todetect
moderating
effects.of conditions power was
relatively
low and increasedonly slightly
as p,(i.e.,
the
proportion
of cases inGroup
1 orGroup 2)
increased from .10 to .50 .Examples
of this are the condition for which NT = 30and pxr~~~
-PXY(2) I
=
.40,
and the condition for which N = 90and pxrcl~
-PXY(2) _ .20; (2)
In several other conditions power was
relatively
low and increasedmoderately
as p, rose from .10 to
.30,
but showed a lesserdegree
of increase as p; shifted from .30 to .50. Twoexamples
of this are the condition for which NT = 30and pxY~l>
-PXY(2) _ .60,
and the condition for which NT = 180and pxY~l~
-PXY(2) I =
.20.(c)
Inyet
other cases power wasmoderately high for p, = . .10,
increased
markedly
as p, rose to.3,
and reached veryhigh
levels at p; = .50.Two conditions show this pattern
(i.e.,
that for which NT = 90and pxYu> -
PXY(2) I = .60,
and that for which N7= 180and pxY~~>
- PXY(2)I
=.40;
and(4)
In still other conditions power was veryhigh
at p, _ . l and increasedonly
Table 6.
Regression
ofEmpirically
Derived Power Levels on VariablesManipulated
in the SimulationNotes: * p < .0 1
**p<.001 I
b = unstandardized regression coefficient B = standardized regression coefficient NT = total sample size
pi = proportion of cases in Group I
Alp = absolute difference between Fisher’s Z equivalents of pxr~,> and pxr(2) bo = .007088 for the raw score regression equation
slightly
or not at all as p, rose to levels of .3 and .5. A condition that serves as anexample
of this is that for which NT = 300and pxYm - PXY(2) ~ ==
.60.Overall,
the results in Tables 1 to 5 showthat,
ingeneral, high
levels of power(i.e.,
power >.90)
areonly
assured when three conditions aresimultaneously satisfied, i.e., sample
size islarge (i.e.,
>180),
the difference between correlation coefficients is alsolarge (i.e., ~ pxrm - PXY(2) > .60),
and the smallest of the p, levels is .30 or greater.It deserves
noting
that a pl - p2 difference leads to arelatively
lowdrop
in power when overall
sample
size isrelatively
low(e.g.,
NT =30). However,
the decrease in power is substantial when overall
sample
size isrelatively high (e.g.,
NT = 180 or300). Thus,
while arelatively large
NT may do much to enhance the odds ofrejecting
the nullhypothesis
that thesquared multiple
correlation coefficient in the
population ( ’1’2 )
=0,
the power to detect interaction effects can sufferdramatically
if there aremarked pl - p2
differences.Discussion
Overall,
ourfindings
show that the power of MMR to detectmoderating
effects when the moderator variable is dichotomous is very much a function of the relative
sample
sizes of the groups for which thestrength
ofpredictor-
criterion
relationships
isbeing compared. Specifically,
as the difference between theproportions
pi and p2increases,
statistical power decreases.These
findings suggest
that the failure of researchers to findmoderating
effects
(e.g.,
differentialprediction)
with MMR may be attributable to low statistical power. Oneexample
of this is the earlier citedstudy by Hattrup
andSchmitt
(1990).
Theirregression analyses
were based upon an overallsample
size of about 300. In the case of the test for race-based
moderating effects, approximately
10percent
of thesubjects
were members of theminority
group.Assuming
the existence of a true difference between thevalidity
coefficients for theminority
and whitesubgroups
aslarge
as .20(e.g.,
pxY = .20 in one group and .40 in theother)
for thesubjects
studiedby Hattrup
andSchmitt,
the power of MMR to detect this effect wouldonly
be. .199(cf.
Table5). Assuming
a lowereffect
size,
power would be even lower. Low statistical power would also seem to be aproblem
in thestudy by
Cortina et al.(1992)
in which numerous MMRanalyses
failed to show evidence of race-basedmoderating
effects. With asample
size of
approximately
300 andassuming
a true difference invalidity
coefficients of .20(e.g.,
pxy = .20 in one group and .40 in theother)
the power to detect race-basedmoderating
effects wouldonly
be about .38.Our MMR
analysis
in which power was thedependent
variable showedthat not
only
is power affectedby
a pl - p2difference,
but that the effect of the pi - p2 difference on power varies across levels ofsample
size and themagnitude
of the absolute difference between the correlation coefficients(i.e., I
pxry> -PXY(2) ).
Forexample,
if one of the p, levels is.10,
power will be below .10 when NT = 30and
pAry<i) &dquo;PXY(2) _ .20,
and will rise toonly
.48 when NT = 90
and pxr~ n - PXY(2)
1 &dquo; .60. Theimportant implication
ofthis is that when the
proportions
of cases in two groups differmarkedly
fromone
another,
there willgenerally
be very low power to detectmoderating
effectsfor the levels of
sample
sizes and correlation differences that aretypically
foundin
organizational
research. In view ofthis,
under conditions where statistical power islow,
researchers should bequite
cautious aboutconcluding
that thereis no
moderating
effect.Tables 1-5 show that the
rejection
rates varied as a function of whetherthe smallest group had the
largest
correlation coefficient. Forexample,
Table3 shows that when pxY~,> and PXY(2) were .2 and
.6, respectively,
therejection
rate
for pi
= .10 was .263.However,
for the sameproportion
value(i.e.,
pi-
.10),
therejection
rate for PXY(l) = .6 and PXY(2) _ .2 was .170. This difference in power for constant levels of NTand pxY~~>
-PXY(2) is
consistent with thefindings
of a simulationby Alexander, DeShon,
and Govern(1993).
Theirstudy
assessed the relative power of:
(1)
the F-test used to test formoderating
effectsin
MMR;
and(2)
theX2 test
of theequality
of correlation coefficients in two groups. In their simulation differentsample
sizes and effect sizes(i.e.,
p values within each of twogroups)
werepaired. The x2 test
was used as the benchmarkagainst
which the F test wascompared.
Results of their simulation showed that when the smallest group had the smallest correlation coefficient(p),
the F-testwas
overly
liberal.However,
when the smallest group had thelargest
correlationcoefficient,
the F test formoderating
effects wasoverly
conservative. This difference in power was attributable to the fact that the standard error of the estimate for theregression equation
wasgreatest
when thelargest sample
sizewas
paired
with the smallest correlation coefficient.It deserves
noting
that our simulation showed thatrejection
rates of thenull
hypothesis
ofa3
= 0 decreased as pidecreased, irrespective
of whetherthepi
value wasaccompanied by
thelargest
or smallest correlation coefficient.The
important implication
of this is that theability
to findmoderating
effectsusing
MMR will suffer whensample
sizes differ across two groups,resulting
in different p, levels.
On the basis of the
findings
of Alexander et al.(1993)
it seems clearthat,
in some
instances, the x2 test
should be used intesting
formoderating
effectssince it will be more
powerful
than the F test used in MMR. Morespecifically,
the
x2 test
ispreferable
to the F-testexcept
when the group with thelargest sample
size has the smallest error variance.Note, however,
that this conclusiononly applies
in instances when the moderator variable is a naturaldichotomy.
It does not
apply
when theX2 test
is used to test for themoderating
effect ofa variable that was
initially
continuous but was later converted(i.e.,
trans-formed)
to a dichotomous variable. Stateddifferently,
power will be greaterwhen MMR is used to test for the
moderating
effects of a moderator variable that is measured on a continuous scale than it will be for thex2 test
of thedifference between correlation coefficients for a moderator variable that is
artificially
dichotomous(cf.
Stone-Romero &Anderson,
inpress).
If MMR is used to test for
moderating effects,
the results in Table 6 shouldprovide
researchers with a convenient method forestimating
statistical power.Researchers need
only
use theb-weights
in Table 6 inconjunction
with theirestimates of total
sample size,
themagnitudes
ofwithin-group
correlationcoefficients,
and theproportion
of cases in one of the two groups. Onepossible
limitation of this
procedure, however,
is that our results are based on a simulation that had limited range on all of thepredictors.
Forexample,
oursimulation
limited pxru> - PXY(2) I to.60.
Giventhis,
estimates of power derived from the use of ourequation
should be conservative. This inference is based upon the results of a simulationby Aguinis (1993)
that showed that the powerto detect
moderating
effects is attenuated whenpredictor
variables have restricted range.In summary, our results suggest that researchers should be cautious in their
interpretation
of nullfindings
whenusing
MMR to test formoderating
effectsin cases where
they
aredealing
with:(1)
a dichotomous moderatorvariable;
and
(2) differing proportions
of cases in two groups. In suchinstances,
lowstatistical power
(i.e., Type
IIerror)
may very well serve as a viableexplanation
for the failure of MMR to detect
moderating
effects.Acknowledgment:
Portions of this article werepresented
at asymposium
atthe 1993 annual
meeting
of theSociety
for Industrial andOrganizational Psychology,
San Francisco. We thankRalph
Alexander andPhilip
Bobko forhelpful
comments on aprevious
version of this article.References
Aguinis, H. (1993). The detection of moderator variables using moderated multiple regression: The impact of range restriction, unreliability, sample size, and the proportion of subjects in subgroups.
Unpublished doctoral dissertation. University at Albany, State University of New York.
Alexander, R. A., DeShon, R. P., & Govern, D. M. (1993). Error variance heterogeneity and testing for regression slope differences in differential prediction research. Paper presented at the meeting of the Society for Industrial and Organizational Psychology. San Francisco, CA.
Cohen, J. & Cohen, P. (1983). Applied multiple regression/correlation for the behavioral sciences. Hillsdale,
NJ: Erlbaum.
Cortina, J. M., Doherty, M. L., Schmitt, N., Kaufman, G., & Smith, R. G. (1992). The "big five" personality
factors in the IPI and MMPI: Predictors of police performance. Personnel Psychology, 45: 119-140.
Hattrup, K. & Schmitt, N. (1990). Prediction of trades apprentices’ performance on job sample criteria.
Personnel Psychology, 43: 453-466.
Nunnally, J. C. (1978). Psychometric theory, 2nd ed. New York: McGraw-Hill.
Stone, E. F. (1988). Moderator variables in research: A review and analysis of conceptual and methodological
issues. Pp. 191-229 in G. R. Ferris & K. M. Rowland (Eds.), Research in personnel and human
resources management, Vol. 6. Greenwich, CT: JAI.
Stone-Romero, E. F., & Anderson, L. (in press). Relative statistical power of moderated multiple regression
and the comparison of subgroup correlation coefficients for the detection of moderating effects.
Journal of Applied Psychology.
Zedeck, S. (1971). Problems with the use of "moderator" variables. Psychological Bulletin, 76: 295-310.