TESTING
THE
STABILITY OF REGRESSION PARAMETERSWHEN
SOME ADDITIONALDATA
SETSARE AVAILABLE
R. RADHAKRISHNAN DONR. ROBINSON
Department
ofManasement
andQuantitativeMethodsIllinomState University Normal,Illinois61761
(Received June 2, 1992 and in revised form September 8, 1992)
ABSTRACT. We consider the problem of testing the stability of regression parameters in
regressionlines ofdifferentpopulationswhen some additional, but unidentified, datasetsfrom those populationsare available. Thestandardtest
(To)
discardstheadditionaldataand teststhe stability,oftheregressionparametersusing only the data setsfrom identified Iopulations. We proposetwo testprocedures(T1
andT2)
utilizingalltheavailabledata,becausethe additional datamaycontain informationabout theparameters oftheregressionlines whicharetested forstability. Apower comparisonamongthetestsisalsopresented. Itis shownthat
T1
alwayshas larger power thanTo.
Incertain situationsT2
hasthelargestpower.KEY WORDS AND PHRASES: leastsquares estimates,regression parameters,power of the
test.
1992AMSSUBJECT CLASSIFICATION CODES: 62J05, 62J99
1. INTRODUCTION. Consider theregression model
Yij ai +
#i(xij
)
+ij, i=1,2 k, j= 1,2,...,ni,(1.1)
wheretheYij are observations on theresponsevariable, the xij areobservationsonthepredictor
variable, a and #i are the regression parameters, and the ij are the error terms, which are
unobserved random variables. Itisassumed that the errorsareindependent,normallydistributed random variableswithmean0and commonunknown varianceo
2.
Forthemodel,ai +#i(xij"
)
isthe regressionlineofthe variableyon the predictorvariable xfor the th group,a isthe y-interceptwhenx
x,
and#i is the slope. Suppose we have m(m
_<k)
additional data setscorrespondingtomregressionlineswhose modelisgiven by
Yij =ai
+i(xij
)+ij,
i=k+l k+m, j=1,2 hi.(1.2)
Weassumethat the error termsil in model
(1.2)
are independent,normallydistributedrandomvariables with mean 0 and common unknown varianceo
2.
It is further assumed that the mHowever,we cannotidentifythe mregression lines associated withthe additionaldata sets
(Yii,
xij, k+ k+m, 1,2 ni). Weare interestedintestingthe nullhypothesisHo: ,
a2--...
Ok;1
2=... #k againstHa:
either’i*
ai’ ori*
i’ forat least onepair(i,i’), wherei, i’, i,i’=1,2 k, utilizingall the available data. The nullhypothesis impliesthat all thek regressionlines in model (1.1)arecoincidentwhereas the alternative hypothesisis that at leasttwoof the
regressionlinesare different. The standardtest
(To)
ofH0againstHausingthe k datasets(Yij,
xij, i=1,2 k, j= 1,2 ni) is well-known in the literature and has diverse applications. A biostatistician maybe interested intestingthe equivalence of regression linesfor predictingthe
systolicblood pressure using ageas thepredictorvariable for four social groups. A testfor the
stability oftheregression parameters thatgenerated the datasets isHo"
’
2 3 4;fl #2 #3 #4. IfH0 is true, we use a single regression line based upon the four data setsforpredicting systolicbloodpressure using ageasthepredictor variable,Klienbaumand
Kupper
[1]. An economist mightbe interested in testingthe equivalence ofmultiple regression models for predictingthegrossdomesticproduct usinglabor andcapitalaspredictorvariables fordifferent timeperiods,Maddala[2].Inthispaperweconsidertwotests
(T1
andT2)
utilizingall the available data and make apower comparisonbetween thesetwo testsand thestandardtestwhich is basedsolelyon the k datasetsrelatingtothe regressionlineswhose parametersaretestedfor stability. InSection2we determineleast squaresestimatesof the regression parameterstoobtaintheteststatisticsfor the problem. The noncentrality parameter of thetests is derived in Section 3. InSection 4 we derive ourproposedtests,
T
andT2. Weillustrate andcomparethepowerof all threetestsin Section5.
2. LEASTSQUARESESTIMATES. Consider the sum
k+m n
0
i=lE
j=IE
(Yij
"i’’i(xij
.))2,
(2.1)
where
i
and/i
are theestimatesof regression parametersaand/i
(i=1,2k+m).
The least squaresestimates of the regression parametersare obtainedby differentiating0
partiallywithrespectto
i
and/i
andthensolving the resulting normal equations fori
andB’i.
Itcanbe shownthat the leastsquaresestimatesofa
and/i
aregivenbyni
ai^
j=lX:
Yij/hi
(2.2)
and
ni ni
/0’
IlYiJ
(xij
)/1
(xij
)2.
j= j=l
(2.3)
Then
R
2,
the unconditional error sumof squares,is obtainedbysubstitutingi
and/’i
givenby(2.2)
and(2.3)
into0. Itcanbe shown thatk+m n k+m
I2 2
(Yij
.-)2
/i
2Si
2(2.4)
R2
i=l j=l
]=1
ni
Si
2 ]2(Xij
)2,
1,2 k+m.j=l
(2.5)
The conditional errorsumofsquares underH0isobtainedbyminimizing
k n k+rn n
1
i=lr
j=lr"
(Yij--/(xij
.]))2
+i=k+r,
1
j=lI
(Yij-ti-/’i(xij
.))2
(2.6)withrespectto
,
’, ti, and/i
(i k+ k+ m).The second sum on the right-hand side of
(2.6)
is minimized with respect toi
andi
(i=k+1k+m)
whereand/i
aredefined, respectively, asin equations(2.2)
and(2.3).
Theleastsquaresestimates oftheregression parametersa and # aregiven by
k n
ly
(2.7)
a r, 1
ij/n=y
i=lj=
k n k
/ r, r,
Yij
(xij
)/i
r,Si
2(2.8)
i=l j=l "=1
where
k
n= Iz ni.
(2.9)
i=l
The conditional error sum ofsquaresunderH0is
k n k k+rn n k+rn
R
r, r,(Yij "y--)2./2
ISi
2 + v,(Yij
"’t)
2V,/i
2Si
2i=1 j=l i=1 i=k+lj=l
i’=k+m
(2.10)
Thesumofsquares for testingthe nullhypothesisH0is
SSH0
R12
-I
k k k
I ni]-
y--)2
+lI/’i
2Si
2-/2
Si
2i=l i=l i=l,
k k
1
ni- y--)2
+ ISi
2i’)2,
i=l i=l
where
k k
/
Si
2/i/
Si2.
i=l i=l
Itiswell-knowninthe literature that
1
2/0
2is distributedaschi-squarewithk+m
n’--n+ r ni-2(k+m) i=k+l
(2.11)
(2.12)
degrees of freedom and
SSHo/o
2 is distributed asnoncentral chi-squarewith 2(k-l) degrees offreedom. WhenH0is true,
SSH0/o
2isdistributedaschi-squarewith2(k-1)degrees offreedom.Further,
l
2andSSH0areindependent;forexampleseeKshirsagar[3].3. NONCENTRALITY PARAMETER. Herewederivetheexpectedvalueof SSH0under the
non-nullcase. Itcanbe shownthat
k k k k
r.
ni(-y-)2
r
ni(ai--)2 + r. ni(7--)2 + 2 r.ni(ai--)(7--),
i=1 i=1 i=1 i=1
(3.1)
where
k
-=
r.niai/n
i=l(3.2)
k
r
,ij/ni,
i=l
(3.3)
k
-=
Eni/n.
i=l
(3.4)
Now
and
where
Taking expectations ofboth sides of
(3.1)
we obtaink k
E(.r,
niC-y-)2)
r, ni(ai--)2 + (k-1)a2.
1=1 i=1
k k
E( x:
i
2Si
2Si
2(o2/Si
2+/i2
i=l i=l
k ko2+ 1 #i2
Si
2i=l
k k k
E(/2
p.Si
2)
Si
2(a2/
Si
2+2),
i=l i=l i=l
k
=02
+-2
Si
2,
i=l
k k
=
Si23i/
Si2.
i=1 i=l
(3.5)
(3.6)
(3.7)
(3.8)
Using equations
(3.5), (3.6), (3.7),
and(2.11)
wegetk k
E(SSH0)
2(k-1)o
2 + r, ni(ai--)2 + r,Si2(Bi--)
2Since
SSHo/o
2 is distributed as noncentral chi-square with 2(k-l) degrees offreedom, itfollows from(3.9)that thenoncentralityparameterisgiven by
k k
,x
(1/02)[
12 ni(a.-)2
+ r,Si
2(fli.-)2
].(3.10)
i=1 i=1
4. TEST PROCEDURES. The standardprocedurefor testingHoagaimtHais an
F-test
based upontheteststatisticF0
(SSHo/2(k-1))/(R0
2/(n-2k)),
(4.1)
where
k n k
I2 I2
(Yij
.)2
I2$i2Si
2(4.2)
RO2
i=lj=l i=l
See, for example, Kshirsagar [3]. The above test rejects the null hypothesis H0 if F0 >
F,,2(k.),,.
and acceptsH0otherwise,whereF,f
,f2istheupper100a percentile point ofthe
F-distribution with
fl
numerator degrees of freedom(ndf)
andf2
denominator degrees of freedom(ddf).
Wenote that the standardtestis based upon the k datasets(Yi],
xi], i=1,2 k,1,2 ni)anddiscardstheadditionaldata
(Yi],
xi], k+1 k+ m, 1,2 ni).Considerthefollowingtestprocedure
(T1).
RejctH0ifF1
(SSHo/2(k-1))/(R0/n
’)
>Wa,2(k.1),n,
(4.3)
and acceptH0otherwise. Acomparison betweenTOandT1shows thatbothhavethe same ndf
but thatthe latterhaslarger ddf thanT0. Wefurthernotethat
T1
isbaseduponall theavailable data. Under the non-null case,both test statistics havenoncentralF-distributions with the samenoncentrality parameter, asin
(3.10).
ThereforeF1
willhavelargerpowerthanF0,Graybill[4].When the mregressionlines in
(1.2)
are anunidentifiedsubset ofthe kregressionlines in the model (1.1),testing Ho againstHais equivalenttotestingI-
thek+m regression lines areidenticalagaimt
I-1:
atleasttwoof them are different.Following the procedureoutlined in Section2,itcanbe shown that thesum ofsquares for testing
I-
iswhere
k+m k+m
SSI-I
Ihi(8
.&,,)2
+ ISi ./,)2,
(4.4)
i=l i=l
k+m k+m
a
.
ni
r, ni,(4.5)
i=1 =1
k+m k+m
/’
i=
1Si2/i/i’=ll
Si
2
(4.6)
where
and
k+m k+m
,X’=
(1/o2)[
E ni(ai-a)2 + :ESi2(fli-3)2l,
(4.7)
i=l i=l
k+m k+m
a I2
niai/
Z ni,(4.8)
i=l i=l
k+m k+m
/ E:
Si2/i/i
I2Si
2(4.9)
i=l =1
We notethat
SSI-I
can beobtainedfrom SSH0by replacingk with k+m. WhenI-
is true,thesamplingdistributionof
SSHo
2ischi-squarewith2(k + m-l) degreesof freedom. Further,SSI-I
and
R
2areindependent.Weuse anF-test
(T2)
totestI-
against basedupontheteststatisticF2
(SSIf’l)/(l2/n’),
(4.10)
where 2(k + m-l). Wereject
H
ifF2 >F,,fl,
n, and acceptI
otherwise. WhenI
is true, thesamplingdistributionofF2isnoncentralFwith ndfandn’ddfandnoncentrality parametera’. Weusenoncentral F-distribution tables tocomputethepowerof thetests. The next section illustrates andcomparesthepowerof thesethreetests.
5. POWER COMPARISONSOF
THE
TESTS. WhenH0andI-
are nottruetheteststatistics(4.1),
(4.3), and(4.10)
follow noncentralF-distributions. The non-null distributions ofthe test statisticsF0
and F1 have the same noncentrality parameter,a,
defined in(3.10).
The noncentrality parameter for the non-nulldistributionofF2
isx’ asdefined in(4.7).
Thendffor bothTO andT1
isf
2(k-l). For T2thendf is 2(k+ m-l). To hasddff2
n-2k, while theddf for
Tx
andT2isn’as defined in(2.13).
Tables 1, 2, and 3illustratethepowers ofTOandourproposedtests,
T1
andT2.We
chose,
0.05 and situations involving k 4 regressionlines. Thenumberof datasets consideredfrom unidentified populationsism, where 1_< m_<k. Forsimplicityweuseequal samplesizes(ni10)
for thek identifiedpopulations and equal samplesizes(n)
for the munidentified populations. Fromourearlier notationn
nk+ (i 1m).
Tables 1, 2,and3 differ in themagnitudeofni.Inthe tableswe denote the noncentrality parameter for the poweroftest T asi. The power ofeachtestisafunction
ofai
andthe relevantdegrees offreedom. Asindicatedabove,a0
"
1. Form k,each,x is aspecific value. Form< k,,x0and,x are(the same)
specific values,but,x2 varies dependinguponwhichunidentified populationsproducethem datasets. Forthis
reason wecalculatethetests’powers for selectedsetsof kregressionlines andvalues of
Si
2 Thedifferences between the parameters of these lines together with
Si
2 anda 2 affect ,xi. The parameters of the k lines,Si
2 and2
were chosentoproducethe three valuesindicatedfor,x0,sothat thepower ofTO isabout .25, .5, and.75. IfTO hasverysmallpower, then additionaldata provide verylittleimprovement. Conversely,when the powerofTO isvery large, thereis little
need for improvementwithadditional data.
Examinationsof the tablesproducethe followingobservations. The powers of
T
O arethethe powerofT isalwaysgreater than the powerofTO consistent with Graybill’s conclusion [4]
thatforagivenndf the powerofthe test increases as the ddf increases. Also, ineach table the
power of
T1
increases asmincreases, becausethe ndfremainsat2(k-l)while theddfincreasesbyn-2.
Likewise, for each value ofm the power ofT increases from Table through Table 3 becausethe ddf increases asaresult of then
increasingfrom5to7andfinallyto10.The powerofT2 isheavily influencedbythe choice ofregression lines for the additional
data whenm < k. Ineachtable thepowerofT2 doesnotconsistently increase as m increases.
The increasesin,x2andddfare sometimesoffset bythe increase in of2m. Foreach valueofm thepowerofT2generallyincreases from Table through Table3because of the same increase in
ddfasfor
T,
Butas Table 1indicates,forsmalln
relative to n thepower ofT2
may belower thanthe power ofT0,andisseldom much betterthanthepowerofT1. InTable 2 when then
approaches ni in size, improvements in the power ofT2 overthe power of
T
are noticeable.Table 3 indicates that whenthe
n
equalni,T2
issuperiortothepower ofT1
except occasionallyforsmall m.
6. APPLICATIONS. Usingadditionaldatafromunidentifiedpopulations improvesthepowerof thetestforstabilityof theparametersinkregressionlines. Theonly requirementisthat theerror termsof the regressionlines from allpopulations have a common variance. Thepower ofour
proposedtest,T1,isalwaysgreaterthan thepower of the standardtest,T0. Ifm, the number of
datasetsfromunidentified populations, isclosetokandifthe
n
are near theni,then T2 canproducealargerincrease inthepowerthanT1. Ifmissmall or if
n
issmallrelative to ni,thenT1
may beabetter choice thanT2.noncentrality parameters Powerofthetests
m A0,A1 )2
TO
T1
T2
4 4.454 6.681 .2502 .2622 .2409
4 9.085 13.627 .5016 .5252 .5069
4 14.866 22.299 .7500 .7750 .7744
3 4.454 5.707to 6.440 .2502 .2598 .2233to.2518
3 9.085 12.012to12.770 .5016 .5205 .4803to.5105
3 14.866 19.191to21.353 .7500 .7701 .7299to.7858
2 4.454 4.895to 6.240 .2502 .2570 .2115to.2684
2 9.085 10.648to12.055 .5016 .5151 .4645to.5243
2 14.866 16.601to20.564 .7500 .7645 .6944to.8050
1 4.454 4.650to 5.248 .2502 .2539 .2256to.2536
1 9.085 9.784 to10.405 .5016 .5089 .4738to.5029
1 14.866 15.637to17.399 .7500 .7579 .7139to.7687
Table1
m
noncentralityparameters Powerof thetests
A0,
A1
"2
TO
T1
T2
4 4.454 7.572 .2502 .2674 .2836
4 9.085 15.444 .5016 .5353 .5914
4 14.866 25.273 .7500 .7851 .8522
3 4.454 6.178to 7.228 .2502 .2643 .2481to.2913
3 9.085 13.127to14.217 .5016 .5294 .5388to.5810
3 14.866 20.826to23.919 .7500 .7792 .7874to.8529 2 4.454 5.071to 6.954 .2502 .2606 .2228 to.3057 2 9.085 11.287to 13.243 .5016 .5221 .5014to.5832
2 14.866 17.295to22.843 .7500 .7718 .7271to.8616 4.454 4.717to 5.519 .2502 .2560 .2310to.2692 1 9.085 10.022to10.854 .5016 .5131 .4900to .5287 1 14.866 15.900to 18.261 .7500 .7624 .7283to.7977
Table2
Powerof theF-testsfora =.05 k=4 n 10
n
=7m
noncentralityparameters Powerof thetests
)0,’
"2
’ro
T1
T2
4 4.454 8.908 .2502 .2730 .3498
4 9.085 18.170 .5016 .5459 .7024
4 14.866 29.732 .7500 .7955 .9270
3 4.454 6.867to 8.404 .2502 .2695 .2851 to .3522 3 9.085 14.776to16.373 .5016 .5393 .6189to.6756
3 14.866 23.220to27.749 .7500 .7891 .8539to.9207
2 4.454 5.336to 8.026 .2502 .2650 .2395 to .3632 2 9.085 12.230to15.025 .5016 .5306 .5540to.6643 2 14.866 18.336to26.262 .7500 .7805 .7703to.9210
1 4.454 4.807to 5.883 .2502 .2589 .2383to.2909
1 9.085 10.343 to 11.461 .5016 .5188 .5119to.5633
1 14.866 16.254to19.425 .7500 .7683 .7470to.8330
[1]
Table3
Powerof theF-testsfor --.05 k=4 n 10
n
10REFERENCES
KLIENBAUM,
D. G. and L. L.KUPPER,
Applied Regression and Other MultivariableMethods,Chapter13,PWS-Kent,Boston,MA, 1978.
[2]
MADDAI.A,
G.S.,IntroductiontoEconometrics,Chapter 4,MacMillian, NewYork,NY,1988.
[3] KSHIRSAGAR,A.M., ACoursein LinearModels, Chapter 3,MarcelDekker,NewYork,
NY,
1983.