• No results found

Testing the stability of regression parameters when some additional data sets are available

N/A
N/A
Protected

Academic year: 2020

Share "Testing the stability of regression parameters when some additional data sets are available"

Copied!
8
0
0

Loading.... (view fulltext now)

Full text

(1)

TESTING

THE

STABILITY OF REGRESSION PARAMETERS

WHEN

SOME ADDITIONAL

DATA

SETS

ARE AVAILABLE

R. RADHAKRISHNAN DONR. ROBINSON

Department

of

Manasement

andQuantitativeMethods

IllinomState University Normal,Illinois61761

(Received June 2, 1992 and in revised form September 8, 1992)

ABSTRACT. We consider the problem of testing the stability of regression parameters in

regressionlines ofdifferentpopulationswhen some additional, but unidentified, datasetsfrom those populationsare available. Thestandardtest

(To)

discardstheadditionaldataand teststhe stability,oftheregressionparametersusing only the data setsfrom identified Iopulations. We proposetwo testprocedures

(T1

and

T2)

utilizingalltheavailabledata,becausethe additional datamaycontain informationabout theparameters oftheregressionlines whicharetested for

stability. Apower comparisonamongthetestsisalsopresented. Itis shownthat

T1

alwayshas larger power than

To.

Incertain situations

T2

hasthelargestpower.

KEY WORDS AND PHRASES: leastsquares estimates,regression parameters,power of the

test.

1992AMSSUBJECT CLASSIFICATION CODES: 62J05, 62J99

1. INTRODUCTION. Consider theregression model

Yij ai +

#i(xij

)

+ij, i=1,2 k, j= 1,2,...,ni,

(1.1)

wheretheYij are observations on theresponsevariable, the xij areobservationsonthepredictor

variable, a and #i are the regression parameters, and the ij are the error terms, which are

unobserved random variables. Itisassumed that the errorsareindependent,normallydistributed random variableswithmean0and commonunknown varianceo

2.

Forthemodel,ai +

#i(xij"

)

isthe regressionlineofthe variableyon the predictorvariable xfor the th group,a isthe y-interceptwhenx

x,

and#i is the slope. Suppose we have m

(m

_<

k)

additional data sets

correspondingtomregressionlineswhose modelisgiven by

Yij =ai

+i(xij

)+ij,

i=k+l k+m, j=1,2 hi.

(1.2)

Weassumethat the error termsil in model

(1.2)

are independent,normallydistributedrandom

variables with mean 0 and common unknown varianceo

2.

It is further assumed that the m
(2)

However,we cannotidentifythe mregression lines associated withthe additionaldata sets

(Yii,

xij, k+ k+m, 1,2 ni). Weare interestedintestingthe nullhypothesis

Ho: ,

a2

--...

Ok;

1

2=... #k against

Ha:

either’i

*

ai’ ori

*

i’ forat least onepair(i,i’), wherei, i’, i,i’=1,2 k, utilizingall the available data. The nullhypothesis impliesthat all thek regression

lines in model (1.1)arecoincidentwhereas the alternative hypothesisis that at leasttwoof the

regressionlinesare different. The standardtest

(To)

ofH0againstHausingthe k datasets

(Yij,

xij, i=1,2 k, j= 1,2 ni) is well-known in the literature and has diverse applications. A biostatistician maybe interested intestingthe equivalence of regression linesfor predictingthe

systolicblood pressure using ageas thepredictorvariable for four social groups. A testfor the

stability oftheregression parameters thatgenerated the datasets isHo"

2 3 4;fl #2 #3 #4. IfH0 is true, we use a single regression line based upon the four data setsfor

predicting systolicbloodpressure using ageasthepredictor variable,Klienbaumand

Kupper

[1]. An economist mightbe interested in testingthe equivalence ofmultiple regression models for predictingthegrossdomesticproduct usinglabor andcapitalaspredictorvariables fordifferent timeperiods,Maddala[2].

Inthispaperweconsidertwotests

(T1

and

T2)

utilizingall the available data and make a

power comparisonbetween thesetwo testsand thestandardtestwhich is basedsolelyon the k datasetsrelatingtothe regressionlineswhose parametersaretestedfor stability. InSection2we determineleast squaresestimatesof the regression parameterstoobtaintheteststatisticsfor the problem. The noncentrality parameter of thetests is derived in Section 3. InSection 4 we derive ourproposedtests,

T

andT2. Weillustrate andcomparethepowerof all threetestsin Section

5.

2. LEASTSQUARESESTIMATES. Consider the sum

k+m n

0

i=l

E

j=IE

(Yij

"i’’i(xij

.))2,

(2.1)

where

i

and/i

are theestimatesof regression parametersa

and/i

(i=1,2

k+m).

The least squaresestimates of the regression parametersare obtainedby differentiating

0

partiallywith

respectto

i

and/i

andthensolving the resulting normal equations for

i

and

B’i.

Itcanbe shown

that the leastsquaresestimatesofa

and/i

aregivenby

ni

ai^

j=lX:

Yij/hi

(2.2)

and

ni ni

/0’

I

lYiJ

(xij

)/1

(xij

)2.

j= j=l

(2.3)

Then

R

2,

the unconditional error sumof squares,is obtainedbysubstituting

i

and/’i

givenby

(2.2)

and

(2.3)

into0. Itcanbe shown that

k+m n k+m

I2 2

(Yij

.-)2

/i

2

Si

2

(2.4)

R2

i=l j=l

]=1

(3)

ni

Si

2 ]2

(Xij

)2,

1,2 k+m.

j=l

(2.5)

The conditional errorsumofsquares underH0isobtainedbyminimizing

k n k+rn n

1

i=lr

j=lr"

(Yij--/(xij

.]))2

+i=k+r,

1

j=lI

(Yij-ti-/’i(xij

.))2

(2.6)

withrespectto

,

’, ti, and/i

(i k+ k+ m).

The second sum on the right-hand side of

(2.6)

is minimized with respect to

i

and

i

(i=k+1

k+m)

where

and/i

aredefined, respectively, asin equations

(2.2)

and

(2.3).

The

leastsquaresestimates oftheregression parametersa and # aregiven by

k n

ly

(2.7)

a r, 1

ij/n=y

i=lj=

k n k

/ r, r,

Yij

(xij

)/i

r,

Si

2

(2.8)

i=l j=l "=1

where

k

n= Iz ni.

(2.9)

i=l

The conditional error sum ofsquaresunderH0is

k n k k+rn n k+rn

R

r, r,

(Yij "y--)2./2

I

Si

2 + v,

(Yij

"’t)

2

V,/i

2

Si

2

i=1 j=l i=1 i=k+lj=l

i’=k+m

(2.10)

Thesumofsquares for testingthe nullhypothesisH0is

SSH0

R12

-I

k k k

I ni]-

y--)2

+

lI/’i

2

Si

2

-/2

Si

2

i=l i=l i=l,

k k

1

ni- y--)2

+ I

Si

2

i’)2,

i=l i=l

where

k k

/

Si

2/i/

Si2.

i=l i=l

Itiswell-knowninthe literature that

1

2/0

2is distributedaschi-squarewith

k+m

n’--n+ r ni-2(k+m) i=k+l

(2.11)

(2.12)

(4)

degrees of freedom and

SSHo/o

2 is distributed asnoncentral chi-squarewith 2(k-l) degrees of

freedom. WhenH0is true,

SSH0/o

2isdistributedaschi-squarewith2(k-1)degrees offreedom.

Further,

l

2andSSH0areindependent;forexampleseeKshirsagar[3].

3. NONCENTRALITY PARAMETER. Herewederivetheexpectedvalueof SSH0under the

non-nullcase. Itcanbe shownthat

k k k k

r.

ni(-y-)2

r

ni(ai--)2 + r. ni(7--)2 + 2 r.

ni(ai--)(7--),

i=1 i=1 i=1 i=1

(3.1)

where

k

-=

r.

niai/n

i=l

(3.2)

k

r

,ij/ni,

i=l

(3.3)

k

-=

E

ni/n.

i=l

(3.4)

Now

and

where

Taking expectations ofboth sides of

(3.1)

we obtain

k k

E(.r,

niC-y-)

2)

r, ni(ai--)2 + (k-1)a

2.

1=1 i=1

k k

E( x:

i

2

Si

2

Si

2

(o2/Si

2

+/i2

i=l i=l

k ko2+ 1 #i2

Si

2

i=l

k k k

E(/2

p.

Si

2)

Si

2(a2/

Si

2

+2),

i=l i=l i=l

k

=02

+-2

Si

2,

i=l

k k

=

Si23i/

Si2.

i=1 i=l

(3.5)

(3.6)

(3.7)

(3.8)

Using equations

(3.5), (3.6), (3.7),

and

(2.11)

weget

k k

E(SSH0)

2(k-1)o

2 + r, ni(ai--)2 + r,

Si2(Bi--)

2
(5)

Since

SSHo/o

2 is distributed as noncentral chi-square with 2(k-l) degrees offreedom, it

follows from(3.9)that thenoncentralityparameterisgiven by

k k

,x

(1/02)[

12 ni(a

.-)2

+ r,

Si

2(fli

.-)2

].

(3.10)

i=1 i=1

4. TEST PROCEDURES. The standardprocedurefor testingHoagaimtHais an

F-test

based upontheteststatistic

F0

(SSHo/2(k-1))/(R0

2/(n-2k)),

(4.1)

where

k n k

I2 I2

(Yij

.)2

I2

$i2Si

2

(4.2)

RO2

i=lj=l i=l

See, for example, Kshirsagar [3]. The above test rejects the null hypothesis H0 if F0 >

F,,2(k.),,.

and acceptsH0otherwise,where

F,f

,f

2istheupper100a percentile point ofthe

F-distribution with

fl

numerator degrees of freedom

(ndf)

and

f2

denominator degrees of freedom

(ddf).

Wenote that the standardtestis based upon the k datasets

(Yi],

xi], i=1,2 k,

1,2 ni)anddiscardstheadditionaldata

(Yi],

xi], k+1 k+ m, 1,2 ni).

Considerthefollowingtestprocedure

(T1).

RejctH0if

F1

(SSHo/2(k-1))/(R0/n

’)

>

Wa,2(k.1),n,

(4.3)

and acceptH0otherwise. Acomparison betweenTOandT1shows thatbothhavethe same ndf

but thatthe latterhaslarger ddf thanT0. Wefurthernotethat

T1

isbaseduponall theavailable data. Under the non-null case,both test statistics havenoncentralF-distributions with the same

noncentrality parameter, asin

(3.10).

Therefore

F1

willhavelargerpowerthanF0,Graybill[4].

When the mregressionlines in

(1.2)

are anunidentifiedsubset ofthe kregressionlines in the model (1.1),testing Ho againstHais equivalenttotesting

I-

thek+m regression lines are

identicalagaimt

I-1:

atleasttwoof them are different.

Following the procedureoutlined in Section2,itcanbe shown that thesum ofsquares for testing

I-

is

where

k+m k+m

SSI-I

I

hi(8

.&,,)2

+ I

Si ./,)2,

(4.4)

i=l i=l

k+m k+m

a

.

ni

r, ni,

(4.5)

i=1 =1

k+m k+m

/’

i=

1Si2/i/i’=ll

Si

2

(4.6)

(6)

where

and

k+m k+m

,X’=

(1/o2)[

E ni(ai-a)2 + :E

Si2(fli-3)2l,

(4.7)

i=l i=l

k+m k+m

a I2

niai/

Z ni,

(4.8)

i=l i=l

k+m k+m

/ E:

Si2/i/i

I2

Si

2

(4.9)

i=l =1

We notethat

SSI-I

can beobtainedfrom SSH0by replacingk with k+m. When

I-

is true,the

samplingdistributionof

SSHo

2ischi-squarewith2(k + m-l) degreesof freedom. Further,

SSI-I

and

R

2areindependent.

Weuse anF-test

(T2)

totest

I-

against basedupontheteststatistic

F2

(SSIf’l)/(l2/n’),

(4.10)

where 2(k + m-l). Wereject

H

ifF2 >

F,,fl,

n, and accept

I

otherwise. When

I

is true, thesamplingdistributionofF2isnoncentralFwith ndfandn’ddfandnoncentrality parameter

a’. Weusenoncentral F-distribution tables tocomputethepowerof thetests. The next section illustrates andcomparesthepowerof thesethreetests.

5. POWER COMPARISONSOF

THE

TESTS. WhenH0and

I-

are nottruetheteststatistics

(4.1),

(4.3), and

(4.10)

follow noncentralF-distributions. The non-null distributions ofthe test statistics

F0

and F1 have the same noncentrality parameter,

a,

defined in

(3.10).

The noncentrality parameter for the non-nulldistributionof

F2

isx’ asdefined in

(4.7).

Thendffor bothTO and

T1

is

f

2(k-l). For T2thendf is 2(k+ m-l). To hasddf

f2

n-2k, while the

ddf for

Tx

andT2isn’as defined in

(2.13).

Tables 1, 2, and 3illustratethepowers ofTOandourproposedtests,

T1

andT2.

We

chose

,

0.05 and situations involving k 4 regressionlines. Thenumberof datasets consideredfrom unidentified populationsism, where 1_< m_<k. Forsimplicityweuseequal samplesizes(ni

10)

for thek identifiedpopulations and equal samplesizes

(n)

for the munidentified populations. Fromourearlier notation

n

nk+ (i 1

m).

Tables 1, 2,and3 differ in themagnitudeofni.

Inthe tableswe denote the noncentrality parameter for the poweroftest T asi. The power ofeachtestisafunction

ofai

andthe relevantdegrees offreedom. Asindicatedabove,

a0

"

1. Form k,each,x is aspecific value. Form< k,,x0and,x are

(the same)

specific values,

but,x2 varies dependinguponwhichunidentified populationsproducethem datasets. Forthis

reason wecalculatethetests’powers for selectedsetsof kregressionlines andvalues of

Si

2 The

differences between the parameters of these lines together with

Si

2 anda 2 affect ,xi. The parameters of the k lines,

Si

2 and

2

were chosentoproducethe three valuesindicatedfor,x0,so

that thepower ofTO isabout .25, .5, and.75. IfTO hasverysmallpower, then additionaldata provide verylittleimprovement. Conversely,when the powerofTO isvery large, thereis little

need for improvementwithadditional data.

Examinationsof the tablesproducethe followingobservations. The powers of

T

O arethe
(7)

the powerofT isalwaysgreater than the powerofTO consistent with Graybill’s conclusion [4]

thatforagivenndf the powerofthe test increases as the ddf increases. Also, ineach table the

power of

T1

increases asmincreases, becausethe ndfremainsat2(k-l)while theddfincreasesby

n-2.

Likewise, for each value ofm the power ofT increases from Table through Table 3 becausethe ddf increases asaresult of the

n

increasingfrom5to7andfinallyto10.

The powerofT2 isheavily influencedbythe choice ofregression lines for the additional

data whenm < k. Ineachtable thepowerofT2 doesnotconsistently increase as m increases.

The increasesin,x2andddfare sometimesoffset bythe increase in of2m. Foreach valueofm thepowerofT2generallyincreases from Table through Table3because of the same increase in

ddfasfor

T,

Butas Table 1indicates,forsmall

n

relative to n thepower of

T2

may belower thanthe power ofT0,andisseldom much betterthanthepowerofT1. InTable 2 when the

n

approaches ni in size, improvements in the power ofT2 overthe power of

T

are noticeable.

Table 3 indicates that whenthe

n

equalni,

T2

issuperiortothepower of

T1

except occasionally

forsmall m.

6. APPLICATIONS. Usingadditionaldatafromunidentifiedpopulations improvesthepowerof thetestforstabilityof theparametersinkregressionlines. Theonly requirementisthat theerror termsof the regressionlines from allpopulations have a common variance. Thepower ofour

proposedtest,T1,isalwaysgreaterthan thepower of the standardtest,T0. Ifm, the number of

datasetsfromunidentified populations, isclosetokandifthe

n

are near theni,then T2 can

producealargerincrease inthepowerthanT1. Ifmissmall or if

n

issmallrelative to ni,then

T1

may beabetter choice thanT2.

noncentrality parameters Powerofthetests

m A0,A1 )2

TO

T1

T2

4 4.454 6.681 .2502 .2622 .2409

4 9.085 13.627 .5016 .5252 .5069

4 14.866 22.299 .7500 .7750 .7744

3 4.454 5.707to 6.440 .2502 .2598 .2233to.2518

3 9.085 12.012to12.770 .5016 .5205 .4803to.5105

3 14.866 19.191to21.353 .7500 .7701 .7299to.7858

2 4.454 4.895to 6.240 .2502 .2570 .2115to.2684

2 9.085 10.648to12.055 .5016 .5151 .4645to.5243

2 14.866 16.601to20.564 .7500 .7645 .6944to.8050

1 4.454 4.650to 5.248 .2502 .2539 .2256to.2536

1 9.085 9.784 to10.405 .5016 .5089 .4738to.5029

1 14.866 15.637to17.399 .7500 .7579 .7139to.7687

Table1

(8)

m

noncentralityparameters Powerof thetests

A0,

A1

"2

TO

T1

T2

4 4.454 7.572 .2502 .2674 .2836

4 9.085 15.444 .5016 .5353 .5914

4 14.866 25.273 .7500 .7851 .8522

3 4.454 6.178to 7.228 .2502 .2643 .2481to.2913

3 9.085 13.127to14.217 .5016 .5294 .5388to.5810

3 14.866 20.826to23.919 .7500 .7792 .7874to.8529 2 4.454 5.071to 6.954 .2502 .2606 .2228 to.3057 2 9.085 11.287to 13.243 .5016 .5221 .5014to.5832

2 14.866 17.295to22.843 .7500 .7718 .7271to.8616 4.454 4.717to 5.519 .2502 .2560 .2310to.2692 1 9.085 10.022to10.854 .5016 .5131 .4900to .5287 1 14.866 15.900to 18.261 .7500 .7624 .7283to.7977

Table2

Powerof theF-testsfora =.05 k=4 n 10

n

=7

m

noncentralityparameters Powerof thetests

)0,’

"2

’ro

T1

T2

4 4.454 8.908 .2502 .2730 .3498

4 9.085 18.170 .5016 .5459 .7024

4 14.866 29.732 .7500 .7955 .9270

3 4.454 6.867to 8.404 .2502 .2695 .2851 to .3522 3 9.085 14.776to16.373 .5016 .5393 .6189to.6756

3 14.866 23.220to27.749 .7500 .7891 .8539to.9207

2 4.454 5.336to 8.026 .2502 .2650 .2395 to .3632 2 9.085 12.230to15.025 .5016 .5306 .5540to.6643 2 14.866 18.336to26.262 .7500 .7805 .7703to.9210

1 4.454 4.807to 5.883 .2502 .2589 .2383to.2909

1 9.085 10.343 to 11.461 .5016 .5188 .5119to.5633

1 14.866 16.254to19.425 .7500 .7683 .7470to.8330

[1]

Table3

Powerof theF-testsfor --.05 k=4 n 10

n

10

REFERENCES

KLIENBAUM,

D. G. and L. L.

KUPPER,

Applied Regression and Other Multivariable

Methods,Chapter13,PWS-Kent,Boston,MA, 1978.

[2]

MADDAI.A,

G.S.,IntroductiontoEconometrics,Chapter 4,MacMillian, NewYork,NY,

1988.

[3] KSHIRSAGAR,A.M., ACoursein LinearModels, Chapter 3,MarcelDekker,NewYork,

NY,

1983.

References

Related documents

This paper discusses the practice of China Telecom in the past two years, &#34;the use of mobile Internet, changes in service, improve efficiency, improve customer perception&#34;

The bulk of funding for primary and secondary schools in the seven predominantly Karen camps is provided by ZOA Refugee Care Thailand (ZOA), an international NGO.. The rest of

PhD in Computational and Applied Mathematics, University of Milan, 1998, Doctoral Dissertation: Mathematical and Numerical Modeling of Blood Flow Problems, July 1998, Alfio

Rady MY, Rivers EP, Nowak RM: Resuscitation of the critically ill in the ED: responses of blood pressure, heart rate, shock index, central venous oxygen saturation, and lactate..

The Service Director, Finance, as the Council’s Chief Finance Officer, will confirm to Council (as required by the Local Government Act 2003) that the spending plans

Such a collegiate cul- ture, like honors cultures everywhere, is best achieved by open and trusting relationships of the students with each other and the instructor, discussions

1. Handpick stones from a representative portion of the cleaned sample. Determine stone concentration in the net sample. • In western Canada samples of grain containing stones in

Music in new games is very diverse and suited to the game and of course the graphics, which should not be compared to the graphics from the.. old