Test for intercept after pre-testing on slope - a robust method

Academic year: 2019


Rossita M Yunus and Shahjahan Khan

Department of Mathematics and Computing, Australian Centre for Sustainable Catchments, University of Southern Queensland, Toowoomba, Q 4350, AUSTRALIA.

Emails: yunus@usq.edu.au and khans@usq.edu.au

Abstract

The idea of using non-sample prior information, in the form of pre-testing, to improve the properties of estimators is applied in this paper to the testing regime in order to achieve better power of the ultimate test. In particular, when testing the intercept of a simple regression model, prior information from previous investigations or expert knowledge on the suspected value of the slope is potentially beneficial. Any uncertainty about the value of the slope is removed by performing a pre-test before testing the significance of the intercept. The impact of the pre-test on the performance (power and size) of the ultimate test is studied. A robust procedure based on the M-estimator is used to formulate the test and to derive its power function. It is shown that the ultimate test based on the pre-test achieves a reasonable dominance over the other tests asymptotically and performs better for a larger coefficient of variation.

Keywords: pre-test, asymptotic size, asymptotic power, M-estimation, regression model.

1 Introduction

Consider a simple regression model of $n$ observable random variables $X_i$, $i = 1, \dots, n$,

$$X_i = \theta + \beta c_i + e_i, \tag{1.1}$$

where the errors $e_i$ are from an unspecified symmetric and continuous distribution function $F_i$, $i = 1, \dots, n$, the $c_i$ are known real constants of the explanatory variable, and $\theta$ and $\beta$ are the unknown intercept and slope parameters respectively.

Testing the significance of the intercept, $H_0^{*}: \theta = 0$, is carried out under three possible cases based on the knowledge of the slope. In the first case the slope is unspecified, i.e. it is treated as a nuisance parameter, and the test of the significance of the intercept is referred to as the unrestricted test (UT). If non-sample prior information on the value of the slope (say 0) is known, the test of the significance of the intercept is defined as the restricted test (RT). If the prior information on the value of the slope is uncertain, it is suggested to perform a pre-test to remove the uncertainty on the value of the slope before testing the significance of the intercept. In this final case, the ultimate test (on the intercept) is defined as the pre-test test (PTT). Obviously the pre-test (on the slope) affects the power and size of the ultimate test (on the intercept).

In spite of numerous works on improving estimation by the inclusion of non-sample prior information (Saleh, 2006; Khan and Saleh, 2001; Khan et al., 2002), very little attention has been paid to improving the performance of tests on the parameters. The effect of a pre-test on the performance (size and power) of the ultimate test has been studied in parametric cases (Bechhofer, 1951; Bozivich et al., 1956) as well as in non-parametric cases (Saleh and Sen, 1982), though the number of such studies in the literature is very small. Saleh and Sen (1981) use rank-based nonparametric tests and formulate the power function of the ultimate test. However, the investigation of the power of the ultimate test in that paper is limited.

Since M-estimation is more popular than other robust methods, is well known for its flexibility and is well defined for a variety of models for which the MLE is also defined (Huber, 1981, p.43; Jurečková and Sen, 1996, p.80), a score-type M-test is proposed to formulate the power functions of the UT, RT and PTT (see Yunus and Khan, 2007). The asymptotic power functions for the UT, RT and PTT derived using the M-test are found to have the same form as those derived using the rank statistic by Saleh and Sen (1982), although the methodologies of M-estimation and R-estimation differ. That paper compared the UT, RT and PTT asymptotically, both analytically and computationally, for a special case of the value of the coefficient of variation. The dependence of the power functions on the coefficient of variation is studied in this paper as an extension of that work.

Along with some preliminary notions, the method of M-estimation is presented and the statistical tests on the intercept, namely the UT, RT and PTT, are given in Section 2. The asymptotic distributions of the test statistics and the asymptotic power functions of the tests are given in Section 3. Section 4 is devoted to the analytical results comparing the asymptotic power functions of the UT, RT and PTT, while the power functions are investigated through an illustrative example in Section 5.

2 The Proposed Test

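Given an absolutely continuous function $\rho: \mathbb{R} \to \mathbb{R}$, the M-estimator of $\theta$ and $\beta$ is defined as the values of $\theta$ and $\beta$ that minimize the objective function

$$\sum_{i=1}^{n} \rho(X_i - \theta - \beta c_i). \tag{2.1}$$

The M-estimator of $\theta$ and $\beta$ can also be defined as the solutions of the system of equations

$$\sum_{i=1}^{n} \psi_\theta(X_i) = \sum_{i=1}^{n} \psi(X_i - \theta - \beta c_i) = 0, \qquad \sum_{i=1}^{n} \psi_\beta(X_i) = \sum_{i=1}^{n} c_i\,\psi(X_i - \theta - \beta c_i) = 0. \tag{2.2}$$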

If $\rho$ is differentiable with partial derivatives $\psi_\theta = \partial\rho/\partial\theta$ and $\psi_\beta = \partial\rho/\partial\beta$, then the M-estimators that minimize the function in (2.1) are solutions of the system (2.2). The converse need not hold: M-estimators obtained by solving the system (2.2) may not minimize (2.1) (cf. Carroll and Ruppert, 1988, p.210). The system of equations (2.2) may have several roots, while only one of them leads to the global minimum of (2.1). Jurečková and Sen (1996, pp.215-224) have proved that, under some conditions, there exists at least one root of (2.2) which is a $\sqrt{n}$-consistent estimator of $\theta$ and $\beta$. The $\psi$ function is decomposed into the sum

$$\psi = \psi_a + \psi_c + \psi_s,$$

where

(a) $\psi_a$ is an absolutely continuous function with an absolutely continuous derivative.

(b) $\psi_c$ is a continuous, piecewise linear function with knots at $\mu_1, \dots, \mu_k$ that is constant in a neighborhood of $\pm\infty$; hence its derivative is a step function, $\psi_c'(z) = \alpha_v$ for $\mu_v < z < \mu_{v+1}$, $v = 0, 1, \dots, k$, where $\alpha_0, \dots, \alpha_k \in \mathbb{R}$, $\alpha_0 = \alpha_k = 0$ and $-\infty = \mu_0 < \mu_1 < \dots < \mu_{k+1} = \infty$. We assume that $f(z) = dF(z)/dz$ is bounded in neighborhoods of $S\mu_1, \dots, S\mu_k$.

(c) $\psi_s$ is a nondecreasing step function, $\psi_s(z) = \lambda_v$ for $q_v < z \le q_{v+1}$, $v = 0, 1, \dots, m$, where $-\infty = q_0 < q_1 < \dots < q_{m+1} = \infty$ and $-\infty < \lambda_0 < \lambda_1 < \dots < \lambda_m < \infty$. We assume that $0 < f(z) = (d/dz)F(z)$ and $f'(z) = (d^2/dz^2)F(z)$ are bounded in neighborhoods of $Sq_1, \dots, Sq_m$.

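The asymptotic result under conditions M1 to M5 of Jurečková and Sen (1996, p.217) is used in this paper. Further assume that $\psi_a$, $\psi_c$ and $\psi_s$ are all nondecreasing and skew symmetric, that is, $\psi_j(-x) = -\psi_j(x)$ for $j = a, c, s$. Let $F$ be symmetric about 0, so that

$$\int_{-\infty}^{\infty} \psi(x)\,dF(x) = 0.$$

Assume that

$$\sigma_0^2 = \int_{-\infty}^{\infty} \psi^2(x)\,dF(x). \tag{2.3}$$

Following Jurečková and Sen (1996, p.217), two cases are considered:

(i) if $\psi_s = 0$, then

$$\gamma = \int_{-\infty}^{\infty} \big(\psi_a'(x) + \psi_c'(x)\big) f(x)\,dx; \tag{2.4}$$

(ii) if $\psi_a = \psi_c = 0$, then

$$\gamma = \sum_{j=1}^{m} (\lambda_j - \lambda_{j-1})\, f(q_j). \tag{2.5}$$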

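Further assume that $\sigma_0$ and $\gamma$ are both positive and finite quantities. Let the distribution function $F$ be continuous and symmetric about zero and have finite Fisher information,

$$I(f) = \int_{-\infty}^{\infty} \{f'(x)/f(x)\}^2\,dF(x), \tag{2.6}$$

where $f'(x) = (d/dx)f(x) = (d^2/dx^2)F(x)$. Assume that

(i) there exist finite constants $\bar{c}$ and $C^{*}\,(>0)$ such that the limits

$$\lim_{n\to\infty} \bar{c}_n = \bar{c} \quad\text{and}\quad \lim_{n\to\infty} n^{-1} C_n^{*2} = C^{*2} \tag{2.7}$$

both exist, with

$$\bar{c}_n = n^{-1}\sum_{i=1}^{n} c_i \quad\text{and}\quad C_n^{*2} = \sum_{i=1}^{n} c_i^2 - n\bar{c}_n^2; \tag{2.8}$$

(ii) the $c_i$ are all bounded, so that by (i),

$$\max_{1\le i\le n} (c_i - \bar{c}_n)^2 / C_n^{*2} \to 0 \quad\text{as } n \to \infty. \tag{2.9}$$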

Let $\psi: \mathbb{R} \to \mathbb{R}$ be a nondecreasing and skew symmetric score function. For any real numbers $a$ and $b$, consider the statistics

$$M_{n1}(a, b) = \sum_{i=1}^{n} \psi(X_i - a - b c_i), \tag{2.10}$$

$$M_{n2}(a, b) = \sum_{i=1}^{n} c_i\,\psi(X_i - a - b c_i). \tag{2.11}$$

If $\beta$ is unspecified, the designated test function is $\phi_n^{UT}$ with the null hypothesis $H_0^{*}: \theta = 0$. Testing for $\theta$ involves the elimination of the nuisance parameter $\beta$. It follows that $M_{n2}(0, b)$ is decreasing as $b$ increases (Jurečková and Sen, 1996, p.85) and, under the local hypothesis $H_0^{(1)}: \beta = 0$, $M_{n2}(0, 0)$ has expectation 0. Then let

$$\tilde\beta = \big(\sup\{b : M_{n2}(0, b) > 0\} + \inf\{b : M_{n2}(0, b) < 0\}\big)/2.$$

Then $\tilde\beta$ is a translation invariant and robust estimator of $\beta$.
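The midpoint definition of the slope estimator can be traced numerically. The following is only a minimal sketch, assuming the Huber score with k = 1.28 as the choice of ψ and a simple bisection over a wide bracket; the function names (`huber_psi`, `m_n2`, `slope_estimate`) and the toy data are hypothetical and not part of the original paper. It also assumes that not all c_i are zero, so that M_n2(0, b) actually changes sign.

```python
def huber_psi(u, k=1.28):
    """Huber score: identity on [-k, k], clipped outside (assumed choice of psi)."""
    return max(-k, min(k, u))


def m_n2(x, c, a, b):
    """M_n2(a, b) = sum_i c_i * psi(X_i - a - b * c_i), as in (2.11)."""
    return sum(ci * huber_psi(xi - a - b * ci) for xi, ci in zip(x, c))


def slope_estimate(x, c, lo=-1e6, hi=1e6, iters=200):
    """Midpoint of sup{b: M_n2(0,b) > 0} and inf{b: M_n2(0,b) < 0}."""
    def boundary(pred):
        # pred holds for small b and fails for large b,
        # because M_n2(0, b) is nonincreasing in b.
        left, right = lo, hi
        for _ in range(iters):
            mid = 0.5 * (left + right)
            if pred(mid):
                left = mid
            else:
                right = mid
        return 0.5 * (left + right)

    sup_pos = boundary(lambda b: m_n2(x, c, 0.0, b) > 0)
    inf_neg = boundary(lambda b: m_n2(x, c, 0.0, b) >= 0)
    return 0.5 * (sup_pos + inf_neg)


# Toy data lying exactly on the line X = 2c (intercept 0, slope 2).
c = [0.0, 1.0, 2.0, 3.0]
x = [2.0 * ci for ci in c]
beta_tilde = slope_estimate(x, c)
print(beta_tilde)
```

When M_n2(0, b) crosses zero at a single point, the two boundaries coincide and the midpoint is simply that root; the two separate searches only matter when the statistic is flat at zero over an interval.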

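We consider the test statistic $T_n^{UT} = M_{n1}(0, \tilde\beta)$, where under $H_0^{*}$, as $n \to \infty$,

$$\frac{T_n^{UT}}{\sqrt{C_n^{(1)} [S_n^{(1)}]^2}} \to N(0, 1) \tag{2.12}$$

with $C_n^{(1)} = n - n^2\bar{c}_n^2 / \sum c_i^2 = n C_n^{*2} / (C_n^{*2} + n\bar{c}_n^2)$ and $[S_n^{(1)}]^2 = \sum \psi^2(X_i - \tilde\beta c_i)/n$.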

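If $\beta = 0$, the designated test function is $\phi_n^{RT}$ for testing the null hypothesis $H_0^{*}: \theta = 0$. The proposed test statistic is $T_n^{RT} = M_{n1}(0, 0)$. Note that for large $n$, under $H_0: \theta = 0, \beta = 0$,

$$\frac{n^{-1/2}\, T_n^{RT}}{\sigma_0} \to N(0, 1), \tag{2.13}$$

where $\sigma_0^2 = \int_{-\infty}^{\infty} \psi^2(x)\,dF(x)$ (see Sen, 1982, eq. 3.7).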

For the preliminary test on the slope, the test function $\phi_n^{PT}$ is designed to test the null hypothesis $H_0^{(1)}: \beta = 0$. The proposed test statistic is $T_n^{PT} = M_{n2}(\tilde\theta, 0)$, where

$$\tilde\theta = \big(\sup\{a : M_{n1}(a, 0) > 0\} + \inf\{a : M_{n1}(a, 0) < 0\}\big)/2$$

is a robust estimator. Under $H_0^{(1)}$, as $n \to \infty$,

$$\frac{T_n^{PT}}{\sqrt{C_n^{(3)} [S_n^{(3)}]^2}} \to N(0, 1), \tag{2.14}$$

where $C_n^{(3)} = \sum c_i^2 - n\bar{c}_n^2 = C_n^{*2}$ and $[S_n^{(3)}]^2 = \sum \psi^2(X_i - \tilde\theta)/n$.

The consistency of $[S_n^{(1)}]^2$, $[S_n^{(2)}]^2 = \sum \psi^2(X_i)/n$ and $[S_n^{(3)}]^2$ as estimators of $\sigma_0^2$ follows from the law of large numbers (Jurečková and Sen, 1981).

Now we are in a position to formulate a test function $\phi_n^{PTT}$ to test $H_0^{*}: \theta = 0$ following a preliminary test on $\beta$. First, we consider the case where all of $\phi_n^{(j)}$, $j = 1, 2, 3$, are one-sided tests. Let us choose positive numbers $\alpha_j$ $(0 < \alpha_j < 1)$ and real values $\ell_{n,\alpha_j}^{(j)}$, $j = 1, 2, 3$, such that for large sample size,

$$P[T_n^{UT} > \ell_{n,\alpha_1}^{UT} \mid H_0^{*}: \theta = 0] = \alpha_1, \tag{2.15}$$

$$P[T_n^{RT} > \ell_{n,\alpha_2}^{RT} \mid H_0: \theta = 0, \beta = 0] = \alpha_2, \tag{2.16}$$

$$P[T_n^{PT} > \ell_{n,\alpha_3}^{PT} \mid H_0^{(1)}: \beta = 0] = \alpha_3, \tag{2.17}$$

where $\ell_{n,\alpha_j}^{(j)}$ is the critical value of $T_n^{(j)}$ at the $\alpha_j$ level of significance. Let $\Phi(x)$ be the standard normal cumulative distribution function; then

$$\Phi(\tau_\alpha) = 1 - \alpha, \quad\text{for } 0 < \alpha < 1. \tag{2.18}$$

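Using equations (2.12)-(2.18), as $n \to \infty$ we have

$$\frac{n^{-1/2}\,\ell_{n,\alpha_2}^{RT}}{\sqrt{[S_n^{(2)}]^2}} \to \tau_{\alpha_2} = \frac{n^{-1/2}\,\ell_{n,\alpha_2}^{RT}}{\sqrt{\sigma_0^2}} \ \text{(say)}, \tag{2.19}$$

$$\frac{n^{-1/2}\,\ell_{n,\alpha_1}^{UT}}{\sqrt{[S_n^{(1)}]^2\, C_n^{(1)}/n}} \to \tau_{\alpha_1} = \frac{n^{-1/2}\,\ell_{n,\alpha_1}^{UT}}{\sqrt{\sigma_0^2\, C^{*2}/(C^{*2} + \bar{c}^2)}} \ \text{(say)}, \tag{2.20}$$

$$\frac{n^{-1/2}\,\ell_{n,\alpha_3}^{PT}}{\sqrt{[S_n^{(3)}]^2\, C_n^{*2}/n}} \to \tau_{\alpha_3} = \frac{n^{-1/2}\,\ell_{n,\alpha_3}^{PT}}{\sqrt{\sigma_0^2\, C^{*2}}} \ \text{(say)}, \tag{2.21}$$

where $[S_n^{(1)}]^2 = \sum \psi^2(X_i - \tilde\beta c_i)/n$, $C_n^{(1)} = n - n^2\bar{c}_n^2/\sum c_i^2$, $[S_n^{(3)}]^2 = \sum \psi^2(X_i - \tilde\theta)/n$, and $C_n^{*2} = \sum c_i^2 - n\bar{c}_n^2$.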

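Now we may write

$$\phi_n^{PTT} = I\big[(T_n^{PT} \le \ell_{n,\alpha_3}^{PT},\ T_n^{RT} > \ell_{n,\alpha_2}^{RT}) \ \text{or}\ (T_n^{PT} > \ell_{n,\alpha_3}^{PT},\ T_n^{UT} > \ell_{n,\alpha_1}^{UT})\big] \tag{2.22}$$

as the test function for testing $H_0^{*}: \theta = 0$ after a pre-test on $\beta$. Here $I(A)$ stands for the indicator function of the set $A$: it takes the value 1 if $A$ occurs and 0 otherwise. The function enables us to define the power of the test $\phi_n^{PTT}$, which is given by

$$\Pi_n^{PTT}(\theta) = E(\phi_n^{PTT}) = P[T_n^{PT} \le \ell_{n,\alpha_3}^{PT},\ T_n^{RT} > \ell_{n,\alpha_2}^{RT}] + P[T_n^{PT} > \ell_{n,\alpha_3}^{PT},\ T_n^{UT} > \ell_{n,\alpha_1}^{UT}]. \tag{2.23}$$

In general, the power of the test $\phi_n^{PTT}$ depends on $\alpha_1, \alpha_2, \alpha_3, \theta, n$ as well as $\beta$. Note that the size of the ultimate test, $\alpha_n^{PTT}$, is a special case of the power of the test when $\theta = 0$. Since the nuisance parameter $\beta$ is unknown but suspected to be close to 0, it is of interest to study the dependence of both $\alpha_n^{PTT}$ and $\Pi_n^{PTT}(\theta)$ on $\beta$ (close to 0).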
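The two-stage decision behind the PTT can be sketched end to end. This is an illustrative sketch only, assuming the Huber score with k = 1.28, one-sided tests, asymptotic normal critical values, and the standardizations in which each statistic is divided by the square root of its estimated variance; the names `ptt_decision` and `root_midpoint` and the toy data are hypothetical and not taken from the paper.

```python
import math
from statistics import NormalDist


def huber_psi(u, k=1.28):
    """Huber score, clipped at +/- k (assumed choice of psi)."""
    return max(-k, min(k, u))


def root_midpoint(g, lo=-1e6, hi=1e6, iters=200):
    """(sup{t: g(t) > 0} + inf{t: g(t) < 0}) / 2 for a nonincreasing g."""
    def boundary(pred):
        left, right = lo, hi
        for _ in range(iters):
            mid = 0.5 * (left + right)
            if pred(mid):
                left = mid
            else:
                right = mid
        return 0.5 * (left + right)
    return 0.5 * (boundary(lambda t: g(t) > 0) + boundary(lambda t: g(t) >= 0))


def ptt_decision(x, c, alpha1=0.05, alpha2=0.05, alpha3=0.05):
    """Pre-test on the slope, then RT or UT on the intercept (one-sided)."""
    n = len(x)
    cbar = sum(c) / n
    sum_c2 = sum(ci * ci for ci in c)
    cn_star2 = sum_c2 - n * cbar ** 2            # C_n^{*2}
    c1 = n - n ** 2 * cbar ** 2 / sum_c2         # C_n^{(1)}

    beta_t = root_midpoint(
        lambda b: sum(ci * huber_psi(xi - b * ci) for xi, ci in zip(x, c)))
    theta_t = root_midpoint(lambda a: sum(huber_psi(xi - a) for xi in x))

    r_ut = [huber_psi(xi - beta_t * ci) for xi, ci in zip(x, c)]
    r_rt = [huber_psi(xi) for xi in x]
    r_pt = [huber_psi(xi - theta_t) for xi in x]

    s1 = sum(v * v for v in r_ut) / n
    s2 = sum(v * v for v in r_rt) / n
    s3 = sum(v * v for v in r_pt) / n

    z_ut = sum(r_ut) / math.sqrt(c1 * s1)
    z_rt = sum(r_rt) / math.sqrt(n * s2)
    z_pt = sum(ci * v for ci, v in zip(c, r_pt)) / math.sqrt(cn_star2 * s3)

    tau = lambda a: NormalDist().inv_cdf(1.0 - a)
    use_rt = z_pt <= tau(alpha3)                 # pre-test does not reject beta = 0
    reject = z_rt > tau(alpha2) if use_rt else z_ut > tau(alpha1)
    return {"z_pt": z_pt, "used": "RT" if use_rt else "UT", "reject": reject}


c = [0, 0, 0, 1, 1, 1, 0, 1]
x_shift = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.05, 4.95]   # intercept near 5, slope near 0
x_null = [0.1, -0.1, 0.0, 0.2, -0.2, 0.0, 0.05, -0.05]  # intercept near 0
print(ptt_decision(x_shift, c))
print(ptt_decision(x_null, c))
```

The sketch returns which branch was used so that the dependence of the final decision on the pre-test is visible; it does not guard against degenerate designs (for example all c_i equal), where C_n^{*2} would be zero.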

3 Asymptotic properties for UT, RT and PTT

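Let $\{K_n\}$ be a sequence of alternative hypotheses, where

$$K_n: (\theta, \beta) = (n^{-1/2}\lambda_1,\ n^{-1/2}\lambda_2), \tag{3.1}$$

with $\lambda_1, \lambda_2$ (fixed) real numbers. The following results follow directly from Yunus and Khan (2007) [and hence their proofs are omitted]: under $\{K_n\}$, for large samples,

$$n^{-1/2}\begin{pmatrix} T_n^{UT} \\ T_n^{PT} \end{pmatrix} \sim N_2\left[\begin{pmatrix} \dfrac{\gamma\lambda_1 C^{*2}}{C^{*2} + \bar{c}^2} \\ \gamma\lambda_2 C^{*2} \end{pmatrix},\ \sigma_0^2\begin{pmatrix} \dfrac{C^{*2}}{C^{*2} + \bar{c}^2} & -\dfrac{\bar{c}\,C^{*2}}{C^{*2} + \bar{c}^2} \\ -\dfrac{\bar{c}\,C^{*2}}{C^{*2} + \bar{c}^2} & C^{*2} \end{pmatrix}\right] \tag{3.2}$$

and

$$n^{-1/2}\begin{pmatrix} T_n^{RT} \\ T_n^{PT} \end{pmatrix} \sim N_2\left[\begin{pmatrix} \gamma(\lambda_1 + \lambda_2\bar{c}) \\ \gamma\lambda_2 C^{*2} \end{pmatrix},\ \sigma_0^2\begin{pmatrix} 1 & 0 \\ 0 & C^{*2} \end{pmatrix}\right]. \tag{3.3}$$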

Define $d(q_1, q_2; \rho)$ to be the bivariate normal probability integral for random variables $x$ and $y$,

$$d(q_1, q_2; \rho) = \frac{1}{2\pi(1 - \rho^2)^{1/2}} \int_{q_1}^{\infty}\int_{q_2}^{\infty} \exp\left\{-\frac{x^2 + y^2 - 2\rho xy}{2(1 - \rho^2)}\right\} dx\,dy, \tag{3.4}$$

where $q_1, q_2$ are real numbers and $-1 < \rho < 1$. Here $d(q_1, q_2; \rho)$ is the upper-tail probability $P(x > q_1, y > q_2)$ of a standard bivariate normal pair with correlation $\rho$.

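Using equations (2.18)-(2.21) and (3.2)-(3.4), the power function for the PTT under $\{K_n\}$ is

$$\Pi_n^{PTT}(\lambda_1, \lambda_2) = E(\phi_n^{PTT} \mid K_n) \to \Pi^{PTT}(\lambda_1, \lambda_2)$$

$$= \Phi(\tau_{\alpha_3} - \gamma\lambda_2 C^{*}/\sigma_0)\,[1 - \Phi(\tau_{\alpha_2} - \gamma(\lambda_1 + \lambda_2\bar{c})/\sigma_0)] + d\Big(\tau_{\alpha_3} - \gamma\lambda_2 C^{*}/\sigma_0,\ \tau_{\alpha_1} - \gamma\lambda_1\sqrt{C^{*2}/(C^{*2} + \bar{c}^2)}\big/\sigma_0;\ -\bar{c}\big/\sqrt{C^{*2} + \bar{c}^2}\Big). \tag{3.5}$$

Similarly, the power functions for the RT and UT are respectively

$$\Pi^{RT}(\lambda_1, \lambda_2) = 1 - \Phi(\tau_{\alpha_2} - \gamma(\lambda_1 + \lambda_2\bar{c})/\sigma_0) \tag{3.6}$$

and

$$\Pi^{UT}(\lambda_1, \lambda_2) = 1 - \Phi\Big(\tau_{\alpha_1} - \gamma\lambda_1\sqrt{C^{*2}/(C^{*2} + \bar{c}^2)}\big/\sigma_0\Big) \tag{3.7}$$

using equations (2.18)-(2.20).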
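The three asymptotic power functions can be evaluated with the standard library alone. The sketch below is an assumption-laden illustration, not the authors' computation (which used the R package mvtnorm): the bivariate upper-tail probability d(q1, q2; ρ) is obtained by one-dimensional Simpson integration of the conditional normal tail, and the correlation between the UT and PT statistics is taken as ρ = -c̄/√(C*² + c̄²), our reading of the covariance in (3.2). All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def upper_orthant(q1, q2, rho, steps=2000):
    """d(q1, q2; rho) = P(X > q1, Y > q2) by Simpson's rule over x."""
    lo, hi = max(q1, -10.0), 10.0
    if lo >= hi:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    g = lambda x: pdf(x) * (1.0 - Phi((q2 - rho * x) / s))
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0


def powers(lam1, lam2, gamma, sigma0, cbar, cstar2, alpha=0.05, alpha3=0.05):
    """Return (UT, RT, PTT) asymptotic powers following (3.7), (3.6), (3.5)."""
    tau = NormalDist().inv_cdf(1.0 - alpha)
    tau3 = NormalDist().inv_cdf(1.0 - alpha3)
    shrink = math.sqrt(cstar2 / (cstar2 + cbar ** 2))
    u = tau - gamma * lam1 * shrink / sigma0             # UT argument
    r = tau - gamma * (lam1 + lam2 * cbar) / sigma0      # RT argument
    a = tau3 - gamma * lam2 * math.sqrt(cstar2) / sigma0  # PT argument
    rho = -cbar / math.sqrt(cstar2 + cbar ** 2)          # assumed sign, see lead-in
    ut = 1.0 - Phi(u)
    rt = 1.0 - Phi(r)
    ptt = Phi(a) * rt + upper_orthant(a, u, rho)
    return ut, rt, ptt


size_ut, size_rt, size_ptt = powers(0.0, 0.0, 1.0, 1.0, 0.5, 0.25)
pow_ut, pow_rt, pow_ptt = powers(3.0, 2.0, 1.0, 1.0, 0.5, 0.25)
print(size_ut, size_rt, size_ptt)
print(pow_ut, pow_rt, pow_ptt)
```

With γ = σ0 = 1 chosen purely for illustration, the first call gives the sizes at λ1 = λ2 = 0 and the second the powers at a nonzero alternative; substituting estimates of γ and σ0 would follow the same pattern.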

4 Asymptotic comparison

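The results below are obtained by Yunus and Khan (2007). For $\bar{c} > 0$, $\alpha_1 = \alpha_2 = \alpha$ and $\lambda_2 \ge 0$ (with $\lambda_1 \ge 0$), it is easy to show that $\lambda_1 + \lambda_2\bar{c} \ge \lambda_1\sqrt{C^{*2}/(C^{*2} + \bar{c}^2)}$. Thus,

$$\Pi^{RT}(\lambda_1, \lambda_2) \ge \Pi^{PTT}(\lambda_1, \lambda_2),$$

$$\Pi^{RT}(\lambda_1, \lambda_2) > \Pi^{UT}(\lambda_1, \lambda_2),$$

$$\Pi^{UT}(\lambda_1, \lambda_2) - \Pi^{PTT}(\lambda_1, \lambda_2) \ \begin{matrix} < \\ = \\ > \end{matrix}\ 0 \quad\text{if}\quad B \ \begin{matrix} < \\ = \\ > \end{matrix}\ |A|,$$

where

$$A = \Phi(\tau_\alpha - \gamma(\lambda_1 + \lambda_2\bar{c})/\sigma_0) - \Phi\Big(\tau_\alpha - \gamma\lambda_1\sqrt{C^{*2}/(C^{*2} + \bar{c}^2)}\big/\sigma_0\Big)$$

and

$$B = d\big(\tau_{\alpha_3} - \gamma\lambda_2 C^{*}/\sigma_0,\ \tau_\alpha - \gamma(\lambda_1 + \lambda_2\bar{c})/\sigma_0;\ 0\big) - d\Big(\tau_{\alpha_3} - \gamma\lambda_2 C^{*}/\sigma_0,\ \tau_\alpha - \gamma\lambda_1\sqrt{C^{*2}/(C^{*2} + \bar{c}^2)}\big/\sigma_0;\ -\bar{c}\big/\sqrt{C^{*2} + \bar{c}^2}\Big).$$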
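The role of A and B in this comparison can be checked by a short piece of algebra. The derivation below is our own sketch based on the power functions (3.5)-(3.7); the shorthands $a$, $r$, $u$ and $\rho$ are introduced here for illustration only, and the sign of $\rho$ is an assumption about the covariance between the UT and PT statistics.

```latex
% Shorthand (hypothetical labels, not from the original paper):
%   a = \tau_{\alpha_3} - \gamma\lambda_2 C^{*}/\sigma_0                         (PT argument)
%   r = \tau_{\alpha} - \gamma(\lambda_1 + \lambda_2\bar{c})/\sigma_0           (RT argument)
%   u = \tau_{\alpha} - \gamma\lambda_1\sqrt{C^{*2}/(C^{*2}+\bar{c}^2)}/\sigma_0 (UT argument)
%   \rho = -\bar{c}/\sqrt{C^{*2}+\bar{c}^2}                                      (assumed sign)
\begin{align*}
\Pi^{RT} - \Pi^{PTT}
  &= [1-\Phi(r)] - \Phi(a)[1-\Phi(r)] - d(a,u;\rho) \\
  &= [1-\Phi(a)][1-\Phi(r)] - d(a,u;\rho) \\
  &= d(a,r;0) - d(a,u;\rho) = B, \\[4pt]
\Pi^{UT} - \Pi^{PTT}
  &= [1-\Phi(u)] - [1-\Phi(r)] + \big(\Pi^{RT} - \Pi^{PTT}\big) \\
  &= [\Phi(r) - \Phi(u)] + B = A + B .
\end{align*}
% Since r \le u, we have A \le 0, hence A + B = B - |A|,
% which is negative, zero or positive according as B <, =, > |A|.
```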

Let $\alpha_1 = \alpha_2 = \alpha$, $\bar{c} > 0$ and $\lambda_2 \ge 0$. From equations (3.6) and (3.7) we find

$$\Pi^{RT}(0, \lambda_2) = 1 - \Phi(\tau_\alpha - \gamma\lambda_2\bar{c}/\sigma_0) \ge 1 - \Phi(\tau_\alpha) = \Pi^{UT}(0, \lambda_2).$$

It is obvious that if the inverse of the coefficient of variation, that is $\bar{c}/\sigma_0$, decreases, then $\Pi^{RT}(0, \lambda_2)$ decreases while $\Pi^{UT}(0, \lambda_2)$ stays fixed at $\alpha$. The size of the PTT is always smaller than that of the RT, and hence it approaches the size of the UT (which is constant at $\alpha$) as the coefficient of variation increases.

The analytical results in this section are accompanied by an illustrative example in the next section, which compares the power of the tests. The power of the tests at values other than $\theta = 0$ is also considered in the example, in order to study the behavior of the power functions in relation to the probabilities of type I and type II errors.

5 Illustrative Example - Power Comparison

For this illustrative example, the random errors of the simple linear model are generated from the Normal distribution with mean 0 and variance 1. The sample size is $n = 30$. Four designs are considered for the regressor $c_i$, $i = 1, 2, \dots, 30$: the values 0 and 1 in the ratios 1:9, 3:7, 5:5 and 7:3. These designs give $\bar{c}_n^2/C_n^{*2}$ equal to 0.300, 0.078, 0.033 and 0.014 respectively. Note that as the coefficient of variation ($\sigma_0/\bar{c}$) increases, $\bar{c}_n^2/C_n^{*2}$ decreases.
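The four design ratios quoted above can be reproduced directly from the definitions of $\bar{c}_n$ and $C_n^{*2}$ in (2.8). The short sketch below assumes that each ratio is read as (number of zeros):(number of ones) out of 30 observations, a reading chosen because it reproduces the reported values; the function name `design_ratio` is hypothetical.

```python
def design_ratio(n_zeros, n_ones):
    """Return cbar_n^2 / C_n^{*2} for a regressor made of zeros and ones."""
    c = [0.0] * n_zeros + [1.0] * n_ones
    n = len(c)
    cbar = sum(c) / n
    cn_star2 = sum(ci * ci for ci in c) - n * cbar ** 2
    return cbar ** 2 / cn_star2


# Ratios 1:9, 3:7, 5:5, 7:3 of zeros to ones with n = 30 (assumed reading).
designs = [(3, 27), (9, 21), (15, 15), (21, 9)]
ratios = [design_ratio(z, o) for z, o in designs]
for (z, o), value in zip(designs, ratios):
    print(f"zeros={z:2d} ones={o:2d} ratio={value:.3f}")
```

Under this reading, a larger share of ones gives a larger mean $\bar{c}_n$ relative to the spread of the design, and hence a larger ratio.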

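In this example, the $\psi$ function is taken to be the Huber $\psi$ function (Hoaglin et al., 1983, p.366; Wilcox, 2005, p.77), defined as

$$\psi_h(u_i) = \begin{cases} u_i & \text{if } |u_i| \le k \\ k\,\mathrm{sign}(u_i) & \text{if } |u_i| > k, \end{cases}$$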

[Figure 1, panels (a)-(h): power plotted against $\lambda_2$ (horizontal axis, 0 to 10; vertical axis, 0.0 to 1.0). Panels (a)-(d): $\lambda_1 = 0$; panels (e)-(h): $\lambda_1 = 3$.]

Figure 1: Graphs of power functions for increasing $\lambda_2$ with $\alpha_1 = \alpha_2 = \alpha_3 = \alpha = 0.05$ and $n = 30$ for selected values of $\lambda_1$. The dotted line, solid line and line with stars represent $\Pi^{UT}(\lambda_1, \lambda_2)$, $\Pi^{RT}(\lambda_1, \lambda_2)$ and $\Pi^{PTT}(\lambda_1, \lambda_2)$ respectively. Here $\bar{c}_n^2/C_n^{*2} = 0.300$ for graphs (a) & (e), $\bar{c}_n^2/C_n^{*2} = 0.078$ for graphs (b) & (f), $\bar{c}_n^2/C_n^{*2} = 0.033$ for graphs (c) & (g) and $\bar{c}_n^2/C_n^{*2} = 0.014$ for graphs (d) & (h).

where $u_i = X_i - \theta - \beta c_i$. As suggested in many reference books (Wilcox, 2005, p.76), the value $k = 1.28$ is chosen because it is the 0.9 quantile of a standard normal distribution, so that there is a 0.8 probability that a randomly sampled observation will have a value between $-k$ and $k$. The estimate of $\sigma_0^2$ is taken to be $\sum \psi(u_i)^2/n$. The estimate of $\gamma$ is $\sum \psi'(u_i)/n$ (Carroll and Ruppert, 1988, p.212), where

$$\psi'(u) = \begin{cases} 1 & \text{if } |u| \le 1.28 \\ 0 & \text{if } |u| > 1.28. \end{cases}$$
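The Huber score, its derivative and the two plug-in estimates described above are simple enough to write out. The following is a minimal sketch using only the standard library; the function names and the four residual values are hypothetical, and the quantity called `sigma0_sq_hat` is the average of squared scores, read here as an estimate of $\sigma_0^2$.

```python
from statistics import NormalDist

K = 1.28  # approximately the 0.9 quantile of the standard normal distribution


def huber_psi(u, k=K):
    """Huber score: u inside [-k, k], k * sign(u) outside."""
    return max(-k, min(k, u))


def huber_dpsi(u, k=K):
    """Derivative of the Huber score: 1 inside [-k, k], 0 outside."""
    return 1.0 if abs(u) <= k else 0.0


def sigma0_sq_hat(residuals):
    """Average of squared Huber scores."""
    return sum(huber_psi(u) ** 2 for u in residuals) / len(residuals)


def gamma_hat(residuals):
    """Average of the Huber score derivative."""
    return sum(huber_dpsi(u) for u in residuals) / len(residuals)


# Probability that a standard normal observation falls in [-k, k].
coverage = NormalDist().cdf(K) - NormalDist().cdf(-K)

residuals = [0.5, -2.0, 1.0, 3.0]  # made-up residuals u_i for illustration
print(round(coverage, 4), sigma0_sq_hat(residuals), gamma_hat(residuals))
```

In this toy input two of the four residuals fall outside $[-k, k]$, so they contribute $k^2$ to the variance estimate and nothing to the estimate of $\gamma$.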

The power functions $\Pi^{PTT}$, $\Pi^{RT}$ and $\Pi^{UT}$ are calculated using the formulas given by equations (3.5), (3.6) and (3.7) respectively. The R package mvtnorm is used to compute the bivariate Normal probability integral.

The sizes of the UT, RT and PTT are plotted against $\lambda_2$ in Figures 1(a)-(d), while the powers of the tests at $\lambda_1 = 3$ are plotted against $\lambda_2$ in Figures 1(e)-(h), for selected values of $\bar{c}_n^2/C_n^{*2}$. We desire the size of a test to be small and its power to be large, so that the probabilities of type I and type II errors are both small. As $\lambda_2$ grows larger, the size and power of the RT grow and approach 1 (see Figures 1(a) & (e)). Observe also that the size of the RT approaches 1 faster for larger values of $\bar{c}_n^2/C_n^{*2}$. From Figures 1(a)-(d), the size of the UT is constant at $\alpha = 0.05$ and does not depend on $\lambda_2$ or $\bar{c}_n^2/C_n^{*2}$. The power of the UT, though constant regardless of the value of $\lambda_2$, increases as a smaller $\bar{c}_n^2/C_n^{*2}$ is selected (see Figures 1(e)-(h)). Starting at a value slightly less than $\alpha$, the size of the PTT increases, then drops and converges to the nominal size $\alpha = 0.05$ as $\lambda_2$ grows larger. The size of the PTT is small when $\bar{c}_n^2/C_n^{*2}$ is small and larger for larger $\bar{c}_n^2/C_n^{*2}$, but it is always less than that of the RT. The power of the PTT is larger for smaller $\lambda_2$ and decreases to the power of the UT as $\lambda_2$ grows larger. For larger values of $\lambda_2$, the power of the PTT increases as $\bar{c}_n^2/C_n^{*2}$ decreases. The PTT has smaller size and larger power for a small $\bar{c}_n^2/C_n^{*2}$ than for a larger one.

It is impossible to obtain a test that uniformly minimizes the size and maximizes the power at the same time. We are looking for a test that is a compromise between minimizing the size and maximizing the power (small probabilities of type I and type II errors). The RT is the best choice in terms of power, which is the largest, but the worst choice in terms of size, which is also the largest as $\lambda_2$ grows larger. On the contrary, the UT is the best choice in terms of size, which is the smallest, but the worst choice in terms of power, which is also the smallest. Thus the RT maximizes both the size and the power, while the UT minimizes both. The PTT has larger power than the UT for small and moderate values of $\lambda_2$, and it has significantly smaller size than the RT for moderate and large $\lambda_2$. Therefore, if

ACKNOWLEDGEMENTS

The authors thankfully acknowledge the valuable suggestions of Professor A K Md E Saleh, Carleton University, Canada, that helped improve the content and quality of the results in the paper.

References

Bechhofer, R.E. (1951). The effect of preliminary test of significance on the size and power of certain tests of univariate linear hypotheses. Ph.D. Thesis (unpublished), Columbia Univ.

Bozivich, H., Bancroft, T.A. and Hartley, H. (1956). Power of analysis of variance test procedures for certain incompletely specified models. Ann. Math. Statist. 27, 1017-1043.

Carroll, R.J. and Ruppert, D. (1988). Transformation and Weighting in Regression. Chapman & Hall, US.

Hoaglin, D.C., Mosteller, F. and Tukey, J.W. (1983). Understanding Robust and Exploratory Data Analysis. John Wiley and Sons, US.

Huber, P.J. (1981). Robust Statistics. Wiley, New York.

Jurečková, J. and Sen, P.K. (1981). Sequential procedures based on M-estimators with discontinuous score functions. J. Statist. Plan. Infer. 5, 253-66.

Jurečková, J. and Sen, P.K. (1996). Robust Statistical Procedures: Asymptotics and Interrelations. John Wiley & Sons, US.

Khan, S. and Saleh, A.K.Md.E. (2001). On the comparison of pre-test and shrinkage estimators for the univariate normal mean. Stat. Papers 42, 451-473.

Khan, S., Hoque, Z. and Saleh, A.K.Md.E. (2002). Estimation of the slope parameter for linear regression model with uncertain prior information. J. Stat. Res. 36, 55-74.

Saleh, A.K.Md.E. and Sen, P.K. (1982). Non-parametric tests for location after a preliminary test on regression. Communications in Statistics: Theory and Methods 11, 639-651.

Saleh, A.K.Md.E. (2006). Theory of Preliminary Test and Stein-type Estimation with Applications. John Wiley & Sons, New Jersey.

Sen, P.K. (1982). On M-tests in linear models. Biometrika 69, 245-248.

Yunus, R.M. and Khan, S. (2007). Increasing power of the test through pre-test - a robust method. USQ Working Paper Series SC-MC-0713.

Wilcox, R.R. (2005). Introduction to Robust Estimation and Hypothesis Testing. Elsevier Inc,
