Statement of the Problem - Robustness of Triple Sampling Inference Procedures to Underlying Distributions

2.1 Problem setting

Let $X_1, X_2, X_3, \ldots$ be a sequence of i.i.d. random variables from a continuous distribution function $F(\cdot\,; \mu, \sigma^2)$, where the two parameters, the mean $\mu$ and the variance $\sigma^2$, are assumed unknown but finite. We also assume that the skewness $\gamma$ and the kurtosis $\beta$ are both unknown but finite, $|\gamma| < \infty$ and $\beta < \infty$. The main interest in this study is estimation of $\mu$ in the presence of the unknown variance $\sigma^2$.

Having observed a random sample $X_1, X_2, \ldots, X_n$, $n \ge 2$, from the distribution function $F(\cdot\,; \mu, \sigma^2)$, we propose to use the sample mean and the sample variance as point estimates of $\mu$ and $\sigma^2$ respectively:

$$\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i \quad \text{and} \quad S_n^2 = (n-1)^{-1} \sum_{i=1}^{n} \left( X_i - \bar{X}_n \right)^2, \qquad n \ge 2.$$
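As a small illustration of the two point estimates, the sketch below computes the sample mean and the sample variance with the $n - 1$ divisor. The function name `point_estimates` and the data are hypothetical and are not taken from the thesis.

```python
import statistics


def point_estimates(sample):
    """Return (sample mean, sample variance) using the n - 1 divisor."""
    if len(sample) < 2:
        # The sample variance is only defined for n >= 2.
        raise ValueError("need at least two observations")
    return statistics.fmean(sample), statistics.variance(sample)


# Hypothetical data, used only to exercise the function.
x_bar, s2 = point_estimates([2, 4, 4, 4, 5, 5, 7, 9])
```

Here `statistics.variance` already divides by $n - 1$, so it matches the definition of the sample variance used in this section.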

It is well known that these estimators are unbiased for their respective parameters and that, in the case of the normal distribution, they are minimal sufficient statistics.

In the literature on sequential sampling for inference about the mean, it is assumed for most distributions that the sample size required to satisfy conditions (i), (ii) and (iii) in Section 1.1.1 takes the general form (2.1) below; see Sen (1985) and Ghosh et al. (1997) for details.

$$n^* = \lambda\, g(\sigma), \qquad (2.1)$$

where $\lambda$ depends on some predetermined constants (which may, for example, appear in a loss (cost) function incurred in point estimation of $\mu$, or arise from consideration of a fixed-width confidence interval for $\mu$ with a prescribed coverage probability). Further, $\lambda$ is permitted to tend to infinity, so that the optimal sample size $n^* \to \infty$. Note that $g(\cdot)$ is a positive, real-valued, twice continuously differentiable function such that $g$, $g'$ and $g''$ are bounded. We shall use the representation (2.1) in this thesis to develop theory for point and interval estimation and for hypothesis testing.
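To make the representation (2.1) concrete, the sketch below evaluates it for one familiar special case: a fixed-width confidence interval for a normal mean, where $\lambda = (a/d)^2$ and $g(\sigma) = \sigma^2$, with $a$ the upper $\alpha/2$ point of the standard normal distribution. This choice of $\lambda$ and $g$, the reading of $d$ as a half-width, and the function name `optimal_sample_size` are assumptions made for illustration only.

```python
from statistics import NormalDist


def optimal_sample_size(sigma, d, alpha):
    """n* = lam * g(sigma), assuming lam = (a / d)**2 and g(sigma) = sigma**2."""
    # a is the upper alpha/2 point of the standard normal distribution.
    a = NormalDist().inv_cdf(1 - alpha / 2)
    lam = (a / d) ** 2
    return lam * sigma ** 2


# Hypothetical inputs: sigma = 2, half-width d = 0.5, coverage 95%.
n_star = optimal_sample_size(sigma=2.0, d=0.5, alpha=0.05)
```

In practice `sigma` is unknown, which is why $n^*$ cannot be computed directly and has to be estimated from the data.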

2.2 Triple sampling procedure for inference for the population mean

Since $n^*$ in (2.1) is numerically unknown because $\sigma$ is unknown, no fixed sample size procedure provides the above point estimation for $\mu$ uniformly for $\sigma > 0$. Therefore, we use sequential sampling procedures to estimate $\mu$ via estimation of the optimal sample size $n^*$. We now give a rigorous account of the triple sampling procedure as described by Hall (1981). As the name suggests, triple sampling can be described by three phases.

Pilot phase: Here an initial sample $X_1, X_2, \ldots, X_m$ of size $m \ge 2$ is taken at random from the distribution $F(\cdot\,; \mu, \sigma^2)$, from which $\bar{X}_m$ and $S_m^2$ are calculated as our initial estimates of $\mu$ and $\sigma^2$ respectively.

Main study phase: In the main study phase a fraction of $n^*$ is estimated, say $\delta n^*$, where $\delta$ is a fixed number between 0 and 1. The sample size required to complete the main study phase is defined by the following stopping rule:

$$N_1^* = \left[ \delta \lambda\, g(S_m) \right] + 1, \qquad N_1 = \max\{ m, N_1^* \}, \qquad (2.2)$$

where $0 < \delta < 1$ and $\lambda > 0$ are known constants and $[x]$ denotes the integer part of $x$. Observe that $N_1$ estimates $\delta n^*$ in this phase.

If $m \ge N_1^*$, stop sampling at this stage; otherwise continue to observe an extra random sample of size $N_1 - m$ from the distribution function $F(\cdot\,; \mu, \sigma^2)$, say $X_{m+1}, X_{m+2}, \ldots, X_{N_1}$. Hence, we augment the $N_1 - m$ observations by the previous $m$ observations and calculate $\bar{X}_{N_1}$ and $S_{N_1}^2$ as new estimates of $\mu$ and $\sigma^2$ respectively.

Fine-tuning phase: This is defined according to the following final-stage stopping rule:

$$N^* = \left[ \lambda\, g(S_{N_1}) \right] + 1, \qquad N = \max\{ N_1, N^* \}. \qquad (2.3)$$

If $N_1 \ge N^*$, then stop at this stage; otherwise continue to sample $N - N_1$ more observations randomly from the distribution $F(\cdot\,; \mu, \sigma^2)$, say $X_{N_1+1}, X_{N_1+2}, \ldots, X_N$. Whenever sampling is terminated and $N$ is realized, $\bar{X}_N$ is a natural point estimate for $\mu$, and hence $\bar{X}_N$ is a sequential point estimator of $\mu$. Observe that $N$ estimates $n^*$ in this phase.
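The three phases can be sketched in code as follows. This is a minimal simulation sketch of rules (2.2) and (2.3), not the thesis's own implementation: the function name `triple_sampling`, the argument names (`draw`, `g`, `lam`, `delta`, `m`), the normal data generator and the choice $g(s) = s^2$ are all assumptions made for illustration.

```python
import math
import random
import statistics


def triple_sampling(draw, g, lam, delta, m):
    """Run the three phases once; `draw()` must return one new observation."""
    if m < 2 or not 0 < delta < 1 or lam <= 0:
        raise ValueError("need m >= 2, 0 < delta < 1 and lam > 0")
    # Pilot phase: take m observations and estimate sigma by S_m.
    data = [draw() for _ in range(m)]
    s_m = statistics.stdev(data)
    # Main study phase: N1 = max(m, [delta * lam * g(S_m)] + 1).
    n1 = max(m, math.floor(delta * lam * g(s_m)) + 1)
    data.extend(draw() for _ in range(n1 - m))
    s_n1 = statistics.stdev(data)
    # Fine-tuning phase: N = max(N1, [lam * g(S_N1)] + 1).
    n = max(n1, math.floor(lam * g(s_n1)) + 1)
    data.extend(draw() for _ in range(n - n1))
    # The sample mean of all N observations is the point estimate.
    return n1, n, statistics.fmean(data)


# Hypothetical setting: normal data with mean 10 and standard deviation 2.
rng = random.Random(2024)
n1, n, x_bar = triple_sampling(
    draw=lambda: rng.gauss(10.0, 2.0),
    g=lambda s: s ** 2,
    lam=(1.96 / 0.5) ** 2,
    delta=0.5,
    m=10,
)
```

Passing the data source as a callable keeps the sketch independent of the underlying distribution, which is the situation the thesis is concerned with; any other generator can be substituted for the normal one used here.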

Throughout the thesis, the asymptotic characteristics of triple sampling are developed under the assumption made by Hall (1981) that, for $0 < \delta < 1$ and $\lambda$ positive,

$$n^* \to \infty, \quad m \to \infty, \quad \limsup\, (m / n^*) < \delta \quad \text{and} \quad \lambda = O(m^s), \; s > 1, \qquad (2.4)$$

where $s > 1$ is a fixed constant. Moreover, we assume that $E|X_1|^6 < \infty$. This moment condition is the same as that used by Chow and Martinsek (1982) to estimate the mean of an unknown distribution using the one-by-one purely sequential procedure proposed by Robbins (1959). Although the assumption $E|X_1|^6 < \infty$ may seem restrictive, we shall show in the next chapter that second-order approximations of a continuously differentiable function of the stopping sample sizes $N_1$ and $N$ depend on the first four moments of the distribution, in order to capture both the skewness and the kurtosis of the underlying distribution. It is therefore clear that more than the first four moments must be finite to obtain such approximations and to ensure that the corresponding error terms tend to zero. Also, it is necessary to put lower and upper bounds on $\sigma$ in order to ensure that $g(\sigma)$ and its first two derivatives are bounded in the inferential situations we shall investigate later in the thesis.

Lemma 2.1 (Hall 1981)

For the triple sampling rule (2.2)–(2.3), as $m \to \infty$, we have











 



* 1 1 * 2 1

exp

,

0,

10. P N

N

O

km

P N

N

o m

 

k

q















See Honda (1992) for the proof.

Lemma 2.1 shows essentially that the probability of not completing all three stages is small for large values of m.
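The qualitative message of this remark can be explored numerically. The sketch below is a rough Monte Carlo check, under assumed conditions that are not taken from the thesis: normal data, $g(\sigma) = \sigma^2$, and an optimal sample size fixed at $n^* = 10m$ so that $m / n^*$ stays below the design fraction (called `delta` here). It estimates how often the pilot sample alone already satisfies the main study rule, so that no sampling takes place in the second phase. The function name `early_stop_rate` and all default values are hypothetical.

```python
import math
import random
import statistics


def early_stop_rate(m, n_star, delta=0.5, sigma=1.0, reps=2000, seed=1):
    """Fraction of replications in which N1* <= m (second phase adds no data)."""
    rng = random.Random(seed)
    # Choose lam so that n* = lam * g(sigma) with the assumed g(sigma) = sigma**2.
    lam = n_star / sigma ** 2
    hits = 0
    for _ in range(reps):
        pilot = [rng.gauss(0.0, sigma) for _ in range(m)]
        # N1* = [delta * lam * g(S_m)] + 1, where g(S_m) = S_m**2.
        n1_star = math.floor(delta * lam * statistics.variance(pilot)) + 1
        hits += n1_star <= m
    return hits / reps


# Estimated rates for a few hypothetical pilot sizes, with n* = 10 m.
rates = {m: early_stop_rate(m, n_star=10 * m) for m in (5, 10, 20, 40)}
```

Under these assumptions the estimated rate should shrink quickly as `m` grows; a simulation of this kind only illustrates the behaviour and does not replace the proof cited above.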

Remarks

1. Mukhopadhyay (1990) noted that if the design factor $\delta$ is chosen near zero or one, then a three-stage procedure would clearly be rather like Stein’s two-stage procedure. Therefore a three-stage procedure is better implemented with $\delta = 0.4$, $0.5$ or $0.6$. Hall (1981) mentioned that in practice it seems a reasonable compromise to choose $\delta = 0.5$.

2. In the context of two-stage sampling, Seelbinder (1953) and Moshman (1958) developed some criteria based on prior information about the variance in order to suggest a reasonable choice of the pilot sample size $m$, while Mukhopadhyay (2005a) developed an information-based approach to suggest a reasonable choice of $m$ without any prior information about the variance.

Notes

1. The objective of Hall (1981) was to construct a fixed-width confidence interval for the normal mean with a prescribed width $d > 0$ and coverage $1 - \alpha$, without any concern about the estimate of the optimal sample size $n^*$. Moreover, he used only the first-order approximation of the stopping sample size $N$.

2. Mukhopadhyay et al. (1987) treated the same situation as Hall (1981) but they considered point estimation of the normal mean to achieve the minimum bounded risk.

In our thesis we deal with inference (point and interval estimation and hypothesis testing) about the mean of any continuous underlying distribution whose analytical form is unknown, and for any positive, twice continuously differentiable and bounded function of $\sigma$. We use the second-order approximations of a suitable continuously differentiable function of the stopping sample sizes $N_1$ and $N$ in order to evaluate the asymptotic regret in terms of the first four moments of the underlying distribution.

Chapter III

In document Robustness of Triple Sampling Inference Procedures to Underlying Distributions (Page 34-38)
