2.1 Problem setting
Let $X_1, X_2, X_3, \ldots$ be a sequence of i.i.d. random variables from a continuous distribution function $F(\cdot;\mu,\sigma^2)$, where the two parameters, the mean $\mu$ and the variance $\sigma^2$, are assumed unknown but finite. We also assume that the skewness and the kurtosis are both unknown but finite. The main interest in this study is estimation of $\mu$ in the presence of the unknown variance $\sigma^2$.

Having observed a random sample $X_1, X_2, \ldots, X_n$, $n \ge 2$, from the distribution function $F(\cdot;\mu,\sigma^2)$, we propose to use the sample mean and the sample variance as point estimates of $\mu$ and $\sigma^2$, respectively:
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2, \qquad n \ge 2.$$
It is well known that these estimators are unbiased for their respective parameters and that, when $F$ is normal, $(\bar{X}_n, S_n^2)$ is a minimal sufficient statistic for $(\mu, \sigma^2)$.
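As a small numerical check of these definitions, the following Python sketch computes $\bar{X}_n$ and $S_n^2$ directly from the formulas above (divisor $n-1$ for the variance) and compares the result with the standard library's `statistics.variance`, which also uses the $n-1$ divisor. The helper name `sample_mean_and_variance` and the data values are illustrative only.

```python
import statistics


def sample_mean_and_variance(xs):
    """Return (Xbar_n, S_n^2), using the divisor n - 1 for S_n^2; requires n >= 2."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, s2


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
xbar, s2 = sample_mean_and_variance(data)
print(xbar, s2)                    # xbar = 5.0, s2 = 32/7 (about 4.5714)
print(statistics.variance(data))   # same n - 1 divisor, same value
```

Here the squared deviations from $\bar{X}_8 = 5$ sum to $32$, so $S_8^2 = 32/7$.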
In the literature on sequential sampling for inference about the mean of most distributions, it is assumed that the sample size required to satisfy conditions (i), (ii) and (iii) in Section 1.1.1 can take the general form (2.1) below; see Sen (1985) and Ghosh et al. (1997) for details.
$$n^* = \lambda\, g(\sigma^2), \qquad (2.1)$$
where $\lambda$ depends on some predetermined constants (which may, for example, appear in a loss (cost) function incurred in point estimation of $\mu$, or arise from consideration of a fixed-width confidence interval for $\mu$ with a prescribed coverage probability). Further, $\lambda$ is permitted to tend to infinity, in which case the optimal sample size $n^* \to \infty$. Note that $g$ is a positive, real-valued, twice continuously differentiable function such that $g$, $g'$ and $g''$ are bounded. We shall use the representation (2.1) in this thesis to develop theory for point and interval estimation and for hypothesis testing.

2.2 Triple sampling procedure for inference for the population mean
Since $n^*$ in (2.1) is numerically unknown because $\sigma^2$ is unknown, no fixed-sample-size procedure achieves the inferential goals stated above for $\mu$ uniformly in $\sigma^2 > 0$. Therefore, we use sequential sampling procedures to estimate $\mu$ via estimation of the optimal sample size $n^*$. We now give a rigorous account of the triple sampling procedure as described by Hall (1981). As the name suggests, triple sampling consists of three phases.
Pilot Phase: Here an initial sample $X_1, X_2, \ldots, X_m$ of size $m \ge 2$ is taken at random from the distribution function $F(\cdot;\mu,\sigma^2)$, from which $\bar{X}_m$ and $S_m^2$ are calculated as our initial estimates for $\mu$ and $\sigma^2$, respectively.

The Main Study Phase: In the main study phase, a fraction of $n^*$, say $\rho n^*$, is estimated, where $\rho$ is a fixed number between 0 and 1. The sample size required to complete the main study phase is defined by the following stopping rule:
$$N_1^* = \left[\rho\lambda\, g\!\left(S_m^2\right)\right] + 1, \qquad N_1 = \max\left\{m, N_1^*\right\}, \qquad (2.2)$$
where $0 < \rho < 1$ and $\lambda > 0$ are known constants and $[x]$ denotes the integer part of $x$. Observe that $N_1$ estimates $\rho n^*$ in this phase. If $m \ge N_1^*$, then $N_1 = m$ and no further observations are taken at this stage; otherwise we observe an extra random sample of size $N_1 - m$ from the distribution function $F(\cdot;\mu,\sigma^2)$, say $X_{m+1}, X_{m+2}, \ldots, X_{N_1}$. We then combine these $N_1 - m$ observations with the previous $m$ observations and calculate $\bar{X}_{N_1}$ and $S_{N_1}^2$ as new estimates of $\mu$ and $\sigma^2$, respectively.

The Fine Tuning Phase: This phase is defined by the following final stage stopping rule:
$$N^* = \left[\lambda\, g\!\left(S_{N_1}^2\right)\right] + 1, \qquad N = \max\left\{N_1, N^*\right\}. \qquad (2.3)$$
If $N_1 \ge N^*$, then stop at this stage; otherwise continue to sample $N - N_1$ more observations randomly from the distribution function $F(\cdot;\mu,\sigma^2)$, say $X_{N_1+1}, X_{N_1+2}, \ldots, X_N$. Whenever sampling is terminated and $N$ is realized, $\bar{X}_N$ is the natural point estimate of $\mu$; hence $\bar{X}_N$ is a sequential point estimator of $\mu$. Observe that $N$ estimates $n^*$ in this phase.

Throughout the thesis, the asymptotic characteristics of triple sampling are developed under the following assumption, made by Hall (1981): for
$0 < \rho < 1$ and $\lambda > 0$, we assume that
$$n^* \to \infty, \quad m \to \infty, \quad \limsup \frac{m}{n^*} < \rho \quad\text{and}\quad n^* = O(m^s), \qquad (2.4)$$
where $s > 1$ is a fixed constant. Moreover, we assume that $E|X_1|^6 < \infty$. This moment condition is the same as that used by Chow and Martinsek (1982) to estimate the mean of an unknown distribution using the one-by-one purely sequential procedure proposed by Robbins (1959). Although the assumption $E|X_1|^6 < \infty$ may seem restrictive, we shall show in the next chapter that second order approximations of a continuously differentiable function of the stopping sample sizes $N_1$ and $N$ depend on the first four moments of the distribution, through both the skewness and the kurtosis of the underlying distribution. Consequently, finiteness of moments beyond the fourth is needed to obtain such approximations and to ensure that the corresponding error terms tend to zero. It is also necessary to put lower and upper bounds on $\sigma^2$ to ensure that $g$ and its first two derivatives are bounded in the inferential situations we shall investigate later in the thesis.

Lemma 2.1 (Hall 1981)
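For the triple sampling rule (2.2)–(2.3), as $m \to \infty$, we have
$$P\left(N_1 \neq N_1^*\right) = O\left(\exp(-km)\right), \qquad P\left(N \neq N^*\right) = o\left(m^{-q}\right),$$
where $k > 0$ and $q > 0$.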
See Honda (1992) for the proof.
Lemma 2.1 shows essentially that the probability of not completing all three stages is small for large values of m.
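The remark can be illustrated by simulation. The following Python sketch implements the stopping rules (2.2) and (2.3) literally and estimates, by Monte Carlo, how often the main study phase adds no observations ($N_1 = m$) and how often the fine tuning phase adds none ($N = N_1$). It assumes, purely for illustration, standard normal data, $g(v) = v$ (so that $n^*$ is proportional to $\sigma^2$), $\rho = 0.5$ and the hypothetical choice $\lambda = 4m$, so that $n^* = 4m$ and $m/n^* = 0.25 < \rho$, in line with (2.4). With $g(v) = v$ the boundedness conditions on $g$ are set aside; the function names are hypothetical and this is a sketch rather than the thesis's own computation.

```python
import math
import random


def sample_var(xs):
    """Unbiased sample variance S_n^2 (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def triple_sampling(draw, m, rho, lam, g):
    """One run of the triple sampling rules (2.2)-(2.3); returns (N1, N, Xbar_N).

    draw() returns one observation from F. The integer part [x] is
    math.floor(x) here because the argument is non-negative.
    """
    xs = [draw() for _ in range(m)]                              # pilot phase
    n1 = max(m, math.floor(rho * lam * g(sample_var(xs))) + 1)   # rule (2.2)
    xs += [draw() for _ in range(n1 - m)]                        # main study phase
    n = max(n1, math.floor(lam * g(sample_var(xs))) + 1)         # rule (2.3)
    xs += [draw() for _ in range(n - n1)]                        # fine tuning phase
    return n1, n, sum(xs) / n


def early_stop_rates(m, reps=2000, rho=0.5, seed=1):
    """Monte Carlo estimates of P(N1 = m) and P(N = N1) under N(0, 1) data.

    Hypothetical set-up: g(v) = v and lam = 4m, so n* = 4m and m / n* = 0.25 < rho.
    """
    rng = random.Random(seed)
    lam = 4 * m
    skip_main = skip_fine = 0
    for _ in range(reps):
        n1, n, _ = triple_sampling(lambda: rng.gauss(0.0, 1.0), m, rho, lam, lambda v: v)
        skip_main += n1 == m   # main study phase added no observations
        skip_fine += n == n1   # fine tuning phase added no observations
    return skip_main / reps, skip_fine / reps


rates = {m: early_stop_rates(m) for m in (5, 10, 20, 40)}
for m, (p_main, p_fine) in rates.items():
    print(f"m = {m:2d}:  P(N1 = m) ~ {p_main:.3f}   P(N = N1) ~ {p_fine:.3f}")
```

Under this set-up $N_1 = m$ exactly when $S_m^2 < 1/2$, so for $m = 5$ the first probability equals $P(\chi^2_4 < 2) = 1 - 2e^{-1} \approx 0.264$; both estimated frequencies should fall sharply as $m$ increases.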
Remarks
1. Mukhopadhyay (1990) noted that if the design factor $\rho$ is chosen near zero or one, then a three stage procedure behaves much like Stein's two stage procedure. Therefore, a three stage procedure is better implemented with $\rho = 0.4$, $0.5$ or $0.6$. Hall (1981) mentioned that in practice $\rho = 0.5$ seems a reasonable compromise.
2. In the context of two stage sampling, Seelbinder (1953) and Moshman (1958) developed criteria based on prior information about the variance to suggest a reasonable choice of the pilot sample size $m$, while Mukhopadhyay (2005a) developed an information-based approach to suggest a reasonable choice of $m$ without any prior information about the variance.
Notes
1. The objective of Hall (1981) was to construct a fixed-width confidence interval for the normal mean with a prescribed width $d > 0$ and coverage probability $1 - \alpha$, without any concern about the estimate of the optimal sample size $n^*$. Moreover, he used only the first order approximation of the stopping sample size $N$.
2. Mukhopadhyay et al. (1987) treated the same setting as Hall (1981), but they considered point estimation of the normal mean to achieve the minimum bounded risk.
In this thesis we deal with inference (point and interval estimation and hypothesis testing) about the mean of any continuous underlying distribution whose analytical form is unknown, and for any positive, twice continuously differentiable and bounded function $g$ of $\sigma^2$. We use second order approximations of a suitable continuously differentiable function of the stopping sample sizes $N_1$ and $N$ to evaluate the asymptotic regret in terms of the first four moments of the underlying distribution.
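Because the rules (2.2) and (2.3) use only $\bar{X}$ and $S^2$, they can be run for a non-normal $F$. The following Python sketch applies them to exponential data with mean $\mu = 1$ and variance $\sigma^2 = 1$ (a skewed choice made only for illustration), with the hypothetical values $g(v) = v$, $\rho = 0.5$, $m = 20$ and $\lambda = 400$, so that $n^* = 400$. It reports Monte Carlo averages of $N_1/(\rho n^*)$ and $N/n^*$, which should be near one since $N_1$ estimates $\rho n^*$ and $N$ estimates $n^*$, together with the average of $\bar{X}_N$. It does not compute the second order approximations or the regret; presumably any systematic gap between these ratios and one is the kind of quantity such approximations describe.

```python
import math
import random


def sample_var(xs):
    """Unbiased sample variance S_n^2 (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def triple_sampling(draw, m, rho, lam, g):
    """One run of rules (2.2)-(2.3); returns (N1, N, Xbar_N)."""
    xs = [draw() for _ in range(m)]                              # pilot phase
    n1 = max(m, math.floor(rho * lam * g(sample_var(xs))) + 1)   # rule (2.2)
    xs += [draw() for _ in range(n1 - m)]                        # main study phase
    n = max(n1, math.floor(lam * g(sample_var(xs))) + 1)         # rule (2.3)
    xs += [draw() for _ in range(n - n1)]                        # fine tuning phase
    return n1, n, sum(xs) / n


def summarize(reps=1000, m=20, rho=0.5, lam=400.0, seed=7):
    """Average N1/(rho n*), N/n* and Xbar_N over reps runs with Exp(1) data."""
    rng = random.Random(seed)
    n_star = lam * 1.0  # n* = lam * g(sigma^2) with g(v) = v and sigma^2 = 1
    runs = [triple_sampling(lambda: rng.expovariate(1.0), m, rho, lam, lambda v: v)
            for _ in range(reps)]
    r_n1 = sum(r[0] for r in runs) / reps / (rho * n_star)
    r_n = sum(r[1] for r in runs) / reps / n_star
    mean_xbar = sum(r[2] for r in runs) / reps
    return r_n1, r_n, mean_xbar


r_n1, r_n, mean_xbar = summarize()
print(f"E(N1)/(rho n*) ~ {r_n1:.3f}, E(N)/n* ~ {r_n:.3f}, mean of Xbar_N ~ {mean_xbar:.3f}")
```

The exponential distribution is used only as a convenient skewed example; any sampler passed as `draw` would do.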