Online Full Text

(1)

Abstract— The purpose of this paper is to develop technique

appropriate for construction of frequentist prediction intervals or prediction bounds for order statistics in future samples when only the functional form of the underlying distribution is specified, but some or all of its parameters are unspecified. Prediction intervals for order statistics are widely used for reliability problems and other related problems. The determination of these intervals has been extensively investigated. But the optimal properties of these intervals in the sense of minimal area for a given probability content or maximal probability content for a given area have not been fully explored. In this paper, in order to discuss this problem, a specific loss function is introduced to compare prediction intervals. In particular, within-sample prediction based on the early observed data from a current experiment (i.e., when for predicting the future observation in a sample there are available the early observed data only from that sample), new-sample prediction based on a previous new-sample (i.e., when for predicting the future observation in a new sample there are available the data only from a previous sample), and new-within-sample prediction based on both the early-failure data from that sample and the data from a previous sample (i.e., when for predicting the future failure time of an unit in a new sample there are available both the early-failure data from that sample and the data from a previous sample). We restrict attention to families of distributions invariant under location and/or scale changes. The technique used here for optimization of prediction intervals based on censored data emphasizes pivotal quantities relevant for obtaining ancillary statistics. It allows one to solve the optimization problems in a simple way. Illustrative examples are given.

Index Terms—Order statistic, prediction interval, loss

function, optimization

I. INTRODUCTION

REDICTION is perhaps one of the most commonly undertaken activities in the physical, the engineering, and the biological sciences. In the econometric and the social sciences, prediction generally goes under the name of

Manuscript received October 7, 2011. This work was supported in part by Grant No. 06.1936, Grant No. 07.2036, Grant No. 09.1014, and Grant No. 09.1544 from the Latvian Council of Science and the National Institute of Mathematics and Informatics of Latvia.

N.A. Nechval is with the Statistics Department, EVF Research Institute, University of Latvia, Riga LV-1050, Latvia (e-mail: [email protected]).

M. Purgailis is with the University of Latvia, Riga LV-1050, Latvia (e-mail: [email protected]).

K.N. Nechval is with the Applied Mathematics Department, Transport and Telecommunication Instutute, Riga LV-1019, Latvia (e-mail: [email protected]).

V.F. Strelchonok is with the Baltic International Academy, Riga LV-1019, Latvia (e-mail: [email protected]).

forecasting, and in the actuarial and the assurance sciences under the label life-length assessment. Automatic process control, filtering, and quality control, are some of the engineering techniques that use prediction as a basis of their modus operandus. Statistical techniques play a key role in prediction, with regression, time series analysis, and dynamic linear models (also known as state space models) being the predominant tools for producing forecasts. The importance of statistical methods in forecasting was underscored by Pearson [1] who claimed that prediction is the “fundamental problem of practical statistics.” Patel [2] provides an extensive survey of literature on this topic. In the areas of reliability and life-testing, lifetime data are often modeled via the Exponential and the Weibull in order to make predictions about future observations. Prediction intervals are constructed to have a reasonably high probability of containing a specified number of such future observations. Interval prediction is an important part of the forecasting process aimed at enhancing the limited accuracy of point estimation. An interval forecast usually consists of an upper and a lower limit between which the future value is expected to lie with a prescribed probability. The limits are sometimes called prediction limits or prediction bounds. These limits may be helpful in establishing warranty policy, determining maintenance schedules, etc. For a very readable discussion of prediction limits and related intervals, see Hahn and Meeker [3].

Many authors have reported their efforts for constructing prediction limits for the Weibull and for the related extreme value distributions (see Patel [2]). Mann and Saunders [4] proposed prediction limits for the Weibull which make use of only two or three order statistics (see also Mann [5]). Antle and Rademaker [6] used simulation to produce a table of factors to use with ML estimates to obtain prediction limits. Lawless [7] proposed prediction limits based on a conditional confidence approach; his limits require both determination of the ML estimates and numerical integration. Engelhardt and Bain [8-9], and Fertig, Meyer and Mann [10] have proposed various approximate prediction limits for the Weibull. Mee and Kushary [11] provided a simulation based procedure for constructing prediction intervals for Weibull populations for Type II censored case. This procedure is based on maximum likelihood estimation and requires an iterative process to determine the percentile points. Bhaumik and Gibbons [12], and Krishnamoorthy et al. [13] proposed approximate methods for constructing upper prediction limits for a gamma distribution.

Consider the following examples of practical problems

Optimal Predictive Inferences for Future Order

Statistics via a Specific Loss Function

Nicholas Nechval,

Member, IAENG

, Maris Purgailis, Konstantin Nechval, and Vladimir Strelchonok

P

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(2)

which often require the computation of prediction bounds and prediction intervals for future values of random quantities: (i) a consumer purchasing a refrigerator would like to have a lower bound for the failure time of the unit to be purchased (with less interest in distribution of the population of units purchased by other consumers); (ii) financial managers in manufacturing companies need upper prediction bounds on future warranty costs; (iii) when planning life tests, engineers may need to predict the number of failures that will occur by the end of the test to predict the amount of time that it will be take for a specified number of units to fail.

Some applications require a two-sided prediction interval that will, with a specified high degree of confidence, contain the future random variable of interest. It is important to note that in the context of this paper, a prediction interval is not to be viewed as a confidence interval. The former is an estimate of a future observable value; the latter an estimate of some fixed but unknown (and often unobservable) parameter. Prediction intervals are produced via frequentist or Bayesian methods, whereas confidence intervals can only be constructed via a frequentist argument. The discussion of this paper revolves around prediction intervals produced by a frequentist approach; thus we are concerned here with frequentist prediction intervals. In many applications, however, interest is focused on either an upper prediction bound or a lower prediction bound (e.g., the maximum warranty cost is more important than the minimum, and the time of the early failures in a product population is more important than the last ones).

Conceptually, it is useful to distinguish between ‘within-sample’ prediction, ‘within-sample’ prediction, and ‘new-within-sample’ prediction.

For within-sample prediction, the problem is to predict future events in a sample or process based on early data from that sample or process. If, for example, m units are followed until trand there are r observed failures, Y1 < Y2 < ⋅⋅⋅< Yr, one could be interested in predicting the time of the

next failure, Yr+1; time until l additional failures, Yr+l; number

of additional failures in a future interval (t₁,t₂).

For new-sample prediction, data from a previous sample are used to make predictions on a future unit or sample of units from the same process or population. For example, based on previous (possibly censored) life test data, one could be interested in predicting the time to failure of a new unit, time until r failures in a future sample of m units, or number of failures by time t in a future sample of m units.

For new-within-sample prediction, the problem is to predict future events in a sample or process based on the early data from that sample or process as well as on a previous data sample from the same process or population. For example, if m units are followed until trand there are the

r observed failures, Y₁, …, Yr, from a current experiment as

well as the k observed failures, X₁, …, Xk, from a previous

data sample (possibly censored), one could be interested in predicting the time of the next failure Yr+1; time until l

additional failures, Yr+l; number of additional failures in a

future interval (t1,t2).

In general, to predict a future realization of a random

quantity one needs the following:

1) A statistical model to describe the population or process of interest. This model usually consists of a distribution depending on a vector of parameters θθθθ. In this paper, attention is restricted to families of distributions which are invariant under location and/or scale changes. In particular, the case may be considered where a previously available complete or type II censored sample is from a continuous distribution with cdf F((y-µ)/σ), where F(⋅) is known but both the location (µ) and scale (σ) parameters are unknown. For such family of distributions the decision problem remains invariant under a group of transformations (a subgroup of the full affine group) which takes µ (the location parameter) and σ (the scale) into cµ + b and cσ, respectively, where b lies in the range of µ, c > 0. This group acts transitively on the parameter space.

2) Information on the values of components of the parametric vector θθθθ. It is assumed that only the functional form of the distribution is specified, but some or all of its parameters are unspecified. In such cases ancillary statistics and pivotal quantities, whose distribution does not depend on the unknown parameters, are used.

The technique used here for constructing prediction intervals (or bounds) emphasizes pivotal quantities relevant for obtaining ancillary statistics. It represents a simple procedure that can be utilized by non-statisticians, and which provides easily computable explicit expressions for both prediction bounds and prediction intervals. The technique is a special case of the method of invariant embedding of sample statistics into a performance index (see, e.g., Nechval et al. [14-23]) applicable whenever the statistical problem is invariant under a group of transformations, which acts transitively on the parameter space.

II. WITHIN-SAMPLE PREDICTION PROBLEM

For within-sample prediction, the problem is to predict future events in a sample or process based on early data from that sample or process. For example, if m units are followed until Yrand there are r observed failures, Y1, …, Yr,

one could be interested in predicting the time of the next failure Yr+1; time until l additional failures, Yr+l; number of

additional failures in a future interval.

A. Location-Scale Family of Density Functions

Consider a situation described by a location-scale family of density functions, indexed by the vector parameter

θθθθ=(µ,σ), where µ and σ(>0) are respectively parameters of location and scale. For this family, invariant under the group G of positive linear transformations: y→ay+b with a>0, we shall assume that there is obtainable (from some informative experiment) the first r order statistics Y₁<Y₂<⋅⋅⋅ <Yr from a

random sample of size m with cumulative distribution function

, )

, |

( 

     − ≡

σ µ σ

µ F y

y F

. 0 ,

, )

(−∞ µ<y<∞ −∞<µ<∞ σ > (1)

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(3)

If Ys is a future observation (sth order statistic) from the

same sample of size m, then W=(Y−Y_r)/S_r (or W=

r r s Y Y

Y )/

( − ) is an ancillary statistic, the distribution of which does not depend on (µ,σ); Sr is a sufficient statistic

(or a maximum likelihood estimatorσk

) _{) for}_σ _{based on}

Y=(Y₁, …, Yr).

B. Piecewise-Linear Loss Function

We shall consider the interval prediction problem for the sth order statistic Ys, r<s≤m, in the same sample of size m for

the situation where the first r observations Y₁ < Y₂ < ⋅⋅⋅< Yr,

1≤r<m, have been observed. Suppose that we assert that an interval d=(d1,d2) contains Ys. If, as is usually the case, the

purpose of this interval statement is to convey useful information we incur penalties if d1 lies above Ys or if d2

falls below Ys. Suppose that these penalties are c1(d1− Ys)

and c₂(Ys−d2), losses proportional to the amounts by which

Ys escapes the interval. Since c1 and c2 may be different the

possibility of differential losses associated with the interval overshooting and undershooting the true µ is allowed. In addition to these losses there will be a cost attaching to the length of interval used. For example, it will be more difficult and more expensive to design or plan when the interval d=(d₁,d₂) is wide. Suppose that the cost associated with the interval is proportional to its length, say c(d₂−d₁). In the specification of the loss function, σ is clearly a ‘nuisance parameter’ and no alteration to the basic decision problem is caused by multiplying all loss factors by 1/σ. Thus we are led to investigate the piecewise-linear loss function

  

  

 

> −

+ −

≤ ≤ −

< −

+ −

=

). (

) ( ) (

), (

) (

), (

) ( ) (

) , (

2 2

2 1 2

2 1

1 2

1 1

2 1

1

d Y d

Y c d d c

d Y d d

d c

d Y d

d c Y d c

r

s s

s s s

σ σ

σ

σ σ

d

θθθθ (2)

The decision problem specified by the informative experiment distribution function (1) and the loss function (2) is invariant under the group G of transformations. Thus, the problem is to find the best invariant interval predictor of Ys,

), , ( min

arg d

d

d θθθθ

R

D

∈

∗₌ ₍₃₎

where D_{is a set of invariant interval predictors of}_Y_s_, R(θθθθ,d)=E_θθθθ{r(θθθθ,d)} is a risk function.

C. Transformation of the Loss Function

It follows from (2) that the invariant loss function, r(θθθθ,d), can be transformed as follows:

), , ( ) ,

(θθθθd r V ηηηη

r =&& (4)

where

    

> −

+ −

≤ ≤ −

< +

+ − =

), (

)

( ) (

), (

)

(

), (

)

-( ) (

) , (

2 2 1 2

1 2 2 2 1 2

2 2 1 2 1 2

1 2

2 1 1 2

1 2 2 1 1 1

V V V

c V V c

V V V V

c

V V V

c V V c r

η η

ηηηη

V

& &

(5)

V=(V₁,V₂), V₁=(Y_s−Y_r)/σ, V₂=S_r/σ;

ηηηη=(η1,η2), η1=(d1−Yr)/Sr, η2=(d2−Yr)/Sr.

(6) D. Risk Function

It follows from (5) that the risk associated with d and θθθθ can be expressed as

{

( , )

} {

( , )

}

) ,

(θθθθd Eθθθθ r θθθθd Er V ηηηη

R = = &&

∫ ∫

∞

+ − =

0 0

2 1 2 1 2 1 1 1

2 1

) , ( ) (

v

dv dv v v f v v c

η

∫ ∫

∞ ∞

− +

0

2 1 2 1 2 2 1 2

2 2

) , ( ) (

v

dv dv v v f v v c

η

( ) ( , ) ,

0 0

2 1 2 1 2 1

2

∫ ∫

∞ ∞

−

+cη η v f v v dvdv (7)

which is constant on orbits when an invariant predictor (decision rule) d is used, where f(v₁,v₂) is defined by the joint probability density of the first r observations Y1 < Y2 <

⋅⋅⋅< Yr and Ys,

)! ( )! 1 (

! )

, | , ..., , ( 1

s m r s

m y

y y

f _r _s

− − − =

σ µ

s m r

r s r

s F y F y

y

F − −− − −

×[ ( |_µ,_σ) ( |_µ,_σ)] 1[1 ( |_µ,_σ)]

( | , ) ( | , ).

1

σ µ σ

µ s

r

i

i f y

y f

∏

=

× (8)

E. Risk Minimization and Invariant Prediction Rules The following theorem gives the central result in this section.

Theorem 1 (Optimal predictor of Ysbased onY). Suppose

that (u1, u2) is a random vector having density function

( , ) ( , ) ( 1 , 2 0),

1

0 0

2 1 2 1 2 2 1

2 _ >

 

  

∞ ∞ −

∫ ∫

u f u u dudu u u u

u f

u (9)

where f is defined by f(v₁,v₂), and let Q be the probability distribution function of u1/u2.

(i) If c/c₁+c/c₂<1 then the optimal invariant linear-loss interval predictor of Ys based on Y is d*=(Yr+η1Sr, Yr+η2Sr),

where

Q(η₁)=c/c₁, Q(η₂)=1−c/c₂. (10)

(ii) If c/c₁+c/c₂≥1 then the optimal invariant linear-loss interval predictor of Ys based on Y degenerates into a point

predictor Yr+ηSr, where

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(4)

). /( )

( c₂ c₁ c₂

Qη = + (11)

Proof. From (7)

{

}

1

) , (

η ∂ ∂E r&&V ηηηη

∫ ∫

∞ ∞

∞

− =

0 0 2 1 2 1 2 0 0 2 1 2 1 2

1 ( , ) ( , )

2 1

dv dv v v f v c dv dv v v f v c

v

η

( , ) [ ₁ ( ₁) ],

0 0

2 1 2 1

2f v v dvdv cQ c

v −

=∞ ∞

∫ ∫

η (12)

and

{

}

2

) , ( η ∂ ∂E r&&V ηηηη

,] )) ( 1 ( [ ) , (

0

2 2

0

2 1 2 1 2

∫ ∫

∞ ∞

+ −

−

= v f v v dvdv c Qη c (13)

where

η =η

∫

0

, ) ( )

( qw dw

Q (14)

, ) , (

) , ( )

(

0 0

2 1 2 1 2 0

2 2 2 2 2

∫ ∫

∫

∞ ∞ ∞

=

dv dv v v f v

dv v wv f v

w

q (15)

. / 2 1 V V

W= (16)

Now ∂E{r&&(V,ηηηη)}/∂η₁ = ∂E{r&&(V,ηηηη)}/∂η₂ = 0if and only if (10) hold. Thus, E{r&&(V,ηηηη)} provided (10) has a solution with η1<η2 and this is so if 1−c/c2>c/c1. It is easily confirmed that this ηηηη=(η1,η2) gives the minimum value of E{r&&(V,ηηηη)}. Thus (i) is established.

If c/c₁+c/c₂≥1 then the minimum of E{r&&(v,η)} in the

region η2≥η1 occurs where η1=η2=η, η being determined by setting

∂E{r&&(V,(η,η))}/∂η=0 (17)

and this reduces to

, 0 )] ( 1 [ )

( 2

1Qη −c −Qη =

c (18)

which establishes (ii). 

Corollary 1.1 (Minimum risk of the optimal invariant predictor of Ysbased on Y).The minimum risk is given by

{

( , )

}

{

( , )

}

) ,

(θθθθd Eθθθθ r θθθθd E r V ηηηη

R ∗ = ∗ = &&

∫ ∫

∞ ∞

∞

+ −

=

0

2 1 2 1 1 2 0 0

2 1 2 1 1 1

2 2 2

1

) , ( )

, (

v v

η η

(19) for case (i) with ηηηη=(η1,η2) as given by (10) and for case (ii) with η1=η2=η as given by (11).

Proof. These results are immediate from (7) when use is made of ∂E{r&&(V,ηηηη)}/∂η₁ = ∂E{r&&(V,ηηηη)}/∂η₂ = 0 in case (i) and ∂E{r&&(V,(η,η))}/∂η=0 in case (ii). 

The underlying reason why c/c1+c/c2 acts as a separator of interval and point prediction is that for c/c₁+c/c₂≥1 every interval predictor is inadmissible, there existing some point predictor with uniformly smaller risk.

Theorem 2 (Optimal invariant predictor of Ys based on

Yr). Suppose that µ=0 and

V=(V₁,V₂), V₁=(Y_s−Y_r)/σ, V₂=Y_r/σ;

ηηηη=(η1,η2), η1=(d1−Yr)/Yr, η2=(d2−Yr)/Yr.

(20)

Let us assume that (u₁, u₂) is a random vector having density function

( , ) ( , ) ( 1 , 2 0),

1

0 0

2 1 2 1 0 2 2 1 0

2 >

   

  

∞ ∞ −

∫ ∫

u f

u (21)

where f₀ is defined by f₀(v₁,v₂), and let Q₀ be the probability distribution function of u₁/u₂.

(i) If c/c₁+c/c₂<1 then the optimal invariant linear-loss interval predictor of Ys based on Yr is d*=((1+η1)Yr,

(1+η2)Yr), where

. / 1 ) ( Q , / )

( ₁ ₁ ₀ ₂ ₂

0 c c c c

Q η = η = − (22)

(ii) If c/c₁+c/c₂≥1 then the optimal invariant linear-loss interval predictor of Ys based on Yr degenerates into a point

predictor (1+η)Yr, where

Q0(η)=c2/(c1+c2). (23)

Proof. For the proof we refer to Theorem 1. 

Corollary 2.1 (Minimum risk of the optimal invariant predictor of Ysbased on Yr).The minimum risk is given by

{

( , )

}

{

( , )

}

) ,

(θθθθd Eθθθθ r θθθθd E r V ηηηη

R ∗ = ∗ = &&

∫ ∫

∞ ∞

∞

+ −

=

0

2 1 2 1 0 1 2 0 0

2 1 2 1 0 1 1

2 2 2

1

) , ( )

, (

v v

η η

(24)

for case (i) with ηηηη=(η1,η2) as given by (22) and for case (ii) with η1=η2=η as given by (23).

Proof. For the proof we refer to Corollary 1.1. 

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(5)

III. EQUIVALENT CONFIDENCE COEFFICIENT

For case (i) when we obtain an interval predictor for Ys we

may regard the interval as a confidence interval in the conventional sense and evaluate its confidence coefficient. The general result is contained in the following theorem.

Theorem 3 (Equivalent confidence coefficient for d∗ based on Y). Suppose that V=(V₁,V₂) is a random vector having density function f(v₁,v₂) (v₁,v₂>0) where f is defined by (8) and let H be the distribution function of W=V₁/V₂, i.e., the probability density function of W is given by

( ) ( 2, 2) 2.

0

2f wv v dv v

w

h

∫

∞

= (25)

Then the confidence coefficient associated with the optimum prediction interval d*=(d₁,d₂), where d₁=Yr+η1Sr,

d₂=Yr+η2Sr, is

} , | :

{

Pr d∗ d₁<Ys<d₂ µσ

)]. / ( [ )] / 1 (

[Q 1 c c2 H Q 1 c c1

H − − − −

= (26)

Proof. The confidence coefficient for d∗ corresponding to (µ,σ) is given by

} , | :

) ,

Pr{(Yr Sr Yr+η1Sr <Ys<Yr+η2Sr µσ

} /

: ) ,

Pr{( 1 2 η1< 1 2 <η2

= v v v v

)]. / ( [ )] / 1 ( [ ) ( )

( 2 H 1 HQ 1 c c2 H Q 1 c c1

H − = − − − −

= η η

(27)

This is independent of (µ,σ). 

Theorem 4 (Equivalent confidence coefficient for d∗ based on Yr). Suppose that V=(V1,V2) is a random vector

having density function f0(v1,v2) (v1 real, v2>0), where f0 is defined by

) 1 ,

( ) , (

1 )

, | , (

+ − Β − Β =

s m s r s r y

y

f _r _s µσ

1 1_[ ₍ _| _, ₎ ₍ _| _, _)] )]

, | (

[ − ₋ − −

× s r

r s

s

r F y F y

y

F µσ µσ µσ

) , | ( ) , | ( )] , | ( 1

[ µσ r µσ s µσ

s m

s f y f y

y

F −

−

× , (28)

and let H₀ be the distribution function of W=V₁/V₂, i.e., the probability density function of W is given by

( ) 0( 2, 2) 2. 0

2

0 w v f wv v dv

h

∫

∞

= (29)

Then the confidence coefficient associated with the optimum prediction interval d* = (d₁,d₂), where d₁= (1+η₁)Yr, d2 =

(1+η2)Yr, is

} , | :

{

Pr d∗ d₁<X_r <d₂ µσ

)]. / ( [ )] / 1 (

[ 1 ₁

0 0 2 1

0

0 Q c c H Q c c

H − − − −

= (30)

Proof. For the proof we refer to Theorem 3. 

The way in which (26) (or (30)) varies with c, c₁ and c₂, and the fact that c₁ and c₂ are the factors of proportionality associated with losses from overshooting and undershooting relative to loss involved in increasing the length of interval, provides an interesting interpretation of confidence interval prediction.

IV. NEW-SAMPLE PREDICTION PROBLEM

For new-sample prediction, data from a past sample are used to make predictions on a future unit or sample of units from the same process or population. For example, based on previous (possibly censored) life test data, one could be interested in predicting the time to failure of a new item, time until l failures in a future sample of m units, or number of failures by time t_•in a future sample of m units.

θθθθ=(µ,σ), where µ and σ(>0) are respectively parameters of location and scale. For this family, invariant under the group of positive linear transformations: x→ax+b with a>0, we shall assume that there is obtainable from some informative experiment (the first k order statistics X₁<X₂<⋅⋅⋅ <Xk from a

previous random sample of size n) a sufficient statistic (Mk,Sk) (or a maximum likelihood estimator (µ)k,σ)k)) for

(µ,σ) based on X=(X₁, …, Xk) with density function

] / , / ) [( )

, | ,

(mk sk µσ σ 2p0 mk µ σ sk σ

p = − −

−∞<mk <∞, 0<sk <∞, −∞<µ>∞, σ >0.

(31) We are thus assuming that for the family of density functions an induced invariance holds under the group G of transformations: mk→amk+b, sk→ask or µk

) _→

k

aµ) +b,

k

σ) →aσk

) ₍_a_{>0). The family of density functions satisfying} the above conditions is, of course, the limited one of normal, negative exponential, Weibull and gamma (with known index) density functions. The structure of the problem is, however, more clearly seen within the general framework.

Let Ys be an independent future observation (sth order

statistic) from a new sample. If Ys is invariantly predictable

then W=(Ys−Mk)/Sk (or W= Ys µk σk

) ) ₎_/

( − ) is a maximal invariant pivotal, conditional on X.

We shall consider the interval prediction problem for the sth order statistic Ys, 1 ≤s ≤m, in a future sample of size m

for the situation where the first k observations X₁ < X₂ < ⋅⋅⋅< Xk, 1≤k<n, from a past sample of size n have been observed.

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(6)

Suppose that we deal with the piecewise-linear loss function (2).

The decision problem specified by the informative experiment density function (31) and the loss function (2) is invariant under the group G of transformations. Thus, the problem is to find the optimal interval predictor of Ys,

d argmin ( ,d),

d θθθθ

o R

D

∈

∗₌ ₍₃₂₎

where D_{is a set of invariant interval predictors of}_Y_s_, )

, (θθθθd

o

R =Eθθθθ{r(θθθθ,d)} is a risk function. C. Transformation of the Loss Function

), , ( ) ,

(_θθθθ_d _&_r_&_Vo _ηηηηo

r = (33)

where

    

> −

+ −

≤ ≤ −

< +

+ − =

), (

)

( ) (

), (

)

(

), (

)

-( ) (

) , (

2 2 1 2

1 2 2 2 1 2

2 2 1 2 1 2

1 2

2 1 1 2

1 2 2 1 1 1

o o o o

o o o o o

o o o o o o

o o

o o o o

o o o o o

o o

& &

V V V

c V V c

V V V V

c

V V V

c V V c r

η

ηηηη

V

(34)

), , ( 1o 2o o ₌ _V _V

V V₁o=(Ys−Mk)/σ, V2o =Sk/σ;

), , ( 1o 2o o₌ _η _η

ηηηη η₁o=(d₁−M_k)/S_k, η₂o=(d₂−Mk)/Sk.

{

( , )

}

{

( , )

}

) ,

( o o

o _θθθθ _θθθθ _&_& _ηηηη

θθθθ d V

d E r E r

R = =

∫ ∫

∞

∞ −

+ − =

0

2 1 2 1 2 1 1 1

2 1

) , ( ) (

o o

o o o o o o o o

v

dv dv v v f v v c

η

∫ ∫

∞ ∞

− +

0

2 1 2 1 2 2 1 2

2 2

) , ( ) (

o o

o o o o o o o o

v

dv dv v v f v v c

η

( ) ( , ) ,

0

2 1 2 1 2 1

2

∫ ∫

∞ ∞

∞ −

−

+_c_ηo _ηo _vo_fo _vo _vo _dvo_dvo ₍₃₆₎

which is constant on orbits when an invariant predictor (decision rule) d is used, where fo(v₁o,v₂o) is defined by the joint probability density of the first k observations X1 < X2 <

⋅⋅⋅< Xk from the past random sample of size n and the sth

order statistic Ys in the future sample of size m,

)! ( )! 1 (

! )!

( ! ) , | , ..., , ,

(x1 x2 x y _nn_k _s m_m _s

f k s

− − − =

σ µ

k n k

k

i

i F x

x

f −

=

−

×

∏

( | , [)1 ( | , )]

1

σ µ σ

µ

). , | ( )] , | ( 1 [ )] , | (

[ _µ_σ 1 _µ_σ _µ_σ

s s m s

s

s F y f y

y

F − − −

× (37)

Theorem 5 (Optimal invariant predictor of Ys based on

X). Suppose that (u₁, u₂) is a random vector having density function

), 0 real, ( )

, ( )

,

( 1 2

1

0

2 1 2 1 2 2 1

2 >

   

  

∞ ∞ −

∞ −

∫ ∫

u f

u o o

(38)

where _fo_{is defined by} ₍ _, ₎ 2 1o o o _v _v

f , and let Qo be the probability distribution function of u₁/u₂.

(i) If c/c1+c/c2<1 then the optimal invariant linear-loss interval predictor of Ys based on X is d*=(Mk+η₁oSk,

Mk+η₂oSk), where

Qo(η1o)=c/c1, Qo(η2o)=1−c/c2. (39)

(ii) If c/c1+c/c2≥1 then the optimal invariant linear-loss interval predictor of Ys based on X degenerates into a point

predictor Mk+ηoSk, where

). /( )

( c2 c1 c2

Qo ηo = + (40)

Corollary 5.1 (Minimum risk of the optimal invariant predictor of Ysbased on X).The minimum risk is given by

{

( , )

} {

( , )

}

)

,

( o o

o _θθθθ _θθθθ _&_& _ηηηη

θθθθ d V

d E r Er

R ∗ = ∗ =

∫ ∫

∞ ∞

∞

∞ −

+ −

=

0

2 1 2 1 1 2 2 1 0

2 1 1 1

2 2 2

1

) , ( )

, (

o o o

o

o o o o o o o

o o o o o

v v

η η

(41)

for case (i) with ηηηηo=(η1o,η2o) as given by (39) and for case (ii) with ηo=ηo =η

2

1 as given by (40).

Proof. For the proof we refer to Corollary 1.1. 

Theorem 6 (Equivalent confidence coefficient for d∗ based onX). Suppose thatVo=(V₁o,V₂o), is a random vector having density function fo(v1o,v2o) (v1o real, vo2> 0) where

o

f is defined by (38) and let Ho be the distribution function of Wo=V1o/V2o, i.e., the probability density function of _Wo_{is given by}

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(7)

( ) ( 2, 2) 2.

0

2o o o o o o o

o _w _v _f _w_v _v _dv

h

∫

∞

= (42)

Then the confidence coefficient associated with the optimum prediction interval d*=(d₁,d₂), where d₁=Mk+η1oSk,

d₂=Mk+η₂oSk, is

} , | :

{

Pr d∗ d₁<Ys<d₂ µσ

=Ho[Qo( −1)(1−c/c2)]−Ho[Qo( −1)(c/c1)]. (43)

Proof. For the proof we refer to Theorem 3.  V. NEW-WITHIN-SAMPLE PREDICTION PROBLEM For new-within-sample prediction, the problem is to predict future events in a sample or process based on early data from that sample or process as well as on a previous data sample from the same process or population. For example, if m units are followed until trand there are the r

observed failures, Y₁, …, Yr, from a current experiment as

well as the k observed failures, X1, …, Xk, from a previous

data sample (possibly censored), one could be interested in predicting the time of the next failure Yr+1; time until l

additional failures, Yr+l; number of additional failures in a

future interval (t₁,t₂).

θθθθ=(µ,σ), where µ and σ(>0) are respectively parameters of location and scale. For this family, invariant under the group of positive linear transformations: x→ax+b with a>0, we shall assume that there is obtainable from both some current informative experiment (the first r order statistics Y₁<Y₂<⋅⋅⋅ <Yr from a random sample of size m) and the k observed

failures, X1, …, Xk, from a previous censored data sample of

size n a sufficient statistic (M+,S+) (or a maximum likelihood estimator (µ)+,_σ)+_{)) for (}_µ_,_σ_{) based on}_Y₌₍_Y

1,…, Yr) and

X=(X₁,…, Xk) with density function

] / , / ) [( )

, | ,

(m+ s+ µσ =σ−2p0 m+−µ σ s+ σ p

−∞<_m+<∞, 0<_s+<∞, −∞<_µ>∞, _σ >0.

₍₄₄₎

We are thus assuming that for the family of density functions an induced invariance holds under the group G of transformations: m+→am++b, s+→as+ or µ)+→aµ)++b,

+

σ) → _σ)+

a (a>0). The family of density functions satisfying the above conditions is, of course, the limited one of normal, negative exponential, Weibull and gamma (with known index) density functions. The structure of the problem is, however, more clearly seen within the general framework.

Let Ys be a future observation (sth order statistic) from the

same sample of size m. If Ys is invariantly predictable then

+ + +₌ _Y ₋_M _S

W ( s )/ (or W+=(Ys−µ)+)/σ)+) is a maximal

invariant pivotal, conditional on (X,Y).

We shall consider the interval prediction problem for the sth order statistic Ys, 1 ≤s ≤m, in the same sample of size m

for the situation where both the first k observations X1 < X2 <

⋅⋅⋅< Xk, 1≤k<n, from the previous data sample of size n and

the early observed r order statistics Y1<Y2< ⋅⋅⋅ <Yr from a

random sample of size m are available. Suppose that we deal with the piecewise-linear loss function (2).

The decision problem specified by the informative experiment density function (31) and the loss function (2) is invariant under the group G of transformations. Thus, the problem is to find the optimal interval predictor of Ys,

d argmin ( ,d),

d θθθθ

+ ∈

∗₌ _R

D (45)

where D_{is a set of invariant interval predictors of}Ys,

d , (θθθθ

+

R =E_θθθθ{r(θθθθ,d)} is a risk function. C. Transformation of the Loss Function

), , ( ) ,

(θθθθd =r V+ ηηηη+

r && (46)

where

) , (_V+ _ηηηη+

r& &

    

> −

+ −

≤ ≤ −

< +

+ − =

+ + + +

+ + + + +

+ + + + + +

+ +

+ + + +

+ + + + +

), (

)

( ) (

), (

)

(

), (

)

-( ) (

2 2 1 2

1 2 2 2 1 2

2 2 1 2 1 2

1 2

2 1 1 2

1 2 2 1 1 1

V V V

c V V c

V V V V

c

V V V

c V V c

η η

(47)

), , ( ₁+ ₂+ +₌ _V _V

V V₁+=(Ys−M+)/σ, V2+=S+/σ;

), , ( ₁+ ₂+ +₌

η

ηηηη

η

₁+₌(_d₁₋_M+)/_S+, ₍ ₎_/ _.

2

2+= d −M+ S+

η

{

( , )

}

{

( , )

}

) ,

( + +

+ _θθθθ ₌ _θθθθ ₌ _ηηηη

θθθθ d V

d E r Er

R &&

∫ ∫

∞

∞ −

+ + + + + + + +

+ +

+ − =

0

2 1 2 1 2 1 1 1

2 1

) , ( ) (

v

dv dv v v f v v c

η

∫ ∫

∞ ∞

+ + + + + + + +

+ +

− +

0

2 1 2 1 2 2 1 2

2 2

) , ( ) (

v

dv dv v v f v v c

η

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(8)

( ) ( , ) ,

0

2 1 2 1 2 1

2

∫ ∫

∞ ∞

∞ −

+ + + + + + + +₋

+cη η v f v v dv dv (49)

which is constant on orbits when an invariant predictor (decision rule) d is used, where f+(v₁+,v₂+) is defined by the joint probability density of the first k observations X₁ < X₂ < ⋅⋅⋅< Xk from the previous random sample of size n, the

early observed r order statistics Y₁<Y₂<⋅⋅⋅ <Yr from a random

sample of size m, and the future sth order statistic Ys (s>r) in

the same sample of size m,

) , | , ..., , , ..., , ,

(x1 x2 xk y1 yr ys µσ

f

)! ( )! 1 (

! )!

( !

s m r s

m k

n n

− − − − =

k n k

k

i

i F x

x

f −

=

−

×

∏

( | , [)1 ( | , )] 1

σ µ σ

µ

s m r

r s r

s F y F y

y

F − −− − −

×[ ( |_µ,_σ) ( |_µ,_σ)] 1[1 ( |_µ,_σ)]

( | , ) ( | , ).

1

σ µ σ

µ s

r

i

i f y

y f

∏

=

× (50)

Theorem 7 (Optimal invariant predictor of Ys based on

(X,Y)). Suppose that (u₁, u₂) is a random vector having density function

), 0 real, ( )

, ( )

,

( 1 2

1

0

2 1 2 1 2 2 1

2 _ >

 

  

∞ ∞ −

∞ −

+

+ _u _u

∫ ∫

_u _f _u _u _du_du _u _u

f u

(51)

where _f+_{is defined by} ₍ _, ₎

2 1+ +

+ _v _v

f , and let Q+ be the probability distribution function of u1/u2.

(i) If c/c1+c/c2<1 then the optimal invariant linear-loss interval predictor of Ys based on (X,Y) is d*= (M++η1+S+,

+ + +₊ _S

M η₂ ), where

. / 1 ) ( , / )

( 1 c c1 Q 2 c c2

Q+ η+ = +η+ = − (52)

(ii) If c/c₁+c/c₂≥1 then the optimal invariant linear-loss interval predictor of Ys based on (X,Y) degenerates into a

point predictor_M+₊_η+_S+,_where

). /( )

( c₂ c₁ c₂

Q+η+ = + (53)

Corollary 5.1 (Minimum risk of the optimal invariant predictor of Ysbased on (X,Y)).The minimum risk is given

by

{

( , )

} {

( , )

}

)

,

( ∗ ∗ + +

+ _θθθθ ₌ _θθθθ ₌ _ηηηη

θθθθ d V

d E r Er

R &&

∫ ∫

+ + ∞ ∞ + + + + + + ∞

∞ −

+ + + +

+ + +

+

+ −

=

0

2 1 2 1 1 2 2 1 0

2 1 1 1

2 2 2

1

) , ( )

, (

v v

η η

(54)

for case (i) with ηηηη+=(η1+,η2+) as given by (52) and for case (ii) with η+=η+ =η

2

1 as given by (53).

Proof. For the proof we refer to Corollary 1.1. 

Theorem 8 (Equivalent confidence coefficient for d∗ based on (X,Y)). Suppose thatV+=(V₁+,V₂+), is a random vector having the probability density function f+(v1+,v2+) (v1+ real, v2+> 0), where f+ is defined by (50), and let H+ be the probability distribution function of +₌ + +

2 1 /V V

W ,

where the probability density function of W+ is given by

( ) ( ₂, ₂) ₂.

0

2 + + + + +

∞ + +

+ _w ₌

∫

_v _f _w _v _v _dv

h (55)

Then the confidence coefficient associated with the optimum prediction interval d*=(d₁,d₂), where d₁=M++η₁+S+, d₂=M++η₂+S+, is

} , | :

{

Pr d∗ d1<Ys<d2 µσ

[ (1 / )] [ ( 1)( / ₁)].

2 )

1

( _c _c _H _Q _c _c

Q

H+ + − − − + + −

= (56)

Proof. For the proof we refer to Theorem 3. 

VI. EXAMPLES

A. Within-Sample Prediction of Order Statistic

Exponential distribution. Let Y₁< Y₂< ⋅⋅⋅ <Ym be order

statistics of size m from the exponential distribution with the density

. 0 , 0 , exp 1 ) |

(  > >

     −

= σ

σ σ

σ y y

y

f (57)

Consider the prediction problem of Ys for the situation

where the first r observations Y1< Y2< ⋅⋅⋅ <Yr, 1≤r < s ≤m,

have been observed. We are now concerned with optimization of the prediction interval for Ys under the loss

function (2).

Let Y=(Y1, …, Yr) and Ys > Yr for s ≤ m. Then the joint

probability density function of Y and Ys is given by

)! ( )! 1 (

! )

| , ..., , ( ₁

s m r s

m y

y y

f r s

− − − = σ

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(9)

s m r r s r

s F y F x

y

F − −− − −

×[ ( |_σ) ( |_σ)] 1[1 ( |_σ)]

∏

=

× r

i

s i f y

y f 1 ) | ( ) |

( σ σ

              − + − − − − =

∑

= + _σ σ r i r i r y r m y s m r s m ₁ 1 ) ( exp 1 )! ( )! 1 ( ! . exp exp 1 1

1 −+

− −            ₋ −            ₋ − − × s m r s r s r

s y y y

y

σ

σ (58)

Let

σ r

s Y

Y

V₁= − (59)

and

.

) (

1

2 _σ _σ

∑

= − + = = r i r i r Y r m Y S V (60)

Using the invariant embedding technique [14-23], we then find in a straightforward manner that the joint density of V₁, V₂ is

), ( ) ( ) ,

(v₁ v₂ f₁ v₁ f₂ v₂

f = (61)

where ) 1 , ( ] [ ] 1 [ )

( ₁ 1 1

1 1 1 + − − Β − = − −− − −+ s m r s e e v f s m v r s v , ) 1 ( 1 ) 1 , (

1 1 ( 1 )

0

1m s j

v j r s j e j r s s m r s + + − − − − = −       − − + − − Β =

∑

v₁>0, (62) and , ) ( 1 )

( 1 2

2 2

2 vr e v

r v

f − −

Γ

= v2>0. (63)

It follows from (15) and (61) that

2 2 2 2 1 0 2 2 0 0 2 1 2 1 2 0 2 2 2 2 2 ) ( ) ( 1 ) , ( ) , ( )

( v f wv f v dv

r dv dv v v f v dv v wv f v w q

∫

∫ ∫

∫

∞ ∞ ∞ ∞ = = ) 1 , ( 1 + − − Β + = s m r s r . )] 1 ( 1 [ 1 ) 1 ( 1 2 1 0 + − − = + − + + −       ₋ ₋

×

∑

j _r

r s

j j wm s j

r s

(64)

2 2 2 2 1 0 2 2 2 2 0

2 ( , ) ( ) ( )

)

(w v f wv v dv v f wv f v dv

h

∫

∞ ∞ = = ) 1 , ( − − + Β = s m r s r . )] 1 ( 1 [ 1 ) 1 ( 1 1 1 0 + − − = + − + + −       ₋ ₋

×

∑

j _r

r s

j j wm s j

r s

(65)

If c/c₁+c/c₂<1 then the optimal invariant linear-loss interval predictor of Ys based on Y is given by

d*=(Yr+η1Sr, Yr+η2Sr), (66)

where         =

=

∫

1

0 1

1 arg ( )

η η c c dw w q (67) and . 1 ) ( arg 2 0 2

2 _

      − = =

∫

η η c c dw w q (68)

The confidence coefficient associated with the optimum prediction interval d*=(d₁,d₂), where d₁=Yr+η1Sr,

d₂=Yr+η2Sr, is given by

} , | :

{

Pr d∗ d1<Ys<d2 µσ

∫

= − = 2 1 . ) ( ] [ ]

[ 2 1

η

η H hw dw

H (69)

B. New-Sample Prediction of Order Statistic

Exponential distribution. Consider the problem of prediction of the sth order statistic Ys, 1 ≤s ≤ m, in a new

(future) sample of size m from the exponential distribution with the probability density function (57) for the situation where the first k observations X₁ < X₂ < ⋅⋅⋅< Xk, 1≤k≤n, from

the previous data sample of size n have been observed. We are now concerned with optimization of the prediction interval for Ys under the loss function (2).

Let X=(X₁, X₂, …, Xk) for k ≤n. Then the joint probability

density function of X1, X2, …, Xk is given by

∏

= − − − = k i k n k i

k f x F x

k n n x x x f 1 2

1 , ,..., | ) ₍ ! _)! ( | )[1 ( | )]

( σ σ σ

              − + − − =

∑

= σ σ k i k i k x k n x k n

n ₁ ( )

exp 1 )! (

! _. ₍₇₀₎

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(10)

The probability density function of the sth order statistic Ys

in the future sample of size m is given by

)! ( )! 1 ( ! ) | ( s m s m y f _s − − = σ ) | ( )] | ( 1 [ )] | (

[ _σ 1 _σ _σ

s s m s s

s F y f y

y

F − − −

× . 1 ) ( exp ) 1 ( ) 1 , ( 1

0 σ σ

      ₊ − −       ₋ + − Β =

∑

− = s j s m j y s j j s m s m

s (71)

Let

σs Y

V₁o = (72)

and

. ) (

1

2 _σ _σ

∑

= − + = = k i k i k X k n X S

Vo (73)

Using the invariant embedding technique [14-23], we then find in a straightforward manner that the joint density of V1, V₂ is

), ( ) ( ) ,

( 1o 2o 1o 1o 2o 2o o _v _v _f _v _f _v

f = (74)

where

(

( )

)

, exp ) 1 ( ) 1 , ( 1 ) ( 1 0 1

1o o j svo

j s m s m s v f j s m j + − −       ₋ + − Β =

∑

− =

v1o>0, (75) and , ) ( ) ( 1 )

( 1 2

2 2 2 o o o

o k v

e v k v

f − −

Γ

= v2o >0. (76)

∫∫

∫

∞ ∞ ∞ = 0 0 2 1 2 1 2 0 2 2 2 2 2 ) , ( ) , ( ) ( ) ( o o o o o o o o o o o o o o dv dv v v f v dv v v w f v w q o o o o o o o 2 2 2 2 1 0 2

2) ( ) ( )

( 1 dv v f v w f v k

∫

∞ = . )] ( 1 [ 1 ) 1 ( ) 1 , ( 1 2 0 + − = + + −       ₋ + − Β +

=

∑

m s j _k

j j w s j

s m s m s k o (77) It follows from (25) and (74) that

o o o o o o o o o o o o o 2 2 2 2 1 0 2 2 2 2 0

2 ( , ) ( ) ( )

)

(w v f wv v dv v f wv f v dv

h

∫

∞ ∞ = = . )] ( 1 [ 1 ) 1 ( ) 1 , ( 1 0 + − = + + −       ₋ + − Β

=

∑

m s j _k

j j w s j

s m s m s k o (78) If c/c₁+c/c₂<1 then the optimal invariant linear-loss interval predictor of Ys based on X is given by

d*=(Xk+η₁oSk, Xk+η₂oSk), (79)

where           = =

∫

o o o o o 1 0 1

1 arg ( )

η η c c dw w

q (80)

and . 1 ) ( arg 2 0 2

2 _

       − = =

∫

o o o o o η η c c dw w

q (81)

The confidence coefficient associated with the optimum prediction interval d*=(d₁,d₂), where d₁=Xk+η₁oSk,

d2=Xk+η2oSk, is given by

} | :

{

Pr d∗ d1<Ys <d2 σ

∫

= − = o o o o o o o o o 2 1 . ) ( ] [ ]

[ 2 1

η

η H h w dw

H (82)

C. New-Within-Sample Prediction of Order Statistic Exponential distribution. Consider the prediction problem of the sth order statistic Ys, 1 ≤ s ≤ m, in a new sample of

size m from the exponential distribution with the probability density function (57) for the situation where both the first r observations Y1< Y2< ⋅⋅⋅ <Yr, 1≤r < s ≤m, from the new data

sample and the first k observations X₁ < X₂ < ⋅⋅⋅< Xk, 1≤k≤n,

from the previous data sample of size n have been observed. We are now in need to optimize the prediction interval for Ys

under the loss function (2).

Let X=(X₁, X₂, …, Xk) for k ≤n. Then the joint probability

density function of X1, X2, …, Xk is given by

∏

= − − − = k i k n k i

k f x F x

k n n x x x f 1 2

1 , ,..., | ) ₍ ! _)! ( | )[1 ( | )]

( σ σ σ

              − + − − =

∑

= σ σ k i k i k x k n x k n

n ₁ ( )

exp 1 )! (

! _. ₍₈₃₎

Let Y=(Y₁, …, Yr) and Ys > Yr for s ≤ m. Then the joint

probability density function of Y and Ys is given by

)! ( )! 1 ( ! ) | , ..., , ( 1 s m r s m y y y

f r s

− − − = σ

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(11)

s m r r s r

s F y F x

y

F − −− − −

×[ ( |_σ) ( |_σ)] 1[1 ( |_σ)]

∏

=

× r

i

s i f y

y f 1 ) | ( ) |

( σ σ

              − + − − − − =

∑

= + _σ σ r i r i r y r m y s m r s m ₁ 1 ) ( exp 1 )! ( )! 1 ( ! . exp exp 1 1

1 −+

− −            ₋ −            ₋ − − × s m r s r s r

s y y y

y

σ

σ (84)

Let

σ r

s Y

Y

V₁+= − (85)

and . ) ( ) ( 1 1

2 _σ _σ

∑

= = + + ₌ ₌ + − + + − r i r i k i k

i n k X Y m rY

X S

V (86)

Using the invariant embedding technique [14-23], we then find in a straightforward manner that the joint density of

+

1

V ,V₂+ is

), ( ) ( ) ,

( 1+ 2+ 1+ 1+ 2+ 2+

+ _v _v ₌ _f _v _f _v

f (87)

where ) 1 , (