• No results found

Online Full Text

N/A
N/A
Protected

Academic year: 2020

Share "Online Full Text"

Copied!
12
0
0

Loading.... (view fulltext now)

Full text

(1)

Abstract— The purpose of this paper is to develop technique

appropriate for construction of frequentist prediction intervals or prediction bounds for order statistics in future samples when only the functional form of the underlying distribution is specified, but some or all of its parameters are unspecified. Prediction intervals for order statistics are widely used for reliability problems and other related problems. The determination of these intervals has been extensively investigated. But the optimal properties of these intervals in the sense of minimal area for a given probability content or maximal probability content for a given area have not been fully explored. In this paper, in order to discuss this problem, a specific loss function is introduced to compare prediction intervals. In particular, within-sample prediction based on the early observed data from a current experiment (i.e., when for predicting the future observation in a sample there are available the early observed data only from that sample), new-sample prediction based on a previous new-sample (i.e., when for predicting the future observation in a new sample there are available the data only from a previous sample), and new-within-sample prediction based on both the early-failure data from that sample and the data from a previous sample (i.e., when for predicting the future failure time of an unit in a new sample there are available both the early-failure data from that sample and the data from a previous sample). We restrict attention to families of distributions invariant under location and/or scale changes. The technique used here for optimization of prediction intervals based on censored data emphasizes pivotal quantities relevant for obtaining ancillary statistics. It allows one to solve the optimization problems in a simple way. Illustrative examples are given.

Index Terms—Order statistic, prediction interval, loss

function, optimization

I. INTRODUCTION

REDICTION is perhaps one of the most commonly undertaken activities in the physical, the engineering, and the biological sciences. In the econometric and the social sciences, prediction generally goes under the name of

Manuscript received October 7, 2011. This work was supported in part by Grant No. 06.1936, Grant No. 07.2036, Grant No. 09.1014, and Grant No. 09.1544 from the Latvian Council of Science and the National Institute of Mathematics and Informatics of Latvia.

N.A. Nechval is with the Statistics Department, EVF Research Institute, University of Latvia, Riga LV-1050, Latvia (e-mail: [email protected]).

M. Purgailis is with the University of Latvia, Riga LV-1050, Latvia (e-mail: [email protected]).

K.N. Nechval is with the Applied Mathematics Department, Transport and Telecommunication Instutute, Riga LV-1019, Latvia (e-mail: [email protected]).

V.F. Strelchonok is with the Baltic International Academy, Riga LV-1019, Latvia (e-mail: [email protected]).

forecasting, and in the actuarial and the assurance sciences under the label life-length assessment. Automatic process control, filtering, and quality control, are some of the engineering techniques that use prediction as a basis of their modus operandus. Statistical techniques play a key role in prediction, with regression, time series analysis, and dynamic linear models (also known as state space models) being the predominant tools for producing forecasts. The importance of statistical methods in forecasting was underscored by Pearson [1] who claimed that prediction is the “fundamental problem of practical statistics.” Patel [2] provides an extensive survey of literature on this topic. In the areas of reliability and life-testing, lifetime data are often modeled via the Exponential and the Weibull in order to make predictions about future observations. Prediction intervals are constructed to have a reasonably high probability of containing a specified number of such future observations. Interval prediction is an important part of the forecasting process aimed at enhancing the limited accuracy of point estimation. An interval forecast usually consists of an upper and a lower limit between which the future value is expected to lie with a prescribed probability. The limits are sometimes called prediction limits or prediction bounds. These limits may be helpful in establishing warranty policy, determining maintenance schedules, etc. For a very readable discussion of prediction limits and related intervals, see Hahn and Meeker [3].

Many authors have reported their efforts for constructing prediction limits for the Weibull and for the related extreme value distributions (see Patel [2]). Mann and Saunders [4] proposed prediction limits for the Weibull which make use of only two or three order statistics (see also Mann [5]). Antle and Rademaker [6] used simulation to produce a table of factors to use with ML estimates to obtain prediction limits. Lawless [7] proposed prediction limits based on a conditional confidence approach; his limits require both determination of the ML estimates and numerical integration. Engelhardt and Bain [8-9], and Fertig, Meyer and Mann [10] have proposed various approximate prediction limits for the Weibull. Mee and Kushary [11] provided a simulation based procedure for constructing prediction intervals for Weibull populations for Type II censored case. This procedure is based on maximum likelihood estimation and requires an iterative process to determine the percentile points. Bhaumik and Gibbons [12], and Krishnamoorthy et al. [13] proposed approximate methods for constructing upper prediction limits for a gamma distribution.

Consider the following examples of practical problems

Optimal Predictive Inferences for Future Order

Statistics via a Specific Loss Function

Nicholas Nechval,

Member, IAENG

, Maris Purgailis, Konstantin Nechval, and Vladimir Strelchonok

P

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(2)

which often require the computation of prediction bounds and prediction intervals for future values of random quantities: (i) a consumer purchasing a refrigerator would like to have a lower bound for the failure time of the unit to be purchased (with less interest in distribution of the population of units purchased by other consumers); (ii) financial managers in manufacturing companies need upper prediction bounds on future warranty costs; (iii) when planning life tests, engineers may need to predict the number of failures that will occur by the end of the test to predict the amount of time that it will be take for a specified number of units to fail.

Some applications require a two-sided prediction interval that will, with a specified high degree of confidence, contain the future random variable of interest. It is important to note that in the context of this paper, a prediction interval is not to be viewed as a confidence interval. The former is an estimate of a future observable value; the latter an estimate of some fixed but unknown (and often unobservable) parameter. Prediction intervals are produced via frequentist or Bayesian methods, whereas confidence intervals can only be constructed via a frequentist argument. The discussion of this paper revolves around prediction intervals produced by a frequentist approach; thus we are concerned here with frequentist prediction intervals. In many applications, however, interest is focused on either an upper prediction bound or a lower prediction bound (e.g., the maximum warranty cost is more important than the minimum, and the time of the early failures in a product population is more important than the last ones).

Conceptually, it is useful to distinguish between ‘within-sample’ prediction, ‘within-sample’ prediction, and ‘new-within-sample’ prediction.

For within-sample prediction, the problem is to predict future events in a sample or process based on early data from that sample or process. If, for example, m units are followed until trand there are r observed failures, Y1 < Y2 < ⋅⋅⋅< Yr, one could be interested in predicting the time of the

next failure, Yr+1; time until l additional failures, Yr+l; number

of additional failures in a future interval (t1,t2).

For new-sample prediction, data from a previous sample are used to make predictions on a future unit or sample of units from the same process or population. For example, based on previous (possibly censored) life test data, one could be interested in predicting the time to failure of a new unit, time until r failures in a future sample of m units, or number of failures by time t in a future sample of m units.

For new-within-sample prediction, the problem is to predict future events in a sample or process based on the early data from that sample or process as well as on a previous data sample from the same process or population. For example, if m units are followed until trand there are the

r observed failures, Y1, …, Yr, from a current experiment as

well as the k observed failures, X1, …, Xk, from a previous

data sample (possibly censored), one could be interested in predicting the time of the next failure Yr+1; time until l

additional failures, Yr+l; number of additional failures in a

future interval (t1,t2).

In general, to predict a future realization of a random

quantity one needs the following:

1) A statistical model to describe the population or process of interest. This model usually consists of a distribution depending on a vector of parameters θθθθ. In this paper, attention is restricted to families of distributions which are invariant under location and/or scale changes. In particular, the case may be considered where a previously available complete or type II censored sample is from a continuous distribution with cdf F((y-µ)/σ), where F(⋅) is known but both the location (µ) and scale (σ) parameters are unknown. For such family of distributions the decision problem remains invariant under a group of transformations (a subgroup of the full affine group) which takes µ (the location parameter) and σ (the scale) into cµ + b and cσ, respectively, where b lies in the range of µ, c > 0. This group acts transitively on the parameter space.

2) Information on the values of components of the parametric vector θθθθ. It is assumed that only the functional form of the distribution is specified, but some or all of its parameters are unspecified. In such cases ancillary statistics and pivotal quantities, whose distribution does not depend on the unknown parameters, are used.

The technique used here for constructing prediction intervals (or bounds) emphasizes pivotal quantities relevant for obtaining ancillary statistics. It represents a simple procedure that can be utilized by non-statisticians, and which provides easily computable explicit expressions for both prediction bounds and prediction intervals. The technique is a special case of the method of invariant embedding of sample statistics into a performance index (see, e.g., Nechval et al. [14-23]) applicable whenever the statistical problem is invariant under a group of transformations, which acts transitively on the parameter space.

II. WITHIN-SAMPLE PREDICTION PROBLEM

For within-sample prediction, the problem is to predict future events in a sample or process based on early data from that sample or process. For example, if m units are followed until Yrand there are r observed failures, Y1, …, Yr,

one could be interested in predicting the time of the next failure Yr+1; time until l additional failures, Yr+l; number of

additional failures in a future interval.

A. Location-Scale Family of Density Functions

Consider a situation described by a location-scale family of density functions, indexed by the vector parameter

θθθθ=(µ,σ), where µ and σ(>0) are respectively parameters of location and scale. For this family, invariant under the group G of positive linear transformations: yay+b with a>0, we shall assume that there is obtainable (from some informative experiment) the first r order statistics Y1<Y2<⋅⋅⋅ <Yr from a

random sample of size m with cumulative distribution function

, )

, |

( 

     − ≡

σ µ σ

µ F y

y F

. 0 ,

, )

(−∞ µ<y<∞ −∞<µ<∞ σ > (1)

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(3)

If Ys is a future observation (sth order statistic) from the

same sample of size m, then W=(YYr)/Sr (or W=

r r s Y Y

Y )/

( − ) is an ancillary statistic, the distribution of which does not depend on (µ,σ); Sr is a sufficient statistic

(or a maximum likelihood estimatorσk

) ) for σ based on

Y=(Y1, …, Yr).

B. Piecewise-Linear Loss Function

We shall consider the interval prediction problem for the sth order statistic Ys, r<sm, in the same sample of size m for

the situation where the first r observations Y1 < Y2 < ⋅⋅⋅< Yr,

1≤r<m, have been observed. Suppose that we assert that an interval d=(d1,d2) contains Ys. If, as is usually the case, the

purpose of this interval statement is to convey useful information we incur penalties if d1 lies above Ys or if d2

falls below Ys. Suppose that these penalties are c1(d1− Ys)

and c2(Ysd2), losses proportional to the amounts by which

Ys escapes the interval. Since c1 and c2 may be different the

possibility of differential losses associated with the interval overshooting and undershooting the true µ is allowed. In addition to these losses there will be a cost attaching to the length of interval used. For example, it will be more difficult and more expensive to design or plan when the interval d=(d1,d2) is wide. Suppose that the cost associated with the interval is proportional to its length, say c(d2d1). In the specification of the loss function, σ is clearly a ‘nuisance parameter’ and no alteration to the basic decision problem is caused by multiplying all loss factors by 1/σ. Thus we are led to investigate the piecewise-linear loss function

  

  

 

> −

+ −

≤ ≤ −

< −

+ −

=

). (

) ( ) (

), (

) (

), (

) ( ) (

) , (

2 2

2 1 2

2 1

1 2

1 1

2 1

1

d Y d

Y c d d c

d Y d d

d c

d Y d

d c Y d c

r

s s

s s s

σ σ

σ

σ σ

d

θθθθ (2)

The decision problem specified by the informative experiment distribution function (1) and the loss function (2) is invariant under the group G of transformations. Thus, the problem is to find the best invariant interval predictor of Ys,

), , ( min

arg d

d

d θθθθ

R

D

= (3)

where D is a set of invariant interval predictors of Ys, R(θθθθ,d)=Eθθθθ{r(θθθθ,d)} is a risk function.

C. Transformation of the Loss Function

It follows from (2) that the invariant loss function, r(θθθθ,d), can be transformed as follows:

), , ( ) ,

(θθθθd r V ηηηη

r =&& (4)

where

    

> −

+ −

≤ ≤ −

< +

+ − =

), (

)

( ) (

), (

)

(

), (

)

-( ) (

) , (

2 2 1 2

1 2 2 2 1 2

2 2 1 2 1 2

1 2

2 1 1 2

1 2 2 1 1 1

V V V

c V V c

V V V V

c

V V V

c V V c r

η η

η η

η η

η η

η η

η η

ηηηη

V

& &

(5)

V=(V1,V2), V1=(YsYr)/σ, V2=Sr/σ;

ηηηη=(η1,η2), η1=(d1−Yr)/Sr, η2=(d2−Yr)/Sr.

(6) D. Risk Function

It follows from (5) that the risk associated with d and θθθθ can be expressed as

{

( , )

} {

( , )

}

) ,

(θθθθd Eθθθθ r θθθθd Er V ηηηη

R = = &&

∫ ∫

+ − =

0 0

2 1 2 1 2 1 1 1

2 1

) , ( ) (

v

dv dv v v f v v c

η

η

∫ ∫

∞ ∞

− +

0

2 1 2 1 2 2 1 2

2 2

) , ( ) (

v

dv dv v v f v v c

η

η

( ) ( , ) ,

0 0

2 1 2 1 2 1

2

∫ ∫

∞ ∞

+cη η v f v v dvdv (7)

which is constant on orbits when an invariant predictor (decision rule) d is used, where f(v1,v2) is defined by the joint probability density of the first r observations Y1 < Y2 <

⋅⋅⋅< Yr and Ys,

)! ( )! 1 (

! )

, | , ..., , ( 1

s m r s

m y

y y

f r s

− − − =

σ µ

s m r

r s r

s F y F y

y

F − −− − −

×[ ( |µ,σ) ( |µ,σ)] 1[1 ( |µ,σ)]

( | , ) ( | , ).

1

σ µ σ

µ s

r

i

i f y

y f

=

× (8)

E. Risk Minimization and Invariant Prediction Rules The following theorem gives the central result in this section.

Theorem 1 (Optimal predictor of Ysbased onY). Suppose

that (u1, u2) is a random vector having density function

( , ) ( , ) ( 1 , 2 0),

1

0 0

2 1 2 1 2 2 1

2  >

 

  

∞ ∞ −

∫ ∫

u f u u dudu u u u

u f

u (9)

where f is defined by f(v1,v2), and let Q be the probability distribution function of u1/u2.

(i) If c/c1+c/c2<1 then the optimal invariant linear-loss interval predictor of Ys based on Y is d*=(Yr+η1Sr, Yr+η2Sr),

where

Q1)=c/c1, Q(η2)=1−c/c2. (10)

(ii) If c/c1+c/c2≥1 then the optimal invariant linear-loss interval predictor of Ys based on Y degenerates into a point

predictor YrSr, where

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(4)

). /( )

( c2 c1 c2

Qη = + (11)

Proof. From (7)

{

}

1

) , (

η ∂ ∂E r&&V ηηηη

∫ ∫

∫ ∫

∞ ∞

− =

0 0 2 1 2 1 2 0 0 2 1 2 1 2

1 ( , ) ( , )

2 1

dv dv v v f v c dv dv v v f v c

v

η

( , ) [ 1 ( 1) ],

0 0

2 1 2 1

2f v v dvdv cQ c

v

=∞ ∞

∫ ∫

η (12)

and

{

}

2

) , ( η ∂ ∂E r&&V ηηηη

,] )) ( 1 ( [ ) , (

0

2 2

0

2 1 2 1 2

∫ ∫

∞ ∞

+ −

= v f v v dvdv c Qη c (13)

where

η =η

0

, ) ( )

( qw dw

Q (14)

, ) , (

) , ( )

(

0 0

2 1 2 1 2 0

2 2 2 2 2

∫ ∫

∞ ∞ ∞

=

dv dv v v f v

dv v wv f v

w

q (15)

. / 2 1 V V

W= (16)

Now ∂E{r&&(V,ηηηη)}/∂η1 = ∂E{r&&(V,ηηηη)}/∂η2 = 0if and only if (10) hold. Thus, E{r&&(V,ηηηη)} provided (10) has a solution with η1<η2 and this is so if 1−c/c2>c/c1. It is easily confirmed that this ηηηη=(η1,η2) gives the minimum value of E{r&&(V,ηηηη)}. Thus (i) is established.

If c/c1+c/c2≥1 then the minimum of E{r&&(v,η)} in the

region η2≥η1 occurs where η1=η2=η, η being determined by setting

E{r&&(V,(η,η))}/∂η=0 (17)

and this reduces to

, 0 )] ( 1 [ )

( 2

1Qη −cQη =

c (18)

which establishes (ii). 

Corollary 1.1 (Minimum risk of the optimal invariant predictor of Ysbased on Y).The minimum risk is given by

{

( , )

}

{

( , )

}

) ,

(θθθθd Eθθθθ r θθθθd E r V ηηηη

R ∗ = ∗ = &&

∫ ∫

∫ ∫

∞ ∞

+ −

=

0

2 1 2 1 1 2 0 0

2 1 2 1 1 1

2 2 2

1

) , ( )

, (

v v

dv dv v v f v c dv dv v v f v c

η η

(19) for case (i) with ηηηη=(η1,η2) as given by (10) and for case (ii) with η1=η2=η as given by (11).

Proof. These results are immediate from (7) when use is made of ∂E{r&&(V,ηηηη)}/∂η1 = ∂E{r&&(V,ηηηη)}/∂η2 = 0 in case (i) and ∂E{r&&(V,(η,η))}/∂η=0 in case (ii). 

The underlying reason why c/c1+c/c2 acts as a separator of interval and point prediction is that for c/c1+c/c2≥1 every interval predictor is inadmissible, there existing some point predictor with uniformly smaller risk.

Theorem 2 (Optimal invariant predictor of Ys based on

Yr). Suppose that µ=0 and

V=(V1,V2), V1=(YsYr)/σ, V2=Yr/σ;

ηηηη=(η1,η2), η1=(d1−Yr)/Yr, η2=(d2−Yr)/Yr.

(20)

Let us assume that (u1, u2) is a random vector having density function

( , ) ( , ) ( 1 , 2 0),

1

0 0

2 1 2 1 0 2 2 1 0

2 >

   

  

∞ ∞ −

∫ ∫

u f u u dudu u u u

u f

u (21)

where f0 is defined by f0(v1,v2), and let Q0 be the probability distribution function of u1/u2.

(i) If c/c1+c/c2<1 then the optimal invariant linear-loss interval predictor of Ys based on Yr is d*=((1+η1)Yr,

(1+η2)Yr), where

. / 1 ) ( Q , / )

( 1 1 0 2 2

0 c c c c

Q η = η = − (22)

(ii) If c/c1+c/c2≥1 then the optimal invariant linear-loss interval predictor of Ys based on Yr degenerates into a point

predictor (1+η)Yr, where

Q0(η)=c2/(c1+c2). (23)

Proof. For the proof we refer to Theorem 1. 

Corollary 2.1 (Minimum risk of the optimal invariant predictor of Ysbased on Yr).The minimum risk is given by

{

( , )

}

{

( , )

}

) ,

(θθθθd Eθθθθ r θθθθd E r V ηηηη

R ∗ = ∗ = &&

∫ ∫

∫ ∫

∞ ∞

+ −

=

0

2 1 2 1 0 1 2 0 0

2 1 2 1 0 1 1

2 2 2

1

) , ( )

, (

v v

dv dv v v f v c dv dv v v f v c

η η

(24)

for case (i) with ηηηη=(η1,η2) as given by (22) and for case (ii) with η1=η2=η as given by (23).

Proof. For the proof we refer to Corollary 1.1. 

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(5)

III. EQUIVALENT CONFIDENCE COEFFICIENT

For case (i) when we obtain an interval predictor for Ys we

may regard the interval as a confidence interval in the conventional sense and evaluate its confidence coefficient. The general result is contained in the following theorem.

Theorem 3 (Equivalent confidence coefficient for dbased on Y). Suppose that V=(V1,V2) is a random vector having density function f(v1,v2) (v1,v2>0) where f is defined by (8) and let H be the distribution function of W=V1/V2, i.e., the probability density function of W is given by

( ) ( 2, 2) 2.

0

2f wv v dv v

w

h

= (25)

Then the confidence coefficient associated with the optimum prediction interval d*=(d1,d2), where d1=Yr+η1Sr,

d2=Yr+η2Sr, is

} , | :

{

Pr dd1<Ys<d2 µσ

)]. / ( [ )] / 1 (

[Q 1 c c2 H Q 1 c c1

H − − − −

= (26)

Proof. The confidence coefficient for d∗ corresponding to (µ,σ) is given by

} , | :

) ,

Pr{(Yr Sr Yr+η1Sr <Ys<Yr+η2Sr µσ

} /

: ) ,

Pr{( 1 2 η1< 1 2 <η2

= v v v v

)]. / ( [ )] / 1 ( [ ) ( )

( 2 H 1 HQ 1 c c2 H Q 1 c c1

H − = − − − −

= η η

(27)

This is independent of (µ,σ). 

Theorem 4 (Equivalent confidence coefficient for dbased on Yr). Suppose that V=(V1,V2) is a random vector

having density function f0(v1,v2) (v1 real, v2>0), where f0 is defined by

) 1 ,

( ) , (

1 )

, | , (

+ − Β − Β =

s m s r s r y

y

f r s µσ

1 1[ ( | , ) ( | , )] )]

, | (

[ − − −

× s r

r s

s

r F y F y

y

F µσ µσ µσ

) , | ( ) , | ( )] , | ( 1

[ µσ r µσ s µσ

s m

s f y f y

y

F

× , (28)

and let H0 be the distribution function of W=V1/V2, i.e., the probability density function of W is given by

( ) 0( 2, 2) 2. 0

2

0 w v f wv v dv

h

= (29)

Then the confidence coefficient associated with the optimum prediction interval d* = (d1,d2), where d1 = (1+η1)Yr, d2 =

(1+η2)Yr, is

} , | :

{

Pr dd1<Xr <d2 µσ

)]. / ( [ )] / 1 (

[ 1 1

0 0 2 1

0

0 Q c c H Q c c

H − − − −

= (30)

Proof. For the proof we refer to Theorem 3. 

The way in which (26) (or (30)) varies with c, c1 and c2, and the fact that c1 and c2 are the factors of proportionality associated with losses from overshooting and undershooting relative to loss involved in increasing the length of interval, provides an interesting interpretation of confidence interval prediction.

IV. NEW-SAMPLE PREDICTION PROBLEM

For new-sample prediction, data from a past sample are used to make predictions on a future unit or sample of units from the same process or population. For example, based on previous (possibly censored) life test data, one could be interested in predicting the time to failure of a new item, time until l failures in a future sample of m units, or number of failures by time tin a future sample of m units.

A. Location-Scale Family of Density Functions

Consider a situation described by a location-scale family of density functions, indexed by the vector parameter

θθθθ=(µ,σ), where µ and σ(>0) are respectively parameters of location and scale. For this family, invariant under the group of positive linear transformations: xax+b with a>0, we shall assume that there is obtainable from some informative experiment (the first k order statistics X1<X2<⋅⋅⋅ <Xk from a

previous random sample of size n) a sufficient statistic (Mk,Sk) (or a maximum likelihood estimator (µ)k,σ)k)) for

(µ,σ) based on X=(X1, …, Xk) with density function

] / , / ) [( )

, | ,

(mk sk µσ σ 2p0 mk µ σ sk σ

p = − −

−∞<mk <∞, 0<sk <∞, −∞<µ>∞, σ >0.

(31) We are thus assuming that for the family of density functions an induced invariance holds under the group G of transformations: mkamk+b, skask or µk

)

k

aµ) +b,

k

σ) →aσk

) (a>0). The family of density functions satisfying the above conditions is, of course, the limited one of normal, negative exponential, Weibull and gamma (with known index) density functions. The structure of the problem is, however, more clearly seen within the general framework.

Let Ys be an independent future observation (sth order

statistic) from a new sample. If Ys is invariantly predictable

then W=(YsMk)/Sk (or W= Ys µk σk

) ) )/

( − ) is a maximal invariant pivotal, conditional on X.

B. Piecewise-Linear Loss Function

We shall consider the interval prediction problem for the sth order statistic Ys, 1 ≤s m, in a future sample of size m

for the situation where the first k observations X1 < X2 < ⋅⋅⋅< Xk, 1≤k<n, from a past sample of size n have been observed.

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(6)

Suppose that we deal with the piecewise-linear loss function (2).

The decision problem specified by the informative experiment density function (31) and the loss function (2) is invariant under the group G of transformations. Thus, the problem is to find the optimal interval predictor of Ys,

d argmin ( ,d),

d θθθθ

o R

D

= (32)

where D is a set of invariant interval predictors of Ys, )

, (θθθθd

o

R =Eθθθθ{r(θθθθ,d)} is a risk function. C. Transformation of the Loss Function

It follows from (2) that the invariant loss function, r(θθθθ,d), can be transformed as follows:

), , ( ) ,

(θθθθd &r&Vo ηηηηo

r = (33)

where

    

> −

+ −

≤ ≤ −

< +

+ − =

), (

)

( ) (

), (

)

(

), (

)

-( ) (

) , (

2 2 1 2

1 2 2 2 1 2

2 2 1 2 1 2

1 2

2 1 1 2

1 2 2 1 1 1

o o o o

o o o o o

o o o o o o

o o

o o o o

o o o o o

o o

& &

V V V

c V V c

V V V V

c

V V V

c V V c r

η

η

η

η

η

η

η

η

η

η

η

η

ηηηη

V

(34)

), , ( 1o 2o o = V V

V V1o=(YsMk)/σ, V2o =Sk/σ;

), , ( 1o 2o o= η η

ηηηη η1o=(d1Mk)/Sk, η2o=(d2Mk)/Sk.

(35) D. Risk Function

It follows from (34) that the risk associated with d and θθθθ can be expressed as

{

( , )

}

{

( , )

}

) ,

( o o

o θθθθ θθθθ && ηηηη

θθθθ d V

d E r E r

R = =

∫ ∫

∞ −

+ − =

0

2 1 2 1 2 1 1 1

2 1

) , ( ) (

o o

o o o o o o o o

v

dv dv v v f v v c

η

η

∫ ∫

∞ ∞

− +

0

2 1 2 1 2 2 1 2

2 2

) , ( ) (

o o

o o o o o o o o

v

dv dv v v f v v c

η

η

( ) ( , ) ,

0

2 1 2 1 2 1

2

∫ ∫

∞ ∞

∞ −

+cηo ηo vofo vo vo dvodvo (36)

which is constant on orbits when an invariant predictor (decision rule) d is used, where fo(v1o,v2o) is defined by the joint probability density of the first k observations X1 < X2 <

⋅⋅⋅< Xk from the past random sample of size n and the sth

order statistic Ys in the future sample of size m,

)! ( )! 1 (

! )!

( ! ) , | , ..., , ,

(x1 x2 x y nnk s mm s

f k s

− − − =

σ µ

k n k

k

i

i F x

x

f

=

×

( | , [)1 ( | , )]

1

σ µ σ

µ

). , | ( )] , | ( 1 [ )] , | (

[ µσ 1 µσ µσ

s s m s

s

s F y f y

y

F − − −

× (37)

E. Risk Minimization and Invariant Prediction Rules The following theorem gives the central result in this section.

Theorem 5 (Optimal invariant predictor of Ys based on

X). Suppose that (u1, u2) is a random vector having density function

), 0 real, ( )

, ( )

,

( 1 2

1

0

2 1 2 1 2 2 1

2 >

   

  

∞ ∞ −

∞ −

∫ ∫

u f u u dudu u u u

u f

u o o

(38)

where fo is defined by ( , ) 2 1o o o v v

f , and let Qo be the probability distribution function of u1/u2.

(i) If c/c1+c/c2<1 then the optimal invariant linear-loss interval predictor of Ys based on X is d*=(Mk1oSk,

Mk2oSk), where

Qo(η1o)=c/c1, Qo(η2o)=1−c/c2. (39)

(ii) If c/c1+c/c2≥1 then the optimal invariant linear-loss interval predictor of Ys based on X degenerates into a point

predictor Mk+ηoSk, where

). /( )

( c2 c1 c2

Qo ηo = + (40)

Proof. For the proof we refer to Theorem 1. 

Corollary 5.1 (Minimum risk of the optimal invariant predictor of Ysbased on X).The minimum risk is given by

{

( , )

} {

( , )

}

)

,

( o o

o θθθθ θθθθ && ηηηη

θθθθ d V

d E r Er

R ∗ = ∗ =

∫ ∫

∫ ∫

∞ ∞

∞ −

+ −

=

0

2 1 2 1 1 2 2 1 0

2 1 1 1

2 2 2

1

) , ( )

, (

o o o

o

o o o o o o o

o o o o o

v v

dv dv v v f v c dv dv v v f v c

η η

(41)

for case (i) with ηηηηo=(η1o,η2o) as given by (39) and for case (ii) with ηo=ηo =η

2

1 as given by (40).

Proof. For the proof we refer to Corollary 1.1. 

Theorem 6 (Equivalent confidence coefficient for dbased onX). Suppose thatVo=(V1o,V2o), is a random vector having density function fo(v1o,v2o) (v1o real, vo2> 0) where

o

f is defined by (38) and let Ho be the distribution function of Wo=V1o/V2o, i.e., the probability density function of Wo is given by

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(7)

( ) ( 2, 2) 2.

0

2o o o o o o o

o w v f wv v dv

h

= (42)

Then the confidence coefficient associated with the optimum prediction interval d*=(d1,d2), where d1=Mk+η1oSk,

d2=Mk2oSk, is

} , | :

{

Pr dd1<Ys<d2 µσ

=Ho[Qo( −1)(1−c/c2)]−Ho[Qo( −1)(c/c1)]. (43)

Proof. For the proof we refer to Theorem 3.  V. NEW-WITHIN-SAMPLE PREDICTION PROBLEM For new-within-sample prediction, the problem is to predict future events in a sample or process based on early data from that sample or process as well as on a previous data sample from the same process or population. For example, if m units are followed until trand there are the r

observed failures, Y1, …, Yr, from a current experiment as

well as the k observed failures, X1, …, Xk, from a previous

data sample (possibly censored), one could be interested in predicting the time of the next failure Yr+1; time until l

additional failures, Yr+l; number of additional failures in a

future interval (t1,t2).

A. Location-Scale Family of Density Functions

Consider a situation described by a location-scale family of density functions, indexed by the vector parameter

θθθθ=(µ,σ), where µ and σ(>0) are respectively parameters of location and scale. For this family, invariant under the group of positive linear transformations: xax+b with a>0, we shall assume that there is obtainable from both some current informative experiment (the first r order statistics Y1<Y2<⋅⋅⋅ <Yr from a random sample of size m) and the k observed

failures, X1, …, Xk, from a previous censored data sample of

size n a sufficient statistic (M+,S+) (or a maximum likelihood estimator (µ)+,σ)+)) for (µ,σ) based on Y=(Y

1,…, Yr) and

X=(X1,…, Xk) with density function

] / , / ) [( )

, | ,

(m+ s+ µσ =σ−2p0 m+−µ σ s+ σ p

−∞<m+<∞, 0<s+<∞, −∞<µ>∞, σ >0.

(44)

We are thus assuming that for the family of density functions an induced invariance holds under the group G of transformations: m+→am++b, s+→as+ or µ)+→aµ)++b,

+

σ) → σ)+

a (a>0). The family of density functions satisfying the above conditions is, of course, the limited one of normal, negative exponential, Weibull and gamma (with known index) density functions. The structure of the problem is, however, more clearly seen within the general framework.

Let Ys be a future observation (sth order statistic) from the

same sample of size m. If Ys is invariantly predictable then

+ + += Y M S

W ( s )/ (or W+=(Ys−µ)+)/σ)+) is a maximal

invariant pivotal, conditional on (X,Y).

B. Piecewise-Linear Loss Function

We shall consider the interval prediction problem for the sth order statistic Ys, 1 ≤s m, in the same sample of size m

for the situation where both the first k observations X1 < X2 <

⋅⋅⋅< Xk, 1≤k<n, from the previous data sample of size n and

the early observed r order statistics Y1<Y2< ⋅⋅⋅ <Yr from a

random sample of size m are available. Suppose that we deal with the piecewise-linear loss function (2).

The decision problem specified by the informative experiment density function (31) and the loss function (2) is invariant under the group G of transformations. Thus, the problem is to find the optimal interval predictor of Ys,

d argmin ( ,d),

d θθθθ

+ ∈

= R

D (45)

where D is a set of invariant interval predictors of Ys,

d , (θθθθ

+

R =Eθθθθ{r(θθθθ,d)} is a risk function. C. Transformation of the Loss Function

It follows from (2) that the invariant loss function, r(θθθθ,d), can be transformed as follows:

), , ( ) ,

(θθθθd =r V+ ηηηη+

r && (46)

where

) , (V+ ηηηη+

r& &

    

> −

+ −

≤ ≤ −

< +

+ − =

+ + + +

+ + + + +

+ + + + + +

+ +

+ + + +

+ + + + +

), (

)

( ) (

), (

)

(

), (

)

-( ) (

2 2 1 2

1 2 2 2 1 2

2 2 1 2 1 2

1 2

2 1 1 2

1 2 2 1 1 1

V V V

c V V c

V V V V

c

V V V

c V V c

η η

η η

η η

η η

η η

η η

(47)

), , ( 1+ 2+ += V V

V V1+=(YsM+)/σ, V2+=S+/σ;

), , ( 1+ 2+ +=

η

η

ηηηη

η

1+=(d1M+)/S+, ( )/ .

2

2+= dM+ S+

η

(48) D. Risk Function

It follows from (47) that the risk associated with d and θθθθ can be expressed as

{

( , )

}

{

( , )

}

) ,

( + +

+ θθθθ = θθθθ = ηηηη

θθθθ d V

d E r Er

R &&

∫ ∫

∞ −

+ + + + + + + +

+ +

+ − =

0

2 1 2 1 2 1 1 1

2 1

) , ( ) (

v

dv dv v v f v v c

η

η

∫ ∫

∞ ∞

+ + + + + + + +

+ +

− +

0

2 1 2 1 2 2 1 2

2 2

) , ( ) (

v

dv dv v v f v v c

η

η

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(8)

( ) ( , ) ,

0

2 1 2 1 2 1

2

∫ ∫

∞ ∞

∞ −

+ + + + + + + +

+cη η v f v v dv dv (49)

which is constant on orbits when an invariant predictor (decision rule) d is used, where f+(v1+,v2+) is defined by the joint probability density of the first k observations X1 < X2 < ⋅⋅⋅< Xk from the previous random sample of size n, the

early observed r order statistics Y1<Y2<⋅⋅⋅ <Yr from a random

sample of size m, and the future sth order statistic Ys (s>r) in

the same sample of size m,

) , | , ..., , , ..., , ,

(x1 x2 xk y1 yr ys µσ

f

)! ( )! 1 (

! )!

( !

s m r s

m k

n n

− − − − =

k n k

k

i

i F x

x

f

=

×

( | , [)1 ( | , )] 1

σ µ σ

µ

s m r

r s r

s F y F y

y

F − −− − −

×[ ( |µ,σ) ( |µ,σ)] 1[1 ( |µ,σ)]

( | , ) ( | , ).

1

σ µ σ

µ s

r

i

i f y

y f

=

× (50)

E. Risk Minimization and Invariant Prediction Rules The following theorem gives the central result in this section.

Theorem 7 (Optimal invariant predictor of Ys based on

(X,Y)). Suppose that (u1, u2) is a random vector having density function

), 0 real, ( )

, ( )

,

( 1 2

1

0

2 1 2 1 2 2 1

2  >

 

  

∞ ∞ −

∞ −

+

+ u u

∫ ∫

u f u u dudu u u

f u

(51)

where f+ is defined by ( , )

2 1+ +

+ v v

f , and let Q+ be the probability distribution function of u1/u2.

(i) If c/c1+c/c2<1 then the optimal invariant linear-loss interval predictor of Ys based on (X,Y) is d*= (M++η1+S+,

+ + ++ S

M η2 ), where

. / 1 ) ( , / )

( 1 c c1 Q 2 c c2

Q+ η+ = +η+ = − (52)

(ii) If c/c1+c/c2≥1 then the optimal invariant linear-loss interval predictor of Ys based on (X,Y) degenerates into a

point predictorM++η+S+, where

). /( )

( c2 c1 c2

Q+η+ = + (53)

Proof. For the proof we refer to Theorem 1. 

Corollary 5.1 (Minimum risk of the optimal invariant predictor of Ysbased on (X,Y)).The minimum risk is given

by

{

( , )

} {

( , )

}

)

,

( ∗ ∗ + +

+ θθθθ = θθθθ = ηηηη

θθθθ d V

d E r Er

R &&

∫ ∫

∫ ∫

+ + ∞ ∞ + + + + + + ∞

∞ −

+ + + +

+ + +

+

+ −

=

0

2 1 2 1 1 2 2 1 0

2 1 1 1

2 2 2

1

) , ( )

, (

v v

dv dv v v f v c dv dv v v f v c

η η

(54)

for case (i) with ηηηη+=(η1+,η2+) as given by (52) and for case (ii) with η+=η+ =η

2

1 as given by (53).

Proof. For the proof we refer to Corollary 1.1. 

Theorem 8 (Equivalent confidence coefficient for dbased on (X,Y)). Suppose thatV+=(V1+,V2+), is a random vector having the probability density function f+(v1+,v2+) (v1+ real, v2+> 0), where f+ is defined by (50), and let H+ be the probability distribution function of += + +

2 1 /V V

W ,

where the probability density function of W+ is given by

( ) ( 2, 2) 2.

0

2 + + + + +

∞ + +

+ w =

v f w v v dv

h (55)

Then the confidence coefficient associated with the optimum prediction interval d*=(d1,d2), where d1=M++η1+S+, d2=M++η2+S+, is

} , | :

{

Pr dd1<Ys<d2 µσ

[ (1 / )] [ ( 1)( / 1)].

2 )

1

( c c H Q c c

Q

H+ + − − − + + −

= (56)

Proof. For the proof we refer to Theorem 3. 

VI. EXAMPLES

A. Within-Sample Prediction of Order Statistic

Exponential distribution. Let Y1< Y2< ⋅⋅⋅ <Ym be order

statistics of size m from the exponential distribution with the density

. 0 , 0 , exp 1 ) |

(  > >

     −

= σ

σ σ

σ y y

y

f (57)

Consider the prediction problem of Ys for the situation

where the first r observations Y1< Y2< ⋅⋅⋅ <Yr, 1≤r < s m,

have been observed. We are now concerned with optimization of the prediction interval for Ys under the loss

function (2).

Let Y=(Y1, …, Yr) and Ys > Yr for s m. Then the joint

probability density function of Y and Ys is given by

)! ( )! 1 (

! )

| , ..., , ( 1

s m r s

m y

y y

f r s

− − − = σ

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(9)

s m r r s r

s F y F x

y

F − −− − −

×[ ( |σ) ( |σ)] 1[1 ( |σ)]

=

× r

i

s i f y

y f 1 ) | ( ) |

( σ σ

              − + − − − − =

= + σ σ r i r i r y r m y s m r s m 1 1 ) ( exp 1 )! ( )! 1 ( ! . exp exp 1 1

1 −+

− −             −             − − × s m r s r s r

s y y y

y

σ

σ (58)

Let

σ r

s Y

Y

V1= − (59)

and

.

) (

1

2 σ σ

= − + = = r i r i r Y r m Y S V (60)

Using the invariant embedding technique [14-23], we then find in a straightforward manner that the joint density of V1, V2 is

), ( ) ( ) ,

(v1 v2 f1 v1 f2 v2

f = (61)

where ) 1 , ( ] [ ] 1 [ )

( 1 1 1

1 1 1 + − − Β − = − −− − −+ s m r s e e v f s m v r s v , ) 1 ( 1 ) 1 , (

1 1 ( 1 )

0

1m s j

v j r s j e j r s s m r s + + − − − − = −       − − + − − Β =

v1>0, (62) and , ) ( 1 )

( 1 2

2 2

2 vr e v

r v

f − −

Γ

= v2>0. (63)

It follows from (15) and (61) that

2 2 2 2 1 0 2 2 0 0 2 1 2 1 2 0 2 2 2 2 2 ) ( ) ( 1 ) , ( ) , ( )

( v f wv f v dv

r dv dv v v f v dv v wv f v w q

∫ ∫

∞ ∞ ∞ ∞ = = ) 1 , ( 1 + − − Β + = s m r s r . )] 1 ( 1 [ 1 ) 1 ( 1 2 1 0 + − − = + − + + −      

×

j r

r s

j j wm s j

r s

(64)

It follows from (25) and (61) that

2 2 2 2 1 0 2 2 2 2 0

2 ( , ) ( ) ( )

)

(w v f wv v dv v f wv f v dv

h

∞ ∞ = = ) 1 , ( − − + Β = s m r s r . )] 1 ( 1 [ 1 ) 1 ( 1 1 1 0 + − − = + − + + −      

×

j r

r s

j j wm s j

r s

(65)

If c/c1+c/c2<1 then the optimal invariant linear-loss interval predictor of Ys based on Y is given by

d*=(Yr+η1Sr, Yr+η2Sr), (66)

where         =

=

1

0 1

1 arg ( )

η η c c dw w q (67) and . 1 ) ( arg 2 0 2

2

      − = =

η η c c dw w q (68)

The confidence coefficient associated with the optimum prediction interval d*=(d1,d2), where d1=Yr+η1Sr,

d2=Yr+η2Sr, is given by

} , | :

{

Pr dd1<Ys<d2 µσ

= − = 2 1 . ) ( ] [ ]

[ 2 1

η

η

η

η H hw dw

H (69)

B. New-Sample Prediction of Order Statistic

Exponential distribution. Consider the problem of prediction of the sth order statistic Ys, 1 ≤s m, in a new

(future) sample of size m from the exponential distribution with the probability density function (57) for the situation where the first k observations X1 < X2 < ⋅⋅⋅< Xk, 1≤kn, from

the previous data sample of size n have been observed. We are now concerned with optimization of the prediction interval for Ys under the loss function (2).

Let X=(X1, X2, …, Xk) for k n. Then the joint probability

density function of X1, X2, …, Xk is given by

= − − − = k i k n k i

k f x F x

k n n x x x f 1 2

1 , ,..., | ) ( ! )! ( | )[1 ( | )]

( σ σ σ

              − + − − =

= σ σ k i k i k x k n x k n

n 1 ( )

exp 1 )! (

! . (70)

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(10)

The probability density function of the sth order statistic Ys

in the future sample of size m is given by

)! ( )! 1 ( ! ) | ( s m s m y f s − − = σ ) | ( )] | ( 1 [ )] | (

[ σ 1 σ σ

s s m s s

s F y f y

y

F − − −

× . 1 ) ( exp ) 1 ( ) 1 , ( 1

0 σ σ

      + − −       + − Β =

− = s j s m j y s j j s m s m

s (71)

Let

σs Y

V1o = (72)

and

. ) (

1

2 σ σ

= − + = = k i k i k X k n X S

Vo (73)

Using the invariant embedding technique [14-23], we then find in a straightforward manner that the joint density of V1, V2 is

), ( ) ( ) ,

( 1o 2o 1o 1o 2o 2o o v v f v f v

f = (74)

where

(

( )

)

, exp ) 1 ( ) 1 , ( 1 ) ( 1 0 1

1o o j svo

j s m s m s v f j s m j + − −       + − Β =

− =

v1o>0, (75) and , ) ( ) ( 1 )

( 1 2

2 2 2 o o o

o k v

e v k v

f − −

Γ

= v2o >0. (76)

It follows from (15) and (74) that

∫∫

∞ ∞ ∞ = 0 0 2 1 2 1 2 0 2 2 2 2 2 ) , ( ) , ( ) ( ) ( o o o o o o o o o o o o o o dv dv v v f v dv v v w f v w q o o o o o o o 2 2 2 2 1 0 2

2) ( ) ( )

( 1 dv v f v w f v k

∞ = . )] ( 1 [ 1 ) 1 ( ) 1 , ( 1 2 0 + − = + + −       + − Β +

=

m s j k

j j w s j

s m s m s k o (77) It follows from (25) and (74) that

o o o o o o o o o o o o o 2 2 2 2 1 0 2 2 2 2 0

2 ( , ) ( ) ( )

)

(w v f wv v dv v f wv f v dv

h

∞ ∞ = = . )] ( 1 [ 1 ) 1 ( ) 1 , ( 1 0 + − = + + −       + − Β

=

m s j k

j j w s j

s m s m s k o (78) If c/c1+c/c2<1 then the optimal invariant linear-loss interval predictor of Ys based on X is given by

d*=(Xk1oSk, Xk2oSk), (79)

where           = =

o o o o o 1 0 1

1 arg ( )

η η c c dw w

q (80)

and . 1 ) ( arg 2 0 2

2 

       − = =

o o o o o η η c c dw w

q (81)

The confidence coefficient associated with the optimum prediction interval d*=(d1,d2), where d1=Xk1oSk,

d2=Xk+η2oSk, is given by

} | :

{

Pr dd1<Ys <d2 σ

= − = o o o o o o o o o 2 1 . ) ( ] [ ]

[ 2 1

η

η

η

η H h w dw

H (82)

C. New-Within-Sample Prediction of Order Statistic Exponential distribution. Consider the prediction problem of the sth order statistic Ys, 1 ≤ s m, in a new sample of

size m from the exponential distribution with the probability density function (57) for the situation where both the first r observations Y1< Y2< ⋅⋅⋅ <Yr, 1≤r < s m, from the new data

sample and the first k observations X1 < X2 < ⋅⋅⋅< Xk, 1≤kn,

from the previous data sample of size n have been observed. We are now in need to optimize the prediction interval for Ys

under the loss function (2).

Let X=(X1, X2, …, Xk) for k n. Then the joint probability

density function of X1, X2, …, Xk is given by

= − − − = k i k n k i

k f x F x

k n n x x x f 1 2

1 , ,..., | ) ( ! )! ( | )[1 ( | )]

( σ σ σ

              − + − − =

= σ σ k i k i k x k n x k n

n 1 ( )

exp 1 )! (

! . (83)

Let Y=(Y1, …, Yr) and Ys > Yr for s m. Then the joint

probability density function of Y and Ys is given by

)! ( )! 1 ( ! ) | , ..., , ( 1 s m r s m y y y

f r s

− − − = σ

IAENG International Journal of Applied Mathematics, 42:1, IJAM_42_1_06

(11)

s m r r s r

s F y F x

y

F − −− − −

×[ ( |σ) ( |σ)] 1[1 ( |σ)]

=

× r

i

s i f y

y f 1 ) | ( ) |

( σ σ

              − + − − − − =

= + σ σ r i r i r y r m y s m r s m 1 1 ) ( exp 1 )! ( )! 1 ( ! . exp exp 1 1

1 −+

− −             −             − − × s m r s r s r

s y y y

y

σ

σ (84)

Let

σ r

s Y

Y

V1+= − (85)

and . ) ( ) ( 1 1

2 σ σ

= = + + = = + − + + − r i r i k i k

i n k X Y m rY

X S

V (86)

Using the invariant embedding technique [14-23], we then find in a straightforward manner that the joint density of

+

1

V ,V2+ is

), ( ) ( ) ,

( 1+ 2+ 1+ 1+ 2+ 2+

+ v v = f v f v

f (87)

where ) 1 , (

References

Related documents

A previous study has obtained 22 time-course transcriptomes of leaf development, starting from dry seeds to hour 192 post imbibition, including gene expression profiles

Returning to our ATM example, we should first create test cases for verifying the behaviour in each of the normalized scenarios (and the corresponding state machine diagrams) is

The results confirmed the asserted hypotheses by showing that narrower optimum grip-span and lower performance in the O’Connor test, stabilimeter test, and

This paper use eye-track to test fixation time (fixation duration) and regressive eye movement, and collect user’s operation time, completion rate, operation error and

1) Preventive maintenance should include periodic, predictive and planned maintenance activities performed prior to failure of an SSC so as to maintain this service life

In this case, the offshore facility under studied had relevant and systematically recorded maintenance data based on the calendar time which consist of detailed failure

12 shows plots for the disparities measured and calculated traveled distance for the test on sugar beet sawed on uneven soil... Example of measured translation, disparity

The failure symptoms identification approach proposed in Chapter 4 is here adapted for selecting the best variables for predicting failures. Specifically, it is divided in