Bayes Analysis of a Three-Parameter Pareto Distribution via Sample Based Approaches


American Journal of Mathematical and Management Sciences

ISSN: 0196-6324 (Print) 2325-8454 (Online) Journal homepage: https://www.tandfonline.com/loi/umms20


S.K. Upadhyay, M. Peshwani, I.A. Javed, S. Tripathi & Rajeev Pandey

To cite this article: S.K. Upadhyay, M. Peshwani, I.A. Javed, S. Tripathi & Rajeev Pandey (2007) Bayes Analysis of a Three-Parameter Pareto Distribution via Sample Based Approaches, American Journal of Mathematical and Management Sciences, 27:1-2, 213-242, DOI: 10.1080/01966324.2007.10737698

To link to this article: https://doi.org/10.1080/01966324.2007.10737698

Published online: 14 Aug 2013.


BAYES ANALYSIS OF A THREE-PARAMETER PARETO DISTRIBUTION VIA SAMPLE BASED APPROACHES

S.K. Upadhyay¹, M. Peshwani², I.A. Javed¹, S. Tripathi³, & Rajeev Pandey⁴

¹Department of Statistics, Banaras Hindu University, Varanasi, India

²ICFAI National College, Varanasi, India

³Department of Statistics, MMV, Banaras Hindu University, Varanasi, India

⁴Department of Statistics, Kumaun University Campus, Almora, India

SYNOPTIC ABSTRACT

This paper considers an important but less explored model, namely the three-parameter Pareto distribution, which, besides having applications in many important areas, has been advocated in the context of failure time data analysis. The analysis has been done using both the Gibbs sampler and the Metropolis algorithm, and it has been shown that the Metropolis algorithm offers significant improvement over the Gibbs sampler. The extension of the two algorithms to censored situations is also given. For numerical illustration, we have considered both real and simulated data sets from the model. In the second part of the study, we illustrate the assumed model on the basis of a real data set using the tools of Bayesian predictive simulation.

Key words: failure time; Markov chain Monte Carlo procedure; Gibbs sampler; Metropolis algorithm; Bayesian predictive simulation; censored data.

2007, VOL. 27, NOS. 1 & 2, 213-242 0196-6324/07/010213-30 $35.00

214 UPADHYAY, PESHWANI, JAVED, TRIPATHI & PANDEY

1. INTRODUCTION

The exponential, Weibull, gamma, and lognormal are some of the most common distributions in life testing and reliability studies. In this paper, a recent but comparatively less explored model, namely the three-parameter form of the Pareto distribution, is considered; it was advocated in the life testing and reliability context by Abdel-Ghaly, Attia, and Aly (1998), among others. Before considering the distribution in detail, let us begin with the simplest member of the Pareto family, known as the Pareto type II or Lomax distribution, with hazard rate and cumulative distribution function (cdf) given, respectively, by

$$h(x) = \frac{\alpha}{\theta + x} \tag{1}$$

$$F(x) = 1 - \frac{\theta^{\alpha}}{(\theta + x)^{\alpha}}. \tag{2}$$

The Pareto distribution is a reversed J-shaped, positively skewed distribution with decreasing hazard rate. Although the distribution was originally applied to certain socio-economic and naturally occurring phenomena with observations in a very long tail, it was pointed out by Arnold (1983) that the distribution has good potential for modeling reliability and failure time data as well. The distribution might be a suitable life span model in situations where product or system developments result in improved performance as the development proceeds. Davis and Feldstein (1979) have also discussed situations where the Pareto distribution can be used extensively as a model for failure time or life time data studies.

The three-parameter Pareto distribution is one possible extension of model (2).

A random variable X is said to have the three-parameter Pareto distribution if its probability density function (pdf) is given by

$$f(x \mid \alpha, \theta, \lambda) = \frac{\alpha\theta^{\alpha}}{(\theta + x - \lambda)^{\alpha+1}}; \quad x \ge \lambda,\ \alpha, \theta, \lambda > 0, \tag{3}$$

where $\alpha$ and $\theta$ are the shape and scale parameters of the distribution, respectively, and the parameter $\lambda$ determines the threshold of the distribution. Note that if a random variable X follows the two-parameter exponential distribution with scale parameter $\beta$ and threshold parameter $\lambda$, and if the parameter $\beta$ follows the gamma density with shape $\alpha$ and scale $\theta$, the Pareto distribution can be obtained by integrating the product of the two densities with respect to $\beta$ (see, for example, Abdel-Ghaly, Attia, and Aly (1998), Carpenter and Hebert (1994)).
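The mixture representation can be checked directly. The following worked sketch assumes that the exponential density is written with $\beta$ as a rate, $\beta e^{-\beta(x-\lambda)}$ for $x \ge \lambda$, and that the gamma density is written with $\theta$ as a rate, $\theta^{\alpha}\beta^{\alpha-1}e^{-\theta\beta}/\Gamma(\alpha)$. These parameterizations are assumptions made here because they are the ones under which the integral reproduces the density (3).

```latex
\begin{aligned}
f(x \mid \alpha, \theta, \lambda)
  &= \int_0^\infty \beta e^{-\beta(x-\lambda)}
     \cdot \frac{\theta^{\alpha}}{\Gamma(\alpha)}\,\beta^{\alpha-1} e^{-\theta\beta}\, d\beta \\
  &= \frac{\theta^{\alpha}}{\Gamma(\alpha)}
     \int_0^\infty \beta^{\alpha} e^{-\beta(\theta + x - \lambda)}\, d\beta \\
  &= \frac{\theta^{\alpha}}{\Gamma(\alpha)} \cdot
     \frac{\Gamma(\alpha+1)}{(\theta + x - \lambda)^{\alpha+1}}
   = \frac{\alpha\theta^{\alpha}}{(\theta + x - \lambda)^{\alpha+1}}, \qquad x \ge \lambda .
\end{aligned}
```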

The mean time to failure, reliability and the hazard rate for the three-parameter Pareto distribution model (3) are

$$\mathrm{MTF} = \lambda + \frac{\theta}{\alpha - 1}, \quad \alpha > 1 \tag{4}$$

$$R(x) = \frac{\theta^{\alpha}}{(\theta + x - \lambda)^{\alpha}}, \tag{5}$$

$$h(x) = \frac{\alpha}{\theta + x - \lambda}, \tag{6}$$

respectively, and the cdf of the distribution is simply $1 - R(x)$. Obviously, the hazard rate is decreasing for all $x \ge \lambda$ and approaches zero as $x$ tends to infinity.
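Because the cdf $1 - R(x)$ is available in closed form, variates from the model can be generated by inversion. The following is a minimal sketch of the quantities in (4)-(6) together with an inverse-cdf sampler; the function names are illustrative and not part of the paper, and the inversion formula $x = \lambda + \theta(u^{-1/\alpha} - 1)$ is obtained here by solving $R(x) = u$ for a uniform variate $u$.

```python
import random


def pareto3_reliability(x, alpha, theta, lam):
    """R(x) = theta^alpha / (theta + x - lam)^alpha for x >= lam, as in (5)."""
    if x < lam:
        return 1.0
    return (theta / (theta + x - lam)) ** alpha


def pareto3_hazard(x, alpha, theta, lam):
    """h(x) = alpha / (theta + x - lam) for x >= lam, as in (6)."""
    return alpha / (theta + x - lam)


def pareto3_mtf(alpha, theta, lam):
    """Mean time to failure lam + theta / (alpha - 1), as in (4); needs alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the mean time to failure requires alpha > 1")
    return lam + theta / (alpha - 1.0)


def pareto3_sample(n, alpha, theta, lam, rng):
    """Draw n ordered variates by solving R(x) = u for u uniform on (0, 1]."""
    draws = []
    for _ in range(n):
        u = 1.0 - rng.random()  # lies in (0, 1], so u is never zero
        draws.append(lam + theta * (u ** (-1.0 / alpha) - 1.0))
    return sorted(draws)
```

For instance, `pareto3_sample(20, 4.0, 10.0, 20.0, random.Random(1))` produces an ordered sample of size 20 whose smallest value cannot fall below the threshold 20.0.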

Note that in a bathtub-shaped hazard curve, the initial failure region is characterized by a decreasing hazard rate, which, in turn, represents early failures. The Pareto distribution is, therefore, natural in situations where failure may occur due to initial weaknesses or defects, poor insulation, bad assembly, etc.

Classical inference for the three-parameter form of the model was considered by Abdel-Ghaly, Attia, and Aly (1998), who considered maximum likelihood estimators of the Pareto parameters and, among a number of other things, obtained the prediction of model parameters under the normal use or design condition when an accelerated life test experiment was conducted to induce early failures of highly reliable devices. Most of their results were based on experiments involving type II censored data.

As regards Bayesian inference, for some references to work on Pareto type II distributions see Lwin (1972), Arnold and Press (1983, 1986, 1989), Geisser (1984, 1985), and Singh (1992). However, we do not know of any references on Bayesian inference related to the three-parameter form of the model. It seems that, as with other complicated distributions, the density given in (3) is not straightforward to handle, and the situation becomes harder to manage if the available data are further complicated by a censoring mechanism.

The plan of the paper is as follows. Section 2 provides the details of the Gibbs sampler implementation for the three-parameter form of the Pareto model and briefly describes the implementation for censored data problems. Note that we consider only the scheme involving right type II censoring, although other schemes can be tackled similarly. Section 3 provides a numerical illustration of the Gibbs sampler algorithm based on a simulated data set from the three-parameter model. Section 4 considers another Markov chain Monte Carlo procedure, known as the Metropolis algorithm, for extracting samples from the typically available posteriors; the implementation details of the algorithm for the Pareto model are also provided in that section. In Section 5, the Metropolis algorithm is illustrated on the basis of real data. Section 6 presents some recently developed concepts for validating or checking a given model in a Bayesian framework. The idea is based on predictive simulation techniques, which utilize the final output from Markov chain Monte Carlo to determine whether the fitted model is supported by the data in hand. Finally, the numerical results for the model validation task based on real data are provided in Section 7.

2. GIBBS SAMPLER AND ITS IMPLEMENTATION

The Gibbs sampler algorithm is a simulation tool for obtaining samples from a non-normalized joint density function. Gelfand and Smith (1990) and Gelfand, Smith, and Lee (1992) offered the Gibbs sampler as a general approach for fitting complex statistical models in a surprisingly short period. First consider the case of complete sampling. Let n items with failure time distribution given in (3) be subjected to testing and let $\underline{x} = (x_1, \ldots, x_n)$ be the observed failure times. Naturally, the failure times occur in increasing order. The likelihood function is

$$L(\underline{x} \mid \alpha, \theta, \lambda) = \frac{\alpha^{n}\theta^{n\alpha}}{\prod_{i=1}^{n}(\theta + x_i - \lambda)^{\alpha+1}}. \tag{7}$$

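If we now consider the parameters to be random with independent priors given as

$$\begin{aligned} g_1(\alpha) &\propto U(0, M) \\ g_2(\theta) &\propto IG(c, d) \\ g_3(\lambda) &\propto \text{a constant} \end{aligned} \tag{8}$$

where M, c and d are the prior hyper-parameters, the joint posterior for $\alpha$, $\theta$ and $\lambda$ can be obtained, up to proportionality, as

$$p(\alpha, \theta, \lambda \mid \underline{x}) \propto \frac{\alpha^{n}\theta^{(n\alpha - d - 1)}e^{-c/\theta}}{\prod_{i=1}^{n}(\theta + x_i - \lambda)^{\alpha+1}}. \tag{9}$$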

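The posterior (9) can be used to specify the different full conditional forms required for the implementation of the Gibbs sampler algorithm. These full conditionals can be written, up to proportionality, as

$$p(\alpha \mid \theta, \lambda, \underline{x}) \propto \frac{\alpha^{n}\theta^{n\alpha}}{\prod_{i=1}^{n}(\theta + x_i - \lambda)^{\alpha}} \tag{10}$$

$$p(\theta \mid \alpha, \lambda, \underline{x}) \propto \frac{\theta^{(n\alpha - d - 1)}e^{-c/\theta}}{\prod_{i=1}^{n}(\theta + x_i - \lambda)^{\alpha+1}} \tag{11}$$

$$p(\lambda \mid \alpha, \theta, \underline{x}) \propto \frac{1}{\prod_{i=1}^{n}(\theta + x_i - \lambda)^{\alpha+1}} \tag{12}$$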

From the full conditional forms given in (10)-(12), it can be easily shown that the form given in (10) is a gamma density with the scale parameter $\left[-n\log\theta + \sum_{i=1}^{n}\log(\theta + x_i - \lambda)\right]$ and shape parameter $(n+1)$. Thus, samples of $\alpha$ can be easily generated using any standard gamma generating routine (see, for example, Devroye (1986), Ripley (1987), etc.). The form (11) can be shown to be log-concave if we use the log transformation of $\theta$. So, $\theta$ can be generated by the adaptive rejection sampling procedure of Gilks and Wild (1992). Finally, the full conditional form (12) can be handled using the rejection method with the choice of envelope density $g(\lambda)$ given below (see, for example, Devroye (1986)):

$$g(\lambda) = \frac{\alpha\theta^{\alpha}(\theta + x_1)^{\alpha}}{\{(\theta + x_1)^{\alpha} - \theta^{\alpha}\}(\theta + x_1 - \lambda)^{\alpha+1}}. \tag{13}$$

Note that the form of the envelope density given in (13) was found to work well on a variety of data sets. Thus, all the full conditionals for the Gibbs sampler algorithm are 'available,' i.e., samples from these various forms can be easily and efficiently generated using appropriate techniques. Moreover, once these sample generation schemes are finalized, the implementation of the Gibbs sampler becomes quite straightforward (see Upadhyay, Vasishta, and Smith (2001) for details of Gibbs sampler implementation).
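Two of the three Gibbs updates described above can be sketched with the Python standard library. This is an illustrative sketch, not the authors' code. It assumes that the bracketed quantity quoted for (10) acts as a rate (so its reciprocal is passed as the scale to `gammavariate`), that the $U(0, M)$ prior is enforced by simply redrawing values of $\alpha$ above M, and that $\lambda$ ranges over $(0, x_1)$, the interval on which the envelope (13) integrates to one. The update for $\theta$ by adaptive rejection sampling is not covered here. Function names are hypothetical.

```python
import math
import random


def draw_alpha(x, theta, lam, M, rng):
    """Draw alpha from a gamma with shape n + 1 and rate b, restricted to (0, M)."""
    n = len(x)
    b = sum(math.log(theta + xi - lam) for xi in x) - n * math.log(theta)
    while True:
        # gammavariate takes (shape, scale); the scale is 1 / rate
        a = rng.gammavariate(n + 1, 1.0 / b)
        if a < M:
            return a


def draw_lambda(x, alpha, theta, rng):
    """Draw lambda on (0, x1) by rejection, proposing from the envelope (13)."""
    x1 = min(x)
    others = sorted(x)[1:]  # every observation except the smallest
    lo = (theta + x1) ** (-alpha)
    hi = theta ** (-alpha)
    while True:
        # inverse-cdf draw from g: (theta + x1 - lam)^(-alpha) is uniform on (lo, hi)
        u = rng.random()
        lam = theta + x1 - (lo + u * (hi - lo)) ** (-1.0 / alpha)
        # target / envelope is increasing in lam, so it is bounded by its value at x1
        log_accept = (alpha + 1.0) * sum(
            math.log(theta + xi - x1) - math.log(theta + xi - lam) for xi in others
        )
        if math.log(1.0 - rng.random()) <= log_accept:
            return lam
```

The acceptance probability is the product over the remaining observations of $[(\theta + x_i - x_1)/(\theta + x_i - \lambda)]^{\alpha+1}$, which never exceeds one because $\lambda < x_1$.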

The extension of the algorithm to censored data problems is routine, especially when the concerned cdf is available in closed form. Even in situations where the cdf is not available in closed form, this can be easily done, although sometimes at the cost of slow generation from the truncated distributions corresponding to censored data. In this paper, our discussion remains confined to right type II censoring, with particular reference to the model under consideration. An almost similar discussion can, however, be presented for other censoring schemes. Note that for the implementation of the Gibbs sampler in censored data problems, we work with the same full conditional forms that are obtained for the complete sampling case, except that additional unknowns are introduced corresponding to the unobserved censored data. Analysts are only required to ascertain that the full conditionals corresponding to these additional unknowns are available from the point of view of sample generation, that is, samples can be conveniently and straightforwardly drawn from these additional full conditionals (see Gelfand et al. (1992), Upadhyay and Smith (1993a, 1993c), Upadhyay et al. (2001), etc.). If this can be done for a particular censoring scheme, the implementation of the algorithm becomes a routine task.

Thus, in order to implement the scheme for the three-parameter Pareto model when only the first r observations $x_1, \ldots, x_r$ are available and the rest are censored, we proceed with the full conditionals given in (10)-(12). The additional full conditionals are nothing but the joint sampling distribution corresponding to the $(n-r)$ independent censored observations. The generation from the full conditionals (10)-(12) is the same as discussed earlier for the complete sampling case. The generation from the full conditionals corresponding to censored data can, however, be achieved as independent draws from the truncated three-parameter Pareto distribution.

Moreover, since the cdf corresponding to the Pareto model is available in closed form, the censored observations from the truncated distribution can be easily obtained in one-to-one correspondence with generated uniform variates. This can be done using the formula (see also Upadhyay et al. (2001)),

(14)

where j = r + 1, ... , n.
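As an illustration of such a one-to-one draw, the sketch below inverts the reliability function (5) truncated at the censoring point $x_r$: setting $R(x_j)/R(x_r) = u$ for a uniform variate $u$ gives $x_j = \lambda - \theta + (\theta + x_r - \lambda)\,u^{-1/\alpha}$. This form is derived here from (5) and may differ in notation from the formula used by the authors (for example, in whether $u$ or $1-u$ appears); the function name is hypothetical.

```python
import random


def draw_censored(x_r, alpha, theta, lam, n_censored, rng):
    """Draw lifetimes from the Pareto model (3) truncated to the region above x_r."""
    draws = []
    for _ in range(n_censored):
        u = 1.0 - rng.random()  # uniform on (0, 1], so the power below is finite
        # u^(-1/alpha) >= 1, hence every draw is at least x_r
        draws.append(lam - theta + (theta + x_r - lam) * u ** (-1.0 / alpha))
    return draws
```

Within a Gibbs sweep, these draws would replace the $(n - r)$ censored values before the updates of $\alpha$, $\theta$ and $\lambda$ are carried out on the completed sample.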

3. NUMERICAL ILLUSTRATION BASED ON SIMULATED DATA

The numerical illustration presented in this section is based on a simulated sample of size n = 20 from the model (3) using $\alpha = 4.0$, $\theta = 10.0$ and $\lambda = 20.0$. These values are arbitrary and meant for illustration only. The simulated data set in ordered form is presented in Table 1 for reference.

In order to provide the complete posterior analysis, the Gibbs sampler algorithm was run on the posterior given in (9), taking the maximum likelihood estimates of the parameters (see, for example, Abdel-Ghaly et al. (1998)) as the initial values for starting the chain. It was observed, however, that even with some arbitrary choices of initial values we do not lose much except, sometimes, slightly delayed convergence of the chain. In either case, we do not consider the choice of initial values a major issue, as we never intend to develop the most efficient Markov chain Monte Carlo procedure. We also considered a range of type II censored data, assuming that 20%, 40% and 60% of the highest observations are censored; these fractions correspond, respectively, to the 4, 8 and 12 largest observations being censored. Moreover, to run the Gibbs sampler algorithm in the censored case, each censored observation was given an initial value exactly equal to $x_r$, the point of truncation for censored observations.

Table 1: Simulated sample of size 20 from the three-parameter Pareto distribution using $\alpha = 4.0$, $\theta = 10.0$ and $\lambda = 20.0$.

20.166  20.325  20.482  20.513  20.730
20.975  21.132  21.208  21.594  21.864
21.952  21.963  22.027  22.339  22.563
23.295  23.469  23.514  29.347  33.291

We considered a single long run of the chain, and convergence monitoring was done using ergodic averages. Convergence was found at about 25,000 iterations for the complete sampling case. For the censored data cases, however, convergence was slightly delayed, occurring somewhere between 30,000 and 40,000 iterations depending on the number of observations censored. Finally, after convergence was assessed, we took equally spaced (every 10th) realizations from the chain to form a random sample of size 1000 from the joint posterior of $\alpha$, $\theta$ and $\lambda$. This was done separately for the complete as well as the censored cases.
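The two bookkeeping steps just described, monitoring by ergodic averages and thinning after convergence, can be sketched as follows. The helper names and the list-based representation of a chain are assumptions made for illustration.

```python
def ergodic_averages(chain):
    """Running means of a scalar chain, for informal convergence monitoring."""
    averages, total = [], 0.0
    for t, value in enumerate(chain, start=1):
        total += value
        averages.append(total / t)
    return averages


def thin(chain, burn_in, step, size):
    """Keep every `step`-th draw after `burn_in` iterations, up to `size` draws."""
    return chain[burn_in::step][:size]
```

With the settings reported above for the complete sampling case, `thin(chain, 25000, 10, 1000)` would need a chain of roughly 35,000 iterations to return the full 1000 draws.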

Marginal posterior density estimates for the complete sampling and censored data cases are shown in Figures 1-3 using boxplot representations. The boxplot representations have been used to save space and also to provide a comparison among the estimates corresponding to complete and censored data. Numerical values of various posterior characteristics are also shown (see Table 2) to provide an overall idea of the concerned posteriors, although these characteristics are meant for illustration only.

One can similarly work out other desired characteristics once a random sample from the corresponding posterior is available. All these estimates are based on samples of size 1000 each.

Figure 1: Boxplots showing the density estimates of $\alpha$ (based on Gibbs output). [Horizontal axis: complete, 20%, 40%, 60%.]

Figure 2: Boxplots showing the density estimates of $\theta$ (based on Gibbs output). [Horizontal axis: complete, 20%, 40%, 60%.]

Figure 3: Boxplots showing the density estimates of $\lambda$ (based on Gibbs output). [Horizontal axis: complete, 20%, 40%, 60%.]

Table 2: Posterior characteristics based on Gibbs sampler.

Parameter  Case               Mean    Median  Mode    Variance  Minimum  Maximum
α          Complete sampling  2.575   2.157   1.755   2.697     0.370    18.548
α          20% censoring      2.139   1.729   1.613   2.325     0.409    17.242
α          40% censoring      1.464   1.239   1.017   0.863     0.317    10.748
α          60% censoring      1.437   1.120   1.026   1.276     0.276    15.754
θ          Complete sampling  4.866   3.838   3.163   13.143    1.001    34.813
θ          20% censoring      4.113   3.045   3.355   13.453    1.000    51.826
θ          40% censoring      3.219   2.636   2.485   5.077     1.000    25.634
θ          60% censoring      3.529   2.682   2.498   7.882     1.002    31.183
λ          Complete sampling  19.754  19.800  20.009  0.078     18.657   20.166
λ          20% censoring      19.760  19.794  20.053  0.078     18.774   20.166
λ          40% censoring      19.738  19.778  20.013  0.090     18.387   20.166
λ          60% censoring      19.699  19.730  20.039  0.094     18.624   20.166

Among the various findings evident from the reported results, the most striking is the skewed density in almost every case, although it is not too far from symmetry in a few cases (see Table 2). Thus, in general, the results favor the use of the modal value as the point estimate and the highest density region as the interval estimate. It has been common practice to use the posterior mean or the equal-tail limits merely for convenience, and this practice should be avoided at least in such cases (see, for example, Upadhyay et al. (2001)). The other important finding concerns the posterior variability, which is highest for $\theta$ and lowest for $\lambda$. The density estimates corresponding to censored data are also shown in Figures 1-3. It can be seen that although the estimated numerical values change from one censoring fraction to another, the trend and the overall appearance of the different posterior density estimates are more or less the same. This provides evidence that high censoring fractions do not result in much loss relative to low censoring fractions, or even relative to the complete sampling case, when the Pareto model is analyzed using the Gibbs sampler algorithm.

The plots of bivariate density estimates are not shown, although it was found that a posteriori the pair $(\alpha, \theta)$ is highly positively correlated; the coefficients of correlation remain high for almost all censoring percentages considered here (see Table 3). The pairs $(\alpha, \lambda)$ and $(\theta, \lambda)$, however, appear to be almost uncorrelated a posteriori, as the values were found to be quite close to zero in most of the considered cases.

Table 3: Estimated posterior correlation coefficients based on Gibbs output (for simulated data set).

Pair of variates  Complete  20% censoring  40% censoring  60% censoring
α, θ              0.892     0.908          0.846          0.828
α, λ              -0.027    -0.051         -0.011         -0.041
θ, λ              -0.053    -0.078         -0.025         -0.110

4. METROPOLIS ALGORITHM AND ITS IMPLEMENTATION

The Metropolis algorithm is a Markovian updating scheme for extracting samples from typically available posteriors, often specified up to proportionality. The algorithm has several advantages over the Gibbs sampler. The most important is the possibility of introducing orthogonalization through reparameterization of the original parameter vector. This, in turn, reduces the serial correlation among the generated variates and thereby improves the convergence of the chain. Readers are referred to Metropolis et al. (1953), Smith and Roberts (1993), Upadhyay et al. (2001), Chen et al. (2000), etc. for details about the Metropolis algorithm and other related issues. In this paper, we discuss its implementation for the model under consideration only.

Referring to Section 2, we have the joint posterior distribution given in (9). In order to apply the Metropolis algorithm, we first consider the transformations

$$\phi_1 = \ln\theta, \quad \phi_2 = \ln\alpha \quad \text{and} \quad \phi_3 = \ln\left[\frac{\lambda/x_1}{1 - \lambda/x_1}\right] \tag{15}$$

to change each component of the original parameters $(\theta, \alpha, \lambda)$ to real space. Here $x_1$ is the smallest observation and the new parameter vector is denoted by $\phi = (\phi_1, \phi_2, \phi_3)$. The joint posterior of $\phi$ may then be obtained from (9) using the inverse transformation $\theta = \exp(\phi_1)$, $\alpha = \exp(\phi_2)$ and $\lambda = x_1\exp(\phi_3)/(1 + \exp(\phi_3))$, where the Jacobian of transformation may be written as

$$|J| = \frac{x_1\exp(\phi_1 + \phi_2 + \phi_3)}{[1 + \exp(\phi_3)]^{2}}. \tag{16}$$

We next assume $\Sigma$ to be the Hessian-based approximation of the posterior covariance matrix of $\phi$ and consider the matrix $L$ for which $L^{T}\Sigma L$ is diagonal. Obviously, $L$ is an orthogonal matrix. We finally consider the transformation $\psi = L\phi$ to get the posterior of $\psi$ and work on the components of $\psi$ by generating them from independent uniforms with ranges around the components of $\psi$ covering $\pm C$ times the Hessian-based estimates of the posterior standard deviations of the corresponding components of $\psi$. For the initial values of the various components of $\psi$, we may consider generating from independent uniforms in the same ranges around the components of an initial estimate of $\psi$, say $\hat{\psi}$, where $\hat{\psi}$ is the approximate posterior mean of $\psi$. Moreover, to get the initial estimate $\hat{\psi}$ of the posterior mean and the Hessian-based estimates of the posterior standard deviations, we consider, as the simplest way, an initial run of the Gibbs chain for a few iterations and evaluate the corresponding sample-based estimates of the posterior mean and standard deviations of $\psi$ (see, for example, Vasishta (1998), Upadhyay et al. (2001)). The same set of transformations discussed earlier is required for getting the estimates corresponding to $\psi$.

Sometimes, the approximate estimates based on maximum likelihood and the corresponding Hessian-based matrix can also be used in situations where such computations are feasible. Before we close this section, a brief comment on the choice of C is in order. As mentioned earlier, there is no specific suggestion in the literature on the choice of C. However, to choose it, a number of values between 0.5 and 1 may be arbitrarily assigned to it, and the value that provides the most satisfactory response of the generating chain (in terms of convergence monitoring) may be recommended (see Upadhyay et al. (2001)).
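The reparameterized update can be sketched as follows. This is a simplified illustration rather than the scheme described above: it works directly on $\phi$ and omits the rotation $\psi = L\phi$, so the proposal box is aligned with the $\phi$ axes. The log posterior is the logarithm of (9) plus the logarithm of the Jacobian (16); the hyper-parameter values for M, c and d, the half-widths (C times some standard deviation estimates), and all function names are hypothetical.

```python
import math
import random


def from_phi(phi, x1):
    """Invert (15): theta = exp(phi1), alpha = exp(phi2), lam = x1 * logistic(phi3)."""
    theta = math.exp(phi[0])
    alpha = math.exp(phi[1])
    lam = x1 / (1.0 + math.exp(-phi[2]))
    return alpha, theta, lam


def log_post_phi(phi, x, M, c, d):
    """Log of posterior (9) expressed in phi, including the log of Jacobian (16)."""
    x1 = min(x)
    alpha, theta, lam = from_phi(phi, x1)
    if alpha >= M or lam >= x1:
        return -math.inf  # outside the support implied by the priors (8)
    n = len(x)
    log_post = (n * math.log(alpha)
                + (n * alpha - d - 1.0) * math.log(theta)
                - c / theta
                - (alpha + 1.0) * sum(math.log(theta + xi - lam) for xi in x))
    log_jac = (phi[0] + phi[1] + phi[2] + math.log(x1)
               - 2.0 * math.log1p(math.exp(phi[2])))
    return log_post + log_jac


def metropolis_step(phi, x, half_widths, M, c, d, rng):
    """One Metropolis update with a uniform (rectangular) proposal centered at phi."""
    proposal = [p + rng.uniform(-w, w) for p, w in zip(phi, half_widths)]
    log_ratio = log_post_phi(proposal, x, M, c, d) - log_post_phi(phi, x, M, c, d)
    if math.log(1.0 - rng.random()) <= log_ratio:
        return proposal
    return phi
```

Because the uniform proposal is symmetric, the acceptance ratio involves only the posterior density at the proposed and current points. Adding the rotation would amount to proposing in the $\psi$ coordinates and mapping back to $\phi$ before evaluating the same log posterior.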


5. NUMERICAL ILLUSTRATION BASED ON REAL DATA

In this section, we consider a real data set that was initially reported by Dyer (1981) and later reanalyzed by Arnold and Press (1989). The data set reported in Table 4 concerns the annual wages (in multiples of 100 dollars) of 30 production-line workers in a large industrial firm in the United States. Since the purpose of the illustration is simply to show the scope of the Metropolis algorithm in such a low dimensional problem, it is immaterial whether the data relate to failure times or to wages.

Table 4: Annual wage data (in multiples of 100 U.S. dollars).

112.0  154.0  119.0  108.0  112.0  156.0
123.0  103.0  115.0  107.0  125.0  119.0
128.0  132.0  107.0  151.0  103.0  104.0
116.0  140.0  108.0  105.0  158.0  104.0
119.0  111.0  101.0  157.0  112.0  115.0

The Metropolis algorithm was applied to the posterior given in (9) as per the details given in Section 4, using a three-dimensional properly centered, scaled and oriented rectangular kernel as the candidate generating density $q(\theta, \theta')$. A number of values were arbitrarily assigned to the scaling constant C, and it was found that $C = 0.5$ works reasonably well in the analysis of the Pareto model. For the initial estimates of $\psi$ and of the corresponding posterior standard deviations, we ran a Gibbs sampler chain for 2000 iterations and used the sample-based estimates from the Gibbs output. This was done to facilitate the implementation of the Metropolis algorithm, although initial estimates may be chosen in a variety of ways.

We once again considered a single long run of the Metropolis chain, and convergence monitoring was done using ergodic averages. As expected, convergence was found quite early in comparison with the pure Gibbs implementation. Once convergence was assessed, we picked 1000 equally spaced outcomes (every 10th). This was done to minimize serial correlation among the generated variates. Here we did not consider the case of censored data, although the implementation can be easily done using the hybrid strategy based on Gibbs within the Metropolis steps. Readers are referred to Smith and Roberts (1993), Upadhyay and Smith (1993c) and Vasishta (1998), etc. for the details of the implementation of the Metropolis algorithm in censored data problems.

The results are shown in the form of marginal posterior density estimates drawn using histograms (see Figures 4-6). Sample-based estimates of some important posterior characteristics are also provided in Table 5. Most of the results are quite evident and can be discussed along lines parallel to those given in Section 3. A few important results include the unimodal and skewed estimated densities for all three variates, with positive coefficients of skewness for $\alpha$ and $\theta$ and negative for $\lambda$. The estimate of $\theta$ shows the highest variability, with values ranging from a minimum of 2.266 to a maximum of 213.210.

Table 5: Posterior characteristics for complete sampling case based on Metropolis algorithm.

Posterior characteristics  α       θ        λ
Mean                       2.336   35.584   98.423
Median                     1.962   27.911   98.524
Mode                       1.828   23.033   100.424
Variance                   2.018   758.040  2.741
Minimum                    0.440   2.266    92.725
Maximum                    12.203  213.210  100.999

Figure 4: Histogram showing the density estimates of $\alpha$ for complete case (based on Metropolis output). [Vertical axis: density.]

Figure 5: Histogram showing the density estimates of $\theta$ for complete case (based on Metropolis output). [Vertical axis: density.]

Figure 6: Histogram showing the density estimates of $\lambda$ for complete case (based on Metropolis output). [Vertical axis: density.]

Pairwise scatter plots are shown in Figures 7-9. These can be regarded as bivariate density estimates and show that a posteriori the pair $(\alpha, \theta)$ is highly positively correlated; the estimated correlation coefficient is quite close to unity in this case (see also Table 6). On the other hand, the other two pairs of variates are almost uncorrelated, with estimated coefficients of correlation that are negative but close to zero, a conclusion similar to what was observed for the simulated data set using the Gibbs sampler algorithm.

Figure 7: Scatter plot between the variates $\alpha$ and $\theta$ for complete sampling case (based on Metropolis output).

Figure 8: Scatter plot between the variates $\alpha$ and $\lambda$ for complete sampling case (based on Metropolis output).

Figure 9: Scatter plot between the variates $\theta$ and $\lambda$ for complete sampling case (based on Metropolis output).

Table 6: Estimated posterior correlations for complete sampling based on Metropolis algorithm.

Pair of variates  Correlation coefficient
α, θ              0.938
α, λ              -0.085
θ, λ              -0.087

6. MODEL VALIDATION: PREDICTIVE SIMULATION TECHNIQUES

Generalizing a model by simply inserting additional parameters often gives the impression that unnecessary complications are being created and that no extra benefit may result. It is, therefore, desirable to validate the three-parameter Pareto distribution for a given data set so that the appropriateness of the model may be assured. The developments given in this section are based on predictive simulation ideas. The discussion presented here is, however, informal and does not consider any particular model form; rather, it assumes a model f(.) indexed by an unknown parameter $\theta$ where, for generality, $\theta$ is supposed to be vector valued.

To begin with, let us consider the output obtained from either the Gibbs or the Metropolis chain. Other forms of Markov chain Monte Carlo can of course also be considered besides the Gibbs and Metropolis algorithms used in this paper, so it is better to speak of Markov chain Monte Carlo in general to cover every such procedure. Having obtained the output from the Markov chain Monte Carlo procedure, we simulate the predictive (future) samples y from the model distribution $f(y \mid \theta)$. These predictive samples can be utilized in a variety of ways to examine the validity of the assumed model for a given set of data, and we shall call the procedure the predictive simulation technique. The technique is based on the comparison of the predictive samples y with the observed sample x, with the ultimate aim of seeing whether the two data sets appear to have come from the same model. If the comparison is satisfactory, we can say that the given model fits the data in hand well.

The comparison can be carried out in a variety of ways. For example, plots of the empirical distribution functions (edf) of the two data sets on the same scale can tell us, at least informally, whether the two data sets might have arisen from the same model. Similarly, the two data sets can be used to obtain the posterior predictive p-values for a particular measure of discrepancy (see, for example, Gelman et al. (1996), Upadhyay and Smith (1993b), Upadhyay et al. (2001), etc.). If the evaluated p-value is large, the assumed model can be considered to provide a good fit for the data in hand. A word of caution: we definitely do not suggest that the p-values form the basis of any model choice criterion; rather, they should be used to draw an informal message that the given data set can be assumed to have come from the model under consideration. It is usually recommended that one should carry out a complete Bayesian analysis before reaching any final conclusion regarding the choice of a particular model. Moreover, the version of the p-value considered here has received a number of adverse comments from Bayesians (see, for example, Bayarri and Berger (1998)), but owing to the non-availability of good and computationally efficient alternatives we shall remain confined to the posterior predictive version based on certain classical discrepancy measures. Before we finish, let us comment that the Bayesian versions of various classical discrepancy measures have already been defined in the literature by a number of Bayesians (see, for example, Gelman et al. (1996), Upadhyay et al. (2001), etc.).

6.1 POSTERIOR PREDICTIVE P-VALUES BASED ON CERTAIN CLASSICAL DISCREPANCY MEASURES

To provide the details, let T(.) be the discrepancy measure between the observed data and the posited model quantities. The corresponding posterior p-value can then be defined as

$$p = \Pr[T(y) \ge T(x) \mid f, \underline{x}] = \int \Pr(T(y) \ge T(x) \mid f, \theta)\, p(\theta \mid f, \underline{x})\, d\theta \tag{17}$$

where $p(\theta \mid f, \underline{x})$ is the posterior distribution obtained under the model f.

Obviously, (17) is the classical p-value averaged over the posterior distribution of $\theta$ under the model f. It is to be noted that in Bayesian analysis, rejection of a model may be caused by a badly chosen prior, by an ill-defined likelihood, or by both. Thus model checking in Bayesian analysis tests the entire package, that is, both the prior and the likelihood. However, if the effect of the assumed prior is negligible in comparison with the likelihood, the rejection can be attributed solely to the latter. Where a thorough study is desired, one should certainly study prior dominance before any final conclusion is drawn. We shall, however, restrict ourselves to a limited study and assume that the prior has no considerable effect in comparison with the likelihood. For calculating the posterior predictive p-value given in (17), we first obtain samples of $\theta$ from the posterior $p(\theta \mid f, \underline{x})$, possibly using the Markov chain Monte Carlo technique. Then, for each generated outcome of $\theta$ from its posterior distribution, the predictive data y are obtained from $f(y \mid \theta)$. Further, if the discrepancy measure is assumed to be chi-square, we have,

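$$
T = \sum_{i=1}^{n} \frac{\left[t_i - E(t_i \mid \theta)\right]^2}{\operatorname{Var}(t_i \mid \theta)} \tag{18}
$$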

where $t_i$ may be $x_i$ or $y_i$ ($i = 1, \ldots, n$), as the case may be, and n is the sample size.
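As a rough illustration of a chi-square discrepancy, the sketch below assumes the standardized form, that is, the sum over i of $[t_i - E(t_i \mid \theta)]^2 / \operatorname{Var}(t_i \mid \theta)$, and assumes the model cdf $F(t) = 1 - (1 + (t - \lambda)/\theta)^{-\alpha}$ for $t > \lambda$. Both the parametrization and the function names are assumptions made for illustration and are not taken from the paper. Under this assumed form the variance is finite only for $\alpha > 2$, so the function refuses to evaluate the measure otherwise.

```python
import numpy as np


def pareto3_moments(alpha, theta, lam):
    """Mean and variance under the ASSUMED cdf
    F(t) = 1 - (1 + (t - lam) / theta) ** (-alpha), t > lam."""
    if alpha <= 2.0:
        # The variance is infinite for alpha <= 2 (the mean for alpha <= 1).
        raise ValueError("variance is not finite for alpha <= 2")
    mean = lam + theta / (alpha - 1.0)
    var = alpha * theta**2 / ((alpha - 1.0) ** 2 * (alpha - 2.0))
    return mean, var


def chi_square_discrepancy(t, alpha, theta, lam):
    """Sum of squared standardized deviations of the data t from the model mean."""
    mean, var = pareto3_moments(alpha, theta, lam)
    t = np.asarray(t, dtype=float)
    return float(np.sum((t - mean) ** 2 / var))
```

The same function would be applied once to the observed data and once to each predictive sample, using the same parameter draw in both calls.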

On the other hand, if the discrepancy measure is assumed to be that of Kolmogorov-Smirnov, we have,

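$$
T = \sup_{t} \left| F_n(t) - F_0(t \mid \theta) \right| \tag{19}
$$

where $F_0(t \mid \theta)$ is the cdf of the posited model that is completely specified and $F_n(t)$ is the empirical distribution function given by

$$
F_n(t) = \frac{\text{number of } t_i\text{'s} \le t}{n}. \tag{20}
$$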

Thus, in order to calculate the posterior predictive p-value, we have to calculate T(x) and T(y) using the informative (observed) data and the predictive data, respectively. For the discrepancy measure based on chi-square, the corresponding formula is given in (18), whereas for the measure based on Kolmogorov-Smirnov it is given in (19). The process is repeated for a number of different draws of $\theta$, and finally the posterior predictive p-value is estimated by the proportion of times T(y) exceeds T(x).
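The procedure just described can be sketched as follows for the Kolmogorov-Smirnov discrepancy. This is a minimal sketch under stated assumptions, not the authors' implementation: the model cdf is taken to be $F(t) = 1 - (1 + (t - \lambda)/\theta)^{-\alpha}$ for $t > \lambda$, the posterior draws are assumed to be already available as rows $(\alpha, \theta, \lambda)$, and all function names are hypothetical.

```python
import numpy as np


def pareto3_cdf(t, alpha, theta, lam):
    """ASSUMED cdf: F(t) = 1 - (1 + (t - lam) / theta) ** (-alpha) for t > lam, else 0."""
    z = np.maximum(np.asarray(t, dtype=float) - lam, 0.0)
    return 1.0 - (1.0 + z / theta) ** (-alpha)


def pareto3_rvs(n, alpha, theta, lam, rng):
    """Draw n values from the assumed model by inverting its cdf."""
    u = rng.uniform(size=n)
    return lam + theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def ks_discrepancy(t, alpha, theta, lam):
    """sup_t |F_n(t) - F_0(t | theta)|, attained at the order statistics."""
    t = np.sort(np.asarray(t, dtype=float))
    n = t.size
    f0 = pareto3_cdf(t, alpha, theta, lam)
    above = np.arange(1, n + 1) / n - f0  # edf just after each jump
    below = f0 - np.arange(0, n) / n      # edf just before each jump
    return float(max(above.max(), below.max()))


def posterior_predictive_pvalue(x, draws, rng):
    """Proportion of posterior draws for which T(y) >= T(x).

    draws: array of shape (m, 3) with columns (alpha, theta, lam),
    e.g. the output of an MCMC run (not produced here).
    """
    exceed = 0
    for alpha, theta, lam in draws:
        y = pareto3_rvs(len(x), alpha, theta, lam, rng)  # predictive data
        t_y = ks_discrepancy(y, alpha, theta, lam)
        t_x = ks_discrepancy(x, alpha, theta, lam)
        exceed += t_y >= t_x
    return exceed / len(draws)
```

Both discrepancies are evaluated at the same parameter draw, so T(x) changes from one draw to the next; only the predictive sample y is regenerated each time.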

7. REAL DATA ILLUSTRATION

The purpose of this section is to examine the validity of the three-parameter Pareto distribution for the annual wage data given in Table 4. This data set will henceforth be referred to as the observed data. To apply the ideas discussed in the previous section, we first obtained a sample of size 50 from the posterior corresponding to the three-parameter Pareto model using the Gibbs sampler algorithm. We then obtained 50 predictive samples, each of size equal to that of the observed data, using the samples of $\alpha$, $\theta$ and $\lambda$. The empirical distribution function (edf) plots of both the observed and the predictive data are shown in Figure 10. The figure also shows a continuous curve, which represents the cdf of the three-parameter Pareto model. The single-line step function shows the edf corresponding to the observed data. It is apparent from the figure that the three-parameter Pareto model appears to be a good choice and that we do not have enough evidence to reject it. We can, therefore, conclude on the basis of the figure that the model fits the data well.

[Figure 10 is not reproduced here; its horizontal axis runs from 100 to 600.]

Figure 10: edf plots from the observed and the predictive data (continuous line corresponds to cdf).
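The curves in such a figure could be computed along the following lines. This is only a sketch: it assumes the model cdf $F(t) = 1 - (1 + (t - \lambda)/\theta)^{-\alpha}$ for $t > \lambda$, takes the posterior draws as given, and uses hypothetical function names; the plotting step itself is left out.

```python
import numpy as np


def edf_on_grid(sample, grid):
    """F_n(t) = (number of sample values <= t) / n at each grid point t."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, grid, side="right") / s.size


def predictive_edfs(draws, n, grid, rng):
    """One edf curve per posterior draw (alpha, theta, lam).

    For each draw, a predictive sample of size n is generated from the
    ASSUMED model by inverting its cdf, and its edf is evaluated on grid.
    """
    curves = []
    for alpha, theta, lam in draws:
        u = rng.uniform(size=n)
        y = lam + theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)
        curves.append(edf_on_grid(y, grid))
    return np.vstack(curves)
```

Overlaying the rows returned by `predictive_edfs` with `edf_on_grid` applied to the observed data gives the kind of visual comparison described above.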

We next evaluated the posterior predictive p-value for the same data set reported in Table 4 using the Kolmogorov-Smirnov discrepancy measure. The discrepancy measure based on chi-square was not evaluated in this case because, for the Pareto model, the variance does not exist for $\alpha \le 2$ (nor the mean for $\alpha \le 1$). In order to evaluate the posterior predictive p-value based on the Kolmogorov-Smirnov discrepancy measure, we generated 1000 posterior samples and correspondingly obtained predictive samples, each of size equal to that of the observed data, from the three-parameter Pareto model. The p-value for the model was found to be 0.424, a value that in no way advocates rejection of the model. Besides, we also observe an important point in favor of the model if we look at the estimates of the threshold parameter $\lambda$ in Table 5. The estimates are quite large (far away from zero); therefore, if a model is capable of providing information on a minimum guarantee, there is good reason to consider it. If instead a two-parameter form of the model (with $\lambda = 0$) is considered, it will remain silent on the threshold, that is, the level below which the annual wage will never fall.

8. CONCLUSION

The present paper provides at least two ways of simulating posterior samples from the three-parameter Pareto model. Both complete and censored data cases are detailed for each of the two procedures, although not illustrated numerically for the Metropolis algorithm. It is evident that the procedures are quite routine and user-friendly in comparison to other non-sample-based approaches.

Undoubtedly, Markov chain Monte Carlo procedures are meant for very high-dimensional posteriors, but our description shows their validity for the low-dimensional posterior arising from the three-parameter Pareto model as well.

Moreover, when censored data are present, the unknowns are not confined to the three parameters; they also include the unknown censored observations.

The paper also shows the scope of the Markov chain Monte Carlo procedure in the important issue of checking the validity of the model. It shows that once the output from the Markov chain Monte Carlo algorithm is obtained, everything else is quite straightforward. Our final observation on the estimates of the threshold parameter (see Section 7) indicates that the three-parameter model, which appears to provide a good fit for the data reported in Table 4, is definitely a good candidate as it provides other important information as well. We therefore feel that instead of simplifying the model by reducing the number of parameters, one should consider the generalized version and analyze it using the Markov chain Monte Carlo technique. No doubt, the generalized version is slightly more complicated, but it has an important advantage over the simplified versions, as pointed out in the present paper.

ACKNOWLEDGMENTS

The authors wish to express their thankfulness to the anonymous member of the editorial board and the referee for their comments that improved the earlier version of the manuscript.


REFERENCES

Abdel-Ghaly, A. A., Attia, A. F. and Aly, H. M. (1998). Estimation of the parameters of Pareto distribution and the reliability function using accelerated life testing with censoring. Commun. Statist. - Simula., 27(2), 469-484.

Arnold, B. C. (1983). Pareto Distribution. Marcel Dekker: New York.

Arnold, B. C. and Press, S. J. (1983). Bayesian inference for Pareto populations. J. Econometrics, 21, 287-306.

Arnold, B. C. and Press, S. J. (1986). Bayesian analysis of censored or grouped data from Pareto populations. In Bayesian Inference and Decision Technique, eds. P. Goel and A. Zellner, Amsterdam: North-Holland, 157-173.

Arnold, B. C. and Press, S. J. (1989). Bayesian estimation and prediction for Pareto data. J. Amer. Statist. Assoc., 84, 1079-1084.

Bayarri, M. J. and Berger, J. O. (1998). Quantifying surprise in the data and model verification. Bayesian Statistics 6, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford University Press, 53-82.

Carpenter, M. and Hebert, J. L. (1994). Estimating the minimum and maximum of two exponential location parameters in a gamma-exponential mixture. Commun. Statist. - Theory & Methods, 23(8), 2367-2377.

Chen, M. H., Shao, Q. M. and Ibrahim, J. G. (2000). Monte Carlo Methods in Bayesian Computation. Springer-Verlag: New York.

Davis, H. T. and Feldstein, M. L. (1979). The generalized Pareto law as a model for progressively censored survival data. Biometrika, 66(2), 299-306.

Devroye, L. (1986). Non-Uniform Random Variate Generation. Springer-Verlag: New York.

Dyer, D. (1981). Structural probability bounds for the strong Pareto law. Canadian J. Statistics, 9, 71-77.

Geisser, S. (1984). Predicting Pareto and exponential observables. Canadian J. Statistics, 12, 143-152.

Geisser, S. (1985). Interval prediction for Pareto and exponential observables. J. Econometrics, 29, 173-185.

Gelfand, A. E. and Smith, A. F. M. (1990). Sampling based approaches to calculating marginal densities. J. Amer. Statist. Assoc., 85, 398-409.

Gelfand, A. E., Hills, S. E., Racine-Poon, A. and Smith, A. F. M. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. J. Amer. Statist. Assoc., 85, 972-985.


Gelfand, A. E., Smith, A. F. M. and Lee, T. M. (1992). Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling. J. Amer. Statist. Assoc., 87, 523-532.

Gelman, A., Meng, X. L. and Stern, H. S. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733-807.

Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Appl. Statist., 41, 337-348.

Lwin, T. (1972). Estimation of the tail of the Paretian law. Scand. Actuarial J., 55, 170-178.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys., 21, 1087-1091.

Ripley, B. D. (1987). Stochastic Simulation. Wiley: New York.

Singh, B. P. (1992). Estimation of Shape Parameter of Classical Pareto Distribution Using Prior Information. Unpublished Ph.D. Thesis. Banaras Hindu University, Varanasi, India.

Smith, A. F. M. and Roberts, G. O. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. J. Roy. Statist. Soc. B, 55, 2-23.

Upadhyay, S. K. and Smith, A. F. M. (1993a). A Bayesian approach to model comparison in reliability via predictive simulation. TR-93-18, Department of Mathematics, Imperial College, London.

Upadhyay, S. K. and Smith, A. F. M. (1993b). Bayesian inference in the life testing and reliability via Markov chain Monte Carlo simulation. TR-93-19, Department of Mathematics, Imperial College, London.

Upadhyay, S. K. and Smith, A. F. M. (1993c). Simulation based Bayesian approaches to the analysis of lognormal regression model. Proceedings SRE Symposium: Reliability a Competitive Edge, KEMA, The Netherlands, 193-196.

Upadhyay, S. K., Vasishta, N. and Smith, A. F. M. (2001). Bayes inference in the life testing and reliability via Markov chain Monte Carlo simulation. Sankhya, Ser. A, 63(1), 15-40.

Vasishta, N. (1998). Sample Based Approaches for the Analysis of Certain Reliability Models. Unpublished Ph.D. Thesis. Banaras Hindu University, Varanasi, India.
