E-BAYESIAN AND HIERARCHICAL BAYESIAN ESTIMATION IN A FAMILY OF DISTRIBUTIONS


are responsible for its content and its originality; (ii) any possible co-authors agreed to its submission to W-OSDCE.


KIAPOUR, A.¹∗ AND NAGHIZADEH QOMI, M.²

¹ Department of Statistics, Babol Branch, Islamic Azad University, Babol, Iran

azadeh [email protected]

² Department of Statistics, University of Mazandaran, Babolsar, Iran

[email protected]

Abstract. In this paper, we deal with Bayesian, E-Bayesian and hierarchical Bayesian estimation in a family of distributions under the squared log error loss function. In particular, E-Bayesian and hierarchical Bayesian estimators for the shape parameter of a Pareto distribution are provided when the scale parameter is known. A Monte Carlo simulation is conducted to compare the Bayes and E-Bayesian estimators. A real data set is used to illustrate the proposed estimators.

1. Introduction

A Bayesian approach to a statistical problem requires specifying a prior distribution over the parameter space and a loss function. Many Bayesians believe that a single prior can be elicited. In practice, however, prior knowledge is vague and any elicited prior distribution is only an approximation to the true one. We therefore restrict attention to a flexible family of priors. Various solutions to this problem have been proposed. One of them is the E-Bayesian approach, which has been applied over recent decades. The E-Bayesian method was first introduced by Han (1997). The E-Bayesian estimator of an unknown parameter is obtained by averaging the Bayes estimator over the distribution of the hyperparameter(s); for more details, see Han (2007, 2009, 2011), Jaheen and Okasha (2011) and Kiapour (2018). In some situations, the parameters of the prior distribution themselves depend on hyperparameters. In such situations, the hierarchical Bayesian estimation method is often used. The hierarchical Bayes method was first introduced by Lindley and Smith (1972).

2010 Mathematics Subject Classification. 62F15.

Key words and phrases. E-Bayesian estimation, hierarchical Bayes, Pareto distribution.

∗ Speaker.


In Bayesian inference, the most commonly used loss function is the convex and symmetric squared error loss (SEL) function, which is widely used in decision theory because of its simple mathematical properties. In some cases, however, it does not represent the true loss structure. For example, it is not well suited to the estimation of a scale parameter, and it assigns the same penalty to overestimation and underestimation. For estimation of a scale parameter $\theta$, Brown (1968) proposed the squared log error loss (SLEL) function, which is given by

$$ L(\theta, \delta) = (\ln \delta - \ln \theta)^2 = \left[ \ln \frac{\delta}{\theta} \right]^2, \tag{1.1} $$

where both $\theta$ and $\delta$ are positive. This loss is neither symmetric nor always convex: as a function of $\delta$, it is convex when $\delta/\theta \le e$ and concave otherwise. It nevertheless has a unique minimum at $\delta = \theta$, and $L(\theta, \delta)$ increases as $\delta$ moves away from $\theta$ in either direction. This loss is appropriate in estimation problems where underestimation is more serious than overestimation; see Kiapour and Nematollahi (2011).
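The asymmetry of (1.1) can be illustrated numerically. The following is a minimal sketch; the function name `slel` and the chosen values of $\theta$ and $\delta$ are illustrative assumptions, not part of the original paper.

```python
import math


def slel(theta: float, delta: float) -> float:
    """Squared log error loss (1.1): (ln(delta) - ln(theta))**2."""
    if theta <= 0 or delta <= 0:
        raise ValueError("theta and delta must be positive")
    return math.log(delta / theta) ** 2


theta = 1.0
under = slel(theta, theta - 0.5)  # estimate too small by 0.5
over = slel(theta, theta + 0.5)   # estimate too large by 0.5
print(f"underestimation: {under:.4f}, overestimation: {over:.4f}")
```

For the same absolute error, the underestimate incurs the larger loss, which is the property the text appeals to.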

In this paper, Bayes, E-Bayesian and hierarchical Bayesian estimators in a family of distributions are obtained under the loss function (1.1). In Section 2, we state preliminary definitions and formulas for Bayesian, E-Bayesian and hierarchical Bayesian estimation of an unknown parameter. In Section 3, we find the Bayes estimator of the parameter $\theta$ in a family of distributions under the loss function (1.1). E-Bayesian estimators are developed in Section 4. In Section 5, a Monte Carlo simulation is used to compare the Bayes and E-Bayesian estimators of the shape parameter of a Pareto distribution. Hierarchical Bayesian estimators are obtained in Section 6. The golfers' income data is used for practical illustration in Section 7. Finally, we end the paper with a discussion.

2. Preliminaries

Let $X^n = (X_1, \dots, X_n)$ be independent and identically distributed (i.i.d.) random variables from a distribution $p_\theta$ indexed by a real unknown parameter $\theta$. Also, let $(\chi, \mathcal{B}, \mathcal{P})$ denote the probability space generated by $X$, where $\chi \subset \mathbb{R}^n$, $\mathcal{B}$ is the $\sigma$-field of $\chi$, $\mathcal{P} = \{p_\theta(x) \mid \theta \in \Theta\}$ and $\Theta$ is the parameter space. In estimation of $\theta$, let $L(\theta, \delta)$ be the loss function (1.1). Then, the posterior risk of $\delta$ based on the observations $x^n = (x_1, \dots, x_n)$ can be expressed as

$$ \rho(\pi, \delta) = \ln^2 \delta(x^n) + E[\ln^2 \theta \mid x^n] - 2 \ln \delta(x^n)\, E[\ln \theta \mid x^n]. \tag{2.1} $$

The Bayes estimate of $\theta$ based on the observations $x^n$ is any estimate $\delta^{B}(x^n)$ that minimizes the posterior risk (2.1), which is given by

$$ \delta^{B}(x^n) = e^{E[\ln \theta \mid x^n]}. \tag{2.2} $$

Information on the appropriate prior is often inadequate to specify a prior distribution unambiguously. The problem of expressing uncertainty regarding prior information can be solved by using a class of prior distributions.
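For reference, the step from (2.1) to (2.2) can be written out explicitly. The sketch below introduces the auxiliary symbol $u = \ln \delta(x^n)$, which is an assumption of this note and does not appear in the original text.

```latex
\begin{aligned}
\rho(\pi, \delta) &= u^2 - 2u\, E[\ln\theta \mid x^n] + E[\ln^2\theta \mid x^n],
  \qquad u = \ln \delta(x^n), \\
\frac{\partial \rho}{\partial u} &= 2u - 2\, E[\ln\theta \mid x^n] = 0
  \;\Longrightarrow\; u = E[\ln\theta \mid x^n], \\
\frac{\partial^2 \rho}{\partial u^2} &= 2 > 0
  \;\Longrightarrow\; \delta^{B}(x^n) = e^{u} = e^{E[\ln\theta \mid x^n]}.
\end{aligned}
```

Since the posterior risk is a convex quadratic in $u$, the stationary point is the unique minimizer.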

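E-Bayesian inference deals with this problem by constructing methods that are stable under such a lack of information. Consider a prior $\pi(\theta \mid a, b)$ for $\theta$ with hyperparameters $a$ and $b$. The E-Bayesian estimator of $\theta$ is the expectation of the Bayes estimator over all hyperparameters and is defined as

$$ \delta^{EB}(x^n) = \iint_D \delta^{B}(x^n)\, \pi(a, b)\, da\, db, \tag{2.3} $$

where $D$ is the domain of the hyperparameters, $\delta^{B}(x^n)$ is the Bayes estimator (2.2) computed under $\pi(\theta \mid a, b)$, and $\pi(a, b)$ is the prior density function of the hyperparameters $a$ and $b$.

Table 1. Representation of the family $\mathcal{P}$

Distribution            $p_\theta(x)$                              $s(x)$    $t(x)$
Poisson                 $Poi(\theta)$                              $x$       $1$
Exponential             $E(\theta)$                                $1$       $x$
Gamma                   $G(\alpha, \theta)$, $\alpha > 0$ known    $\alpha$  $x$
Pareto                  $Par(\alpha, \theta)$, $\alpha > 0$ known  $1$       $\ln(x/\alpha)$
Power                   $P(\lambda, \theta)$, $\lambda > 0$ known  $1$       $\ln(\lambda/x)$
Negative exponential    $NE(\mu, \theta)$, $\mu > 0$ known         $1$       $x - \mu$
Inverse gamma           $IG(\alpha, \theta)$, $\alpha > 0$ known   $\alpha$  $1/x$
Inverse Gaussian        $IGa(\mu, \theta)$, $\mu > 0$ known        $1/2$     $(x - \mu)^2 / (2\mu^2 x)$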

According to Lindley and Smith (1972), when the prior distribution of $\theta$ contains hyperparameters, a prior distribution may in turn be assigned to those hyperparameters. The corresponding hierarchical prior density function of $\theta$ is

$$ \pi(\theta) = \iint_D \pi(\theta \mid a, b)\, \pi(a, b)\, da\, db. \tag{2.4} $$

Therefore, the hierarchical Bayesian estimator is obtained from the hierarchical posterior distribution using (2.2) as $\delta^{HB}(x^n) = e^{E[\ln \theta \mid x^n]}$, where the expectation is taken with respect to the posterior distribution based on the prior (2.4).

3. Bayesian estimation strategy

Let $\{p_\theta \mid \theta \in \Theta\}$ be a one-parameter family of distributions with probability density function (p.d.f.)

$$ f_\theta(x) = c(x, n)\, \theta^{s(x)} e^{-t(x)\theta}, \quad x \in \mathbb{R}, $$

where $c(x, n)$ is a function of $x$ and $n$, and $t$ and $s$ are fixed functions. Examples of such distributions are given in Table 1.
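As an illustration of how a row of Table 1 maps onto this form, the sketch below checks the Pareto case numerically and computes the sums of $s(x_i)$ and $t(x_i)$ over a sample. The helper names and the test point are hypothetical, and the factor $c(x, n)$ is taken to be $1/x$ for a single Pareto observation, which is an assumption derived from the Pareto density rather than a statement in the paper.

```python
import math


def pareto_pdf(x, alpha, theta):
    """Pareto density with known scale alpha and shape theta, for x >= alpha."""
    return theta * alpha**theta / x ** (theta + 1)


def family_pdf(x, theta, c, s, t):
    """Generic member of the family: c(x) * theta**s(x) * exp(-t(x) * theta)."""
    return c(x) * theta ** s(x) * math.exp(-t(x) * theta)


alpha = 200.0                          # known scale parameter
c = lambda x: 1.0 / x                  # assumed normalizing factor for the Pareto row
s = lambda x: 1.0                      # s(x) = 1
t = lambda x: math.log(x / alpha)      # t(x) = ln(x / alpha)


def sufficient_stats(xs):
    """Return (sum of s(x_i), sum of t(x_i)) for a sample xs."""
    return sum(s(x) for x in xs), sum(t(x) for x in xs)


print(pareto_pdf(350.0, alpha, 3.0), family_pdf(350.0, 3.0, c, s, t))
print(sufficient_stats([alpha * math.e, alpha * math.e**2]))
```

The two printed densities agree, and only the two sums are needed by the estimators that follow.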

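Let $X_1, X_2, \dots, X_n$ be a sequence of i.i.d. random variables with density $f_\theta$. Set $X = (X_1, X_2, \dots, X_n)$. Also, let $\pi_{a,b}$ be the conjugate family of distributions with p.d.f.

$$ \pi(\theta \mid a, b) = \frac{b^a}{\Gamma(a)}\, \theta^{a-1} e^{-b\theta}, \quad \theta > 0, \tag{3.1} $$

where $\Gamma(a) = \int_0^\infty x^{a-1} e^{-x}\, dx$ is the gamma function, and the hyperparameters satisfy $a > 0$ and $b > 0$. It is easy to verify that the posterior distribution of $\theta$ given $x$ is $\mathrm{Gamma}(S + a, T + b)$, where $S = \sum_{i=1}^n s(x_i)$ and $T = \sum_{i=1}^n t(x_i)$. Under this posterior, $E[\ln \theta \mid x] = \psi(S + a) - \ln(T + b)$, so by (2.2) the Bayes estimator of $\theta$ under the loss function (1.1) is given by

$$ \delta^{B}(x) = \frac{e^{\psi(S+a)}}{T + b}, \tag{3.2} $$

where $\psi(\nu) = \frac{d}{d\nu} \ln \Gamma(\nu) = \frac{\Gamma'(\nu)}{\Gamma(\nu)}$ is the digamma function.

4. E-Bayesian estimation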

According to Han (1997), $a$ and $b$ should be selected so that $\pi(\theta \mid a, b)$ is a decreasing function of $\theta$. If we take the conjugate prior (3.1), the hyperparameters must therefore satisfy $0 < a < 1$ and $b > 0$. A prior distribution with a thinner tail would worsen the robustness of the Bayesian estimate. Accordingly, $b$ should not be too large while $0 < a < 1$. It is better to choose $0 < a < 1$ and $0 < b < c$, where $c > 0$ is a constant.

Suppose that $a$ and $b$ are independent, with $a$ uniformly distributed on $(0, 1)$ and $b$ uniformly distributed on $(0, c)$. Then the joint prior density of $a$ and $b$ is given by

$$ \pi_1(a, b) = \frac{1}{c}, \quad 0 < a < 1,\ 0 < b < c. \tag{4.1} $$

In the following theorem, we obtain the E-Bayesian estimator of $\theta$ under the loss function (1.1) and the prior distribution (4.1).

Theorem 4.1. Let $x^n = (x_1, x_2, \dots, x_n)$ be the sample observations from the one-parameter exponential family. Then, the E-Bayesian estimator of $\theta$ corresponding to the prior given in (4.1) under the loss function (1.1) is equal to

$$ \delta^{EB1}(x^n) = \frac{1}{c} \ln\left(1 + \frac{c}{T}\right) \int_0^1 e^{\psi(S+a)}\, da. \tag{4.2} $$

Proof. For $\pi_1(a, b)$, the E-Bayesian estimator under the loss function (1.1) is given by

$$ \delta^{EB1}(x^n) = \int_0^1 \int_0^c \frac{e^{\psi(S+a)}}{c\,(T + b)}\, db\, da = \frac{1}{c} \ln\left(1 + \frac{c}{T}\right) \int_0^1 e^{\psi(S+a)}\, da, $$

which ends the proof. □
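The estimators (3.2) and (4.2) can be evaluated with standard numerical routines. The following sketch assumes SciPy is available; the function names and the example values of $S$, $T$ and $c$ are hypothetical. It also compares the closed form in $b$ from Theorem 4.1 with a direct double integration of definition (2.3).

```python
import math

from scipy.integrate import dblquad, quad
from scipy.special import digamma


def bayes_estimate(S, T, a, b):
    """Bayes estimator (3.2): exp(psi(S + a)) / (T + b)."""
    return math.exp(digamma(S + a)) / (T + b)


def e_bayes_uniform(S, T, c):
    """E-Bayesian estimator (4.2): closed form in b, one-dimensional integral in a."""
    inner, _ = quad(lambda a: math.exp(digamma(S + a)), 0.0, 1.0)
    return math.log(1.0 + c / T) / c * inner


def e_bayes_uniform_direct(S, T, c):
    """Same quantity from definition (2.3) by double integration over (a, b)."""
    # dblquad integrates func(y, x): here y = b on (0, c) and x = a on (0, 1).
    value, _ = dblquad(lambda b, a: bayes_estimate(S, T, a, b) / c,
                       0.0, 1.0, lambda a: 0.0, lambda a: c)
    return value


S, T, c = 20.0, 6.5, 2.5   # illustrative values only
print(bayes_estimate(S, T, 0.6, 2.0))
print(e_bayes_uniform(S, T, c), e_bayes_uniform_direct(S, T, c))
```

The one-dimensional form is preferable in practice because the integral over $a$ does not depend on $T$ or $c$ and can be computed once per value of $S$.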

Also, suppose that $a$ and $b$ are independent, that the prior distribution of $a$ is the Beta distribution $\mathrm{Beta}(u, v)$, and that the prior distribution of $b$ is uniform on $(0, c)$. Then, the joint prior density of $a$ and $b$ is given by

$$ \pi_2(a, b) = \frac{1}{c\, B(u, v)}\, a^{u-1} (1 - a)^{v-1}, \quad 0 < a < 1,\ 0 < b < c, \tag{4.3} $$

where $B(u, v) = \int_0^1 x^{u-1} (1 - x)^{v-1}\, dx$ is the beta function. In the following theorem, we obtain the E-Bayesian estimator of $\theta$ under the loss function (1.1) and the prior distribution (4.3).

Theorem 4.2. If $x^n = (x_1, x_2, \dots, x_n)$ are the sample observations from the one-parameter exponential family, then the E-Bayesian estimator of $\theta$ corresponding to the prior given in (4.3) under the loss function (1.1) is equal to

$$ \delta^{EB2}(x^n) = \frac{1}{c} \ln\left(1 + \frac{c}{T}\right) \int_0^1 e^{\psi(S+a)}\, \frac{1}{B(u, v)}\, a^{u-1} (1 - a)^{v-1}\, da. \tag{4.4} $$

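Proof. The E-Bayesian estimator under the loss function (1.1) is given by

$$ \delta^{EB2}(x^n) = \int_0^1 \int_0^c \frac{e^{\psi(S+a)}}{T + b}\, \frac{1}{c\, B(u, v)}\, a^{u-1} (1 - a)^{v-1}\, db\, da = \frac{1}{c} \ln\left(1 + \frac{c}{T}\right) \int_0^1 e^{\psi(S+a)}\, \frac{1}{B(u, v)}\, a^{u-1} (1 - a)^{v-1}\, da. $$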
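A numerical evaluation of (4.4) follows the same pattern, with the Beta density as a weight in the integral over $a$. This is a minimal sketch assuming SciPy; the function name and the example inputs are hypothetical.

```python
import math

from scipy.integrate import quad
from scipy.special import digamma
from scipy.stats import beta


def e_bayes_beta(S, T, c, u, v):
    """E-Bayesian estimator (4.4) with a ~ Beta(u, v) and b ~ Uniform(0, c)."""
    # beta.pdf(a, u, v) equals a**(u-1) * (1-a)**(v-1) / B(u, v).
    integrand = lambda a: math.exp(digamma(S + a)) * beta.pdf(a, u, v)
    inner, _ = quad(integrand, 0.0, 1.0)
    return math.log(1.0 + c / T) / c * inner


# Illustrative inputs only; u = v = 1 gives the uniform prior of (4.1).
print(e_bayes_beta(50.0, 21.9, 2.5, 3.0, 2.0))
print(e_bayes_beta(50.0, 21.9, 2.5, 1.0, 1.0))
```

With $u = v = 1$ the Beta density is constant on $(0, 1)$, so the function reduces to the estimator of Theorem 4.1, which gives a simple consistency check.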


5. Simulation study

In this section, we perform a numerical comparison between the Bayes and E-Bayesian estimators for the shape parameter of a Pareto distribution. For this purpose, we generate independent random samples of size $n$ from the Pareto distribution with scale parameter $\alpha = 200$ and true shape parameter $\theta = 3$.

Let $\delta_i^k$, $k = 1, 2, 3$, denote, in the $i$th replication, the Bayes estimator $\delta^{B}(x^n)$ with $a = 0.6$ and $b = 2$ given by (3.2) and the E-Bayesian estimators $\delta^{EB1}(x^n)$ and $\delta^{EB2}(x^n)$ given by (4.2) and (4.4), respectively, for the selected values $c = 2.5, 3, 3.5$, $u = 3$ and $v = 2$. We repeat these steps $M = 10^4$ times and calculate the estimated risk (ER) using the formula

$$ ER(\delta^k) = \frac{1}{M} \sum_{i=1}^{M} \left( \ln \delta_i^k - \ln \theta \right)^2. \tag{5.1} $$

The results are summarized in Table 2. Table 2 shows that the E-Bayesian estimators have a smaller estimated risk than the Bayes estimator for every $n$ and $c$ considered. Moreover, the estimated risk decreases as the sample size increases.

Table 2. Results of ER for Bayes and E-Bayesian estimators

$n$     $c$     $\delta^{B}$   $\delta^{EB1}$   $\delta^{EB2}$
20      2.5     0.10209        0.07367          0.07223
        3                      0.08048          0.07875
        3.5                    0.08870          0.08669
50      2.5     0.03120        0.02572          0.02545
        3                      0.02718          0.02686
        3.5                    0.02898          0.02860
100     2.5     0.01910        0.01637          0.01624
        3                      0.01715          0.01699
        3.5                    0.01807          0.01789
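The simulation design described in this section can be sketched as follows. The sampling method (inverse transform), the random seed, the function name and the use of NumPy and SciPy are assumptions of this sketch, so its output should not be expected to match Table 2 digit by digit.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import digamma
from scipy.stats import beta


def estimated_risks(n, c, theta=3.0, alpha=200.0, a0=0.6, b0=2.0,
                    u=3.0, v=2.0, M=10_000, seed=0):
    """Estimated risk (5.1) of the Bayes, EB1 and EB2 estimators for Pareto data."""
    rng = np.random.default_rng(seed)
    # Inverse-transform sampling: alpha * U**(-1/theta) is Pareto(alpha, theta)
    # for U uniform on (0, 1]; using 1 - rng.random() avoids U = 0.
    unif = 1.0 - rng.random((M, n))
    x = alpha * unif ** (-1.0 / theta)

    S = float(n)                        # s(x) = 1 for the Pareto row of Table 1
    T = np.log(x / alpha).sum(axis=1)   # t(x) = ln(x / alpha), one T per replication

    # The integrals over a in (4.2) and (4.4) do not depend on the data.
    k1, _ = quad(lambda a: np.exp(digamma(S + a)), 0.0, 1.0)
    k2, _ = quad(lambda a: np.exp(digamma(S + a)) * beta.pdf(a, u, v), 0.0, 1.0)
    shrink = np.log1p(c / T) / c        # (1/c) * ln(1 + c/T)

    estimates = {
        "B": np.exp(digamma(S + a0)) / (T + b0),   # (3.2)
        "EB1": k1 * shrink,                        # (4.2)
        "EB2": k2 * shrink,                        # (4.4)
    }
    return {name: float(np.mean((np.log(d) - np.log(theta)) ** 2))
            for name, d in estimates.items()}


if __name__ == "__main__":
    for n in (20, 50, 100):
        for c in (2.5, 3.0, 3.5):
            print(n, c, estimated_risks(n, c))
```

Because $S = n$ and $T$ is a sum of logarithms, all $M$ replications can be processed at once as array operations rather than in an explicit loop.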

6. Hierarchical Bayesian estimation

In this section, we obtain hierarchical Bayesian estimators of $\theta$ based on the two proposed prior distributions $\pi_1(a, b)$ and $\pi_2(a, b)$. First, consider the prior distribution $\pi_1(a, b)$. Then, the hierarchical prior distribution is given by

$$ \pi_1(\theta) = \int_0^1 \int_0^c \pi(\theta \mid a, b)\, \pi_1(a, b)\, db\, da = \frac{1}{c} \int_0^1 \int_0^c \frac{b^a}{\Gamma(a)}\, \theta^{a-1} e^{-b\theta}\, db\, da, \quad \theta > 0. \tag{6.1} $$

In the following theorem, we obtain the hierarchical Bayesian estimator of $\theta$ under the loss function (1.1) and the hierarchical prior distribution of $\theta$ in (6.1).

Theorem 6.1. Let $x^n = (x_1, x_2, \dots, x_n)$ be the sample observations from the one-parameter exponential family. Then, the hierarchical Bayesian estimator of $\theta$ under the loss function (1.1) is equal to

$$ \delta^{HB1}(x^n) = \exp\left( \frac{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)}{\Gamma(a)\,(T+b)^{S+a}} \left( \psi(S+a) - \ln(T+b) \right) db\, da}{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)}{(T+b)^{S+a}\, \Gamma(a)}\, db\, da} \right). \tag{6.2} $$

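Proof. The hierarchical posterior density function of $\theta$ is given by

$$
\begin{aligned}
\pi_1(\theta \mid x^n) &= \frac{\pi_1(\theta)\, L(\theta \mid x^n)}{\int_0^\infty \pi_1(\theta)\, L(\theta \mid x^n)\, d\theta} \\
&= \frac{\int_0^1 \int_0^c \frac{b^a}{\Gamma(a)}\, \theta^{S+a-1} e^{-(T+b)\theta}\, db\, da}{\int_0^1 \int_0^c \frac{b^a}{\Gamma(a)} \int_0^\infty \theta^{S+a-1} e^{-(T+b)\theta}\, d\theta\, db\, da} \\
&= \frac{\int_0^1 \int_0^c \frac{b^a}{\Gamma(a)}\, \theta^{S+a-1} e^{-(T+b)\theta}\, db\, da}{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)}{(T+b)^{S+a}\, \Gamma(a)}\, db\, da}.
\end{aligned} \tag{6.3}
$$

We have

$$
\begin{aligned}
E[\ln \theta \mid x^n] &= \int_0^\infty (\ln \theta)\, \pi_1(\theta \mid x^n)\, d\theta \\
&= \frac{\int_0^1 \int_0^c \frac{b^a}{\Gamma(a)} \int_0^\infty (\ln \theta)\, \theta^{S+a-1} e^{-(T+b)\theta}\, d\theta\, db\, da}{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)}{(T+b)^{S+a}\, \Gamma(a)}\, db\, da} \\
&= \frac{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)}{\Gamma(a)\,(T+b)^{S+a}} \left( \psi(S+a) - \ln(T+b) \right) db\, da}{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)}{(T+b)^{S+a}\, \Gamma(a)}\, db\, da}.
\end{aligned} \tag{6.4}
$$

Thus, the proof is completed. □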
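Unlike the E-Bayesian estimators, the ratio in Theorem 6.1 has no closed form in $b$, so both integrals are evaluated numerically. The sketch below assumes SciPy and works on the log scale to avoid overflow in $\Gamma(S+a)$ and $(T+b)^{S+a}$; the function name, the rescaling constant and the example inputs are assumptions of this sketch.

```python
import math

from scipy.integrate import dblquad
from scipy.special import digamma, gammaln


def hierarchical_bayes_uniform(S, T, c):
    """Numerical evaluation of the estimator in Theorem 6.1 (uniform hyperprior (4.1))."""

    def log_weight(a, b):
        # log of b**a * Gamma(S+a) / (Gamma(a) * (T+b)**(S+a))
        return (a * math.log(b) + gammaln(S + a) - gammaln(a)
                - (S + a) * math.log(T + b))

    # A common factor cancels in the ratio, so rescale by the weight at a
    # reference point to keep exp() away from overflow and underflow.
    shift = log_weight(0.5, 0.5 * c)

    def weight(a, b):
        if a <= 0.0 or b <= 0.0:      # the weight vanishes on these edges
            return 0.0
        return math.exp(log_weight(a, b) - shift)

    # dblquad integrates func(y, x): y = b runs over (0, c), x = a over (0, 1).
    num, _ = dblquad(lambda b, a: weight(a, b) * (digamma(S + a) - math.log(T + b)),
                     0.0, 1.0, lambda a: 0.0, lambda a: c)
    den, _ = dblquad(lambda b, a: weight(a, b),
                     0.0, 1.0, lambda a: 0.0, lambda a: c)
    return math.exp(num / den)


print(hierarchical_bayes_uniform(S=50.0, T=21.9, c=2.5))  # illustrative inputs only
```

Since $E[\ln \theta \mid x^n]$ in (6.4) is a weighted average of $\psi(S+a) - \ln(T+b)$ over the hyperparameter rectangle, the result must lie between the smallest and largest Bayes estimates (3.2) on that rectangle, which is a convenient sanity check.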

Now, consider the prior distribution $\pi_2(a, b)$. Then, the hierarchical prior distribution is given by

$$ \pi_2(\theta) = \frac{1}{c\, B(u, v)} \int_0^1 \int_0^c \frac{b^a}{\Gamma(a)}\, \theta^{a-1} e^{-b\theta}\, a^{u-1} (1 - a)^{v-1}\, db\, da, \quad \theta > 0. \tag{6.5} $$

In the following theorem, we obtain the hierarchical Bayesian estimator of $\theta$ under the loss function (1.1) and the hierarchical prior distribution of $\theta$ in (6.5).

Theorem 6.2. Let $x^n = (x_1, x_2, \dots, x_n)$ be the sample observations from the one-parameter exponential family. Then, the hierarchical Bayesian estimator of $\theta$ under the loss function (1.1) is equal to

$$ \delta^{HB2}(x^n) = \exp\left( \frac{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)\, a^{u-1} (1-a)^{v-1}}{\Gamma(a)\,(T+b)^{S+a}} \left( \psi(S+a) - \ln(T+b) \right) db\, da}{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)\, a^{u-1} (1-a)^{v-1}}{(T+b)^{S+a}\, \Gamma(a)}\, db\, da} \right). \tag{6.6} $$

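Proof. The hierarchical posterior density function of $\theta$ is given by

$$
\begin{aligned}
\pi_2(\theta \mid x^n) &= \frac{\pi_2(\theta)\, L(\theta \mid x^n)}{\int_0^\infty \pi_2(\theta)\, L(\theta \mid x^n)\, d\theta} \\
&= \frac{\int_0^1 \int_0^c \frac{b^a\, a^{u-1} (1-a)^{v-1}}{\Gamma(a)}\, \theta^{S+a-1} e^{-(T+b)\theta}\, db\, da}{\int_0^1 \int_0^c \frac{b^a\, a^{u-1} (1-a)^{v-1}}{\Gamma(a)} \int_0^\infty \theta^{S+a-1} e^{-(T+b)\theta}\, d\theta\, db\, da} \\
&= \frac{\int_0^1 \int_0^c \frac{b^a\, a^{u-1} (1-a)^{v-1}}{\Gamma(a)}\, \theta^{S+a-1} e^{-(T+b)\theta}\, db\, da}{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)\, a^{u-1} (1-a)^{v-1}}{(T+b)^{S+a}\, \Gamma(a)}\, db\, da}.
\end{aligned} \tag{6.7}
$$

We have

$$
\begin{aligned}
E[\ln \theta \mid x^n] &= \int_0^\infty (\ln \theta)\, \pi_2(\theta \mid x^n)\, d\theta \\
&= \frac{\int_0^1 \int_0^c \frac{b^a\, a^{u-1} (1-a)^{v-1}}{\Gamma(a)} \int_0^\infty (\ln \theta)\, \theta^{S+a-1} e^{-(T+b)\theta}\, d\theta\, db\, da}{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)\, a^{u-1} (1-a)^{v-1}}{(T+b)^{S+a}\, \Gamma(a)}\, db\, da} \\
&= \frac{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)\, a^{u-1} (1-a)^{v-1}}{\Gamma(a)\,(T+b)^{S+a}} \left( \psi(S+a) - \ln(T+b) \right) db\, da}{\int_0^1 \int_0^c \frac{b^a\, \Gamma(S+a)\, a^{u-1} (1-a)^{v-1}}{(T+b)^{S+a}\, \Gamma(a)}\, db\, da},
\end{aligned} \tag{6.8}
$$

which ends the proof. □

Table 3. The golfers' income data

3581   1960   1433   1184   1066   1005   883   841   778   753
2474   1684   1410   1171   1056   1001   878   825   778   746
2202   1627   1374   1109   1051    965   871   820   771   729
1858   1537   1338   1095   1031    944   849   816   769   712
1829   1519   1208   1092   1016    912   844   814   759   708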

7. A real example

Consider the golfers' income data (Arnold, 2015). The incomes of the 50 golfers who had earned more than 700,000 dollars by the end of 1980 are shown in Table 3 (unit: 1000 dollars). A Pareto distribution with scale parameter $\alpha = 703$ and shape parameter $\theta = 2.23$ fits the data well. The Bayes estimate with $a = 0.6$ and $b = 2$, and the E-Bayesian and hierarchical Bayesian estimates with $u = 3$, $v = 2$ and selected values $c = 2.5, 3, 3.5$, are summarized in Table 4. The E-Bayesian and hierarchical Bayesian estimates are close to each other. Moreover, all of these estimates change only slightly as $c$ varies, which indicates that they are robust to the choice of $c$.

Table 4. Results for Bayes, E-Bayesian and hierarchical Bayesian estimates

$c$     $\delta^{B}$   $\delta^{EB1}$   $\delta^{EB2}$   $\delta^{HB1}$   $\delta^{HB2}$
2.5     2.1084         2.1749           2.1793           2.2327           2.2318
3                      2.1524           2.1567           2.2306           2.2301
3.5                    2.1305           2.1348           2.2296           2.2293
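The quantities entering the estimates for this data set can be computed directly from Table 3. The sketch below assumes the Pareto row of Table 1 ($s(x) = 1$, $t(x) = \ln(x/\alpha)$) with $\alpha = 703$ and uses SciPy for the digamma function; the variable names are hypothetical. The printed values depend on the scale $\alpha$ and on the transcription of Table 3, so they are not guaranteed to reproduce Table 4 to all digits.

```python
import math

from scipy.special import digamma

# Golfers' income data from Table 3 (unit: 1000 dollars).
golfers = [
    3581, 1960, 1433, 1184, 1066, 1005, 883, 841, 778, 753,
    2474, 1684, 1410, 1171, 1056, 1001, 878, 825, 778, 746,
    2202, 1627, 1374, 1109, 1051, 965, 871, 820, 771, 729,
    1858, 1537, 1338, 1095, 1031, 944, 849, 816, 769, 712,
    1829, 1519, 1208, 1092, 1016, 912, 844, 814, 759, 708,
]

alpha = 703.0                                   # known scale parameter
S = float(len(golfers))                         # sum of s(x_i) = n
T = sum(math.log(x / alpha) for x in golfers)   # sum of t(x_i) = ln(x_i / alpha)

a, b = 0.6, 2.0                                 # hyperparameters used in Section 7
delta_B = math.exp(digamma(S + a)) / (T + b)    # Bayes estimate (3.2)

print(f"S = {S:.0f}, T = {T:.4f}, Bayes estimate = {delta_B:.4f}")
```

The same values of $S$ and $T$ can then be passed to any numerical implementation of (4.2), (4.4) and the estimators of Section 6.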

8. Discussion

The aim of this paper was to study Bayes, E-Bayesian and hierarchical Bayesian estimation of the unknown parameter $\theta$ of an exponential family of distributions under the SLEL function. First, we derived the Bayes estimator by choosing an explicit prior distribution for the parameter of interest. In practical situations, prior knowledge is vague and any elicited prior distribution is only an approximation to the true one, so E-Bayesian and hierarchical Bayesian analysis can be employed. We therefore investigated the performance of the E-Bayesian estimators for selected values of $c$ (an upper bound for $b$) in comparison with the Bayes estimator. Our findings in a simulation study showed that the E-Bayesian estimators perform better than the Bayes estimator. We also considered the golfers' income data. In this case, the E-Bayesian estimators performed better than the other estimators.


References

1. Arnold, B. C. (2015), Pareto distributions, Chapman and Hall/CRC Press.

2. Brown, L. D. (1968), Inadmissibility of the usual estimator of scale parameters in problems with unknown location and scale parameters, Annals of Mathematical Statistics, 39, 29-48.

3. Han, M. (1997), The structure of hierarchical prior distribution and its applications, Chin. Oper. Res. Manag. Sci. 6, 31-40.

4. Han, M. (2007), E-Bayesian estimation of failure probability and its application, Math. Comput. Model. 45, 1272-1279.

5. Han, M. (2009), E-Bayesian estimation and hierarchical Bayesian estimation of failure rate, Appl. Math. Model. 33, 1915-1922.

6. Han, M. (2011), E-Bayesian estimation and hierarchical Bayesian estimation of failure probability, Commun. Stat. Theory Methods 40, 3303-3314.

7. Jaheen, Z. F. and Okasha, H. M. (2011), E-Bayesian estimation for the Burr type XII model based on type-2 censoring, Appl. Math. Model. 35, 4730-4737.

8. Kiapour, A. (2018), Bayes, E-Bayes and robust Bayes premium estimation and prediction under the squared log error loss function, Journal of the Iranian Statistical Society, 17, 33-47.

9. Kiapour, A. and Nematollahi, N. (2011), Robust Bayesian prediction and estimation under a squared log error loss function, Statistics and Probability Letters, 81, 1717-1724.

10. Lindley, D. V. and Smith, A. F. M. (1972), Bayes estimates for the linear model, J. Stat. Soc.