Prior specication

In the previous sections, some plausible priors for the variance components have been listed, if the posterior moments of a log-normal mixed model are of interest. It is worth to highlight that the constraints on the hyperparameters might produce highly informative priors for the variances, since their tails are required to be notably light. In this sense, it is necessary to propose a weakly informative prior specication that avoids an excessive underestimation of the variance components and that preserves the balance among them. On the other hand, if any information about the problem is available, then it is possible to specify the triplet of hyperparameters considering the GIG properties like mean and variance, always taking into consideration the existence conditions.

In order to present the weakly informative strategy, the simplest case with two variance components σ2 _{and τ}2 _{will be studied, then the results will be extended to the general} situation. However, this is a remarkable case since it represents the random intercept model and in this framework it is possible to dene the intraclass correlation coecient:

ρ = τ

σ2_{+ τ}2, (5.82)

that is a quantity of interest in the analysis of hierarchical models, both from a statistical viewpoint and from the applied perspective.

By studying the marginal prior of this quantity when two GIG priors like (5.47) are specied for σ2 _{and τ}2_{, some indications might be deduced in order to x a weakly informative prior} that respects the constraints on the parameters and, at the same time, it controls the prior balance among the variance components. In fact, the γ parameters could be xed according to conditions of proposition 5.4 by evaluating the expressions with (r +1), in order to assure the proper existence of the required moment of order r. Considering the three conditions:

(i) γ2 σ = (r + 1) + (r + 1)2maxi=1,...,ñ ˜ xT_i XTX−1x˜i; (ii) γ2 σ = (r + 1) + (r + 1)2maxi=1,...,ñ ˜ xT_i XTX−1 ˜ xi and γ2 τ,s = (r + 1) + (r + 1)2_max i=1,...,ñ ˜ xT_o,iLsxõi , _∀s; (iii) γ2 σ = (r + 1)2+ (r + 1)2maxi=1,...,ñ ˜ xT_i XTX−1x˜i.

Several prior selections strategies pointed out that a uniform prior on the intraclass correlation coecient leads to good frequentist properties for the parameters estimates. The idea of specifying a uniform distribution for ρ was presented by Chaloner (1987) and it con- stitutes the heuristic argument that justies the uniform shrinkage prior, which have been extensively studied and used (Daniels, 1999; Natarajan and Kass, 2000).

As a rst step, it is possible to observe that the prior on ρ follows a normalized generalized inverse Gaussian distribution: ρ ∼ N − GIG(λτ, δτ, γτ, λσ, δσ, γσ), a distribution whose density has been derived and studied in a work by Favaro et al. (2012). The density is:

p(ρ) = γσ δσ λσ γτ δτ λτ 2Kλσ(γσδσ)Kλτ(γτδτ) ρλτ−1₍₁_{− ρ)}λσ−1   δ2 τ ρ + δ2 σ 1−ρ γ2 τρ + γσ2(1− ρ)   λτ +λσ 2 × × Kλτ+λσ s δ2 τ ρ + δ2 σ 1_{− ρ} (γ2 τρ + γσ2(1− ρ)) ! , ρ∈ (0, 1). (5.83)

To build a strategy that is based on the prior balance among variances, it is reasonable to assume the same marginal prior GIG(λ, δ, γ) for σ2 _{and τ}2_{, inducing a prior on ρ controlled} only by three hyperparameters, whose density is:

p(ρ) = K2λ γ2_δ2h1 ρ+ 1 1−ρ i 2 [Kλ(γδ)] [ρ(1− ρ)]λ−1, ρ∈ (0, 1). (5.84) Moreover, evaluating the target functionals of the analysis, the most restrictive threshold should be chosen as unique value of γ.

Because of the presence of ρ inside a Bessel K function, the analytic derivation of the distribution characteristics like moments is not possible. However, it is interesting to consider that the parameters δ and γ enter the density through a multiplication only. This fact allows to compensate high values of the tail parameter γ, caused by the constraints, by decreasing the value of the concentration parameter δ. A notable simplication of the prior on ρ is evident taking into consideration the limiting case δ → 0: it removes from p(ρ) the potential eect of the constraint on γ and in previous applications of the GIG distribution as variance prior in the log-normal context this provides good frequentist properties (Fabrizi and Trivisano, 2012).

The density (5.84), in the limiting case δ → 0, becomes: f (ρ) = Γ(|2λ|)

Γ(_|λ|)2 [ρ(1− ρ)] |λ|−1

, ρ_{∈ (0, 1);} (5.85)

exploiting the small argument approximation of the Bessel K function (A.7).

Remembering that the limiting case of the GIG distribution when λ > 0 and δ → 0 is the gamma distribution Ga(λ, β), with β = γ2_/2 _{and having density:}

f (x) = β λ Γ(λ)x

λ−1_exp_{{−βx} ;} _(5.86)

if the priors of the variances are gamma distributions with equal parameters, then the prior on ρ is a beta distribution with equal parameters λ. Moreover, if λ = 1 then ρ ∼ U(0, 1) a priori. In addition, if two dierent shape parameters λσ and λτ are chosen, then it is possible to control the prior information given on ρ.

Another quantity of interest in the variance components models is the ratio φ = τ2_/σ2_{. It is} a one-to-one transformation of ρ and strategies to x its prior distribution have been studied (Chaloner, 1987; Ye, 1994). In particular, the latter proposed a solution for the problem in

the reference prior framework (Berger and Bernardo, 1992). Repeating the previous steps, then the prior of the ratio of variances GIG distributed has the following density:

p(φ) = Z +∞ 0 |σ 2_|p τ2 φσ2 p_σ2 σ2 dσ2 = γτ δτ λτ γσ δσ λσ 2Kλσ(γσδσ)Kλτ(γτδτ) φλτ−1 δ 2 τ+ φδ2σ φ 1 γ2 τφ + γσ2 λτ +λσ₂ × × Kλτ+λσ p (δ2 τ + φδσ2) (γτ2+ γσ2/φ) , (5.87)

with equal hyperparameters. Then, letting δ → 0, the distribution p(φ) reduces to the following particular form of beta prime distribution:

f (φ) = Γ(|2λ|) Γ(|λ|)2φ

|λ|−1

(1− φ)−|2λ|. (5.88)

Finally, choosing λ = 1, the prior of the ratio is:

f (φ) = (1 + φ)−2. (5.89)

The latter result conrms the prior for φ found by Chaloner (1987) when ρ ∼ U(0, 1) and is similar to the reference priors that was deduced by Ye (1994).

This strategy developed for models that have two variance components could be extended to the more general models with q + 1 variances. The basic idea is to consider that for every ρs = τs(τs+ σ2)−1 a uniform prior is specied. This occurs when all the priors are equal gamma distributions with shape 1 and scale parameter xed respecting the existence condition.

To sum up, under the described setting, if the λ parameter is set to be positive, a gamma prior G λ, γ2_/2

for each variance component is approximately assumed when δ = ε. Clearly, γ2 _{is considered xed accordingly to the rules of theorem 5.3. As a consequence, a normal-} gamma prior is specied marginally for the random eects vector u. This prior setting is not new to the literature: it was introduced by Grin and Brown (2010) as prior for the coecients of a linear model. Its main characteristic is the notable degree of shrinkage towards 0 because of the light prior tail.

In the proposed setting with λ = 1, the gamma distribution degenerates to the exponential distribution, and in that case the normal-gamma is a Laplace distribution. This particular prior is known also as Bayesian Lasso and is characterized by a spike in 0. The degree of shrinkage determined by this prior could be further enhanced setting λ near to 0, whereas increasing the parameter value, the amount of shrinkage xed decreases.

Fruhwirth-Schnatter and Wagner (2011) and Fabrizi et al. (2018) already used this prior also as prior for the random eects. The main dierence between these applications and the present proposal is represented by the approach used to deal with the scale (or rate) parameter of the gamma prior. In fact, the cited works specify an hyperprior on it. Unfortunately, in this situation this solution is not viable because of the restrictions on the parameter space

due to the posterior moments existence condition. However, thanks to this interpretation of the proposed prior specication setting, an interpretation of the λ parameter is oered: its value controls the prior shrinkage and it could be xed accordingly.

The R function LN_hierarchical automatically implements the prior strategy described in this section, choosing the more general existence condition after the user species the functionals of interest. In fact, the argument functional can receive as input the strings "Marginal" for θm(˜x), "Subject" for θc(˜x, ˜z) and "PostPredictive" for the posterior pre- dictive distribution. The matrices ˜X and ˜Z can be provided as input of the respective arguments and, as point values for the tail parameters γ-s, the expression determining the existence conditions are evaluated with r = 3, in order to assure the well denition of the posterior mean and the posterior variance. The multiple points prediction setting is considered and the conditions of proposition 5.4 are implemented.

In document Bayesian inference for quantiles and conditional means in log-normal models (Page 111-114)

Prior specication