Adjusting z-tests in regression settings - Reducing the Impact of Bias in Likelihood Inference

can observe that, aside from those regions of the panels where the aforesaid instability caused by values of its denominator close to 0 manifests itself, $\hat{T}^{(ls)}$ generally proves to behave reasonably well with respect to both $Z_u$ and $Z$.

The present section has shown how the idea of correcting the moments of the z-statistic to better match those of the standard normal distribution may be successful in some single-parameter models. In fact, in most cases not only was the accuracy of Wald-based inferential procedures improved, but their essential simplicity was also maintained. In the next part of this chapter, the same approach will be reformulated so as to cope with more complex scenarios as well.

2.3 Adjusting z-tests in regression settings

2.3.1 Notation and setup

Let us now introduce a standard regression model, where the mean of the dependent variable is related to a set of covariates through some specified function. To formalize the problem, consider a random sample $y = (y_1, \dots, y_n)$ of independent observations from the generic distribution

$$Y_i \sim p_{Y_i}(y_i; \theta, x_i), \qquad \theta \in \Theta \subseteq \mathbb{R}^k, \qquad i = 1, \dots, n, \tag{2.24}$$

where $x_i = (x_{i1}, \dots, x_{ik_0})$ is the $k_0$-dimensional vector of fixed covariates for the $i$th unit and the global parameter can be partitioned as $\theta = (\psi, \lambda)$. In particular, let the component of interest $\psi = \beta = (\beta_1, \dots, \beta_{k_0}) \in \mathbb{R}^{k_0}$ be the vector of scalar regression coefficients, while $\lambda = (\lambda_1, \dots, \lambda_{k-k_0}) \in \Lambda \subseteq \mathbb{R}^{k-k_0}$ contains the remaining unknown quantities involved in the model (e.g. dispersion/precision parameters as defined in Section 2.5.1). It is then possible to link the mean of the $i$th response variable with the corresponding $k_0$ so-called regressors in $x_i$ as:

$$E_\theta(Y_i) = \mu_i = h\Biggl(\sum_{j=1}^{k_0} \beta_j x_{ij}\Biggr), \qquad i = 1, \dots, n, \tag{2.25}$$

where $h$ is some suitably smooth function, typically selected according to the support of $Y_i$. Notice that modeling frameworks like those considered in the last section are in fact special cases of this more general scenario. Indeed, specification (2.1) follows straightforwardly from (2.24) by setting $k = k_0 = 1$ with $x_i = 1$ for every $i = 1, \dots, n$ and by choosing an appropriate function $h$. Below, we shall use the notation defined in Section 1.1 to refer to the usual likelihood quantities.
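As a minimal sketch of the mean specification (2.25), the snippet below evaluates $\mu_i = h(\sum_j \beta_j x_{ij})$ for a few common choices of $h$. The dictionary `INVERSE_LINKS`, the function `mean_response` and the three example functions are illustrative assumptions, not part of the original text, which leaves $h$ generic.

```python
import numpy as np

# Hypothetical choices for the function h in (2.25); names are illustrative only.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                      # Y_i supported on the real line
    "log": lambda eta: np.exp(eta),                   # Y_i supported on positive values
    "logit": lambda eta: 1.0 / (1.0 + np.exp(-eta)),  # mean constrained to (0, 1)
}

def mean_response(X, beta, link="identity"):
    """Return mu_i = h(sum_j beta_j * x_ij) for every row x_i of X."""
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    eta = X @ beta                      # linear combination of the k0 regressors
    return INVERSE_LINKS[link](eta)

# One-parameter special case: k = k0 = 1 and x_i = 1 for every i.
X_ones = np.ones((5, 1))
mu = mean_response(X_ones, [0.0], link="logit")
```

With a single column of ones, every unit shares the same mean $h(\beta_1)$, which is how the scalar-parameter models of the previous section arise as special cases.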

In regression settings, one of the most common ways to investigate the effect of a specific covariate on the dependent variable, accounting for all the others, is via z-tests. The procedure for testing $H_0\colon \beta_j = \beta_{0j}$ $(j = 1, \dots, k_0)$ is the same as the one described in Section 2.2.1 for models with a scalar global parameter. However, here the Wald z-statistic for the $j$th coefficient, which is standard output of many statistical software packages, takes the form

$$\hat{T}^j = T^j(\hat\theta; \beta_{0j}) = \frac{\hat\beta_j - \beta_{0j}}{\sqrt{\kappa^j(\hat\theta)}}, \tag{2.26}$$

where $\kappa^j$ indicates the $(j, j)$th element in the block $i^{\beta\beta}$ of the inverse Fisher information matrix. Clearly, $\hat{T}^j = \hat{T}$ defined in (2.3) if $k = 1$. We stress that in the current context the standard error of $\hat\beta_j$ is usually evaluated at the global ML estimate so as to avoid fitting the restricted model under the null hypothesis, which might be time-consuming for large datasets and/or in the presence of many parameters.
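To make (2.26) concrete, here is a minimal sketch for a logistic regression, chosen purely as an illustration because it has no nuisance component, so that $\theta = \beta$ and $i^{\beta\beta}$ is the whole inverse expected information. The helper names (`expected_information`, `fit_logistic`, `wald_z`) and the toy data are assumptions introduced here; the Fisher scoring loop has no safeguards and is not meant as a production fitter.

```python
import numpy as np

def expected_information(X, beta):
    """Expected Fisher information X' W X of a logistic regression, plus fitted means."""
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = pi * (1.0 - pi)
    return X.T @ (w[:, None] * X), pi

def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Unrestricted ML estimate via Fisher scoring (sketch only, no safeguards)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        info, pi = expected_information(X, beta)
        step = np.linalg.solve(info, X.T @ (y - pi))   # i(beta)^{-1} * score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def wald_z(X, y, beta_null):
    """Wald z-statistics as in (2.26), one per coefficient, evaluated at the ML fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta_hat = fit_logistic(X, y)
    info, _ = expected_information(X, beta_hat)
    kappa = np.diag(np.linalg.inv(info))               # kappa^j, j = 1, ..., k0
    return (beta_hat - np.asarray(beta_null, dtype=float)) / np.sqrt(kappa)

# Toy data, made up for illustration: intercept plus one covariate.
X = np.column_stack([np.ones(8), np.arange(8.0)])
y = np.array([0, 0, 1, 0, 1, 0, 1, 1])
z = wald_z(X, y, beta_null=[0.0, 0.0])
```

Only the unrestricted fit is needed: the standard errors come from the inverse expected information at $\hat\theta$, which matches the computational shortcut stressed above.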

As repeatedly emphasized in the preceding parts, the $N(0, 1)$ distribution can be a very poor approximation for the null behaviour of the pivot (2.26) in small-to-moderate-sized samples. Moreover, in multiple regression models the failure of this asymptotic result may also occur when $k$ is large relative to $n$ (see, for example, McCullagh and Nelder, 1989, Section 6.2.4). Thus, in the same vein as what was suggested for the one-parameter case, the next section will present a convenient procedure to enhance Wald-type inferences while allowing the overall parameter to be multidimensional.

2.3.2 Location adjusted z-statistic

The Wald combinant in (2.26) is undoubtedly not as easy to deal with as its analogue (2.3) in the scalar-parameter setting. In particular, the explicit computation of the former's cumulants is tedious and results in expressions that are much less handy than those reported in Section 2.2.3. Consequently, under the present regression scenario an alternative approach for obtaining the quantities required to perform the moment correction of the z-statistic might be desirable.

The key insight behind the modification of the Wald pivot we are going to propose is to see the function

$$T^j = T^j(\theta; \beta_{0j}) = \frac{\beta_j - \beta_{0j}}{\sqrt{\kappa^j(\theta)}} \tag{2.27}$$

as a transformation of the parameter $\theta$, and $\hat{T}^j$ in (2.26) as its ML estimator. Then, similarly to $\hat\theta$, $\hat{T}^j$ may be considered to suffer from finite-sample bias, which one can try to reduce by applying, for instance, the standard technique for asymptotic bias correction described by Efron (1975, Remark 11, p. 1214).

In order to derive a general formula for the bias of the z-statistic, assume $T^j$ in (2.27) is at least three times differentiable in the argument $\theta$. Given the consistency of the ML estimator, the Taylor expansion of $\hat{T}^j - T^j$ about $\theta$, written by adopting the Einstein summation convention, is

$$\begin{aligned} T^j(\hat\theta; \beta_{0j}) - T^j(\theta; \beta_{0j}) ={}& (\hat\theta^s - \theta^s)\, T^j_s(\theta; \beta_{0j}) + \tfrac{1}{2} (\hat\theta^s - \theta^s)(\hat\theta^t - \theta^t)\, T^j_{st}(\theta; \beta_{0j}) \\ &+ \tfrac{1}{6} (\hat\theta^s - \theta^s)(\hat\theta^t - \theta^t)(\hat\theta^u - \theta^u)\, T^j_{stu}(\theta; \beta_{0j}) + O_p\bigl(n^{-3/2}\bigr), \end{aligned} \tag{2.28}$$

with $T^j_s(\theta; \beta_{0j})$, $T^j_{st}(\theta; \beta_{0j})$ and $T^j_{stu}(\theta; \beta_{0j})$ the gradient, Hessian and third derivative, respectively, of function (2.27) $(s, t, u = 1, \dots, k)$, all of order $O(n^{1/2})$. Then the following expression ensues straightforwardly from taking expectations on both sides of (2.28) and applying result (2.4), as done in Remark 3 of Kosmidis and Firth (2010, Section 4.3):

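$$\begin{aligned} E_\theta\bigl\{T^j(\hat\theta; \beta_{0j}) - T^j(\theta; \beta_{0j})\bigr\} &= B^s(\theta)\, T^j_s(\theta; \beta_{0j}) + \tfrac{1}{2}\, \xi^{s,t}(\theta)\, T^j_{st}(\theta; \beta_{0j}) + O\bigl(n^{-3/2}\bigr) \\ &= B_{T^j}(\theta; \beta_{0j}) + O\bigl(n^{-3/2}\bigr), \end{aligned} \tag{2.29}$$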

where $B^s(\theta)$ is such that $E_\theta(\hat\theta^s - \theta^s) = B^s(\theta) + o(n^{-1})$ and $\xi^{s,t}(\theta)$ is the $(s, t)$th element of $i(\theta)^{-1}$ $(s, t = 1, \dots, k)$. The first term in the asymptotic bias expansion of $\hat{T}^j = T^j(\hat\theta; \beta_{0j})$ may thus be estimated by $B_{T^j}(\hat\theta; \beta_{0j})$, so as to define the location adjusted z-statistic in regression settings as

$$\hat{T}^{j,*} = T^{j,*}(\hat\theta; \beta_{0j}) = \hat{T}^j - B_{T^j}(\hat\theta; \beta_{0j}). \tag{2.30}$$
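For reference, the derivatives entering $B_{T^j}$ can be written out by applying the product and chain rules to (2.27). The display below is a worked sketch rather than a formula taken from the original text; it assumes the shorthand $\kappa^j_s = \partial \kappa^j(\theta)/\partial\theta^s$ and $\kappa^j_{st} = \partial^2 \kappa^j(\theta)/\partial\theta^s\partial\theta^t$, and uses $\delta_{js}$ for the indicator that $\theta^s$ is the coefficient $\beta_j$.

```latex
\begin{align*}
T^j_s &= \frac{\delta_{js}}{\sqrt{\kappa^j}}
        - \frac{(\beta_j - \beta_{0j})\,\kappa^j_s}{2\,(\kappa^j)^{3/2}}, \\
T^j_{st} &= -\frac{\delta_{js}\,\kappa^j_t + \delta_{jt}\,\kappa^j_s}{2\,(\kappa^j)^{3/2}}
        + (\beta_j - \beta_{0j})
          \left\{ \frac{3\,\kappa^j_s\,\kappa^j_t}{4\,(\kappa^j)^{5/2}}
                - \frac{\kappa^j_{st}}{2\,(\kappa^j)^{3/2}} \right\}.
\end{align*}
```

At any point where $\beta_j = \beta_{0j}$ the terms multiplied by $(\beta_j - \beta_{0j})$ vanish, so only the first derivatives of $\kappa^j$ are needed there; at $\hat\theta$, where (2.30) is evaluated, the full expressions apply.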

Henceforth, we will refer to the test based on $\hat{T}^{j,*}$ as the adjusted z-test. Note that the advantage of viewing $\hat{T}^j$ as an estimator of a transformation of $\theta$ lies in the simplicity of the procedure to derive its bias. Indeed, $B_{T^j}(\theta; \beta_{0j})$ in formula (2.29) depends only on quantities which are normally computed with no effort in regression frameworks. The importance of our expedient justifies the choice of considering for correction Wald pivots which use the expected information matrix to approximate the standard error of $\hat\beta_j$. On this basis, the reparametrization trick is in fact readily applicable, as data enter the expression only through the ML estimates. We are also aware that definition (2.30) does not completely agree with what was recommended for one-parameter models. As the primary objective is to bring the null distribution of the z-statistic closer to the $N(0, 1)$, in that case the correction was sensibly performed by using its moments under $H_0$. In the general scenario under analysis, the composite null hypothesis admits the specification $H_0\colon \theta = \theta_0$ with $\theta_0 = (\beta_1, \dots, \beta_{0j}, \dots, \beta_{k_0}, \lambda_1, \dots, \lambda_{k-k_0}) \in \Theta_0 \subseteq \mathbb{R}^{k-1}$, so the null expected value of (2.26) can be expressed as $E_{\theta_0}(\hat{T}^j) = B_{T^j}(\theta_0; \beta_{0j}) + O(n^{-3/2})$. The most natural estimator of $\theta_0$, now partially unknown, is obviously $\hat\theta_{\beta_{0j}}$, thus in principle the adjustment in location should be accomplished via $B_{T^j}(\hat\theta_{\beta_{0j}}; \beta_{0j})$. The decision to rely instead on $B_{T^j}(\hat\theta; \beta_{0j})$ is taken with the aim of keeping the computational cost of classical Wald-type procedures unchanged, by avoiding the constrained maximization of the log-likelihood function. However, such a resolution rests also on practical grounds: simulation results not shown here have not detected appreciable improvements in the general performance of the adjusted z-test when evaluation of the bias at the constrained ML estimate is preferred. In closing, we acknowledge that a scale correction of the Wald combinant is not being considered in this multiparameter setting because of the difficulty implicit in the derivation of a convenient expression for the variance of $\hat{T}^j$.
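The location adjustment in (2.29) and (2.30) can be sketched numerically as follows. This is an illustration under stated assumptions, not the author's implementation: the gradient and Hessian of $T^j$ are approximated by central finite differences in place of analytic derivatives, the function names `gradient_and_hessian` and `location_adjustment` are hypothetical, and the caller is assumed to supply the first-order bias vector $B(\hat\theta)$ and the inverse expected information $i(\hat\theta)^{-1}$. The toy model $Y_i \sim N(\beta, \sigma^2)$ with made-up estimates is likewise chosen only because all quantities are available in closed form.

```python
import numpy as np

def gradient_and_hessian(f, theta, eps=1e-4):
    """Central finite-difference gradient and Hessian of a scalar function f."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    E = np.eye(k) * eps
    grad = np.zeros(k)
    hess = np.zeros((k, k))
    for s in range(k):
        grad[s] = (f(theta + E[s]) - f(theta - E[s])) / (2 * eps)
        for t in range(s, k):
            hess[s, t] = hess[t, s] = (
                f(theta + E[s] + E[t]) - f(theta + E[s] - E[t])
                - f(theta - E[s] + E[t]) + f(theta - E[s] - E[t])
            ) / (4 * eps ** 2)
    return grad, hess

def location_adjustment(T, theta_hat, bias, inv_info, eps=1e-4):
    """Return (T_hat, B_T, T_hat - B_T), with B_T = B^s T_s + 0.5 * xi^{s,t} T_st."""
    grad, hess = gradient_and_hessian(T, theta_hat, eps)
    b_T = float(bias @ grad + 0.5 * np.sum(inv_info * hess))
    t_hat = float(T(theta_hat))
    return t_hat, b_T, t_hat - b_T

# Toy illustration (an assumption of this sketch): Y_i ~ N(beta, sigma2),
# theta = (beta, sigma2), null hypothesis H0: beta = beta0.
n, beta0 = 20, 1.0
theta_hat = np.array([1.3, 2.0])        # made-up ML estimates (ybar, sigma2_hat)

def T(theta):
    beta, sigma2 = theta
    kappa = sigma2 / n                  # (beta, beta) element of i(theta)^{-1}
    return (beta - beta0) / np.sqrt(kappa)

# First-order bias of (ybar, sigma2_hat) and inverse expected information at theta_hat.
bias = np.array([0.0, -theta_hat[1] / n])
inv_info = np.diag([theta_hat[1] / n, 2 * theta_hat[1] ** 2 / n])

t_hat, b_T, t_adj = location_adjustment(T, theta_hat, bias, inv_info)
```

For this toy model the bias term works out analytically to $5\hat{T}^j/(4n)$, which gives a direct check on the numerical derivatives. Everything is evaluated at the unrestricted estimate $\hat\theta$, so no constrained fit under $H_0$ is required; evaluating at $\hat\theta_{\beta_{0j}}$ instead would only change the `theta_hat`, `bias` and `inv_info` arguments.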

In document Reducing the Impact of Bias in Likelihood Inference for Prominent Model Settings (Page 68-71)