1.3 Estimation of the Coefficients

To estimate the unknown parameters in an ordinal model, we use a maximum likelihood approach. We first construct the likelihood function for the four different ordinal models. Because the MLE of an ordinal model generally has no closed form, an iterative optimization method is required to obtain it. We then introduce several general-purpose iterative optimization methods, as well as the R functions and SAS procedures for fitting ordinal models.

1.3.1 Maximum Likelihood Estimate

Let $y_{ic} = 1$ if $Y_i = c$ and $y_{ic} = 0$ otherwise. Since the ordinal response $Y_i$ follows a multinomial distribution with trial size 1, the likelihood function for n observations is

$$L(\alpha, \beta; x) = \prod_{i=1}^{n} f(x_i; \alpha, \beta) = \prod_{i=1}^{n} \prod_{c=1}^{C} \pi_c(x_i)^{y_{ic}}. \tag{1.3.1}$$
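As a minimal sketch (not the author's code), the log of (1.3.1), $\ell(\alpha, \beta) = \sum_{i=1}^{n} \sum_{c=1}^{C} y_{ic} \log \pi_c(x_i)$, can be computed once the category probabilities are available. Because each row $y_i$ contains a single 1, only the probability of the observed category contributes for each observation. The function names `one_hot` and `multinomial_loglik` and the probabilities below are hypothetical, chosen only for illustration.

```python
import math


def one_hot(categories, C):
    """Indicator rows y_i = (y_i1, ..., y_iC) for observed levels in 1..C."""
    return [[1 if c == k else 0 for k in range(1, C + 1)] for c in categories]


def multinomial_loglik(probs, y):
    """Log of (1.3.1): sum over i and c of y_ic * log(pi_c(x_i)).

    probs[i][c - 1] holds pi_c(x_i) and y[i][c - 1] holds y_ic.
    """
    total = 0.0
    for p_row, y_row in zip(probs, y):
        if sum(y_row) != 1:
            raise ValueError("each observation must fall in exactly one category")
        for p_c, y_c in zip(p_row, y_row):
            if y_c:  # only the observed category contributes to the sum
                total += math.log(p_c)
    return total


# Hypothetical probabilities for n = 2 observations and C = 3 categories.
probs = [[0.2, 0.5, 0.3],
         [0.6, 0.3, 0.1]]
y = one_hot([2, 1], C=3)
print(y)                             # [[0, 1, 0], [1, 0, 0]]
print(multinomial_loglik(probs, y))  # equals log(0.5) + log(0.6)
```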

As discussed in Section 1.2, $\pi_c(x_i)$ is a function of the unknown parameters α and β and of $x_i$, the vector of independent covariates for observation i. The specific form of $\pi_c(x_i)$ is determined by the type of ordinal model. Here, we illustrate the maximum likelihood approach using the cumulative logit model, for which (1.3.1) can be written as:

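$$L(\alpha, \beta; x) = \prod_{i=1}^{n} \prod_{c=1}^{C} \pi_c(x_i)^{y_{ic}} = \prod_{i=1}^{n} \prod_{c=1}^{C} \left[ \frac{\exp(\alpha_c + x_i^T \beta)}{1 + \exp(\alpha_c + x_i^T \beta)} - \frac{\exp(\alpha_{c-1} + x_i^T \beta)}{1 + \exp(\alpha_{c-1} + x_i^T \beta)} \right]^{y_{ic}}. \tag{1.3.2}$$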
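The probabilities inside (1.3.2) can be sketched in Python as follows, assuming the finite cutpoints are supplied in increasing order and using the conventions $\alpha_0 = -\infty$ and $\alpha_C = \infty$. The function names and parameter values are hypothetical. Under this parameterization, increasing $x_i^T \beta$ raises every cumulative probability $P(Y_i \le c)$ and therefore shifts probability toward the lower categories.

```python
import math


def expit(t):
    """Logistic CDF exp(t) / (1 + exp(t)); also correct at t = -inf and t = +inf."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def cumlogit_probs(alpha, beta, x):
    """pi_1(x), ..., pi_C(x) as in (1.3.2).

    alpha holds the finite cutpoints alpha_1 <= ... <= alpha_{C-1};
    the conventions alpha_0 = -inf and alpha_C = +inf are added here.
    """
    eta = sum(b * xj for b, xj in zip(beta, x))   # x^T beta
    cuts = [-math.inf] + list(alpha) + [math.inf]
    cdf = [expit(a + eta) for a in cuts]          # P(Y <= c | x) for c = 0..C
    return [cdf[c] - cdf[c - 1] for c in range(1, len(cuts))]


def cumlogit_loglik(alpha, beta, X, categories):
    """Log of (1.3.2); categories[i] is the observed level Y_i in 1..C."""
    return sum(math.log(cumlogit_probs(alpha, beta, x)[c - 1])
               for x, c in zip(X, categories))


# Hypothetical values: C = 3 categories and a single covariate.
alpha, beta = [-1.0, 1.0], [0.5]
print(cumlogit_probs(alpha, beta, [0.0]))
print(cumlogit_loglik(alpha, beta, [[0.0], [2.0]], [1, 3]))
```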

Because the unknown parameters $\alpha_c$ and β enter the likelihood (and the log-likelihood) nonlinearly, an iterative method is required to successively approximate the root of the score equations, that is, the maximizer. McCullagh [1980] and Walker and Duncan [1967] proposed Fisher's scoring to obtain the maximum likelihood estimate. Note that the cumulative logit model imposes an inequality constraint on the intercepts, $-\infty = \alpha_0 \le \alpha_1 \le \cdots \le \alpha_C = \infty$, as discussed in Section 1.2.1. We used the constrained nonlinear optimization algorithm Augmented Lagrangian Adaptive Barrier Minimization, proposed by Varadhan [2011], to obtain the maximum likelihood estimate for the cumulative logit model. For the other types of ordinal models, an iterative algorithm for unconstrained nonlinear optimization is appropriate.
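The text handles the ordering constraint with a constrained optimizer. A common alternative, sketched here only for comparison and not the approach used in this work, is to reparameterize the cutpoints so that an unconstrained optimizer can be applied: set $\alpha_1 = \theta_1$ and $\alpha_c = \alpha_{c-1} + \exp(\theta_c)$ for $c = 2, \ldots, C-1$. Every real vector θ then yields strictly increasing cutpoints, so ties between cutpoints are reached only in the limit. The helper names are hypothetical.

```python
import math


def theta_to_alpha(theta):
    """Map unconstrained theta (length C-1) to strictly increasing cutpoints.

    alpha_1 = theta_1 and alpha_c = alpha_{c-1} + exp(theta_c) for c >= 2.
    """
    alpha = [theta[0]]
    for t in theta[1:]:
        alpha.append(alpha[-1] + math.exp(t))  # exp(t) > 0 preserves the order
    return alpha


def alpha_to_theta(alpha):
    """Inverse map, e.g. for turning starting cutpoints into a starting theta."""
    theta = [alpha[0]]
    for prev, cur in zip(alpha, alpha[1:]):
        if cur <= prev:
            raise ValueError("cutpoints must be strictly increasing")
        theta.append(math.log(cur - prev))
    return theta


# Any real theta, such as one proposed by an unconstrained optimizer,
# yields ordered cutpoints.
alpha = theta_to_alpha([0.3, -2.0, 1.5])
print(alpha)
print(alpha_to_theta(alpha))  # recovers [0.3, -2.0, 1.5] up to rounding
```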

1.3.2 Optimization Technique

There is a variety of algorithms for solving unconstrained nonlinear optimization problems. Three commonly used algorithms based on the second-order Taylor series expansion of the log-likelihood function are Newton-Raphson, Fisher's Scoring, and Iteratively Reweighted Least Squares (IRLS). Here, we introduce the Newton-Raphson and Fisher's Scoring algorithms and defer IRLS to Chapter 3.

We first briefly explain the mechanics of the Newton-Raphson algorithm, named after Isaac Newton and Joseph Raphson. Let L(β) be the log-likelihood, where $\beta = (\beta_1, \ldots, \beta_p)$ is a vector of p unknown parameters to be estimated. We denote by $u^T$ the vector of first-order partial derivatives of L(β) with respect to each $\beta_j$, $j = 1, \ldots, p$:

$$u^T = \frac{\partial L(\beta)}{\partial \beta} = \left( \frac{\partial L(\beta)}{\partial \beta_1}, \ldots, \frac{\partial L(\beta)}{\partial \beta_p} \right).$$

We also denote by H the p × p Hessian matrix of second-order partial derivatives of L(β), with entries

$$h_{ij} = \frac{\partial^2 L(\beta)}{\partial \beta_i \, \partial \beta_j}, \quad i, j = 1, \ldots, p.$$

According to the second-order Taylor series expansion around the current estimate $\beta^{(s)}$, L(β) can be approximated as:

$$L(\beta) \approx L(\beta^{(s)}) + u^{(s)T}(\beta - \beta^{(s)}) + \frac{1}{2}(\beta - \beta^{(s)})^T H^{(s)} (\beta - \beta^{(s)}), \tag{1.3.3}$$

where $u^{(s)}$ and $H^{(s)}$ are evaluated at $\beta^{(s)}$, the estimate at the sth iteration. Setting the first-order partial derivative of this approximation to zero, $\partial L(\beta)/\partial \beta \approx u^{(s)} + H^{(s)}(\beta - \beta^{(s)}) = 0$, and solving for β gives the approximation to the root at the (s + 1)th iteration:

$$\beta^{(s+1)} = \beta^{(s)} - \left(H^{(s)}\right)^{-1} u^{(s)}.$$

This iteration is repeated until the difference between $L(\beta^{(s+1)})$ and $L(\beta^{(s)})$ is negligible, that is, until the optimum value of L(β) is reached.
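A minimal pure-Python sketch of the Newton-Raphson iteration, assuming binary logistic regression as the model. This is the cumulative logit model with C = 2 when $y_i$ indicates $Y_i = 1$, so the intercept plays the role of $\alpha_1$. The data, function names, and tolerance are hypothetical; the loop stops when the change in L(β) is negligible, as described in the text.

```python
import math


def expit(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= f * M[k][j]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        z[k] = (M[k][n] - sum(M[k][j] * z[j] for j in range(k + 1, n))) / M[k][k]
    return z


def loglik(beta, X, y):
    """Binary logistic log-likelihood: sum of y_i * eta_i - log(1 + exp(eta_i))."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * xj for b, xj in zip(beta, xi))
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def score_and_hessian(beta, X, y):
    """Score vector u and Hessian H of the log-likelihood at beta."""
    p = len(beta)
    u = [0.0] * p
    H = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = expit(sum(b * xj for b, xj in zip(beta, xi)))
        for j in range(p):
            u[j] += (yi - mu) * xi[j]
            for k in range(p):
                H[j][k] -= mu * (1.0 - mu) * xi[j] * xi[k]
    return u, H


def newton_raphson(X, y, beta0, tol=1e-12, max_iter=50):
    beta = list(beta0)
    ll_old = loglik(beta, X, y)
    for _ in range(max_iter):
        u, H = score_and_hessian(beta, X, y)
        step = solve(H, u)                           # (H^(s))^{-1} u^(s)
        beta = [b - s for b, s in zip(beta, step)]   # beta^(s+1)
        ll_new = loglik(beta, X, y)
        if abs(ll_new - ll_old) < tol:               # change in L is negligible
            break
        ll_old = ll_new
    return beta


# Hypothetical data; the first column of X is the intercept.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
X = [[1.0, x] for x in xs]
beta_hat = newton_raphson(X, y, [0.0, 0.0])
print(beta_hat)
```

The data are deliberately not separable (a 0 at x = 1.0 lies above a 1 at x = -1.0), so a finite maximizer exists.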

Analogous to the Newton-Raphson algorithm, Fisher's Scoring is also based on the second-order Taylor series expansion of the log-likelihood function. Instead of using the Hessian matrix directly in the approximation, it uses the p × p Fisher information matrix ι, the negative expectation of the second-order partial derivatives of L(β):

$$\iota_{ij} = -E\left[ \frac{\partial^2 L(\beta)}{\partial \beta_i \, \partial \beta_j} \right], \quad i, j = 1, \ldots, p.$$

Substituting $H^{(s)}$ with $-\iota^{(s)}$ in (1.3.3), $\beta^{(s+1)}$ at the (s + 1)th iteration is evaluated as

$$\beta^{(s+1)} = \beta^{(s)} + \left(\iota^{(s)}\right)^{-1} u^{(s)}.$$
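The scoring update can be sketched the same way. A binary probit model is assumed here because its link is not canonical, so the Fisher information differs from the negative Hessian and the iteration is not identical to Newton-Raphson (for the canonical logit link the two coincide). The data, function names, and tolerances are hypothetical, and this sketch stops on the size of the step rather than on the change in L(β).

```python
import math


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= f * M[k][j]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):
        z[k] = (M[k][n] - sum(M[k][j] * z[j] for j in range(k + 1, n))) / M[k][k]
    return z


def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def score_and_information(beta, X, y):
    """Score u and expected (Fisher) information iota for binary probit."""
    p = len(beta)
    u = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * xj for b, xj in zip(beta, xi))
        mu = min(max(norm_cdf(eta), 1e-12), 1.0 - 1e-12)  # guard against 0 and 1
        phi = norm_pdf(eta)
        v = mu * (1.0 - mu)
        for j in range(p):
            u[j] += (yi - mu) * phi / v * xi[j]
            for k in range(p):
                info[j][k] += phi * phi / v * xi[j] * xi[k]
    return u, info


def fisher_scoring(X, y, beta0, tol=1e-10, max_iter=100):
    beta = list(beta0)
    for _ in range(max_iter):
        u, info = score_and_information(beta, X, y)
        step = solve(info, u)                         # (iota^(s))^{-1} u^(s)
        beta = [b + s for b, s in zip(beta, step)]    # plus sign: H replaced by -iota
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data; the first column of X is the intercept.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
X = [[1.0, x] for x in xs]
beta_hat = fisher_scoring(X, y, [0.0, 0.0])
print(beta_hat)
```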

The two optimization algorithms discussed above require calculating the second-order derivatives of L(β), which can be computationally expensive when L(β) has a complex form. Alternatively, algorithms based on approximations of the second-order partial derivatives, such as quasi-Newton BFGS [Broyden, 1970, Fletcher, 1970], L-BFGS-B [RH. Byrd and Zhu, 1995], dual quasi-Newton with a dogleg strategy [Dennis and Mei, 1979], and BHHH [E. Berndt and Hausman, 1974], are widely used to solve unconstrained nonlinear optimization problems. Algorithms based on other mechanisms are also used to optimize the likelihood function, such as the Nelder-Mead method [Nelder and Mead, 1965], a heuristic search method that minimizes an objective function in a multidimensional space without using derivatives, and the conjugate gradient method [Fletcher and Reeves, 1964], which provides numerical solutions for large, sparse systems.
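To illustrate the quasi-Newton idea, the following is a minimal BFGS sketch in pure Python. It needs only the objective and its gradient and builds an approximation to the inverse Hessian from successive differences in the iterates and gradients, using a simple backtracking (Armijo) line search. It is not the implementation used by any package cited here; the objective (a negative binary-logistic log-likelihood), the data, and the function names are hypothetical.

```python
import math


def expit(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def neg_loglik_and_grad(beta, X, y):
    """Negative binary-logistic log-likelihood and its gradient (no Hessian)."""
    f = 0.0
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        eta = sum(b * xj for b, xj in zip(beta, xi))
        f += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
        mu = expit(eta)
        for j in range(len(beta)):
            g[j] += (mu - yi) * xi[j]
    return f, g


def bfgs(fun, x0, tol=1e-8, max_iter=200):
    """Minimize fun (returning value and gradient) with BFGS and Armijo backtracking."""
    n = len(x0)
    x = list(x0)
    Hinv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # start at I
    f, g = fun(x)
    for _ in range(max_iter):
        if max(abs(gi) for gi in g) < tol:
            break
        d = [-sum(Hinv[i][j] * g[j] for j in range(n)) for i in range(n)]
        slope = sum(gi * di for gi, di in zip(g, d))  # negative: descent direction
        t = 1.0
        while True:  # halve the step until the Armijo condition holds
            x_new = [xi + t * di for xi, di in zip(x, d)]
            f_new, g_new = fun(x_new)
            if f_new <= f + 1e-4 * t * slope or t < 1e-12:
                break
            t *= 0.5
        s = [a - b for a, b in zip(x_new, x)]
        yv = [a - b for a, b in zip(g_new, g)]
        sy = sum(a * b for a, b in zip(s, yv))
        if sy > 1e-12:  # curvature condition; otherwise keep the old Hinv
            rho = 1.0 / sy
            Hy = [sum(Hinv[i][j] * yv[j] for j in range(n)) for i in range(n)]
            yHy = sum(a * b for a, b in zip(yv, Hy))
            for i in range(n):
                for j in range(n):
                    Hinv[i][j] += (-rho * (s[i] * Hy[j] + Hy[i] * s[j])
                                   + (rho * rho * yHy + rho) * s[i] * s[j])
        x, f, g = x_new, f_new, g_new
    return x


# Hypothetical data; the first column of X is the intercept.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
X = [[1.0, x] for x in xs]
beta_hat = bfgs(lambda b: neg_loglik_and_grad(b, X, y), [0.0, 0.0])
print(beta_hat)
```

Because the inverse-Hessian approximation starts at the identity, the first step is a steepest-descent step; requiring $s^T y > 0$ before each update keeps the approximation positive definite, so every search direction remains a descent direction.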

1.3.3 Software Implementation

Currently, several software packages are capable of fitting ordinal models. In SAS version 9.2, the LOGISTIC procedure can fit the cumulative logit model, and the NLMIXED procedure provides the flexibility to fit all types of ordinal models. The optimization techniques in these two procedures differ, which may yield slightly different results in certain situations: PROC LOGISTIC implements the Newton-Raphson and Fisher's Scoring (default) algorithms, while PROC NLMIXED offers a larger selection of optimization methods, including dual quasi-Newton (default), conjugate gradient, and the Nelder-Mead simplex method. In R version 2.13.1, the package VGAM [T.W.Yee, 2013] can fit different types of ordinal models as Vector Generalized Linear Models (VGLMs) using the vglm function. The default, and currently the only, optimization method implemented in vglm is Iteratively Reweighted Least Squares (IRLS). In addition, we wrote our own code for fitting different types of ordinal models. For the cumulative logit model, we optimize the likelihood using the constrained nonlinear optimization method in the R package alabama [Varadhan, 2011]. For the other types of ordinal models, the parameter estimates are obtained with the Newton-Raphson algorithm in the general-purpose nonlinear optimization function nlm.

In document Regularization Methods for Predicting an Ordinal Response using Longitudinal High-dimensional Genomic Data (Page 35-39)