Logistic Regression Models
4.3 Nonlinear Logistic Regression Models
The logistic regression models that we have considered to this point all assume that the response rate increases or decreases monotonically with changing stimulus levels. In some cases, however, this assumption does not hold, and it is then necessary to consider nonlinear extensions of the model.
[Figure 4.6: Plot of post-operative kyphosis occurrence along Y = 1 and non-occurrence along Y = 0 versus the age (x; in months) of 83 patients. Horizontal axis: age in months, 0 to 250; vertical axis: 0.0 to 1.0.]
Figure 4.6 shows one such case. It plots post-operative kyphosis occurrence (Y = 1) and non-occurrence (Y = 0) against the age (x, in months) of 83 patients at the time of laminectomy, a corrective spinal surgery for kyphosis, which is a severe curvature of the spine (Hastie and Tibshirani, 1990, p. 301). The objective in this case is to determine the age at which laminectomy gives the smallest probability of post-operative kyphosis, $P(Y = 1 \mid x) = \pi(x)$, where $\pi(x)$ is the probability of post-operative kyphosis as a function of the age $x$ (in months) at the time of surgery.
If the probability of post-operative kyphosis is assumed to be monotonic in age, then the logistic regression model (4.7) is appropriate. If the estimated model shows the probability increasing with age, surgery should be performed in early childhood; conversely, if the probability decreases with age, surgery should be postponed until body growth is complete.
It is not clear from the data plotted in Figure 4.6, however, that the relation between probability and age is monotonic. In this case it is therefore necessary to consider fitting a nonlinear logistic regression model.
Taking the logit transformation of the logistic regression model in (4.7), we have
$$\log\frac{\pi(x)}{1-\pi(x)} = \beta_0 + \beta_1 x. \tag{4.24}$$
Conceptually, a logistic regression model is made nonlinear by replacing the linear right-hand side of this equation with a nonlinear function of $x$.
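For a cubic polynomial model, for example, this gives
$$\log\frac{\pi(x)}{1-\pi(x)} = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3. \tag{4.25}$$
As with linear regression models, a logistic regression model is made nonlinear by replacing the linear predictor with a polynomial, spline, or B-spline. In general, we consider the following nonlinear logistic regression model based on the basis functions $b_0(x) \equiv 1, b_1(x), b_2(x), \ldots, b_m(x)$:
$$\log\frac{\pi(x)}{1-\pi(x)} = w_0 + \sum_{j=1}^{m} w_j b_j(x) = \boldsymbol{w}^T\boldsymbol{b}(x), \tag{4.26}$$
where $\boldsymbol{b}(x) = (b_0(x), b_1(x), \ldots, b_m(x))^T$ is the vector of basis functions and $\boldsymbol{w} = (w_0, w_1, \ldots, w_m)^T$ is an $(m+1)$-dimensional parameter vector. Then the nonlinear logistic regression model linking multiple risk factors and a risk probability can be given by
$$\pi(x) = \frac{\exp\{\boldsymbol{w}^T\boldsymbol{b}(x)\}}{1 + \exp\{\boldsymbol{w}^T\boldsymbol{b}(x)\}}. \tag{4.27}$$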
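As a minimal sketch (not part of the original text), the following Python code evaluates the model for a polynomial basis $b_j(x) = x^j$, which reproduces the cubic model (4.25) when $m = 3$. The helper names `polynomial_basis` and `nonlinear_logistic_prob` and the coefficient values are hypothetical, chosen only for illustration; NumPy is assumed to be available.

```python
import numpy as np

def polynomial_basis(x, m):
    """Return the n x (m+1) matrix whose i-th row is b(x_i)^T = (1, x_i, ..., x_i^m)."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=m + 1, increasing=True)

def nonlinear_logistic_prob(x, w):
    """pi(x) = exp(w^T b(x)) / (1 + exp(w^T b(x))) with a polynomial basis of degree len(w) - 1."""
    B = polynomial_basis(x, len(w) - 1)
    eta = B @ np.asarray(w, dtype=float)  # linear predictor w^T b(x), i.e. the logit of pi(x)
    return 1.0 / (1.0 + np.exp(-eta))     # algebraically equal to exp(eta) / (1 + exp(eta))

# Hypothetical cubic coefficients (w0, w1, w2, w3), for illustration only.
w = [-1.0, 2.0, -0.5, 0.1]
x = np.array([0.0, 1.0, 2.0])
p = nonlinear_logistic_prob(x, w)
print(np.log(p / (1 - p)))  # approximately [-1.0, 0.6, 1.8], the values of w^T b(x)
```

Applying the logit to the computed probabilities returns the linear predictor $\boldsymbol{w}^T\boldsymbol{b}(x)$, so the probabilities stay in $(0, 1)$ while the logit itself is a nonlinear (here cubic) function of $x$.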
Nonlinear logistic regression models can be constructed by taking the Gaussian basis functions described in Section 3.2.3 as the basis functions $b_j(x)$. Logistic regression models can also be applied to linear and nonlinear discriminant analysis by means of Bayes’ theorem; this approach, logistic discrimination, is described in Section 7.3.
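The Gaussian basis functions of Section 3.2.3 are not reproduced in this section. The sketch below assumes the common form $b_j(x) = \exp\{-(x - \mu_j)^2/(2h^2)\}$ with centers $\mu_j$ and a shared width $h$, together with the constant $b_0(x) \equiv 1$; the function name `gaussian_basis`, the equally spaced centers, and the value of $h$ are hypothetical choices. The resulting matrix could take the place of the polynomial basis in the nonlinear logistic regression model.

```python
import numpy as np

def gaussian_basis(x, centers, h):
    """Matrix with columns b_0(x) = 1 and b_j(x) = exp(-(x - mu_j)^2 / (2 h^2)).

    Assumed Gaussian basis form with a common width h; the centers mu_j and
    the width h are hypothetical tuning choices, not values from the text.
    """
    x = np.asarray(x, dtype=float)[:, None]          # shape (n, 1)
    mu = np.asarray(centers, dtype=float)[None, :]   # shape (1, m)
    phi = np.exp(-((x - mu) ** 2) / (2.0 * h ** 2))  # shape (n, m)
    return np.hstack([np.ones((phi.shape[0], 1)), phi])

# Hypothetical setup: m = 5 equally spaced centers over ages 0 to 250 months.
centers = np.linspace(0.0, 250.0, 5)
B = gaussian_basis([0.0, 62.5, 100.0], centers, h=50.0)
print(B.shape)  # (3, 6): n = 3 observations, m + 1 = 6 basis functions
```

Each Gaussian basis function is local: it equals 1 at its center and decays with distance, so the number of centers $m$ and the width $h$ jointly control how flexible the fitted curve can be.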
4.3.1 Model Estimation
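The nonlinear logistic regression model (4.27) is estimated from the observed data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$ by the method of maximum likelihood. Let $y_1, \ldots, y_n$ be an independent sequence of binary random variables taking values 0 and 1 with conditional probabilities
$$P(Y = 1 \mid x_i) = \pi(x_i) \quad\text{and}\quad P(Y = 0 \mid x_i) = 1 - \pi(x_i). \tag{4.28}$$
Each $y_i$ is then a discrete random variable with the Bernoulli distribution
$$f(y_i \mid x_i; \boldsymbol{w}) = \pi(x_i)^{y_i}\{1 - \pi(x_i)\}^{1 - y_i}, \qquad y_i = 0, 1. \tag{4.29}$$
Hence the log-likelihood function of $\boldsymbol{w}$ is
$$\ell(\boldsymbol{w}) = \sum_{i=1}^{n}\left[y_i \log \pi(x_i) + (1 - y_i)\log\{1 - \pi(x_i)\}\right] = \sum_{i=1}^{n}\left[y_i\,\boldsymbol{w}^T\boldsymbol{b}(x_i) - \log\left\{1 + \exp\left(\boldsymbol{w}^T\boldsymbol{b}(x_i)\right)\right\}\right]. \tag{4.30}$$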
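Taking logarithms of the Bernoulli probabilities and summing over the $n$ independent observations gives the log-likelihood, which can be computed as in the following minimal Python sketch. The name `log_likelihood` is hypothetical, `B` denotes the $n \times (m+1)$ matrix with rows $\boldsymbol{b}(x_i)^T$, and the small data set is made up for illustration.

```python
import numpy as np

def log_likelihood(w, B, y):
    """Log-likelihood ell(w) of the nonlinear logistic regression model.

    Uses y*log(pi) + (1 - y)*log(1 - pi) = y*eta - log(1 + exp(eta)),
    where eta = w^T b(x) is the logit of pi.
    """
    eta = B @ np.asarray(w, dtype=float)
    # np.logaddexp(0, eta) = log(exp(0) + exp(eta)) = log(1 + exp(eta)), computed stably
    return float(np.sum(np.asarray(y, dtype=float) * eta - np.logaddexp(0.0, eta)))

# Tiny illustrative example with a hypothetical linear basis b(x) = (1, x)^T.
x = np.array([-1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
B = np.column_stack([np.ones_like(x), x])
w = np.array([-0.5, 1.0])
pi = 1.0 / (1.0 + np.exp(-(B @ w)))
direct = np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))  # Bernoulli form (4.29)
print(log_likelihood(w, B, y), direct)  # the two values agree
```

The rewritten form avoids computing $\log \pi(x_i)$ and $\log\{1 - \pi(x_i)\}$ separately, which would underflow when $\pi(x_i)$ is very close to 0 or 1.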
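The solution $\boldsymbol{w} = \hat{\boldsymbol{w}}$ that maximizes $\ell(\boldsymbol{w})$ is obtained by the numerical optimization method described in Section 4.2.1. In the estimation process, the following substitutions are made in (4.18) and (4.19):
$$\boldsymbol{\beta} \Rightarrow \boldsymbol{w}, \qquad X \Rightarrow B = (\boldsymbol{b}(x_1), \boldsymbol{b}(x_2), \ldots, \boldsymbol{b}(x_n))^T,$$
where $B$ is the $n \times (m+1)$ matrix whose $i$th row is $\boldsymbol{b}(x_i)^T$. Substituting the estimator $\hat{\boldsymbol{w}}$ obtained by the numerical optimization method into (4.27), we have the nonlinear logistic regression model
$$\hat\pi(x) = \frac{\exp\{\hat{\boldsymbol{w}}^T\boldsymbol{b}(x)\}}{1 + \exp\{\hat{\boldsymbol{w}}^T\boldsymbol{b}(x)\}}. \tag{4.31}$$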
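Section 4.2.1 is not reproduced here. The sketch below assumes that the numerical optimization is the Newton-Raphson method with the standard logistic regression derivatives after the substitutions $\boldsymbol{\beta} \Rightarrow \boldsymbol{w}$ and $X \Rightarrow B$: gradient $B^T(\boldsymbol{y} - \boldsymbol{\pi})$ and Hessian $-B^T W B$ with $W = \mathrm{diag}\{\pi(x_i)(1 - \pi(x_i))\}$. The function name `fit_nonlinear_logistic`, the tolerance, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_nonlinear_logistic(B, y, tol=1e-8, max_iter=100):
    """Maximum likelihood estimate w_hat by Newton-Raphson (assumed method).

    B: (n, m+1) matrix whose i-th row is b(x_i)^T; y: 0/1 responses.
    """
    y = np.asarray(y, dtype=float)
    w = np.zeros(B.shape[1])                           # start at pi(x_i) = 1/2
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(B @ w)))            # current fitted probabilities
        grad = B.T @ (y - pi)                          # first derivative of ell(w)
        info = B.T @ (B * (pi * (1.0 - pi))[:, None])  # minus the second derivative: B^T W B
        step = np.linalg.solve(info, grad)             # Newton-Raphson update
        w = w + step
        if np.max(np.abs(step)) < tol:
            break
    return w

# Synthetic data (hypothetical, not the kyphosis data) with a quadratic logit.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
B = np.vander(x, N=3, increasing=True)  # basis b(x) = (1, x, x^2)^T
w_true = np.array([-0.5, 1.0, -2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(B @ w_true))))
w_hat = fit_nonlinear_logistic(B, y)
print(w_hat)  # roughly close to w_true for this sample size
```

At convergence the gradient $B^T(\boldsymbol{y} - \hat{\boldsymbol{\pi}})$ is numerically zero, which is the likelihood equation that $\hat{\boldsymbol{w}}$ must satisfy. If the data are completely separated, the maximum likelihood estimate does not exist and the iteration will not converge.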
4.3.2 Model Evaluation and Selection
When addressing data containing a complex nonlinear structure, we need to construct a model that provides flexibility in describing the structure.
One approach is to capture the structure by choosing the number of basis functions $m$ in (4.26) appropriately, selecting $m$ by means of a model evaluation and selection criterion.
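Replacing the parameter vector $\boldsymbol{w}$ in the log-likelihood function (4.30) with the maximum likelihood estimator $\hat{\boldsymbol{w}}$ yields
$$\ell(\hat{\boldsymbol{w}}) = \sum_{i=1}^{n}\left[y_i \log \hat\pi(x_i) + (1 - y_i)\log\{1 - \hat\pi(x_i)\}\right]. \tag{4.32}$$
Then the AIC for evaluating the nonlinear logistic regression model estimated by the maximum likelihood method is given by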
$$\mathrm{AIC} = -2\ell(\hat{\boldsymbol{w}}) + 2(m + 1). \tag{4.33}$$
Among the models constructed with different numbers of basis functions $m$, the optimal model is the one that minimizes AIC.
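The selection procedure can be sketched in Python as follows. The sketch assumes a polynomial basis $b_j(x) = x^j$, so that $m$ is the polynomial degree and the model has $m + 1$ parameters, and fits each candidate by Newton-Raphson (an assumption about the method of Section 4.2.1). The function names and the synthetic data are hypothetical; the kyphosis data are not reproduced here.

```python
import numpy as np

def fit_and_loglik(B, y, tol=1e-8, max_iter=100):
    """Newton-Raphson MLE for the nonlinear logistic model; returns (w_hat, ell(w_hat))."""
    w = np.zeros(B.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(B @ w)))
        info = B.T @ (B * (pi * (1.0 - pi))[:, None])  # B^T W B
        step = np.linalg.solve(info, B.T @ (y - pi))    # Newton-Raphson step
        w = w + step
        if np.max(np.abs(step)) < tol:
            break
    eta = B @ w
    return w, float(np.sum(y * eta - np.logaddexp(0.0, eta)))  # log-likelihood at w_hat

def select_by_aic(x, y, max_m=5):
    """AIC(m) = -2 ell(w_hat) + 2(m + 1) for bases (1, x, ..., x^m), m = 1..max_m."""
    aics = {}
    for m in range(1, max_m + 1):
        B = np.vander(x, N=m + 1, increasing=True)
        _, ll = fit_and_loglik(B, y)
        aics[m] = -2.0 * ll + 2.0 * (m + 1)
    best_m = min(aics, key=aics.get)  # number of basis functions with the smallest AIC
    return best_m, aics

# Synthetic data (hypothetical, not the kyphosis data) whose true logit is quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=500)
eta_true = -0.5 + 1.0 * x - 2.0 * x**2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true))).astype(float)
best_m, aics = select_by_aic(x, y)
print(best_m, {m: round(a, 2) for m, a in aics.items()})
```

The penalty $2(m + 1)$ grows with the number of basis functions, so a larger $m$ is chosen only when it raises the maximized log-likelihood by more than one unit per added parameter.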
Example 4.2 (Probability of occurrence of kyphosis) Figure 4.7 shows a plot of the data for the 83 patients who underwent laminectomy: their age (x, in months) at the time of the operation, with Y = 1 if the patient developed kyphosis and Y = 0 otherwise. As the figure indicates, the probability of onset is not necessarily monotonic in age. We therefore fit the following nonlinear logistic regression model based on polynomials:
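$$\pi(x) = \frac{\exp(\beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p)}{1 + \exp(\beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p)}. \tag{4.34}$$
Applying the AIC in (4.33), with the number of basis functions $m$ replaced by the polynomial degree $p$, we selected the second-order polynomial model (AIC = 84.22). The corresponding logistic curve is given by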
$$\hat\pi(x) = \frac{\exp(-3.0346 + 0.0558x - 0.0003x^2)}{1 + \exp(-3.0346 + 0.0558x - 0.0003x^2)}. \tag{4.35}$$
The curve in Figure 4.7 is this estimated logistic curve. It shows that the rate of onset increases with the patient’s age (in months) at the time of surgery, peaks at approximately 100 months, and decreases thereafter.
[Figure 4.7: Fitting the polynomial-based nonlinear logistic regression model to the kyphosis data. Horizontal axis: age in months, 0 to 250; vertical axis: 0.0 to 1.0.]
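The location of the peak in (4.35) can be checked directly. Because the fitted logit $-3.0346 + 0.0558x - 0.0003x^2$ is a downward-opening parabola and the logistic function is increasing, $\hat\pi(x)$ is largest where the derivative of the logit vanishes, at $x = 0.0558/(2 \times 0.0003) = 93$ months. A minimal Python check using the reported coefficients follows; the function name `pi_hat` is hypothetical.

```python
import math

# Coefficients of the fitted logit reported in (4.35).
B0, B1, B2 = -3.0346, 0.0558, -0.0003

def pi_hat(x):
    """Estimated probability of post-operative kyphosis at age x (months), from (4.35)."""
    eta = B0 + B1 * x + B2 * x**2
    return math.exp(eta) / (1.0 + math.exp(eta))

# The logit is a downward parabola (B2 < 0) and the logistic function is increasing,
# so pi_hat is largest where d(eta)/dx = B1 + 2 * B2 * x = 0.
x_peak = -B1 / (2.0 * B2)
print(round(x_peak, 1), round(pi_hat(x_peak), 3))  # prints 93.0 0.392
```

Because the reported coefficients are rounded (in particular $-0.0003$), the computed peak location is only approximate, in line with the statement that the peak occurs at approximately 100 months.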
Exercises
4.1 Consider the logistic model
$$y = f(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}.$$
(a) Show that $f(x)$ is a monotonic function.
(b) By finding the inverse function of $f(x)$, derive the logit transformation in (4.4).
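A numerical sanity check of both parts (not a substitute for the derivations the exercise asks for) can be sketched in Python. It uses the fact that $f'(x) = \beta_1 f(x)\{1 - f(x)\}$, so the sign of the slope is the sign of $\beta_1$; the coefficient values are hypothetical.

```python
import numpy as np

beta0, beta1 = 0.5, -1.2  # hypothetical coefficients; beta1 < 0 gives a decreasing curve

def f(x):
    """Logistic model of Exercise 4.1."""
    return np.exp(beta0 + beta1 * x) / (1.0 + np.exp(beta0 + beta1 * x))

def logit(y):
    """Candidate inverse: log{y / (1 - y)}."""
    return np.log(y / (1.0 - y))

x = np.linspace(-5.0, 5.0, 101)
y = f(x)
print(np.all(np.diff(y) < 0))                    # monotonic (decreasing, since beta1 < 0)
print(np.allclose(logit(y), beta0 + beta1 * x))  # logit(f(x)) recovers beta0 + beta1 x
```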
4.2 Show that the log-likelihood function for the parameter vector $\boldsymbol{\beta}$ of the multiple logistic regression model in (4.11) is given by (4.17).
4.3 Show that the first and second derivatives of the log-likelihood function $\ell(\boldsymbol{\beta})$ in (4.17) with respect to $\boldsymbol{\beta}$ are given by (4.18) and (4.19), respectively.
4.4 Consider the function
$$f(t) = \beta_1 \exp\{(\beta_0 + \beta_1 t) - \exp(\beta_0 + \beta_1 t)\}, \qquad \beta_1 > 0.$$
Let
$$y = F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad -\infty < x < \infty.$$
(a) Show that $F(x)$ is given by
$$y = F(x) = 1 - \exp\{-\exp(\beta_0 + \beta_1 x)\}, \qquad 0 < y < 1,$$
which is called the complementary log-log model.
(b) Show that F(x) is a strictly increasing function.
(c) Show that the inverse linearizing transformation $g(y)$ (i.e., $g(y) = \beta_0 + \beta_1 x$) for $y = F(x)$ is given by
$$g(y) = \log\{-\log(1 - y)\}.$$
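The three claims of Exercise 4.4 can be checked numerically with the following Python sketch: a central difference of $F$ should reproduce $f$, $F$ should increase on a grid, and $g$ should map $F(x)$ back to $\beta_0 + \beta_1 x$. The coefficient values and grid are hypothetical, and the check does not replace the analytic derivations.

```python
import numpy as np

beta0, beta1 = -1.0, 0.8  # hypothetical values with beta1 > 0

def f(t):
    """Density in Exercise 4.4: beta1 * exp{(beta0 + beta1 t) - exp(beta0 + beta1 t)}."""
    eta = beta0 + beta1 * t
    return beta1 * np.exp(eta - np.exp(eta))

def F(x):
    """Complementary log-log model 1 - exp{-exp(beta0 + beta1 x)}."""
    return 1.0 - np.exp(-np.exp(beta0 + beta1 * x))

def g(y):
    """Linearizing transformation log{-log(1 - y)}."""
    return np.log(-np.log(1.0 - y))

x = np.linspace(-4.0, 3.0, 71)
h = 1e-5
print(np.max(np.abs((F(x + h) - F(x - h)) / (2 * h) - f(x))))  # F' = f, up to rounding
print(np.all(np.diff(F(x)) > 0))                                # strictly increasing
print(np.max(np.abs(g(F(x)) - (beta0 + beta1 * x))))            # g inverts F
```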
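4.5 Consider the monotonically increasing function
$$y = \Phi\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x} \exp\left\{-\frac{1}{2}\left(\frac{t - \mu}{\sigma}\right)^2\right\} dt,$$
where $\Phi(z)$ is the standard normal distribution function. Then, taking $-\mu/\sigma = \beta_0$ and $\sigma^{-1} = \beta_1$, the inverse transformation $\Phi^{-1}(y)$ yields the linear model $\beta_0 + \beta_1 x$, called the probit model.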
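The probit linearization can be illustrated with Python's standard-library `statistics.NormalDist`, which provides both the normal distribution function and its inverse. The values of $\mu$ and $\sigma$ below are hypothetical; the sketch only confirms that $\Phi^{-1}(y)$ equals $\beta_0 + \beta_1 x$ with $\beta_0 = -\mu/\sigma$ and $\beta_1 = 1/\sigma$.

```python
from statistics import NormalDist

mu, sigma = 100.0, 30.0              # hypothetical location and scale
beta0, beta1 = -mu / sigma, 1.0 / sigma
std_normal = NormalDist()            # Phi and its inverse Phi^{-1}
model = NormalDist(mu, sigma)        # y = Phi((x - mu) / sigma)

for x in (40.0, 100.0, 160.0):
    y = model.cdf(x)
    # Phi^{-1}(y) should coincide with the linear predictor beta0 + beta1 * x
    print(x, round(y, 4), std_normal.inv_cdf(y), beta0 + beta1 * x)
```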
Chapter 5