Constrained Effects for Bivariate P-Splines

4. Constrained Regression in Multivariate STAR Models

4.4. Constrained Effects for Bivariate P-Splines

In Section 2.3.3, bivariate P-splines were introduced. As bivariate P-splines are directly derived from univariate P-splines, they can be extended — basically in the same manner as univariate P-splines — to include monotonicity constraints or cyclicity constraints. In the following sections we will shortly derive both concepts for bivariate P-splines.

4.4.1. Bivariate, Monotonic P-Splines

If it is desired to have a monotonic interaction it is not sufficient to specify an interaction of monotonic effects. Monotonicity of marginal effects does not transfer to monotonicity of an interaction surface: This can be easily verified by considering

4.4 Constrained Effects for Bivariate P-Splines 83

the interaction of two increasing, linear functions

f(x1,x2) = x1·x2, x1,x2 ∈ [−1, 1].

For any (fixed) value x2 < 0, the function f(x1,x2) is linear decreasing in x1 and

for any x2 > 0, the function f(x1,x2) is linear increasing in x1. The function f

is monotonic in both, x1 and x2 given the other variable. However, the direction

of monotonicity changes depending on the sign of the other variable. The resulting surface is highly non-monotonic (cf. Figure 4.7). Thus, instead of the marginal functions, the complete surface needs to be constrained in order to achieve a monotonic surface. x1 −1 0 1 x2 −1 0 1 f(x1, x2) −1 0 1

Figure 4.7.:Non-monotonic interaction surface of f(x1,x2) =x1·x2, x1,x2∈ [−1, 1]. To obtain a monotonic surface estimate, we utilize bivariate P-splines and add monotonic constraints for the row- and column-wise differences of the matrix of coefficients (Eq. 2.21). As proposed by Bollaerts et al. (2006), one can use two independent asymmetric penalties to allow for different directions of monotonicity — i.e., increasing in one variable, sayx1, decreasing in the other variable, say x2 —

or different prior beliefs in monotonicity resembled by different penalty parameters

λ. Letudenote the(n×1)negative gradient vector (2.4), or an arbitrary continuous

response in settings other than boosting. Letx_i = (xi1,xi2), i =1, . . . ,n, denote the

observations of variables x1 and x2, and let B =

B(x1), . . . ,B(xn) >

denote the

(n×JK) design matrix comprising the bivariate B-spline bases ofxi(Eq. 2.18) with J knots in the direction ofx1 andK knots in the direction ofx2. The corresponding

coefficient vector (2.17) of dimension (JK×1) is denoted by β. Monotonicity is

enforced by the asymmetric difference penalties

Jasym,1(β;c) = J

∑

j=c+1 K

∑

k=1 v(_jk1)(∆₁cβjk)2 =β>(D1⊗IK)>V1(D1⊗IK)β (4.12) J_asym,2(β;c) = J

∑

j=1 K

∑

k=c+1 v(_jk2)(∆₂cβjk)2 =β>(IJ⊗D2)>V2(IJ⊗D2)β (4.13)

where ∆₁c are the column-wise differences and ∆c₂ the row-wise differences, i.e.

∆1

1βjk = βjk −β(j−1)k and ∆12βjk = βjk −βj(k−1). The corresponding difference

matrices are denoted by D1and D2, andIJ andIK denote identity matrices of rank J and K, respectively. The weightsv(_jkl), l =1, 2 are specified in analogy to (4.3) for monotonic increasing estimates (c=1) as

v(_jkl) =    0 if∆c_lβjk >0 1 if∆c_lβjk ≤0. (4.14)

For the matrix notation, the weights are collected in the diagonal matrices V_l =

diag(v(l)). Changing the inequality sign in (4.14) leads to monotonic decreasing function estimates, while differences of order c = 2 lead to convex or concave constraints.

The constraint estimation problem for monotonic surface estimates in matrix notation becomes

Q(β) = (u−Bβ)>(u−Bβ) +λ1Jspatial(β;d)

+λ21Jasym,1(β;c)

+λ22Jasym,2(β;c),

(4.15)

where J_spatial(β;d) is the standard bivariate P-spline penalty of order d and λ1 is

the corresponding penalty parameter.

For univariate monotonic P-splines, the usual P-spline penalty and the asymmetric penalty have the same form except for the weights V. This can be also shown forbivariate P-splines:

Theorem 4.1 LetVbe an identity matrix of appropriate dimension, i.e.,Vis dropped from

the equation. Then Equation(4.12)can be simplified toβ>(K1⊗IK)βand Equation(4.13) can be simplified to β>(IJ ⊗K2)β. With λ21 = λ22, the sum of the penalties(4.12) and

4.4 Constrained Effects for Bivariate P-Splines 85

(4.13)is equal to the bivariate P-spline penalty(2.20).

Proof As the transpose is distributive over the Kronecker product (Steeb 1991, p.

12) and as we assumeV=I

(D₁⊗I_K)>V(D₁⊗I_K) = (D>₁ ⊗I>_K)(D₁⊗I_K). As the matricesD>₁D1and I>_KIK exist, it holds (Steeb 1991, p. 16)

(D>₁ ⊗I>_K)(D₁⊗IK) = (D>1D1)⊗(I>KIK)

= (K1⊗IK).

The proof that Equation (4.13) can be simplified toβ>(IJ⊗K2)βis straightforward

along the lines of this proof.

From Theorem 4.1, one can conclude that the asymmetric penalties only differ from the usual bivariate P-spline penalty in the weight matrixV.

Example for monotonic surface estimation As an example for bivariate, mono-

tonic smoothing, let us consider uniformly distributed observations xi1,xi2

i.i.d.

∼

U[−2, 3], i = 1, . . . ,n, and the outcome yi = 0.5+x2_i₁·(xi2+3) +εi, where εi i.∼i.d.

N(0, 0.52). The function (without the error term εi) is depicted in Figure 4.8(a). There is a clear non-monotonic effect, which becomes especially apparent for large values ofx2. Figure 4.8(b) shows the estimated function where the non-monotonic

function is estimated by monotonic, bivariate P-splines. The resulting estimated function appears to be monotonic. As a comparison we estimated the same model without monotonicity constraint. To check if the apparent monotonicity truly holds, we plotted the row- and column-wise differences of the coefficients (see Figure 4.9). The differences of the unconstrained model vary from negative to pos- itive values while differences from the constrained model are always greater or equal to zero. Hence, one can conclude that the monotonicity constrained estimate is monotonic while the unconstrained estimate is clearly non-monotonic.

4.4.2. Bivariate, Cyclic P-Splines

Based on bivariate P-splines, cyclic constraints in both directions of x1 and x2 are

f(x1, x2)

(a) True Function

f(x1, x2)

(b) Estimated, Monotonic Function

Figure 4.8.:Bivariate, true effect and bivariate, monotonic estimate thereof. Both figures are plotted using the same range for comparability.

in the direction of x2, one builds the univariate, cyclic design matrices B(_cyclic1) and

B(_cyclic2) for x1 and x2, respectively. The bivariate design matrix then is

B_cyclic = (B(_cyclic1) ⊗e>_K)(e>_J ⊗B(_cyclic2) ), as in Equation (2.19).

With the univariate,cyclicdifference matricesDe(1) andDe(2) forx₁andx₂, respectively (see (Eq. 4.8) and (Eq. 4.9) for cyclic first and second order differences), we obtain cyclic penalty matrices K1 = (De(1))>De(1) and K₂ = (De(2))>De(2). Thus, in analogy to the usual bivariate P-spline penalty (Eq. 2.20) the bivariate cyclic penalty can be written as

J_{cyclic, spatial}(β) = β>(K1⊗IK+IJ⊗K2)β,

where βcollects the coefficients of the bivariate P-spline as in Equation (2.17), and

IJ andIK denote identity matrices of rankJandK, respectively. Estimation is then a straight forward application of the penalized least squares criterion as in (Eq. 2.22). An example of bivariate, cyclic splines is given in Section 4.6, where a cyclic surface is used to estimate the effect of time (during the day) and calender day on Roe Deer activity.

4.5 Monotonicity-Constrained Species Distribution Models for Red Kite 87

In document Hofner, Benjamin (2011): Boosting in structured additive models. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 94-99)