• No results found

2.2 Monotonic Regression

2.2.5 Summary

but the fit may also be defined in terms of the likelihood.

Bacchetti (1989) proposes the non-parametric cyclic pool adjacent violators (CPAV) algo-

rithm to obtain the estimatesfb1, . . . ,fbm in (2.2.5). The CPAV algorithm combines the PAVA

(Section 2.2.2) and the alternating conditional expectations algorithm (Breiman and Friedman,

1. Set initial valuesfbj(xi,j) =byi,j, i∈ {1, . . . , n}, j ∈ {1, . . . , m}.

2. Optimize the fit with respect to f1b via the PAVA while keeping f2, . . . ,b fbm fixed. Then

optimize f2b subject to f1,b f3, . . . ,b fbm being fixed, followed by f3b and so on. The initial

setup for each update of fbj is as in the PAVA: fbj(xi,j) =ybi,j i∈ {1, . . . , n}.

3. Apply Step 2 until convergence is reached.

Morton-Jones et al. (2000) extend the CPAV algorithm to settings with additional linear pre- dictors. The authors apply their algorithm in an epidemiological framework in order to model the relationship between risk and exposure. Such regression problems with both linear and

smooth functions are termed semi-parametric additive monotonic and are widely studied; see

Cheng (2009); Cheng et al. (2012); Rueda (2013); Yu (2014); Chen and Samworth (2016) and references therein.

A substantially different class of approaches considers the estimation off1,. . . ,fm via mono-

tonic regression splines. Hence, the estimates f1, . . . ,b fbm in (2.2.5) are constrained to be both

monotonic and continuous (smooth). The additional constraint of continuity is plausible in some applications, e.g. for the estimation of dose-response curves. In such cases, the application of

regression splines may be preferred to the CPAV algorithm which estimatesf1, . . . , fm in form

of step functions. Ramsay (1988) proposes to fit monotone regression splines using integrated splines (I-splines) which are derived using M-splines (Curry and Schoenberg, 1966). The mod- elling approach is applied in several areas, e.g. toxicology (De Boer et al., 2002), and is used in a setting with multiple covariates by Tutz and Leitenstorfer (2007).

Kelly and Rice (1990) and He and Shi (1998) consider monotone regression using B-splines

with each estimatefbj, j = 1, . . . , mbeing of the form

b fj = S X s=1 αj,sBj,s(x), (2.2.6)

subject to αj,s ∀ j ∈ {1, . . . , m}, s ∈ {1, . . . , S} preserving monotonicity. A similar setting to

(2.2.6) is also considered by Wang and Small (2015) and Wang and Xue (2015), and applied by

Leitenstorfer and Tutz (2007). While these approaches estimate f1, . . . , fm using optimization

methods, Neelon and Dunson (2004) and Cai and Dunson (2007) propose a Bayesian setting for single and multiple outcomes, respectively.

dimensional problems with the number of covariates,m, exceeding the number of observations,

n. In a linear regression setting, the least absolute shrinkage and selection operator (LASSO)

(Tibshirani, 1996) is a well-known technique to minimize the number of non-zero covariate effects. The approach by Fang and Meinshausen (2012) is motivated by the LASSO and extends

(2.2.5) by adding a penalty for high variations of fb1, . . . ,fbm. Estimates fb1, . . . ,fbm are then

obtained via a modified CPAV algorithm and Fang and Meinshausen (2012) term their method

the LASSO Isotone (LISO). While LISO yields step-wise constant functionsf1, . . . ,b fbm, Bergersen

et al. (2014) propose an alternative method which derives estimates f1, . . . ,b fbm using I-splines.

2.2.4 Bayesian Nonparametrics

Bayesian nonparametric methods typically refer to Bayesian models with a very large or infinite number of parameters. Several monotonic regression approaches in Bayesian nonparametrics are based on the Dirichlet process by Ferguson (1973). A Dirichlet process is defined by a base

distributionF0 and a concentration parameterα, DP(F0, α), and its realizations are probability

distributions. The sampled distributions are almost surely discrete even though F0 may be

continuous. Dirichlet processes are widely applied in Bayesian nonparametrics (Antoniak, 1974; Jain and Neal, 2004; Kim et al., 2006; Zhou et al., 2012), e.g. for clustering, infinite mixture models and hidden Markov models (Hjort et al., 2010). Several extensions are proposed in the literature, e.g. Teh et al. (2006) and Rodr´ıguez et al. (2008) introduce a hierarchical and nested modelling framework, respectively. Gelfand et al. (2005) and Duan et al. (2007) define spatial Dirichlet process models from a geostatistical perspective.

In the context of monotonic regression, several authors propose a Dirichlet process based ap- proach. Gelfand and Kuo (1991) consider the estimation of monotone dose-response or potency curves and propose an ordered Dirichlet process prior. The authors further propose a second prior which, in contrast to the Dirichlet process prior, is conjugate and based on a product of Beta distributions. Additionally, Gelfand and Kuo (1991) propose an inference procedure for

two functionsλ1, λ2 with λ1(x)≤λ2(x), ∀ x∈ X ⊆R. Lavine and Mockus (1995) and Kottas

et al. (2002) extend this approach, e.g. the former incorporates the estimation of an unknown error density via a second Dirichlet process. The models considered in these three publications are, however, slightly limited since the estimated functions are strictly increasing. Therefore, Dunson (2005) introduce a mixture prior which allows for flat areas of the unknown regression function.

A different set of approaches consider monotonic regression under the constraint ofλbeing

differentiable. Hence,λcan be formally represented by

λ(x) =β0+β1

Z x

0

f(u) du, (2.2.7)

where f(x) is non-negative, β0 ∈ R and β1 ≥ 0. Ramsay (1998) consider the case of f in

(2.2.7) being the exponential of an unconstrained function g. Ramsay and Silverman (2005)

then apply this approach to model the tibia data collected by Hermanussen et al. (1998). While

Ramsay and Silverman (2005) estimate g via B-splines, Shively et al. (2009) define a Wiener

process as prior distribution and estimate f via a MCMC algorithm. Similarly to Gelfand and

Kuo (1991), the estimated functions of the form (2.2.7) are strictly monotone. Bornkamp and

Ickstadt (2009) consider a setting as in (2.2.7) and setf as a probability density function of a

continuous bounded random variable. The distribution function associated tof is then modelled

via a mixture of parametric probability distributions, with a general discrete random measure as a prior distribution.

Many other approaches have been developed and are briefly mentioned here. Shively et al. (2009) propose, additionally to a framework as in (2.2.7), a fixed-knot regression spline approach,

allowing the first derivative, λ0(x), to take the value 0. Shively et al. (2011) then extend this

spline approach and introduce a free-knot regression spline model. Riihim¨aki and Vehtari (2010)

estimate monotonic functions using Gaussian processes and results indicate a good performance for smooth surfaces. Lin and Dunson (2014) introduce a monotonic regression method which uses projections of Gaussian processes and optimization based approaches, e.g. the PAVA in Section 2.2.2. Finally, Bornkamp et al. (2010) and Wang and Dunson (2011) consider monotonicity in the wider framework of density functions.

In this work, we focus on the approaches by Holmes and Heard (2003) and Saarela and Arjas (2011), the latter extending the monotone modelling framework of the former to higher dimensions. The principal idea lies in defining the monotone (monotonic) function via a marked point process with constraints which ensure monotonicity. Points are then iteratively added, deleted or shifted using the reversible jump MCMC (RJMCMC) algorithm by Green (1995) which allows to traverse models of varying dimension. Each sampled marked point process then corresponds to a piecewise constant function via imposition of a monotonic relation on the functional space. In conclusion, this method is a Bayesian analogue to the optimization based

approaches in 2.2.2 in the sense that the estimated functions are piecewise constant. Specific details on the marked point process formulation and the RJMCMC algorithm are omitted here since they are detailed in Section 4.2.

2.2.5 Summary

The publications mentioned in the previous Sections 2.2.2 through 2.2.4 illustrate the variety of

methodologies and applications in which monotonicity of the underlying regression functionλ

is considered. Approaches have been proposed to obtain an estimate bλof either continuous or

non-continuous shape. Conditional on the application, smooth or discontinuous functions may be preferred. While the optimization based approaches in 2.2.2 generally lead to a piecewise

constant estimate, smooth estimates for λare obtained via the monotonic regression splines in

Sections 2.2.3 and 2.2.4.

Monotonic regression has two very positive aspects. Firstly, it provides a valuable alternative if multiple linear regression leads to a poor model fit. Since each linear model with non-negative covariate effects fulfills the monotonic constraint in expression (2.2.1), it can be replaced by a monotonic model without the requirement of additional assumptions. Secondly, the assumption of monotonicity can be tested using the methods by Bowman et al. (1998) and Ghosal et al. (2000). Furthermore, Scott et al. (2015) introduce a test for monotonicity based on Bayesian nonparametric statistics. From an applied perspective, aspects of monotonicity may also be discussed with operators and engineers. If, however, there exists uncertainty on the monotonicity

ofλ, Tibshirani et al. (2011) introduce a modelling framework which penalizes non-monotonicity

but does not preclude it.

Nevertheless, there exist some limitations to the monotonic regression framework. Most im- portantly, extrapolation of the regression function is hardly possible since functions are generally estimated locally and non-parametrically. This issue does, however, not solely occur in mono- tonic regression framework but exists for flexible non-parametric regression models in general. Furthermore, the constraint of monotonicity is generally less restrictive for higher dimensions

and leads to overfitting, unlessλis expressed in terms of an additive model. Consider two inde-

pendent samples,u,vfrom the input spaceX,u,v∼ZX, whereZX refers to the distribution of

the covariates. The monotonic constraint (2.2.1) only applies ifu ≤v component-wise or vice

versa. It can be easily verified that for a given dimension m, the probability for a monotonic

number of points,n, required to obtain a sufficiently good fit increases exponentially with the dimension of the covariate space.

Related documents