glmboost(..., center = TRUE) As a consequence of the finding that the centering of base-learners without intercept is of great importance (see Section 2.3.1), we changed the default to center = TRUE. Furthermore, columns of design matrices that correspond to contrasts are now also centered (if an intercept is specified in the model) leading to (much) faster risk minimization.
cvrisk() For parallel computation of the cross-validated risk, the interface of cvrisk() now also allows the use of, for example, the packages multicore (Urbanek 2011) or snow (Tierney et al. 2011).
Various improvements in storage, speed, and stability were incorporated into mboost. They make use of, for example, the package Matrix (Bates and Maechler 2011).
Convenience Functions
extract() The newly added generic function allows users to extract various characteristics of a single base-learner or a fitted model.
coef.glmboost() The intercept term is now adjusted for internally centered covariates (i.e., center = TRUE; see Section 2.3.1). Additionally, the new argument off2int = TRUE adds the offset to the intercept.
plot.mboost The functionality of the (experimental) plot function was improved. For example, the handling of the which argument is now better. In the case of bivariate effects, lattice plots are used and the resulting lattice object is returned. Furthermore, the handling of varying coefficients was improved.
lines.mboost The newly added (experimental) lines function makes it easy to plot multiple partial effects into a single device.
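The intercept adjustment described for coef.glmboost() can be illustrated with a small worked example. The following is only a sketch in base R: it uses lm() instead of boosting, and the simulated data and all object names are assumptions made for illustration. It shows the arithmetic of mapping an intercept estimated on a centered covariate back to the original scale, not the actual mboost implementation.

```r
## Simulated data; all names are illustrative and not part of mboost.
set.seed(1)
x <- rnorm(100, mean = 5)
y <- 2 + 3 * x + rnorm(100, sd = 0.1)

## Fit on the centered covariate (the analogue of center = TRUE).
x_bar <- mean(x)
fit_centered <- lm(y ~ I(x - x_bar))
b <- coef(fit_centered)

## Map the intercept back to the original scale of x:
## b0 = b0_centered - b1 * mean(x)
intercept_adjusted <- unname(b[1] - b[2] * x_bar)

## Reference fit on the uncentered covariate.
fit_raw <- lm(y ~ x)
all.equal(intercept_adjusted, unname(coef(fit_raw)[1]))
```

The slope is unaffected by centering; only the intercept needs to be shifted by the slope times the covariate mean.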
B.2. Different Implementation of bols
In this thesis, an implementation of the linear base-learner (bols) different from the one in the current version of mboost was used. The two differ in the implementation of the option ‘intercept = FALSE’ for categorical variables. In the 2.x series of mboost, base-learners are used where the design matrix of the categorical variable z with ncat categories is built using
R> model.matrix(~ z - 1)
This leads to models where the explicit intercept (a column of ones) is removed but each category of z is specified by a separate mean value. The result is a design matrix X with ncat columns again, where the element Xij = 1 ⇐⇒ zi = j and Xij = 0 otherwise. This coding results from the above code irrespective of the specified contrasts. We call this ‘mean coding’. Actually, we would prefer to have a base-learner where the coding is specified by the contrasts and the column of ones is completely dropped, i.e., the intercept is truly removed. This is achieved by using something similar to
R> X <- model.matrix(~ z)
R> X <- X[ , -1, drop = FALSE]
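The difference between the two codings can be checked directly in base R. The following is a minimal sketch; the factor z, the helper drop_intercept(), and all object names are made up for illustration and are not part of mboost.

```r
## Toy factor with ncat = 3 categories (illustrative only).
z <- factor(c("a", "b", "c", "a", "b", "c"))

## Mean coding: one indicator column per category, whatever the contrasts.
X_mean_trt <- model.matrix(~ z - 1, contrasts.arg = list(z = "contr.treatment"))
X_mean_sum <- model.matrix(~ z - 1, contrasts.arg = list(z = "contr.sum"))
all(X_mean_trt == X_mean_sum)   # same indicator entries for both contrasts

## Contrast coding with the column of ones dropped afterwards.
## drop_intercept() is a hypothetical helper, not an mboost function.
drop_intercept <- function(z, contrasts) {
  X <- model.matrix(~ z, contrasts.arg = list(z = contrasts))
  X[, -1, drop = FALSE]
}
X_trt <- drop_intercept(z, "contr.treatment")
X_sum <- drop_intercept(z, "contr.sum")

ncol(X_mean_trt)  # ncat columns under mean coding
ncol(X_trt)       # ncat - 1 columns once the intercept is truly removed
any(X_trt != X_sum)  # the coding now depends on the chosen contrasts
```

With the second approach the design matrix has ncat - 1 columns, and its entries follow the specified contrasts instead of always being category indicators.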
The alternative definition used in this thesis is given in the following code. The main differences reflect the change of code discussed above. They can be found in the function that creates the design matrix for linear base-learners, X_ols. Changes in the base-learner bols itself and other changes in X_ols are only required to call the correct functions if the modified code is sourced after package mboost is loaded. The changes are highlighted in italics.
bols <- function(..., by = NULL, index = NULL, intercept = TRUE, df = NULL,
                 lambda = 0, contrasts.arg = "contr.treatment") {
    if (!is.null(df)) lambda <- NULL
    cll <- match.call()
    cll[[1]] <- as.name("bols")
    mf <- list(...)
    if (length(mf) == 1 && (mboost:::isMATRIX(mf[[1]]) || is.data.frame(mf[[1]]))) {
        mf <- mf[[1]]
        ### spline bases should be matrices
        if (mboost:::isMATRIX(mf) && !is(mf, "Matrix")) class(mf) <- "matrix"
    } else {
        mf <- as.data.frame(mf)
        cl <- as.list(match.call(expand.dots = FALSE))[2][[1]]
        colnames(mf) <- sapply(cl, function(x) as.character(x))
    }
    if(!intercept && !any(sapply(mf, is.factor)) &&
       !any(sapply(mf, function(x){uni <- unique(x);
                                   length(uni[!is.na(uni)])}) == 1)){
        ## if no intercept is used and no covariate is a factor
        ## and if no intercept is specified (i.e. mf[[i]] is constant)
        if (any(sapply(mf, function(x)
                       abs(mean(x, na.rm=TRUE) / sd(x,na.rm=TRUE))) > 0.1))
            ## if covariate mean is not near zero
            warning("covariates should be (mean-) centered if ",
                    sQuote("intercept = FALSE"))
    }
    vary <- ""
    if (!is.null(by)){
        stopifnot(is.data.frame(mf))
        mf <- cbind(mf, by)
        colnames(mf)[ncol(mf)] <- vary <- deparse(substitute(by))
    }
    CC <- all(mboost:::Complete.cases(mf))
    ### option
    DOINDEX <- is.data.frame(mf) &&
        (nrow(mf) > options("mboost_indexmin")[[1]] || is.factor(mf[[1]]))
    if (is.null(index)) {
        ### try to remove duplicated observations or
        ### observations with missings
        if (!CC || DOINDEX) {
            index <- mboost:::get_index(mf)
            mf <- mf[index[[1]],,drop = FALSE]
            index <- index[[2]]
        }
    }
    ret <- list(model.frame = function()
                    if (is.null(index)) return(mf) else return(mf[index,,drop = FALSE]),
                get_call = function(){
                    cll <- deparse(cll, width.cutoff=500L)
                    if (length(cll) > 1)
                        cll <- paste(cll, collapse="")
                    cll
                },
                get_data = function() mf,
                get_index = function() index,
                get_names = function() colnames(mf),
                get_vary = function() vary,
                set_names = function(value) {
                    if(length(value) != length(colnames(mf)))
                        stop(sQuote("value"), " must have same length as ",
                             sQuote("colnames(mf)"))
                    ## update the variable names stored in the call
                    for (i in seq_along(value)) {
                        cll[[i+1]] <<- as.name(value[i])
                    }
                    attr(mf, "names") <<- value
                })
    class(ret) <- "blg"
    ret$dpp <- mboost:::bl_lin(ret, Xfun = X_ols, args = mboost:::hyper_ols(
                               df = df, lambda = lambda,
                               intercept = intercept, contrasts.arg = contrasts.arg))
    return(ret)
}
X_ols <- function(mf, vary, args) {
    if (mboost:::isMATRIX(mf)) {
        X <- mf
        contr <- NULL
    } else {
        ### set up model matrix
        fm <- paste("~ ", paste(colnames(mf)[colnames(mf) != vary],
                                collapse = "+"), sep = "")
        ## removed: if (!args$intercept) fm <- paste(fm, "-1")
        fac <- sapply(mf[colnames(mf) != vary], is.factor)
        if (any(fac)){
            if (!is.list(args$contrasts.arg)){
                txt <- paste("list(", paste(colnames(mf)[colnames(mf) != vary][fac],
                                            "= args$contrasts.arg", collapse = ", "),")")
                args$contrasts.arg <- eval(parse(text=txt))
            }
        } else {
            args$contrasts.arg <- NULL
        }
        X <- model.matrix(as.formula(fm), data = mf, contrasts.arg = args$contrasts.arg)
        contr <- attr(X, "contrasts")
        ## newly added:
        if (!args$intercept)
            X <- X[ , -1, drop = FALSE]
        if (vary != "") {
            by <- model.matrix(as.formula(paste("~", vary, collapse = "")),
                               data = mf)[ , -1, drop = FALSE] # drop intercept
            DM <- lapply(1:ncol(by), function(i) {
                ret <- X * by[, i]
                colnames(ret) <- paste(colnames(ret), colnames(by)[i], sep = ":")
                ret
            })
            if (is(X, "Matrix")) {
                X <- do.call("cBind", DM)
            } else {
                X <- do.call("cbind", DM)
            }
        }
    }