Package ‘metafuse’
November 7, 2015
Type Package
Title Fused Lasso Approach in Regression Coefficient Clustering Version 1.0-1
Date 2015-11-06
Author Lu Tang, Peter X.K. Song
Maintainer Lu Tang<[email protected]>
Description For each covariate, cluster its coefficient effects across different data sets during data integration. Supports Gaussian, binomial and Poisson regression models.
License GPL-2
Imports glmnet, Matrix, MASS Repository CRAN NeedsCompilation no Date/Publication 2015-11-07 07:24:15
R
topics documented:
metafuse-package . . . 1 datagenerator . . . 3 metafuse . . . 4 metafuse.l . . . 6 Index 8metafuse-package Fused Lasso Approach in Regression Coefficient Clustering
Description
For each covariate, cluster its coefficient effects across different data sets during data integration. Supports Gaussian, binomial and Poisson regression models.
2 metafuse-package Details
Very simple to use. Accepts X, y, and sid (study ID) for regression models. Index of help topics:
datagenerator simulate data
metafuse fit a GLM with fusion penalty for data
integraion
metafuse-package Fused Lasso Approach in Regression Coefficient Clustering
metafuse.l fit a GLM with fusion penalty for data
integraion Author(s)
Lu Tang, Peter X.K. Song Maintainer: Lu Tang <[email protected]> References
Lu Tang, Peter X.K. Song. Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration. In preparation.
Fei Wang. Development of Joint Estimating Equation Approaches to Merging Clustered or Longi-tudinal Datasets from Multiple Biomedical Studies. PhD Thesis, The University of Michigan, 2012. (A Biometrics paper is tentatively accepted. We will revise this a later time.)
Jiahua Chen and Zehua Chen. Extended bayesian information criteria for model selection with large model spaces. Biometrika, 95(3):759-771, 2008.
Xin Gao and Peter X-K Song. Composite likelihood bayesian information criteria for model se-lection in high-dimensional data. Journal of the American Statistical Association, 105(492):1531-1540, 2010.
Examples
########### Generate Data ########### n <- 200 # sample size in each study K <- 10 # number of studies
p <- 3 # number of covariates in X (including intercept) N <- n*K # total sample size
# the coefficient matrix, used this to set desired heterogeneous pattern (depends on p and K) beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # intercept
0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1, etc. 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)
# generate a data set, family=c("gaussian", "binomial", "poisson") data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123) # prepare the input (y, X, studyID)
y <- data$y sid <- data$group
datagenerator 3
########### metafuse runs ###########
# fuse slopes of X1 (it is heterogeneous with 2 groups)
metafuse(X=X, y=y, sid=sid, fuse.which=c(1), family="gaussian", intercept=TRUE, alpha=0, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
# fuse slopes of X2 (it is heterogeneous with 3 groups)
metafuse(X=X, y=y, sid=sid, fuse.which=c(2), family="gaussian", intercept=TRUE, alpha=0, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
# fuse all three covariates
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=0, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
# fuse all three covariates, with sparsity penalty
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
# fit metafuse at a given lambda
metafuse.l(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1, lambda=0.5)
datagenerator simulate data
Description
Simulate data for demonstration of flarcc.
Usage
datagenerator(n = n, beta0 = beta0, family = "gaussian", seed = seed)
Arguments
n sample size for each study, a vector of lengthK, the number of studies; can also be an scalar, to specify equal sample size
beta0 coefficient matrix, with dimensionK * p, where K is the number of studies and p is the number of covariates
family "gaussian" for continuous response, "binomial" for binary response, "poisson" for count response
seed set random seed for data generation
Details
4 metafuse Value
a simulated data frame will be returned, containingy,X, and study IDsid Examples
n <- 200 # sample size in each study K <- 10 # number of studies
p <- 3 # number of covariates in X (including intercept) N <- n*K # total sample size
# the coefficient matrix, used this to set desired heterogeneous pattern (depends on p and K) beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # intercept
0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1, etc. 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)
# generate a data set, family=c("gaussian", "binomial", "poisson") data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)
metafuse fit a GLM with fusion penalty for data integraion
Description
Fit a GLM with fusion penalty on coefficients within each covariate, generate solution path for model selection.
Usage
metafuse(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)),
family = "gaussian", intercept = TRUE, alpha = 0, criterion = "EBIC", verbose = TRUE, plots = TRUE, loglambda = TRUE)
Arguments
X a matrix (or vector) of predictor(s), with dimensions of N*p, where N is the total sample size of all studies
y a vector of response, with length N, the total sample size of all studies sid study id, numbered from 1 to K
fuse.which a vector of a subset of integers from 0 to p, indicating which covariates to be considered for fusion; 0 corresponds to intercept
family "gaussian" for continuous response, "binomial" for binary response, "poisson" for count response
intercept if TRUE, intercept will be included in the model
alpha the ratio of sparsity penalty to fusion penalty, default is 0 (no penalty on sparsity) criterion "AIC" for AIC, "BIC" for BIC, "EBIC" for extended BIC
metafuse 5
verbose if TRUE, output fusion events and tuning parameter lambda plots if TRUE, create plots of solution paths and clustering trees loglambda if TRUE, lambda will be plot in log-10 transformed scale Details
Adaptive lasso penalty is used. See Zou (2006) for detail. Value
a list containing the following items will be returned:
family the model type
criterion model selection criterion used
alpha the ratio of sparsity penalty to fusion penalty if.fuse whether the covariate is fused (1) or not (0) betahat the estimated coefficients
betainfo additional information about the fit, including degree of freedom, lambda opti-mal, lambda fuse, friction of fusion for each covariate
Examples
n <- 200 # sample size in each study K <- 10 # number of studies
p <- 3 # number of covariates in X (including intercept) N <- n*K # total sample size
# the coefficient matrix, used this to set desired heterogeneous pattern (depends on p and K) beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # intercept
0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1, etc. 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)
# generate a data set, family=c("gaussian", "binomial", "poisson") data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123) # prepare the input (y, X, studyID)
y <- data$y sid <- data$group
X <- data[,-c(1,ncol(data))]
# fuse slopes of X1 (it is heterogeneous with 2 groups)
metafuse(X=X, y=y, sid=sid, fuse.which=c(1), family="gaussian", intercept=TRUE, alpha=0, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
# fuse slopes of X2 (it is heterogeneous with 3 groups)
metafuse(X=X, y=y, sid=sid, fuse.which=c(2), family="gaussian", intercept=TRUE, alpha=0, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
# fuse all three covariates
6 metafuse.l
criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE) # fuse all three covariates, with sparsity penalty
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
metafuse.l fit a GLM with fusion penalty for data integraion
Description
Fit a GLM with fusion penalty on coefficients within each covariate at given lambda. Usage
metafuse.l(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)), family = "gaussian", intercept = TRUE, alpha = 0, lambda = lambda) Arguments
X a matrix (or vector) of predictor(s), with dimensions of N*p, where N is the total sample size of all studies
y a vector of response, with length N, the total sample size of all studies sid study id, numbered from 1 to K
fuse.which a vector of a subset of integers from 0 to p, indicating which covariates to be considered for fusion; 0 corresponds to intercept
family "gaussian" for continuous response, "binomial" for binary response, "poisson" for count response
intercept if TRUE, intercept will be included in the model
alpha the ratio of sparsity penalty to fusion penalty, default is 0 (no penalty on sparsity) lambda tuning parameter for fusion penalty
Details
Adaptive lasso penalty is used. See Zou (2006) for detail. Value
a list containing the following items will be returned:
family the model type
alpha the ratio of sparsity penalty to fusion penalty if.fuse whether the covariate is fused (1) or not (0) betahat the estimated coefficients
metafuse.l 7 Examples
n <- 200 # sample size in each study K <- 10 # number of studies
p <- 3 # number of covariates in X (including intercept) N <- n*K # total sample size
# the coefficient matrix, used this to set desired heterogeneous pattern (depends on p and K) beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # intercept
0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1, etc. 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)
# generate a data set, family=c("gaussian", "binomial", "poisson") data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123) # prepare the input (y, X, studyID)
y <- data$y sid <- data$group
X <- data[,-c(1,ncol(data))] # fit metafuse at a given lambda
metafuse.l(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1, lambda=0.5)