• No results found

Package metafuse. November 7, 2015

N/A
N/A
Protected

Academic year: 2021

Share "Package metafuse. November 7, 2015"

Copied!
8
0
0

Loading.... (view fulltext now)

Full text

(1)

Package ‘metafuse’

November 7, 2015

Type Package

Title Fused Lasso Approach in Regression Coefficient Clustering Version 1.0-1

Date 2015-11-06

Author Lu Tang, Peter X.K. Song

Maintainer Lu Tang<[email protected]>

Description For each covariate, cluster its coefficient effects across different data sets during data integration. Supports Gaussian, binomial and Poisson regression models.

License GPL-2

Imports glmnet, Matrix, MASS Repository CRAN NeedsCompilation no Date/Publication 2015-11-07 07:24:15

R

topics documented:

metafuse-package . . . 1 datagenerator . . . 3 metafuse . . . 4 metafuse.l . . . 6 Index 8

metafuse-package Fused Lasso Approach in Regression Coefficient Clustering

Description

For each covariate, cluster its coefficient effects across different data sets during data integration. Supports Gaussian, binomial and Poisson regression models.

(2)

2 metafuse-package Details

Very simple to use. Accepts X, y, and sid (study ID) for regression models. Index of help topics:

datagenerator simulate data

metafuse fit a GLM with fusion penalty for data

integraion

metafuse-package Fused Lasso Approach in Regression Coefficient Clustering

metafuse.l fit a GLM with fusion penalty for data

integraion Author(s)

Lu Tang, Peter X.K. Song Maintainer: Lu Tang <[email protected]> References

Lu Tang, Peter X.K. Song. Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration. In preparation.

Fei Wang. Development of Joint Estimating Equation Approaches to Merging Clustered or Longi-tudinal Datasets from Multiple Biomedical Studies. PhD Thesis, The University of Michigan, 2012. (A Biometrics paper is tentatively accepted. We will revise this a later time.)

Jiahua Chen and Zehua Chen. Extended bayesian information criteria for model selection with large model spaces. Biometrika, 95(3):759-771, 2008.

Xin Gao and Peter X-K Song. Composite likelihood bayesian information criteria for model se-lection in high-dimensional data. Journal of the American Statistical Association, 105(492):1531-1540, 2010.

Examples

########### Generate Data ########### n <- 200 # sample size in each study K <- 10 # number of studies

p <- 3 # number of covariates in X (including intercept) N <- n*K # total sample size

# the coefficient matrix, used this to set desired heterogeneous pattern (depends on p and K) beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # intercept

0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1, etc. 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

# generate a data set, family=c("gaussian", "binomial", "poisson") data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123) # prepare the input (y, X, studyID)

y <- data$y sid <- data$group

(3)

datagenerator 3

########### metafuse runs ###########

# fuse slopes of X1 (it is heterogeneous with 2 groups)

metafuse(X=X, y=y, sid=sid, fuse.which=c(1), family="gaussian", intercept=TRUE, alpha=0, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fuse slopes of X2 (it is heterogeneous with 3 groups)

metafuse(X=X, y=y, sid=sid, fuse.which=c(2), family="gaussian", intercept=TRUE, alpha=0, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fuse all three covariates

metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=0, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fuse all three covariates, with sparsity penalty

metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fit metafuse at a given lambda

metafuse.l(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1, lambda=0.5)

datagenerator simulate data

Description

Simulate data for demonstration of flarcc.

Usage

datagenerator(n = n, beta0 = beta0, family = "gaussian", seed = seed)

Arguments

n sample size for each study, a vector of lengthK, the number of studies; can also be an scalar, to specify equal sample size

beta0 coefficient matrix, with dimensionK * p, where K is the number of studies and p is the number of covariates

family "gaussian" for continuous response, "binomial" for binary response, "poisson" for count response

seed set random seed for data generation

Details

(4)

4 metafuse Value

a simulated data frame will be returned, containingy,X, and study IDsid Examples

n <- 200 # sample size in each study K <- 10 # number of studies

p <- 3 # number of covariates in X (including intercept) N <- n*K # total sample size

# the coefficient matrix, used this to set desired heterogeneous pattern (depends on p and K) beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # intercept

0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1, etc. 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

# generate a data set, family=c("gaussian", "binomial", "poisson") data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

metafuse fit a GLM with fusion penalty for data integraion

Description

Fit a GLM with fusion penalty on coefficients within each covariate, generate solution path for model selection.

Usage

metafuse(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)),

family = "gaussian", intercept = TRUE, alpha = 0, criterion = "EBIC", verbose = TRUE, plots = TRUE, loglambda = TRUE)

Arguments

X a matrix (or vector) of predictor(s), with dimensions of N*p, where N is the total sample size of all studies

y a vector of response, with length N, the total sample size of all studies sid study id, numbered from 1 to K

fuse.which a vector of a subset of integers from 0 to p, indicating which covariates to be considered for fusion; 0 corresponds to intercept

family "gaussian" for continuous response, "binomial" for binary response, "poisson" for count response

intercept if TRUE, intercept will be included in the model

alpha the ratio of sparsity penalty to fusion penalty, default is 0 (no penalty on sparsity) criterion "AIC" for AIC, "BIC" for BIC, "EBIC" for extended BIC

(5)

metafuse 5

verbose if TRUE, output fusion events and tuning parameter lambda plots if TRUE, create plots of solution paths and clustering trees loglambda if TRUE, lambda will be plot in log-10 transformed scale Details

Adaptive lasso penalty is used. See Zou (2006) for detail. Value

a list containing the following items will be returned:

family the model type

criterion model selection criterion used

alpha the ratio of sparsity penalty to fusion penalty if.fuse whether the covariate is fused (1) or not (0) betahat the estimated coefficients

betainfo additional information about the fit, including degree of freedom, lambda opti-mal, lambda fuse, friction of fusion for each covariate

Examples

n <- 200 # sample size in each study K <- 10 # number of studies

p <- 3 # number of covariates in X (including intercept) N <- n*K # total sample size

# the coefficient matrix, used this to set desired heterogeneous pattern (depends on p and K) beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # intercept

0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1, etc. 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

# generate a data set, family=c("gaussian", "binomial", "poisson") data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123) # prepare the input (y, X, studyID)

y <- data$y sid <- data$group

X <- data[,-c(1,ncol(data))]

# fuse slopes of X1 (it is heterogeneous with 2 groups)

metafuse(X=X, y=y, sid=sid, fuse.which=c(1), family="gaussian", intercept=TRUE, alpha=0, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fuse slopes of X2 (it is heterogeneous with 3 groups)

metafuse(X=X, y=y, sid=sid, fuse.which=c(2), family="gaussian", intercept=TRUE, alpha=0, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fuse all three covariates

(6)

6 metafuse.l

criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE) # fuse all three covariates, with sparsity penalty

metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1, criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

metafuse.l fit a GLM with fusion penalty for data integraion

Description

Fit a GLM with fusion penalty on coefficients within each covariate at given lambda. Usage

metafuse.l(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)), family = "gaussian", intercept = TRUE, alpha = 0, lambda = lambda) Arguments

X a matrix (or vector) of predictor(s), with dimensions of N*p, where N is the total sample size of all studies

y a vector of response, with length N, the total sample size of all studies sid study id, numbered from 1 to K

fuse.which a vector of a subset of integers from 0 to p, indicating which covariates to be considered for fusion; 0 corresponds to intercept

family "gaussian" for continuous response, "binomial" for binary response, "poisson" for count response

intercept if TRUE, intercept will be included in the model

alpha the ratio of sparsity penalty to fusion penalty, default is 0 (no penalty on sparsity) lambda tuning parameter for fusion penalty

Details

Adaptive lasso penalty is used. See Zou (2006) for detail. Value

a list containing the following items will be returned:

family the model type

alpha the ratio of sparsity penalty to fusion penalty if.fuse whether the covariate is fused (1) or not (0) betahat the estimated coefficients

(7)

metafuse.l 7 Examples

n <- 200 # sample size in each study K <- 10 # number of studies

p <- 3 # number of covariates in X (including intercept) N <- n*K # total sample size

# the coefficient matrix, used this to set desired heterogeneous pattern (depends on p and K) beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # intercept

0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1, etc. 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

# generate a data set, family=c("gaussian", "binomial", "poisson") data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123) # prepare the input (y, X, studyID)

y <- data$y sid <- data$group

X <- data[,-c(1,ncol(data))] # fit metafuse at a given lambda

metafuse.l(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1, lambda=0.5)

(8)

Index

∗Topic

data

datagenerator,3 ∗Topic

generator

datagenerator,3 ∗Topic

package

metafuse-package,1 datagenerator,3 metafuse,4 metafuse-package,1 metafuse.l,6 8

References

Related documents