Future Work - Cross-validated Model Fitting

3.1 Cross-validated Model Fitting

3.4.4 Future Work

Much of the work done is related to the preprocessing, cleaning, and filtering of the data. A thorough investigation into the additional predictive power of isoform- level covariates on cancer survival should include additional TCGA datasets following the same procedure. A comparison of our results with a replication on non-TCGA breast cancer survival data could also yield interesting findings.

Alternatively, testing the discriminative power of concordance using a known data-generating process would aid in the interpretation of results on real world data. This could also be used to validate our preprocessing and filtering techniques.

List of References

Love, M. I., Hogenesch, J. B., and Irizarry, R. A. (2015). Modeling of rna-seq fragment sequence bias reduces systematic errors in transcript abundance estima- tion. bioRxiv, page 025767.

BIBLIOGRAPHY

Bishop, J. M., “The molecular genetics of cancer,” Science, vol. 235, no. 4786, pp. 305– 311, 1987.

Broad Institute TCGA Genome Data Analysis Center, “Analysis-ready standardized tcga data from broad gdac firehose 2016_01_28 run,” 2016. [Online]. Available: http://dx.doi.org/10.7908/C11G0KM9

Bullard, J. H., Purdom, E., Hansen, K. D., and Dudoit, S., “Evaluation of statistical methods for normalization and differential expression in mrna-seq experi- ments,” BMC bioinformatics, vol. 11, no. 1, p. 1, 2010.

Cox, D. R., “Regression models and life-tables,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 187–220, 1972.

Crick, F. et al., “Central dogma of molecular biology,” Nature, vol. 227, no. 5258, pp. 561–563, 1970.

DeLong, E. R., DeLong, D. M., and Clarke-Pearson, D. L., “Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach,” Biometrics, pp. 837–845, 1988.

Efron, B. and Tibshirani, R., “Empirical bayes methods and false discovery rates for microarrays,” Genetic epidemiology, vol. 23, no. 1, pp. 70–86, 2002.

Fisher, L. D. and Lin, D. Y., “Time-dependent covariates in the cox proportional- hazards regression model,” Annual review of public health, vol. 20, no. 1, pp. 145– 157, 1999.

Friedman, J., Hastie, T., and Tibshirani, R., “glmnet: Lasso and elastic-net regularized generalized linear models. version1,” 2013.

Gilbert, W., “Why genes in pieces?” Nature, vol. 271, no. 5645, p. 501, 1978.

Hanley, J. A. and McNeil, B. J., “The meaning and use of the area under a receiver operating characteristic (roc) curve.” Radiology, vol. 143, no. 1, pp. 29–36, 1982. Hastie, T., Tibshirani, R., and Wainwright, M., Statistical learning with sparsity: the

lasso and generalizations. CRC Press, 2015.

Hoerl, A. E. and Kennard, R. W., “Ridge regression: Biased estimation for nonorthog- onal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.

James, G., Witten, D., Hastie, T., and Tibshirani, R., An introduction to statistical learn- ing. Springer, 2013, vol. 6.

Kleinbaum, D. G. and Klein, M., Survival analysis. Springer, 1996.

Koziol, J. A. and Jia, Z., “The concordance index c and the mann–whitney parameter pr (x> y) with randomly censored data,” Biometrical Journal, vol. 51, no. 3, pp. 467–474, 2009.

Le Cessie, S. and Van Houwelingen, J. C., “Ridge estimators in logistic regression,” Applied statistics, pp. 191–201, 1992.

Lee, K. L. and Mark, D. B., “Tutorial in biostatistics multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and mea- suring and reducing errors,” Statistics in medicine, vol. 15, pp. 361–387, 1996. Li, B. and Dewey, C. N., “Rsem: accurate transcript quantification from rna-seq data

with or without a reference genome,” BMC bioinformatics, vol. 12, no. 1, p. 1, 2011.

Li, J., Witten, D. M., Johnstone, I. M., and Tibshirani, R., “Normalization, testing, and false discovery rate estimation for rna-sequencing data,” Biostatistics, p. kxr031, 2011.

Love, M. I., Hogenesch, J. B., and Irizarry, R. A., “Modeling of rna-seq fragment sequence bias reduces systematic errors in transcript abundance estimation,” bioRxiv, p. 025767, 2015.

Nelder, J. A. and Baker, R. J., “Generalized linear models,” Encyclopedia of Statistical Sciences, 1972.

Pencina, M. J., D’Agostino, R. B., and Vasan, R. S., “Evaluating the added predictive ability of a new marker: from area under the roc curve to reclassification and beyond,” Statistics in medicine, vol. 27, no. 2, pp. 157–172, 2008.

Pimentel, H. “In rna-seq, 2 != 2: Between-sample normalization.” Accessed: 08/20/2016. 2014. [Online]. Available: https://haroldpimentel.wordpress.com/ 2014/12/08/in-rna-seq-2-2-between-sample-normalization/

R Development Core Team, “R: A language and environment for statistical computing,” Vienna, Austria, version 3.2.5. [Online]. Available: http://www. r-project.org

Simon, N., Friedman, J., Hastie, T., and Tibshirani, R., “Regularization paths for cox’s proportional hazards model via coordinate descent,” Journal of statistical soft- ware, vol. 39, no. 5, p. 1, 2011.

Strimmer, K., “fdrtool: a versatile r package for estimating local and tail area-based false discovery rates,” Bioinformatics, vol. 24, no. 12, pp. 1461–1462, 2008.

Tibshirani, R., “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 267–288, 1996.

Tibshirani, R. et al., “The lasso method for variable selection in the cox model,” Statis- tics in medicine, vol. 16, no. 4, pp. 385–395, 1997.

Uno, H., “Package ‘survc1’,” 2013.

Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B., and Wei, L., “On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data,” Statistics in medicine, vol. 30, no. 10, pp. 1105–1117, 2011.

Uno, H., Tian, L., Cai, T., Kohane, I. S., and Wei, L., “A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data,” Statistics in medicine, vol. 32, no. 14, pp. 2430–2442, 2013.

Van Wieringen, W. N., Kun, D., Hampel, R., and Boulesteix, A.-L., “Survival prediction using gene expression data: a review and comparison,” Computational statistics & data analysis, vol. 53, no. 5, pp. 1590–1603, 2009.

VanRossum, G. and Drake, F. L., The Python Language Reference. Python software foundation Amsterdam, Netherlands, 2010.

Verweij, P. J. and Van Houwelingen, H. C., “Penalized likelihood in cox regression,” Statistics in medicine, vol. 13, no. 23-24, pp. 2427–2436, 1994.

Wilks, S. S., “The large-sample distribution of the likelihood ratio for testing compos- ite hypotheses,” The Annals of Mathematical Statistics, vol. 9, no. 1, pp. 60–62, 1938.

Zou, H. and Hastie, T., “Regularization and variable selection via the elastic net,” Jour- nal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pp. 301–320, 2005.

In document Risk Classification in High Dimensional Survival Models (Page 71-74)