This may be the author’s version of a work that was submitted/accepted for publication in the following source:
Senanayake, Sameera, White, Nicole, Graves, Nicholas, Healy, Helen, Baboolal, Keshwar, & Kularatna, Sanjeewa
(2019)
Machine learning in predicting graft failure following kidney transplantation:
A systematic review of published predictive models.
International Journal of Medical Informatics, 130, Article number: 103957, 1-10.
This file was downloaded from: https://eprints.qut.edu.au/200343/
© 2019 Elsevier B.V.
This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons License (or other specified license) then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright please provide details by email to [email protected]
License: Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
Notice: Please note that this document may not be the Version of Record (i.e. published version) of the work. Author manuscript versions (as Submitted for peer review or as Accepted for publication after peer review) can be identified by an absence of publisher branding and/or typeset appearance. If there is any doubt, please refer to the published source.
https://doi.org/10.1016/j.ijmedinf.2019.103957
Machine learning in predicting graft failure following kidney transplantation: a systematic review of published predictive models
Sameera Senanayake a, Nicole White a, Nicholas Graves a, Helen Healy b,c, Keshwar Baboolal b,c, Sanjeewa Kularatna a
a Australian Centre for Health Services Innovation, Queensland University of Technology, Australia
b Royal Brisbane Hospital for Women, Brisbane, Australia
c School of Medicine, University of Queensland, Australia
*Address for correspondence Sameera Senanayake
Australian Centre for Health Services Innovation,
School of Public Health, Institute of Health and Biomedical Innovation, Queensland University of Technology,
60 Musk Ave, Kelvin Grove, QLD 4059, Australia [email protected] +61450865361
Abstract – 279
Body of the article - 3605
Abstract
Introduction
Machine learning has been increasingly used to develop predictive models to diagnose different disease conditions. The heterogeneity of the kidney transplant population makes predicting graft outcomes extremely challenging. Several kidney graft outcome prediction models have been developed using machine learning, and are available in the literature.
However, a systematic review of machine learning based prediction methods applied to kidney transplant has not been done to date. The main aim of our study was to perform an in-depth systematic analysis of different machine learning methods used to predict graft outcomes among kidney transplant patients, and assess their usefulness as an aid to decision-making.
Methods
A systematic review of machine learning methods used to predict graft outcomes among kidney transplant patients was carried out using a search of the Medline, Cumulative Index to Nursing and Allied Health Literature, EMBASE, PsycINFO and Cochrane databases.
Results
A total of 295 articles were identified and extracted. Of these, 18 met the inclusion criteria.
Most of the studies were published in the United States after 2010. The population size used to develop the models varied from 80 to 92,844, and the number of features in the models ranged from 6 to 71. The most common machine learning methods used were artificial neural networks, decision trees and Bayesian belief networks. Most of the machine learning based predictive models predicted graft failure with high sensitivity and specificity. Only one machine learning based prediction model had modelled time-to-event (survival) information.
Seven studies compared the predictive performance of machine learning models with traditional regression methods, and the relative performance of the machine learning methods was mixed.
Conclusion
There was a wide variation in the size of the study population and the input variables used.
However, comparisons of prediction accuracy between machine learning and traditional predictive methods gave mixed results. Based on reported gains in predictive performance, machine learning has the potential to improve kidney transplant outcome prediction and aid medical decision making.
Keywords : Machine learning; Predictive models; Kidney transplant; Graft failure
1 Introduction
1.1 Graft failure after kidney transplant
The increasing prevalence of chronic kidney disease (CKD) and end-stage kidney disease over recent years has resulted in increased demand for kidney replacement therapy (KRT) (1, 2). Among the available KRT modalities, kidney transplantation has demonstrated superior quality of life and survival rates (3). However, health systems around the world have not been able to meet the growing demand for kidney grafts, as evidenced by the increased prevalence of other KRT modalities (4). Including kidney dialysis and transplant, current demand is estimated at 2,692 per million population in Japan (2015) (5), 1,700 per million population in the United States (2009) (6) and 782 per million population in the European Union (2013) (7).
The ability to predict graft failure across different cohorts is crucial in systems of organ allocation to minimise the flow of people returning to an already-burdened waiting list (8). Models that can accurately predict graft failure following transplant may therefore help inform medical decision-making. A number of predictive models based on regression methods (e.g. logistic and Cox regression) are currently being used to predict graft failure among people with kidney transplants (9-12). These models have yielded mixed predictive power, motivating the need for alternative modelling approaches.
1.2 Machine learning in predicting graft failure following kidney transplant
Machine learning (ML) is a suite of methods whose theoretical construct may lead to improved predictive performance over conventional statistical modelling (13) (Supplementary File 1). ML is an efficient way of analysing large quantities of data and identifying hidden associations in complex data sets (14, 15). ML has evolved dramatically over recent decades and is already commonly used in medical diagnostics (16, 17). Its use in building predictive models to diagnose different disease conditions continues to expand (18-21).
For patients who receive KRT, the presence of multiple co-existing comorbidities and the complex nature of the immune response make predicting graft outcomes extremely challenging. Several kidney graft outcome models using ML methodology have been published (8, 22-24). However, the comparative performance of these models is unclear, which limits their translation into clinical practice. To address this uncertainty, we conducted a systematic analysis of ML methods applied to predict graft outcomes among kidney transplant recipients. Our results are intended to inform on the utility of ML-based predictive models in clinical decision making.
2 Methods
Search Strategy
A systematic review was undertaken to identify all published studies where ML methods were used to predict graft outcomes among kidney transplant recipients.
Searches accessed the Medline, Cumulative Index to Nursing and Allied Health Literature (CINAHL), EMBASE, PsycINFO and Cochrane databases using pre-specified key words (Supplementary material 2). Reference lists of retrieved articles and review articles in the field were also searched to identify additional publications that met predefined inclusion and exclusion criteria (Table 1).
Table 1: Inclusion and exclusion criteria for review
Inclusion criteria
• Primary development of a clinical prediction model to predict long term graft survival following kidney transplant among Chronic Kidney Disease patients
• Models based on ML algorithms
• Full text article available
• Based in an adult patient population
• Written in English
Exclusion criteria
• Paediatric patients
• Predictive models to predict the graft survival following acute renal failure
• Non-English references and conference abstracts
• Histology and molecular level based predictive models
• Prediction models for acute rejection
• Prediction models on sub-populations (CKD patients with SLE)
• Models not based on ML algorithms
• Did not contain an original analysis (e.g. editorials, reviews)
• Did not provide full details on methods (e.g. letters)
Data extraction
To identify the ML methods used, key details of methodology and results were recorded on a data extraction sheet. Data extraction was conducted by two independent reviewers (SS and SK) and discrepancies were resolved by discussion. Data elements extracted included study name, year of publication, country, study population, feature selection method used, the type of input variables (pre-transplant and/or post-transplant), ML method used, size of the training and validation data sets, validation methods and results, and follow up duration.
Input variables used in the ML models were organised into three categories. Models that used donor and recipient variables available before kidney transplant only were categorised as “Pre-transplant input”. Models that used pre-transplant, peri-transplant (e.g. cold
ischemia time) and post-transplant (e.g. immunosuppressive regimen) input variables were categorised as “Pre and post-transplant input”. Finally, models that used post-transplant variables only were categorised as “Post-transplant input”.
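The three input categories above can be written as a small rule. The sketch below is illustrative only: the category labels are quoted from the text, while the function name, the timing codes and the handling of mixed cases are assumptions made for this example and are not taken from the review's data extraction sheet.

```python
# Category labels are quoted from the review; everything else is hypothetical.
ALLOWED_TIMINGS = {"pre", "peri", "post"}


def categorise_inputs(timings):
    """Map the timing of a model's input variables to one of the three categories."""
    timings = set(timings)
    if not timings or not timings <= ALLOWED_TIMINGS:
        raise ValueError("timings must be a non-empty subset of pre/peri/post")
    if timings == {"pre"}:
        return "Pre-transplant input"
    if timings == {"post"}:
        return "Post-transplant input"
    if "pre" in timings:
        # Assumption: any mix of pre-transplant variables with peri- or
        # post-transplant variables falls into the combined category.
        return "Pre and post-transplant input"
    # Peri-transplant variables without pre-transplant ones are not described
    # in the text, so they are rejected rather than guessed.
    raise ValueError("combination not covered by the three categories")


print(categorise_inputs({"pre", "peri", "post"}))
```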
Quality assessment
The quality of the studies included in the review was assessed using criteria introduced by Qiao in 2019 (25) (Supplementary material 3). The instrument has five categories: unmet need (limits in current non-machine-learning approach), reproducibility (feature engineering methods, platforms/packages, hyperparameters), robustness (valid methods to overcome over-fit, the stability of results), generalizability (external data validation) and clinical significance (predictors explanation and suggested clinical use). A quality assessment table was produced by recording ‘yes’ or ‘no’ for the corresponding items in each category.
3 Results
A total of 295 articles were identified and reviewed, and 18 met the inclusion criteria (8, 22-24, 26-39). The reasons for the exclusion of 277 articles are described in Figure 1 in accordance with the PRISMA reporting guideline (40). Of the 18 studies, 12 (66.7%) were published after 2010 (8, 23, 27, 30-38). Seven studies were conducted in the USA (8, 24, 26, 28, 29, 31, 37), three in Iran (33, 34, 36), two in Italy (23, 32) and one each in the UK (35), Australia (39), Korea (38), Belgium (27), Germany (30) and Egypt (22).
Figure 1: PRISMA flowchart for the selection of articles for the review
Quality assessment of the studies included in the review
The quality of the studies included in the review was generally satisfactory (Table 2). The feature selection method was not mentioned in nine of the papers (23, 26, 28, 31, 32, 34-36, 38). Two thirds (n=12) of the studies mentioned the platforms/packages used, while more than three quarters (n=14) reported the hyperparameters, which are needed for study replication.
None of the studies had validated the algorithm in an external data set.
[Figure 1 flowchart content: records identified from MEDLINE (N=139), CINAHL (N=6), EMBASE (N=137), PsycINFO (N=2) and Cochrane (N=11); initial search (N=295); duplicates (N=59); title read (N=236); abstract read (N=57); full article read (N=29); selected papers (N=18). Exclusion reasons shown: conference proceedings (9), not English (2), mathematical model (1).]
Table 2: Quality assessment of machine learning studies used in the review

| Study | Limits in current non-machine-learning approach | Reproducibility: feature engineering | Reproducibility: platforms, packages | Reproducibility: hyperparameters | Robustness: valid methods for over-fitting | Robustness: stability of results | Generalizability: external data validation | Clinical significance: predictors explanation | Clinical significance: suggested clinical use |
|---|---|---|---|---|---|---|---|---|---|
| Shaikhina et al. (35) | Yes | No | Yes | Yes | Yes | Yes | No | Yes | Yes |
| Decruyenaere et al. (27) | Yes | Yes | Yes | No | Yes | Yes | No | Yes | Yes |
| Brier et al. (26) | Yes | No | No | No | Yes | Yes | No | Yes | Yes |
| Yoo et al. (38) | Yes | No | Yes | Yes | Yes | Yes | No | Yes | Yes |
| Topuz et al. (37) | Yes | Yes | No | Yes | Yes | Yes | No | Yes | Yes |
| Nematollahi et al. (33) | No | Yes | Yes | No | No | No | No | Yes | No |
| Shahmoradi et al. (34) | Yes | No | Yes | Yes | Yes | No | No | Yes | Yes |
| Brown et al. (8) | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes |
| Lasserre et al. (30) | Yes | Yes | No | Yes | Yes | Yes | No | Yes | Yes |
| Lofaro et al. (32) | No | No | Yes | Yes | Yes | Yes | No | Yes | No |
| Greco et al. (23) | No | No | No | No | Yes | No | No | No | No |
| Li et al. (31) | Yes | No | Yes | Yes | Yes | No | No | Yes | Yes |
| Akl et al. (22) | No | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
| Lin et al. (24) | Yes | Yes | Yes | Yes | Yes | Yes | No | No | Yes |
| Krikov et al. (29) | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
| Goldfarb et al. (28) | Yes | No | Yes | Yes | Yes | No | No | Yes | Yes |
| Petrovsky et al. (39) | Yes | Yes | No | Yes | Yes | No | No | No | Yes |
| Tapak et al. (36) | Yes | No | No | Yes | Yes | Yes | No | Yes | Yes |
Evaluation of the machine learning algorithms used in the studies included in the review
Ten studies utilised data from both living and deceased donor transplants in proposed models (23, 24, 29, 31-36, 38, 39). Six studies used data from deceased donor transplant records only (8, 26-28, 30, 37), and one study used living donor transplant information (22) (Table 3).
Four of the seven ML predictive models developed in the USA utilised large data sets, including more than 30,000 kidney transplant recipients (24, 28, 29, 37). Half of the studies identified (n=9; 50.0%) developed models using datasets with fewer than 1,000 patients (23, 26, 27, 30, 32-36), and two studies had only 80 patients (32, 35).
Several studies implemented more than one ML method to predict the same outcome.
Decision trees (n=8) (23, 27-29, 32, 34, 35, 38) were the most commonly used ML method for predicting graft outcomes, followed by artificial neural networks (n=6) (22, 24, 26, 33, 34, 36, 39) and Bayesian belief networks (n=3) (8, 31, 37). Eight studies used pre-transplant variables only (8, 24, 28, 30, 31, 34, 35, 37) and nine used both pre- and post-transplant input variables (22, 23, 26, 27, 29, 33, 36, 38, 39). Only one model exclusively used post-transplant laboratory blood and urine tests collected six months after transplantation (32).
Most studies included a large number of input variables, and some used feature selection methods to identify the most important input variables prior to modelling. Nine of the papers reviewed in the present study (23, 26, 28, 31, 32, 34-36, 38) did not mention a feature selection method, while four had used literature review and expert opinion (8, 24, 33, 37). The number of input variables used in the models ranged from 6 (30) to 71 (24), with 50.0% (n=9) of the models using fewer than 20 variables.
ML methods were applied to predict different graft outcomes at different time points.
Shaikhina et al. (2017) (35) used decision tree and random forest methods to predict acute antibody-mediated rejection at 30 days post kidney transplant; Decruyenaere et al. (2015) (27) and Brier et al. (2003) (26) used ML methods such as decision tree, random forest, linear and radial support vector machines and artificial neural networks to predict delayed graft function within one week of kidney transplant. The majority of models sought to predict graft survival beyond one year post transplant, with two studies predicting outcomes after 10 years post-transplant (29, 31). Seven studies (8, 24, 29, 31, 34, 37, 38) developed predictive models to predict the outcome at different time points (e.g. outcome prediction at one year, three years and five years), but the majority reported outcomes at a single time point.
One study presented a prediction model that used time-to-event (survival) information (38).
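Time-to-event models of this kind are usually summarised with an index of concordance, which is the measure reported for the survival model in Table 3. As a minimal sketch of how such an index can be computed, the function below implements Harrell's C for right-censored data. The function name and the toy data are hypothetical, and the review does not state which variant of the index the original study computed.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by the risk score.

    times       -- follow-up time for each graft
    events      -- 1 if graft failure was observed, 0 if censored
    risk_scores -- model output, higher meaning earlier expected failure
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when graft i is known to have failed
            # strictly before the last time graft j was observed.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: four grafts, the third one censored at year 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.2, 0.4, 0.7]
print(concordance_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the concordance values reported in Table 3.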
A number of approaches were used to validate the ML model outputs across studies. Seven studies (23, 24, 27, 30-32, 37) used cross validation methods, while 10 studies (8, 22, 26, 28, 29, 34-36, 38, 39) used a training and a test data set to derive the validation parameters. Common parameters used included area under the curve (AUC), sensitivity, specificity and accuracy.
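The two validation designs differ in how records are assigned to training and testing. The sketch below illustrates both with plain index splitting. It is a generic illustration, not the procedure of any reviewed study: the function names, the fixed seed and the use of five folds are assumptions.

```python
import random


def holdout_split(n_records, train_fraction, seed=0):
    """Single training/test split, e.g. Tr 70% and Ts 30%."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    cut = round(n_records * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_records, k, seed=0):
    """Yield (train, test) index lists; every record is tested exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, test


train, test = holdout_split(100, 0.7)
print(len(train), len(test))
for fold_train, fold_test in k_fold_splits(10, 5):
    print(sorted(fold_test))
```

With a single split the validation parameters depend on which records happen to land in the test set, whereas cross validation averages over several held-out folds, which matters most for the smaller data sets in Table 3.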
For descriptive purposes, the identified studies were divided into three groups based on the prediction time frame: models predicting graft failure before one year (early graft failure), models predicting graft failure using time-to-event (survival) data, and models predicting graft failure at and after one year (late graft failure). The validation parameters did not differ significantly between the prediction models developed to predict graft failure in the short term and those developed for the longer term (Table 3).
In the six artificial neural network machine learning models, the AUC ranged from 0.67 to 0.88, with 0.67 in the model that was used to predict the delayed graft function within one week post kidney transplant and others being used to predict graft outcome at five years.
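For readers less familiar with the AUC, it can be read as the probability that a randomly chosen failed graft receives a higher predicted risk than a randomly chosen surviving graft. The sketch below computes it directly from that definition. The function name and the toy labels and scores are hypothetical and are not data from any reviewed study.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (failed, surviving) pairs ranked correctly.

    labels -- 1 for graft failure, 0 for graft survival
    scores -- predicted risk of failure, higher meaning more likely to fail
    """
    failed = [s for y, s in zip(labels, scores) if y == 1]
    survived = [s for y, s in zip(labels, scores) if y == 0]
    if not failed or not survived:
        raise ValueError("both outcome classes are needed to compute an AUC")
    wins = 0.0
    for f in failed:
        for s in survived:
            if f > s:
                wins += 1.0
            elif f == s:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(failed) * len(survived))


# Hypothetical toy example with two failures and two surviving grafts.
print(auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

On this reading, an AUC of 0.67 means roughly two in three such pairs are ranked correctly, compared with one in two for a model with no discriminative ability.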
The artificial neural network method developed by Akl et al. (2008) (22) was evaluated using an independent data set, and it predicted graft survival at five years with an accuracy of 95%. Sensitivity was assessed in four (23, 27, 34, 35) of the eight studies that used decision trees, and ranged from 29.5% (27) to 88.2% (23). According to Shaikhina et al. (2017) (35), who developed a model to predict acute antibody-mediated rejection at 30 days post kidney transplant, random forest (AUC – 0.854) outperformed decision tree (AUC – 0.819). Furthermore, artificial neural networks (AUC – 0.865) outperformed support vector machine (AUC – 0.769) in predicting graft survival at five years, based on the study by Nematollahi et al. (2017) (33).
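Sensitivity, specificity, predictive values and accuracy all derive from the same four confusion-matrix counts. The sketch below shows the arithmetic. The counts are hypothetical: they were chosen for a 20-patient test set so that the output agrees with the decision tree row reported for Shaikhina et al. in Table 3, but the review does not report the underlying counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Validation parameters from confusion-matrix counts.

    tp/fn -- graft failures predicted correctly / missed
    tn/fp -- surviving grafts predicted correctly / wrongly flagged
    A zero denominator raises ZeroDivisionError; it is not handled here.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts (assumption, not reported data).
metrics = classification_metrics(tp=9, fp=1, tn=8, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because accuracy pools both classes, a model can reach high accuracy with low specificity when one outcome dominates, which is worth keeping in mind when reading the accuracy figures alongside the specificity values in Table 3.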
Table 3: Studies evaluating machine learning algorithms used for kidney graft outcome prediction

| First author, year of publication, reference & country | Population | Input variable (Pre/Post/Both) | Feature selection method | Number of inputs in the final model | Output | ML method | Training and test set sizes | Validation results |
|---|---|---|---|---|---|---|---|---|
| Predict graft failure before one year (early graft failure) | | | | | | | | |
| Shaikhina et al. (2017) (35), UK | 80 HLA incompatible KT (both living and deceased donor) | Pre | Not mentioned | 14 | Acute antibody-mediated rejection 30 days post KT | Decision tree | Tr - 75%; Ts - 25% | Ac - 85%; Sn - 81.8%; Sp - 88.9%; PPV - 90%; NPV - 80%; AUC - 0.854 |
| | | | | | | Random forest | Tr - 75%; Ts - 25% | Ac - 85%; Sn - 92.3%; Sp - 71.4%; PPV - 85.7%; NPV - 83.3%; AUC - 0.819 |
| Decruyenaere et al. (2015) (27), Belgium | 497 deceased donor KT patients | Pre & post | Recursive feature elimination procedure | 20 | Delayed graft function (dialysis within 1st week after KT) | LDA | Cross validation method | Sn - 27.6%; PPV - 42.3%; AUC - 0.822 |
| | | | | | | QDA | | Sn - 37.6%; PPV - 37.9%; AUC - 0.796 |
| | | | | | | Linear SVM | | Sn - 83.8%; PPV - 30.6%; AUC - 0.843 |
| | | | | | | Decision tree | | Sn - 29.5%; PPV - 14.2%; AUC - 0.525 |
| | | | | | | Random forest | | Sn - 16.4%; PPV - 43.9%; AUC - 0.739 |
| | | | | | | SGB | | Sn - 16.2%; PPV - 58.3%; AUC - 0.772 |
| | | | | | | Radial SVM | | Sn - 88.8%; PPV - 23.6%; AUC - 0.833 |
| | | | | | | Polynomial SVM | | Sn - 10.9%; PPV - 24.0%; AUC - 0.798 |
| Brier et al. (2003) (26), USA | 304 deceased donor KT patients | Pre & post | Not mentioned | 10 | Delayed graft function within one week post KT | ANN | Tr - 65%; Ts - 35% | Sn - 63.5%; Sp - 64.8%; AUC - 0.668 |
| Predict graft failure using time-to-event (survival) data | | | | | | | | |
| Yoo et al. (2017) (38), Korea | 3,117 KT patients (both living and deceased donor) | Pre & post | Not mentioned | 33 | Graft failure | Survival decision tree model | Tr - 80%; Ts - 20% | Index of concordance - 0.80 |
| Predict graft failure at and after one year (late graft failure) | | | | | | | | |
| Topuz et al. (2018) (37), USA | 31,207 deceased donor KT patients | Pre | Literature review; elastic net; SVM, ANN & bootstrap forest combined with sensitivity analysis and information fusion | Not specifically mentioned | 3 output levels: high risk (GF before 3 years); medium risk (GF between 3-7 years); low risk (GF after 7 years) | Bayesian belief network model | Cross validation method | Total Ac - 68%; Ac high risk - 71%; Ac medium risk - 74%; Ac low risk - 59%; Sn - 41%; Sp - 84%; F measure - 0.60; G mean - 0.49 |
| Yoo et al. (2017) (38), Korea | 3,117 KT patients (both living and deceased donor) | Pre & post | Not mentioned | 33 | Graft failure at 10 years | Decision tree | Tr - 80%; Ts - 20% | Index of concordance - 0.71 |
| Nematollahi et al. (2017) (33), Iran | 717 KT patients (both living and deceased donor) | Pre & post | Clinical expertise and current available evidence | 7 | Graft failure at 5 years post KT | SVM | Not mentioned | Ac - 85.9%; Sn - 97.3%; Sp - 26.1%; AUC - 0.769 |
| | | | | | | ANN | | Ac - 90.4%; Sn - 98.2%; Sp - 49.6%; AUC - 0.865 |
| Shahmoradi et al. (2016) (34), Iran | 513 KT patients (donor method not specified) | Pre | Not mentioned | 11 | Graft survival at 1, 2, 3, 4, 5 & 6 years after KT | ANN | Tr - 70%; Ts - 30% | Sn - 87.1%; Sp - 65.0%; Ac - 83.7% |
| | | | | | | C & R Tree model | | Sn - 87.1%; Sp - 57.3%; Ac - 83.3% |
| | | | | | | C5.0 model | | Sn - 90.8%; Sp - 52.0%; Ac - 87.2% |
| Brown et al. (2012) (8), USA | 7,348 deceased donor KT patients | Pre | Clinical expertise and current available evidence | 52 | Graft survival at 1 year & 3 years post KT | Bayesian model | Tr - 70%; Ts - 30% | Prediction at 1 year: AUC - 0.63; Sn - 39.9%; Sp - 79.9%. Prediction at 3 years: AUC - 0.63; Sn - 39.8%; Sp - 80.2% |
| Lasserre et al. (2012) (30), Germany | 707 deceased donor KT patients | Pre | Recursive feature elimination | 6 | eGFR at 1 year post KT | SVM with a Gaussian kernel (G-SVM) | Cross validation method | Pearson correlation coefficient 0.48 |
| Lofaro et al. (2010) (32), Italy | 80 KT patients (both living and deceased donor) | Post | Not mentioned | 23 | Chronic allograft nephropathy at 5 years | Decision tree | Cross validation method | Sn - 62.5%; Sp - 92.8%; AUC - 0.847 |
| Greco et al. (2010) (23), Italy | 194 KT patients (both living and deceased donor) | Pre & post | Not mentioned | 9 | Graft failure at 5 years post KT | Decision tree | Cross validation method | Sn - 88.2%; Sp - 73.8% |
| Li et al. (2010) (31), USA | 1,228 KT patients (both living and deceased donor) | Pre | Not mentioned | 70 | Graft survival up to 1 year | Bayesian model | Cross validation method | Sn - 85.8%; Sp - 95.7%; Pr - 89.3%; F measure - 0.875; AUC - 0.967 |
| | | | | | Graft survival >1 - 5 years | | | Sn - 63.8%; Sp - 88%; Pr - 63.8%; F measure - 0.629; AUC - 0.866 |
| | | | | | Graft survival >5 - 10 years | | | Sn - 54.2%; Sp - 86%; Pr - 54.2%; F measure - 0.542; AUC - 0.824 |
| | | | | | Graft survival >10 years | | | Sn - 64.6%; Sp - 89%; Pr - 63.3%; F measure - 0.639; AUC - 0.856 |
| Akl et al. (2008) (22), Egypt | 1,900 live donor KT | Pre & post | Factors significantly associated with graft survival in the univariate analysis | 11 | Graft survival at 5 years | ANN | Tr - 83%; Ts - 17% | Sn - 88.4%; Sp - 73.2%; PPV - 82.1%; NPV - 82.0%; Ac - 95%; AUC - 0.88 |
| Lin et al. (2008) (24), USA | 57,383 KT patients (both living and deceased donor) | Pre | Clinical expertise and current available evidence | 71 | Graft survival at 1 year, 3 years, 5 years & 7 years | Single output ANN | Cross validation method | AUC - 1 year 0.73; 3 year 0.75; 5 year 0.77; 7 year 0.82; non-monotonic predictions 2.34% |
| | | | | | | Multiple output ANN | Cross validation method | AUC - 1 year 0.61; 3 year 0.68; 5 year 0.73; 7 year 0.82; non-monotonic predictions 5.46% |
| Krikov et al. (2007) (29), USA | 92,844 KT patients (both living and deceased donor) | Pre & post | Significant predictors of survival analysis and multiple logistic regression. Additional variables added considering the clinical relevance | 29 | Graft survival at 1, 3, 5, 7 and 10 years after KT | Tree based model | Tr - 66%; Ts - 34% | AUC >1 year - 0.626; AUC >2 year - 0.640; AUC >5 year - 0.717; AUC >7 year - 0.830; AUC >10 year - 0.901 |
| Goldfarb et al. (2003) (28), USA | 37,407 deceased donor KT patients | Pre | Not mentioned | 17 | Graft survival at 3 years | Tree based model | Tr - 66%; Ts - 34% | Correlation between the prediction probability and the observed survival (r) - 0.984; PPV - 76%; NPV - 53.8% |
| Duration not specified | | | | | | | | |
| Petrovsky et al. (2002) (39), Australia & New Zealand | 1,542 KT patients (both living and deceased donor) | Pre & post | Principal Component Analysis (PCA) | 22 | GF. A time duration has not been specified | ANN | Tr - 70%; Ts - 30% | Ac - 71.7% |
| Tapak et al. (2017) (36), Iran | 378 KT patients (both living and deceased donor) | Pre & post | Not mentioned | 19 | GF. A time duration has not been specified | ANN | Tr - 70%; Ts - 30% | Sn - 91%; Sp - 74%; PPV - 27%; NPV - 98%; Ac - 75%; AUC - 0.88; Kendall tau-b 0.41 (0.002); Kappa 0.17 (<0.001) |

KT – Kidney Transplant; SVM – Support Vector Machines; ANN – Artificial neural networks; GF – Graft failure; Ac – Accuracy; Sn – Sensitivity; Sp – Specificity; Pr – Precision; Tr – Training; Ts – Testing; PPV – Positive predictive value; NPV – Negative predictive value; AUC – Area under curve; LDA – Linear Discriminant Analysis; QDA – Quadratic Discriminant Analysis; SGB – Stochastic Gradient Boosting; PCA – Principal Component Analysis; FFS – Forward Feature Selection; RFE – Recursive feature elimination
ML Compared with Other Predictive Methods
Seven studies compared the performance of ML models with conventional methods (22, 24, 26-28, 33, 36) (Table 4). Six studies compared ML models with logistic regression modelling (24, 26-28, 33, 36), and four of these used artificial neural networks (24, 26, 33, 36). The performance of all four artificial neural network models was reported as superior to that of the corresponding logistic regression models. Prediction accuracy was 20 and 5.7 percentage points higher for artificial neural networks than for logistic regression in the studies of Tapak et al. (2017) (36) (accuracy: artificial neural network 75% vs logistic 55%) and Nematollahi et al. (2017) (33) (accuracy: artificial neural network 90.4% vs logistic 84.7%), respectively. Furthermore, according to Lin et al. (2008) (24) (AUC: artificial neural network 0.77 vs logistic 0.71) and Brier et al. (2003) (26) (AUC: artificial neural network 0.668 vs logistic 0.636), the AUC was about 5% higher in the artificial neural network models.
Nematollahi et al. (2017) (33) (AUC: support vector machine 0.769 vs logistic 0.774) and Decruyenaere et al. (2015) (27) (AUC: support vector machine 0.843 vs logistic 0.817) employed SVM. Decruyenaere et al. reported a slight improvement in AUC over logistic regression, whereas in Nematollahi et al. the AUC values were almost identical and the improvement was in accuracy (85.9% vs 84.7%; Table 4). However, Goldfarb et al. (2003) (28) (correlation (r): decision tree 0.984 vs logistic 0.998) and Decruyenaere et al. (2015) (27) (AUC: decision tree 0.525 vs logistic 0.817), using the decision tree ML approach, found that logistic regression gave superior results.
Prediction models developed using Cox regression were compared to ML models developed using artificial neural network by Akl et al. (2008)(22) and Lin et al. (2008)(24). According to Akl et al. (2008)(22), the prediction accuracy (artificial neural network 95% vs Cox 90%) and AUC (artificial neural network 0.88 vs Cox 0.72) of artificial neural network were 5% and 0.16 higher compared to the Cox regression model.
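The comparisons above mix absolute differences (percentage points or AUC units) with relative differences, and the two are easy to confuse. The short sketch below recomputes three of the quoted comparisons from the figures given in the text. The function and variable names are illustrative assumptions.

```python
def absolute_difference(ml_value, regression_value):
    """Difference in the metric's own units (percentage points or AUC units)."""
    return ml_value - regression_value


def relative_difference(ml_value, regression_value):
    """Difference expressed as a fraction of the regression model's value."""
    return (ml_value - regression_value) / regression_value


# Figures quoted in the text for Tapak et al., Akl et al. and Brier et al.
accuracy_gain_tapak = absolute_difference(75.0, 55.0)    # percentage points
auc_gain_akl = absolute_difference(0.88, 0.72)           # AUC units
auc_relative_brier = relative_difference(0.668, 0.636)   # fraction of logistic AUC

print(f"Tapak accuracy gain: {accuracy_gain_tapak:.1f} percentage points")
print(f"Akl AUC gain: {auc_gain_akl:.2f}")
print(f"Brier relative AUC gain: {auc_relative_brier:.1%}")
```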
Table 4: Studies comparing machine learning methods with other predictive methods for kidney graft outcome prediction

| Study | Prediction duration | ML method: ANN | ML method: Decision tree | ML method: Random forest | ML method: SVM | Regression method: Logistic | Regression method: Cox |
|---|---|---|---|---|---|---|---|
| Tapak et al. (2017) (36) | Not specified | Sn - 91%; Sp - 74%; Ac - 75%; AUC - 0.88 | | | | Sn - 91%; Sp - 51%; Ac - 55%; AUC - 0.75 | |
| Nematollahi et al. (2017) (33) | 5 years | Ac - 90.4%; Sn - 98.2%; Sp - 49.6%; AUC - 0.865 | | | Ac - 85.9%; Sn - 97.3%; Sp - 26.1%; AUC - 0.769 | Ac - 84.7%; Sn - 97.5%; Sp - 17.4%; AUC - 0.774 | |
| Decruyenaere et al. (2015) (27) | Delayed graft function within 1st week of KT | | Sn - 29.5%; PPV - 14.2%; AUC - 0.525 | Sn - 16.4%; PPV - 43.9%; AUC - 0.739 | Sn - 83.8%; PPV - 30.6%; AUC - 0.843 | Sn - 85.5%; PPV - 26.5%; AUC - 0.817 | |
| Akl et al. (2008) (22) | 5 years | Sn - 88.4%; Sp - 73.2%; PPV - 82.1%; Ac - 95%; AUC - 0.88 | | | | | Sn - 61.8%; Sp - 74.9%; PPV - 43.5%; Ac - 90%; AUC - 0.72 |
| Lin et al. (2008) (24) | 1 year | AUC - 0.73 | | | | AUC - 0.71 | AUC - 0.65 |
| | 3 year | AUC - 0.75 | | | | AUC - 0.72 | AUC - 0.67 |
| | 5 year | AUC - 0.77 | | | | AUC - 0.75 | AUC - 0.71 |
| | 7 year | AUC - 0.82 | | | | AUC - 0.81 | AUC - 0.75 |
| Brier et al. (2003) (26) | Delayed graft function within 1st week of KT | Sn - 63.5%; Sp - 64.8%; AUC - 0.668 | | | | Sn - 36.5%; Sp - 90.7%; AUC - 0.636 | |
| Goldfarb et al. (2003) (28) | | | Correlation between the prediction probability and the observed survival (r) - 0.984; PPV - 76%; NPV - 53.8% | | | Correlation between the prediction probability and the observed survival (r) - 0.998; PPV - 76%; NPV - 63.0% | |

SVM – Support Vector Machines; ANN – Artificial neural networks; Ac – Accuracy; Sn – Sensitivity; Sp – Specificity; Pr – Precision; PPV – Positive predictive value; NPV – Negative predictive value; AUC – Area under curve
4 Discussion
This is the first systematic review of current ML methods that have been developed to predict clinical outcomes following kidney transplant. Results showed heterogeneity in the types of ML methods used, including artificial neural networks, decision trees and Bayesian belief networks. Variation in the size of the study populations and the input variables used was also observed. Furthermore, the prediction accuracy of ML models relative to traditional predictive methods (based on logistic and Cox regression) was inconsistent across studies.
ML techniques are being increasingly used in clinical and preclinical medical research. ML has been effectively used to predict survival of grafts after transplantation surgery, varying from stem cell transplants to heart transplants (41). Techniques such as artificial neural networks have been applied to the study of cancer development and progression, and, according to Cruz and Wishart (2007), machine learning substantially improves the predictive accuracy (10% - 25% absolute improvement) of biologically meaningful outcomes like cancer risk, recurrence and mortality (42).
ML techniques used in kidney transplants
It was interesting to note that models developed using a small number of cases reported similar prediction accuracy to models developed using larger numbers of cases.
For example, the prediction model developed by Lofaro et al. (2010) (32) using 80 records performed similarly to the model developed by Krikov et al. (2007) (29), which was built on a national database of 92,844 patient records. Little is known about the minimum sample size needed to develop a predictive model using ML methods, but it is closely linked to the complexity of the prediction and the complexity of the ML method. The evidence points to larger sample sizes resulting in better prediction accuracy (43). The number of events per variable is a third factor in the performance of the model. Larger numbers of events per variable are associated with better model stability and higher predictive accuracy (43). Present day patient registries, with large volumes of data, ought to yield high fidelity ML predictions.
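Events per variable (EPV) is simply the number of outcome events divided by the number of candidate input variables. The sketch below shows the calculation. The patient and variable counts are taken from Table 3 where available, but the event counts are hypothetical, since the review does not report how many graft failures occurred in each study.

```python
def events_per_variable(n_events, n_variables):
    """Number of outcome events available per candidate input variable."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_events / n_variables


# Small cohort: 80 patients and 23 inputs (Table 3); 20 failures is an assumption.
small_cohort_epv = events_per_variable(n_events=20, n_variables=23)

# Registry-scale cohort: 29 inputs (Table 3); 20,000 failures is an assumption.
registry_epv = events_per_variable(n_events=20_000, n_variables=29)

print(f"small cohort EPV: {small_cohort_epv:.2f}")
print(f"registry EPV: {registry_epv:.0f}")
```

Under these assumed event counts the small cohort has fewer than one event per variable, which illustrates why similar headline accuracy from small and large data sets should be interpreted with caution.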
Eight studies used pre-transplant donor and recipient variables (8, 24, 28, 30, 31, 34, 35, 37) to predict graft outcomes. The models in these studies have immediate translation potential as decision-making tools in kidney organ allocation. Brown et al. (2012) proposed using a model based on a Bayesian belief network in kidney allocation to determine which donor-recipient match would yield the longest graft survival. They hypothesise that such a system would have the potential to prevent more than 40% of graft failures within the first year (8). Despite this, ML approaches have not been widely adopted for organ allocation, possibly due to the non-acceptance of such methods by the organ allocation bodies (39). The greatest strength of an organ allocation system based on ML techniques is that it circumvents human bias during the allocation process. Human bias may nonetheless be desirable, e.g. the weighting of specific patient groups to deliver parity of access. Even so, bias, for whatever reason, ought not to disqualify exploration of improved systems of organ allocation, particularly when they deliver a more accurate assessment of the likely graft outcome (39, 44).
The accuracy of a predictive model largely depends on the incorporation of prognostically significant variables in the model (45). However, in a practical sense it is important that the number of input variables is manageable. Thus, the selection of variables that account for most of the variation in the outcome is an important preliminary step. Furthermore, over-fitting leads to poor model performance and is associated with having too many irrelevant parameters in the model. Over-fitting occurs when a machine learning model adapts to the details of the training data set to the extent that its performance on new data suffers (46). Feature selection methods, which identify the most important variables to include, are commonly used in ML models to improve performance. Principal Component Analysis (PCA) is considered one of the most popular methods for dimensionality reduction (47). However, PCA was used in only one study identified in this review (39).
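To make the idea of dimensionality reduction concrete, the sketch below applies PCA to just two input variables, where the eigenvalues of the covariance matrix have a closed form. It is a minimal illustration under stated assumptions: the function name and the toy values are hypothetical, and a real analysis would use many variables and an established numerical library.

```python
import math


def pca_2d_explained_variance(xs, ys):
    """Share of total variance carried by each principal component
    of two variables, via the 2x2 sample covariance matrix."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

    trace = sxx + syy
    if trace == 0:
        raise ValueError("both variables are constant")
    det = sxx * syy - sxy ** 2
    # Eigenvalues of a symmetric 2x2 matrix: trace/2 +/- sqrt(trace^2/4 - det).
    root = math.sqrt(max(trace ** 2 / 4 - det, 0.0))
    first = trace / 2 + root
    second = trace / 2 - root
    return first / trace, second / trace


# Toy data (hypothetical): the second variable is exactly twice the first,
# so a single component should carry all of the variance.
ratio_first, ratio_second = pca_2d_explained_variance(
    [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
)
print(ratio_first, ratio_second)
```

When one component carries nearly all of the variance, as in this toy case, the two original variables can be replaced by a single derived variable with little loss of information, which is how PCA keeps the number of model inputs manageable.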
The studies in the review used different ML methods (artificial neural networks, decision trees, support vector machine, Bayesian belief networks) to develop the predictive models.
In the review, five studies used more than one method to develop models (27, 33-35, 38). The best ML method to develop a predictive model is a widely discussed topic (48). The current consensus is that there is no one method that fits all data sets, with the complexity of the data pivotal (49). To get around this uncertainty, investigators apply multiple machine learning methods to a single data set and choose the best based on validation parameters. Abbas et al. (2018) used six machine learning methods to classify foetal distress and hypoxia (50), and Aljaaf et al. (2018) used four machine learning methods for early prediction of chronic kidney disease (51), highlighting the importance of using multiple ML methods on a single data set.
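The practice of fitting several methods to one data set and keeping the one that validates best can be sketched as follows. This is a minimal illustration, not the procedure of any reviewed study: the two toy classifiers, the function names and the data are all hypothetical stand-ins for real ML methods.

```python
def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def fit_majority(X, y):
    """Baseline: always predict the most common training label."""
    majority = 1 if sum(y) * 2 >= len(y) else 0
    return lambda rows: [majority for _ in rows]


def fit_threshold(X, y):
    """One-variable rule: choose the cut-off on feature 0 that gives
    the best training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted({row[0] for row in X}):
        preds = [1 if row[0] >= cut else 0 for row in X]
        acc = accuracy(y, preds)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return lambda rows: [1 if row[0] >= best_cut else 0 for row in rows]


def select_best(candidates, X_train, y_train, X_val, y_val):
    """Fit every candidate on the training split and rank the fitted
    models by accuracy on the held-out validation split."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(X_train, y_train)
        scores[name] = accuracy(y_val, model(X_val))
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical data: one input variable, binary outcome.
X_train = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y_train = [0, 0, 0, 1, 1, 1]
X_val = [[1.5], [2.5], [6.5], [7.5]]
y_val = [0, 0, 1, 1]

best_name, val_scores = select_best(
    {"majority": fit_majority, "threshold": fit_threshold},
    X_train, y_train, X_val, y_val,
)
print(best_name, val_scores)
```

The key design point is that candidates are compared on records that were not used for fitting; comparing on the training records would favour whichever method memorises them best.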
Validation parameters have not yet been standardised in the literature. The studies included in the review used different validation parameters, such as sensitivity, area under the curve and accuracy. Standardisation of validation parameters would facilitate comparisons between different models.
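For reference, the validation parameters named above can all be computed from the same set of predictions, as the following sketch shows. The function names, the 0.5 classification threshold and the toy labels and scores are assumptions for illustration only. The area under the curve is computed here as the proportion of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def overall_accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = graft failure) and predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("accuracy:", overall_accuracy(tp, tn, fp, fn))
print("AUC:", auc(y_true, scores))
```

Sensitivity, specificity and accuracy depend on the chosen threshold, whereas the area under the curve does not, which is one reason why models reported with different parameters are hard to compare directly.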
ML Compared with Other Predictive Methods
Conventional models such as Cox models and logistic regression assume that the predictors are independent of each other (52, 53). They are less suited to handling complex interactions among predictors, and are not often used to model non-linear relationships among predictors and outcomes (24, 54). The survival of the graft after a kidney transplant depends on many factors, and evidence indicates that predictions of the outcome from conventional statistical modelling are imprecise (55).
However, our review found inconsistencies in the prediction accuracy of ML models compared with traditional predictive methods (based on logistic and Cox regression). This is consistent with the currently available literature. Several studies that compared ML and traditional statistical methods have concluded that the results are mixed (56-58). However, Oermann et al. (2015) found that ML was superior to conventional prognostic scoring systems in predicting the outcome after stereotactic radiosurgery for cerebral arteriovenous malformations (59). The superiority of ML, compared with expert opinion, in predicting outcomes has also been demonstrated in the medical literature (60). Similarly, Senders et al. (2017) reported that ML methods outperformed logistic regression in predicting the outcome of neurosurgery in a systematic review of seven studies.
Limitations
This review has limitations. Although broad search strategies were used to find relevant articles, we may not have identified all the machine learning predictive models reported in this specific area. Studies and outcome measures demonstrating high-performing ML models might be published or reported more often; thus, publication and/or outcome reporting bias could be present in this review.
Conclusion
This first systematic review of ML methods used in the field of kidney transplantation found that the main ML methods used were artificial neural networks, decision trees and Bayesian models. There is wide variation in the size of the study population and the input variables used in these ML models. Only one ML-based prediction model modelled time-to-event (survival) information; the others instead used the binary outcome of failure or not. Adding information on the timing of the event could lead to improved model predictions. Comparisons of prediction accuracy between ML and traditional predictive methods (based on logistic and Cox regression) gave mixed results. However, ML has the potential to improve the prediction of kidney transplant outcomes. It is a novel tool for clinicians making decisions about the scant community resource of transplant organs. The barriers to the practical implementation of ML methods in clinical settings, including the ethical and societal implications of adoption briefly alluded to, warrant further exploration.
Future research should focus on modelling time-to-event information using machine learning methods such as survival trees (61), random survival forests (62) and survival support vector machines (63).
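To illustrate what time-to-event information adds over a binary outcome, the sketch below implements the Kaplan-Meier estimator. Note that this is a simpler, classical technique swapped in for illustration; it is not one of the ML methods named above. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each graft
    events : 1 if the graft failed at that time, 0 if follow-up
             ended without failure (censored)
    Returns [(time, survival_probability)] at each failure time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all records that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        # Failed and censored grafts both leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up in years for six grafts.
times = [1, 2, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"year {t}: estimated graft survival {s:.3f}")

# A binary summary ignores when failures occurred and who was censored.
binary_failure_rate = sum(events) / len(events)
print("binary failure proportion:", binary_failure_rate)
```

In this toy example the binary summary reports that half of the grafts failed, while the time-to-event estimate accounts for the graft censored at year 2 and gives a different survival figure at year 3. Survival-oriented ML methods are designed to exploit the same censoring and timing information.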
Authors’ Contributions
SS, NW & SK; Research idea, study design, analysis and interpretation. SS; Drafting of the manuscript. NW, SK, NG, HH, KB; Data analysis, interpretation, supervision and mentorship.
Acknowledgement
Sameera Senanayake is a recipient of an Australian Government Research Training Program (RTP) for Postgraduate Research (PhD) Scholarship and a Queensland University of Technology International Postgraduate Research (PhD) Scholarship (2018-2021).
Conflict of Interest Statement: The paper has not been published previously in whole or part. The authors of this manuscript have no conflicts of interest to disclose.
Support: This study received no specific funding.
Financial Disclosure: The authors declare that they have no relevant financial interests.
Summary Points
What was already known on the topic?
• Along with the increasing prevalence of Chronic Kidney Disease, the prevalence of patients in the end stage of renal disease, and the demand for renal replacement therapy have increased over the years
• The ability to predict graft failure before kidney transplant becomes crucial to facilitate donations to the most suitable recipient, and to minimise the flow of patients returning to the already-burdened transplant waiting list
• Several kidney graft outcome prediction models developed using machine learning are available in the literature
What this study added to our knowledge?
• We did a systematic review of the different machine learning methods used to predict graft outcomes among kidney transplant patients, and assessed their predictive performance compared with traditional statistical methods.
• Most of the machine learning based predictive models predicted graft failure with high sensitivity and specificity.
• However, the prediction accuracy provided mixed results when machine learning and traditional regression based predictive methods are compared.
• Based on reported gains in predictive performance, machine learning has the potential to improve kidney transplant outcome prediction and aid medical decision making.
References
1. Wang T, Xi Y, Lubwama RN, Koro C. Chronic Kidney Disease (CKD) in US Adults with Self-Reported Cardiovascular Disease (CVD)—A National Estimate of Prevalence by KDIGO 2012 Classification. Am Diabetes Assoc; 2018.
2. Valley TS, Nallamothu BK, Heung M, Iwashyna TJ, Cooke CR. Hospital Variation in Renal Replacement Therapy for Sepsis in the United States. Critical care medicine. 2018;46(2):e158-e65.
3. Karam VH, Gasquet I, Delvart V, Hiesse C, Dorent R, Danet C, et al. Quality of life in adult survivors beyond 10 years after liver, kidney, and heart transplantation. Transplantation. 2003;76(12):1699-704.
4. Cecka J, Gritsch H. Why are nearly half of expanded criteria donor (ECD) kidneys not transplanted? American Journal of Transplantation. 2008;8(4):735-6.
5. Masakane I, Taniguchi M, Nakai S, Tsuchida K, Goto S, Wada A, et al. Annual Dialysis Data Report 2015, JSDT Renal Data Registry. Renal Replacement Therapy. 2018;4(1):19.
6. Foley RN, Collins AJ. The USRDS: what you need to know about what it can and can’t tell us about ESRD. Clinical Journal of the American Society of Nephrology. 2013;8(5):845-51.
7. Luxardo R, Kramer A, González-Bedat MC, Massy ZA, Jager KJ, Rosa-Diez G, et al. The epidemiology of renal replacement therapy in two different parts of the world: the Latin American Dialysis and Transplant Registry versus the European Renal Association-European Dialysis and Transplant Association Registry. Revista Panamericana de Salud Pública. 2018;42:e87.
8. Brown TS, Elster EA, Stevens K, Graybill JC, Gillern S, Phinney S, et al. Bayesian modeling of pretransplant variables accurately predicts kidney graft survival. Am J Nephrol. 2012;36(6):561-9.
9. Moore J, He X, Shabir S, Hanvesakul R, Benavente D, Cockwell P, et al. Development and evaluation of a composite risk score to predict kidney transplant failure. American Journal of Kidney Diseases. 2011;57(5):744-51.
10. Foucher Y, Daguin P, Akl A, Kessler M, Ladrière M, Legendre C, et al. A clinical scoring system highly predictive of long-term kidney graft survival. Kidney International. 2010;78(12):1288-94.
11. Tiong H, Goldfarb D, Kattan M, Alster J, Thuita L, Yu C, et al. Nomograms for predicting graft function and survival in living donor kidney transplantation based on the UNOS Registry. The Journal of urology. 2009;181(3):1248-55.
12. Rao PS, Schaubel DE, Guidinger MK, Andreoni KA, Wolfe RA, Merion RM, et al. A comprehensive risk quantification score for deceased donor kidneys: the kidney donor risk index. Transplantation. 2009;88(2):231-6.
13. Kaplan B, Schold J. Transplantation: neural networks for predicting graft survival. Nature Reviews Nephrology. 2009;5(4):190.
14. Ghahramani Z. Probabilistic machine learning and artificial intelligence. Nature. 2015;521(7553):452.
15. Obermeyer Z, Emanuel EJ. Predicting the future—big data, machine learning, and clinical medicine. The New England journal of medicine. 2016;375(13):1216.
16. Patel VL, Shortliffe EH, Stefanelli M, Szolovits P, Berthold MR, Bellazzi R, et al. The coming of age of artificial intelligence in medicine. Artificial intelligence in medicine. 2009;46(1):5-17.
17. Shortliffe EH. The adolescence of AI in medicine: will the field come of age in the '90s? Artificial intelligence in medicine. 1993;5(2):93-106.
18. Lee HC, Yoon HK, Nam K, Cho YJ, Kim TK, Kim WH, et al. Derivation and Validation of Machine Learning Approaches to Predict Acute Kidney Injury after Cardiac Surgery. Journal of clinical medicine. 2018;7(10).
19. Ryu S, Lee H, Lee DK, Park K. Use of a Machine Learning Algorithm to Predict Individuals with Suicide Ideation in the General Population. Psychiatry investigation. 2018:0.
20. Tolmeijer E, Kumari V, Peters E, Williams SCR, Mason L. Using fMRI and machine learning to predict symptom improvement following cognitive behavioural therapy for psychosis. NeuroImage Clinical. 2018;20:1053-61.
21. Xie Y, Jiang B, Gong E, Li Y, Zhu G, Michel P, et al. Use of Gradient Boosting Machine Learning to Predict Patient Outcome in Acute Ischemic Stroke on the Basis of Imaging, Demographic, and Clinical Information. AJR American journal of roentgenology. 2018:1-7.
22. Akl A, Ismail AM, Ghoneim M. Prediction of graft survival of living-donor kidney transplantation: nomograms or artificial neural networks? Transplantation. 2008;86(10):1401-6.
23. Greco R, Papalia T, Lofaro D, Maestripieri S, Mancuso D, Bonofiglio R. Decisional Trees in Renal Transplant Follow-up. Transplantation Proceedings. 2010;42(4):1134-6.
24. Lin RS, Horn SD, Hurdle JF, Goldfarb-Rumyantzev AS. Single and multiple time-point prediction models in kidney transplant outcomes. J Biomed Inform. 2008;41(6):944-52.
25. Qiao N. A systematic review on machine learning in sellar region diseases: quality and reporting items. Endocrine connections. 2019.
26. Brier ME, Ray PC, Klein JB. Prediction of delayed renal allograft function using an artificial neural network. Nephrology Dialysis Transplantation. 2003;18(12):2655-9.
27. Decruyenaere A, Decruyenaere P, Peeters P, Vermassen F, Dhaene T, Couckuyt I. Prediction of delayed graft function after kidney transplantation: comparison between logistic regression and machine learning methods. BMC medical informatics and decision making. 2015;15(1):83.
28. Goldfarb-Rumyantzev AS, Scandling JD, Pappas L, Smout RJ, Horn S. Prediction of 3-yr cadaveric graft survival based on pre-transplant variables in a large national dataset. Clinical Transplantation. 2003;17(6):485-97.
29. Krikov S, Khan A, Baird BC, Barenbaum LL, Leviatov A, Koford JK, et al. Predicting kidney transplant survival using tree-based modeling. Asaio Journal. 2007;53(5):592-600.
30. Lasserre J, Arnold S, Vingron M, Reinke P, Hinrichs C. Predicting the outcome of renal transplantation. Journal of the American Medical Informatics Association. 2012;19(2):255-62.
31. Li J, Serpen G, Selman S, Franchetti M, Riesen M, Schneider C. Bayes net classifiers for prediction of renal graft status and survival period. World Academy of Science, Engineering and Technology. 2010;39.
32. Lofaro D, Maestripieri S, Greco R, Papalia T, Mancuso D, Conforti D, et al. Prediction of Chronic Allograft Nephropathy Using Classification Trees. Transplantation Proceedings. 2010;42(4):1130-3.
33. Nematollahi M, Akbari R, Nikeghbalian S, Salehnasab C. Classification models to predict survival of kidney transplant recipients using two intelligent techniques of data mining and logistic regression. International Journal of Organ Transplantation Medicine. 2017;8(2):119-22.
34. Shahmoradi L, Langarizadeh M, Pourmand G, Fard ZA, Borhani A. Comparing Three Data Mining Methods to Predict Kidney Transplant Survival. Acta Informatica Medica. 2016;24(5):322-7.
35. Shaikhina T, Lowe D, Daga S, Briggs D, Higgins R, Khovanova N. Decision tree and random forest models for outcome prediction in antibody incompatible kidney transplantation. Biomedical Signal Processing and Control. 2017.
36. Tapak L, Hamidi O, Amini P, Poorolajal J. Prediction of Kidney Graft Rejection Using Artificial Neural Network. Healthcare Informatics Research. 2017;23(4):277-84.
37. Topuz K, Zengul FD, Dag A, Almehmi A, Yildirim MB. Predicting graft survival among kidney transplant recipients: A Bayesian decision support model. Decision Support Systems. 2018;106:97-109.
38. Yoo KD, Noh J, Lee H, Kim DK, Lim CS, Kim YH, et al. A Machine Learning Approach Using Survival Statistics to Predict Graft Survival in Kidney Transplant Recipients: A Multicenter Cohort Study. Sci. 2017;7(1):8904-.
39. Petrovsky N, Tam SK, Brusic V, Russ G, Socha L, Bajic VB. Use of artificial neural networks in improving renal transplantation outcomes. Graft. 2002;5(1):6.
40. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Annals of internal medicine. 2009;151(4):264-9.
41. Sousa FS, Hummel AD, Maciel RF, Cohrs FM, Falcão AEJ, Teixeira F, et al., editors. Application of the intelligent techniques in transplantation databases: a review of articles published in 2009 and 2010. Transplantation proceedings; 2011: Elsevier.
42. Cruz JA, Wishart DS. Applications of machine learning in cancer prediction and prognosis. Cancer informatics. 2006;2:117693510600200030.
43. van der Ploeg T, Austin PC, Steyerberg EW. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Med Res Methodol. 2014;14(1):137.
44. Schold JD, Segev DL. Increasing the pool of deceased donor organs for kidney transplantation. Nature Reviews Nephrology. 2012;8(6):325.
45. Kattan MW. When and how to use informatics tools in caring for urologic patients. Nature Reviews Urology. 2005;2(4):183.
46. Cawley GC, Talbot NL. On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research. 2010;11(Jul):2079-107.
47. Khalaf M, Hussain AJ, Al-Jumeily D, Baker T, Keight R, Lisboa P, et al., editors. A Data Science Methodology Based on Machine Learning Algorithms for Flood Severity Prediction. 2018 IEEE Congress on Evolutionary Computation (CEC); 2018: IEEE.
48. Yousef AH. Extracting software static defect models using data mining. Ain Shams Engineering Journal. 2015;6(1):133-44.
49. Lorena AC, Garcia LP, Lehmann J, Souto MC, Ho TK. How Complex is your classification problem? A survey on measuring classification complexity. arXiv preprint arXiv:1808.03591. 2018.