testing the model’s performance on the test dataset

4. Model deployment: A final model can be operationalised on an appropriate computer system, enabling live or batch predictions.

Some important additional ML concepts are:

§ Feature Engineering is the process of extracting or selecting features (variables) from a dataset in order to enhance the dataset and improve the creation of accurate predictive models. For example, patient admission data could be enhanced by creating a feature (variable) that is the day of admission (Mon–Sun), and blood result data could be enhanced by calculating the absolute difference between consecutive blood results.

§ Generalisation usually refers to a ML model’s ability to perform well on new unseen data. It is related to the concept of overfitting.

§ Hyper-Parameter Tuning – Grid search: ML algorithms have a variety of inputs (hyper-parameters) in addition to the training data. These inputs instruct the algorithm how to model the data, and determine ‘higher level’ properties of the model such as its complexity and how fast it should ‘learn’. Grid search, sometimes referred to as a parameter sweep, is the traditional method of hyper-parameter optimisation: multiple models are built from the same dataset by iterating through different combinations of an algorithm’s hyper-parameter values. The model with the best performance on a specified performance metric (e.g. logloss or AUROC), assessed either via cross-validation or on a validation dataset (not on the test dataset), is the selected machine-learning model; the performance of this model can then be tested on the test dataset.

§ N-fold Cross-Validation: To optimise a model and enhance its predictive accuracy, it needs to be iteratively tested on an independent dataset, modified and then re-tested. However, because the independent test dataset is reserved for final performance testing, an additional dataset, distinct from the data used for training, is required. Rather than just splitting the train dataset into two parts, n-fold cross-validation offers an alternative. For n-fold cross-validation where n = 5, the train dataset is split into five parts, four of which are used to train the model, while the fifth part is used to test it. This process is repeated a minimum of five times, so that each part is used once for testing, with the model iteratively being improved, especially for generalisation. The mean performance metrics (AUROC, logloss) across all the models provide an indication of the final model’s expected performance on an independent dataset.

§ Overfitting occurs when an algorithm models random error or noise, rather than an actual relationship within the data; thus, when the resultant model is applied to an independent dataset, it performs poorly.

§ Tree depth, and leaf: the depth of a node in a tree (in a ML algorithm) is the number of edges from that node to the tree’s root node. Because leaves can sit at different depths, the depth of a tree is usually communicated as a maximum, minimum, or mean, e.g. the minimum tree depth is the smallest number of edges from any leaf node to the root node. A leaf is a vertex of degree one in a tree (decision, boosted, etc.).

§ Train and Test Datasets: The pre-processed dataset is split into two, ideally using randomisation, prior to model training, ensuring that both the train and test datasets have similar outcomes and underlying characteristics. There is no universally agreed percentage split, but a 70:30 split for train:test is common.
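The two feature engineering examples given above (day of admission, and the absolute difference between consecutive blood results) can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the sample values are hypothetical and are not taken from the study data.

```python
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_of_admission(admitted):
    """Derive a day-of-week feature (Mon-Sun) from an admission date."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return WEEKDAYS[admitted.weekday()]


def consecutive_abs_diffs(results):
    """Absolute difference between each blood result and the one before it."""
    return [abs(later - earlier) for earlier, later in zip(results, results[1:])]


# Hypothetical values, for illustration only.
admission_day = day_of_admission(date(2024, 1, 1))
creatinine_changes = consecutive_abs_diffs([80.0, 95.0, 90.0])
```

Each derived value would be stored as an additional column alongside the original variables, so the model can use both.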
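The train/test split, n-fold cross-validation and grid search entries above fit together as one workflow, which the following sketch tries to show using only the Python standard library. Several things here are assumptions made for brevity: the data are synthetic, the model is a toy one-dimensional k-nearest-neighbour classifier whose only hyper-parameter is `k`, accuracy stands in for logloss or AUROC, and the split is purely random rather than checked for similar outcomes in both parts. All function names are hypothetical.

```python
import random
from itertools import product


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Randomly split rows into train and test lists (70:30 by default)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def k_folds(rows, n_folds=5):
    """Yield (train, validation) pairs; each row is held out exactly once."""
    for i in range(n_folds):
        validation = rows[i::n_folds]
        train = [r for j, r in enumerate(rows) if j % n_folds != i]
        yield train, validation


def knn_predict(train, x, k):
    """Toy classifier: majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda r: abs(r[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def accuracy(train, evaluation, k):
    """Fraction of evaluation rows whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return hits / len(evaluation)


def grid_search(train, grid, n_folds=5):
    """Try every hyper-parameter combination; score each by mean CV accuracy."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = [accuracy(tr, va, **params) for tr, va in k_folds(train, n_folds)]
        score = sum(fold_scores) / len(fold_scores)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic data: label is 1 when x > 50, with roughly 10% of labels flipped.
rng = random.Random(42)
rows = []
for _ in range(200):
    x = rng.uniform(0, 100)
    label = int(x > 50)
    if rng.random() < 0.1:
        label = 1 - label
    rows.append((x, label))

train, test = train_test_split(rows)
best_params, cv_score = grid_search(train, {"k": [1, 5, 15]})
# The test dataset is touched only once, after the hyper-parameters are fixed.
test_score = accuracy(train, test, **best_params)
```

The point of the structure is that the test rows play no part in choosing `k`; only the cross-validation folds drawn from the train dataset do.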
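For the tree depth entry above, the minimum, maximum and mean depths can be computed by collecting the depth of every leaf. The sketch below assumes a hypothetical tree representation (nested dictionaries with a `children` list); real ML libraries store trees differently.

```python
def leaf_depths(node, depth=0):
    """Return the depth (number of edges from the root) of every leaf."""
    children = node.get("children", [])
    if not children:
        return [depth]
    depths = []
    for child in children:
        depths.extend(leaf_depths(child, depth + 1))
    return depths


# A small hypothetical tree with leaves at depths 1, 2 and 3.
tree = {"children": [
    {"children": []},
    {"children": [
        {"children": []},
        {"children": [{"children": []}]},
    ]},
]}

depths = leaf_depths(tree)
min_depth = min(depths)
max_depth = max(depths)
mean_depth = sum(depths) / len(depths)
```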

1.6 Aims and Objectives

1.6.1 Aim

To develop a system for predicting AEs in all hospitalised patients, using ML of routinely collected blood test results and existing electronically held patient data relating to co-morbidities and demographics.

1.6.2 Objectives

1. To identify the appropriate universally accessible datasets and their specific variables to build and implement ML-EWS.

2. To map both the ethical and legal landscape required to undertake a multi-site ‘big data’ study.

3. To create a large dataset of ~1 million patients, and their blood results and administrative data, from multiple acute hospitals in different geographic locations in the UK.

4. To develop expertise in programming, to undertake large-scale data capture and manipulation, and to implement ML models.

5. To investigate mortality associated with known models of disease. Specifically:

a. For Dehydration,

i. To understand the effect of dehydration (Ur:Cr) on outcome

ii. To understand changes in dehydration (Ur:Cr), combined with AKI, on outcome

iii. To create a model that incorporates urea and creatinine results with simple demographic data to identify those at risk of poor outcome

b. For AKI,

i. the epidemiology of patients admitted to hospital who are diagnosed with AKI

ii. the relationship between the NHSE-AKI algorithm defined AKI stage and in-hospital outcome (Death or Renal Replacement Therapy (Drrt)), and whether this relationship differs according to method of admission and existing co-morbidities

iii. whether the Drrt risk differs depending on the route by which the AKI stage is defined

iv. whether the NHSE-AKI algorithm fails to identify patients who continue to have AKI

v. whether a ML approach can better stratify risk than the current NHSE-AKI algorithm.

6. To build ML models that can be used on all patients in hospital in order to identify their risk of dying in hospital, both on admission and subsequently.

7. To build a proof-of-concept computer system that is agnostic to the internal IT infrastructure of a hospital, but which can deploy advanced ML models by ingesting hospital data and relaying results to the hospital/clinician in real time.

In document ML-EWS: Machine Learning Early Warning System. The application of machine learning to predict in-hospital patient deterioration (Page 40-43)