Figure 10.5: Toolkits: Screenshot of the inputs configuration file

Firstly, all the major steps in the IPython Notebooks produce outputs in the form of statistical summaries, configuration outputs or feature backups. In addition, a separate Notebook was added for more detailed analysis and benchmarking of the models’ performance, to reduce the complexity of the main modelling Notebook.

Figure 10.6: Detailed process-flow diagram of the feature generation step

10.5 Pre-Processing and Modelling Techniques

According to the proposed healthcare pre-processing framework in this thesis, the feature generation is called via the IPython Notebooks. This stage is highly resource-intensive; therefore, the features and settings may be saved throughout the workflow.
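The idea of saving features during a resource-intensive stage can be sketched as a simple load-or-generate helper. This is only an illustrative sketch, not the toolkits' code: the function `load_or_generate`, the file name `features_backup.json` and the JSON format are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def load_or_generate(path, generate):
    """Return features from a backup file if present, else generate and save them."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text())
    features = generate()
    path.write_text(json.dumps(features))
    return features


calls = []  # records how often the expensive step actually runs


def generate_features():
    """Hypothetical stand-in for the expensive feature generation step."""
    calls.append(1)
    return {"prior_spells": [0, 2, 5]}


with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp) / "features_backup.json"
    first = load_or_generate(backup, generate_features)   # generates and saves
    second = load_or_generate(backup, generate_features)  # reads the backup
```

The second call reads the backup rather than regenerating the features, which is the behaviour a long-running Notebook workflow would rely on after a restart.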

Moreover, the steps for filtering stationary features and filtering correlated features (Section 7.4) are partially automated, because the number of input features can be high and it would be burdensome for the user to specify them manually in each run. However, it is strongly recommended to review the features manually, to make sure that the right features are removed. Figure 10.7 presents a screenshot of the filtering stationary features step.

Figure 10.7: Toolkits: Screenshot of the filtering stationary features step

Similarly, in the feature ranking step, the list of ranked features can be reviewed and approved before progressing with the analysis. These two feature removal steps are only triggered after the feature list is confirmed by the user.
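The confirm-before-removal workflow described above can be illustrated with a minimal Python sketch. It is not the toolkits' implementation. It assumes that "stationary" means near-constant (variance at or below a threshold) and that correlated features are flagged by absolute Pearson correlation. The function names, thresholds and sample data are hypothetical.

```python
from math import sqrt
from statistics import pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def propose_removals(features, var_threshold=1e-8, corr_threshold=0.95):
    """Propose (but do not apply) removals of near-constant and correlated features."""
    stationary = [name for name, values in features.items()
                  if pvariance(values) <= var_threshold]
    retained, correlated = [], []
    for name in features:
        if name in stationary:
            continue
        # Flag a feature if it is strongly correlated with one already retained.
        if any(abs(pearson(features[name], features[other])) >= corr_threshold
               for other in retained):
            correlated.append(name)
        else:
            retained.append(name)
    return stationary, correlated


def apply_removals(features, proposed, approved):
    """Drop only those proposed features that the user has explicitly approved."""
    to_drop = set(proposed) & set(approved)
    return {name: values for name, values in features.items()
            if name not in to_drop}


features = {
    "age": [34, 72, 45, 51, 67],
    "prior_spells": [0, 2, 5, 6, 1],
    "prior_spells_x2": [0, 4, 10, 12, 2],  # exact multiple of prior_spells
    "constant_flag": [1, 1, 1, 1, 1],      # never varies
}
stationary, correlated = propose_removals(features)
# The user reviews the proposal and approves only the constant feature.
filtered = apply_removals(features, stationary + correlated,
                          approved=["constant_flag"])
```

Keeping the proposal and the removal as separate functions mirrors the manual review recommended in the text: nothing is dropped until the user approves it.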

Finally, the basic modelling approaches in the modelling stage, including the LR and the RF, are presented in the main Notebooks and can be configured quickly. The advanced models, BPM and WDNN, are implemented in a separate Notebook, due to their complex configurations.

10.6 Discussions

Version 1.0 of the ERMER and the T-CARER toolkits has now been released, and the toolkits can be applied and customised to any type of healthcare setting or data source. The release also includes a number of mapping tables, which are used for re-categorising features.

One of the most challenging tasks during the development of the feature pre-processing was efficient feature generation and processing. In part, this was due to the complex nature of healthcare data, but also to the fact that very little public research with a clear and detailed specification was available on pre-processing the NHS data.

Another major challenge was the development of mapping tables to reduce sparsity and improve model fit. For instance, the design of effective diagnosis or cost groupings can be considered the most important step in patient risk modelling, due to their high correlations and high levels of complexity. Designing features that have adequate precision with optimally low sparsity is very complex when the number of feature categories is very high, population sample sizes are moderate, and the prevalence of categories varies across several dimensions.

Finally, there is a plan to release an extension of the feature pre-processing that can be fully implemented on HES, Secondary Uses Service (SUS) and General Practice (GP) data. In this separate extension, the hospital features will include all three sectors: inpatient, outpatient, and Accident and Emergency (A&E). In addition, the mapping tables for feature re-categorisation will be included, to allow the generation of features that have lower sparsity and higher significance while remaining clinically meaningful. In the following chapter, the concluding remarks and future work are highlighted.

Chapter 11

Concluding Remarks

In this chapter, firstly, a brief overview of the thesis is provided. Then, the future work and extensions are highlighted.

11.1 Conclusion

In this thesis, we have investigated several important problems regarding the identification of patients’ risks. The principal motivation of this research was to provide a framework for analysing administrative healthcare data to generate significant features that are correlated with patients’ health and care status, and then to model the high layers of risk complexity using robust techniques. At present, no other framework is available for pre-processing healthcare data, and current predictive models of patient risk are very simplistic and mainly fail to learn the significant complex patterns in health and care status.

Moreover, hospital readmissions are rising, due to growth in long-term comorbidities, the ageing population, premature discharges and accidents. It has been estimated that about half of the Ambulatory Care Sensitive Conditions (ACSCs) can be predicted and may be avoided by adequate interventions. The present models of hospital emergency readmission and comorbidity risks perform only moderately and use very similar features and modelling techniques.

This thesis looked at three sub-problems in the area of healthcare modelling. Firstly, a healthcare pre-processing framework was developed to prepare data, generate a pool of features and select important features. Then, an Ensemble Risk Model of Emergency Admissions (ERMER) was developed as a decision support tool, to help clinicians and commissioners identify patients’ risks. Next, a Temporal-Comorbidity Adjusted Risk of Emergency Readmission (T-CARER) was designed to identify patients’ risks of comorbidities and complexities, with more accuracy and higher confidence.

Firstly, the proposed healthcare pre-processing framework samples, cleans and treats the input data. Then, it creates super-spells out of related episodes. After that, it systematically generates a pool of features, transforms them, filters correlated features, ranks feature importance and selects the top features. The framework has been shown to be effective in prediction models of readmission and comorbidity risks, and it has the potential to be used in other areas of healthcare modelling.

Secondly, the ERMER was developed using an Ensemble of Bayes Point Machine (BPM) models. The sub-models in the Ensemble were generated using a collection of different cohorts, including prior spells, prior emergency admissions, prior operations and age limits. Then, the ERMER used a hill-climbing heuristic to optimise the weighted average rank of predicted estimates using several performance criteria. Introducing prior probabilities and using a collection of weaker sub-models have been demonstrated to be effective in producing highly stable readmission models with strong confidence and accuracy.
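To make the ensemble step more concrete, the following Python sketch shows one way a hill-climbing search over sub-model weights could optimise a weighted average rank. It is a simplified illustration under stated assumptions and not the ERMER implementation: it uses a single criterion (AUC) where the thesis uses several, and all function names, parameters and data are hypothetical.

```python
import random


def to_ranks(scores):
    """Rank scores from 0 (lowest) to n-1 (highest); ties broken by position."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def weighted_rank(rank_lists, weights):
    """Weighted average rank per patient across the sub-models."""
    total = sum(weights)
    return [sum(w * r[i] for w, r in zip(weights, rank_lists)) / total
            for i in range(len(rank_lists[0]))]


def auc(labels, scores):
    """Probability that a random positive case outranks a random negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hill_climb(rank_lists, labels, steps=200, step_size=0.1, seed=0):
    """Perturb one weight at a time and keep the change only if the AUC improves."""
    rng = random.Random(seed)
    weights = [1.0] * len(rank_lists)
    best = auc(labels, weighted_rank(rank_lists, weights))
    for _ in range(steps):
        candidate = list(weights)
        k = rng.randrange(len(candidate))
        delta = rng.choice([-step_size, step_size])
        candidate[k] = max(0.0, candidate[k] + delta)
        if sum(candidate) == 0:
            continue  # all-zero weights give no valid average
        score = auc(labels, weighted_rank(rank_lists, candidate))
        if score > best:
            weights, best = candidate, score
    return weights, best


# Hypothetical readmission labels and predictions from three sub-models.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
sub_model_scores = [
    [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.30, 0.60],
    [0.30, 0.20, 0.60, 0.50, 0.70, 0.40, 0.10, 0.90],
    [0.50, 0.60, 0.40, 0.30, 0.20, 0.80, 0.70, 0.10],
]
rank_lists = [to_ranks(s) for s in sub_model_scores]
baseline = auc(labels, weighted_rank(rank_lists, [1.0, 1.0, 1.0]))
weights, best = hill_climb(rank_lists, labels)
```

Because a candidate is accepted only when it strictly improves the score, the result can never be worse than the equal-weight baseline, although it may stop at a local optimum.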

Finally, the T-CARER implements a comorbidity risk model that includes temporal dimensions: Length-of-Stay (LoS) and delta-time between admissions. In addition to comorbidity groups, T-CARER adds population stratification, consultant specialities, operations and complications. The offered solution introduces a generic method for generating a pool of features out of re-categorised and temporal features, in order to create a customised comorbidity risk index.
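The two temporal dimensions can be illustrated with a short Python sketch. It assumes that LoS is the number of days between admission and discharge, and that delta-time is the number of days from a patient's previous discharge to the next admission; the thesis may define these differently. The record layout and names are hypothetical.

```python
from datetime import date


def temporal_features(spells):
    """Compute LoS and delta-time per spell.

    spells: list of (patient_id, admission_date, discharge_date) tuples.
    """
    rows = []
    last_discharge = {}  # most recent discharge date seen for each patient
    for pid, admitted, discharged in sorted(spells, key=lambda s: (s[0], s[1])):
        los = (discharged - admitted).days
        previous = last_discharge.get(pid)
        # No delta-time exists for a patient's first recorded spell.
        delta = (admitted - previous).days if previous is not None else None
        rows.append({"patient_id": pid, "los_days": los, "delta_days": delta})
        last_discharge[pid] = discharged
    return rows


spells = [
    ("p1", date(2010, 2, 4), date(2010, 2, 6)),
    ("p1", date(2010, 1, 1), date(2010, 1, 5)),
    ("p2", date(2011, 3, 10), date(2011, 3, 10)),
]
rows = temporal_features(spells)
```

Sorting by patient and admission date first means the input spells do not need to arrive in chronological order.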

Towards meeting our objectives, several extracts of the Hospital Episode Statistics (HES) within a 10-year timeframe were obtained to train, test and cross-validate the models. All the proposed models were benchmarked against previous models from several aspects, including different population cohorts, time-frames, fitting algorithms and risk segments. The benchmarks of the ERMER and the T-CARER using multiple comparison criteria have shown significant improvements over previous models in terms of precision, accuracy and stability.

Finally, the proposed solutions are implemented in the form of user-friendly toolkits, using Jupyter (IPython) Notebooks. The toolkits use a wide range of high-performance computing packages to process input data, generate features, and train and test models. Moreover, the developed IPython Notebooks provide an ideal environment for researchers to build custom predictive models with great flexibility in feature generation and in the application of modelling algorithms.

11.2 Extensions and Future Work

In document Predictive Risk Modelling of Hospital Emergency Readmission, and Temporal Comorbidity Index Modelling Using Machine Learning Methods (Page 173-178)