Chapter 4: Scoping the potential of using data-driven segmentation analysis in healthcare - a
5.3 Data manipulation: Demographic data
While some attributes could be obtained directly from the raw dataset, like gender, the majority of patient-level variables had to be constructed from care episode data. Based on existing literature, a large number of variables were created. Afterwards, they were further reviewed in the data cleaning, reduction and normalisation stages.
Basic person data that can be extracted from administrative databases are age and gender.
Age was recorded as at the end of the study period, calculated from the year of birth. For this reason, the youngest age in the dataset was 5, reflecting new-borns included at the beginning of the five-year study period. The Townsend 2001 deprivation score was also included in the demographic data, on 5-, 10- and 20-step scales. While the Index of Multiple Deprivation (IMD) score is also available linked to CPRD, Townsend was chosen because it does not include any predictors related to health.107 The IMD does include a health predictor (mortality) and could therefore overestimate the impact of deprivation on health outcomes.
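The age calculation described above can be sketched as follows. This is a minimal illustration, assuming only the year of birth is available; the function name and the study years are hypothetical, as the actual study years are not stated in this section.

```python
def age_at_study_end(year_of_birth: int, study_end_year: int) -> int:
    """Age in whole years at the end of the study period, from year of birth only."""
    if year_of_birth > study_end_year:
        raise ValueError("year of birth is after the end of the study period")
    return study_end_year - year_of_birth


# Hypothetical five-year study period (assumption, not taken from the study).
STUDY_START_YEAR = 2008
STUDY_END_YEAR = 2013

# A patient born in the first study year is the youngest possible patient.
youngest = age_at_study_end(STUDY_START_YEAR, STUDY_END_YEAR)
print(youngest)
```

Because age is fixed at the end of the period, a new-born at the start of a five-year period appears with age 5 rather than 0.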
[Figure: variables created per person in the database. Panel labels: Person, Clinical, Activity, Cost, Inpatient, Outpatient, Consultation, Therapy, Database. Variables listed: long-term condition flags; # long term conditions; multimorbidity flag; risk score; residential care flag; # ELDCs; # ELIPs; # GP surgery attendances; # GP clinic attendances; # GP telephone contacts; # GP home visits; total cost GP surgery; total cost GP clinic; total cost GP telephone; total cost GP home visits; # prescriptions; # unique prescriptions; total prescription cost; # OP appointments; # unique OP specialties; total cost OP; total cost.]
ELDC: elective day case; ELIP: elective inpatient; NEIP: non-elective inpatient; RA: regular attender; ALoS: average length of stay; OP: outpatient; GP: general practice/practitioner
Secondly, the presence of selected mental health diagnoses was used to identify mental health conditions. Finally, learning disabilities were identified. Together these form the long-term condition (LTC) flags.
5.3.1 Chronic conditions
This study considers the impact of chronic conditions on the healthcare needs of the patient.
To define a set of chronic conditions that significantly impact care needs, a comorbidity index was used. Comorbidity indices are based on a list of coexisting illnesses that may impact a patient’s prognosis. By combining the impact of these conditions in a single index, an overall score can be generated.108 These scores can be used to correct for case mix differences or to predict outcomes like mortality.109 There exist a large number of comorbidity indices, all based on different conditions and some providing weightings for specific diseases based on severity.
However, not all indices can be derived from administrative data such as those used in this study; some instead require case note review to determine severity or non-diagnosis-based factors.
HES uses the International Classification of Diseases (ICD) version 10. The Charlson Index has been proven to work using ICD-based diagnosis information from administrative data, and is one of the most widely used comorbidity indices.109-112 Moreover, while it is originally an index predicting mortality,113 it has been shown to also correlate with avoidable hospital admissions,110 health-related quality of life,114 and healthcare cost.115 The Charlson Index combines 16 conditions and assigns them a numerical standard weighting to create the overall score.113
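The idea of combining weighted conditions into one score can be sketched as below. This is a minimal illustration of a weighted comorbidity score in general; the condition names and weights are illustrative placeholders (assumptions) and are not presented as the weighting of any specific published index.

```python
# Illustrative weights only (assumption); not the weights of a published index.
ILLUSTRATIVE_WEIGHTS = {
    "myocardial_infarct": 1,
    "hemiplegia": 2,
    "metastatic_solid_tumour": 6,
}


def comorbidity_score(conditions, weights):
    """Sum the weights of the distinct conditions a patient has.

    Conditions without a weight contribute 0, and duplicates count once.
    """
    return sum(weights.get(condition, 0) for condition in set(conditions))


patient_conditions = ["myocardial_infarct", "hemiplegia", "hemiplegia"]
print(comorbidity_score(patient_conditions, ILLUSTRATIVE_WEIGHTS))
```

A single score of this kind is convenient for case mix adjustment, but it hides which conditions contributed to it.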
This study used the individual conditions specified by the Charlson Index as variables rather than their combined score, to enable the exploration of different patterns at the condition level. There exist different versions of ICD-10 translations of the original ICD-9 codes that were developed for this index (see Table 11).110, 112, 116-119 This research used the translation developed by Aylin and Bottle119, 120 because it has been adapted to English coding practices, and because the HSCIC includes it in statistical guidance to NHS institutions.118 In addition, a translation to Read codes has also been created specifically for use in primary care datasets like CPRD, which use Read codes rather than ICD-10.121
For the purpose of this study, the condition rather than its state (e.g. diabetes, versus diabetes with complications) was used, as patient characteristics should be the same over time. Therefore ‘diabetes’ and ‘diabetes with complications’ were combined, as well as ‘mild liver disease’ and ‘severe liver disease’, and ‘cancer’ and ‘metastatic cancer’.
Table 11: Overview of Charlson comorbidity ICD coding

| Condition | Deyo et al.111 ICD-9 adaptation (ICD-9 codes) | Sundararajan et al.112 ICD-10 translation (ICD-10-AM codes) | Bottle and Aylin120 English coding adaptation (ICD-10 codes) |
| --- | --- | --- | --- |
| Myocardial infarct | 410-410.9 Acute myocardial infarction; 412 Old myocardial infarction | Acute myocardial infarction: I21, I22, I252 | I21, I22, I252, I258 |
| Congestive heart failure | 428-428.9 Heart failure | I50 | I50 |
| Peripheral vascular disease | 443.9 Peripheral vascular disease inc. intermittent claudication; 441-441.9 Aortic aneurysm; 785.4 Gangrene; V43.4 Blood vessel replaced by prosthesis; Procedure 38.48 Resection and replacement of lower limb arteries | I71, I739, I790, R02, Z958, Z959 | I71, I739, I790, R02, Z958, Z959 |
| Cerebrovascular disease | 430-438 Cerebrovascular disease | G450-G452, G454, G458, G459, G46, I60-I66, I670-I672, I674-I679, I681, I682, I688, I69 | G450-G452, G454, G458, G459, G46, I60-I69 |
| Dementia | 290-290.9 Senile and presenile dementia | F00-F02, F051 | F00-F03, F051 |
| Chronic pulmonary disease | 490-496 Chronic obstructive pulmonary disease; 500-505 Pneumoconioses; 506.4 Chronic respiratory conditions due to fumes and vapours | Pulmonary disease: J40-J42, J44-J47, J60-J67 | J40-J47, J60-J67 |
| Connective tissue disease | Rheumatologic disease: 710.0 Systemic lupus erythematosus; 710.1 Systemic sclerosis; 710.4 Polymyositis; 714.0-714.2 Adult rheumatoid arthritis; 714.81 Rheumatoid lung; 725 Polymyalgia rheumatica | Connective tissue disorder: M050-M053, M058-M060, M063, M069, M32, M332, M34, M353 | M05, M060, M063, M069, M32, M332, M34, M353 |
| Ulcer disease | Peptic ulcer disease: 531-534.9 Gastric, duodenal and gastrojejunal ulcers | Peptic ulcer: K25-K28 | K25-K28 |
| Mild liver disease | 571.2 Alcoholic cirrhosis; 571.5 Cirrhosis without mention of alcohol; 571.6 Biliary cirrhosis; 571.4-571.49 Chronic hepatitis | Liver disease: K702, K703, K717, K73, K740, K742-K746 | K702, K703, K717, K73, K74 |
| Diabetes | 250-250.3 Diabetes with or without acute metabolic disturbances; 250.7 Diabetes with peripheral circulatory disorders | E101, E105, E109, E111, E115, E119, E131, E135, E139, E141, E145, E149 | E101, E105, E106, E108, E109, E111, E115, E116, E118, E119, E131, E135, E136, E138, E139, E141, E145, E146, E148, E149 |
| Hemiplegia | Hemiplegia or paraplegia: 344.1 Paraplegia; 342-342.9 Hemiplegia | | |
| Renal disease | Renal failure: 582-582.9 Chronic glomerulonephritis; 583-583.7 Nephritis and nephropathy; 585 Chronic renal failure; 586 Renal failure, unspecified; 588-588.9 Disorders resulting from impaired renal function | Renal disease: N01, N03, N052-N056, N072-N074, N18, N19, N25 | I12, I13, N01, N03, N052-N056, N072-N074, N18, N19, N25 |
| Diabetes with end organ damage | Diabetes with chronic complications: 250.4-250.6 Diabetes with renal, ophthalmic, or neurological manifestation | Diabetes complications: [...] | |
| Any tumour; Leukaemia; Lymphoma | Any malignancy, including leukaemia and lymphoma: 140-172.9 Malignant neoplasm; 174-195.8 Malignant neoplasm; 200-208.9 Leukaemia and lymphoma | Cancer: C0-C3, C40, C41, C43, C45-C49, C5, C6, C70-C76, C80-C85, C883, C887, C889-C901, C91-C93, C940-C943, C9451, C947, C95, C96 | C00-C67, C80-C97 |
| Moderate or severe liver disease | 572.2-572.8 Hepatic coma, portal hypertension, other sequelae of chronic liver disease; 456.0-456.21 Esophageal varices | Severe liver disease: K721, K729, K766, K767 | K721, K729, K766, K767 |
| Metastatic solid tumour | 196-199.1 Secondary malignant neoplasm of lymph nodes and other organs | Metastatic cancer: C77-C80 | C77-C79 |
| AIDS | 042-044.9 HIV infection with related specified conditions | HIV: B20-B24 | B20-B24 |
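Condition-level flags of the kind described in this section could be derived from ICD-10 diagnosis codes by prefix matching. The sketch below is a minimal illustration, assuming codes may arrive with or without a dot; it uses only a subset of the Bottle and Aylin column of Table 11, merges mild and severe liver disease as described in the text, and the names `CONDITION_PREFIXES`, `normalise` and `condition_flags` are hypothetical.

```python
# Subset of the Bottle and Aylin ICD-10 column of Table 11, written as prefixes.
CONDITION_PREFIXES = {
    "congestive_heart_failure": ("I50",),
    "dementia": ("F00", "F01", "F02", "F03", "F051"),
    "peptic_ulcer": ("K25", "K26", "K27", "K28"),
    # Mild and severe liver disease merged into a single condition.
    "liver_disease": (
        "K702", "K703", "K717", "K73", "K74",
        "K721", "K729", "K766", "K767",
    ),
}


def normalise(code: str) -> str:
    """Strip whitespace and dots and upper-case, so 'k72.9' becomes 'K729'."""
    return code.strip().replace(".", "").upper()


def condition_flags(diagnosis_codes):
    """Return one boolean flag per condition for a list of diagnosis codes."""
    codes = [normalise(code) for code in diagnosis_codes]
    return {
        condition: any(code.startswith(prefixes) for code in codes)
        for condition, prefixes in CONDITION_PREFIXES.items()
    }


flags = condition_flags(["I50.0", "k72.9", "J45"])
print(flags)
```

Keeping one flag per condition, rather than a summed score, preserves the condition-level patterns that the study set out to explore.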
5.3.2 Mental health
While mental health conditions have not been included in the Charlson index, they have a significant impact on a patient's care needs, such as higher overall utilisation of care,122 more unplanned and potentially preventable hospital admissions,123 and more readmissions.124 Moreover, patients with mental health conditions require a care model which integrates with mental health services.125 While this study did not have access to linked mental healthcare provider data, the physical healthcare needs of these patients were explored by creating a mental illness flag, similar to the chronic condition flags.
There is no standard classification for severe mental illness; however, most studies include psychosis (including or limited to schizophrenia and schizoaffective disorders) and bipolar disorder (see Table 12). This research used the codes defined by White et al.,126 as they provide a wide definition of psychosis and bipolar disorders that includes the criteria set out by Chang et al.,127 while excluding the drug-induced and depression-related psychosis that NHS England includes.128 The latter two may be temporary states rather than enduring mental illnesses and were therefore excluded.
Table 12: Overview of severe mental illness definitions

| Source | Definition | Conditions and ICD-10 codes |
| --- | --- | --- |
| White et al.126 | Severe mental illness are “a range of serious and chronic conditions including [...] | |
| Chang et al.127 | “SMI [Severe mental illness] which might include schizophrenia, [...] | Bipolar affective disorder: F31; Substance use disorder: F10-F19; Depressive episode: F32; Recurrent depressive disorder: F33 |
| NHS England128 | Severe mental illness are “patients with psychoses, including schizophrenia and bipolar affective disorder” | Psychosis: F20-F29; Drug induced psychosis: F105, F115, F125, F135, F145, F155, F165, F195; Bipolar disorder: F302, F312, F315; Depressive episodes (with psychosis): F323 and F333 |
| HSCIC129 | Mental health prevalence based on “people with schizophrenia, bipolar disorder and other psychoses” | N/A |
5.3.3 Learning disabilities
Learning disabilities are another group of conditions that significantly impact a person’s care needs but are not included in general morbidity indices. Like mental health, they are included in the NHS Quality and Outcomes Framework; however, no specific conditions are listed for this metric.129 While the term learning disabilities is common in England, the World Health Organisation uses “mental retardation”.130 This group of conditions is covered by ICD-10 codes F70-F79,131 and was included as such in this research.
5.3.4 Creating the LTC flags
To create the chronic condition, mental health and learning disabilities flags in the acute dataset, all diagnosis fields in all inpatient hospital episodes during the study period were reviewed for the relevant ICD-10 codes. In primary care, where Read codes are used, the ICD codes for mental health and learning disabilities were mapped to Read codes according to the ICD-10/Read Cross Mapping (Version 3) created by the UK Terminology Centre.132 CPRD uses Medcodes rather than Read codes in its databases, so the Read codes for the various conditions were translated to Medcodes using the CPRD Medical Dictionary.
Some conditions can also be derived from fields other than diagnosis, for example diabetes from recorded HbA1c test scores. However, this is not true for all conditions, and therefore only diagnosis codes were used to avoid bias. If there was any diagnosis of the condition over the study period, in any dataset, the patient was given a flag for the condition.
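The "any diagnosis, in any dataset" rule can be sketched as below for the learning disabilities flag. This is a minimal illustration under stated assumptions: the ICD-10 range F70-F79 is taken from the text, whereas the Medcode values, the input layout (one list of codes per patient) and the function name are hypothetical stand-ins for the mapped primary care codes.

```python
# F70-F79 expressed as three-character ICD-10 prefixes.
LD_ICD10_PREFIXES = tuple(f"F7{digit}" for digit in range(10))

# Hypothetical Medcodes (assumption); the real values come from the code mapping.
LD_MEDCODES = {900001, 900002}


def flag_learning_disability(acute_diagnoses, primary_medcodes):
    """Flag a patient if any acute or primary care record carries a matching code.

    acute_diagnoses: dict of patient id -> list of ICD-10 codes from all episodes.
    primary_medcodes: dict of patient id -> list of Medcodes from all events.
    """
    flags = {}
    for patient_id in set(acute_diagnoses) | set(primary_medcodes):
        in_acute = any(
            code.strip().replace(".", "").upper().startswith(LD_ICD10_PREFIXES)
            for code in acute_diagnoses.get(patient_id, [])
        )
        in_primary = any(
            medcode in LD_MEDCODES
            for medcode in primary_medcodes.get(patient_id, [])
        )
        flags[patient_id] = in_acute or in_primary
    return flags


acute = {"p1": ["F71.0", "I50"], "p2": ["J45"]}
primary = {"p2": [900001], "p3": [123]}
ld_flags = flag_learning_disability(acute, primary)
print(sorted(ld_flags.items()))
```

A patient is flagged when either source contains a matching diagnosis, so a condition recorded only in primary care still produces a flag.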
In addition to the various LTC flags, the database also included a metric specifying whether the patient had multiple LTCs and a count of LTCs, both based on the conditions specified above.
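The count and the multiple-LTC metric could be derived from the individual flags as sketched below. This is a minimal illustration; the threshold of two or more conditions for "multiple LTCs" is an assumption, as the text does not state the cut-off, and the function and key names are hypothetical.

```python
def ltc_summary(flags, threshold=2):
    """Count flagged conditions and derive a multimorbidity flag.

    flags: dict of condition name -> bool for one patient.
    threshold: minimum number of LTCs treated as 'multiple' (assumed to be 2).
    """
    ltc_count = sum(1 for present in flags.values() if present)
    return {"ltc_count": ltc_count, "multimorbidity": ltc_count >= threshold}


patient_flags = {
    "diabetes": True,
    "liver_disease": False,
    "severe_mental_illness": True,
    "learning_disability": False,
}
summary = ltc_summary(patient_flags)
print(summary)
```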
5.3.5 Missing data
For both acute and primary care, identifying chronic conditions will be subject to missing data.
Doctors are more likely to record the chronic conditions they treat, rather than describing a patient’s full health status. However, it is likely that if a condition is related to any healthcare need, it will have been recorded in either dataset. If a patient technically has a chronic condition but never requires care, this does not affect the healthcare system and there will be little impact from not recording the condition.