
Chapter 2: Background and Related Research

2.3 Data Mining-Based Disease Prediction

2.3.1 Score-based models

Scoring-based methods are among the earliest and most intuitive approaches to disease prediction. In traditional medicine, before computerised systems were available, doctors had to observe patients' symptoms and medical history and evaluate them, drawing on experience and well-established case studies, to predict the risk of disease or comorbidity. This idea of symptom-based prognosis led to the development of many scoring-based methods that simplify and standardise the assessment of disease and related risk in various healthcare settings. In these methods, scores are assigned to factors such as physiologically observable conditions, demographic information or family history. Once a patient's score has been calculated, it is normally evaluated against an interpretation table that lists the possible ranges of scores and their corresponding meanings. The 'risk score' thresholds are established from clinical studies of different cohorts, often using multivariate regression analysis. Risk score-based methods are simple and quick to apply, and are often deployed in web-based health assessment calculators for consumer use. Table 2.1 shows some sample factors that are considered when assigning scores.

Table 2.1: Sample parameters scored in score-based systems

Category | Example of risk factors for scoring
Demographic factors | Age, sex; population subgroups based on age (e.g., children, adult, elderly) or ethnicity (e.g., Asian, African)
Behavioural factors | Smoking, drinking, duration of physical activity, sun exposure time, etc.
Genetic factors | Family history of diseases
Biomedical factors | Physical traits: weight, height; diagnostics: blood pressure, cholesterol, blood glucose level, etc.
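The scoring-and-interpretation process described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the factor names, point values and score bands are hypothetical, chosen to cover each category of Table 2.1, and are not taken from any published score (real weights and thresholds would come from regression on clinical cohorts).

```python
from bisect import bisect_right

# Hypothetical points per risk factor, one or more per category of Table 2.1.
# Illustrative values only; they do not come from any published score.
POINTS = {
    "age_65_or_over": 2,        # demographic
    "current_smoker": 2,        # behavioural
    "physically_inactive": 1,   # behavioural
    "family_history": 1,        # genetic
    "high_blood_pressure": 2,   # biomedical (diagnostic)
    "high_cholesterol": 1,      # biomedical (diagnostic)
}

# Hypothetical interpretation table: lower bound of each score band and its meaning.
BAND_LOWER_BOUNDS = [0, 3, 6]
BAND_LABELS = ["low risk", "moderate risk", "high risk"]


def risk_score(patient: dict) -> int:
    """Sum the points of every factor recorded as present for the patient."""
    return sum(points for factor, points in POINTS.items() if patient.get(factor, False))


def interpret(score: int) -> str:
    """Find the band whose lower bound is the largest one not exceeding the score."""
    index = bisect_right(BAND_LOWER_BOUNDS, score) - 1
    return BAND_LABELS[index]


if __name__ == "__main__":
    patient = {"age_65_or_over": True, "current_smoker": True, "high_cholesterol": True}
    score = risk_score(patient)
    print(score, interpret(score))  # 5 moderate risk
```

Keeping the interpretation table as a sorted list of band boundaries mirrors how published scores are read off a table of ranges, and lets the bands be changed without touching the scoring logic.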

Various score-based systems are currently in use. For example, the Charlson Comorbidity Index (Charlson, et al., 1987), proposed as early as 1987, predicts a patient's 10-year mortality by weighting a total of 22 factors, covering demographic factors (e.g., age) and comorbid conditions (e.g., heart disease, cancer, AIDS). A higher overall score indicates a higher chance that the patient will die within the next 10 years. The Charlson Comorbidity Index has been extended and adapted into different variants, such as Charlson/Deyo, Charlson/Romano, Charlson/Manitoba, Charlson/D'Hoores, Charlson/Ghali and Charlson/Dartmouth, some of which have been adapted to coding schemes such as ICD-9. The Elixhauser index (Elixhauser, et al., 1998) is a similar index used to estimate mortality (discussed in detail in the next chapter). It has been reported to have slightly better predictive performance than the Charlson index (Sharabiani, et al., 2012), especially when predicting mortality beyond 30 days.

Another widely used scoring system is APACHE-II (Acute Physiology and Chronic Health Evaluation II) (Wong and Knaus, 1991). Versions I–IV are available, and version IV is used only in the US. APACHE-II scores 12 physiological variables, together with age and chronic health status, on a range of 0–71 to assess the condition of ICU patients during the first 24 hours of admission, with higher values indicating greater severity. Other similar ICU scoring methods include SAPS (Simplified Acute Physiology Score, versions 1–3) and MPM (Mortality Prediction Model, versions 1 and 2). Later versions of these systems are usually developed on larger datasets, calibrated for different hospital settings, and can give predictions for different phases of an ICU stay (Breslow and Badawi, 2012).

These models provide a good way for physicians to assess a patient's condition, often without the help of sophisticated software. They usually work well in specific healthcare settings such as the ICU, where it is important to decide quickly how aggressive treatment should be, based on the patient's chance of mortality. However, these score-based systems are mostly unsuitable for predicting a patient's long-term risk of developing a chronic disease, particularly where a large volume of admission and related medical history data is available.

Scoring-based methods are also used to group diseases into several types according to their complexity, typically to estimate the hospital resources needed to ensure quality of care and to facilitate reimbursement. The Diagnosis-Related Group (DRG) system (Fetter, et al., 1980) is one such set of disease codes; it was implemented in the US in the early 1980s and later adopted internationally, with local modifications in different countries. The original motivation for the DRG was to develop a classification system that could identify the 'products' received by the patient, to aid in medical financing. As the healthcare industry has evolved since the system's introduction, DRG codes have been modified and extended to support different objectives with greater sophistication and precision (Baker, 2001), and governments have made further modifications to meet national policies. Grouper software based on ICD codes scores the patient's medical conditions, including diagnosis codes, comorbidities, discharge status and demographic information, and determines the DRG to which the patient belongs; medical financing and reimbursement are then planned on that basis. While such grouper software is useful for hospital resource management and cost planning, it does not predict future disease risk; rather, it gives an estimate of the patient's current condition.
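The grouping step performed by grouper software can be illustrated as a rule-based lookup. This is a minimal, hypothetical sketch: the diagnosis codes (DX100, DX900, ...), group names, complication lists and the split into "with/without complications" are invented for illustration and are not the rules of any real DRG grouper, which are far larger and vary by country. It does show the point made in the text: the output is a group for the current episode, not a prediction of future risk.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from principal diagnosis code to base group.
# Placeholder codes and names, not real ICD or DRG codes.
BASE_GROUPS = {
    "DX100": "Circulatory disorder",
    "DX200": "Respiratory infection",
}

# Hypothetical severity lists for secondary diagnoses (comorbidities/complications).
MAJOR_COMPLICATIONS = {"DX900"}
COMPLICATIONS = {"DX500", "DX510"}


@dataclass
class Episode:
    """One hospital episode as seen by the (hypothetical) grouper."""
    principal_dx: str
    secondary_dx: list = field(default_factory=list)
    discharged_alive: bool = True


def assign_group(episode: Episode) -> str:
    """Assign the episode to a group from its diagnoses and discharge status."""
    base = BASE_GROUPS.get(episode.principal_dx)
    if base is None:
        return "Ungroupable"
    secondary = set(episode.secondary_dx)
    # The most severe secondary diagnosis present decides the severity split.
    if secondary & MAJOR_COMPLICATIONS:
        severity = "with major complications"
    elif secondary & COMPLICATIONS:
        severity = "with complications"
    else:
        severity = "without complications"
    group = f"{base} {severity}"
    if not episode.discharged_alive:
        group += ", died"
    return group


if __name__ == "__main__":
    print(assign_group(Episode("DX100", ["DX510"])))
```

Because each episode is grouped only from what has already been recorded, the same patient can move between groups from one admission to the next, which is why such groupers support cost planning rather than risk prediction.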
Overall, these scoring-based methods are limited to a narrow scope within healthcare and mostly cannot capture the interrelations or comorbidities among different disease codes. Table 2.2 shows some examples of different scoring systems in practice.

Table 2.2: Examples of different scoring systems

Score Name | Predicts | Prediction Range
Framingham Risk Score | Different cardiovascular diseases, type-2 diabetes | 10 years (for most cardiovascular diseases); 30 years (one variant for CVD); 8 years (diabetes)
Reynolds Risk Score | Cardiovascular disease | 10 years
SCORE | Cardiovascular disease | 10 years
QRISK | Cardiovascular disease | 10 years (QRISK, QRISK2); lifetime (QRISK-lifetime)
QDiabetes (QDScore) | Type-2 diabetes | 1–10 years
Diabetes Risk Calculator | Type-2 diabetes | 8 years
ARIC Diabetes Risk Calculator | Type-2 diabetes | 9 years
Gargano Mortality Risk Score | Mortality of pre-diagnosed type-2 diabetes patients | 2 years
Charlson Comorbidity Index | Mortality | 10 years
APACHE-II | ICU mortality | First 24 hours of admission
SAPS II | ICU mortality | First 24 hours of admission
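Table 2.2 lists the Charlson Comorbidity Index as a 10-year mortality predictor. As a minimal sketch of how such an index is computed, the code below sums condition weights. The condition list and weights follow those commonly reported for the original index, but should be checked against Charlson et al. (1987) before any real use, and the variants mentioned earlier differ in the exact set of factors. The age adjustment (one point per decade from age 50, capped at 4), the rule that a severe form of a condition supersedes its milder form, and the survival approximation 0.983^(e^(0.9 × score)) are widely quoted conventions treated here as assumptions.

```python
import math
from typing import Optional

# Condition weights as commonly reported for the original Charlson index.
# Verify against Charlson et al. (1987) before any real use.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1, "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1, "cerebrovascular_disease": 1,
    "dementia": 1, "chronic_pulmonary_disease": 1,
    "connective_tissue_disease": 1, "ulcer_disease": 1,
    "mild_liver_disease": 1, "diabetes": 1,
    "hemiplegia": 2, "moderate_severe_renal_disease": 2,
    "diabetes_end_organ_damage": 2, "any_tumour": 2,
    "leukaemia": 2, "lymphoma": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumour": 6, "aids": 6,
}

# Assumed convention: if the more severe form is recorded, the milder one is not counted.
SUPERSEDED_BY = {
    "diabetes": "diabetes_end_organ_damage",
    "mild_liver_disease": "moderate_severe_liver_disease",
    "any_tumour": "metastatic_solid_tumour",
}


def charlson_score(conditions: set, age: Optional[int] = None) -> int:
    """Weighted sum of recognised comorbidities, plus optional age points."""
    counted = {
        c for c in conditions
        if c in CHARLSON_WEIGHTS and SUPERSEDED_BY.get(c) not in conditions
    }
    score = sum(CHARLSON_WEIGHTS[c] for c in counted)
    if age is not None:
        # Assumed age adjustment: 1 point per decade from age 50, capped at 4.
        score += max(0, min(4, (age - 40) // 10))
    return score


def ten_year_survival(score: int) -> float:
    """Widely quoted approximation (an assumption here): 0.983 ** exp(0.9 * score)."""
    return 0.983 ** math.exp(0.9 * score)


if __name__ == "__main__":
    conditions = {"congestive_heart_failure", "diabetes", "diabetes_end_organ_damage"}
    score = charlson_score(conditions, age=72)
    print(score)  # 6
    print(round(ten_year_survival(score), 3))
```

The sketch makes the limitation noted above concrete: each condition contributes a fixed, independent weight, so the index cannot represent interactions between comorbidities.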

In document Predicting the Risk of Chronic Disease: A Framework Based on Graph Theory and Social Network Analysis (Page 52-55)