Ranking the logistic regression models

Ranking by the Likelihood Ratio Test statistic (ΔG²)

For the final analysis, the single-gene, two-gene and three-gene LR models were ranked in decreasing order of the Likelihood Ratio Test statistic (ΔG²). This value was calculated automatically by SPSS to compare each model with the null model (the constant-only model without predictor variables).

The ΔG² is calculated using the Likelihood Ratio Test (LRT). This statistic was chosen as a measure of the overall performance of each model as it is considered the most accurate goodness-of-fit test in LR analysis, particularly in small datasets (307). Goodness-of-fit is an essential characteristic of the overall performance of a predictive model and describes how well the model fits the observed data, i.e. the deviance between the observed and predicted outcomes. A smaller deviance indicates a better fit to the observed data. In this study, it described how well the model predicted the observed outcomes for the patients (i.e. whether they were categorised correctly into the recurrent or non-recurrent group).

Likelihood Ratio Test in SPSS

The LRT is used to compare the goodness-of-fit of two models, a restricted model (with fewer predictors) and a final model (more predictors), where the models are hierarchically nested, i.e. the final model differs from the restricted model by the addition of one or more predictors.

SPSS uses the LRT to compare the performance of each model with that of the null model (the constant-only model without predictor variables). The LRT statistic (ΔG²) is automatically calculated by the software and describes the improvement in fit under the final model (with predictor variable(s)) in comparison to the fit under the null model. ΔG² is denoted χ² in SPSS as it follows a chi-squared distribution.

SPSS calculates χ² after it has selected the parameter estimates (coefficients) for the final model that achieve the greatest likelihood of the observed outcome (see Appendix 5 for a fuller description of LR analysis). It separately fits the null model and the final model to the data and calculates the -2*ln likelihood (-2LL) for each. This value is used rather than the likelihood to make calculation easier. The -2LL describes the deviance between the observed and the predicted outcomes, with smaller values reflecting a better fit. SPSS aims to achieve the lowest possible value for the -2LL. χ² is calculated as the difference between the -2LL of the null model and the -2LL of the final model as shown below:

\[
\Delta G^2 = \chi^2_{df} = -2 \ln\left(\frac{\text{likelihood for null model}}{\text{likelihood for final model}}\right) = (-2LL_{\text{null model}}) - (-2LL_{\text{final model}})
\]

where df is the degrees of freedom (the number of predictor variables added to the final model).

A higher value of χ² reflects a larger difference between the -2LL of the null model and that of the final model, therefore indicating a better-fitting model.
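The calculation of ΔG² from the two -2LL values can be illustrated with a minimal Python sketch. The outcomes and the predicted probabilities for the final model are hypothetical values chosen for illustration only; they are not data from this study, and in practice the final-model probabilities would come from the maximum likelihood fit performed by SPSS.

```python
import math

def neg2_log_likelihood(outcomes, probabilities):
    """Return -2 * ln(likelihood) for binary outcomes given predicted probabilities."""
    log_likelihood = sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, probabilities)
    )
    return -2.0 * log_likelihood

# Hypothetical outcomes: 1 = recurrent, 0 = non-recurrent (not study data).
outcomes = [1, 1, 1, 0, 0, 0, 0, 1]

# Null model: the constant-only model predicts the observed proportion for every patient.
base_rate = sum(outcomes) / len(outcomes)
null_probabilities = [base_rate] * len(outcomes)

# Final model: illustrative predicted probabilities, assumed to come from a fitted LR model.
final_probabilities = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.4, 0.6]

neg2ll_null = neg2_log_likelihood(outcomes, null_probabilities)
neg2ll_final = neg2_log_likelihood(outcomes, final_probabilities)

# The LRT statistic is the drop in -2LL when the predictors are added.
delta_g2 = neg2ll_null - neg2ll_final

print(f"-2LL null = {neg2ll_null:.3f}")
print(f"-2LL final = {neg2ll_final:.3f}")
print(f"Delta G2 = {delta_g2:.3f}")
```

A larger drop in -2LL between the null and final model gives a larger ΔG², which is the quantity used for ranking.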

SPSS also calculated a p-value for χ², which was essential to identify whether the improvement in fit obtained by adding the variable(s) to the null model was statistically significant. χ² follows a chi-squared distribution, with the degrees of freedom (df) equal to the difference in the number of predictor variables between the null and the final model. For a given df, a larger value for χ² results in a smaller p-value, which provides evidence that the final model is a significant improvement over the null model.
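As an illustration of how such a p-value follows from the statistic and its degrees of freedom, the sketch below computes the upper-tail probability of a chi-squared distribution using only the Python standard library. It relies on the closed-form expressions available for integer df and is not a description of how SPSS computes the value internally; the example statistic is hypothetical.

```python
import math

def chi2_p_value(statistic, df):
    """Upper-tail probability of a chi-squared distribution with integer df >= 1."""
    if statistic < 0 or df < 1:
        raise ValueError("statistic must be >= 0 and df must be a positive integer")
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        p += math.exp(-half) * term
        term *= half / (i + 0.5)
    return p

# Hypothetical example: a two-gene model (df = 2) with an LRT statistic of 7.5.
p = chi2_p_value(7.5, 2)
print(f"p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

The same statistic yields a larger p-value as df increases, which is why the df must match the number of predictors added to the null model.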

The results section includes only models for which the p-value for χ² was significant at the conventional 95% confidence level (p<0.05). Additionally, two-gene models are included only if their χ² exceeded the highest value obtained for the single-gene models, and three-gene models only if their χ² exceeded the highest value obtained for the two-gene models.
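The inclusion rule and the ranking can be sketched as follows. The gene labels, χ² values and p-values are hypothetical, and the `ModelResult` record and `select_models` function are names introduced here for illustration. The sketch also assumes that the "highest value obtained" for a model size is taken over all models of that size, whether or not they were individually significant.

```python
from dataclasses import dataclass

@dataclass
class ModelResult:
    genes: tuple      # gene labels in the model (hypothetical)
    chi2: float       # LRT statistic against the null model
    p_value: float    # p-value for chi2

def select_models(results, alpha=0.05):
    """Apply the inclusion rule and return the kept models ranked by chi2 (descending)."""
    # Highest chi2 observed for each model size (assumption: over all models of that size).
    best_by_size = {}
    for m in results:
        size = len(m.genes)
        best_by_size[size] = max(best_by_size.get(size, float("-inf")), m.chi2)

    selected = []
    for m in results:
        # A model must beat the best chi2 of the next-smaller model size.
        floor = best_by_size.get(len(m.genes) - 1, float("-inf"))
        if m.p_value < alpha and m.chi2 > floor:
            selected.append(m)
    return sorted(selected, key=lambda m: m.chi2, reverse=True)

# Hypothetical results, for illustration only.
results = [
    ModelResult(("A",), 4.2, 0.040),
    ModelResult(("B",), 2.1, 0.147),
    ModelResult(("A", "B"), 7.5, 0.024),
    ModelResult(("A", "C"), 3.9, 0.142),
    ModelResult(("A", "B", "C"), 9.8, 0.020),
    ModelResult(("B", "C", "D"), 6.0, 0.112),
]

selected = select_models(results)
for m in selected:
    print(m.genes, m.chi2, m.p_value)
```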

Ranking by the overall percent accuracy (used in the pilot study and method development)

During the method development, before χ² had been selected as an appropriate measure of overall performance, the predictive models were ranked in descending order of the overall percent accuracy, as this had been the method used in the SL study. In the course of this development, it was decided to use χ² in preference to the overall percent accuracy (2.3.4.8). To validate our decision to use χ², the ranking of the best performing models by χ² was compared with the ranking by overall percent accuracy.

Overall percent accuracy in SPSS

The overall percent accuracy is calculated automatically by SPSS and is the percentage of cases that are correctly classified. It is calculated as: [(true positives + true negatives)/total] * 100
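A minimal sketch of this calculation is given below, using the default SPSS classification threshold of 0.5 (a predicted probability ≥0.5 is classified as recurrent). The observed outcomes and predicted probabilities are hypothetical, and the function name is introduced here for illustration.

```python
def overall_percent_accuracy(observed, predicted_probabilities, threshold=0.5):
    """Percentage of cases whose predicted class matches the observed class."""
    # Classify as recurrent (1) when the predicted probability reaches the threshold.
    predicted = [1 if p >= threshold else 0 for p in predicted_probabilities]
    # Correct cases are the true positives plus the true negatives.
    correct = sum(1 for o, c in zip(observed, predicted) if o == c)
    return 100.0 * correct / len(observed)

# Hypothetical data: 1 = recurrent, 0 = non-recurrent.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
probabilities = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.5, 0.7]

accuracy = overall_percent_accuracy(observed, probabilities)
print(f"Overall percent accuracy = {accuracy:.1f}%")
```

Because the measure depends only on which side of the threshold each probability falls, two models with quite different predicted probabilities can obtain the same overall percent accuracy.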

SPSS calculated the overall percent accuracy by first calculating the predicted probability of being in the recurrent group for each patient, by entering the appropriate gene expression values into the LR equation (for which SPSS had previously selected the parameter estimates). The patients were then classified into either the recurrent or non-recurrent group based on their predicted probability. SPSS uses a default threshold of 0.5. Therefore, patients with a predicted probability ≥0.5 were classified as recurrent and those with a predicted probability <0.5 were classified as non-recurrent. SPSS then compared the predicted categories with the observed categories and calculated the number of patients correctly classified into both the recurrent and non-recurrent groups. The overall percent accuracy was then calculated as the total number of correctly classified patients as a percentage of the total number of patients.

Several additional parameters from the LR output that will be referred to later include the significance of the Wald statistic, the Nagelkerke R² and the odds ratio.

The Wald statistic is another measure of goodness-of-fit that was included in the SPSS output. The Wald chi-square test describes the significance of each predictor variable in an LR model by testing the null hypothesis that the regression coefficient for the variable is zero. The significance of the Wald statistic could therefore be used to assess the contribution that each gene made to the model (p<0.05 would indicate a significant contribution at the 95% confidence level).
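For a single coefficient, the Wald chi-square statistic is commonly computed as the squared ratio of the coefficient to its standard error and referred to a chi-squared distribution with one degree of freedom. The sketch below illustrates this under that assumption; the coefficient and standard error are hypothetical values, not results from this study.

```python
import math

def wald_test(coefficient, standard_error):
    """Return the Wald chi-square statistic and its p-value (df = 1)."""
    wald = (coefficient / standard_error) ** 2
    # Upper tail of a chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# Hypothetical coefficient and standard error for one gene.
wald, p_value = wald_test(1.2, 0.5)
print(f"Wald = {wald:.2f}, p = {p_value:.4f}")
```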

The odds ratio is the exponentiated regression coefficient for the predictor variable. Although the regression coefficient is used in the LR equation, it is difficult to interpret as it is measured in natural logarithmic units. In comparison, the odds ratio is easier to interpret. Here the odds ratio described the number by which the raw odds in favour of the patient being in the recurrent group were multiplied for a one-unit increase in expression of the gene (predictor variable).
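The relationship between the regression coefficient and the odds ratio can be checked with a short sketch. The intercept, coefficient and expression values are hypothetical, and the function names are introduced here for illustration.

```python
import math

def odds_ratio(coefficient):
    """Exponentiate a regression coefficient to obtain the odds ratio."""
    return math.exp(coefficient)

def predicted_odds(intercept, coefficient, expression):
    """Odds of recurrence under a single-gene LR model: exp(b0 + b1 * x)."""
    return math.exp(intercept + coefficient * expression)

# Hypothetical single-gene model.
intercept, coefficient = -1.0, 0.8

odds_at_2 = predicted_odds(intercept, coefficient, 2.0)
odds_at_3 = predicted_odds(intercept, coefficient, 3.0)

# A one-unit increase in expression multiplies the odds by the odds ratio.
print(f"odds ratio = {odds_ratio(coefficient):.3f}")
print(f"odds at x=3 / odds at x=2 = {odds_at_3 / odds_at_2:.3f}")
```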

The Nagelkerke R² is also described. This is a pseudo-R² statistic, as LR does not have an equivalent to the R² found in Ordinary Least Squares regression. Although the Nagelkerke R² has limitations and should not be considered a measure of goodness-of-fit, it does offer a relative measure of the proportion of variation in outcome that is explained by a model (307).
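The text does not give the formula for the Nagelkerke R²; the sketch below uses the standard definition, in which the Cox and Snell R² is divided by its maximum attainable value, expressed in terms of the -2LL of the null and final models. The -2LL values and sample size are hypothetical.

```python
import math

def nagelkerke_r2(neg2ll_null, neg2ll_final, n):
    """Nagelkerke pseudo R-squared from the -2LL of the null and final models."""
    # Cox and Snell R2: 1 - (L_null / L_final)^(2/n)
    cox_snell = 1.0 - math.exp(-(neg2ll_null - neg2ll_final) / n)
    # Maximum value Cox and Snell R2 can reach: 1 - L_null^(2/n)
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell / max_cox_snell

# Hypothetical values: 8 patients, -2LL of 11.09 (null) and 4.78 (final).
r2 = nagelkerke_r2(11.09, 4.78, 8)
print(f"Nagelkerke R2 = {r2:.3f}")
```

The rescaling means the statistic equals 0 when the final model does not improve on the null model and reaches 1 only when the final model's -2LL is zero.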

Using the LRT to assess the effect of additional predictor