[Flowchart text: Clinically important treatment response at three … | Step 5: Interpretation of models (β regression coefficients, 95% confidence intervals and statistical significance interpretation; discussion of findings)]
Step 1
The first step in model building was to explore the independent variables for collinearity within the future multivariable model. “Collinearity” or “multicollinearity” refers to the phenomenon whereby independent variables within a regression model are highly correlated with each other, potentially leading to spurious model output because they explain the same variance in the dependent variable (Tu et al, 2005). Pearson’s correlations between pairs of independent variables were investigated, followed by the removal of one variable from each pair that was highly correlated (r above 0.70). The decision of which of the highly correlated variables to remove was based on perceived clinical importance, with the variable considered the least clinically important being removed. For example, in Model 3A, baseline WOMAC pain and stiffness were highly correlated; hence WOMAC stiffness was removed from the models, since stiffness is considered of less clinical importance than pain (Bedson et al, 2007). A further example from Model 3A was that baseline PHQ8 depression and GAD7 anxiety were highly correlated. Although both have been theoretically linked to pain modulation (Linton & Shaw, 2011) and were crudely associated with future pain outcome, GAD7 anxiety was removed as there is a greater body of evidence for the association of depression with knee pain severity in older adults with knee pain (Cruz-Almeida et al, 2013; Collins et al, 2014; Han et al, 2015).
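The pairwise screening described in Step 1 can be sketched as follows. This is a minimal illustration only, not the code used for the thesis analyses: the variable names, the toy values and the numeric "importance" ranking are hypothetical, and applying the 0.70 threshold to the absolute value of r is an assumption.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def highly_correlated_pairs(data, threshold=0.70):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(data, 2):
        r = pearson_r(data[a], data[b])
        if abs(r) > threshold:  # assumption: threshold applied to |r|
            flagged.append((a, b, r))
    return flagged

def screen(data, importance, threshold=0.70):
    """Drop the less clinically important variable from each flagged pair."""
    dropped = set()
    for a, b, _ in highly_correlated_pairs(data, threshold):
        if a in dropped or b in dropped:
            continue  # this pair is already resolved by an earlier removal
        dropped.add(a if importance[a] < importance[b] else b)
    return [name for name in data if name not in dropped]

# Hypothetical toy data: stiffness tracks pain closely, age does not.
toy = {
    "womac_pain":      [5, 8, 12, 9, 15, 7, 11, 4],
    "womac_stiffness": [2, 3, 5, 4, 6, 3, 5, 2],
    "age":             [63, 71, 58, 66, 60, 74, 69, 55],
}
# Hypothetical ranking standing in for the clinical judgement (higher = keep).
rank = {"womac_pain": 3, "womac_stiffness": 1, "age": 2}
kept = screen(toy, rank)
```

In this toy example only the pain and stiffness pair exceeds the threshold, so stiffness is dropped and pain and age are retained. In practice the choice of which variable to drop rests on clinical judgement rather than a numeric rank.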
Step 2
The second step was to enter absolute change in physical activity, the primary independent variable of interest, and all remaining baseline independent variables into an initial multiple linear regression model (Kutner, 2005). A priori, absolute
change in physical activity was held within the model throughout future model building (since it is of primary interest in answering the research questions) along with the intervention arm variable and the baseline score of the dependent variable under investigation (for example baseline pain in the objective 1 pain Model 3A).
Holding the intervention arm variable within the model adjusts for any treatment effect due to the intervention received within the BEEP trial. Adjusting for the baseline clinical severity of the outcome variable in effect ensures that change in clinical outcome (pain or physical function) is modelled (Allison, 1990).
Step 3
Step three involved model building using an author-controlled “backwards elimination” strategy (as opposed to an automatic computer-generated backwards elimination) (Greenland, 1989; Agresti & Finlay, 2009). This involved fitting an initial multivariable model including all the variables from step 2, removing the variable whose regression coefficient was the most non-significant (largest p-value) and then refitting the model. This iterative process was continued until all remaining variables within the model (with the exception of the primary independent variable of interest and those held a priori regardless of statistical significance) were significant. Some authors recommend caution in using variable selection methods for model building based only on variable statistical significance, since they may exclude clinically important variables or lead to the inclusion of variables that are not sensible (Greenland, 1989; Agresti & Finlay, 2009). However, since all the covariates included in the model had both theoretical plausibility and supporting research as potential confounders, and key variables relating to the research question were held a priori, this variable selection strategy was deemed appropriate and unlikely to lead to inappropriate variable selection.
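The elimination loop in Step 3 can be sketched as below. This is an illustrative sketch under stated assumptions, not the procedure as run for the thesis: the significance level of 0.05 is assumed (the text does not state one), p-values use a normal approximation to the t distribution, and all variable names and the ±1 toy codes are hypothetical.

```python
from math import erfc, sqrt

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ols(columns, y):
    """Fit y on an intercept plus the given columns; return (betas, p_values)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    # Two-sided p-values from a normal approximation (assumption; a t
    # distribution would be used in practice, especially for small n).
    p = [erfc(abs(beta[i]) / sqrt(sigma2 * inv[i][i]) / sqrt(2)) for i in range(k)]
    return beta, p

def backward_eliminate(data, outcome, forced, alpha=0.05):
    """Repeatedly drop the non-forced variable with the largest p-value above alpha."""
    names = [v for v in data if v != outcome]
    while True:
        _, p = ols([data[v] for v in names], data[outcome])
        # p[0] belongs to the intercept; forced variables are never candidates.
        candidates = [(p[i + 1], v) for i, v in enumerate(names) if v not in forced]
        if not candidates:
            return names
        worst_p, worst = max(candidates)
        if worst_p <= alpha:
            return names
        names.remove(worst)

# Hypothetical toy data built from orthogonal +/-1 codes so the result is easy to follow.
x1 = [-1, -1, -1, -1, 1, 1, 1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, 1, -1, 1, -1, 1, -1, 1]
x12 = [a * b for a, b in zip(x1, x2)]
noise = [0.5 * a * b * c for a, b, c in zip(x1, x2, x3)]
toy = {
    "pa_change": x3,       # primary variable: no effect here, but held a priori
    "baseline_pain": x1,   # held a priori, strong effect
    "age": x2,             # no effect: expected to be eliminated
    "bmi": x12,            # real effect: expected to be retained
    "pain_3m": [10 + 4 * a + 2 * b + e for a, b, e in zip(x1, x12, noise)],
}
kept = backward_eliminate(toy, "pain_3m", forced={"pa_change", "baseline_pain"})
```

Here the non-significant `age` term is removed, `bmi` is retained because it is significant, and `pa_change` stays in the model despite having no effect because it is held a priori. An automatic loop of this kind differs from the author-controlled strategy described above, in which each removal is reviewed by the analyst.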
Steps 4 and 5
Once the final models were built, post hoc power calculations for sufficient sample size, model assumption tests and collinearity checks were carried out. Adequate sample size is required in regression modelling both for sufficient power to reject the null hypothesis when it is false (that is, to avoid a “type II error”) and for precise estimates of the regression coefficients of the independent variables (Maxwell, 2000; Sim & Wright, 2000). To paraphrase, power calculations in regression modelling relate to the ability of the model to detect statistically significant variable coefficients when they exist, and ensure that the confidence intervals (i.e. the uncertainty) around them are not too large. Regression models that contain too many independent variables for their sample size/outcome events are considered to be “overfitted” (Hosmer & Lemeshow, 2000). Overfitting is typically characterised by unrealistically large coefficients and/or confidence intervals (Hosmer & Lemeshow, 2000; Menard, 2010). Current literature suggests multiple linear regression models should include around 2 to 15 outcomes per predictor variable to avoid overfitting (Green, 1991; Babyak, 2004; Austin & Steyerberg, 2015). Taking the conservative end of this range (15 outcomes per independent variable) and the 514 fully imputed outcomes in the BEEP dataset, the final model could include up to 34 independent variables (514/15 ≈ 34.3).
Model assumptions were also checked post hoc by: using scatter plots and best-fit lines to check for adequate linearity between independent and dependent variables; using residual versus fitted plots to check for homoscedasticity; and using histograms of residuals to look for a normal distribution (bell shape with mean of zero) (Kutner, 2005; Agresti & Finlay, 2009) and normal-probability plots to check that the residuals follow a normal distribution throughout the range of values of the independent variables (Kutner et al, 2005; Regression diagnostics, UCLA Statistical Consulting Group. From: http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htmref, accessed: August 2015). A further test to investigate and quantify collinearity was subsequently carried out using the “variance inflation factor” (VIF) statistic (O’Brien, 2007). There is some debate in the literature as to what constitutes a cut point for unacceptable collinearity; however, scores over 2.5 were considered as a cut-off for high collinearity (Kutner et al, 2005; O’Brien, 2007; Allison, 2012, When can you safely ignore multicollinearity? From: http://statisticalhorizons.com/multicollinearity, accessed: July 2015).
The final step was to report β coefficients, 95% confidence intervals and statistical significance from the multiple linear regression models and interpret the findings.
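Two of the checks above lend themselves to a short worked sketch: the outcomes-per-predictor arithmetic and the VIF screen against the 2.5 cut-off. The code is illustrative only. It computes each VIF as the corresponding diagonal element of the inverse correlation matrix, which is equivalent to 1/(1 − R²) from regressing that predictor on the others; the variable names and toy values are hypothetical.

```python
from math import sqrt

def max_predictors(n_outcomes, outcomes_per_predictor):
    """Largest whole number of predictors allowed by an outcomes-per-predictor rule."""
    return n_outcomes // outcomes_per_predictor

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def vifs(data):
    """VIF per predictor: diagonal of the inverse of the correlation matrix."""
    names = list(data)
    corr = [[pearson_r(data[a], data[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

# Figures reported in the text: 514 imputed outcomes, 15 outcomes per predictor.
limit = max_predictors(514, 15)

# Hypothetical toy predictors: var_b is largely a rescaling of var_a (r^2 = 0.8),
# while var_c is uncorrelated with both.
toy = {
    "var_a": [-1, -1, -1, -1, 1, 1, 1, 1],
    "var_b": [-3, -3, -1, -1, 1, 1, 3, 3],
    "var_c": [-1, 1, -1, 1, -1, 1, -1, 1],
}
vif = vifs(toy)
flagged = [name for name, value in vif.items() if value > 2.5]
```

With these toy values `var_a` and `var_b` each have a VIF of about 5 and would be flagged at the 2.5 cut-off, whereas `var_c` has a VIF of about 1.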
0) Sensitivity analyses
A number of alternative statistical models were considered for this chapter, some of which were carried out as sensitivity analyses. Sensitivity analyses allow data analysis under more than one set of assumptions or models (Sim & Wright, 2000; Haynes et al, 2006). They can be useful as post hoc tools to further explore or validate primary findings. Within this chapter, sensitivity analyses were used to validate primary findings (for example by using complete case analysis), to explore clinically meaningful categorical independent variables (for example a dichotomous important increase in physical activity or not) and to adjust for different independent variables (for example intervention arm variable and WOMAC pain or function variables).
The first sensitivity analysis (Sensitivity analysis I) investigated complete case analysis. Since missing data were previously assumed to be missing at random (chapter 4, section 4.3.4), it was expected that the complete case analysis would be very similar to the multiply imputed analyses. Any difference may reduce confidence in the primary findings and/or the imputation process, and would warrant further exploration of the imputation process and/or the data.
A subsequent sensitivity analysis was the minimally important change in physical activity model (Sensitivity analysis II). This analysis involved substituting the continuous absolute change in physical activity variable with a dichotomous
“minimal important physical activity increase” variable (categorised into “no” as the reference and “yes” as the alternate category) to see if this predicted clinical
outcome. This model is of interest since error in the measurement of the absolute PASE score may be responsible for modest changes in PASE score (Svege et al, 2012; Bolszak et al, 2014). Minimal important change has been defined in various ways within the literature, but the methods can primarily be split into distribution-based and anchor-based methods (de Vet et al, 2006; Revicki et al, 2006, 2008). Anchor methods require patient input, clinical expert consensus, validated comparative measures (for example, accelerometry) or global ratings of improvement by participants to determine a clinically important amount of change in the measure of interest (Revicki et al, 2008). Distribution methods include calculating half a standard deviation of the baseline measure value (Norman et al, 2003), and using the standard error of measurement (SEM) or the intra-class correlation (ICC), which are described in detail outside this thesis (de Vet et al, 2006, 2010; Polit and Yang, 2015). Since the clinical interpretation of the PASE is not intuitive in its scaling, and there was no available anchor (indeed there is no consensus within the literature as to what constitutes a minimal clinically important change in physical activity for older adults with knee pain), distribution methods were considered.
However, since calculating the ICC and SEM requires repeated measures in the absence of true change in physical activity (and true change did occur due to the BEEP interventions), only the half standard deviation of the baseline mean PASE score described by Norman and colleagues remained an option (this was calculated as 42). In addition, it was deemed appropriate that any important change score should still be larger than measurement error; hence a surrogate minimal detectable change (MDC) score (i.e. change greater than measurement error) of 87, from a sample of older adults with hip pain (Svege et al, 2012), was also used, and the larger of the two numbers was selected as the cut point for minimal important change (i.e. 87).
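The derivation of the cut point and the resulting dichotomous variable can be sketched as follows. This is illustrative only: the function names and the example change scores are hypothetical, the use of the sample standard deviation is an assumption, and so is the choice to treat a change exactly equal to the cut point as "yes", since the text does not specify either.

```python
from statistics import stdev

def half_sd(baseline_scores):
    """Distribution-based estimate: half the standard deviation at baseline.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    return 0.5 * stdev(baseline_scores)

def mic_cut_point(half_sd_value, mdc):
    """Take the larger of the half-SD estimate and the minimal detectable change."""
    return max(half_sd_value, mdc)

def important_increase(change, cut):
    """Dichotomise a PASE change score ('no' is the reference category)."""
    # Assumption: a change exactly equal to the cut point counts as 'yes'.
    return "yes" if change >= cut else "no"

# Values reported in the text: half SD = 42, surrogate MDC = 87.
cut = mic_cut_point(42, 87)

# Hypothetical PASE change scores for four participants.
flags = [important_increase(change, cut) for change in (-20, 45, 87, 130)]
```

With the reported values the cut point is 87, so a change of 45 (above the half-SD estimate but below the MDC) is classified as "no".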
A further sensitivity analysis involved not adjusting for the intervention arm variable (Sensitivity analysis III). As the BEEP trial tested three physical activity interventions, it was considered important in the primary analysis to adjust a priori for the intervention arm, which could potentially confound any relationship between change in physical activity and change in clinical outcome. However, since there was no statistically significant difference in pain and function between the three intervention groups in the BEEP trial (Hay et al, 2015 under review), sensitivity analyses were carried out for each adjusted regression model without controlling for the intervention arm variable to explore whether this altered the model coefficients.
There is some debate amongst epidemiologists and statisticians as to whether to adjust for the same dependent variable or a surrogate marker of baseline clinical severity since this may lead to over-adjustment (Allison, 1990; Croft and Ogollah personal communication 2014). Although this is more likely to be the case if adjusting for independent measures of clinical severity at more than one time point (“autocorrelation” of independent variables) (Kutner, 2005), this uncertainty was
addressed in the final sensitivity analysis (Sensitivity analysis IV) which substituted the WOMAC pain baseline adjustment with WOMAC function adjustment in the Model A pain outcome models and vice versa with the Model B function outcome models. By carrying out this sensitivity analysis it is possible to see if the results are altered by adjusting for an alternative baseline clinical severity variable.
6.4.2 Methods to address objective 3
In order to investigate if clinically important treatment response at three months can be predicted by change in physical activity, univariable unadjusted
associations were explored initially, followed by adjusted multivariable model building using multiple logistic regression.
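For a dichotomous predictor, the unadjusted (univariable) logistic regression coefficient equals the log odds ratio from the corresponding 2×2 table, and its Wald standard error is the square root of the summed reciprocal cell counts. The sketch below uses that equivalence to illustrate a crude association with treatment response. The counts are hypothetical and are not BEEP data.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% confidence interval from a 2x2 table.

    a, b: responders and non-responders among the exposed group
    c, d: responders and non-responders among the reference group
    """
    log_or = log((a * d) / (b * c))        # equals the logistic regression beta
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Wald standard error of log(OR)
    return exp(log_or), exp(log_or - z * se), exp(log_or + z * se)

# Hypothetical counts: exposed = important increase in physical activity ("yes"),
# outcome = OMERACT-OARSI responder at three months.
or_estimate, lower, upper = odds_ratio_ci(a=30, b=20, c=40, d=60)
```

An interval that excludes 1 corresponds to a statistically significant crude association at the 5% level. Continuous predictors and adjusted models require fitting the logistic model itself rather than this shortcut.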
I) Predictor and outcome variables
Predictor variables used for objective 3 were identical to those used for objectives 1 and 2 (section 6.4.1). However, OMERACT-OARSI responder criteria (described in detail in chapter 4, section 4.3.3) (Pham et al, 2003) were used as the dichotomous clinical outcome variable rather than WOMAC pain and function. OMERACT-OARSI response at three months was selected since this time point follows the period of greatest mean change in physical activity (i.e. baseline to three months; see chapter 4, section 4.4.3).
II) Univariable analyses
Logistic regression was used to investigate the relationships between change in physical activity, attitude and beliefs about physical activity, sociodemographic and clinical variables with dichotomous OMERACT-OARSI criteria. Like linear
regression, logistic regression is also a mathematical equation that can be used to describe the relationship between dependent variables and one or more