
6.3.2 ‘Strength of causal inference’

6.3.4 Statistical methods used and reported

Figure 6.1 suggests that articles expressing strong causal inference were less likely to have used statistical methods designed to improve control of adjustable biases. The three categories of method shown, namely multivariable regression, propensity score methods (compared to other multivariable regression methods) and sensitivity analysis, each control for these biases in different though related ways.

When a study used an inactive control (‘no intervention’ or ‘usual care’) as the comparison group, the proportion expressing strong causal inference was around half that of ‘Not strong’ (Figure 6.2). In studies comparing two or more active interventions, the proportions of ‘Not strong’ and ‘Strong’ causal inference were approximately equal.

Also in Figure 6.2, when studies focused on unintended harms or adverse effects of an intervention, such as drug side-effects or long-term health risks, they were less likely to use strong causal language in their conclusions than if they focused on the positive health benefits of an intervention, such as improved symptoms or survival.

The final graphs in Figure 6.3 suggest no link between the ‘strength of causal inference’ and whether authors reported their method of handling missing data. However, articles that had adequately described their methodology, meaning that we judged a clear picture of the methods used could be obtained from their reporting, were less likely to use strong causal language.

An alternative way to compare these proportions is to calculate an odds ratio using univariate logistic regression; Table 6.9 presents odds ratios with corresponding confidence intervals for each of the comparisons in Figure 6.1, Figure 6.2 and Figure 6.3. We also explored a number of possible multivariable models. However, given the considerable uncertainty over the causal structure of the relationships between the variables, we decided that too many possible models existed for their results to be readily interpretable. Some relationships are briefly explored in Table 6.10 and Table 6.12. The exercise included an attempt to create a causal diagram, and the difficulty of doing so led us to two realisations. The first was increased doubt that some of the variables are really causes of the outcome (strong causal language), with the type of software used and whether the methodology was adequately described considered the least likely to be causes. The second was the level of uncertainty over which variables might be causes of other variables, such that they might act as confounders.
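As an illustration of the odds ratio calculation, the following is a minimal sketch rather than the analysis code used for this chapter. It relies on the fact that, for a single binary predictor, the odds ratio from a univariate logistic regression equals the cross-product ratio of the corresponding 2×2 table, with a Wald confidence interval based on the standard error of the log odds ratio. The function name is hypothetical. The counts are obtained by summing the ‘harm to health’ and ‘health benefit’ rows of Table 6.10.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a = group 1 with outcome,    b = group 1 without outcome
    c = group 2 with outcome,    d = group 2 without outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts summed from Table 6.10:
# health benefit: 28 + 53 = 81 'Strong' out of 164 studies
# harm to health: 13 + 21 = 34 'Strong' out of 124 studies
benefit_strong, benefit_n = 81, 164
harm_strong, harm_n = 34, 124

or_est, ci_low, ci_high = odds_ratio_wald(
    benefit_strong, benefit_n - benefit_strong,
    harm_strong, harm_n - harm_strong,
)
print(f"OR = {or_est:.1f} ({ci_low:.1f} - {ci_high:.1f})")
```

To one decimal place this gives 2.6 (1.6 – 4.3), which agrees with the ‘Outcome is improvement in health or health benefit’ row of Table 6.9.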

There was only very weak evidence that the result of the group outcome comparison affected the ‘strength of causal inference’ in study conclusions (Table 6.10). When stratified by the type of outcome there was some difference; however, this largely reflected the difference seen in the second graph of Figure 6.2, where ‘strong’ causal language was much more likely if the outcome was a health benefit than if it was a harm to health.

There appears to be no obvious association between strong causal inference and study size (Table 6.11), while for intervention type (Table 6.12), a difference can be seen between some intervention types, notably drugs, and a number of the others, such as surgery. Table 6.12 also shows the relationship between intervention type and whether an inactive control was used. In most cases, intervention types associated with strong causal language were also less likely to use an inactive control.

No clear difference in ‘strength of causal inference’ is apparent between different author locations, in terms of the continent where they all reside (Table 6.14). However, journals in the categories of Infectious Diseases (60%), Gastroenterology & Hepatology (59%), and Surgery (57%) had the highest proportions of studies with causal inferences rated strong, while Critical Care Medicine (13%), Urology & Nephrology (13%), and Cardiac & Cardiovascular Systems (24%) journals had the lowest proportions (Table 6.13).

Finally, studies that used SAS, Stata or R appeared to use weaker causal language, on average, than studies using SPSS (Table 6.15).

6.3 Results

Table 6.9 Results from logistic regression with outcome: ‘Strong’ causal language
Univariate logistic model results for each variable

Variable (comparison)                                    Odds ratio   95% CI        P

No multivariable method used
  (compared to use of a multivariable method)            2.7          (1.2 – 5.7)   0.012

Multivariable but no propensity score method used
  (compared to use of a propensity score method)         1.8          (1.1 – 3.1)   0.031

No sensitivity analysis performed
  (compared to performing one)                           2.1          (1.3 – 3.4)   0.004

Methodology not adequately described
  (compared to providing adequate description)           1.8          (1.0 – 3.3)   0.045

Comparison group used active control intervention
  (compared to inactive intervention or usual care)      1.8          (1.1 – 2.9)   0.016

Outcome is improvement in health or health benefit
  (compared to a harm to health)                         2.6          (1.6 – 4.3)   <0.001

Group results similar or no difference reported

Table 6.10 Group outcome comparison result and the ‘strength of causal inference’
Percentages relate to row N

                                 N      ‘Strong’ causal language

Overall
  Similar (null result)          92     41 (45%)
  Different                      196    74 (38%)
  Total                          288    P = 0.27

If outcome is harm to health
  Similar                        44     13 (30%)
  Different                      80     21 (26%)
  Total                          124    P = 0.69

If outcome is health benefit
  Similar                        48     28 (58%)
  Different                      116    53 (46%)
  Total                          164    P = 0.14

P values from chi-squared test
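The P values in Table 6.10 can be checked from the counts alone. The sketch below assumes a Pearson chi-squared test without continuity correction on each 2×2 table; the table note says only that a chi-squared test was used, so the choice of variant is an assumption. The function name is hypothetical. With one degree of freedom the statistic is the square of a standard normal variable, so the P value can be obtained from the complementary error function without any external library.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and
    P value (1 degree of freedom) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-squared variable with 1 df:
    # P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# (similar 'Strong', similar N, different 'Strong', different N) from Table 6.10
strata = {
    "Overall": (41, 92, 74, 196),
    "Harm to health": (13, 44, 21, 80),
    "Health benefit": (28, 48, 53, 116),
}

p_values = {}
for label, (sim_strong, sim_n, diff_strong, diff_n) in strata.items():
    stat, p = chi2_2x2(
        sim_strong, sim_n - sim_strong,
        diff_strong, diff_n - diff_strong,
    )
    p_values[label] = p
    print(f"{label}: chi-squared = {stat:.2f}, P = {p:.2f}")
```

Under this assumption the three P values round to 0.27, 0.69 and 0.14, in line with the values reported in the table.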

Table 6.11 Study size and ‘strength of causal inference’
Percentages relate to row N

Study Total Subjects       N (288)    ‘Strong’ causal language

200 – 799                  75         35 (47%)
800 – 4,999                75         27 (36%)
5,000 – 29,999             63         35 (56%)
30,000 – 10,912,834        75         18 (24%)

Table 6.12 Intervention type and ‘strength of causal inference’ plus type of control
Percentages relate to row N; highlighted values: high = magenta, low = blue

Intervention Type               N (288)    Strong causal language    Inactive control

Assisted reproductive tech.     19         10 (53%)                  8 (42%)
Drug                            120        41 (34%)                  76 (63%)
Mix                             15         5 (33%)                   10 (67%)
Other*                          56         17 (30%)                  40 (71%)
Radiation therapy               6          3 (50%)                   2 (33%)
Surgery                         60         31 (52%)                  22 (37%)
Vaccine                         12         8 (67%)                   12 (100%)

* For example, hospital procedures that do not fall under the other intervention types; interventions relating to quality or timing; other health services

Table 6.13 Journal Category and the ‘strength of causal inference’
Percentages relate to row N; highlighted values: high = magenta, low = blue

Journal Category                       N (320†)    Strong causal language

Cardiac & Cardiovascular Systems       17          4 (24%)
Critical Care Medicine                 15          2 (13%)
Gastroenterology & Hepatology          17          10 (59%)
Infectious Diseases                    15          9 (60%)
Medicine, General & Internal           28          7 (25%)
Obstetrics & Gynecology                35          15 (43%)
Other categories                       116         48 (41%)
Peripheral Vascular Disease            19          9 (47%)
Surgery                                42          24 (57%)
Urology & Nephrology                   16          2 (13%)

Table 6.14 Other study features and the ‘strength of causal inference’ Percentages relate to row N

N