Statistical analyses - Material and methods

5. Material and methods

5.4. Statistical analyses

STATA ® software version 11 and 12 was used for all statistical analyses in the thesis. Microsoft Excel ® was used for labelling and preparing of data for export to STATA ® and all ranking of variables for drug utilization (DU 90%) analyses. The Table 6.1 (page 36) gives a summary of statistical methods used in the thesis.

Paper I

The Bland Altman plot. The methodological considerations for appraising the agreement between one measurement unit and another unit, perceived as the “gold standard”, were reviewed by Bland and Altman in a seminal work (1986). ¹³⁸ A general error in statistical analyses is to report the Pearson’s correlation coefficient as a measure of agreement between methods. A very

Table 6.1. Statistical analyses in the thesis, overview according to purpose and paper.

Method Purpose in the study Paper

Bland-Altman statistics Visual (graph) and measure (limits of agreement) of the reliability of one method of measurement, compared to a perceived “gold standard”.

Intraclass correlation Assesses the degree of agreement of two methods of measurement applied on the same sample.

Student-T test (two-sided) Compare continuous variables between groups. II, II.

IV Single linear regression Determines the contribution of one (independent) factor to

a single outcome (dependent) variable.

Pearson correlation coefficients

Investigates correlations between numerical variables. II, III

Analysis of variance (ANOVA)

Evaluation of trends in antibiotic use. III

Kruskal-Wallis rank-sum test Trend analysis if group variances are large, (as determined by the Bartlett’s test for equal variances).

III

Multiple linear regression Determines the contribution of several (independent) factors to a single outcome (dependent) variable.

poor agreement may produce very high correlations, e.g. the agreement between the series 10-20-30-40 and 100-200-300-400 is very poor, but the correlation is 1.0. The Bland Altman method is to plot, in an x-y diagram, the difference in parallel values of the two measurements against their mean value. The term “limits of agreement” was introduced by the authors, defined as the

± standard deviation (SD) of the average difference between the methods (provided these are normally distributed). It will then be a clinical decision to determine whether the values range, which represents the 95% error range by the new method, is clinically relevant or not. In a visual inspection of the plot, one may also determine if the differences are similarly distributed for various ranges of means – that is, if the limits of agreement remain constant or changes significantly for low and high mean values. In our study, the ranges were the percent DDD representing ± one SD of the average mean DDD values.

Intraclass correlation coefficient (ICC). For determining agreement between the two methods for measurement, we also used the ICC. ¹³⁹ Variance in ICC is relying on the pooled variability of the two matched measurements, and not the variability in each group as in Pearson’s correlation.

The objective of the ICC is to determine how much of the total measurement variation that is due to difference between the methods and how much is due to difference in measurement for each method (the “within” variation). The mean square of the differences between the values (MSB) and the mean square within them (MSW) are used for calculation

ܫܥܥ ൌ

^{ெௌ஻ିெௌௐ}

ெௌ஻ାெௌௐ

If the “within” difference MSW is very small, ICC will be close to one. Conversely, if MSB = MSW, ICC will be zero and there is absolutely no agreement between the measures.

Interpretation of ICC as for Pearson’s correlation, with an ICC value of > 0.7 considered to indicate a sufficient agreement between the methods. ICC was calculated with the STATA ® procedure ‘icc’.

Paper II

Linear regression. A linear regression procedure (‘regress’ in STATA) was used to analyse the temporal trends for different groups of antibiotics (total antibiotic use, broad-spectrum antibiotics and all penicillins). The regression gave the regression coefficient with confidence intervals, i.e. the changes in DDDs per year, and the effect size (or coefficient of determination)

which, in linear regression, is interpreted as a correlation coefficient. Lastly, an adjusted R² gave the strength of the relationship for the equation, which is a conservative estimate of how large a part of the variation in antibiotic use could be explained by the time factor.

Paper III

We used Pearson correlation coefficients (STATA ® procedure ‘pwcorr’) for univariate analyses of correlation between numerical variables. The Chi-square test was used for categorical variables in contingency tables. ¹⁴⁰ For differences between group means of normally distributed numerical data, a Student-T test (STATA procedure ‘ttest’) was used, while analysis of variance (STATA procedure ‘anova’) was used for multiple-group analysis. For all analyses, a two-tailed P < 0.05 was considered statistical significant.

For the drug utilization 90% (DU90%) ranking, ¹³³ antibiotic substances and groups were ranked according to their total amount of DDDs (respective haDDDs) over the study period. The list was limited to the antibiotics that, accumulated, fell within 90% of the total antibiotics used.

Paper IV

The Pearson correlation coefficient was calculated as described for Paper III. Multiple linear regression (STATA ® procedure ‘regress’) was used for all 12 regression models with a backward stepwise approach. As a test for collinearity between independent variables, we calculated the variance inflation factor 8’vip’) at each regression step and rejected the variable with the largest vif value until no variable had a vif > 5.¹⁴¹ The coefficient of determination R² was calculated for all regression models. R² may be defined as the proportion of possible perfect prediction represented by the regression model. ¹⁴² Because of a relatively large number of independent variables, the adjusted R square (aR²) was calculated for each regression model to show how well it fitted the data. For all analyses, aR²> 0.3 was considered a strong correlation, i.e. inferring a high degree of causal influence. A two-tailed P-value < 0.05 was set as a limit for statistical significance.

6. Synopsis of the studies

Here, we give the synopsis (abstracts) of the four publications that this thesis is based on. To aid in the interpretation of our findings, an extra Table 6.1 (page 41) for the Paper II and a Figure 6.1 (page 42) for the Paper III supplement the results given in the published articles.

In document Hospital antibiotic use in Norway. Epidemiology and surveillance methodology. (Page 46-49)