Philosophy and Methods
5.5 Design and Execution of this Study
5.5.6 Tabulation and Reporting
I used binomial tests and Pearson’s chi-square tests (often referred to simply as chi-square tests) to analyse the hypotheses that involved frequency data.
Binomial tests determine the exact statistical significance of deviations from theoretically expected distributions in cases when there are only two categories. Chi-square tests are commonly used as an equivalent to binomial tests when there are more than two categories. Put more succinctly, binomial and chi-square tests allow researchers to determine whether frequencies of occurrences across categories deviate from randomness to a statistically significant extent (e.g., in the case of two categories, whether the observed frequencies differ significantly from a 50:50 or chance level).
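The logic of an exact binomial test against a 50:50 chance level can be sketched in a few lines of Python. This is an illustration only, not the Excel calculator used in this study; the function name `binomial_two_sided_p` and the example counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p value.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under the chance proportion p.
    """
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # A small relative tolerance guards against floating-point near-ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


# Hypothetical counts, not data from this study:
# 15 of 20 articles fall into the first of two categories.
print(round(binomial_two_sided_p(15, 20), 4))  # 0.0414
```

With these illustrative counts the p value falls just below .05, so the split would be treated as differing significantly from chance.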
Like all statistical procedures, binomial tests and chi-square tests rely on certain assumptions. In the case of binomial tests, the main assumption is that the variable is dichotomous, with two values that are mutually exclusive and exhaustive in all cases (i.e., that there are two categories of frequency data that, if random, would be evenly balanced). Binomial tests also rely on an assumption of independence of observations. This means that each observation must appear in only one category and, in this study, meant that in some cases overlapping articles needed to be removed from the analysis. As is the case with all tests of significance, binomial tests also assume that random sampling has occurred. Chi-square tests share with binomial tests the assumptions of random sampling and independence of observations. In addition, adequate cell sizes are assumed. A common application of this principle, which was applied in this analysis, is that at least 80% of expected cell frequencies must be greater than five and all expected frequencies must be equal to or greater than one. Both binomial test and chi-square test calculators are widely available online and in statistics packages. For the purposes of this study I used an Excel binomial test calculator and conducted chi-square calculations using an online calculator (Preacher, 2001).
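The cell-size rule and the chi-square statistic itself can be illustrated as follows. This is a minimal sketch, not the online calculator used in this study; the function names and the counts are hypothetical, and the expected frequencies assume an equal split across categories under chance.

```python
def expected_counts_ok(expected):
    """Cell-size rule described above: at least 80% of expected
    frequencies greater than five, and none below one."""
    share_above_five = sum(e > 5 for e in expected) / len(expected)
    return share_above_five >= 0.8 and min(expected) >= 1


def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum of (O - E)^2 / E across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, not data from this study.
observed = [30, 14, 16]
expected = [sum(observed) / len(observed)] * len(observed)  # equal split

# Critical value of chi-square for 2 degrees of freedom at the .05 level.
CRITICAL_05_DF2 = 5.991

if expected_counts_ok(expected):
    statistic = chi_square_statistic(observed, expected)
    print(round(statistic, 2), statistic > CRITICAL_05_DF2)
```

With three categories there are two degrees of freedom, so a statistic above 5.991 corresponds to p < .05.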
Both binomial and chi-square tests produce significance values expressed as p. The smaller the p value, the more statistically significant the finding is.
Statisticians commonly accept a probability or p value of less than .05 as indicative of significance (GraphPad) and this value is routinely identified as an appropriate minimum significance level in statistics textbooks (e.g. Mendenhall et al., 2009). Put simply, this means that, if chance alone were operating, a result at least as extreme as the one observed would be expected to occur less than 5% of the time. Clearly, the more tests one conducts the more likely it
becomes that one or more of the findings will be due to chance. This is particularly the case if the p values obtained are close to the cut-off threshold of .05. The Bonferroni correction, and modifications thereof, have been proposed as means of “correcting” for the increased likelihood of chance playing a role when multiple tests are conducted (e.g. Simes, 1986). The application of this adjustment requires that the chosen significance level (e.g., .05) be divided by the number of tests performed to produce a lower significance threshold. Although commonly applied, the Bonferroni correction has been the subject of detailed criticism (e.g. Perneger, 1998). The most obvious problem with the procedure is that it implies that comparisons should be interpreted differently depending on how many other tests have been conducted. For example, a single test might or might not be considered significant depending on whether it was conducted alone or in a study involving multiple tests. This does not seem logical and, if Bonferroni corrections were universally applied, would make the comparison of results extremely difficult. In this study I applied the traditional threshold for significance of p < .05 and did not apply Bonferroni corrections because of the shortcomings mentioned above.
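The arithmetic behind the multiple-testing concern and the Bonferroni adjustment can be shown briefly. The sketch below is illustrative only: the figure of ten tests is hypothetical, and the chance of at least one false positive is calculated on the simplifying assumption that the tests are independent.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Significance level divided by the number of tests performed."""
    return alpha / n_tests


def chance_of_any_false_positive(alpha: float, n_tests: int) -> float:
    """Probability that at least one of n independent tests falls below
    alpha by chance alone (independence is an assumption here)."""
    return 1 - (1 - alpha) ** n_tests


# Hypothetical number of tests, not the number conducted in this study.
n_tests = 10
print(round(bonferroni_threshold(0.05, n_tests), 4))
print(round(chance_of_any_false_positive(0.05, n_tests), 4))
```

With ten tests the corrected threshold drops to .005, which shows how the same p value (say, .02) would count as significant when a test is run alone but not when it is one of ten.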
As already described, I considered three measures of article prominence in this study: appearance of an article on page 1, designation of an article as an editorial, and length of article. As outlined in Table 6.1 in Chapter 6, there was a very small number of relevant instances of the first two of these criteria (six articles in total). Hence I chose to disregard both appearance of an article on the front page and designation of an article as an editorial in the final analysis, but I considered length of article in relation to each hypothesis for which a significant finding was obtained. The article length data concerned continuous entities (i.e. interval
measurement) rather than frequencies within categories (i.e. nominal measurement). As such, the data were not directly amenable to analysis using binomial or chi-square tests. Initially, I considered the use of analysis of variance (ANOVA). However, on exploratory testing it became apparent that key assumptions of ANOVA, including adequate sample size and a normal distribution of data, were not met. Consequently, I chose two non-parametric alternatives: the Wilcoxon-Mann-Whitney test and the Kruskal-Wallis One-Way Analysis of Variance by Ranks. These can be considered to fulfil a similar purpose to ANOVA, but they use the ranks of data rather than their values. Importantly, they also do not assume a normal distribution. Both the Wilcoxon-Mann-Whitney test and the Kruskal-Wallis One-Way Analysis of Variance by Ranks serve to establish whether the values or cases in different groups differ significantly from each other. Sample values almost invariably differ somewhat and these tests assess whether the differences signify genuine population differences or whether they merely represent the type of variations that are to be expected among random samples from the same population (Siegel and Castellan, 1988). The main difference between the two tests is that whereas the Wilcoxon-Mann-Whitney test is suitable for two groups only, the Kruskal-Wallis test can be used with three or more groups. For the purposes of this study all Wilcoxon-Mann-Whitney calculations were conducted by hand using the appropriate formula and all Kruskal-Wallis calculations were conducted using an online calculator (McDonald, 2009).
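The way both tests work on ranks rather than raw values can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the hand calculations or the online calculator used in this study: the function names and article lengths are hypothetical, and the Kruskal-Wallis statistic is computed without a correction for ties.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are zero-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Smaller of the two U statistics for two independent groups."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    return min(u_a, n_a * n_b - u_a)


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum**2 / len(group)
        start += len(group)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical article lengths in words, not data from this study.
print(mann_whitney_u([300, 450, 520], [610, 700, 880, 940]))
print(round(kruskal_wallis_h([310, 420, 500], [560, 640, 700], [810, 900, 990]), 2))
```

The resulting U or H value is then compared against tabulated critical values (or, for H, the chi-square distribution with one fewer degrees of freedom than the number of groups) to obtain a p value.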
As already noted, this study covers a 16-year period from 1994 to 2009. To enable longitudinal trends to be described, I divided the data into three time periods: 1994 to 1999, 2000 to 2004 and 2005 to 2009. I based analyses that considered changes across time on these divisions and compared them against the total number of Irish Times articles published during each of those time periods. I used Nexis UK to calculate these totals, which were as follows: 364,049 articles from 1994 to 1999; 297,816 articles from 2000 to 2004; and 267,247 articles from 2005 to 2009.
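One way of comparing counts against these totals is to derive the frequencies that would be expected in each period if relevant articles simply tracked overall publication volume. The sketch below assumes that interpretation; the function name and the figure of 100 relevant articles are hypothetical, while the period totals are those reported above.

```python
# Irish Times article totals per period, as reported above.
period_totals = {
    "1994-1999": 364_049,
    "2000-2004": 297_816,
    "2005-2009": 267_247,
}


def expected_by_period(n_relevant, totals):
    """Split n_relevant articles across periods in proportion to the
    total number of articles published in each period."""
    grand_total = sum(totals.values())
    return {
        period: n_relevant * count / grand_total
        for period, count in totals.items()
    }


# Hypothetical number of relevant articles, not a figure from this study.
expected = expected_by_period(100, period_totals)
for period, value in expected.items():
    print(period, round(value, 1))
```

Expected frequencies derived in this way could then serve as the comparison values in a chi-square test of change across time.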