CHAPTER H LITERATURE REVIEW
3.2 METHODOLOGICAL ISSUES 1 EXPERIMENTAL DESIGN
3.2.2 STATISTICAL APPROACH AND ANALYSIS
Each of the experiments to be described in the current thesis has one independent variable (IV): the allocation of Ss to dysphoric and nondysphoric groups. The two groups were then compared on at least one dependent variable (DV). The DVs were one or more measures of performance on a reasoning task. Where only one IV was measured, the comparison could be carried out using univariate statistics (e.g. t-test). However, the complexity of many of the experimental designs meant the most important and revealing comparisons involved two or more IVs, thereby necessitating the use of multivariate procedures (e.g. repeated measures analysis of variance; ANOVA). For example, many of the experimental designs employed in the present series of experiments involved the comparison of two groups on a task carried out in two experimental conditions. Furthermore, within each condition there might be two or more measures, either one variable measured at different stages of the task (e.g. number of correct responses after the first, second and third feedback), or measures taken at different levels of the same variable (e.g. number of correct responses on problems with one, two or three premises).
When selecting a test to analyse a data-set, certain issues need to be addressed. The aim of analysis is usually to get as close as possible to the 'truth' as revealed by the data. This is formalised by setting up an experimental hypothesis, which predicts a difference in the DVs as a result of the experimental manipulation of the IV(s). The null hypothesis is also set up, which predicts there will be no effect of the experimental manipulation. When selecting a statistical test, two possible kinds of error are of concern, known as Type I and Type II errors. A Type I error occurs when the experimental hypothesis is accepted and the null hypothesis is rejected, even though the null hypothesis is true. On the other hand, a Type II error involves not rejecting a null hypothesis that is in fact false. The selection of a statistical test is influenced in part by the need to achieve a balance between the risks of making these two errors. That is, the aim is to identify an experimental effect, but only if it really exists.
Statistical tests are often divided into two types: parametric and nonparametric. Parametric tests require the data to meet more stringent assumptions than do nonparametric tests. When carrying out data analysis, parametric tests are often preferred to nonparametric tests because of their greater power. The 'power' of a test refers to the probability of correctly rejecting a false null hypothesis and accepting the experimental hypothesis. In general, when the assumptions of a parametric test are met, the nonparametric test requires more observations than the parametric test for the same level of power. Thus, for a given set of data, the parametric test is more likely to lead to rejection of a false null hypothesis than is the corresponding nonparametric test. For this reason, in the current study, parametric tests were adopted whenever appropriate.
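The two kinds of error and the notion of power can be stated compactly in probability notation. The symbols alpha and beta are the conventional ones and are introduced here only for illustration; H_0 denotes the null hypothesis.

```latex
\begin{align*}
\alpha &= P(\text{reject } H_0 \mid H_0 \text{ true}) && \text{(Type I error rate)} \\
\beta &= P(\text{do not reject } H_0 \mid H_0 \text{ false}) && \text{(Type II error rate)} \\
\text{power} &= 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ false})
\end{align*}
```

For a fixed sample size, lowering alpha tends to raise beta, which is the balance referred to above.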
As mentioned above, parametric tests require the data to meet certain assumptions. The parametric tests used in the current study, analysis of variance (ANOVA) and the t-test, make two basic assumptions. The first assumption is that the populations from which the samples were taken are normally distributed. The second assumption is that the samples are drawn from populations of equal variances. This is known as the homogeneity of variance assumption. (The variance is a measure of the dispersion of a distribution, and is calculated by summing the squared deviations of each observation from the mean of the distribution and dividing by the number of observations minus one; for more detail see e.g. Howell, 1987, p. 39.) Although these two assumptions are theoretical requirements of parametric tests, in practice the tests are robust to violations of these assumptions, provided they are not too extreme. The use of samples that are relatively large and equal in size offers protection against the effects of any violations. For example, with regard to homogeneity of variance, the general conclusion is that provided sample sizes are equal, violation of the assumption of homogeneity produces very small effects (Howell, 1987, p. 179). In the current study, the use of student samples had the advantage that Ss were available in relatively large numbers, and it was therefore possible to achieve adequate sample sizes with equal numbers in each group. It was therefore assumed that provided the data passed the checks described below, it was safe to proceed with parametric analysis on the basis that any violation of these two assumptions was unlikely to lead to serious problems.
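As an illustration of the variance and of an informal look at homogeneity, the sketch below computes the sample variance of two equal-sized groups and the ratio of the larger variance to the smaller. The scores and the function name `variance_ratio` are hypothetical, and the ratio is offered only as a descriptive aid; it is not a procedure reported in the current study.

```python
from statistics import variance  # sample variance: squared deviations / (N - 1)


def variance_ratio(group_a, group_b):
    """Return the larger sample variance divided by the smaller one."""
    var_a = variance(group_a)
    var_b = variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical scores for two equal-sized groups (not data from the thesis).
dysphoric = [4, 6, 5, 7, 8]
nondysphoric = [5, 5, 6, 7, 7]

print(variance(dysphoric))                       # variance of the first group
print(variance(nondysphoric))                    # variance of the second group
print(variance_ratio(dysphoric, nondysphoric))   # larger / smaller variance
```

A ratio close to 1 suggests similar dispersion in the two groups; how large a ratio can be tolerated is a matter of judgement and depends on the sample sizes.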
While noting parametric tests are generally robust to violations of their underlying assumptions, Tabachnick and Fidell (1983; 1989) have made recommendations about issues that need to be considered before proceeding with multivariate parametric tests. Many of these are also relevant to other parametric tests. Therefore, before carrying out data analysis, the following issues were considered, and remedial steps taken where necessary. If it was found that a particular data-set was not appropriate for parametric analysis, then an alternative was sought. This is also outlined below.
Tabachnick and Fidell (1983; 1989) identified unequal sample size and missing data as potential problems when attempting multivariate analyses. Fortunately, in the current study sample sizes were equal in each experiment, and there were no missing data.
As noted above, parametric tests make the assumption that the populations from which the samples were drawn are normally distributed. In multivariate tests, this becomes an assumption of multivariate normality, implying that the sampling distributions of the mean of the various DVs in each cell and all linear combinations of them are normally distributed. The sampling distribution of the mean is the distribution of values that would be obtained for that statistic if an infinite number of samples were drawn from the population in question and the mean was calculated for each sample. All the important information about the sampling distribution of the mean is summed up by the Central Limit Theorem. In its simplest form, this states that the sampling distribution of the mean approaches normal as N, the sample size, increases. With univariate F and large samples, the Central Limit Theorem suggests the sampling distribution of the mean approaches normality even when the raw data do not. Tabachnick and Fidell (1989) note univariate F is robust to modest violations of normality as long as the violations are not due to outliers (see below).
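The behaviour described by the Central Limit Theorem can be illustrated with a small simulation. The sketch below is only a demonstration under stated assumptions: the exponential population, the sample size of 30, the number of replications and the fixed seed are all arbitrary choices, and skewness is computed with a simple moment-based formula.

```python
import random
from statistics import mean


def skewness(values):
    """Moment-based skewness: third central moment over the cubed SD."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# A strongly skewed "population": exponential scores (illustrative choice).
raw_scores = [rng.expovariate(1.0) for _ in range(5000)]

# Means of 2000 samples of size N = 30 drawn from the same population.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(30)) for _ in range(2000)
]

raw_skew = skewness(raw_scores)
mean_skew = skewness(sample_means)
print(raw_skew, mean_skew)  # the means should be far less skewed than the raw scores
```

The skewness of the sample means should be much closer to zero than that of the raw scores, and it shrinks further as the sample size is increased.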
Tabachnick and Fidell (1983; 1989) note multivariate analyses assume linear relationships among all pairs of DVs, and deviations from linearity reduce the power of the test. The assumption of linearity is that the relationship between two variables, between one variable and a combination of others, or between combinations of variables from each of two sets can be described using a straight line. Normal distribution of each DV increases the chances of a linear relationship. The only way to establish that a linear relationship exists is to plot each pair of DVs on a bivariate scatterplot, and then make a subjective judgement about their relationship. With a large number of variables this is both time-consuming and likely to be inexact. Therefore in the current study, each variable was inspected for its normality by screening for outliers, kurtosis and skewness (see below), and either taking steps to achieve a normal distribution or by using alternative nonparametric analyses. Homoscedasticity is the assumption that the variability in scores on one variable is roughly the same at all values of the other variable. This assumption is met when both variables have a normal distribution, and therefore the steps taken to ensure linearity (screening for outliers, skewness and kurtosis) should also ensure the data meet the assumption of homoscedasticity, or that any data-set failing this assumption will be identified, and appropriate steps taken (see below).
On the basis of the recommendations put forward by Tabachnick and Fidell (1983), the following procedure was adopted in the current study for each data-set. The data were first inspected for unequal sample sizes and missing data; neither was found on any occasion in the current study. Next, each variable was inspected for normality of distribution. The following procedures were carried out to identify and deal with outliers, skewness, and kurtosis, all of which can lead to a failure of normality in a variable, and have deleterious effects on the robustness of parametric tests.
The t-test and ANOVA are sensitive to outliers. Outliers are cases with such extreme values on a variable that they unduly influence statistics. They can lead to both Type I and Type II errors, with no clue as to which has occurred, and they lead to results that do not generalise because they are overly determined by the outlier(s). In the current study, outliers were detected by converting each variable to standardised scores, and identifying any cases that had standardised scores in excess of +/-3.00. The influence of any outliers was reduced either by transforming the data as described below, or by replacing the raw score of the outlier with a score one unit more extreme than that of the next most extreme case in the distribution, as recommended by Tabachnick and Fidell (1983). This process was carried out separately for each group.
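The outlier procedure can be sketched in a few lines. This is a minimal illustration rather than the routine actually used: the function name `replace_outliers` and the scores are hypothetical, the sample standard deviation (N - 1) is assumed for standardisation, and low outliers are assumed to be handled symmetrically by subtracting one unit.

```python
from statistics import mean, stdev


def replace_outliers(scores, cutoff=3.0):
    """Pull in cases whose standardised score exceeds +/-cutoff.

    Each outlier is replaced by a score one unit more extreme than the
    most extreme non-outlying case on the same side of the distribution.
    """
    m, sd = mean(scores), stdev(scores)
    z = [(x - m) / sd for x in scores]
    inliers = [x for x, zx in zip(scores, z) if abs(zx) <= cutoff]
    high, low = max(inliers), min(inliers)
    cleaned = []
    for x, zx in zip(scores, z):
        if zx > cutoff:
            cleaned.append(high + 1)   # high outlier: next highest score plus 1
        elif zx < -cutoff:
            cleaned.append(low - 1)    # low outlier: next lowest score minus 1
        else:
            cleaned.append(x)
    return cleaned


# Hypothetical scores for one group; the last case is an extreme value.
group_scores = [10, 11, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 16, 17, 40]
print(replace_outliers(group_scores))
```

In keeping with the procedure described above, the function would be applied to each group separately rather than to the pooled scores.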
Skewness has to do with the symmetry of a distribution in that a skewed variable is one whose mean is not in the centre of the distribution. The skewness of each variable in the current study was examined using the equation recommended by Tabachnick and Fidell (1983) to compare it with the standard error for skewness and test whether it differed significantly from the value expected for a normal distribution (zero). The standard error for skewness was calculated using the equation:
$s_s = \sqrt{6/N}$
where N is the number of cases. The probability of obtaining a skewness value of the size given if the data came from a normal distribution was then calculated using the z distribution, where:
$z = (S - 0)/s_s$
where S is the value reported for skewness. At the 1% level, a z value in excess of +/-2.58 would lead to rejection of the assumption of normality. The acceptable limit for skewness was therefore:
$S = \pm 2.58 \times s_s$
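A worked example may make the calculation concrete. The figures are hypothetical and chosen for easy arithmetic: a variable measured on N = 96 cases with a reported skewness of S = 0.80.

```latex
\begin{align*}
s_s &= \sqrt{6/N} = \sqrt{6/96} = 0.25 \\
z &= \frac{S - 0}{s_s} = \frac{0.80}{0.25} = 3.20 \\
S_{\text{crit}} &= \pm 2.58 \times s_s = \pm 2.58 \times 0.25 = \pm 0.645
\end{align*}
```

Since 3.20 exceeds 2.58 (equivalently, 0.80 exceeds 0.645), such a variable would be treated as significantly skewed at the 1% level.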
Kurtosis has to do with the peakedness of a distribution: a distribution is either too peaked (with too few cases in the tails) or too flat (with too many cases in the tails). The kurtosis of each variable was examined using the equation recommended by Tabachnick and Fidell (1983) to compare it with the standard error for kurtosis and test whether it differed significantly from the value expected for a normal distribution (zero). The standard error for kurtosis was calculated using the equation:
$s_k = \sqrt{24/N}$
where N is the number of cases. The probability of obtaining a kurtosis value of the size given if the data came from a normal distribution was then calculated using the z distribution, where:
$z = (K - 0)/s_k$
where K is the value reported for kurtosis. At the 1% level, a z value in excess of +/-2.58 would lead to rejection of the assumption of normality. The acceptable limit for kurtosis was therefore:
$K = \pm 2.58 \times s_k$
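The skewness and kurtosis checks can be combined into a single screening step. The sketch below assumes the skewness and kurtosis values have already been obtained from a statistics package; the function names and the example values are hypothetical.

```python
from math import sqrt


def z_for_shape(statistic, standard_error):
    """z = (statistic - 0) / standard error, i.e. a test against zero."""
    return (statistic - 0.0) / standard_error


def screen_variable(skew, kurt, n, critical_z=2.58):
    """Return the two z values and whether both fall within +/-critical_z."""
    se_skew = sqrt(6.0 / n)    # standard error for skewness
    se_kurt = sqrt(24.0 / n)   # standard error for kurtosis
    z_skew = z_for_shape(skew, se_skew)
    z_kurt = z_for_shape(kurt, se_kurt)
    acceptable = abs(z_skew) <= critical_z and abs(z_kurt) <= critical_z
    return {"z_skew": z_skew, "z_kurt": z_kurt, "acceptable": acceptable}


# Hypothetical variable: skewness 0.40 and kurtosis 1.50 on 96 cases.
result = screen_variable(skew=0.40, kurt=1.50, n=96)
print(result)
```

In this example the skewness is within the acceptable range but the kurtosis is not, so the variable as a whole would be flagged for remedial steps.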
When a variable was found to have a skewness or kurtosis value in excess of the calculated acceptable level, steps were taken to reduce it. As described above, all the variables were subjected to a test for the presence of outliers, and treating them as described above frequently reduced skewness and kurtosis to an acceptable level. If skewness or kurtosis remained, then an appropriate transformation of the data was performed. Different transformations can be carried out, and they vary in their strength and effect. In each case, the transformation resulting in skewness and kurtosis values closest to zero was selected. Tabachnick and Fidell (1983) discuss the transformations most likely to correct positive and negative skewness of different degrees. The most common transformation for positive skewness is either a square root or logarithmic transformation, depending on the severity of the skewness, although stronger transformations are possible. For negative skewness, Tabachnick and Fidell (1983) recommend a "reflect" strategy. This involves subtracting each score from a constant equal to the largest score in the distribution plus 1, thus converting a variable with negative skewness to one with positive skewness, followed by the application of an appropriate transformation for positive skewness.
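The selection among transformations can be sketched as follows. This is a simplified illustration under several assumptions: only skewness (not kurtosis) is used as the selection criterion, skewness is computed with a simple moment-based formula, all scores are assumed to be greater than zero so that the logarithm is defined, and the function names and scores are hypothetical.

```python
from math import log10, sqrt
from statistics import mean


def skewness(values):
    """Moment-based skewness: third central moment over the cubed SD."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def reflect(values):
    """Subtract each score from (largest score + 1)."""
    constant = max(values) + 1
    return [constant - v for v in values]


def best_transformation(values):
    """Pick the candidate whose skewness is closest to zero."""
    candidates = {
        "none": list(values),
        "square root": [sqrt(v) for v in values],
        "logarithm": [log10(v) for v in values],
    }
    name = min(candidates, key=lambda k: abs(skewness(candidates[k])))
    return name, candidates[name]


# Hypothetical negatively skewed scores: reflect first, then transform.
scores = [1, 8, 9, 10, 10]
reflected = reflect(scores)
print(reflected)
print(best_transformation(reflected)[0])
```

Reflection reverses the direction of the scale, so the interpretation of high and low scores must be reversed accordingly after the transformation.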
Violation of the assumptions underlying parametric tests can result in misleading conclusions with regard to the significance of the results. Therefore, when transformation of the data failed to reduce skewness and/or kurtosis to an acceptable level, an appropriate nonparametric comparison was carried out. However, there are data sets for which nonparametric analyses are not yet available, for example, when a comparison of two independent groups on two or more dependent variables is required. In these cases, repeated measures ANOVA was employed, and to check that this approach was not leading to false conclusions about the data, the analysis was repeated as far as possible with nonparametric tests.
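The text does not name the nonparametric tests that were used. As one plausible example, the sketch below computes the Mann-Whitney U statistic, a common nonparametric counterpart to the independent-samples t-test; the choice of test, the function name and the scores are assumptions made for illustration only.

```python
def mann_whitney_u(group_a, group_b):
    """U for group_a: number of (a, b) pairs with a > b, ties counting 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical scores for two independent groups.
group_1 = [7, 8, 9]
group_2 = [1, 2, 3]

u_1 = mann_whitney_u(group_1, group_2)
u_2 = mann_whitney_u(group_2, group_1)
print(u_1, u_2)  # the two U values always sum to len(group_1) * len(group_2)
```

The statistic alone does not give a significance level; that would be obtained from tables or from a statistics package.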
In line with much of the data collected in the course of psychological research, some of the variables in the current study were discrete rather than continuous. In particular, a number of variables were proportional in nature, such as 'number of errors out of ten.' Data of this type may not fit a normal distribution, and may be closer to a binomial distribution. The optimal method for dealing with data of this kind would therefore be statistical tests designed for a binomial distribution. However, there was no binomial test available to perform multivariate analyses. The alternative solution, which is commonly applied, is to transform the data. The arcsine transformation has been recommended for proportions (Winer, 1971), and this was adopted in the current study. Variables were tested for skewness as described above. Where skewness was above the 1% level, the data were transformed using the formula recommended by Winer (1971). The raw scores were first converted to proportions, and these were transformed using the equation:
$2 \arcsin \sqrt{p}$
where p is the proportion.
If this transformation failed to reduce skewness to an acceptable level then nonparametric analyses were the most appropriate solution, except in situations where a multivariate analysis was required, as described above.
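The arcsine transformation can be sketched as below, assuming the standard form in which twice the arcsine of the square root of the proportion is taken (in radians). The function name and the 'out of ten' example are illustrative.

```python
from math import asin, pi, sqrt


def arcsine_transform(raw_score, maximum):
    """Convert a raw score to a proportion, then apply 2 * arcsin(sqrt(p))."""
    proportion = raw_score / maximum
    return 2.0 * asin(sqrt(proportion))


# Hypothetical 'number of errors out of ten' scores.
errors = [0, 2, 5, 8, 10]
transformed = [arcsine_transform(e, 10) for e in errors]
print(transformed)  # values range from 0 to pi radians
```

The transformation stretches the ends of the proportion scale relative to the middle, which is where the variance of a binomial proportion is smallest.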
When ANOVA is carried out with both between- and within-subject variables, there is an additional assumption that must be met, known as sphericity. For a full discussion of this concept see Winer (1971) or Greenhouse and Geisser (1959). When the sphericity assumption was violated, degrees of freedom were adjusted using the Greenhouse-Geisser adjustment (Greenhouse & Geisser, 1959).
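In practice the adjustment is produced by the statistics package, but the idea can be sketched directly. The code below computes the epsilon estimate from the covariance matrix of the repeated measures and multiplies both degrees of freedom by it. The function names and the example matrices are hypothetical, and the sketch assumes a single covariance matrix for the k within-subject levels.

```python
def greenhouse_geisser_epsilon(cov):
    """Epsilon estimate from a k x k covariance matrix (list of lists)."""
    k = len(cov)
    grand_mean = sum(sum(row) for row in cov) / (k * k)
    diag_mean = sum(cov[i][i] for i in range(k)) / k
    row_means = [sum(row) / k for row in cov]
    sum_sq = sum(v * v for row in cov for v in row)
    numerator = (k * (diag_mean - grand_mean)) ** 2
    denominator = (k - 1) * (
        sum_sq - 2 * k * sum(r * r for r in row_means) + (k * grand_mean) ** 2
    )
    return numerator / denominator


def adjusted_df(epsilon, df_effect, df_error):
    """Multiply both degrees of freedom by epsilon."""
    return epsilon * df_effect, epsilon * df_error


# Equal variances and equal covariances: sphericity holds, epsilon is 1.
spherical = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
# One level far more variable than the others: epsilon falls below 1.
nonspherical = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 4.0]]

print(greenhouse_geisser_epsilon(spherical))
print(greenhouse_geisser_epsilon(nonspherical))
print(adjusted_df(greenhouse_geisser_epsilon(nonspherical), 2, 60))
```

Epsilon ranges from 1/(k - 1) to 1, so the adjustment can only reduce the degrees of freedom and thereby makes the F test more conservative.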
The level of significance was set at 0.05 throughout. When post-hoc tests were carried out to explore interactions, the level of significance was adjusted by dividing 0.05 by the number of post-hoc tests. All analyses were carried out using the Statistical Package for the Social Sciences (SPSS).
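The adjustment of the significance level for post-hoc tests amounts to a simple division, sketched below. The function name is hypothetical and the number of tests is chosen only for illustration.

```python
def adjusted_alpha(n_tests, alpha=0.05):
    """Divide the overall significance level by the number of post-hoc tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# With five post-hoc comparisons, each is tested at the 0.01 level.
print(adjusted_alpha(5))
```

Each individual post-hoc comparison is then judged against the adjusted level rather than against 0.05.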
CHAPTER IV