
Phase one analysis concerns visual and statistical interpretation of time-series data. Hence, this phase will begin with a visual inspection of the charted time-series, to look for any obvious trends, and any ‘movement’ in the time-series for both groups, particularly in the pre-programme period (Glass, 1997). The initial focus, then, will be upon DPs one through to seven, prior to programme delivery.

In figure 3.4, Ramsay et al. (2003) depict the possible outcomes that can occur in interrupted time-series designs. Researchers look for these visual indicators of a programme effect, in the first instance. This highlights why it is beneficial to have a number of pre-programme measures.

Prior to the interruption point, for example, the researcher would be looking for ‘stationarity’ in the time-series, by making a judgement about its slope and level. Stationarity means that both the level and slope remain linear, without major fluctuations (Glass, 1997). This would indicate that unidentified mechanisms were not influencing the students’ attitudes, prior to programme delivery, and that pre-test sensitisation to the measurement instrument was unlikely to have occurred.
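The judgement about slope and level in the pre-programme period can be supported by a simple numerical summary. What follows is a minimal sketch only, not the procedure used in this study: it fits a straight line to the mean scores at DPs one to seven and reports the slope, the level and the range. The scores and the function name `fit_line` are hypothetical, introduced purely for illustration.

```python
from statistics import mean

def fit_line(scores):
    """Least-squares line through (1, s1), (2, s2), ...; returns (slope, intercept)."""
    n = len(scores)
    xs = range(1, n + 1)          # DP numbers 1..n
    x_bar = mean(xs)
    y_bar = mean(scores)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical mean attitude scores at DPs one to seven (illustrative only).
pre_programme = [92.0, 93.5, 91.0, 92.5, 93.0, 91.5, 92.5]

slope, intercept = fit_line(pre_programme)
spread = max(pre_programme) - min(pre_programme)
print(f"slope per DP: {slope:.3f}")
print(f"fitted level at DP one: {slope + intercept:.2f}")
print(f"range of scores: {spread:.1f}")
```

A slope close to zero, together with a narrow range, would be consistent with a stationary pre-programme series; what counts as 'close' remains a matter of judgement.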

If a shift in the time-series, pre-programme, is observed, an exploration of historical influences occurring around that time may be required, to look for reasons why the shift happened. As Glass (1997, p. 6) goes on to say, graphical representations of time-series data can ‘change level and direction for many reasons’, hence the particular attention paid within the study design to ruling out all other possible explanations for observed shifts in the time-series. The task here is to separate real programme effects from other observed trends in the time-series for each group.

Once this visual inspection is complete, the time-series data will be statistically analysed. The appropriateness and utility of statistical analyses within critical realism have been questioned (Bhaskar, 1998). However, as Lawson (1997) notes, their potential within critical realist research should not be discounted. They may be useful here in detecting even small programme effects, not discernible upon visual inspection of the time-series, so statistical analyses were conducted. Pratschke (2003, p. 21) proposed that critical realists object to statistical modelling because ‘theories are typically expressed in linguistic form, [whilst] statistical models necessarily take a mathematical form’, suggesting that it may be difficult to reconcile the two positions.

Lawson (2001, p. 59), cited by Pratschke (2003), provides useful guidance here, positing that the use of statistical analysis should not be dismissed entirely within critical realist research, as it may be appropriate in certain circumstances. It was not possible to obtain a copy of Lawson’s article within the UK, so further detail cannot be obtained about how he suggests the critical realist researcher ought to proceed. In terms of the research hypothesis here, however, it seems sensible to conduct suitable statistical analyses of the PG scores, in particular, to ascertain whether an effect had been demonstrated, and of the PG and CG scores together, to ascertain whether there are differences between them. Brown (2009) also asserts that there is an expectation to see analyses of this kind, as they are viewed by stakeholders as giving credence and rigour to study findings that claim critical realism as their base. However, it has been noted that the more statistical analyses a researcher conducts, the more likely they are to find significance somewhere within their data (Glass, 1997). In order to avoid this kind of ‘fishing’, the intended analyses were decided at the design phase. But, in order to be able to carry these out, a judgement needed to be made about the level of the data generated by MAST. The importance of this is outlined in the next section.

3:9:1:1 Levels of measurement

The level of data refers to the numerical measurement of the variables of interest in a research study. There are generally four levels of data, which describe the nature of the variables, as follows:

• Nominal Level: Describes categorical variables, such as gender, or artificial categories, as in this study, such as the PG and CG;

• Ordinal Level: Describes the order or ranks given to data. So, whilst it will be clear what the order is, the researcher cannot discern or presume that the distance between two consecutive ranks is the same;

• Interval Level: Describes the order of data on a scale, but the intervals between the points on the scale are presumed to be equal, and

• Ratio Level: Data at the ratio level have a true zero point, and can be expressed in ratio format. Age is a common example of ratio level data; six years of age is twice as old as three years, for example (Brown, 2011).

The relevance of these definitions becomes clearer when applied to MAST, a 30-item Likert Scale with response categories between one and five. It is important to make clear at the outset how the data generated by MAST were treated, in terms of their level, as these decisions govern the choice of appropriate descriptive and inferential statistical tests; differing tests are applied, depending on whether the data are judged as being at the ordinal or interval level. If the wrong types of statistical tests are applied, it ‘increases the chance of coming to the wrong conclusion about the significance (or otherwise)’ of the study (Jamieson, 2004, p. 1212).

Making a decision on the level of the data generated by a Likert Scale is indeed a ‘judgement’. Controversy exists about the assumptions underpinning such a decision (cf. Jamieson, 2004). Jamieson (2004, p. 1217) noted that Likert Scales are common tools to measure attitudes, and that the data generated by them fall ‘within the ordinal level of measurement’, because numbers are used to characterise verbal categories. Rightly, this means that the order of the response categories is clear, but the distance between these categories cannot be presumed to be the same. Applying this to MAST, the distance between ‘Strongly Agree’ and ‘Agree’ may be very wide, or very narrow; the questionnaire cannot give this degree of detail.

It is a widely held assumption that non-parametric statistical tests should be used on data at the ordinal level, and parametric tests used for data at the interval or ratio level (Pallant, 2007). In view of Jamieson’s (2004) position with regard to the level of the data, the reasons for judging the data generated and collected by MAST to be at the interval level need to be justified, before proceeding with explanation of the statistical tests used within the study.

Brown (2011) makes an important distinction between ‘Likert Items’ and ‘Likert Scales’. The former refers to individual items on the scale, or questionnaire, and the latter to a number of different items, or a number of responses to the same item, from which a total score (or, in this study, a mean score) can be calculated. He goes on to suggest that the sum or mean of total scores generated from answers to a number of Likert-type items can be treated as interval level data (Brown, 2011). Carifio and Perla (2007) suggest that this is in line with research in the field of attitude change; these data are commonly, and appropriately, treated as being at the interval level within the research community.

Applied here, at each DP, the attitude score is the sum of the variety of responses, from one to five, to the 30 individual items on MAST, accrued from ten randomly selected respondents. Each DP on the time-series depicts the mean of these cumulative attitude scores. A judgement can be made about where this mean score sits, in relation to scores at other DPs. For example, if the mean attitude score at DP five was ‘50’, and at DP six it was ‘100’, ‘attitudes’ at DP six could be said to be twice as positive as they were at DP five. In line with Brown’s (2011) assertion, these data then fit with the definition of being at the interval level.
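The scoring described above can be sketched as follows. This is an illustration under stated assumptions, not the study's actual procedure: the helper names `attitude_score` and `dp_mean_score` and the response data are hypothetical, and only the 30 items and the one-to-five response range are taken from the text.

```python
from statistics import mean

N_ITEMS = 30                  # MAST has 30 Likert items (from the text)
MIN_RESP, MAX_RESP = 1, 5     # response categories one to five (from the text)

def attitude_score(responses):
    """Sum one respondent's answers across all MAST items."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in range(MIN_RESP, MAX_RESP + 1) for r in responses):
        raise ValueError("each response must be a whole number from 1 to 5")
    return sum(responses)

def dp_mean_score(respondents):
    """Mean of the cumulative attitude scores for the respondents at one DP."""
    return mean(attitude_score(r) for r in respondents)

# Hypothetical data: ten respondents at a single DP.
# Five answer '3' to every item (score 90); five answer '4' (score 120).
respondents = [[3] * N_ITEMS for _ in range(5)] + [[4] * N_ITEMS for _ in range(5)]

print(dp_mean_score(respondents))
```

Under this scheme each respondent's cumulative score can only fall between 30 and 150, which bounds the y axis of any charted time-series.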

These ‘mean of cumulative’ scores at each DP were used to chart the measured attitudes of both groups over the course of the study. The y axes on the figures showing the charted time-series in chapters four and five follow the assumption of data being at the interval level; the distances between the points on this axis are presumed equal, allowing visual and parametric statistical interpretation and analyses.⁷

In order to address the research hypothesis, the scores of the PG and the CG were compared at DPs seven (pre-programme), eight (immediately post-programme) and 15 (the end of the study), using an independent-samples t-test. This test compares the mean scores of two separate groups on a dependent variable, attitude scores in this case, to look for significant differences between them. It was used mindful that this would contribute to the study’s LMCV (Campbell, 1986; Pallant, 2007).
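The logic of the independent-samples t-test can be illustrated with a short calculation. This is a sketch only, assuming equal variances in the two groups (the pooled form of the test); the study itself ran its analyses in SPSS. The scores and the function name `independent_t` are hypothetical. The sketch returns the t statistic and degrees of freedom; obtaining a p-value would additionally require the t distribution, for example via a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / std_error
    return t, df

# Hypothetical cumulative attitude scores at DP eight (illustrative only).
pg_dp8 = [112, 118, 105, 121, 109, 115, 120, 108, 117, 113]
cg_dp8 = [98, 104, 101, 95, 107, 99, 103, 96, 102, 100]

t, df = independent_t(pg_dp8, cg_dp8)
print(f"t({df}) = {t:.2f}")
```

The same calculation would be repeated for DPs seven and 15. A positive t here indicates a higher mean in the first group passed to the function.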

Attitude scores for the PG were analysed using ‘repeated measures’ techniques. A one-way repeated measures ANOVA can be used to test for ‘significant differences’ among mean scores, where subjects have been ‘measured on the same continuous scale on three or more occasions’ (Pallant, 2007, pp. 251 - 252). This test was used to ascertain whether there was a significant change in attitude scores for the PG following the programme, and whether any observed change was sustained, or any effect was delayed, focusing again on these three DPs.
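The arithmetic behind a one-way repeated measures ANOVA can be sketched as below. This is a minimal illustration, not the SPSS procedure used in the study, and it assumes that the same subjects have complete scores on every occasion. The function name `rm_anova_f` and the scores are hypothetical. The variation between occasions is compared with the residual variation left after removing differences between subjects.

```python
from statistics import mean

def rm_anova_f(scores):
    """F statistic for a one-way repeated measures design.

    scores: one row per subject, one column per occasion (no missing values).
    Returns (F, df_time, df_error).
    """
    n = len(scores)           # number of subjects
    k = len(scores[0])        # number of occasions
    grand = mean(v for row in scores for v in row)
    occasion_means = [mean(row[j] for row in scores) for j in range(k)]
    subject_means = [mean(row) for row in scores]

    ss_time = n * sum((m - grand) ** 2 for m in occasion_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_time - ss_subjects   # residual after removing subjects

    df_time = k - 1
    df_error = (k - 1) * (n - 1)
    f = (ss_time / df_time) / (ss_error / df_error)
    return f, df_time, df_error

# Hypothetical PG scores at DPs seven, eight and 15 (illustrative only).
pg_scores = [
    [95, 110, 108],
    [102, 118, 112],
    [90, 104, 105],
    [99, 113, 109],
    [97, 115, 116],
]

f, df_time, df_error = rm_anova_f(pg_scores)
print(f"F({df_time}, {df_error}) = {f:.2f}")
```

A large F suggests that mean scores differ across the three occasions; follow-up comparisons would be needed to say which occasions differ, and whether any change was sustained or delayed.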

Before this could be carried out, a decision had to be made about how to deal with the large amount of data missing at DPs six and seven. SPSS can be set up to run analyses in a number of standard ways, to deal with missing data. These usually involve the exclusion of ‘cases’ from any analyses, if they do not have full data (Pallant, 2007). Because the sample size here is small, this would have further reduced the sample size for analysis, potentially biasing results (Wayman, 2003). Hence, this approach was deemed inadequate for the purposes of this study. The approach taken to dealing with missing data is outlined in the next section.

⁷ This also applies to the charts depicted in appendices 18 to 25 (cf. chapter 4; pp. 128 - 158), detailing mean scores for all respondents at each DP across the time-series, in relation to individual MAST items.