Analysis - Study 5: Quantitative analysis of changes in teach-

4.3 Methods

4.3.6 Study 5: Quantitative analysis of changes in teach-

4.3.6.2 Analysis

Teachers’ practices The data collected using the practices instruments were ordinal variables, that is, categorical data that are ordered (Field, 2009, pp. 8–9). The possible responses were: almost never, occasionally, about half the time, most of the time, and almost always. I was not certain that these responses would be equivalent to a continuous interval scale, so I used a non-parametric test. Furthermore, this was a within-groups design, i.e. a repeated-measures design with the same sample, so I used a Wilcoxon signed-rank test for matched pairs. This test involves finding the difference in responses for each respondent and then ranking the differences by magnitude. The ranks are then ‘signed’ to reflect whether there has been an increase or a decrease. The positive ranks and the negative ranks are summed separately to give a test statistic, T. A decision is then made whether to reject the null hypothesis (Field, 2009, pp. 552–553).
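The ranking procedure described above can be illustrated with a minimal Python sketch. It is not the software used in the study. The function name `wilcoxon_signed_rank` and the example scores are hypothetical, and the sketch assumes that zero differences are dropped and that tied differences share an averaged rank. Which of the two rank sums is reported as T varies between textbooks and software, so both are returned.

```python
def wilcoxon_signed_rank(pre, post):
    """Return (sum of positive ranks, sum of negative ranks, number of non-zero differences)."""
    # Difference for each respondent; zero differences are dropped (assumed convention)
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    # Order the indices by the magnitude of the difference
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ordered):
        # Find the run of tied magnitudes starting at position i
        j = i
        while j + 1 < len(ordered) and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        i = j + 1
    # 'Sign' the ranks and sum increases and decreases separately
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, len(diffs)


# Hypothetical pre and post scores for five respondents
print(wilcoxon_signed_rank([10, 12, 9, 14, 11], [12, 11, 13, 14, 8]))
```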

The assumptions for this test are:

1. The differences between the pairs must be able to be ranked;

2. A random selection should be used in order to generalise;

3. The difference scores should come from a symmetric population distribution (Nolan and Heinzen, 2008, p. 633).

If the distribution of differences was not symmetrical, I used a sign test. This uses only the sign of each difference and then a binomial distribution to determine a probability. It has less power than the Wilcoxon signed-rank test for matched pairs but makes fewer assumptions (Howell, 2002, pp. 217–218).
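As a hedged illustration of the sign test, the sketch below counts increases and decreases and computes a two-sided probability from a binomial distribution with p = 0.5. The function name `sign_test_p` and the example data are hypothetical, and the sketch assumes that tied pairs (no change) are ignored.

```python
from math import comb


def sign_test_p(pre, post):
    """Two-sided sign test probability for paired scores (ties ignored)."""
    n_pos = sum(1 for a, b in zip(pre, post) if b > a)  # increases
    n_neg = sum(1 for a, b in zip(pre, post) if b < a)  # decreases
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    # Lower-tail probability of k or fewer of the rarer sign under Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1
    return min(1.0, 2 * tail)


# Hypothetical data: seven increases and one decrease
p = sign_test_p([3] * 8, [4, 4, 4, 4, 4, 4, 4, 2])
print(round(p, 4))
```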

I was not concerned with random selection, since I was not going to generalise the results of this quantitative analysis.

I used the same procedures as Swan (2006a). I coded the responses as almost never = 1, occasionally = 2, about half the time = 3, most of the time = 4 and almost always = 5. I reverse-coded responses to the student-centred items. The responses were summed to give a score for each respondent in the pre- and post instruments, ranging from 28 to 140 (see Swan, 2006a, p. 200). The difference between pre- and post scores for each respondent was calculated. I inspected the differences (see Figure 4.8) and concluded that the distribution was symmetrical and that it was therefore appropriate to use the Wilcoxon signed-rank test for matched pairs.
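The coding and summing steps can be sketched as follows. This is an illustration only: the name `practices_score` and the choice of which item positions are student-centred are hypothetical, and reverse-coding is assumed to map a code c to 6 − c on the 1–5 scale.

```python
CODES = {
    "almost never": 1,
    "occasionally": 2,
    "about half the time": 3,
    "most of the time": 4,
    "almost always": 5,
}


def practices_score(responses, student_centred_items):
    """Sum coded responses, reverse-coding the items whose indices are given."""
    total = 0
    for idx, label in enumerate(responses):
        value = CODES[label]
        if idx in student_centred_items:
            value = 6 - value  # reverse-code: 1 <-> 5, 2 <-> 4, 3 stays 3
        total += value
    return total


# Hypothetical respondent answering "almost always" to all 28 items,
# with items 0 and 1 treated as student-centred for illustration
print(practices_score(["almost always"] * 28, {0, 1}))
```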

Figure 4.8: Boxplots of differences in pre- and post teacher-centred practices scores.

To determine effect size I used the calculation proposed by Field (2009, p. 558):

\[ r = \frac{z}{\sqrt{\text{Number of observations}}} \tag{4.3.3} \]

where z is found by converting the T statistic (see Field, 2009, pp. 553–554, for details).
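A minimal sketch of this calculation is given below, assuming the usual normal approximation for converting T to z, with mean n(n + 1)/4 and standard error √(n(n + 1)(2n + 1)/24), where n is the number of ranked pairs. The function names and the example values are hypothetical, and what counts as the number of observations is left as a parameter rather than decided here.

```python
from math import sqrt


def z_from_t(t, n):
    """Convert a signed-rank statistic T to z using the normal approximation."""
    mean_t = n * (n + 1) / 4
    se_t = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean_t) / se_t


def effect_size_r(z, n_observations):
    """Effect size as z divided by the square root of the number of observations."""
    return z / sqrt(n_observations)


# Hypothetical example: T = 8 from 10 ranked pairs, 20 observations in total
z = z_from_t(8, 10)
print(round(z, 3), round(effect_size_r(z, 20), 3))
```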

I used Cohen’s convention that a small effect size is approximately 0.2, a medium one around 0.5 and a large one 0.8 (Nolan and Heinzen, 2008, p. 547). Here, effect size is a “standardized value that indicates the size of a difference with respect to a measure of spread, but is not affected by sample size” (Nolan and Heinzen, 2008, p. 543).

Teachers’ self-efficacy In both self-efficacy instruments, I assumed the 1–9 Likert scale could reasonably be considered a continuous interval variable. As a result I used a parametric test. Since this was a within-groups design, I planned to use a dependent t-test (also called a paired-samples t-test).

The assumptions of the dependent t-test are that:

1. The sampling distribution is normally distributed. In the dependent t-test this means the differences between scores should be normal;

2. Data are measured at the interval level (Field, 2009, p. 326).

The independent t-test additionally assumes homogeneity of variance and independent scores, but these assumptions do not apply to a dependent t-test (Field, 2009, p. 326).
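For illustration, the dependent t-test works on the per-participant differences: t is the mean difference divided by its standard error, with n − 1 degrees of freedom. The sketch below assumes this standard formulation. The function name `paired_t` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a dependent (paired-samples) t-test."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # Standard error of the mean difference, using the sample standard deviation
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical pre and post scores for four participants
t, df = paired_t([1, 2, 3, 4], [2, 4, 5, 8])
print(round(t, 3), df)
```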

I summed the scores for each of the three factors from the pre- and post questionnaires (the items and related factors are shown in Table 4.6, p. 87). The factors were: efficacy for instructional strategies, efficacy for classroom management and efficacy for student engagement. I also summed the scores for the pre- and post teaching problem solving efficacy instrument (see Table 4.7, p. 88). I assumed the responses formed a scale; I present a reliability analysis shortly.

I found the differences between pre- and post scores for each participant and then analysed these distributions for normality. A visual inspection (see Figure 4.9, p. 93 and Figure 4.10, p. 94) revealed that the distributions of differences for efficacy for classroom management (Figure 4.9b) and efficacy for teaching problem solving (Figure 4.10) could be considered to be normal. In order to investigate normality further I used a Kolmogorov-Smirnov test. The results of this analysis are shown in Table 4.9 (p. 92).

Table 4.9: Teaching self-efficacy: analysis of normality of the difference distributions using Kolmogorov-Smirnov (K–S) test.

Self-efficacy factor        K–S test (D)   df   Sig.
Instructional strategies    .206           18   .042*
Classroom management        .150           18   .200
Student engagement          .230           18   .013*
Teaching problem solving    .087           18   .200
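As a rough illustration of what the D statistic in Table 4.9 measures, the sketch below computes the largest gap between the empirical distribution of a set of difference scores and a normal distribution fitted with the sample mean and standard deviation. It does not reproduce the significance values, which depend on tables or corrections for estimated parameters that are not implemented here. The function name `ks_d_normal` and the data are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def ks_d_normal(values):
    """Kolmogorov-Smirnov D against a normal fitted to the sample (no p-value)."""
    dist = NormalDist(mean(values), stdev(values))
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = dist.cdf(x)
        # Compare the fitted CDF with the empirical step just after and just before x
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical, perfectly symmetrical difference scores
print(round(ks_d_normal([-2, -1, 0, 1, 2]), 3))
```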

Figure 4.9: Distribution of differences between pre- and post scores for teaching self-efficacy factors. Panels: (a) Instructional strategies; (b) Classroom management.

Figure 4.10: Distribution of differences between pre- and post score for self-efficacy for teaching problem solving.

The K–S results strongly suggested that the difference distributions for efficacy for classroom management and for teaching problem solving were normal and could be analysed using a dependent t-test. The results of this analysis are presented in Chapter 7. For efficacy for student engagement and for instructional strategies I was guided by Leech, Onwuegbuzie, and Daniel (2007) and decided to use a non-parametric test. I began by testing the assumptions for the Wilcoxon signed-rank test for matched pairs.

I inspected the symmetry of the distributions of the differences (see Figure 4.11, p. 95). I assumed the distribution of the differences in efficacy for instructional strategies (Figure 4.11a) to be symmetrical and therefore used the Wilcoxon signed-rank test for matched pairs. The distribution of the differences in efficacy for student engagement (Figure 4.11b) was not symmetrical, so I opted to use the sign test, which provides an alternative when the assumptions for the Wilcoxon test are not met (Field, 2009, pp. 552–553).

Effect sizes were calculated using the following equation (Field, 2009, p. 332):

\[ r = \sqrt{\frac{t^2}{t^2 + df}} \]

I used Cohen’s convention that a small effect size is approximately 0.2, a medium one around 0.5 and a large one 0.8 (Nolan and Heinzen, 2008, p. 547). Here, effect size is a “standardized value that indicates the size of a difference with respect to a measure of spread, but is not affected by sample size” (Nolan and Heinzen, 2008, p. 543).
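A minimal sketch of the effect size calculation for the dependent t-test, using the formula from Field (2009, p. 332) in which the effect size is the square root of t² divided by (t² + df). The function name `effect_size_r_from_t` and the example values are hypothetical.

```python
from math import sqrt


def effect_size_r_from_t(t, df):
    """Effect size computed from a t statistic and its degrees of freedom."""
    return sqrt(t ** 2 / (t ** 2 + df))


# Hypothetical example: t = 3 with 16 degrees of freedom
print(round(effect_size_r_from_t(3, 16), 2))
```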

Figure 4.11: Boxplots of differences between pre- and post teaching efficacy factors. Panels: (a) Efficacy for instructional strategies; (b) Efficacy for student engagement.

A note on the validity and reliability of the instruments used

Teacher practices questionnaire In terms of the validity of this instrument, I considered whether it reasonably measured what it claimed to measure. In this case, does the instrument gather data that reflect teachers’ perspectives on their teaching? I relied on the development work by Swan (2006a), who validated the instrument through comparisons with other data sources. From this he judged the instrument to be valid. I also drew on further validation completed by Pampaka, Williams, Hutcheson, Wake, Black, Davis, and Hernandez-Martinez (2012). They concluded the instrument had reasonable construct validity. While I have concerns that the instrument has not been validated through observational processes, I was satisfied, in the context of this research, that it would give some indication of changes in teachers’ practices.

In terms of the instrument’s internal consistency, Swan (2006a) carried out an analysis of reliability with further education teachers (n = 120) and found Cronbach’s α = 0.85 (p. 200). I carried out a reliability test using pre-test results (n = 19) and found Cronbach’s α = 0.91. Pampaka et al. (2012) carried out a Rasch analysis and found that it was reasonable to assume these items were consistent. I therefore assumed it reasonable to sum the scores for each item and reverse-code student-centred items.

Teacher self-efficacy I discussed the extensive validation that this instrument has been subject to in Section 3.2.3 (p. 43). I therefore concluded that this instrument was valid and reliable for the purposes of this study.

Problem solving teaching self-efficacy This was a new instrument that I had developed and its validity had not been tested. However, I used an approach based on guidance by Bandura (2006). I suggest that results based on this instrument should be treated tentatively. I used pre-test data to conduct a reliability analysis (n = 18) and found Cronbach’s α = 0.89. I decided that it was reasonable to accept the items in this instrument as forming a scale and I assumed it to be valid, although further testing would be required to confirm this.
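The reliability analyses above report Cronbach’s α. As an illustration only, the sketch below computes α from item-level scores using the standard formula α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, holding that respondent's item scores."""
    k = len(item_scores[0])  # number of items
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the respondents' total scores
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from three respondents to two items
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```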

In document The impact of professional development on mathematics teachers' beliefs and practices (Page 104-110)