10. Be cordial and appreciative. Always remember to thank the interviewees when the interview is over and answer any questions they might have.
2.19 Concluding Remarks
3.2.5 Questionnaire, Pilot Study, and Inter Rater Reliability
Each of the three topics will be discussed separately.
3.2.5.1 Questionnaire
For the present study, a questionnaire (see Appendix 1) consisting of three parts was used for the EFL students. The questionnaire items were written according to the guidelines given by Siniscalco and Auriat (2005) (see 2.17 Questionnaire). The EFL teachers were given a similar questionnaire that included the first two parts of the EFL students’ version. The first part of the questionnaire covers demographics: age, gender, field of study, mother tongue, and number of years spent learning English. The questionnaire items were prepared in English, and a copy was given to each student. English was chosen because the students had a good command of English vocabulary and structure. The researcher was present to make sure that no section of the questionnaire caused ambiguity or comprehension difficulty. A Persian questionnaire was not used because some points might have been lost in translating the ideas provided by previous studies.
The second part of the questionnaire was used to answer the third research question, which concerns the most problematic areas in English writing. This part focused on the participants’ perception of those areas. The participants were asked to rate, on a five-point Likert scale, the six problems in writing mentioned by Jordan (1997): vocabulary, grammar, spelling, style, punctuation, and handwriting.
In the third part of the questionnaire, the student participants were asked to comment on their writing techniques, styles, and myside bias. A five-point Likert scale was used to collect the participants’ ideas in this section. The designed questionnaire (based on Zia Houseini and Derakhshan, 2006; Mu and Carrington, 2007; Wolfe, Britt, and Butler, 2009; and Saneh, 2009) was piloted before administration. It took the students about 20 minutes to fill out the entire questionnaire.
A similar questionnaire consisting of the demographics and the second part of the questionnaire (perception of the most problematic areas in English writing) was distributed among 20 Iranian EFL teachers who had at least 3 years of teaching experience and who had taught at the higher-intermediate level of language proficiency. Jordan’s (1997) questionnaire was used for the present study.
Since the researcher did not want the participants to become self-conscious and thereby jeopardize the outcome of the study, the students were first given the consent form and then wrote the argumentative essays. Only after the essays had been handed in were they given the questionnaire. This was done to ensure that the questionnaire items would not affect the essays written by the participants.
3.2.5.2 The Pilot Study (Reliability)
Before the questionnaire was used in the main part of the study, a pilot study was conducted. The questionnaire, which included 6 problematic areas in English writing and 6 items on writing techniques, styles, and myside bias (all using a five-point Likert scale), was given to 30 higher-intermediate Iranian EFL students (9 males and 21 females). Students were given 20 minutes to complete the questionnaire. All the students’ comments regarding ambiguous vocabulary and sentence structures were taken into account. The students’ comments established that the adverb “always”, which appeared in the questionnaire items, was “too strong” and made the students “feel restricted” while answering. Therefore, this adverb was omitted from the questionnaire items. Cronbach's alpha was computed on the data obtained from the 12 items of the questionnaire and yielded a value of 0.76. The questionnaire was then ready to be used for the actual study.
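To make the reliability figure concrete, the following is a minimal sketch of how Cronbach's alpha can be computed from Likert responses. It is not the procedure used in the study, which relied on statistical software; the function name `cronbach_alpha` and the sample responses are hypothetical and serve only as illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Transpose so that each tuple holds all respondents' scores for one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(scores) for scores in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: 4 respondents answering 3 five-point Likert items.
sample = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
print(f"alpha = {cronbach_alpha(sample):.2f}")
```

In the pilot study the same calculation would run over 30 respondents and 12 items.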
3.2.5.3 Inter Rater Reliability
In order to be objective in presenting the results of the study, it was essential that more than one rater analyze the essays (Connor, 1996). One statistical measure of inter rater reliability is Cohen’s Kappa. The SPSS tutorial (http://www.stattutorials.com/SPSS/index.html) defines Cohen’s Kappa as a measurement “which ranges generally from 0 to 1.0 (although negative numbers are possible) where large numbers mean better reliability, values near or less than zero suggest that agreement is attributable to chance alone”. Cohen’s Kappa was calculated separately for each of the six subsections (see Tables 3.1, 3.2, 3.3, 3.4, and 3.5), and the average of all subsections was calculated in order to report the overall inter rater reliability. The overall inter rater reliability was 0.821. It should be pointed out that the Kappa for the explicit discourse markers was calculated by NVivo and came to 0.970.
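As an illustration of the statistic described above, the sketch below computes Cohen's Kappa for two raters from first principles. It is an assumption-laden example, not the SPSS or NVivo routine used in the study: the function name `cohen_kappa` and the ten hypothetical essay codes are invented for demonstration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters assigned the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for 10 essays; the raters disagree on one essay only.
rater_1 = ["inductive"] * 5 + ["deductive"] * 5
rater_2 = ["inductive"] * 4 + ["deductive"] * 6
print(f"kappa = {cohen_kappa(rater_1, rater_2):.3f}")  # kappa = 0.800
```

Here the raters agree on 9 of 10 essays (observed agreement 0.9) while chance agreement is 0.5, so Kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8.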
As for the rest of the subsections, the SPSS tables showing the results are as follows:
Table 3.1: Kappa inter rater reliability result for inductive vs. deductive
                             Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Measure of Agreement  Kappa  .800    .067                   7.164          .000
N of Valid Cases             80
a Not assuming the null hypothesis.
b Using the asymptotic standard error assuming the null hypothesis.
Table 3.2: Kappa inter rater reliability result for start-sustain-turn-sum vs. introduction-body-conclusion
                             Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Measure of Agreement  Kappa  .794    .200                   7.256          .000
N of Valid Cases             80
a Not assuming the null hypothesis.
b Using the asymptotic standard error assuming the null hypothesis.
Table 3.3: Kappa inter rater reliability result for circular vs. linear
                             Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Measure of Agreement  Kappa  .701    .126                   6.286          .000
N of Valid Cases             80
a Not assuming the null hypothesis.
b Using the asymptotic standard error assuming the null hypothesis.
Table 3.4: Kappa inter rater reliability result for straightforward vs. metaphorical
                             Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Measure of Agreement  Kappa  .775    .074                   7.112          .000
N of Valid Cases             80
a Not assuming the null hypothesis.
b Using the asymptotic standard error assuming the null hypothesis.
Table 3.5: Kappa inter rater reliability result for myside bias
                             Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Measure of Agreement  Kappa  .900    .049                   8.058          .000
N of Valid Cases             80
a Not assuming the null hypothesis.
b Using the asymptotic standard error assuming the null hypothesis.
In one of the pioneering works on Kappa-type statistics, Landis and Koch (1977) categorized the ranges of Kappa values according to their strength of agreement. Table 3.6 shows this categorization.
Table 3.6: Categorization of Kappa statistics results according to strength of agreement (adopted from Landis and Koch, 1977, p. 165)
Kappa Statistic   Strength of Agreement
<0.00             Poor
0.00-0.20         Slight
0.21-0.40         Fair
0.41-0.60         Moderate
0.61-0.80         Substantial
0.81-1.00         Almost Perfect
According to this categorization, the strength of agreement between the two raters falls under “substantial” for four of the six categories and “almost perfect” for the remaining two. The overall inter rater reliability (0.821) also shows that the agreement between the two raters was “almost perfect” in this study. Discrepancies in the coding of the essays were resolved by having the two raters discuss the differences and determine the most appropriate coding.
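The classification above can be checked with a short sketch that maps each reported Kappa value onto the Landis and Koch (1977) bands from Table 3.6. The Kappa values are those reported in this section; the function name `landis_koch_label`, the dictionary keys, and the handling of values that fall between the printed bands (for example 0.205) are assumptions made for illustration.

```python
from collections import Counter


def landis_koch_label(kappa):
    """Return the Landis and Koch (1977) strength-of-agreement label for a Kappa value."""
    if kappa > 1.0:
        raise ValueError("Kappa cannot exceed 1.0")
    if kappa < 0.0:
        return "Poor"
    # Upper bounds of the bands in Table 3.6; values between printed bands
    # are assigned to the higher band (an assumption of this sketch).
    bands = [
        (0.20, "Slight"),
        (0.40, "Fair"),
        (0.60, "Moderate"),
        (0.80, "Substantial"),
        (1.00, "Almost Perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


# Kappa values reported in Tables 3.1 to 3.5 and by NVivo.
reported = {
    "inductive vs. deductive": 0.800,
    "start-sustain-turn-sum vs. introduction-body-conclusion": 0.794,
    "circular vs. linear": 0.701,
    "straightforward vs. metaphorical": 0.775,
    "myside bias": 0.900,
    "explicit discourse markers": 0.970,
}

labels = {name: landis_koch_label(value) for name, value in reported.items()}
summary = Counter(labels.values())
for name, label in labels.items():
    print(f"{name}: {label}")
print(dict(summary))
```

Run on the reported values, this yields four "Substantial" and two "Almost Perfect" labels, and the overall value of 0.821 falls in the "Almost Perfect" band.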