4. Research Methodology
4.4 Data Collection Instruments and Procedures for Data Collection and Analysis
4.4.1 Questionnaire
Questionnaires can be efficient and economical tools. Dörnyei (2003) stated that questionnaires are used to elicit three types of data: ‘factual, behavioural, and attitudinal’ (p. 8). They can also be classified as either quantitative or qualitative based on their design. Specifically, questionnaires that seek answers through closed-ended questions with multiple-choice scale options are analysed numerically and thus act as a quantitative method. On the other hand, questionnaires with open-ended questions are analysed using coding and discussion and are considered a qualitative method.
Questionnaires are very effective as they allow researchers to collect data from a wide audience in a short time, at little or no cost. However, questionnaires have certain drawbacks. For example, participants completing a questionnaire may select answers randomly, without reading the question properly. Furthermore, the highly structured nature of questionnaires may sometimes prevent respondents from expressing additional thoughts about an issue, perhaps due to the absence of a relevant question.
4.4.1.1 The questionnaire design.
To answer RQ 1, this study used an online questionnaire with closed-ended questions. Online distribution (through Google Forms) was more convenient and practical for two reasons. First, it enabled me to reach out to as many EFL teachers from various higher educational contexts in Riyadh as possible. Second, it could be completed anonymously and in private, which constituted the most conducive conditions for my participants to respond fully and honestly, since in the KSA we are usually most uncomfortable in a face-to-face setting.
The design was developed after an extensive review of the relevant research literature. The items in the questionnaire were either taken or adapted from previous similar questionnaires in the empirical studies reviewed in chapter 2. The items were modified to suit my research context, namely EFL teachers’ beliefs about English grammar assessment. Three main research studies (Barnes, Fives, & Dacey, 2015; Elshawa et al., 2017; Muñoz et al., 2012) guided the construction of my questionnaire and helped me identify the key points, which were mapped onto the subset themes that host the questionnaire items.
The first draft of the questionnaire (see Appendix C) was entirely written in English and consisted of two parts. The first part of the questionnaire included items on the respondents’ biographical information (11 items). The second part addressed EFL teachers’ beliefs and thoughts about English grammar assessment. It consisted of 46 five-point Likert scale items (plus a few open response items) where respondents would have to specify their level of agreement or disagreement with respect to a symmetric agree-disagree scale for a series of statements. These statements, or items, were grouped into subset themes (see Table 4 and Appendix D).
The questionnaire was piloted in April 2018, when it was administered in English to 30 EFL teachers working at a private international institution, who were not included in the main study. The comments received from the pilot-test participants related to the length of some items, redundancy and the overall structure of the questionnaire. Three examples of modification and revision of items are given here:
• First, the questionnaire items were presented in two parts: demographical and beliefs. Items in the second part target various themes which were all mixed together. Based on the participants’ feedback and my opinion, items were grouped according to their theme. This allowed for more clarity regarding what the questionnaire is about and what the participants were responding to.
• Second, some questionnaire items were made clearer, while others were rephrased, because confusing structure and ambiguous items could render the questionnaire ineffective. Accordingly, item 11 in the first draft, ‘English grammar assessment informs teaching (diagnoses strengths and weaknesses in teaching)’, was rephrased to ‘The purpose of English grammar assessment should be to inform teaching by showing the students’ strengths and weaknesses in English grammar’ (item 4 in section B).
• Third, some items were deliberately skipped by most participants, resulting in low internal consistency scores. Therefore, these items were removed to allow continuity throughout the questionnaire, e.g., item 10 in the first draft, ‘Assessing English grammar is a waste of time’. Other items, such as the approximate number of students in a class and the training received on language assessment, were added to the demographical part of the questionnaire (see Appendix D).
After the modifications, there remain two parts: demographic information and beliefs about English grammar assessment. There are 46 quantitative items in part two, which are grouped into five categorical themes presented in Table 5.
Table 5.
Questionnaire Categorical Themes
No. Theme Likert Items
1.   Teachers’ beliefs about the general nature of English grammar assessment   1-5 (5)
2.   Teachers’ beliefs about the purposes of English grammar assessment   6-13 (8)
3.   Teachers’ beliefs about English grammar assessment methods   14-28 (15)
4.   Teachers’ beliefs about English grammar assessment formats   29-38 (10)
5.   Teachers’ beliefs about their role and sources used in constructing English grammar assessment tasks   39-46 (8)
The questionnaire was anonymous in order to encourage the participants to respond truthfully. The questionnaire can be found on the following link:
https://drive.google.com/open?id=1338IlO7eo4N6jFb-nzRFbT7SZrpaq1FU2DKR8D69ows
4.4.1.2 Data Collection.
After official permissions were obtained to conduct the study in the four educational contexts, an email was sent to the English department chairpersons requesting the EFL teachers’ participation by asking them to complete the questionnaire on the link provided and to forward the email to these teachers. This served my purpose of collecting data from a wide range of people relevant to my research topic. Respondents were then screened based on the study needs.
4.4.1.3 The questionnaire participants’ demographics.
136 teachers responded to the online questionnaire. However, 36 respondents were excluded from the analysis because they were either working in schools or did not mention their workplace. Hence, they could not be regarded as genuine representatives of the targeted population in the current study.
A second important prerequisite for the sample was that the teachers had some experience of assessing and/or teaching English grammar. Six teachers who reported neither kind of grammar-related experience were therefore also excluded, which left 94 EFL teachers representing the targeted population of EFL teachers.
Figure 17. Participants’ experience of teaching and assessing English grammar.
As a result, 94 suitable participants were identified. The cohort included male (N = 9) and female (N = 85) participants ranging from 21 years old to over 50 years old; almost all of the participants were of Saudi origin (N = 84). Some of the teachers had BA qualifications (N = 19), while a little over half of them held MA qualifications (N = 50) and the rest were PhD holders (N = 25). The majority (N = 74) had received some kind of training on language assessment. The following table summarises the basic background information on the participating teachers.
Table 6.
Background Information about the Participants in the Questionnaire
Variables   Number / 94   Percent %
Training Received
    None   20   21.4
    Through UG courses   36   38.4
    Through MA/PhD courses   3   3.2
    Professional training   35   37
Gender
    Male   9   9.6
    Female   85   90.4
Age
    21-30   21   22.3
    31-40   43   45.7
    41-50   24   25.5
    Over 50   6   6.4
Country of Origin
    Saudi Arabia   84   89.4
    Jordan   4   4.3
    Syria   1   1.1
    Egypt   2   2.1
    Algeria   1   1.1
    India   1   1.1
    USA   1   1.1
Educational level attained
    BA   19   20.2
    MA   50   53.2
    PhD   25   26.6
With respect to the relevant classroom teaching matters (Table 7), almost all the participants were experienced English teachers with more than five years of experience, while a quarter had more than 15 years of experience (N = 25). Class sizes of 20 students or more were the most common, and just over a fifth of the teachers reported class sizes of over 40 students (N = 21). The levels taught were primarily either foundation level (level 1 or 2), taken by students of almost all majors, or levels 3 or 5, where normally only English majors are in the classroom. Finally, with respect to the grammar textbook used, the most widely used textbook was found to be ‘Understanding and Using English Grammar’ (N = 34), which is consistent with what was reported earlier in 4.3.1.2 about this textbook being used in as many as three educational contexts.
Table 7.
Participants’ Relevant Classroom Backgrounds
Variables   Number / 94   Percent
English teaching experience
    1–5 years   2   2.1
    6–10 years   41   43.6
    11–15 years   26   27.7
    Over 15 years   25   26.6
Number of students in the grammar class
    Under 20   9   9.6
    20–30   28   29.8
    31–40   36   38.3
    Over 40   21   22.3
Level of students currently taught grammar
    Foundation 1 + 2   30   31.9
    Level 3   26   27.7
    Level 4   9   9.6
    Level 5   18   19.1
    Level 6/7   1   1.1
    Level 8/9   3   3.2
    Graduate   1   1.1
    Other   1   1.1
    Not currently teaching grammar   5   5.3
Grammar textbook being used
    English Grammar in Use   25   26.6
    Understanding and Using English Grammar   34   36.1
    Interactions/Mosaic Grammar   17   18.1
    Grammar Sense   10   10.6
    Basic/Fundamentals of English Grammar   4   4.3
    Other textbooks   4   4.3
To sum up some of the essential demographics of the participants, the number of participants was 94. All of them were EFL teachers from public and private higher-educational facilities. The majority of them were female (90.4%). The most common age group was 31–40 years (45.7%). Most of them (N = 74) had undergone some kind of language assessment training at some point in their teaching careers. The duration of their grammar teaching and assessment experience varied from 1 year to over 4 years.
The participating teachers in this study do not, as Cohen et al. (2007) state, ‘represent the wider population’ (p. 104) of higher-educational facilities in Saudi Arabia, and hence the findings from this study group are not applicable in general, as they present the particular, subjective perspectives of these participating teachers.
4.4.1.4 Questionnaire data analysis procedure
4.4.1.4.1 Data handling.
The responses to the questionnaire were downloaded from Google Forms in an Excel file. All of them were then converted into numbers before the data was copied into SPSS for analysis (see Appendix O). All the belief item responses on the scale ‘strongly disagree’ to ‘strongly agree’ were represented on a scale of 0–4. In terms of analysis, first of all, internal reliability was assessed. Second, appropriate analytical tests were selected according to the normality of distribution of the data.
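The coding step described above can be sketched as follows. This is only an illustration, not the procedure actually used, since the conversion in this study was carried out in Excel and SPSS. The source names only the two endpoints of the scale, so the three intermediate labels (‘disagree’, ‘neutral’, ‘agree’) and the function names are assumptions.

```python
# Hypothetical label set: only the two endpoints are given in the text;
# the three intermediate labels are assumed for illustration.
LIKERT_CODES = {
    "strongly disagree": 0,
    "disagree": 1,
    "neutral": 2,   # midpoint of the 0-4 scale
    "agree": 3,
    "strongly agree": 4,
}


def code_response(label):
    """Map one Likert label to its 0-4 code; blank answers become None."""
    if label is None or not label.strip():
        return None
    return LIKERT_CODES[label.strip().lower()]


def code_responses(labels):
    """Code a whole column of responses to one belief item."""
    return [code_response(label) for label in labels]


if __name__ == "__main__":
    print(code_responses(["Strongly agree", "neutral", "", "Disagree"]))
```

Keeping unanswered items as None rather than 0 avoids confusing a missing response with ‘strongly disagree’.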
4.4.1.4.2 Reliability checking.
Although Cronbach’s alpha is widely used in questionnaires as well as tests to assess internal reliability (Taber, 2018), it was not deemed appropriate in the present instance as it is only applicable for sets of items in questionnaires where multiple items measure a single construct. The present questionnaire, however, largely follows the commonly found pattern of ‘one item per target construct measured’. Hence, there are no expectations of agreement between the responses within large subsets of items, and in fact the lack of agreement in responses is to be expected, rather than being misinterpreted as a sign of unreliability (Tavakol & Dennick, 2011).
Some pairs of items were, however, selected due to their logical relation in terms of meaning in some way, to check if the expected agreement in response was present. For instance, in the description of participants above, some participants reported that they had no experience of either teaching or assessing grammar. When I looked at their responses concerning the grammar textbook they used in their class, I found that they had responded ‘none’ or ‘not teaching grammar at the moment’. Since that response was consistent with what they reported earlier about their experience, one can judge that their responses are quite reliable.
Again, among the belief items, there are some among which one might detect a logical connection, such that if a participant agrees with one belief, they must also agree with the other, unless they were not responding with care (i.e., unreliably). Two items, for example, refer in slightly different ways to the advantages of multiple-choice items in assessment instruments (Belief Items 19 and 34). Hence, if a teacher agrees with one, he/she should agree with the other, and vice versa. The Cronbach’s alpha between the responses to these items was in fact .519, which is moderate as a measure of reliability (where a value of .7 or better would indicate a really high reliability). A similar check between items 19 and 31, both of which mention cloze items, yielded an alpha of .569. However, it must be borne in mind that reliability increases when all the items in a subset measure the same thing (Tavakol &
Dennick, 2011); hence, a subset composed of only two consistent items cannot be expected to attain the values that a set of five or ten such items would achieve. Thus, in instances where it is sensible to assess it, there is evidence for at least a moderate reliability of the instrument.
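For readers who wish to see how such a two-item check works, the sketch below computes Cronbach’s alpha from the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). It is an illustration only: the values reported above were obtained in SPSS, and the ratings in the example are invented, not taken from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    `items` is a list of k lists; each inner list holds one item's scores
    for the same respondents, in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("all items need the same number of respondents")
    # Total score per respondent across the k items.
    totals = [sum(row) for row in zip(*items)]
    sum_item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


if __name__ == "__main__":
    # Invented ratings for two logically related items (not study data).
    item_a = [1, 2, 3, 4]
    item_b = [2, 1, 4, 3]
    print(round(cronbach_alpha([item_a, item_b]), 3))
```

With only two items, the ratio of summed item variances to total-score variance stays large unless the two items correlate very strongly, which is one way of seeing why short subsets rarely reach the values obtained by longer scales.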
4.4.1.4.3 Normality checking.
In order to decide what statistical tests might be appropriate, it was necessary to check the normality of the distributions of the belief ratings. Therefore, the Kolmogorov-Smirnov test with Lilliefors correction (Lilliefors, 1967) was applied to the rating responses for each of the 46 belief items. All of them departed highly significantly from the normal (bell-shaped) distribution, with p < .001. This often arises with short score scales such as five-point rating scales. Therefore, nonparametric inferential statistics were used to test significances for the results, as follows:
1. In the account of the results for all valid participants together on each belief item, the sign test was used to assess whether the teachers were expressing an overall view that definitely departed from the midpoint of the rating scale (= 2, on my scale), either higher or lower, or a view that essentially did not significantly differ from the midpoint. The sign test allowed the assessment of whether there were significantly more responses above the midpoint (i.e., 3 and 4) than below the midpoint (i.e., 0 and 1), or the reverse, or whether there was no significant difference between those above and those below (and hence no clear opinion was expressed).
2. When comparing groups such as genders, or the Saudi versus non-Saudi teacher origin, I used the nonparametric Mann-Whitney test.
3. When testing relationships between the beliefs of teachers and attributes in ordered categories, such as age groups, educational levels, or degrees of experience in grammar assessment, I used the Spearman correlation.
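The logic of the sign test in point 1 can be sketched as follows. This is an illustration under stated assumptions, not the analysis itself, which was run in SPSS: it uses an exact two-sided binomial test, and it assumes that responses falling exactly on the midpoint are set aside, which is the usual convention for the sign test but is not stated in the text. The function name and the example ratings are invented.

```python
from math import comb


def sign_test(ratings, midpoint=2):
    """Exact two-sided sign test of ratings against the scale midpoint.

    Returns (number above midpoint, number below midpoint, p-value).
    Ratings equal to the midpoint are dropped (an assumed convention).
    """
    above = sum(1 for r in ratings if r > midpoint)
    below = sum(1 for r in ratings if r < midpoint)
    n = above + below
    if n == 0:
        return above, below, 1.0
    # Under the null hypothesis, above and below are equally likely (p = 0.5).
    smaller = min(above, below)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return above, below, p_value


if __name__ == "__main__":
    # Invented ratings on the 0-4 scale (not study data).
    example = [4, 4, 3, 3, 3, 4, 2, 2, 1, 3]
    print(sign_test(example))
```

A small p-value with more responses above than below the midpoint would indicate overall agreement with the item, the reverse pattern overall disagreement, and a non-significant result no clear opinion.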