3.10 Reliability and Validity
3.10.1 Reliability
Reliability concerns the likelihood of similar results being obtained if the study were repeated (Payne and Payne 2004). It refers to whether or not similar results would be obtained if the data collection procedures and data analysis process were repeated with the same participants. The aim of ensuring reliability is to reduce researcher bias and enhance validity (Yin 2003).
The design of the data collection instruments can influence the reliability of a study. Establishing the internal consistency of multiple items can help to ensure reliability; this refers to whether ‘each respondent’s answers to each question are aggregated to form an overall score’ (Bryman 2012: 170), which means that all items in one test or questionnaire should be indicators of the target phenomenon, and that all these indicators should be related to one another. The internal reliability of the CCTDI and CCTST has been tested in previous studies, with satisfactory results (see section 3.8.1). The self-evaluation questionnaire was adapted from Dawes et al. (2000), but it has not been widely used in other studies, and its internal consistency has thus not been established. The findings from the self-evaluation questionnaire therefore need to be interpreted with caution. As mentioned earlier, this study used the Chinese versions of the CCTDI and CCTST. The self-evaluation questionnaire was also translated into
Chinese and presented along with an English version. It is hoped that this increased the reliability of the data by providing a clear and comprehensive explanation to the respondents (McDonough and McDonough 1997).
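Internal consistency of the kind described above is commonly summarised with Cronbach's alpha. The sketch below is illustrative only: this section does not state which statistic was used for the CCTDI or CCTST, and the function name `cronbach_alpha` and the sample scores are hypothetical, not data from this study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Return Cronbach's alpha for a respondents-by-items score table.

    `scores` is a list of rows, one row per respondent, each row holding
    that respondent's score on every item (hypothetical layout).
    """
    if len(scores) < 2:
        raise ValueError("at least two respondents are required")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every respondent needs the same number (>= 2) of items")

    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 4 respondents, 3 Likert-type items.
example_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
alpha = cronbach_alpha(example_scores)
```

Values closer to 1 indicate that the items vary together; an adapted instrument such as the self-evaluation questionnaire could be checked in this way once item-level responses are available.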
In the data collection process in the present study, the students were required to complete the writing tasks independently in class, so that highly reliable data could be collected from their written texts. At the same time, the students were allowed to decide whether or not to submit their work, since some students may have taken a careless attitude towards writing tasks they had been compelled to perform. This strategy was also applied to all the data collection procedures to enhance the reliability of the data and, at the same time, to address ethical issues (see section 3.11).
The interview data were collected on the same day as the last infusion lesson to elicit the students’ actual attitudes while their memories were still fresh. However, the reliability of these data may have been impaired by the questioning skills of the inexperienced interviewer (the researcher) and by the scheduling of the interviews. This was the first time the researcher had acted as an interviewer, and her questioning skills may not have been good enough to stimulate active responses or to create an interactive environment for the group interviews. As mentioned in section 3.8.4, this may have influenced the quality of the interview data. Moreover, as Kumar (2004) and Silverman (2006) note, the contribution of interviewees can also influence the quality of data. In this study, the students were interviewed in the week before their final examinations, and they were preoccupied with their studies. Although they had volunteered to be
interviewed, they provided short answers, and thus the quality of the interview data might also have been affected by their relatively limited contributions.
The self-completion questionnaire was sent to the students by email after their final examination, with one week allowed for them to return it. This was done in order to obtain more information about the students’ attitudes. However, it was sent three weeks after the last infusion lesson, when the winter vacation had already begun. When reading the responses, it was thus necessary to take into account the fact that some information may have been missing because some of the students may have forgotten it, and also that some of the students may have completed the questionnaire carelessly.
When analysing written data, employing more than one rater or coder can enhance internal reliability (Bryman 2012; Mays and Pope 1995). In this study, the coding of clauses, error-free clauses and T-units was completed by two coders, and the inter-rater reliability was above 90% (see section 3.9.2), which is a high level of reliability. The reliability of the scores for the students’ writing proficiency may have been reduced by the fact that only one rater was employed. However, it was hoped that the careful selection of an experienced and qualified rater and the use of standardised criteria enhanced the reliability of the results (see section 3.9.3). The interview and self-completion questionnaire data were coded by only one rater, the researcher, and the themes were also developed by her. Researcher bias may have impaired the objectivity of the results (Fine et al. 2009). Nevertheless, the results were cross-checked by teacher A, with the aim of minimising the influence of any bias.
The tool used to analyse the use of critical thinking in the students’ writing was developed by the present researcher based on the rubrics for analysing elements of critical thinking in writing proposed by Stapleton (2000) and Paul and Elder (2007). It was expected that the reliability of the results would be improved by the use of two raters, as discussed above. The inter-rater consistency was 89%.
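The inter-rater figures reported in this section (above 90% for the clause and T-unit coding, and 89% for the critical thinking analysis) are percentages of agreement between two coders. How they were calculated is not stated here, so the following is only a minimal sketch of one common approach: simple percent agreement, shown alongside Cohen's kappa, which corrects for agreement expected by chance. The function names and the sample codes are hypothetical and are not drawn from the study's data.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Proportion of units to which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of units")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: product of each coder's marginal proportions per code.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four units ("EF" = error-free, "E" = contains error).
coder_1 = ["EF", "EF", "E", "E"]
coder_2 = ["EF", "EF", "E", "EF"]
agreement = percent_agreement(coder_1, coder_2)  # 3 of 4 units match
kappa = cohens_kappa(coder_1, coder_2)
```

Percent agreement is easy to interpret but can overstate reliability when one code dominates, which is why a chance-corrected statistic is often reported alongside it.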