Reliability, validity, and analysis of data

6. Methods

6.2 Phase 1: Quantitative component

6.2.4 Reliability, validity, and analysis of data

Reliability is defined as “dependability, consistency and replicability over time” (Cohen et al., 2011, p. 199). This involves attending to the internal consistency of the research instrument – in this study, the questionnaire – as well as to the sample. Validity is defined as “the correctness or truthfulness of the inferences that are made from the results of the study” (Johnson & Christensen, 2012, p. 245), in other words “the extent to which our observations indeed reflect the phenomena and variables of interest to us” (Pervin, 2010, p. 48). A central concern is whether the questions asked in a study actually make it possible for the researcher to answer the research questions. Furthermore, an important consideration in quantitative research is external validity, namely whether there are grounds for generalizing results to a population based on data acquired from a sample (Cohen et al., 2011, p. 186).

In this study, the internal consistency of the research instrument relies on the piloting, which allowed me to refine the questionnaire. In the results chapters, I problematize some of the terminology used in the items and discuss how it may have been understood in different ways by different respondents. Additionally, the terms used in the multiple-choice scales in sections 3 and 4 of the survey are open to interpretation: terms like “never”, “rarely”, “sometimes”, and “often” may mean different things to different people. This means that rather than being accurate descriptions of how often teachers do things, the responses to these items describe tendencies that cannot be specified precisely. Another potential challenge for this study’s validity is that respondents tend to over-report what they consider to be positively regarded attitudes and actions, and under-report what they think may be perceived as negative (Lavrakas, 2008, pp. 15, 479). However, the teachers were completely anonymous at all stages of the process and the survey did not address sensitive issues, which might have encouraged them to answer truthfully.

In the chapters that follow, I use the survey results mainly as a starting point, elaborating further on the issues they address by means of the qualitative data, and I do not claim that my results are representative of upper secondary English teachers in Norway in general. This is mainly because the sample of 110 teachers is too small for the findings to be generalized to the population (Cohen et al., 2011, p. 147). However, the survey participants’ lack of representativeness is a result not just of how many teachers responded to the survey, but also of the possibly skewed sample. It might be assumed that teachers who found the survey topic interesting and/or who identified first and foremost as English teachers were more likely to respond than those who were less interested in literature and/or who considered English their second (or perhaps even third) teaching subject. This might also have affected the overall response rate: teachers who were less interested in literature and/or who did not identify primarily as English teachers might not have wanted to participate. This means that the sample would probably not have been representative of upper secondary English teachers in Norway even if I had been able to recruit more teachers; a sample consisting of volunteers is unlikely to represent the population at large (Cohen et al., 2011, p. 160).

Rather than being concerned with the study’s generalizability, I therefore rely on concepts like “transferability”, “extrapolation”, and “fittingness”, which are frequently used in qualitative research to refer to external validity. These terms denote “the degree of congruence between sending and receiving contexts” (Lincoln & Guba, 1985, p. 124), or “modest speculations on the likely applicability of the findings to other situations under similar, but not identical conditions” (Patton, 2002, p. 584). Even though the findings in this study cannot be generalized statistically to the entire population, it might be possible to generalize them analytically to sub-groups of the population: in this case, teachers who read fiction regularly and/or are generally interested in literature, teachers who view themselves as primarily English teachers, and/or English teachers who are motivated and dedicated in their profession. This constitutes “a reasoned judgment about the extent to which the findings of one study can be used as a guide to what might occur in another situation” (Brinkmann & Kvale, 2015, p. 297) – what is called the transferability of a study. For this reason, the quantitative data are important, even though the results cannot be generalized or be subjected to sophisticated statistical analysis.

The data that were gathered from the survey were analyzed using descriptive statistics, namely frequency tables and bivariate analyses (crosstabulations). First, the data were entered into the analytical software program SPSS (Version 25; IBM 2017) and I conducted frequency analyses of the items in sections 1, 3, and 4 in order to get an overview of the material. I found that a total of 18 teachers had left one or more items in sections 3 and 4 unanswered. The unanswered items were spread out across the two sections, and no item stood out as being overlooked by many. One of the teachers had not responded to six items, and another to ten, but the rest had missed between one and four items. I considered removing the two informants who had failed to respond to six and ten items, but decided to keep them because they had answered the open questions in section 2 in detail. This means that the number of respondents for the survey items reported on in the following chapters varies from 106 to 110; these numbers are clearly presented in the tables that appear in the following chapters.
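To illustrate how a frequency table can report a different number of respondents per item when some answers are missing, the following is a minimal Python sketch. It is not the SPSS procedure used in this study; the responses, the function name `frequency_table`, and the use of `None` to mark a skipped item are all assumptions made for the example.

```python
from collections import Counter

# Invented responses to a single five-point item; None marks a skipped item.
responses = ["often", "sometimes", None, "rarely", "often",
             "always", None, "never", "sometimes", "often"]

SCALE = ["never", "rarely", "sometimes", "often", "always"]


def frequency_table(values, scale):
    """Return (n_valid, rows); each row is (category, count, valid percent)."""
    valid = [v for v in values if v is not None]
    n = len(valid)
    counts = Counter(valid)
    rows = []
    for category in scale:
        # Percentages are based on valid answers only, so N can differ per item.
        percent = round(100 * counts[category] / n, 1) if n else 0.0
        rows.append((category, counts[category], percent))
    return n, rows


n_valid, table = frequency_table(responses, SCALE)
print(f"N = {n_valid} (missing: {len(responses) - n_valid})")
for category, count, percent in table:
    print(f"{category:<10} {count:>3} {percent:>6.1f}%")
```

Basing the percentages on valid answers only is one common choice; it is what makes it necessary to state N alongside each table.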

Next, bivariate analyses – crosstabulations – were conducted in order to find out whether there were associations between variables denoting teachers’ backgrounds and contexts (section 1 of the survey) and variables denoting the types of texts they used (section 3 of the survey). When conducting these analyses, I looked at the percentage difference, which “estimates the extent to which one phenomenon implies the other” (Cohen et al., 2011, p. 631). This type of analysis was chosen because it is transparent: “straightforward to calculate and simple to understand” (Cohen et al., 2011, p. 632). However, the research literature does not take a clear stand on how big the differences between groups need to be in order for them to be relevant. I decided that there had to be a discrepancy of at least 25 percentage points in order for findings to be relevant for discussion. The main reasons were that this is the middle ground between suggestions and examples provided by different research handbooks (see Cohen et al., 2011, pp. 631-632; Jacobsen, 2015, p. 334) and that another mixed-methods study with a similar number of respondents used this as the limit (Vestby, 2017). When calculating the percentage difference, I treated the variables denoting types of texts used as dependent variables, and teachers’ backgrounds and contexts as independent variables.
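The percentage difference and the 25 percentage point limit described above can be sketched as follows. This is an illustrative Python example with invented data, not the SPSS crosstabulation used in the study; the group names, the function `percentage_difference`, and the response values are assumptions.

```python
THRESHOLD = 25  # percentage points, the limit stated in the text


def percentage_difference(group_a, group_b, outcome):
    """Share of group_a giving `outcome` minus the share of group_b, in percentage points."""
    def share(group):
        valid = [v for v in group if v is not None]  # ignore skipped items
        return 100 * sum(1 for v in valid if v == outcome) / len(valid)
    return share(group_a) - share(group_b)


# Invented data: the independent variable splits respondents into two groups,
# the dependent variable is how often a type of text is used.
group_a = ["often"] * 12 + ["rarely"] * 8   # 60 % answer "often"
group_b = ["often"] * 6 + ["rarely"] * 14   # 30 % answer "often"

diff = percentage_difference(group_a, group_b, "often")
relevant = abs(diff) >= THRESHOLD

print(f"Percentage difference: {diff:.1f} points, relevant: {relevant}")
```

The absolute value is used so that a difference counts as relevant regardless of which group has the higher share.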

In both the frequency tables and the bivariate analyses, I sometimes collapsed categories of responses in order to show the general tendencies in the material. This applied to the ratio variables (age, years of teaching experience, and number of English teachers at the school),63 ordinal variables (reading habits), and one nominal variable (job title) in section 1, as well as the ordinal variables in sections 3 and 4. For instance, I used six different age categories in the survey, but in table 6 above these have been changed into three age categories. Although some argue that one should only use two-category variables in bivariate analyses that examine percentage difference (Cohen et al., 2011, p. 632), others argue that tables can be more complex and include three categories (Leon-Guerrero & Frankfort-Nachmias, 2015, p. 215). I have used two-category variables when possible, but in some cases, collapsing categories to make two-point scales would lead to a possible distortion of meaning. In those cases, I have used three-category variables instead. When collapsing categories, I ensured that the merging did not distort the meaning of the responses by only combining categories that were next to each other on the given scale and that were on the same side of a scale’s center. For instance, for items 18-29, the five-point scale was reduced to a three-point scale by merging “never” and “rarely” into one category and “often” and “always” into another, but leaving “sometimes” – the center point – standing alone.64 Merging categories allowed me to have more respondents for each category, and it made the heterogeneousness of the responses more apparent. The disadvantage of collapsing categories is that nuances in the quantitative material disappear. However, as this is a mixed methods project, I decided that the data from the qualitative component would ensure that the overall findings would not appear too oversimplified.

63 Note that the ratio variables were treated nominally in the analyses.

64 The tables make it clear whether categories have been collapsed in the analyses by including a slash (/) to
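The merging rule described above (only adjacent categories, and only categories on the same side of the scale’s center) can be expressed as a small check. The following Python sketch is illustrative and does not reproduce the recoding done in SPSS; the function name `is_safe_merge`, the slash-joined labels, and the sample responses are assumptions.

```python
SCALE = ["never", "rarely", "sometimes", "often", "always"]

# The three-point grouping described in the text for items 18-29.
GROUPS = [["never", "rarely"], ["sometimes"], ["often", "always"]]


def is_safe_merge(scale, group):
    """True if the categories in `group` are adjacent on `scale` and on one side of its center."""
    positions = sorted(scale.index(category) for category in group)
    if len(positions) == 1:
        return True  # a category left standing alone is never distorted
    adjacent = all(b - a == 1 for a, b in zip(positions, positions[1:]))
    center = (len(scale) - 1) / 2
    below = all(p < center for p in positions)
    above = all(p > center for p in positions)
    return adjacent and (below or above)


if not all(is_safe_merge(SCALE, group) for group in GROUPS):
    raise ValueError("grouping merges categories across the center or non-adjacent ones")

# Map every original category to a slash-joined label (a naming choice for this example).
MAPPING = {category: "/".join(group) for group in GROUPS for category in group}

responses = ["never", "often", "sometimes", "always", "rarely"]  # invented
collapsed = [MAPPING[r] for r in responses]
print(collapsed)
```

A grouping such as “rarely” with “sometimes” would fail the check, because it pulls the center point to one side of the scale.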

Two items from section 1 required more processing before they could be used in the bivariate analysis: item 13, which asked teachers which study program(s) they taught in the school year in which the survey was conducted, and item 14, which asked teachers which study program(s) they had previously taught. As teachers were able to tick several boxes in response to these questions, I had to compute new variables for each of the categories. I computed two different types of variables: the first contained responses to each of the available categories, making up seven variables for item 13 and ten variables for item 14. However, as teachers were able to answer affirmatively to several categories for each of these items, I was also interested in creating variables denoting teachers who only taught one of the two major study programs, meaning that they taught either vocational or general studies. Therefore, I computed two additional variables that denoted teachers who responded affirmatively to the one but negatively to the other.65
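The two types of computed variables can be illustrated with a short Python sketch. The data are invented, the category list is shortened to three placeholder names rather than the seven categories of item 13, and the variable names are assumptions; the computation in the study itself was done in SPSS.

```python
# Invented answers to a tick-all-that-apply item: one set of ticked boxes per teacher.
item13 = [
    {"vocational"},
    {"general"},
    {"vocational", "general"},
    {"general", "other"},
    set(),  # no boxes ticked
]

CATEGORIES = ["vocational", "general", "other"]  # placeholder list for the example

# First type: one yes/no variable per category.
dichotomous = {
    category: [category in ticks for ticks in item13]
    for category in CATEGORIES
}

# Second type: teachers who answered yes to one major program and no to the other.
only_vocational = [
    voc and not gen
    for voc, gen in zip(dichotomous["vocational"], dichotomous["general"])
]
only_general = [
    gen and not voc
    for voc, gen in zip(dichotomous["vocational"], dichotomous["general"])
]

print("only vocational:", sum(only_vocational))
print("only general:", sum(only_general))
```

Note that in this sketch a teacher who ticked “general” and “other” still counts as teaching only general studies, since the second type of variable compares the two major programs alone.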

65 Note that this was only done for item 13: because 89% of the respondents reported that they had taught vocational studies at some point in their careers, I did not do this for item 14. Furthermore, this was only done for vocational and general study programs, and not the other categories, as the difference between vocational and general studies was what I wanted to examine.

When working with the teachers’ responses to the three open questions in section 2, I had to approach the material differently. First, the responses were coded.66 Specific examples of literary texts listed as responses to items 15 and 16 were placed in separate categories according to genre. In the cases in which the teachers’ responses listed types of texts rather than specific titles, these answers were placed in a separate category. Titles of textbooks listed in response to item 17 were categorized according to study program, and the teachers’ assessments of the textbooks were also included in this overview. The coded responses to the open questions were not entered into SPSS, but analyzed manually. The reason for this was that there were so many different texts, genres, and textbooks mentioned that it would have been very difficult to operate with clear categories for analysis in SPSS. When working with the coded responses manually, I counted how many times specific texts and genres were mentioned, which provided me with a detailed overview of which texts the teachers viewed as suitable and unsuitable for their students. These data also showed which textbooks the teachers used and what they thought of the selection of literary texts in them.
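The counting of coded responses was done by hand in this study. Purely as an illustration of the same tallying logic, the sketch below counts mentions of genres and titles in Python; the coded pairs are invented placeholders, and `None` is an assumed way of marking responses that named a type of text rather than a specific title.

```python
from collections import Counter

# Invented coded responses: (genre, title); title is None when only a type of text was named.
coded = [
    ("novel", "Text A"),
    ("short story", "Text B"),
    ("novel", "Text A"),
    ("poem", "Text C"),
    ("novel", None),
    ("short story", "Text D"),
]

genre_counts = Counter(genre for genre, _ in coded)
title_counts = Counter(title for _, title in coded if title is not None)

for genre, count in genre_counts.most_common():
    print(f"{genre}: {count}")
print("Most mentioned title:", title_counts.most_common(1))
```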

In terms of the broader analytical approach, abductive reasoning was the main strategy for both the quantitative and qualitative data. Abduction entails examining theories and previous research alongside the analysis of data in order to find explanations; it differs from the more common explanatory models of induction and deduction in that the latter two only move in one direction – bottom-up from data to theories and top-down from theories to data, respectively – whereas abductive reasoning moves back and forth between theories and data (Alvesson & Sköldberg, 2009). I conducted preliminary analyses of the survey results before I moved on to the qualitative component of the study, and went back to the quantitative component after I had conducted and begun to analyze the interviews. The theory and previous research I had reviewed early in my project were revisited after I had collected the data, meaning that I worked with theory and data at different stages in order to analyze and explain the findings in the best possible way.
