5. OER task-taxonomy empirical evaluation
5.3 Methods: triangulated quantitative and qualitative analysis
The goal of this study was achieved by analysing quantitative and qualitative feedback on the proposed task-taxonomy, collected from actual educators via surveys and interviews. Feedback was initially solicited from the partners of two European research projects, from the participants in a UK OER mailing list, and from Italian high school teachers, but was mainly received from this last group. While this is the sector of main interest to the author, the follow-up studies and discussions took this limitation into due account.
The survey collected quantitative data first, with the aim of engaging respondents in critical thinking and so eliciting high-value qualitative information. To maximize the quality of the feedback on the OER task-taxonomy, respondents were first asked to rate the importance of the tasks and categories identified by the previous task analysis, using single-item constant-sum questions (CSQs). That is, respondents were asked to allocate a total of 100 points to tasks or categories in a sequence of groups. Each group corresponded to a specific category in the OER task-taxonomy, traversed top-down (that is, starting from its highest abstraction level). The main disadvantage of CSQs is the high cognitive load they impose on respondents (Sue and Ritter, 2007). Yet this was an advantage here: by removing the simplistic option of rating every item as “very important”, as is possible on standard rating scales, CSQs force respondents to reflect on the precise relative importance of every category and task, increasing discrimination power and engaging them in critical thinking (Timpany, 2015). An additional advantage of CSQs is that they eliminate scale meaning bias, increasing data reliability. Only after this activity were qualitative data collected: each CSQ was followed by open questions soliciting suggestions for additional tasks or categories, modifications of the proposed ones, or a different organization.
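The constant-sum constraint described above can be illustrated with a minimal sketch. It is not the questionnaire implementation used in the study; the function names and the category labels in the usage note are hypothetical, and integer points are assumed.

```python
from typing import Dict

CSQ_TOTAL = 100  # points each respondent distributes within one group


def validate_csq(allocation: Dict[str, int], total: int = CSQ_TOTAL) -> bool:
    """Return True if the allocation is a valid constant-sum answer."""
    points = allocation.values()
    # Every item needs a non-negative score, and the scores must use up
    # exactly the available total.
    return all(p >= 0 for p in points) and sum(points) == total


def relative_importance(allocation: Dict[str, int]) -> Dict[str, float]:
    """Convert a valid allocation into fractions that sum to 1."""
    if not validate_csq(allocation):
        raise ValueError("allocation must be non-negative and sum to the total")
    return {item: p / CSQ_TOTAL for item, p in allocation.items()}
```

For instance, an answer such as `{"search": 50, "evaluate": 30, "adapt": 20}` (hypothetical category labels) is accepted, whereas giving 100 points to every item is rejected. This is the mechanism that prevents a respondent from rating everything as “very important”.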
CSQs, in addition to being instrumental in answering question RQ1, fulfil a second general objective of collecting quantitative data indicating the importance of each task and category (RQ2). These data could be conveniently used as weights in a usability metric, as discussed by Agarwal and Venkatesh (2002).
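As a rough illustration of how such weights might enter a usability metric, the sketch below averages the CSQ points across respondents and uses the normalised averages in a weighted sum of per-item ratings. The weighted-sum form, the function names, and the rating scale are assumptions made for illustration; the exact formulation should be taken from Agarwal and Venkatesh (2002).

```python
from statistics import mean
from typing import Dict, List


def mean_weights(responses: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the CSQ points per item and normalise them to sum to 1."""
    items = responses[0].keys()
    averages = {item: mean(r[item] for r in responses) for item in items}
    total = sum(averages.values())
    return {item: avg / total for item, avg in averages.items()}


def weighted_score(weights: Dict[str, float], ratings: Dict[str, float]) -> float:
    """Combine per-item ratings into one score (assumed weighted-sum form)."""
    return sum(weights[item] * ratings[item] for item in weights)
```

Here each element of `responses` is one respondent's allocation for a single CSQ group, and `ratings` holds hypothetical per-item usability ratings of the system under evaluation.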
The bulk of the questionnaire, which collected quantitative and qualitative data for the various task categories, was preceded by basic demographic and general questions on country, experience, subject and level of teaching, search portals used and frequency of use.
A final section included a question on the overall perceived completeness of the OER task-taxonomy, rated on a 7-point Likert scale, and a few open questions to collect additional qualitative feedback on possibly important tasks not covered by the OER task-taxonomy, as well as any additional comments considered relevant.
The whole questionnaire was preceded by a short introduction covering goals, background information about the task analysis, instructions, privacy and data management, and contact information. The questionnaire was implemented as a Web application, by extending Google Forms to support CSQs, so that feedback could be collected anonymously and a potentially large number of responses processed easily. Yet all respondents preferred to use an email version, which made it possible to contact them easily for follow-up interviews.
The questionnaire was approved by the Human Research Ethics Committee (HREC) at The Open University (UK) and is available as Appendix B (email version).
5.3.1 Quality of the data: validity and reliability
The survey attempted to maximize both the reliability and the validity of the data collected. In this context, reliability is the degree to which the survey produces repeatable and consistent results; internal consistency reliability, in particular, is “the degree to which different test items that probe the same construct produce similar results” (Phelan and Wren, 2005). Validity indicates to what extent the test items actually measure what they are supposed to (Rattray and Jones, 2007). There are different types of validity: content, construct, and external validity are considered here.
Content validity
Content validity “ensures that the operationalization of a construct adequately represents the domain of coverage of the construct” (Agarwal and Venkatesh, 2002, p. 173), and it is frequently assessed through expert judgement or literature reviews.
Here, questions were directly associated with precise task categories and instances, derived from the analysis of the scientific literature, hence attempting to represent the research community’s understanding. Additionally, Information Foraging Theory, a widely used behavioural model (Pirolli, 2009), was used to help interpret and validate the identified tasks. However, to guarantee that the whole domain (i.e. the tasks of interest to educators) was adequately covered, the survey included, for every group and again in the final section, open questions asking participants whether they had any additional suggestions or comments (Rattray and Jones, 2007).
Construct validity
Construct validity is the degree to which a test measures the intended construct. To provide evidence supporting construct validity, convergent and discriminant validity can be used, based on the correlation among similar or dissimilar measures (Rattray and Jones, 2007).
Here, construct validity was assured by construction: each single-item question corresponded directly to a task category or instance. It made no sense to apply convergent or discriminant validity checks, as the questions were independent by construction, and could not be expected to converge or diverge.
External validity (population and ecological validity)
External validity concerns the generalizability of the results; it is concerned, for example, with the representativeness of the respondents. To support external validity, the survey was run among actual educators, rather than asking students or researchers to play the role of an educator. Indeed, non-educators cannot be expected to judge reliably, for example, the importance of approximate versus precise alignment of a resource to an educational standard, or the advantage of using different proximity metrics in a QBE expansion operation.
Reliability
While participants were expected to be highly reliable, given their experience and motivation, the reliability of the data collected could be checked by means of a redundant question. Additionally, outliers in the answers, that is, unusual weights lying more than 3 times the interquartile range away from those assigned by other participants, were double-checked in follow-up interviews, in order to exclude possible mistakes and fully understand the respondents’ motivation. Cases at risk of misunderstanding because of a limited command of the survey language (English, chosen for a potentially wider distribution) were handled by administering the survey as a “structured interview”.
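The outlier rule above can be sketched as follows. The text does not state exactly how the distance is measured, so this example assumes the common boxplot convention of fences placed 3 interquartile ranges below the first quartile and above the third; the function name and respondent identifiers are hypothetical.

```python
from statistics import quantiles
from typing import Dict


def extreme_outliers(weights: Dict[str, float], k: float = 3.0) -> Dict[str, float]:
    """Return the respondents whose weight for one item lies outside
    [Q1 - k*IQR, Q3 + k*IQR] (assumed interpretation of the rule)."""
    # Quartiles of the weights that all respondents gave to the same item.
    q1, _, q3 = quantiles(weights.values(), n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return {who: w for who, w in weights.items() if w < lower or w > upper}
```

Applied to the weights given to a single task by all respondents, the function returns the answers that would be candidates for a follow-up interview; it makes no decision about whether they are mistakes.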
5.3.2 Data collection protocol
The survey was made available in three different modalities: (1) self-administered via a document to be filled in and returned by email, (2) self-administered via a Web form that could also be filled in anonymously, and (3) as a structured interview, including via telephone or Skype. Invitations to participate were sent to a few relevant mailing lists and to some teachers in Italy and the UK known directly to the author. The survey was first piloted with two respondents and, following a discussion with them aimed at spotting potential misunderstandings, modified to eliminate a few ambiguities and introduce further explanations.
Once the first round of data collection was completed, a boxplot of the weights assigned enabled the identification of outliers. These were scrutinized to identify potential mistakes, and the most extreme cases were discussed with respondents: they were invited to a follow-up interview to explain and discuss their decisions, and possibly amend the data in the light of the resulting improved understanding. Following this process, the number of outliers decreased, helping to improve the quality of the data and the applicability of parametric analysis techniques.
Finally, every comment collected via open questions was followed up, to fully understand the underlying motivations and to elicit additional information. These open discussions enabled the collection of additional qualitative information that helped build a broader perspective on educators’ way of thinking; this proved to be the most valuable information.