5. OER task-taxonomy empirical evaluation
5.4 Data analysis methodology
5.4.1 Quantitative data analysis
Applicability of parametric statistics
The data collected through CSQ are numerical data, which capture subjective estimations of magnitude. Their level of measurement is clearly ordinal as a minimum (a higher weight assigned to Task A compared to Task B implies that the participant considers Task A more important than Task B). It could be argued that it is not interval, because it is difficult for people to score the importance of a task on an interval rather than an ordinal scale. Yet respondents have considerable experience in the domain, so it could be claimed that approximately equal intervals between points can be reasonably assumed. If these data approximate interval data, then they also approximate ratio data: zero is meaningful (corresponding to a useless task), hence it could be reasonably assumed that, for example, 40 is twice 20. However, because of the limited sample size, their quantitative analysis required caution. In particular, the weights collected were tested for Normality with the Shapiro-Wilk Test before applying parametric statistics. The null hypothesis of this test is that the sample comes from a normally distributed population; this hypothesis can be rejected when the p-value is lower than the chosen alpha level. In addition to the test, because with such a sample size there is a significant chance of type II errors (false negatives, that is, failing to detect a real departure from Normality), a visual check was performed on the histogram and Q-Q plot of each set of weights.
Whenever any doubt persisted about the applicability of parametric analysis, the results of parametric tests were cross-checked with the results of corresponding non-parametric tests – which are suitable for ordinal data too.
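As an illustration of the Normality check described above, the following is a minimal sketch in Python with SciPy. It is not the procedure actually used in the study: the function name check_normality, the default alpha of 0.05 and the sample values are assumptions introduced only for this example. The Q-Q points it returns can be passed to any plotting tool for the visual check.

```python
import numpy as np
from scipy import stats

# Hypothetical weights for one task (illustrative values, not study data).
hypothetical_weights = [40, 20, 30, 35, 25, 50, 10, 30, 45, 20, 30, 40]


def check_normality(weights, alpha=0.05):
    """Run the Shapiro-Wilk test and compute the points of a normal Q-Q plot."""
    w = np.asarray(weights, dtype=float)
    statistic, p_value = stats.shapiro(w)
    # probplot returns theoretical and ordered sample quantiles, plus a
    # least-squares fit whose r indicates how close the points lie to a line.
    (theoretical, ordered), (slope, intercept, r) = stats.probplot(w, dist="norm")
    return {
        "W": float(statistic),
        "p_value": float(p_value),
        # Rejecting H0 means the sample is unlikely to come from a normal population.
        "reject_normality": bool(p_value < alpha),
        "qq_points": (theoretical, ordered),
        "qq_r": float(r),
    }


result = check_normality(hypothetical_weights)
print(result["W"], result["p_value"], result["reject_normality"])
```

A non-significant result does not prove Normality, especially with few observations, which is why the visual inspection remains part of the check.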
Weights analysis: central tendency with confidence intervals
The weights assigned to tasks and categories (RQ2) were collected via CSQs primarily to foster critical thinking among respondents. Their means were plotted on a bar chart, to give an immediate appreciation of the relative importance of the tasks. When it was necessary to compare items from different CSQs, they were scaled to the same interval between 0 and 10. As a measure of the accuracy of the means, that is, to understand how well the sample means represent the population means, 95% confidence intervals were estimated via One-Sample T-Test analysis. In addition, to improve estimate accuracy, bootstrap analysis (Efron, 1979) was employed.
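The two interval estimates mentioned above can be sketched as follows in Python. This is only an illustration under stated assumptions: the text does not specify which bootstrap variant was used, so a simple percentile bootstrap is assumed here, and the function names, the number of resamples, the seed and the sample values are all hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical weights for one task (illustrative values, not study data).
hypothetical_weights = np.array([40, 20, 30, 35, 25, 50, 10, 30, 45, 20, 30, 40], dtype=float)


def t_confidence_interval(x, level=0.95):
    """Confidence interval for the mean based on Student's t distribution."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    half_width = stats.t.ppf(0.5 + level / 2, df=n - 1) * sem
    return mean - half_width, mean + half_width


def bootstrap_percentile_interval(x, level=0.95, n_resamples=5000, seed=0):
    """Percentile bootstrap interval for the mean (assumed variant)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, x.size, size=(n_resamples, x.size))
    resampled_means = x[idx].mean(axis=1)
    lower = np.percentile(resampled_means, 100 * (1 - level) / 2)
    upper = np.percentile(resampled_means, 100 * (1 + level) / 2)
    return float(lower), float(upper)


print(t_confidence_interval(hypothetical_weights))
print(bootstrap_percentile_interval(hypothetical_weights))
```

The t-based interval relies on approximate Normality of the sample mean, whereas the bootstrap interval makes no such assumption, which is one reason to report both for small samples.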
Planned comparative analysis - relative importance of specific tasks
The research question on the relative importance of different tasks (RQ2) was investigated by charting basic descriptive statistics and by using the parametric Paired-Samples T-Test. This test is used to determine whether the means of the populations corresponding to two paired samples are equal (the null hypothesis), that is, whether the difference observed between the sample means could be explained by random noise.
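A paired comparison of this kind can be sketched in Python as shown below, with the Wilcoxon Signed Rank Test computed alongside as a distribution-free counterpart. The function name compare_paired_weights and the weights for the two tasks are hypothetical and serve only to illustrate the calls; they are not study data.

```python
import numpy as np
from scipy import stats

# Hypothetical weights given by the same eight respondents to two tasks
# (illustrative values, not study data).
task_a = [40, 35, 50, 30, 45, 25, 38, 42]
task_b = [30, 33, 35, 34, 25, 20, 31, 29]


def compare_paired_weights(a, b, alpha=0.05):
    """Paired-samples t-test with a Wilcoxon signed-rank cross-check."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Parametric test: H0 is that the mean of the paired differences is zero.
    t_stat, t_p = stats.ttest_rel(a, b)
    # Non-parametric test on the same pairs, without the Normality assumption.
    w_stat, w_p = stats.wilcoxon(a, b)
    return {
        "mean_difference": float(np.mean(a - b)),
        "t": float(t_stat),
        "t_p_value": float(t_p),
        "wilcoxon_W": float(w_stat),
        "wilcoxon_p_value": float(w_p),
        # True when both tests lead to the same decision at the chosen alpha.
        "tests_agree": bool((t_p < alpha) == (w_p < alpha)),
    }


print(compare_paired_weights(task_a, task_b))
```

When the two tests disagree, the non-parametric result is usually the safer one to report for data whose Normality is in doubt.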
Because some doubts about the applicability of parametric statistics persisted, the non-parametric Wilcoxon Signed Rank Test was used as a cross-check. This test has the same objective as the previous Paired-Samples T-Test, but does not assume that the populations are normally distributed.
Explorative comparative analysis - educators with different profiles
While the sample size did not allow an extended multiple (post-hoc) comparison with the desired reliability, via ANOVA for example, to identify significant differences depending on the profile of the surveyed educators, it was still possible to test a few specific interesting cases. In particular, an explorative analysis was attempted of possible differences in the priorities of educators of different ages, in relation to aspects that might be influenced by the amount of experience. This analysis used multiple parametric Independent-Samples T-Tests, each testing the null hypothesis that the two population means were equal at α = 0.05 (95% confidence level), following Levene’s Test for Equality of Variances. The effect size was also evaluated, both as the standardized group mean difference (Cohen’s d) and as r.
Again, because of persisting doubts about the applicability of parametric statistics, the non-parametric Mann-Whitney U-Test was used as an additional cross-check. This test can be considered as a non-parametric version of the previous Independent-Samples T-Test, having the same objectives but without assuming that the populations are normally distributed.
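The sequence described above (Levene’s Test, Independent-Samples T-Test, effect sizes and the Mann-Whitney cross-check) can be sketched in Python as follows. Several choices here are assumptions rather than details taken from the study: the mean-centred form of Levene’s Test, the pooled-standard-deviation formula for Cohen’s d, the derivation of r from t and its degrees of freedom, the function name and the two hypothetical groups.

```python
import numpy as np
from scipy import stats

# Hypothetical weights from two groups of educators (illustrative values).
younger = [30, 35, 40, 45, 50, 38]
older = [20, 25, 30, 28, 22, 31]


def compare_independent_groups(group_1, group_2, alpha=0.05):
    """Independent-samples t-test with effect sizes and a Mann-Whitney cross-check."""
    g1 = np.asarray(group_1, dtype=float)
    g2 = np.asarray(group_2, dtype=float)
    n1, n2 = g1.size, g2.size
    v1, v2 = g1.var(ddof=1), g2.var(ddof=1)

    # Levene's test decides between the pooled-variance and Welch t-tests.
    _, levene_p = stats.levene(g1, g2, center="mean")
    equal_var = bool(levene_p >= alpha)
    t_stat, t_p = stats.ttest_ind(g1, g2, equal_var=equal_var)

    # Degrees of freedom: pooled, or Welch-Satterthwaite when variances differ.
    if equal_var:
        df = n1 + n2 - 2
    else:
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )

    # Cohen's d from the pooled standard deviation; r from t and df.
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    cohens_d = (g1.mean() - g2.mean()) / pooled_sd
    r = np.sqrt(t_stat**2 / (t_stat**2 + df))

    # Non-parametric cross-check.
    _, u_p = stats.mannwhitneyu(g1, g2, alternative="two-sided")
    return {
        "mean_difference": float(g1.mean() - g2.mean()),
        "equal_variances_assumed": equal_var,
        "t": float(t_stat),
        "t_p_value": float(t_p),
        "cohens_d": float(cohens_d),
        "r": float(r),
        "mann_whitney_p_value": float(u_p),
    }


print(compare_independent_groups(younger, older))
```

Running several such tests inflates the overall chance of a false positive, so results obtained this way are best read as exploratory.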
Explorative correlation analysis
An explorative correlation analysis among weights was carried out, to identify possible correlations to be further investigated. Here too, the non-parametric Spearman Correlation Test was used in addition to the parametric Pearson Correlation Test.
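The parallel use of the two correlation tests can be sketched in Python as follows. The function name correlate_weights and the two sets of weights are hypothetical, introduced only to show the calls.

```python
import numpy as np
from scipy import stats

# Hypothetical weights given by the same respondents to two tasks
# (illustrative values, not study data).
weights_x = [10, 20, 25, 30, 40, 45, 50, 35]
weights_y = [15, 18, 30, 28, 42, 40, 55, 33]


def correlate_weights(x, y):
    """Pearson (linear) and Spearman (rank-based) correlation with p-values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pearson_r, pearson_p = stats.pearsonr(x, y)
    # Spearman works on ranks, so it only assumes a monotonic relationship
    # and is suitable for ordinal data.
    spearman_rho, spearman_p = stats.spearmanr(x, y)
    return {
        "pearson_r": float(pearson_r),
        "pearson_p_value": float(pearson_p),
        "spearman_rho": float(spearman_rho),
        "spearman_p_value": float(spearman_p),
    }


print(correlate_weights(weights_x, weights_y))
```

A marked gap between the two coefficients suggests a monotonic but non-linear relationship, or the influence of outliers, and is a signal to inspect a scatter plot before drawing conclusions.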
5.4.2 Qualitative data analysis
A limited but valuable amount of qualitative data was collected via open questions in the survey, as well as in structured follow-up interviews. The data were analysed according to the method of content analysis (Cho and Lee, 2014). The text was first split into parts (sets of sentences, whole sentences, or parts of sentences) referring to single concepts. Then, the following simple schema with the most relevant or frequently expressed concepts was extracted:
1. Comment on Google limitations
2. Suggestions for additional tasks in the category Expansion
3. Educational alignments interesting, but unwilling to use learning objectives explicitly
4. More familiar with filtering type of tasks
5. First topics, then learning objectives
6. Comment on overall completeness
7. Suggestion for additional metadata
8. Interested in Query By Examples
9. Increased score for non-authoritative metadata
10. Publishing resources needs economic incentives
11. No interest in using OERs
Finally, the schema was used to colour-code the original text, making it possible to quickly scan the main concepts expressed. These qualitative data were used to complement the quantitative data by cross-checking them, and made it possible to understand respondents' motivations. In particular, they made it possible to explain a totally unanticipated way of thinking that proved to be the most useful output of the study.