Remarks on Statistical Methods

In the course of this PhD project, a range of statistical methods was used to analyse the datasets of the five experiments described in the following chapters. Some of these methods are not yet widespread in psychological research or deviate from the defaults taught in many textbooks on data analysis in psychology. The relevant methods are briefly described in the following paragraphs. Where the specific requirements of an experiment called for more detail about a statistical method, these details are presented in the method or results section of the respective experiment. The datasets and study materials are stored at the Open Science Framework and are available for download at https://osf.io/w9v7g/.

Comparing differences of two or more groups. As the default test for two-group comparisons, Welch’s t-test was applied in the main statistical analyses instead of Student’s t-test. Among others, Ruxton (2006) and Delacre, Lakens, and Leys (2017) showed that Welch’s t-test is more robust for small and medium-sized samples, which carry an increased likelihood of violating the assumption of normality or of equal variances. When more than one assumption of parametric tests was violated (e.g. normality and homoscedasticity of residuals) and/or sample sizes were very small (Experiment 3), non-parametric statistical tests were used for all analyses (cf. Chapter 4).
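As an illustration of how Welch’s t-test differs from Student’s t-test, the following minimal sketch computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom with the Python standard library only. It is not the software used for the analyses in this thesis; the function name `welch_t` and the example scores are hypothetical, and the p-value is omitted because it would require the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    # Squared standard errors of each group mean (sample variance / n);
    # unlike Student's t-test, the variances are NOT pooled.
    se1_sq = variance(a) / n1
    se2_sq = variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t, df


# Hypothetical scores for two groups with unequal sizes and spreads
group_a = [4.1, 5.0, 5.6, 6.2, 4.8, 5.3]
group_b = [3.0, 6.9, 2.2, 7.5]
t_value, df_value = welch_t(group_a, group_b)
print(f"t = {t_value:.3f}, df = {df_value:.2f}")
```

With equal group sizes and equal variances the degrees of freedom reduce to n1 + n2 - 2, as in Student’s t-test; with unequal variances they shrink, which makes the test more conservative.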

For group comparisons that involved more than two groups, the Brown-Forsythe test of variances was used as a robust test on the F-distribution if the data violated the homogeneity of variance assumption (Brown & Forsythe, 1974). For analysing multiple dependent variables that were most likely to be correlated by design, multivariate analyses of (co)variance (MAN(C)OVA) were used. In that case, Wilks’ λ was reported as an appropriate test statistic that is adequately powerful to detect statistical group differences when multiple dimensions come into play (Tabachnick & Fidell, 2012).

Analysing conditional (mediation and moderation) effects. Conditional effects of influencing variables on dependent variables in Experiments 1, 2 and 4 (cf. Chapters 2, 3 and 4) were analysed with the PROCESS macro (version 2.16.3) for SPSS (version 24.0.0.2) by Hayes (2013). For all mediator and moderator analyses, bias-corrected and accelerated (BCa) bootstrapping with 5,000 samples was applied. Any conditional effect reaching statistical significance at α = 5% is illustrated with Johnson-Neyman plots using the so-called floodlight method. The common pick-a-point approach with simple slopes plots was additionally used to illustrate effects that did not meet the α = 5% criterion but are still of potential relevance. For all conditional effect analyses, 95% confidence intervals for the effect sizes are reported.
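The difference between the pick-a-point and the floodlight (Johnson-Neyman) approach can be sketched for a simple moderation model Y = b0 + b1·X + b2·W + b3·X·W, in which the conditional effect of X at a moderator value w is b1 + b3·w. The sketch below is not the PROCESS implementation and does not cover bootstrapping; the function names, the coefficient values and the critical t value are hypothetical inputs that would normally come from a fitted regression model.

```python
from math import sqrt


def conditional_effect(b1, b3, w):
    """Pick-a-point: effect of X on Y at one chosen moderator value w."""
    return b1 + b3 * w


def johnson_neyman_bounds(b1, b3, var_b1, var_b3, cov_b1_b3, t_crit):
    """Floodlight: moderator values at which the conditional effect is
    exactly at the significance threshold, i.e. |effect / SE| == t_crit.

    SE(w)^2 = var_b1 + 2*w*cov_b1_b3 + w^2*var_b3, so the boundaries are
    the real roots of a quadratic in w. Returns a sorted list (may be empty).
    """
    a = b3 ** 2 - t_crit ** 2 * var_b3
    b = 2 * (b1 * b3 - t_crit ** 2 * cov_b1_b3)
    c = b1 ** 2 - t_crit ** 2 * var_b1
    disc = b ** 2 - 4 * a * c
    if a == 0 or disc < 0:
        return []  # no boundary in this simplified sketch
    root = sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


# Hypothetical estimates (assumptions for illustration only)
b1, b3 = 0.10, 0.40
var_b1, var_b3, cov_b1_b3 = 0.04, 0.01, 0.0
t_crit = 1.99  # assumed two-sided critical t value for alpha = 5%

# Pick-a-point at an assumed moderator mean of 0 and SD of 1
for w in (-1.0, 0.0, 1.0):
    print(f"w = {w:+.1f}: conditional effect = {conditional_effect(b1, b3, w):.2f}")

print("Johnson-Neyman boundaries:",
      johnson_neyman_bounds(b1, b3, var_b1, var_b3, cov_b1_b3, t_crit))
```

Pick-a-point evaluates the effect only at a few chosen moderator values, whereas the Johnson-Neyman boundaries delimit the whole region of moderator values in which the effect is statistically significant.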

Analysing two-group equivalence. Equivalence hypothesis tests (Experiment 5) were performed with the R package TOSTER (version 0.2.5) by Lakens (2017b). The underlying two one-sided t-tests (TOST) procedure requires determining a priori the smallest effect size of interest (SESOI) in order to specify the equivalence bounds (Hauck & Anderson, 1984; Schuirmann, 1987). Applying the TOST procedure in conjunction with a t-test of differences within the null hypothesis significance testing (NHST) framework can yield four possible outcomes for an effect (cf. Figure 1.7): a) statistically equivalent and not different, b) not equivalent and statistically different, c) statistically equivalent and different, and d) not equivalent and not different.

Figure 1.7. Mean differences (black squares) and 90% confidence intervals (horizontal lines) with equivalence bounds ∆L = -0.5 and ∆U = 0.5 for four combinations of test results that are statistically equivalent or not, and statistically different from zero or not. Pattern A is statistically equivalent, pattern B is statistically different from 0, pattern C is practically insignificant, and pattern D is inconclusive (neither statistically different from 0 nor equivalent). Figure adapted from Lakens (2017a).
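The four outcomes can be read off confidence intervals, as the following minimal sketch shows. It assumes α = 5% for both procedures, so that the TOST decision corresponds to the 90% confidence interval lying entirely within the equivalence bounds and the NHST decision corresponds to the 95% confidence interval excluding zero. The function name `classify_tost` and the example intervals are hypothetical; this is not the TOSTER implementation.

```python
def classify_tost(ci90, ci95, lower_bound, upper_bound):
    """Classify a mean difference into one of the four TOST/NHST outcomes.

    ci90, ci95: (low, high) confidence intervals of the mean difference.
    lower_bound, upper_bound: equivalence bounds derived from the SESOI.
    """
    low90, high90 = ci90
    low95, high95 = ci95
    # Equivalent: the 90% CI lies entirely inside the equivalence bounds
    equivalent = lower_bound < low90 and high90 < upper_bound
    # Different: the 95% CI excludes zero
    different = low95 > 0 or high95 < 0

    if equivalent and not different:
        return "a) statistically equivalent and not different"
    if different and not equivalent:
        return "b) not equivalent and statistically different"
    if equivalent and different:
        return "c) statistically equivalent and different"
    return "d) not equivalent and not different"


# Hypothetical intervals with equivalence bounds of -0.5 and 0.5
print(classify_tost((-0.20, 0.20), (-0.25, 0.25), -0.5, 0.5))
print(classify_tost((0.30, 0.90), (0.25, 0.95), -0.5, 0.5))
print(classify_tost((0.10, 0.40), (0.05, 0.45), -0.5, 0.5))
print(classify_tost((-0.30, 0.70), (-0.40, 0.80), -0.5, 0.5))
```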

Comparing statistical evidence with Bayes Factors. Bayes Factors were computed with the open source statistics program JASP (version 0.8.2.0) developed by The JASP Team (2017) and the R package BayesFactor (version 0.9.12-2) by Morey and Rouder (2015). Reporting Bayes Factors in addition is suggested when sample sizes are small to moderate, as in Experiments 1, 2, 3 and 4 (cf. Chapters 2, 3 and 4), because Bayes Factors converge reliably with samples that are approximately 50% smaller than those required for adequately powered studies within the frequentist framework (Schönbrodt & Wagenmakers, 2017).

For all Bayesian hypothesis tests that involved two-group comparisons, an informed prior distribution of t(0.35, 0.102, 3) was used, as discussed by Gronau, Ly, and Wagenmakers (2017). For general linear model analyses, a default r scale parameter of r = .5 for ANOVAs and r = .35 for regression models was used (Morey & Rouder, 2015). When the analyses in the following chapters yield Bayes Factors in favour of the alternative model of BF10 > 100, the logarithm logBF10 is reported for better readability. Although Bayes Factors are continuous parameters, a rough heuristic classification aid is provided in Table 1.1.

Table 1.1

A classification scheme for Bayes Factors as suggested by Lee and Wagenmakers (2013).

Bayes Factor   BF10                          BF01
>100           Extreme evidence for H1       Extreme evidence for H0
30 – 100       Very strong evidence for H1   Very strong evidence for H0
10 – 30        Strong evidence for H1        Strong evidence for H0
3 – 10         Moderate evidence for H1      Moderate evidence for H0
1 – 3          Anecdotal evidence for H1     Anecdotal evidence for H0
1              No evidence                   No evidence
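The classification in Table 1.1 and the reporting rule for large Bayes Factors can be expressed as a small lookup, sketched below. Two points are assumptions rather than part of the source: values that fall exactly on a category boundary are assigned to the lower category, and the natural logarithm is used for logBF10. The function names are hypothetical.

```python
from math import log


def classify_bayes_factor(bf10):
    """Return the heuristic evidence label of Table 1.1 for a given BF10."""
    if bf10 <= 0:
        raise ValueError("A Bayes Factor must be positive.")
    if bf10 == 1:
        return "No evidence"
    # BF01 is the reciprocal of BF10, so evidence for H0 is graded
    # on the same scale after inversion.
    favoured = "H1" if bf10 > 1 else "H0"
    bf = bf10 if bf10 > 1 else 1 / bf10
    if bf > 100:
        label = "Extreme"
    elif bf > 30:
        label = "Very strong"
    elif bf > 10:
        label = "Strong"
    elif bf > 3:
        label = "Moderate"
    else:
        label = "Anecdotal"
    return f"{label} evidence for {favoured}"


def report_bf10(bf10):
    """Format BF10, switching to the log scale above 100.

    Assumption: natural logarithm; the base is not stated in the text.
    """
    if bf10 > 100:
        return f"logBF10 = {log(bf10):.2f}"
    return f"BF10 = {bf10:.2f}"


for value in (0.2, 1, 4.5, 50, 1000):
    print(report_bf10(value), "->", classify_bayes_factor(value))
```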

Reporting of effect sizes. Effect size measures that can be represented by proportions of variance (η², R²) are reported with 90% confidence intervals; for effect sizes based on standardised means (d, r), 95% confidence intervals are used in case of two-sided tests and 90% confidence intervals when one-sided directional tests are used (Steiger, 2004). All confidence intervals have been calculated using the R packages MBESS (version 4.4.0) (Kelley, 2017) and psychometric (version 2.2) (Fletcher, 2010).

Part II

2

Controversy Awareness on Evidence-led Discussions as Guidance for Students in Wiki-based Learning

Wikis mainly distribute user-generated content over the article and its corresponding talk page. While educational research provides article-related suggestions for learner support, research has rarely analysed the potential of supporting learning-related processes at the talk page level. The presented experiment addressed this issue by investigating effects of visual controversy awareness information on content-related discussion threads. Such information can induce socio-cognitive conflicts, which research assumes to be beneficial for learning, particularly when contradictory evidence leads wiki discussions. It was investigated how controversy awareness highlighting as implicit guidance directs students’ (N = 81) navigation and learning processes as well as their internalised knowledge representations. Results indicate that the implementation of controversy awareness representations helped students to focus on selecting meaningful discussion threads. The findings suggest that wiki talk page users can benefit from additional structuring aids and improve their learning outcomes when they are aware of occurring controversies.
