CHAPTER 4: RESEARCH METHOD
4.9 Data Analysis
4.9.4 Multilevel mixed-effects modeling
the nested nature of the data with variables relating to the task (level 1) and participant (level 2). Multilevel models enable the estimation of main effects and interaction effects of experimental conditions with nested data at two levels of analysis: task at level 1 and participant at level 2. A multilevel model provides multiple advantages over a repeated-measures ANOVA or repeated-measures ANCOVA: (a) the ability to include multiple task-related covariates, including categorical or dummy variables (e.g., sequence, task time), (b) coefficient estimates that indicate the strength, direction, and significance of the effect of each variable while partialling out the effects of the other covariates in the model, (c) estimates of the variance attributable to differences between tasks within an individual (i.e., residual variance, level 1) and between individuals (i.e., intercept variance, level 2), and (d) effect size estimates. In addition, multilevel models can handle unbalanced data.
The models described here include independent variables relating to the task (level 1) and participant (level 2), which were selected based on the literature and hypotheses as having potential effects. A hierarchical modeling approach was used only to test Hypothesis 7, by adding variables to the model testing Hypothesis 1a: individual differences in time pressure perception (H7 Model 1), and individual differences in time pressure perception with interaction effects with time limit condition (H7 Model 2). In hierarchical linear modeling, a sequence of models is estimated, with additional variables added at each step, and the change in the amount of variance explained between successive models is compared.
Model building. To test for topic-related differences in Study 1, models include two fixed-effects variables, task topic and topic order, and a random intercept for participant. A null model with a random intercept for participant and no independent variables was run to enable comparison for the calculation of pseudo-R2 statistics. The findings section for Study 1 reports post-hoc contrasts for topic differences when both the model and the contrasts are significant.
For Study 2, models include covariates appropriate to the analysis. To test for the effects of experimental factors (time, task topic, and topic order) on pre-task perceptions, models included independent variables relating to experimental factors (time limit condition, task topic, a time limit and task topic interaction, task order), and demographic and individual difference covariates (student status, age, search self-efficacy).
To test for the effects of experimental factors or pre-task perceptions on search and decision behaviors, the models added the pre-task perceptions of the task/topic (topic interest, prior knowledge, belief can make good recommendation without searching, expected difficulty, expected difficulty stopping, and task self-efficacy). They also included independent variables relating to experimental factors and demographic and individual difference covariates.
To test for the effects of experimental factors, pre-task perceptions, or search and decision behaviors on post-task perceptions and recommendations, models added key behavioral measures, both alone and interacted with time limit condition: presence/absence of a clock view, decision time (in minutes), number of queries issued, maximum view rank of items from the SERP, the total count of items hovered over on the SERP, the total number of all SERPs viewed (including re-views), the number of non-SERP documents opened from the SERP (including re-views), and the total number of non-SERP documents viewed. The models also included independent variables relating to experimental factors, pre-task perceptions of the task/topic, and demographic and individual difference covariates. To test for individual differences in time pressure perception (Hypothesis 7), the composite variables from the Active Procrastination scale were added to the models.
Uncorrected p-values are reported in the results section. The threshold for statistical significance used was p < .05 for models and model coefficients, p ≤ .003 for marginal effects, and p < .0083 for topic and order effects. Section 6.4 summarizes how to read regression results.
Model estimation and fit. For dependent variables with ratio or interval data (or ordinal data assumed to be interval), multilevel mixed-effects linear regression models were estimated with a Gaussian distribution. Given the relatively small sample sizes, the restricted maximum likelihood (REML) estimator was used (Fitzmaurice, Laird, & Ware, 2011; Maas & Hox, 2005) to prevent biased estimates and standard errors. In addition, given the unbalanced data and sample size, the Kenward-Roger approximation of denominator degrees of freedom was used for small sample inference (Fitzmaurice et al., 2011; Kenward & Roger, 1997). The Kenward-Roger approximated denominator degrees of freedom were used to calculate the F statistic for the model as well as the t statistics for significance tests of fixed-effects parameters.
For count dependent variables, multilevel mixed-effects negative binomial models were estimated; Poisson models showed signs of overdispersion (i.e., greater variance than would be expected from a Poisson distribution). For clock views, a binary dependent variable, a multilevel mixed-effects logistic regression model was estimated. For recommendation specificity, an ordinal dependent variable, a multilevel mixed-effects ordered probit regression model was estimated.
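The notion of overdispersion can be illustrated with a rough descriptive check. The sketch below is not the diagnostic used in this study (the text does not specify one); it simply computes the ratio of the sample variance to the sample mean, which is close to 1 for Poisson-distributed counts and noticeably larger when counts are overdispersed. The function name and the example counts are hypothetical, and the check ignores both covariates and the nesting of tasks within participants.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return sample variance divided by sample mean.

    Values near 1 are consistent with a Poisson distribution;
    values well above 1 suggest overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean of counts is zero; index is undefined")
    return variance(counts) / m


# Hypothetical query counts per task (illustrative only, not study data).
query_counts = [1, 1, 2, 2, 3, 3, 4, 9, 12, 15]
index = dispersion_index(query_counts)
print(f"dispersion index: {index:.2f}")
```

An index well above 1, as in this made-up example, is the kind of pattern that motivates a negative binomial rather than a Poisson specification.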
Overall model fit statistics are reported: F for multilevel mixed-effects linear regression models estimated with restricted maximum likelihood (REML), and Wald’s χ2 for multilevel mixed-effects ordered probit, negative binomial, and logistic regression models. These overall model fit statistics indicate whether the entire set of independent variables significantly predicts the dependent variable. The model log-likelihood (or log restricted likelihood for REML) and BIC are reported to enable model comparison. When possible, the intraclass correlation (ICC) is reported; the ICC indicates the extent to which the observed variance in the dependent variable is attributable to differences at the individual level versus the task level.
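For a linear model with a random intercept for participant, the ICC is conventionally computed as the intercept (between-participant) variance divided by the sum of the intercept and residual variances. The following minimal sketch shows that calculation; the function name and the variance values are hypothetical and are not taken from the models reported here.

```python
def intraclass_correlation(intercept_variance, residual_variance):
    """ICC for a two-level random-intercept linear model.

    intercept_variance: variance between participants (level 2)
    residual_variance:  variance between tasks within a participant (level 1)
    """
    total = intercept_variance + residual_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return intercept_variance / total


# Hypothetical variance components (illustrative only, not study estimates).
icc = intraclass_correlation(intercept_variance=1.0, residual_variance=3.0)
print(f"ICC = {icc:.2f}")
```

With these illustrative values, a quarter of the total variance would lie between participants and the remainder between tasks within participants.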
When possible, pseudo-R2 statistics were calculated to provide an estimate of the increase in the proportion of variance explained by adding fixed effects variables compared to a random intercept-only model.
Pseudo-R2 was calculated using the Snijders and Bosker method (Snijders & Bosker, 2012). Pseudo-R2 is reported at both levels of the model: pseudo-R2 for Level 2 (participant) is the increase in modeled variance by adding the Level 2 variables to the null model (i.e., the model with no fixed effects, only the random intercept for participant). The pseudo-R2 statistic was obtained using the mltrsq command in the mlt package in Stata (“MLT: Stata module to provide multilevel tools,” 2013) after a model with the same specification was estimated using full maximum likelihood.
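As an illustration, the textbook form of the Snijders and Bosker statistics for a two-level random-intercept model can be sketched as follows. Level 1 pseudo-R2 is the proportional reduction in the sum of the residual and intercept variances relative to the null model; Level 2 pseudo-R2 is the proportional reduction in the residual variance divided by a representative group size plus the intercept variance. This is a sketch under those assumptions and may not match every detail of what mltrsq computes; the function name, the variance components, and the group size are all hypothetical.

```python
def snijders_bosker_r2(null_resid, null_intercept,
                       full_resid, full_intercept, n_per_group):
    """Return (level-1 pseudo-R2, level-2 pseudo-R2).

    null_*: variance components from the random-intercept-only model
    full_*: variance components from the model with fixed effects
    n_per_group: representative number of tasks per participant
                 (assumed here; often a harmonic mean for unbalanced data)
    """
    if n_per_group <= 0:
        raise ValueError("n_per_group must be positive")
    # Level 1: prediction error for an individual observation.
    null_l1 = null_resid + null_intercept
    full_l1 = full_resid + full_intercept
    # Level 2: prediction error for a group (participant) mean.
    null_l2 = null_resid / n_per_group + null_intercept
    full_l2 = full_resid / n_per_group + full_intercept
    return 1 - full_l1 / null_l1, 1 - full_l2 / null_l2


# Hypothetical variance components (illustrative only, not study estimates).
r2_level1, r2_level2 = snijders_bosker_r2(
    null_resid=4.0, null_intercept=2.0,
    full_resid=3.0, full_intercept=1.0,
    n_per_group=4,
)
print(f"Level 1 pseudo-R2 = {r2_level1:.3f}, Level 2 pseudo-R2 = {r2_level2:.3f}")
```

Because both models must be compared on the same likelihood, the variance components fed into such a calculation would come from full maximum likelihood fits, consistent with the procedure described above.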
Marginal effects and planned comparisons. The fully specified models include multiple interactions of time limit condition with independent variables, including categorical independent variables (e.g., topics, recommendation specificity categories) and continuous independent variables (e.g., task time, query count). As such, the interpretation of the effects of time limit condition focuses on the marginal effects, as recommended by Brambor, Clark, and Golder (2006) and Mize (2019). Marginal effects indicate the change in the predicted value of the dependent variable as a result of a change in an independent variable, holding other independent variables at specified values. Predicted values and marginal effects are presented graphically, in tables, and in the text. As noted, uncorrected p-values are reported, and a threshold for statistical significance of p ≤ .003 was used for marginal effects because, for dependent variables derived from post-task questionnaires, multiple marginal effects were calculated after each model. Planned comparisons of predicted values were used to test for significant differences by topic and order. A threshold for statistical significance of p < .0083 was used for planned comparisons.
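To make the logic of conditional marginal effects concrete, consider a simplified linear model with a single interaction, y = b0 + b_x*x + b_z*z + b_xz*x*z. The marginal effect of x at a given value of z is b_x + b_xz*z, and its standard error follows from the variances and covariance of the two coefficients, as described by Brambor, Clark, and Golder (2006). The sketch below assumes this simplified linear case only; the models in this study contain several interactions and, for non-linear models, marginal effects on the predicted outcome require a different calculation. All names and numbers are hypothetical.

```python
import math


def conditional_marginal_effect(b_x, b_xz, z):
    """Marginal effect of x on y at a given value of the moderator z."""
    return b_x + b_xz * z


def conditional_marginal_se(var_b_x, var_b_xz, cov_b_x_b_xz, z):
    """Standard error of the marginal effect of x at a given value of z."""
    variance = var_b_x + (z ** 2) * var_b_xz + 2 * z * cov_b_x_b_xz
    return math.sqrt(variance)


# Hypothetical coefficient estimates and (co)variances (not study results).
effect = conditional_marginal_effect(b_x=0.5, b_xz=-0.1, z=2.0)
se = conditional_marginal_se(var_b_x=0.04, var_b_xz=0.01,
                             cov_b_x_b_xz=-0.005, z=2.0)
print(f"marginal effect at z = 2: {effect:.3f} (SE {se:.3f})")
```

The example shows why a single coefficient on time limit condition cannot be read in isolation when interactions are present: both the size and the precision of the effect depend on the value at which the interacting variable is held.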
4.9.5 Qualitative analysis. To address the quality of the recommendation made by the participant,