
2.4 Further issues related to this work

2.4.3 Methods to report the results of different analysis strategies

One of the challenges of dealing with uncertainty, discussed in section 2.4.2, is reporting the different types of uncertainty. Straightforward methods to quantify and illustrate uncertainty are therefore needed. In this short overview, I outline some of the methods suggested for reporting epistemic uncertainty, including the vibration of effects approach, which is extended and used in contributions 3 and 4. I also briefly discuss some advantages and disadvantages of these approaches.

Computational model robustness

Computational model robustness was developed to estimate all possible models from a theoretically informed model space (Muñoz and Young, 2018; Young, 2018). The authors do not restrict the model space to specific choices within a probability model but suggest extending it, for instance, to variable definitions or software implementations. They therefore address what we denote as method uncertainty. Young (2018) discusses at length how to define such a model space and presents three approaches. The first is motivated by the idea that every model an analyst considered worth running during the study is also worth reporting; an ‘uber log file’ could, in principle, save all of these models. The second, denoted the ‘task force approach’, combines a wide range of expert opinions and is close to the crowdsourcing approach of Silberzahn and Uhlmann (2015). As a third strategy, Young (2018) suggests combining the uber log file and the task force approach. Taking all models into account, a ‘modeling distribution’ of the estimates can be calculated and visualized with a kernel density plot, in which a favorite model can be indicated.
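The modeling distribution can be illustrated with a minimal sketch, assuming a toy model space made of three hypothetical analytical choices (an outlier cutoff, a transformation, and a summary statistic) applied to made-up two-group data. Every combination is run, each estimate is stored with the choices that produced it, and a Gaussian kernel density is evaluated over the estimates. The bandwidth rule and all names (`run_model_space`, `kernel_density`, the data) are illustrative assumptions, not part of Young's (2018) proposal.

```python
import itertools
import math
import statistics

# Hypothetical toy data (illustration only): outcome values in two groups.
GROUP_A = [2.1, 2.4, 2.2, 3.9, 2.6, 2.3, 2.8, 2.5]
GROUP_B = [2.9, 3.1, 2.7, 3.4, 6.0, 3.0, 3.3, 2.8]

# Hypothetical model space: each combination of the options below is one model.
OUTLIER_CUTOFFS = [None, 3.5, 5.0]  # drop values above the cutoff (None keeps all)
TRANSFORMS = {"identity": lambda v: v, "log": math.log}
SUMMARIES = {"mean": statistics.mean, "median": statistics.median}


def estimate(cutoff, transform, summary):
    """Group difference (B minus A) under one combination of choices."""
    def prepare(values):
        kept = [v for v in values if cutoff is None or v <= cutoff]
        return [transform(v) for v in kept]
    return summary(prepare(GROUP_B)) - summary(prepare(GROUP_A))


def run_model_space():
    """Run every model in the space; keep its choices alongside its estimate."""
    results = []
    for cutoff, t_name, s_name in itertools.product(OUTLIER_CUTOFFS, TRANSFORMS, SUMMARIES):
        est = estimate(cutoff, TRANSFORMS[t_name], SUMMARIES[s_name])
        results.append(((cutoff, t_name, s_name), est))
    return results


def kernel_density(points, grid, bandwidth):
    """Gaussian kernel density estimate of `points`, evaluated at each grid value."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
            for x in grid]


results = run_model_space()
estimates = [est for _, est in results]
# Normal-reference rule of thumb for the bandwidth (an assumption).
bandwidth = 1.06 * statistics.stdev(estimates) * len(estimates) ** (-1 / 5)
lo, hi = min(estimates), max(estimates)
grid = [lo + i * (hi - lo) / 49 for i in range(50)]
density = kernel_density(estimates, grid, bandwidth)

favorite = (None, "identity", "mean")  # hypothetical favorite model to mark
favorite_estimate = dict(results)[favorite]
print(f"{len(results)} models, favorite estimate = {favorite_estimate:.3f}")
```

Plotting `density` against `grid` and marking `favorite_estimate` with a vertical line gives a figure of the kind described above; in a real application, each model would typically be a full regression rather than a difference in group summaries.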

Specification Curve

The specification curve analysis was proposed and illustrated in practice by Simonsohn et al. (2015) in the social sciences. It treats every operationalization decision in the data analysis as a specification and thus addresses what we denote as method uncertainty. A specification curve analysis can be summarized in three steps: first, all reasonable specifications are identified; second, the analysis is carried out under each of them; and third, a joint permutation test is performed to test the null hypothesis of no effect across all specifications. For illustration, Simonsohn et al. (2015) suggest a two-panel figure. The upper panel shows a ‘curve’ of the estimated effects, ranked by size, against the specification number; in this curve, negative and positive estimates are clearly separated, and significant estimates can be highlighted. The lower panel shows which analytical decisions produced each estimate. Rohrer et al. (2017) provided a practical application of specification curve analysis in psychological research, investigating birth-order effects on personality traits.
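The three steps can be sketched as follows, assuming a hypothetical two-group data set and six hypothetical specifications (three trimming rules crossed with two summary statistics). The specifications are ranked by their estimate to form the curve, and the joint test shuffles the group labels, reruns all specifications, and compares the observed median estimate with its permutation distribution. Using the median as the joint test statistic and label shuffling to generate the null are assumptions chosen for brevity; label shuffling presumes that the labels are exchangeable under the null hypothesis.

```python
import itertools
import random
import statistics

# Hypothetical data: group label (0/1) and outcome for each participant.
random.seed(1)
GROUPS = [0] * 20 + [1] * 20
OUTCOMES = [random.gauss(0.0, 1.0) + 0.5 * g for g in GROUPS]

# Hypothetical specifications: trimming rule x summary statistic.
TRIM = {"none": None, "|y|<2.5": 2.5, "|y|<2.0": 2.0}
SUMMARY = {"mean": statistics.mean, "median": statistics.median}
SPECS = list(itertools.product(TRIM, SUMMARY))  # step 1: all specifications


def effect(groups, outcomes, limit, summary):
    """Difference in the summary statistic (group 1 minus group 0) after trimming."""
    keep = [(g, y) for g, y in zip(groups, outcomes) if limit is None or abs(y) < limit]
    g1 = [y for g, y in keep if g == 1]
    g0 = [y for g, y in keep if g == 0]
    return summary(g1) - summary(g0)


def all_effects(groups, outcomes):
    """Step 2: estimate the effect under every specification."""
    return [effect(groups, outcomes, TRIM[t], SUMMARY[s]) for t, s in SPECS]


observed = all_effects(GROUPS, OUTCOMES)
# The curve: specifications ranked by the size of their estimate.
curve = sorted(zip(observed, SPECS))


def permutation_test(n_perm=500, seed=2):
    """Step 3: joint two-sided test using the median estimate across specifications."""
    rng = random.Random(seed)
    obs_stat = statistics.median(observed)
    exceed = 0
    for _ in range(n_perm):
        shuffled = GROUPS[:]
        rng.shuffle(shuffled)  # break the group-outcome link under the null
        null_stat = statistics.median(all_effects(shuffled, OUTCOMES))
        exceed += abs(null_stat) >= abs(obs_stat)
    return (exceed + 1) / (n_perm + 1)


p_joint = permutation_test()
for est, spec in curve:
    print(f"{est:+.3f}  {spec}")
print(f"joint permutation p-value = {p_joint:.3f}")
```

Because every permutation reruns all specifications, the cost grows with the product of the number of permutations and the number of specifications.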

Multiverse analysis

The multiverse analysis was suggested by Steegen et al. (2016) in psychological research, with the aim of performing the same statistical analysis on each of the data sets that result from different data pre-processing choices. To ensure that the alternative data sets reflect reasonable choices, they base their practical application on previously published studies in which these choices had actually been considered. To visualize the results, they suggest a histogram of the raw p-values. The distribution of p-values obtained across the pre-processing choices indicates how robust a finding is to alternative choices: a nearly uniform distribution of p-values points to a less robust finding than a distribution concentrated at small p-values. For a more detailed investigation, results can also be reported in grids of p-values, in which each p-value can be traced back to the analysis strategy that yielded it. Besides the practical illustration by Steegen et al. (2016), applications of multiverse analysis can be found, for instance, in McBee et al. (2019), Stern et al. (2019), and Credé and Phillips (2017).
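A multiverse over two hypothetical pre-processing choices (an age filter and a score cap) might be sketched as below. Each combination yields one alternative data set, the same two-sided test is applied to each, and the p-values are reported both as a grid keyed by the choices and as a histogram. The data are simulated, and the two-sample z-test with a normal approximation is an assumption standing in for whatever test an actual study would use.

```python
import itertools
import math
import random
import statistics

random.seed(3)
# Hypothetical raw data: (age, score, group) per participant.
RAW = [(random.randint(18, 70), random.gauss(50 + 3 * g, 10), g)
       for g in (0, 1) for _ in range(60)]

# Hypothetical pre-processing choices that span the multiverse.
AGE_FILTERS = {"all ages": (0, 200), "18-40": (18, 40), "41-70": (41, 70)}
SCORE_CAPS = {"no cap": None, "cap at 70": 70.0}


def preprocess(age_range, cap):
    """Build one alternative data set of (score, group) rows from the raw data."""
    lo, hi = age_range
    return [(s if cap is None else min(s, cap), g) for a, s, g in RAW if lo <= a <= hi]


def z_test_p(rows):
    """Two-sided p-value of a two-sample z-test (normal approximation)."""
    x1 = [s for s, g in rows if g == 1]
    x0 = [s for s, g in rows if g == 0]
    se = math.sqrt(statistics.variance(x1) / len(x1) + statistics.variance(x0) / len(x0))
    z = (statistics.mean(x1) - statistics.mean(x0)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Grid of p-values: each cell traces back to the choices that produced it.
grid = {(a, c): z_test_p(preprocess(AGE_FILTERS[a], SCORE_CAPS[c]))
        for a, c in itertools.product(AGE_FILTERS, SCORE_CAPS)}
for (a, c), p in grid.items():
    print(f"{a:>9} | {c:<9} | p = {p:.3f}")

# Histogram of the raw p-values in ten bins of width 0.1.
counts = [0] * 10
for p in grid.values():
    counts[min(int(p * 10), 9)] += 1
print("histogram:", counts)
```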

Vibration of effects

The concept of vibration of effects was initially proposed by Ioannidis (2008) and extended by Patel et al. (2015), who used it to examine model uncertainty in a large epidemiological study. They suggest visualizing the results obtained from different analysis strategies in volcano plots, which typically show the effect estimates on the x-axis and the p-values, on a -log10 scale, on the y-axis. In addition, the variability of p-values and effect estimates can be quantified through summary measures. Patel et al. (2015) suggest the relative effect estimate, defined as the ratio of the 99th to the 1st percentile of the effect estimates, and the relative p-value, defined as the difference between the 99th and the 1st percentile of -log10(p-value). Apart from these seminal works, applications of the framework to different method choices can be found in Palpacuer et al. (2019) and Chu et al. (2020). In our work, we use and extend the vibration of effects framework to assess and compare measurement, sampling, model, and data pre-processing uncertainty. Moreover, we apply it to different types of regression, namely logistic regression (section 2.3.2) and Cox regression (section 2.3.4), which yields relative odds ratios and relative hazard ratios, respectively, as summary measures of the variability of effect estimates.
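The summary measures of Patel et al. (2015) can be computed from the estimates and p-values of all analysis strategies, as in the following sketch. Percentiles are taken with linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`); this is an assumption, since other percentile definitions exist. The function names and the five odds ratios and p-values are hypothetical. Passing hazard ratios instead of odds ratios to `relative_effect` yields a relative hazard ratio.

```python
import math
import statistics


def percentile_1_99(values):
    """1st and 99th percentiles (linear interpolation between order statistics)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[0], cuts[98]


def relative_effect(ratio_estimates):
    """Ratio of the 99th to the 1st percentile of ratio estimates (e.g. OR, HR)."""
    p1, p99 = percentile_1_99(ratio_estimates)
    return p99 / p1


def relative_p(p_values):
    """Difference between the 99th and 1st percentiles of -log10(p)."""
    p1, p99 = percentile_1_99([-math.log10(p) for p in p_values])
    return p99 - p1


def volcano_points(estimates, p_values):
    """(x, y) coordinates of a volcano plot: effect estimate vs -log10(p)."""
    return [(est, -math.log10(p)) for est, p in zip(estimates, p_values)]


odds_ratios = [1.10, 1.25, 1.32, 1.48, 1.60]  # hypothetical ORs from five models
p_values = [0.20, 0.04, 0.01, 0.003, 0.0005]  # their hypothetical p-values
ror = relative_effect(odds_ratios)  # relative odds ratio
rp = relative_p(p_values)           # relative p-value
points = volcano_points(odds_ratios, p_values)
print(f"ROR = {ror:.3f}, RP = {rp:.3f}")
```

With only a handful of models, the 1st and 99th percentiles lie close to the minimum and maximum; the measures become informative when many analysis strategies are compared.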

Discussion of advantages and disadvantages

In contrast to computational model robustness and vibration of effects, specification curve analysis and multiverse analysis make it easy to trace results back to the corresponding analytical choices. The drawback is that only a limited number of models can be displayed in the visualization. For a large number of analytical choices, this can be addressed, for instance, by visualizing only a subset of the decisions. Furthermore, in a multiverse analysis, the visualization can focus on the histogram rather than on the grid of p-values. Similarly, in a specification curve analysis, only the upper panel of the suggested figure can be shown.

Compared with the other approaches, specification curve analysis includes a permutation test, which provides a joint decision across all specifications. However, such a test is computationally very demanding, and it has not yet gained acceptance in practice. The vibration of effects framework, on the other hand, provides relative effect estimates and relative p-values as summary measures of the variability of results. Yet neither these summary measures nor the permutation test is, in principle, tied to the visualization framework for which it was proposed.

In general, none of the approaches is limited to the type of uncertainty for which it was originally suggested. Using the specification curve for data pre-processing choices alone, or for model choices alone, is straightforward, and in a multiverse analysis, decisions on model specification can be included in the same way as data pre-processing choices. Finally, the vibration of effects framework can be extended to sampling, data pre-processing, and measurement uncertainty, as we demonstrate in contributions 3 and 4. Unlike the other approaches, this framework visualizes effect estimates and p-values simultaneously. Moreover, for epistemic uncertainty, points in the volcano plot can be highlighted according to a particular analytical choice, which visualizes the impact of that choice and thus helps identify key choices.
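Identifying key choices can be sketched by splitting the volcano-plot points according to the option taken for one choice and comparing the resulting groups, assuming hypothetical results from four analysis strategies defined by two binary choices (`adjust_for_z` and `exclude_outliers`, both invented for illustration). Here the ratio of the largest to the smallest median odds ratio across options serves as a simple illustrative measure of a choice's impact; it is not a measure proposed in the cited works.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical results: choices of each analysis strategy, odds ratio, p-value.
RESULTS = [
    ({"adjust_for_z": True, "exclude_outliers": True}, 1.12, 0.150),
    ({"adjust_for_z": True, "exclude_outliers": False}, 1.15, 0.090),
    ({"adjust_for_z": False, "exclude_outliers": True}, 1.41, 0.004),
    ({"adjust_for_z": False, "exclude_outliers": False}, 1.46, 0.002),
]


def split_by_choice(results, choice):
    """Group volcano-plot points (OR, -log10 p) by the option taken for one choice."""
    groups = defaultdict(list)
    for choices, odds_ratio, p in results:
        groups[choices[choice]].append((odds_ratio, -math.log10(p)))
    return dict(groups)


def choice_impact(results, choice):
    """Median OR per option and the ratio of the largest to the smallest median."""
    groups = split_by_choice(results, choice)
    medians = {opt: statistics.median([or_ for or_, _ in pts]) for opt, pts in groups.items()}
    return medians, max(medians.values()) / min(medians.values())


for choice in ("adjust_for_z", "exclude_outliers"):
    medians, spread = choice_impact(RESULTS, choice)
    print(choice, medians, f"ratio of medians = {spread:.2f}")
```

In a plot, coloring the points returned by `split_by_choice` by option shows the same separation visually: here the adjustment choice shifts the odds ratios far more than the outlier rule does.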
