EXPERIMENT-TO-CAUSATION INFERENCE: UNDERSTANDING CAUSALITY IN A PROBABILISTIC SETTING
We identified twelve elements within uncertainty and causality where the reasoning of students needed to be developed in instruction to enable them to appreciate more fully the argumentation and concepts underpinning the designed experiment and the randomization test.
Keywords: Randomization test; Introductory statistics students; Randomized experiment; Dynamic visualizations; Causality and uncertainty; Inference argumentation
4.1 Overview
In this chapter we focus on the randomized experiment and understanding causality. Since causality is established within a probabilistic setting, we aim to explicate the notions within uncertainty underpinning experiment-to-causation inference using the randomization test. We discuss six main interconnected uncertainty ideas that underpin a two-lesson learning trajectory designed using the dynamic Visual Inference Tools (VIT: http://www.stat.auckland.ac.nz/~wild/VIT). We then explore ideas of uncertainty prevalent in students’ reasoning processes as they progress from thinking about the observed data, recalling the randomization test with the VIT software, to making a claim about the data. We identify twelve notions of uncertainty that instruction may need to address when developing students’ ideas in the realm of experiment-to-causation inference.
The study is part of a large project, which aimed to understand how to introduce school- and tertiary-level students to inferential ideas using bootstrapping and randomization methods. The research reported in this chapter focuses on the pre- and post-instruction tests and interviews of six introductory university and workplace statistics students. Occasionally we refer to the test responses of the other students (n ≈ 800). The study was conducted within the classroom setting for the university students (class sizes ≈ 450) and a professional development workshop setting for the workplace students (n ≈ 20).
4.2 Problem
Research on statistical inference has largely focused on sample-to-population inference and students’ understanding of significance testing including the p-value. Apart from the work of Madden (2008a, 2008b, 2011) there seems to be little research that focuses on experiment-to-causation inference. With experimental design and causal inference included in the introductory statistics curricula at both the secondary and tertiary levels (e.g., College Board, 2010; Common Core State Standards Initiative, 2010; Franklin et al., 2007; Ministry of Education, 2007), there is a need to explore students’ reasoning about causality. Frameworks for understanding students’ reasoning, conceptualizations, and misconceptions together with researched learning trajectories need to be further developed to inform the teaching of causal inference. Hence, it is useful to study experiment-to-causation inference in order to understand the reasoning processes that students use regarding causality and uncertainty when learning the randomization test. Using that knowledge we should be able to construct better learning trajectories. Although we acknowledge that the randomization test is a formal inferential method, our approach could be classified as partial informal inference, as students are not introduced to formal ideas of the null hypothesis, p-values and significance.
4.3 Literature and Background
For some time, statisticians (e.g., Pearl, 1996) and educators (e.g., Wild & Pfannkuch, 1999) have questioned why statistics has neglected causality. Wild and Pfannkuch (1999, p. 238) suggest that the search for causation should be at the forefront in education:
Statistics education should really be telling students something every scientist knows, “The quest for causes is the most important game in town.” It should be saying “Here is how statistics helps you in that quest. Here are several strategies and some pitfalls to beware of along the way. . . ”
Since the search for causes is of fundamental importance, they believe that a goal of the introductory statistics curriculum should be to move students from association to causation, and that there is a need to provide accessible material to teachers to meet this goal. They point out that correlation, the objective measure of linking one variable to another, along with the mantra “correlation does not imply causation,” has dominated statistics and statistics education. Pearl (1997) believes the field of statistics has not addressed causal inference, apart from the randomized experiment, because the language of statistics is ensconced in the language of probability. For the field to move forward in the area of causality he has invented mathematical constructs for thinking about causal pathways in observational studies. Rubin (2004) has proposed a related but distinct framework. Both Pearl and Rubin have opined that introductory statistics courses need to better address statistical inference and causality, especially related to observational studies. In fact, Pearl has established an award, the American Statistical Association’s Causality in Statistics Education Prize, to encourage the teaching of causal inference in introductory statistics.
Currently, within conventional statistics courses, testing for a causal relationship is limited to Fisher’s randomized experiment, where there is an intervention and random assignment of units into groups (e.g., treatment and control). The randomized experiment, to date, has been the primary path to causal inference in statistics. Fisher’s insight, which enabled causal inference, was replacing the link between the explanatory and response variables with a random coin toss, that is, random re-assignment (Pearl, 1996). In this situation, probability modeling can be used to determine whether the treatment is effective. This juxtaposition of uncertainty and causal inference within the context of the randomization test may be problematic for students when first encountered.
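To make the idea of random re-assignment concrete, the following is a minimal sketch of a randomization test for a two-group experiment. It is an illustration only and is not taken from the VIT software: the function name `randomization_test`, the choice of the difference in group means as the statistic, the number of re-randomizations, and the data values are all assumptions introduced here. The sketch pools the observed responses, repeatedly re-assigns them at random to two groups of the original sizes, and records how often chance alone produces a difference at least as large as the one observed.

```python
import random
from statistics import mean


def randomization_test(treatment, control, n_rerandomizations=10_000, seed=0):
    """Return the observed difference in group means and the proportion of
    random re-assignments whose difference is at least as large.

    Hypothetical helper written for illustration; not part of VIT.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(treatment) - mean(control)

    # If the treatment had no effect, each unit's response would have been
    # the same in either group, so the group labels are interchangeable.
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)

    at_least_as_large = 0
    for _ in range(n_rerandomizations):
        rng.shuffle(pooled)  # random re-assignment of units to groups
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        if diff >= observed:
            at_least_as_large += 1

    return observed, at_least_as_large / n_rerandomizations


# Hypothetical responses, invented purely for illustration.
treatment = [12, 15, 14, 16, 13, 17]
control = [10, 11, 9, 12, 10, 11]

observed, tail = randomization_test(treatment, control)
print(f"Observed difference in means: {observed:.2f}")
print(f"Proportion of re-randomizations at least as large: {tail:.4f}")
```

A small tail proportion suggests that chance re-assignment alone rarely produces a difference as large as the observed one, which, given random assignment in the original experiment, supports a claim that the treatment is effective. The one-sided comparison used here is a simplifying assumption; a two-sided version would compare absolute differences.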