2.3 Hypothesis testing
2.3.5 Effect size
When comparing the difference or dependence between two sets of observations, it is often important to investigate effect size alongside the corresponding significance tests. For example, consider samples $x = \{x_1, x_2, \ldots, x_{n_x}\}$ observed under control conditions and samples $y = \{y_1, y_2, \ldots, y_{n_y}\}$ observed under treatment conditions following an intervention. A statistical test may tell us whether the change in observation mean, $\bar{y} - \bar{x}$, is significant, but it says nothing about the magnitude of the effect. When $n_x$ and $n_y$ are very large, a very small treatment effect may lead to rejection of the null hypothesis of no difference between the means of the populations from which $x$ and $y$ are sampled, yet be of no practical importance.
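When comparing the difference between sets $x$ and $y$, Cohen’s $d$ statistic (Cohen, 1992) is a measure of effect size. The value of $d$ depends upon the sample means, $\bar{x}$ and $\bar{y}$, and the sample standard deviations, $s_x$ and $s_y$, through the pooled standard deviation $s$,
\[
d = \frac{\bar{x} - \bar{y}}{s}, \qquad s = \sqrt{\frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}}.
\]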
The magnitude of $d$ may be assessed by comparison to thresholds (Cohen, 1992), with $|d| \le 0.2$ denoting a negligible effect size, $0.2 < |d| \le 0.5$ a small effect size, $0.5 < |d| \le 0.8$ a medium effect size and anything greater than 0.8 a large effect size. Alternatively, the magnitude of $d$ may be assessed in comparison to its standard deviation, $s_d$, given by
\[
s_d^2 = \left( \frac{n_x + n_y}{n_x n_y} + \frac{d^2}{2(n_x + n_y - 2)} \right) \left( \frac{n_x + n_y}{n_x + n_y - 2} \right).
\]
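As an illustration, the calculation of $d$, its standard deviation $s_d$ and the threshold labels can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function names `cohens_d` and `magnitude` are hypothetical, and the pooled standard deviation is assumed to take the usual form, the square root of the weighted sum of the two sample variances divided by $n_x + n_y - 2$.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Return (d, s_d): Cohen's d for samples x and y, and its standard deviation."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled standard deviation from the sample variances (n - 1 denominators).
    s = math.sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df)
    d = (mean(x) - mean(y)) / s
    # Variance of d: ((nx + ny)/(nx ny) + d^2 / (2 df)) * ((nx + ny) / df).
    var_d = ((nx + ny) / (nx * ny) + d ** 2 / (2 * df)) * ((nx + ny) / df)
    return d, math.sqrt(var_d)


def magnitude(d):
    """Label |d| using the thresholds quoted in the text (Cohen, 1992)."""
    a = abs(d)
    if a <= 0.2:
        return "negligible"
    if a <= 0.5:
        return "small"
    if a <= 0.8:
        return "medium"
    return "large"


# Illustrative data only: control sample x and treatment sample y.
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
d, s_d = cohens_d(x, y)
print(f"d = {d:.3f}, s_d = {s_d:.3f}, effect size: {magnitude(d)}")
```

Because $d$ is defined here as $(\bar{x} - \bar{y})/s$, a treatment that increases the mean gives a negative value, so the thresholds are applied to $|d|$.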
Chapter 3
Differences between collections of point patterns
This chapter introduces in more detail the methodology proposed for testing for differences between collections of point patterns. The material has been made available as a working paper (Honnor et al., 2017a), and is presented in this chapter in a slightly reduced format, since background information common to the whole thesis is presented in Chapter 2.
An introduction is first given to put the biological question in context, describe the data and formulate a statistical question which aims to answer the biological question with the data available. Following this the methodology is presented in Section 3.2. In Section 3.3 a validation study design is presented and the results analysed. Application of the methodology to a set of TACC3 biological data follows in Section 3.4 before a summary of the conclusions is presented in Section 3.5.
3.1 Introduction
Advances in sensor (Kanoun and Trankler, 2004) and storage technology (Grochowski and Hoyt, 1996) allow parallelisation of data collection across sensor networks and continuous monitoring. Improvements in communication networks have also made collected data more accessible. One result of this is the production of large, specialised spatial point pattern data sets, the analysis of which requires development of new statistical techniques. A particular area in which this is apparent is the imaging of large numbers of biological samples at high magnifications. The resulting images may be analysed computationally to determine the location of subcellular structures of interest. Further investigation can shed light on the inner workings of the cell and the effect of applied external conditions.
of point patterns with a particular biological application in mind, the structure of microtubules within kinetochore fibers, but with further applicability to more general data sets. Analysis of microtubule structure is of particular importance as microtubules perform a vital role during chromosome separation in mitosis, where errors can lead to aneuploidy, a common cause of genetic disorders.
Point pattern data comprising observations from two populations, such as those analysed in this chapter, may arise in numerous ways. Examples include plant locations, divided into two populations based upon the species of plant (Mateu et al., 2014), and the locations of neurons within the brain, divided based upon whether the individual suffers from mental illness (Diggle et al., 1991).
It may be desirable to determine if there is variation in the structure of point patterns to make inference on underlying differences between the two populations. Such variation may occur consistently, but at a small enough scale to make detection by eye impossible. This chapter describes a statistical methodology for application to point patterns and a class of marked point patterns, to test for the existence of structural differences between two collections of point patterns.
One modelling approach is to model each of the populations individually and compare the models. Due to the wide variety of models and the difficulty of fitting them to data, we instead compare the collections of point patterns directly using a number of nonparametric summary statistics, which are combined across and compared between collections to produce test statistics. Nonparametric permutation testing is then used to quantify the significance of reported test statistic values.
We introduce a variety of test statistics, such as the number of points and the distances between points, and a number of comparison methods, for example pointwise and functional comparisons. Decisions on which of the suggested testing procedures are most suitable will depend upon the type of pattern structure of interest, the required sensitivity of the testing procedure and the desired interpretability of the test statistic.
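To make the general idea concrete, the following Python sketch compares two collections of point patterns using the simplest summary statistic mentioned above, the number of points in each pattern. It is an illustration under stated assumptions, not the procedure developed in this chapter: the test statistic (the absolute difference between the collection means), the function names `permutation_test` and `make_pattern`, and the simulated uniform patterns are all hypothetical choices made for the example.

```python
import random


def permutation_test(stats_a, stats_b, n_perm=999, seed=0):
    """Monte Carlo permutation test on one summary value per point pattern.

    Returns the observed test statistic (absolute difference between the
    collection means) and a two-sided permutation p-value.
    """
    rng = random.Random(seed)

    def statistic(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = statistic(stats_a, stats_b)
    pooled = list(stats_a) + list(stats_b)
    na = len(stats_a)
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign patterns to collections, keeping the collection sizes.
        rng.shuffle(pooled)
        if statistic(pooled[:na], pooled[na:]) >= observed:
            exceed += 1
    # Count the observed labelling as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)


def make_pattern(n, rng):
    """Hypothetical data: n points placed uniformly in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


rng = random.Random(1)
# Two simulated collections of six patterns each, differing in point count.
collection_a = [make_pattern(n, rng) for n in (10, 12, 11, 13, 12, 11)]
collection_b = [make_pattern(n, rng) for n in (20, 22, 19, 21, 23, 20)]

# Summary statistic: number of points in each pattern.
counts_a = [len(p) for p in collection_a]
counts_b = [len(p) for p in collection_b]

observed, p_value = permutation_test(counts_a, counts_b)
print(f"observed difference = {observed:.2f}, p = {p_value:.3f}")
```

Permuting the collection labels of whole patterns, rather than of individual points, keeps each pattern intact, so the null distribution reflects only the assignment of patterns to collections.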