Multiple hypothesis testing correction is vital within a GWAS, as it ensures that false positive genotype-phenotype associations are properly accounted for across the sampled SNP panel. When testing multiple null hypotheses, there are many definitions of the Type I error rate. Within a GWAS, control of the family-wise Type I error rate (FWER) seems most befitting. The goal of MHT correction is to control the adopted Type I error rate in the strong sense, while simultaneously maximizing statistical power to reject false null hypotheses. The Bonferroni MTP is a popular approach in GWAS for strong control of the FWER. However, when applied to a sample of correlated data, this approach can suffer a loss in statistical power. Meanwhile, the maxT and minP MTPs (the multiple testing procedures which control the FWER in the strong sense and provide maximum statistical power amongst all MTPs controlling the FWER) are seldom implemented within these studies due to their high computational cost.
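The contrast between the Bonferroni and maxT adjustments can be illustrated with a minimal sketch. It is not the GPER implementation: the function names, the input layout (one list of per-marker statistics per permutation), and the plain proportion estimator are assumptions made for illustration only, and larger statistics are assumed to be more extreme.

```python
def bonferroni_adjusted_pvalues(raw_pvalues):
    """Bonferroni adjustment: multiply each raw p-value by the number of tests."""
    m = len(raw_pvalues)
    return [min(1.0, m * p) for p in raw_pvalues]


def maxt_adjusted_pvalues(observed, perm_stats):
    """Single-step maxT adjusted p-values (illustrative sketch).

    observed   : list of m observed test statistics, one per marker
    perm_stats : list of B lists; each inner list holds the m statistics
                 recomputed after one permutation of the phenotype labels
    """
    n_perm = len(perm_stats)
    # Largest statistic across all markers within each permutation.
    perm_max = [max(row) for row in perm_stats]
    # Adjusted p-value: proportion of permutation maxima at least as
    # large as the observed statistic for that marker.
    return [sum(1 for mx in perm_max if mx >= t) / n_perm for t in observed]


# Hypothetical toy input: 2 markers, 4 permutations.
observed = [3.0, 1.0]
perm_stats = [[0.5, 2.0], [3.5, 0.1], [1.0, 1.0], [0.2, 0.3]]
maxt = maxt_adjusted_pvalues(observed, perm_stats)
bonf = bonferroni_adjusted_pvalues([0.01, 0.6])
```

Because the permutation maxima are taken over the jointly permuted markers, the correlation between markers is carried into the adjustment, whereas the Bonferroni factor depends only on the number of tests. The cost of recomputing all statistics for every permutation is the computational burden referred to above.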
There would seem to be two general approaches to addressing the computational problem of the maxT and minP MTPs: accelerate the computational components of these MTPs; or develop an efficient approximation and improve its accuracy. The past decade has seen research focused primarily upon the latter approach. We employed the former approach and have developed GPER, an optimized GPU-based algorithm for conducting multiple tests of association within large-scale categorical genetic data. Our algorithm presents a significant improvement in computational performance over the widely utilized GWAS software PLINK, and is on par with the fastest alternative methods (e.g., PRESTO, PERMORY). However, unlike these methods, our approach is novel insofar as we offload the computational burden of the maxT and minP MTPs to the GPU of the personal computer. Due to the limits of CPU frequency scaling (frequency being a measure of the speed of a single processing core), the future of HPC upon the PC is arguably parallel computing. Parallel computing upon the GPU of the PC is a very efficient approach to tackling a computational problem, and has begun to appear within the statistics discipline (see e.g., [107, 108]). Our implementation of this approach demonstrates the utility of the GPU in tackling an exceptionally demanding computational problem for a sampled GWAS data set, but its utility is not limited to sampled GWAS data (e.g., the Bipolar data set utilized within §2.6). We utilize GPER within the simulation analysis of the next chapter to pursue two key aims therein: (1) to demonstrate that one's assumption of an asymptotic null distribution for the Cochran-Armitage trend test statistic under H0 can lead to the gold standard maxT and minP MTP approach yielding unbalanced multiplicity adjustment in a GWAS; and (2) to provide empirical evidence in support of the proposed methodology.
We have developed the GPER algorithm to address the computational issues of the maxT and minP MTPs within the realm of GWAS. However, by modifying the algorithm, this GPU approach can be extended to other computationally demanding areas of statistics. In particular, the algorithm can be extended to other parametric multiple hypothesis testing settings in which the maxT or minP MTPs are applicable. For example, our approach can be adapted to microarray experiments, where the maxT MTP can be utilized to correct for MHT of differential gene expression across probesets of a microarray (i.e., MHT correction for parametric t-tests and F-tests; see [60] for an excellent overview of MHT correction in microarray experiments). Additionally, we have successfully adapted the GPER algorithm in extending the methodology of [109] to include the maxT approach (see pg. 5 of that article for the connection of their methodology to the maxT MTP), for MHT correction when testing for gene-environment interactions. The GPER algorithm can also be extended to controlling, say, the kth-level generalized FWER (gFWER(k)), k = 1, . . . , n. Control of the gFWER(k) is a generalization of control of the FWER, where the maxT and minP MTPs are modified and based upon the distributions of the respective kth and (n − k + 1)st order statistics of the test statistics (maxT) and p-values (minP) (see e.g., pp. 256–257 of [110]); note that taking k = n recovers the FWER and the respective maxT and minP MTPs. Finally, outside the realm of the maxT and minP MTPs, and their extensions to controlling the gFWER(k), our GPU approach could be adapted to other resampling-based MHT procedures, such as SAM (see [111] and [112]).
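The order-statistic modification described above for the maxT MTP can be sketched as follows. This is only an illustration under the indexing used in the text (the kth order statistic in ascending order, so that k = n is the maximum); the function name and input layout are hypothetical, and the precise procedure is the one given in [110], which is not reproduced here.

```python
def order_statistic_adjusted_pvalues(observed, perm_stats, k):
    """Adjusted p-values based on the kth order statistic (illustrative sketch).

    observed   : list of n observed test statistics
    perm_stats : list of B lists of n permutation statistics
    k          : index of the ascending order statistic, 1 <= k <= n;
                 k = n gives the per-permutation maximum (the maxT case)
    """
    n = len(observed)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    n_perm = len(perm_stats)
    # kth smallest statistic within each permutation (1-based index).
    perm_kth = [sorted(row)[k - 1] for row in perm_stats]
    # Proportion of permutations whose kth order statistic reaches
    # the observed statistic.
    return [sum(1 for s in perm_kth if s >= t) / n_perm for t in observed]


# Hypothetical toy input: n = 2 tests, 4 permutations.
observed = [3.0, 1.0]
perm_stats = [[0.5, 2.0], [3.5, 0.1], [1.0, 1.0], [0.2, 0.3]]
adj_max = order_statistic_adjusted_pvalues(observed, perm_stats, k=2)
adj_k1 = order_statistic_adjusted_pvalues(observed, perm_stats, k=1)
```

With k = n the sketch reduces to the per-permutation maximum, which matches the remark that this choice recovers the maxT MTP. The only change relative to maxT is the selection step inside each permutation, so the permutation workload that dominates the computation is unchanged.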
ENHANCEMENTS TO THE STATISTICAL INFERENCE OF GENOME-WIDE ASSOCIATION STUDIES
3.1 Introduction
There seems to be confusion within the literature regarding the central underlying dilemma of multiplicity correction within a GWAS. Contrary to the focus of recent methodological approaches (i.e., the Meff and MVN approaches described within §2.1.1), this dilemma is not the computational problem that arises, as a consequence, from the implementation of permutation MTPs. Rather, the dilemma is the proper application of the implemented multiple testing procedure; this notion is essentially lost in the GWAS literature. In conducting statistical inference within a GWAS, the asymptotic-based Cochran-Armitage trend test statistic is commonly employed to test the null hypothesis of no genotype-phenotype association on a per-marker basis. Due to the extremely small significance level on a per-marker basis, we have found a discrepancy between the asymptotic chi-square distribution for this test statistic and its true underlying null distribution. Reliance upon asymptotic assumptions for this test statistic in this regard can result in improper control of the FWER within a GWAS.
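For concreteness, the per-marker quantity in question can be sketched as below: the Cochran-Armitage trend statistic for a 2 × 3 case-control genotype table, together with its asymptotic chi-square (1 degree of freedom) p-value. The additive weights (0, 1, 2), the function names, and the toy counts are assumptions made for illustration; the sketch shows only the asymptotic calculation, not the correction developed in this chapter.

```python
import math


def catt_statistic(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend statistic for a 2 x 3 genotype table.

    cases, controls : genotype counts, ordered by number of risk alleles
    weights         : trend weights (additive coding assumed by default)
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * c for w, c in zip(weights, totals))
    sum_w2n = sum(w * w * c for w, c in zip(weights, totals))
    # Trend score and the data-dependent part of its null variance.
    score = n * sum_wr - n_cases * sum_wn
    denom = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    if denom == 0:
        raise ValueError("statistic undefined for this table")
    return n * score ** 2 / denom


def asymptotic_pvalue(statistic):
    """Upper tail of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical toy counts for one marker.
stat = catt_statistic([10, 20, 30], [30, 20, 10])
pval = asymptotic_pvalue(stat)
null_stat = catt_statistic([20, 20, 20], [20, 20, 20])
```

In a GWAS this asymptotic p-value is compared against a very small per-marker threshold, which is the far-tail regime in which the text reports the chi-square approximation departing from the true null distribution.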
Herein, we develop a methodology to correct the discrepancy between the chi-square distribution for the asymptotic-based Cochran-Armitage trend test statistic and its true underlying null distribution. Furthermore, this method embraces the minP MTP, thereby accounting for correlation within the sampled data and achieving unbiased strong control of the FWER in a GWAS. Adoption of this methodology in practice has several key positive repercussions, including: correcting improperly obtained statistical results within historical GWAS; and providing multiple hypothesis testing tools, so that statistical inference is properly conducted within current and future genetic association studies.
3.2 The Test Statistics Null Distribution for the Multiple Hypothesis Testing Problem