Explicit Cell Probabilities of the Case-Control Contingency Table

2.5 Supplemental Methods

2.5.1 Explicit Cell Probabilities of the Case-Control Contingency Table

The cell probabilities of the case-control contingency table were Pr(dd | case), Pr(Dd | case), and Pr(DD | case) for the cases and Pr(dd | control), Pr(Dd | control), and Pr(DD | control) for the controls, where d and D were the alleles at a bi-allelic locus. The frequencies of alleles d and D were $f_d$ and $f_D$, respectively, and we assumed Hardy-Weinberg Equilibrium such that the dd, Dd, and DD genotype frequencies were $f_{dd} = f_d^2$, $f_{Dd} = 2 f_D f_d$, and $f_{DD} = f_D^2$. The disease prevalence, K, was defined to be

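$$K = f_{dd}\,\Pr(\text{case} \mid dd) + f_{Dd}\,\Pr(\text{case} \mid Dd) + f_{DD}\,\Pr(\text{case} \mid DD). \tag{2.2}$$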

Multiplicative Genetic Mode-of-Inheritance Risk Model

The genetic relative risk (GRR) under a multiplicative genetic mode-of-inheritance risk model was defined to be

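$$\text{GRR} = \frac{\Pr(\text{case} \mid Dd)}{\Pr(\text{case} \mid dd)} = \frac{\Pr(\text{case} \mid DD)}{\Pr(\text{case} \mid Dd)}. \tag{2.3}$$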

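With these two equations, we sought to determine the case and control cell probabilities of the contingency table described above. From Equation 2.3 it followed that

$$\Pr(\text{case} \mid Dd) = \text{GRR}\,\Pr(\text{case} \mid dd), \qquad \Pr(\text{case} \mid DD) = \text{GRR}^2\,\Pr(\text{case} \mid dd), \tag{2.4}$$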

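which we substituted into Equation 2.2 to yield

$$K = \Pr(\text{case} \mid dd)\left(f_d^2 + 2 f_D f_d\,\text{GRR} + f_D^2\,\text{GRR}^2\right) = \Pr(\text{case} \mid dd)\,(f_d + f_D\,\text{GRR})^2. \tag{2.5}$$

Thus,

$$\Pr(\text{case} \mid dd) = \frac{K}{(f_d + f_D\,\text{GRR})^2}, \qquad \Pr(\text{case} \mid Dd) = \frac{\text{GRR}\,K}{(f_d + f_D\,\text{GRR})^2}, \qquad \Pr(\text{case} \mid DD) = \frac{\text{GRR}^2\,K}{(f_d + f_D\,\text{GRR})^2}. \tag{2.6}$$

Applying Bayes’ Law and then substituting the above penetrances gave us the cell probabilities for the cases,

$$\Pr(dd \mid \text{case}) = \frac{f_d^2}{(f_d + f_D\,\text{GRR})^2}, \qquad \Pr(Dd \mid \text{case}) = \frac{2 f_D f_d\,\text{GRR}}{(f_d + f_D\,\text{GRR})^2}, \qquad \Pr(DD \mid \text{case}) = \frac{f_D^2\,\text{GRR}^2}{(f_d + f_D\,\text{GRR})^2}. \tag{2.7}$$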

For the study controls, the penetrances followed from the cases’ penetrances,

$$\Pr(\text{control} \mid g) = 1 - \Pr(\text{case} \mid g), \qquad g \in \{dd,\, Dd,\, DD\}. \tag{2.8}$$

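As for the cases, applying Bayes’ Law and then substituting the study controls’ penetrances gave us the cell probabilities for the study controls,

$$\begin{aligned}
\Pr(dd \mid \text{control}) &= \frac{f_d^2}{1 - K}\left[1 - \frac{K}{(f_d + f_D\,\text{GRR})^2}\right], \\
\Pr(Dd \mid \text{control}) &= \frac{2 f_D f_d}{1 - K}\left[1 - \frac{\text{GRR}\,K}{(f_d + f_D\,\text{GRR})^2}\right], \\
\Pr(DD \mid \text{control}) &= \frac{f_D^2}{1 - K}\left[1 - \frac{\text{GRR}^2\,K}{(f_d + f_D\,\text{GRR})^2}\right].
\end{aligned} \tag{2.9}$$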
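The multiplicative-model derivation can be checked numerically. The following is a minimal Python sketch, not the code used for the thesis: the function name `multiplicative_cell_probs` is hypothetical, and the values f_D = 0.3, GRR = 1.4, and K = 0.05 are chosen only for illustration. It assumes Hardy-Weinberg Equilibrium and the penetrances Pr(case | dd) = K / (f_d + f_D GRR)^2, Pr(case | Dd) = GRR Pr(case | dd), and Pr(case | DD) = GRR^2 Pr(case | dd).

```python
def multiplicative_cell_probs(f_D, grr, K):
    """Return (cases, controls) genotype probabilities, each ordered (dd, Dd, DD)."""
    if not (0.0 < f_D < 1.0 and 0.0 < K < 1.0 and grr > 0.0):
        raise ValueError("need 0 < f_D < 1, 0 < K < 1 and GRR > 0")
    f_d = 1.0 - f_D
    # Hardy-Weinberg genotype frequencies in the general population.
    freqs = (f_d ** 2, 2.0 * f_D * f_d, f_D ** 2)
    # Baseline penetrance Pr(case | dd), from K = Pr(case | dd) * (f_d + f_D * GRR)^2.
    base = K / (f_d + f_D * grr) ** 2
    penetrances = (base, grr * base, grr ** 2 * base)
    if max(penetrances) > 1.0:
        raise ValueError("K and GRR imply a penetrance above 1")
    # Bayes' Law: Pr(g | case) = f_g * Pr(case | g) / K.
    cases = tuple(f * pen / K for f, pen in zip(freqs, penetrances))
    # Bayes' Law: Pr(g | control) = f_g * (1 - Pr(case | g)) / (1 - K).
    controls = tuple(f * (1.0 - pen) / (1.0 - K) for f, pen in zip(freqs, penetrances))
    return cases, controls


# Illustrative (assumed) parameter values.
cases, controls = multiplicative_cell_probs(f_D=0.3, grr=1.4, K=0.05)
print("cases:   ", cases)
print("controls:", controls)
```

Calling the function with a different K leaves the case probabilities unchanged and shifts only the control probabilities.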

Dominant Genetic Mode-of-Inheritance Risk Model

The GRR under a dominant genetic mode-of-inheritance risk model was defined to be

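$$\text{GRR} = \frac{\Pr(\text{case} \mid Dd)}{\Pr(\text{case} \mid dd)} = \frac{\Pr(\text{case} \mid DD)}{\Pr(\text{case} \mid dd)}. \tag{2.10}$$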

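Following the same derivation as for the multiplicative model above, the genotype probabilities for the cases were

$$\Pr(dd \mid \text{case}) = \frac{f_d^2}{f_d^2 + \text{GRR}\,(1 - f_d^2)}, \qquad \Pr(Dd \mid \text{case}) = \frac{2 f_D f_d\,\text{GRR}}{f_d^2 + \text{GRR}\,(1 - f_d^2)}, \qquad \Pr(DD \mid \text{case}) = \frac{f_D^2\,\text{GRR}}{f_d^2 + \text{GRR}\,(1 - f_d^2)}, \tag{2.11}$$

and the genotype probabilities for the controls were

$$\begin{aligned}
\Pr(dd \mid \text{control}) &= \frac{f_d^2}{1 - K}\left[1 - \frac{K}{f_d^2 + \text{GRR}\,(1 - f_d^2)}\right], \\
\Pr(Dd \mid \text{control}) &= \frac{2 f_D f_d}{1 - K}\left[1 - \frac{\text{GRR}\,K}{f_d^2 + \text{GRR}\,(1 - f_d^2)}\right], \\
\Pr(DD \mid \text{control}) &= \frac{f_D^2}{1 - K}\left[1 - \frac{\text{GRR}\,K}{f_d^2 + \text{GRR}\,(1 - f_d^2)}\right].
\end{aligned} \tag{2.12}$$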

Recessive Genetic Mode-of-Inheritance Risk Model

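The GRR under a recessive genetic mode-of-inheritance risk model was defined to be

$$\text{GRR} = \frac{\Pr(\text{case} \mid DD)}{\Pr(\text{case} \mid dd)}, \qquad \Pr(\text{case} \mid Dd) = \Pr(\text{case} \mid dd). \tag{2.13}$$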

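Following the same derivation as for the multiplicative model above, the genotype probabilities for the cases were

$$\Pr(dd \mid \text{case}) = \frac{f_d^2}{1 - f_D^2 + \text{GRR}\,f_D^2}, \qquad \Pr(Dd \mid \text{case}) = \frac{2 f_D f_d}{1 - f_D^2 + \text{GRR}\,f_D^2}, \qquad \Pr(DD \mid \text{case}) = \frac{f_D^2\,\text{GRR}}{1 - f_D^2 + \text{GRR}\,f_D^2}, \tag{2.14}$$

and the genotype probabilities for the controls were

$$\begin{aligned}
\Pr(dd \mid \text{control}) &= \frac{f_d^2}{1 - K}\left[1 - \frac{K}{1 - f_D^2 + \text{GRR}\,f_D^2}\right], \\
\Pr(Dd \mid \text{control}) &= \frac{2 f_D f_d}{1 - K}\left[1 - \frac{K}{1 - f_D^2 + \text{GRR}\,f_D^2}\right], \\
\Pr(DD \mid \text{control}) &= \frac{f_D^2}{1 - K}\left[1 - \frac{\text{GRR}\,K}{1 - f_D^2 + \text{GRR}\,f_D^2}\right].
\end{aligned} \tag{2.15}$$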
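The dominant and recessive cases differ from the multiplicative one only in the relative risk assigned to each genotype, so all three can be sketched with one helper. This is a minimal Python sketch: the names `cell_probs`, `dominant_cell_probs`, and `recessive_cell_probs` are hypothetical, and K = 0.05 is an assumed value. The allele frequencies and GRRs in the usage lines are those quoted for Supplemental Tables 2.8 and 2.9.

```python
def cell_probs(f_D, K, relative_risks):
    """Genotype probabilities (dd, Dd, DD) for cases and study controls.

    relative_risks holds the risk of each genotype relative to dd,
    e.g. (1, GRR, GRR) for a dominant model or (1, 1, GRR) for a recessive one.
    """
    f_d = 1.0 - f_D
    # Hardy-Weinberg genotype frequencies in the general population.
    freqs = (f_d ** 2, 2.0 * f_D * f_d, f_D ** 2)
    # Population-average relative risk; K = Pr(case | dd) * mean_rr.
    mean_rr = sum(f * r for f, r in zip(freqs, relative_risks))
    base = K / mean_rr  # Pr(case | dd)
    if base * max(relative_risks) > 1.0:
        raise ValueError("K and the relative risks imply a penetrance above 1")
    cases = tuple(f * r / mean_rr for f, r in zip(freqs, relative_risks))
    controls = tuple(f * (1.0 - base * r) / (1.0 - K)
                     for f, r in zip(freqs, relative_risks))
    return cases, controls


def dominant_cell_probs(f_D, grr, K):
    return cell_probs(f_D, K, (1.0, grr, grr))


def recessive_cell_probs(f_D, grr, K):
    return cell_probs(f_D, K, (1.0, 1.0, grr))


# K = 0.05 is an assumed prevalence, used only for illustration.
dom_cases, dom_controls = dominant_cell_probs(f_D=0.3, grr=1.4, K=0.05)
rec_cases, rec_controls = recessive_cell_probs(f_D=0.5, grr=1.45, K=0.05)
print(dom_cases, dom_controls)
print(rec_cases, rec_controls)
```

Passing `(1.0, grr, grr ** 2)` as `relative_risks` gives the multiplicative model through the same helper.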

Unscreened and Public Controls

For unscreened and public controls, the genotype probabilities for controls were set to the genotype probabilities in the general population, namely

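$$\Pr(dd \mid \text{control}) = f_d^2, \qquad \Pr(Dd \mid \text{control}) = 2 f_D f_d, \qquad \Pr(DD \mid \text{control}) = f_D^2. \tag{2.16}$$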

Notably, the cases’ genotype probabilities were not a function of K under the multiplicative, dominant, or recessive genetic mode-of-inheritance risk models (as shown in the above sections), whereas the study controls’ genotype probabilities were. This explained the result in Table 2.1 of the manuscript whereby the power for the 1-stage design with “Study Controls Only” varied with K, but the power for the 1-stage design with “Public Controls Only” did not (and similarly for the analogous tables in which the true genetic mode-of-inheritance risk models were dominant and recessive).
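One way to see why K drops out for the cases but not for the study controls is to write $r_g$ for the risk of genotype $g$ relative to dd, so that $\Pr(\text{case} \mid g) = r_g \Pr(\text{case} \mid dd)$. The $r_g$ notation is introduced here only for illustration and does not appear in the original derivation. A sketch of the argument, with sums running over $h \in \{dd, Dd, DD\}$:

```latex
\begin{align*}
K &= \sum_{h} f_h \Pr(\text{case} \mid h)
   = \Pr(\text{case} \mid dd) \sum_{h} f_h r_h, \\
\Pr(g \mid \text{case}) &= \frac{f_g \Pr(\text{case} \mid g)}{K}
   = \frac{f_g r_g}{\sum_{h} f_h r_h}, \\
\Pr(g \mid \text{control}) &= \frac{f_g \left[1 - \Pr(\text{case} \mid g)\right]}{1 - K}
   = \frac{f_g \left[1 - K r_g / \sum_{h} f_h r_h\right]}{1 - K}.
\end{align*}
```

The baseline penetrance cancels in the second line, so the case probabilities depend only on the allele frequency and the relative risks, while K remains in the third line.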

2.5.2 Alternative 1- and 2-df Tests

The results in the main manuscript were based on the Cochran-Armitage trend test and assuming an underlying multiplicative genetic mode-of-inheritance risk model. In Supplemental Figure 2.2, we present power curves for the one- and two-stage designs using 2,000 cases, 2,000 study controls and 5,000 public controls over a range of GRRs, disease prevalences, and susceptibility allele frequencies. In Supplemental Table 2.6, we calculated the Cochran-Armitage trend test for the three models considered in Table 2.1

for disease. Supplemental Tables 2.7 through 2.12 present analogous results to Tables 2.1 and 2.2 in the main manuscript, though utilizing the general 2-df, dominant 1-df, and recessive 1-df tests under multiplicative, dominant, and recessive models, respectively. Specifically, Supplemental Tables 2.7, 2.8, and 2.9 are the general, dominant, and recessive versions of Table 2.1 in the main manuscript and Supplemental Tables 2.10, 2.11, and 2.12 are analogous to Table 2.2. The single- and two-staged association study designs as described in the methods of the main manuscript were also used for the Supplemental Tables in terms of the number of study cases, study controls, and public controls and the size of the GWA and follow-up genotyping platforms (Models 1, 2, and 3 of the main manuscript). In addition, the same disease prevalences were specified.

However, depending on the genetic model, we allowed the risk allele frequency ($f_D$) and GRR to vary. For the 1- and 2-df tests, we computed power using the “cost effective” (CE) method proposed by Bukszár and van den Oord (Bukszár and van den Oord, 2006a). The CE is an approximation for computing the power of Pearson’s statistic for 2 x m contingency tables (where m is the number of categories) that is accurate and efficient in terms of computer time. The authors point out (Bukszár and van den Oord, 2006b) that the CE is very close to the true value of the distribution of Pearson’s statistic and more accurate than a commonly used approximation (based on a non-central chi-square) that overestimates power in some scenarios and underestimates it in others.
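For reference, the Cochran-Armitage trend test on which the main results are based can be computed from a 2 x 3 table of genotype counts with the standard closed-form statistic. The sketch below is illustrative only: the function names are hypothetical, the counts are invented, and it computes the test statistic for observed counts rather than the power of the test.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0.0, 1.0, 2.0)):
    """Chi-square (1 df) trend statistic for genotype counts ordered (dd, Dd, DD)."""
    R = sum(cases)                      # total number of cases
    S = sum(controls)                   # total number of controls
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]   # column totals
    sw_r = sum(w * r for w, r in zip(weights, cases))
    sw_n = sum(w * n for w, n in zip(weights, totals))
    sw2_n = sum(w * w * n for w, n in zip(weights, totals))
    denom = R * S * (N * sw2_n - sw_n ** 2)
    if denom == 0:
        raise ValueError("degenerate table: no variation in the weighted genotype score")
    return N * (N * sw_r - R * sw_n) ** 2 / denom


def chi2_1df_pvalue(stat):
    """Upper tail of a central chi-square with 1 df: P(|Z| > sqrt(stat))."""
    return math.erfc(math.sqrt(stat / 2.0))


# Invented counts, for illustration only.
stat = cochran_armitage_trend(cases=(490, 420, 90), controls=(600, 350, 50))
print(stat, chi2_1df_pvalue(stat))
```

With weights (0, 1, 1) or (0, 0, 1) the same statistic reduces to Pearson's chi-square on the collapsed 2 x 2 table, which corresponds to the dominant and recessive 1-df tests.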

General 2-df Test

For Supplemental Tables 2.7 and 2.10, the general 2-df test was employed assuming an underlying multiplicative genetic mode-of-inheritance risk model. The specific genotype cell probabilities for the cases and controls are shown above. For Supplemental Table

order to compute power using the general 2-df test, we carried out the CE for 2 x m tables where m = 3 columns / categories (genotypes dd, Dd, and DD) and the rows pertained to the cases and controls, using the R script costeff2by3 provided by Bukszár and van den Oord (http://www.vipbg.vcu.edu/ edwin/). In contrast to the 1-df test, closed-form analytical formulae did not exist for the 2 x 3 tables, though numerical solutions were computed with the costeff2by3 R code.
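As a rough point of comparison, the commonly used non-central chi-square approximation for the 2-df test can be sketched in a few lines of Python. This is not the CE method and not the costeff2by3 script. The function names are hypothetical, the significance level 5e-8 is an assumed value, and the non-centrality parameter is the standard one for Pearson's statistic under fixed case and control proportions. The sample sizes are the 2,000 cases and 5,000 public controls mentioned above.

```python
import math


def noncentrality_2xm(case_probs, control_probs, n_cases, n_controls):
    """Non-centrality of Pearson's statistic for a 2 x m case-control table."""
    n = n_cases + n_controls
    q = n_cases / n      # proportion of cases in the total sample
    p = n_controls / n   # proportion of controls in the total sample
    return n * p * q * sum((a - b) ** 2 / (q * a + p * b)
                           for a, b in zip(case_probs, control_probs))


def power_noncentral_2df(ncp, alpha, max_terms=2000):
    """P(non-central chi-square, 2 df, non-centrality ncp > critical value)."""
    half_ncp = ncp / 2.0
    if half_ncp > 700.0:
        raise ValueError("ncp too large for this simple series (exp underflow)")
    half_c = -math.log(alpha)   # critical value c = -2 ln(alpha), since P(chi2_2 > c) = exp(-c/2)
    # Poisson(ncp/2) mixture of central chi-squares with 2 + 2j df, whose
    # upper tail at c is exp(-c/2) * sum_{k=0}^{j} (c/2)^k / k!.
    pois = math.exp(-half_ncp)  # Poisson weight for j = 0
    term = math.exp(-half_c)    # k = 0 term of the tail sum
    tail = term
    power = 0.0
    for j in range(max_terms):
        power += pois * tail
        pois *= half_ncp / (j + 1)
        term *= half_c / (j + 1)
        tail += term
    return power


# Multiplicative model with f_D = 0.3 and GRR = 1.4 (illustrative values);
# public controls take the Hardy-Weinberg population frequencies.
controls = (0.49, 0.42, 0.09)
weights = (1.0, 1.4, 1.96)
total = sum(f * w for f, w in zip(controls, weights))
cases = tuple(f * w / total for f, w in zip(controls, weights))
ncp = noncentrality_2xm(cases, controls, n_cases=2000, n_controls=5000)
print(ncp, power_noncentral_2df(ncp, alpha=5e-8))
```

Because the source notes that this approximation can over- or underestimate power, its output should be read as a sanity check rather than a substitute for the CE values.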

Dominant 1-df Test

For Supplemental Tables 2.8 and 2.11, the dominant 1-df test was employed assuming an underlying dominant genetic mode-of-inheritance risk model. The specific genotype cell probabilities for the cases and controls are shown above. For Supplemental Table 2.8, $f_D$ was set to 0.3 (as with the multiplicative model), though the GRR was set to 1.4. In order to compute power using the dominant 1-df test, we carried out the CE for 2 x m tables where m = 2 columns / categories (genotype dd versus genotypes Dd or DD, i.e. the Dd and DD genotype columns were combined) and the rows pertained to the cases and controls. The power for critical value c (the 1 - α quantile of a central chi-square distribution, where α is the type I error rate) was (Bukszár and van den Oord, 2006a)

(2.17)

where λ was the largest eigenvalue of matrix J (discussed by Bukszár and van den Oord) and $F_{\chi^2}$ was the cdf of the non-central chi-square distribution with 1 degree of freedom and non-centrality parameter

where n was the total sample size, p was the proportion of controls in the total sample, q = 1 - p was the proportion of cases in the total sample, and subscripts 1 and 2 referred to the two genotype categories. These computations were carried out with the R script costeff2by2 provided by Bukszár and van den Oord (http://www.vipbg.vcu.edu/ edwin/).

Recessive 1-df Test

For Supplemental Tables 2.9 and 2.12, the recessive 1-df test was employed assuming an underlying recessive genetic mode-of-inheritance risk model. The specific genotype cell probabilities for the cases and controls are shown above. For Supplemental Table 2.9, $f_D$ and GRR were set to 0.5 and 1.45, respectively. The power calculations were performed in the same manner as for the dominant 1-df test detailed above, though the 2 x 2 table was constructed differently: the dd and Dd columns were merged.
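The two ways of collapsing the genotype columns, and a plain non-central chi-square power calculation on the resulting 2 x 2 table, can be sketched as follows. This is the textbook approximation, not the CE formula of Bukszár and van den Oord and not the costeff2by2 script. All function names are hypothetical, and the significance level 5e-8 and the use of 2,000 cases with 5,000 public controls in the example are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def collapse_dominant(probs):
    """Merge Dd and DD: (dd, Dd, DD) -> (dd, Dd or DD)."""
    dd, Dd, DD = probs
    return (dd, Dd + DD)


def collapse_recessive(probs):
    """Merge dd and Dd: (dd, Dd, DD) -> (dd or Dd, DD)."""
    dd, Dd, DD = probs
    return (dd + Dd, DD)


def noncentrality_2x2(case_probs, control_probs, n_cases, n_controls):
    """Non-centrality of Pearson's statistic for a 2 x 2 case-control table."""
    n = n_cases + n_controls
    q = n_cases / n      # proportion of cases in the total sample
    p = n_controls / n   # proportion of controls in the total sample
    a1, b1 = case_probs[0], control_probs[0]
    pooled = q * a1 + p * b1   # pooled probability of the first category
    return n * p * q * (a1 - b1) ** 2 / (pooled * (1.0 - pooled))


def power_noncentral_1df(ncp, alpha):
    """P((Z + sqrt(ncp))^2 > c), with c the 1 - alpha quantile of chi-square(1)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # sqrt of the critical value c
    delta = math.sqrt(ncp)
    cdf = NormalDist().cdf
    return (1.0 - cdf(z - delta)) + cdf(-z - delta)


# Dominant model, f_D = 0.3 and GRR = 1.4; public controls at population frequencies.
controls3 = (0.49, 0.42, 0.09)
denom = 0.49 + 1.4 * (0.42 + 0.09)
cases3 = (0.49 / denom, 1.4 * 0.42 / denom, 1.4 * 0.09 / denom)
ncp = noncentrality_2x2(collapse_dominant(cases3), collapse_dominant(controls3),
                        n_cases=2000, n_controls=5000)
print(ncp, power_noncentral_1df(ncp, alpha=5e-8))
```

For the recessive test, `collapse_recessive` would be applied to both rows in place of `collapse_dominant`; the rest of the calculation is unchanged.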
