Specifically, the steps were:
1. Preload the details that define the factor combinations for each MOI category. The factors are specified in Table 3.1.
Table 3.1. Factors that define a synthetic gene data file

Factor                        Symbol   Number of factors
Sample size                   N        NCC
Penetrance                    P        NP
Phenotype error rate          Perr     NY
Genotype error rate           Gerr     NX
Relative genetic risk         Φ        NGR
Relative environmental risk   Π        NER

2. Draw a genotype distribution at random from the master set of genotype distributions obtained from real distribution data (i.e., the study by Schymick et al.2). At this stage, Chan et al. recommend that a minor allele frequency (MAF) threshold not be applied.3 They argue that filtering out SNPs because of low MAF or because of deviation from Hardy–Weinberg equilibrium (HWE) has little effect on the overall false positive rate and that, in some cases, MAF filtering only serves to exclude SNPs. This step effectively selects a specific genotype distribution (at random) from the master distribution.
3. Use Table 3.2 to assign a case (1) or a control (0) based on the selected genetic relative risk (Φ), penetrance (P) and MOI category. This step converts the Φ ratio value into the probability that the case occurs for the MOI gene model of interest. This process is represented by the following logic that was derived from Iles1:
Major Homozygote (AA). Assume that the AA genotype is selected. The probability of a case given this selection is equal to the disease penetrance P, that is, Prob(case/AA) = P, so the relative risk is ΨAA = 1 (Table 3.2).
Minor Homozygote (aa): Liability Increasing Allele. Assume the aa genotype is selected. The relative risk Ψaa is the ratio of two probabilities: the probability of a case for a minor homozygote divided by the probability of a case for a major homozygote, or
Ψaa = Prob(case/aa) / Prob(case/AA) = x/P. (3.1)
From (3.1), the probability of a case given the minor homozygote genotype is
x = Ψaa × P, (3.2)
where Ψaa is one of the assigned risk factors and P is one of the assigned penetrance factors.
Heterozygote (aA). Assume the aA genotype is selected. By the same argument, the relative risk given a heterozygote is
ΨaA = Prob(case/aA) / Prob(case/AA) = y/P. (3.3)
It follows that the probability of a case given the heterozygote is
y = ΨaA × P, (3.4)
where ΨaA is one of the assigned risk factors and P is one of the assigned penetrance factors.
Using the estimates of x and y, assign a case or control at random using the four MOI models in conjunction with equations (3.2) and (3.4) and Table 3.2. We assigned cases in proportion to x (y) and controls in proportion to 1 − x (1 − y) for the minor homozygote (heterozygote) genotypes, respectively. For the MOI models that assume an elevated risk for both the minor homozygote and the heterozygote genotypes, we would expect a higher proportion of cases and therefore associations that are more easily identified by the statistical procedures. The specification of risk depends on specific and unknown disease mechanisms. A relative risk of 1.7 is considered strong and is associated with positive replication,4 and a risk of 1.3 is considered by Ziegler et al.5 to be a realistic assumption for complex diseases. In summary, individuals are assigned as either cases or controls according to the probabilities given in Table 3.2.
4. Systematically select subjects. If the subject is a case (control), change its phenotype designation to a control (case) at a rate determined by Perr.
5. Systematically select subjects. If the allele is a disease (nondisease) allele, change it to a nondisease (disease) allele at a rate determined by Gerr.
6. Continue with the previously described process until n1 cases and n2 controls (N = n1 + n2) are generated (note that n1 and n2 are not required to be equal).
7. Apply a set of statistical methods to predict associations and record the results.
8. Generate NR (typically NR = 1,000) replicate experiments for each factor combination.
9. Analyze the data.
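
Steps 2 through 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study. The function name simulate_dataset and its parameters are hypothetical. Genotypes are coded as the number of minor (a) alleles, the genotype error is applied to each allele independently, and the n1 and n2 quotas are filled on the recorded (post-error) phenotype; none of these choices is specified in the steps above.

```python
import random


def simulate_dataset(genotype_freqs, case_prob, perr, gerr,
                     n_cases, n_controls, seed=0):
    """Return a list of (genotype, phenotype) records.

    genotype_freqs: frequencies of the genotypes AA, aA, aa, coded 0, 1, 2
                    (number of minor alleles; the coding is an assumption).
    case_prob:      Pr(case | genotype) for codes 0, 1, 2, i.e. (P, y, x).
    perr, gerr:     phenotype and genotype error rates.
    """
    rng = random.Random(seed)
    records = []
    counts = {1: 0, 0: 0}
    quota = {1: n_cases, 0: n_controls}
    while counts[1] < n_cases or counts[0] < n_controls:
        # Steps 2-3: draw a genotype, then assign case (1) or control (0).
        g = rng.choices([0, 1, 2], weights=genotype_freqs)[0]
        phenotype = 1 if rng.random() < case_prob[g] else 0
        # Step 4: flip case <-> control at rate perr.
        if rng.random() < perr:
            phenotype = 1 - phenotype
        # Step 6: skip subjects whose group is already full (assumption).
        if counts[phenotype] >= quota[phenotype]:
            continue
        # Step 5: flip each allele independently at rate gerr (assumption).
        alleles = [1] * g + [0] * (2 - g)
        alleles = [1 - a if rng.random() < gerr else a for a in alleles]
        records.append((sum(alleles), phenotype))
        counts[phenotype] += 1
    return records


# Hypothetical inputs: HWE genotype frequencies for MAF = 0.3, and a
# dominant model with P = 0.10 and a relative risk of 1.3.
data = simulate_dataset((0.49, 0.42, 0.09), (0.10, 0.13, 0.13),
                        perr=0.05, gerr=0.01, n_cases=50, n_controls=60)
```

Because subjects are discarded once their group is full, the loop needs roughly n1 divided by the overall case probability draws to finish, and it does not terminate if no case can ever be generated.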
Table 3.2. Relative risk assumptions, by mode of inheritance

                    Major homozygote   Minor homozygote                   Heterozygote
Inheritance model   ΨAA                Ψaa = Pr(case/aa) / Pr(case/AA)    ΨaA = Pr(case/aA) / Pr(case/AA)
Recessive           1                  Φ                                  1
Dominant            1                  Φ                                  Φ
Additive            1                  2 × Φ − 1                          Φ
Multiplicative      1                  Φ × Φ                              Φ

Source: Iles.1
• Ψaa is the relative risk of homozygous minor to homozygous major.
• ΨaA is the relative risk of heterozygote to homozygous major.

Figure 3.1 presents a flow description of the data generation process.
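
The conversion from a relative risk Φ to case probabilities, using Table 3.2 together with equations (3.2) and (3.4), can be illustrated with a short Python sketch. The function names relative_risks and case_probabilities are hypothetical, and the penetrance value P = 0.10 in the example is an assumed illustration rather than a setting taken from the study.

```python
def relative_risks(moi, phi):
    """Return (psi_AA, psi_aA, psi_aa) for a mode of inheritance (Table 3.2)."""
    table = {
        "recessive": (1.0, 1.0, phi),
        "dominant": (1.0, phi, phi),
        "additive": (1.0, phi, 2 * phi - 1),
        "multiplicative": (1.0, phi, phi * phi),
    }
    return table[moi]


def case_probabilities(moi, phi, penetrance):
    """Return Pr(case | genotype) for AA, aA, aa.

    Pr(case | AA) = P, y = psi_aA * P (3.4), x = psi_aa * P (3.2).
    """
    probs = tuple(psi * penetrance for psi in relative_risks(moi, phi))
    if max(probs) > 1.0:
        raise ValueError("relative risk times penetrance exceeds 1")
    return probs


# Example with phi = 1.3 and an assumed penetrance P = 0.10.
for model in ("recessive", "dominant", "additive", "multiplicative"):
    print(model, case_probabilities(model, 1.3, 0.10))
```

For the additive model in this example, the probabilities are approximately 0.10, 0.13, and 0.16 for AA, aA, and aa. The range check reflects the fact that a large Φ combined with a high penetrance can push x above 1, in which case x is no longer a valid probability.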
Figure 3.1. Schema of data-generation process
[Flowchart: preload factor values (penetrance p = Pr(case/AA), number of cases N1, number of controls N2, phenotype error rate Perr, genotype error rate Gerr, and risk rate Φ, each at low, medium, and high levels); compute master genotype distributions for each SNP; select a genotype distribution G; select a genotype value g = (0, 1, 2) at random from G; with g, p, Φ, and MOI, compute case (1) or control (0) using the relative risk table (Table 3.2); alter the genotype according to the error rate; record the database entry; if the loops over the factor values, MOI, and replicates are not complete, repeat; otherwise end.]
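
The outer loops of the schema, one pass per factor combination and replicate, can be sketched in Python with itertools.product. This sketch assumes a full factorial design over the factors of Table 3.1 plus the MOI model. All level values are hypothetical placeholders, not the settings used in the study, and the names FACTORS, factor_combinations, and run_design are illustrative.

```python
from itertools import product

# All level values below are hypothetical placeholders.
FACTORS = {
    "moi": ["recessive", "dominant", "additive", "multiplicative"],
    "sample_size": [500, 1000, 2000],
    "penetrance": [0.05, 0.10, 0.20],
    "phenotype_error": [0.00, 0.05, 0.10],
    "genotype_error": [0.00, 0.01, 0.05],
    "genetic_risk": [1.1, 1.3, 1.7],
    "environmental_risk": [1.0, 1.5, 2.0],
}


def factor_combinations(factors):
    """Yield one settings dict per cell of the full factorial design."""
    names = list(factors)
    for values in product(*(factors[name] for name in names)):
        yield dict(zip(names, values))


def run_design(factors, n_replicates, run_replicate):
    """Call run_replicate(settings, replicate_index) for every cell and replicate."""
    results = []
    for settings in factor_combinations(factors):
        for r in range(n_replicates):
            results.append(run_replicate(settings, r))
    return results


n_cells = sum(1 for _ in factor_combinations(FACTORS))
print(n_cells)  # 4 MOI models * 3**6 level combinations = 2916
```

With NR = 1,000 replicates per cell, a design of this size would require 2,916,000 simulated data files, which is why the per-replicate generation step needs to be inexpensive.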