
The Data Generation Process

In document Methods-Statistical-Genomics.pdf (Page 36-40)

Specifically, the steps were:

1. Preload the details that define the factor combinations for each mode of inheritance (MOI) category. The factors are specified in Table 3.1.

Table 3.1. Factors that define a synthetic gene data file

Factor                        Symbol   Number of levels
Sample size                   N        NCC
Penetrance                    P        NP
Phenotype error rate          Perr     NY
Genotype error rate           Gerr     NX
Relative genetic risk         Φ        NGR
Relative environmental risk   Π        NER
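Step 1 amounts to enumerating every combination of MOI category and factor levels. The Python sketch below assumes three levels per factor (Figure 3.1 labels them Low, Med and High) with invented numeric values; `FACTOR_LEVELS`, `MOI_MODELS` and `factor_combinations` are hypothetical names, and Π is omitted because the steps that follow do not use it.

```python
from itertools import product

# Hypothetical factor levels (Low, Med, High); not the study's actual values.
FACTOR_LEVELS = {
    "N": [200, 500, 1000],        # sample size (NCC levels)
    "P": [0.01, 0.05, 0.10],      # penetrance, Pr(case | AA) (NP levels)
    "Perr": [0.0, 0.05, 0.10],    # phenotype error rate (NY levels)
    "Gerr": [0.0, 0.01, 0.05],    # genotype error rate (NX levels)
    "Phi": [1.0, 1.3, 1.7],       # relative genetic risk (NGR levels)
}
MOI_MODELS = ["recessive", "dominant", "additive", "multiplicative"]


def factor_combinations(levels=FACTOR_LEVELS, models=MOI_MODELS):
    """Yield one dict per combination of MOI category and factor levels."""
    names = list(levels)
    for moi in models:
        for values in product(*(levels[name] for name in names)):
            yield {"MOI": moi, **dict(zip(names, values))}


combos = list(factor_combinations())
print(len(combos))  # 4 MOI models x 3**5 level combinations = 972
```

Each resulting dictionary defines one synthetic data file design, to be replicated NR times in step 8.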

2. Draw a genotype distribution at random from the master set of genotype distributions obtained from real data (the study by Schymick et al.²). At this stage, Chan et al. recommend that no minor allele frequency (MAF) threshold be applied.³ They argue that filtering SNPs out of the process because of low MAF, or because they deviate from Hardy–Weinberg equilibrium (HWE), has little effect on the overall false positive rate, and that in some cases MAF filtering serves only to exclude SNPs. This step therefore selects one specific genotype distribution, at random, from the master set.
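A minimal sketch of step 2, assuming the master set is stored as a list of (AA, aA, aa) genotype frequencies, one entry per SNP. The frequencies and the names `MASTER_DISTRIBUTIONS`, `draw_distribution` and `draw_genotype` are hypothetical; genotypes are coded g = 0 (AA), 1 (aA), 2 (aa), as in Figure 3.1.

```python
import random

# Hypothetical master set of per-SNP genotype frequencies (AA, aA, aa).
# In the study these come from real data; these values are made up.
MASTER_DISTRIBUTIONS = [
    (0.81, 0.18, 0.01),
    (0.49, 0.42, 0.09),
    (0.64, 0.32, 0.04),
]


def draw_distribution(master, rng):
    """Step 2: pick one SNP's genotype distribution uniformly at random.

    No MAF or HWE filter is applied to the master set.
    """
    return rng.choice(master)


def draw_genotype(dist, rng):
    """Draw g = number of minor alleles (0 = AA, 1 = aA, 2 = aa) from dist."""
    return rng.choices([0, 1, 2], weights=dist, k=1)[0]


rng = random.Random(42)
dist = draw_distribution(MASTER_DISTRIBUTIONS, rng)
genotypes = [draw_genotype(dist, rng) for _ in range(10)]
```

Passing an explicit `random.Random` instance keeps each replicate reproducible from its seed.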

3. Use Table 3.2 to assign a case (1) or a control (0) based on the selected genetic relative risk (Φ), penetrance (P) and MOI category. This step converts the relative risk Φ into the probability of a case under the MOI gene model of interest. The logic, derived from Iles,¹ is as follows.

Major Homozygote (AA). Assume that the AA genotype is selected. The probability of a case given this selection equals the disease penetrance, Prob(case | AA) = P. AA is the reference genotype, so its relative risk is ΨAA = 1.

Minor Homozygote (aa): Liability-Increasing Allele. Assume the aa genotype is selected.

Creating the Synthetic Gene Data 33

The relative risk Ψaa is the ratio of two probabilities: the probability of a case for a minor homozygote divided by the probability of a case for a major homozygote, or

$$\Psi_{aa} = \frac{\Pr(\text{case} \mid aa)}{\Pr(\text{case} \mid AA)} = \frac{x}{P}. \tag{3.1}$$

From (3.1), the probability of a case given the minor homozygous genotype is

$$x = \Psi_{aa} \times P, \tag{3.2}$$

where Ψaa is one of the assigned risk factors and P is one of the assigned penetrance factors.

Heterozygote (aA). Assume the aA genotype is selected. By the same argument, the relative risk for a heterozygote is

$$\Psi_{aA} = \frac{\Pr(\text{case} \mid aA)}{\Pr(\text{case} \mid AA)} = \frac{y}{P}, \tag{3.3}$$

so the probability of a case given the heterozygote is

$$y = \Psi_{aA} \times P, \tag{3.4}$$

where ΨaA is one of the assigned risk factors and P is one of the assigned penetrance factors.

Using the values of x and y, assign a case or control at random under the four MOI models, in conjunction with equations (3.2) and (3.4) and Table 3.2. We assigned cases in proportion to x (y) and controls in proportion to 1 − x (1 − y) for the minor homozygote (heterozygote) genotypes, respectively; for the major homozygote, cases and controls are assigned in proportion to P and 1 − P. For the MOI models that assume elevated risk for both the minor homozygote and the heterozygote, we would expect a higher proportion of cases, and therefore associations that the statistical procedures can identify more easily. The appropriate specification of risk depends on specific and unknown disease mechanisms. A relative risk of 1.7 is considered strong and is associated with positive replication,⁴ and Ziegler et al.⁵ consider a risk of 1.3 to be a realistic assumption for complex diseases. In summary, each individual is assigned as a case or a control with a probability equal to the relevant relative risk in Table 3.2 multiplied by the penetrance P.
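Equations (3.2) and (3.4) and the random assignment in step 3 can be sketched in Python as follows. The function names are hypothetical, and the range check on Ψ × P is an addition of this sketch (the text does not say how products above 1 are handled). As a worked case: under the additive model with Φ = 1.3, Table 3.2 gives Ψaa = 2 × 1.3 − 1 = 1.6 and ΨaA = 1.3, so with P = 0.1 the case probabilities are x = 0.16 and y = 0.13.

```python
import random


def case_probability(g, P, psi_aa, psi_aA):
    """Pr(case | genotype) from equations (3.2) and (3.4).

    g counts minor alleles: 0 = AA, 1 = aA, 2 = aa (coding from Figure 3.1).
    """
    if g == 0:
        prob = P                  # major homozygote: Pr(case | AA) = P
    elif g == 1:
        prob = psi_aA * P         # heterozygote: y = Psi_aA * P, eq. (3.4)
    elif g == 2:
        prob = psi_aa * P         # minor homozygote: x = Psi_aa * P, eq. (3.2)
    else:
        raise ValueError("g must be 0, 1 or 2")
    # Assumption of this sketch: reject combinations that are not probabilities.
    if not 0.0 <= prob <= 1.0:
        raise ValueError("relative risk times penetrance must lie in [0, 1]")
    return prob


def assign_phenotype(g, P, psi_aa, psi_aA, rng):
    """Return 1 (case) with probability Pr(case | g), otherwise 0 (control)."""
    return 1 if rng.random() < case_probability(g, P, psi_aa, psi_aA) else 0


# Additive model, Phi = 1.3 (Table 3.2): Psi_aa = 1.6, Psi_aA = 1.3; P = 0.1.
rng = random.Random(2024)
phenotypes = [assign_phenotype(2, 0.1, 1.6, 1.3, rng) for _ in range(5)]
```

Comparing a uniform draw with Pr(case | g) is a Bernoulli trial, which is what "cases in proportion to x, controls in proportion to 1 − x" amounts to.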

4. Systematically select subjects. If the subject is a case (control), change its phenotype designation to a control (case) at a rate determined by Perr.

5. Systematically select subjects. If the genotype carries a disease (nondisease) allele, change the allele to a nondisease (disease) allele at a rate determined by Gerr.
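Steps 4 and 5 can be sketched as independent per-subject perturbations. Applying "at a rate determined by" as a per-subject probability is an assumption, and the genotype recoding follows the cycle annotated in Figure 3.1 (0 to 1, 1 to 2, 2 to 0), which is only one possible reading of the allele-swapping rule. Function names are hypothetical.

```python
import random


def apply_phenotype_error(y, perr, rng):
    """Step 4: flip case (1) <-> control (0) with probability perr."""
    return 1 - y if rng.random() < perr else y


def apply_genotype_error(g, gerr, rng):
    """Step 5: with probability gerr, recode g cyclically 0 -> 1 -> 2 -> 0.

    The cyclic rule is taken from the annotation in Figure 3.1; treating it as
    the implementation of step 5 is an assumption of this sketch.
    """
    return (g + 1) % 3 if rng.random() < gerr else g


rng = random.Random(3)
# Expected share of flipped labels is perr = 0.1.
flipped = [apply_phenotype_error(1, 0.1, rng) for _ in range(1000)]
```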

6. Continue with the previously described process until n1 cases and n2 controls (N = n1 + n2) are generated (note that n1 and n2 are not required to be equal).
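One way to implement step 6 is rejection sampling on group membership: keep simulating subjects and discard any whose group is already full. The sketch assumes a caller-supplied `simulate_subject(rng)` (hypothetical) that returns a genotype code and a phenotype; `toy_subject` uses the worked additive-model probabilities with invented genotype frequencies, and the `max_draws` guard is an addition of this sketch.

```python
import random


def fill_quotas(simulate_subject, n1, n2, rng, max_draws=10_000_000):
    """Step 6: simulate subjects until there are n1 cases and n2 controls.

    simulate_subject(rng) returns (g, y), with y = 1 for a case and 0 for a
    control. Subjects whose group is already full are discarded.
    """
    cases, controls = [], []
    draws = 0
    while len(cases) < n1 or len(controls) < n2:
        if draws >= max_draws:
            raise RuntimeError("quotas not filled within max_draws")
        draws += 1
        g, y = simulate_subject(rng)
        if y == 1 and len(cases) < n1:
            cases.append((g, y))
        elif y == 0 and len(controls) < n2:
            controls.append((g, y))
    return cases, controls


def toy_subject(rng):
    """Hypothetical subject: additive model, Phi = 1.3, P = 0.1."""
    g = rng.choices([0, 1, 2], weights=[0.64, 0.32, 0.04])[0]
    p_case = (0.10, 0.13, 0.16)[g]
    return g, int(rng.random() < p_case)


cases, controls = fill_quotas(toy_subject, n1=100, n2=150, rng=random.Random(7))
```

Because penetrance is usually small, cases are the scarce group, so most draws in a run like this are surplus controls that get discarded.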

7. Apply a set of statistical methods to predict associations and record the results.

8. Generate NR (typically NR = 1,000) replicate experiments for each factor combination.

9. Analyze the data.
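Steps 7 to 9 can be sketched as a replicate loop that records, for one factor combination, how often a test rejects at level α. The source does not name its statistical methods at this point; the Cochran–Armitage trend test below is a swapped-in example, and `simulate_dataset(rng)` is a hypothetical caller-supplied generator of one data file (genotype codes and phenotypes).

```python
import math
import random


def trend_test(genotypes, phenotypes):
    """Cochran-Armitage trend test, scores t = (0, 1, 2) for g = AA, aA, aa.

    Returns a two-sided p-value from the normal approximation.
    """
    r = [0, 0, 0]  # cases per genotype
    s = [0, 0, 0]  # controls per genotype
    for g, y in zip(genotypes, phenotypes):
        (r if y == 1 else s)[g] += 1
    R, S = sum(r), sum(s)
    if R == 0 or S == 0:
        return 1.0  # only one phenotype group: no evidence
    N = R + S
    n = [r[i] + s[i] for i in range(3)]
    t = (0, 1, 2)
    T = sum(t[i] * (r[i] * S - s[i] * R) for i in range(3))
    var = (R * S / N) * (N * sum(t[i] ** 2 * n[i] for i in range(3))
                         - sum(t[i] * n[i] for i in range(3)) ** 2)
    if var <= 0:
        return 1.0  # no genotype variation: no evidence
    z = T / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(simulate_dataset, n_rep=1000, alpha=0.05, seed=0):
    """Simulate n_rep replicate data files and return the share whose
    trend-test p-value falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        genotypes, phenotypes = simulate_dataset(rng)
        if trend_test(genotypes, phenotypes) < alpha:
            hits += 1
    return hits / n_rep
```

With NR = 1,000 replicates, the rejection rate estimates power when Φ > 1 and the false positive rate when Φ = 1.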

Table 3.2. Relative risk assumptions, by mode of inheritance

                    Major homozygote   Minor homozygote                  Heterozygote
Inheritance model   ΨAA                Ψaa = Pr(case|aa)/Pr(case|AA)     ΨaA = Pr(case|aA)/Pr(case|AA)
Recessive           1                  Φ                                 1
Dominant            1                  Φ                                 Φ
Additive            1                  2 × Φ − 1                         Φ
Multiplicative      1                  Φ × Φ                             Φ

Source: Iles.¹

• Ψaa is the risk of the minor homozygote relative to the major homozygote.
• ΨaA is the risk of the heterozygote relative to the major homozygote.

Figure 3.1 presents a flow description of the data generation process.


Figure 3.1. Schema of data-generation process

[Flowchart. Preload factor values: MOI (R, D, A, M) and penetrance p = Pr(case | AA), number of cases N1, number of controls N2, phenotype error rate Perr, genotype error rate Gerr and risk rate Φ, each at Low, Med and High levels, looped over replicates. Compute master genotype distributions for each SNP; select a genotype distribution G; select a genotype value g = 0 (AA), 1 (Aa) or 2 (aa) at random from G; with g, p, Φ and MOI, compute case (1) or control (0) using Table 3.2; alter the genotype according to the error rate (0 to 1, 1 to 2, 2 to 0); record the database entry. If all loops are completed, end; otherwise, repeat.]
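Table 3.2 can be encoded directly as a lookup from MOI model to the relative risks (ΨAA, ΨaA, Ψaa), which are then multiplied by the penetrance P as in equations (3.2) and (3.4). A minimal Python sketch; the function names and model keys are hypothetical.

```python
def relative_risks(moi, phi):
    """Return (Psi_AA, Psi_aA, Psi_aa) for an MOI model, following Table 3.2."""
    table = {
        "recessive": (1.0, 1.0, phi),
        "dominant": (1.0, phi, phi),
        "additive": (1.0, phi, 2 * phi - 1),
        "multiplicative": (1.0, phi, phi * phi),
    }
    try:
        return table[moi]
    except KeyError:
        raise ValueError(f"unknown MOI model: {moi!r}") from None


def genotype_case_probabilities(moi, phi, P):
    """Pr(case | g) for g = 0 (AA), 1 (aA), 2 (aa): relative risk times P."""
    return tuple(psi * P for psi in relative_risks(moi, phi))
```

Setting Φ = 1 makes every model return (1, 1, 1), that is, no association, which is a convenient check on the encoding.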
