

Chapter 3: Variance Components Analysis-Based Approaches on Pleiotropic Effects

3.3 Analysis of Pleiotropic Effect Using Genetic Correlation

3.3.2 Evaluation of the Proposed Method Using Simulation Studies

We conducted simulation studies to investigate the performance of the VC-based genetic correlation approach. One thousand data sets were simulated with the ‘simqtl’ command of the statistical software package Sequential Oligogenic Linkage Analysis Routines (SOLAR) (http://www.vipbg.vcu.edu/software_docs/solar/doc/index.html), which generated both the marker and the phenotypic data. Two normally distributed quantitative traits and a di-allelic SNP with a minor allele frequency (MAF) of 10% or 50% were generated for 1,000 unrelated trios (two parents and a child). The simulation design varied the residual polygenic correlation ($\rho_{G_r}$) over 0.1, 0.6 and 0.9 and the environmental correlation ($\rho_E$) over 0.0, 0.6 and 0.9, with the residual heritability ($h_r^2$) fixed at 0.4. Seven SNP heritabilities ($h_q^2$) were considered: 0%, 0.5%, 1%, 2%, 3%, 10% and 30%, giving SNP effect sizes of 0, 0.1, 0.1414, 0.2000, 0.2450, 0.4472 and 0.7746 units and corresponding standard deviations of 1, 0.998, 0.995, 0.990, 0.985, 0.949 and 0.837, respectively. Twenty pairs of SNP heritabilities were specified for the bivariate traits (Table 3.1). We used SOLAR's polygenic analysis for multivariate models to conduct the VC-based analyses.
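The listed effect sizes and standard deviations can be reproduced, up to rounding in the last digit, under an assumption that the text does not state: an additive SNP acting on a trait with total variance 1 and a MAF of 50%, so that the SNP heritability equals 2pq·a² and the remaining standard deviation is the square root of one minus the SNP heritability. The sketch below (hypothetical helper names `snp_effect` and `residual_sd`) checks this arithmetic; SOLAR's internal parametrization may differ.

```python
import math

# SNP heritabilities (as proportions) used in the simulation design.
SNP_HERITABILITIES = [0.0, 0.005, 0.01, 0.02, 0.03, 0.10, 0.30]


def snp_effect(h2_q: float, maf: float = 0.5) -> float:
    """Additive effect a such that 2*p*q*a**2 equals h2_q.

    Assumes Hardy-Weinberg genotype frequencies and total trait variance 1;
    this is an illustrative assumption, not SOLAR's documented formula.
    """
    p = maf
    return math.sqrt(h2_q / (2.0 * p * (1.0 - p)))


def residual_sd(h2_q: float) -> float:
    """Standard deviation of the part of a unit-variance trait not due to the SNP."""
    return math.sqrt(1.0 - h2_q)


for h2 in SNP_HERITABILITIES:
    print(f"h2_q={h2:.3f}  a={snp_effect(h2):.4f}  sd={residual_sd(h2):.4f}")
```

Under the same assumption, a MAF of 10% (2pq = 0.18) would require a larger effect for the same heritability, for example about 0.236 units for a 1% SNP heritability.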

The label t0_t0 denotes a SNP that has no effect on either trait, and t0_tj denotes a SNP that explains 0% of the variance of trait 1 but j% of the variance of trait 2; neither case involves pleiotropy. The label ti_tj with i, j > 0 denotes a SNP that explains i% of the variance of trait 1 and j% of the variance of trait 2, indicating a SNP-specific pleiotropic effect.

Table 3.1 Pairs of SNP effects on bivariate traits in simulations of VC analysis

No pleiotropy (no effect on T1 or T2): t0_t0

No pleiotropy (no effect on T1): t0_t05, t0_t1, t0_t2, t0_t3, t0_t10, t0_t30

Pleiotropic effect on both T1 and T2: t05_t05, t05_t1, t05_t2, t05_t3, t1_t1, t1_t2, t1_t3, t2_t2, t2_t3, t3_t3, t10_t10, t10_t30, t30_t30

*ti_tj denotes a SNP with i% effect on trait 1 and j% effect on trait 2; "05" stands for 0.5%.
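The scenario labels in Table 3.1 follow a regular pattern, so they can be decoded mechanically. The sketch below uses hypothetical function names and reads "ti_tj" as i% and j% SNP heritability on traits 1 and 2, with "05" read as 0.5% to match the 0.5% heritability in the design; it also checks the 1 + 6 + 13 = 20 split of the table.

```python
import re

# The twenty scenario labels of Table 3.1.
SCENARIOS = [
    "t0_t0",
    "t0_t05", "t0_t1", "t0_t2", "t0_t3", "t0_t10", "t0_t30",
    "t05_t05", "t05_t1", "t05_t2", "t05_t3", "t1_t1", "t1_t2", "t1_t3",
    "t2_t2", "t2_t3", "t3_t3", "t10_t10", "t10_t30", "t30_t30",
]

LABEL = re.compile(r"^t(\d+)_t(\d+)$")


def code_to_percent(code: str) -> float:
    """Convert a label code to a percentage; '05' is read as 0.5%."""
    return 0.5 if code == "05" else float(code)


def parse_scenario(label: str) -> tuple[float, float]:
    """Return (SNP heritability % on trait 1, SNP heritability % on trait 2)."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized scenario label: {label}")
    return code_to_percent(match.group(1)), code_to_percent(match.group(2))


def is_pleiotropic(label: str) -> bool:
    """A scenario is pleiotropic when the SNP affects both traits."""
    h1, h2 = parse_scenario(label)
    return h1 > 0 and h2 > 0


print(sum(is_pleiotropic(s) for s in SCENARIOS), "pleiotropic of", len(SCENARIOS))
```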

The simulated data in each replicate were analyzed by VC analysis with the “polygenic” command in SOLAR to obtain the genetic correlation. The difference between the genetic correlations from the model excluding the SNP effect (3.12) and the model including the SNP effect (3.13) was taken as the contribution of the SNP to the polygenic pleiotropic effect.

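The percent bias was calculated as

$$\text{Percent bias} = \frac{\overline{\Delta\hat{\rho}} - \Delta\rho}{\Delta\rho} \times 100\%,$$

where $\overline{\Delta\hat{\rho}}$ is the mean estimate over the replicates and $\Delta\rho$ is the true value. Percent bias was used to indicate the performance of the methods being assessed and is defined only when the true value is nonzero (Burton et al. 2006). Estimates with an absolute percent bias below 10% were considered acceptable.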
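The percent-bias criterion can be written as two small functions. This is a sketch with hypothetical names; the replicate estimates in the example are made-up numbers for illustration, not simulation results.

```python
from statistics import mean


def percent_bias(estimates, true_value):
    """Percent bias of the mean estimate; undefined for a zero true value."""
    if true_value == 0:
        raise ValueError("percent bias is undefined when the true value is zero")
    return (mean(estimates) - true_value) / true_value * 100.0


def is_acceptable(pb, threshold=10.0):
    """Treat an absolute percent bias below `threshold` percent as acceptable."""
    return abs(pb) < threshold


# Illustrative (made-up) replicate estimates with a true value of 0.10.
pb = percent_bias([0.104, 0.106, 0.105], 0.10)
print(f"percent bias = {pb:.1f}%, acceptable: {is_acceptable(pb)}")
```

Taking the absolute value matters: a strongly negative bias such as -50% is below 10% in signed terms but is clearly not acceptable.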

The simulation procedure can be summarized as follows:

Simulation Steps:

Step 1: Generate 1,000 unrelated trios (two parents and a child);

Step 2: Simulate a data set with two normally distributed quantitative traits and a di-allelic SNP on the pedigree structure from Step 1. The simulation design includes 20 SNP heritability scenarios, 3 residual polygenic correlation conditions and 3 environmental correlation conditions;

Step 3: Conduct a bivariate VC analysis of the polygenic model excluding the SNP as a predictor variable, and estimate the polygenic correlation ($\hat{\rho}_G$);

Step 4: Conduct a bivariate VC analysis of the polygenic model including the SNP as a predictor variable with fixed, observed genotypes, and estimate the residual polygenic correlation ($\hat{\rho}_{G_r}$);

Step 5: Calculate the contribution of the SNP to the polygenic pleiotropic effect as $\Delta\hat{\rho} = \hat{\rho}_G - \hat{\rho}_{G_r}$;

Step 6: Repeat Steps 2-5 to obtain 1,000 replicates;

Step 7: Compute the mean of $\Delta\hat{\rho}$ over the 1,000 replicates as $\overline{\Delta\hat{\rho}}$.
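Steps 1 and 2 are carried out by SOLAR's simqtl command, whose internal algorithm is not described here. As a rough illustration only, the NumPy sketch below generates one replicate of trios under assumptions of our own: an additive SNP with Mendelian transmission, unit total trait variance, and the non-SNP variance split into polygenic (residual heritability) and environmental parts with the given cross-trait correlations. The function `simulate_trios` and its parameters are hypothetical, and Steps 3-5 (the SOLAR polygenic fits) are not reproduced.

```python
import numpy as np


def simulate_trios(n_trios, maf, a1, a2, h2_r=0.4, rho_g=0.6, rho_e=0.0, seed=None):
    """Simulate one replicate: genotypes (n, 3) and traits (n, 3, 2).

    Members are ordered father, mother, child. Hypothetical helper under
    illustrative assumptions; not SOLAR's simqtl algorithm.
    """
    rng = np.random.default_rng(seed)
    p = maf

    # Parental genotypes: two independent alleles each (Hardy-Weinberg).
    father = rng.binomial(1, p, size=(n_trios, 2))
    mother = rng.binomial(1, p, size=(n_trios, 2))
    # Child receives one randomly chosen allele from each parent.
    rows = np.arange(n_trios)
    from_f = father[rows, rng.integers(0, 2, size=n_trios)]
    from_m = mother[rows, rng.integers(0, 2, size=n_trios)]
    geno = np.column_stack([father.sum(axis=1), mother.sum(axis=1), from_f + from_m])

    # Variance explained by the SNP on each trait and the variance left over.
    effects = np.array([a1, a2])
    resid_var = 1.0 - 2 * p * (1 - p) * effects**2
    sd_g = np.sqrt(h2_r * resid_var)
    sd_e = np.sqrt((1.0 - h2_r) * resid_var)
    cov_g = np.outer(sd_g, sd_g) * np.array([[1.0, rho_g], [rho_g, 1.0]])
    cov_e = np.outer(sd_e, sd_e) * np.array([[1.0, rho_e], [rho_e, 1.0]])
    zero = np.zeros(2)

    # Polygenic values: unrelated parents; child = parental mean plus a
    # Mendelian-sampling term with half the polygenic covariance.
    g_f = rng.multivariate_normal(zero, cov_g, size=n_trios)
    g_m = rng.multivariate_normal(zero, cov_g, size=n_trios)
    g_c = 0.5 * (g_f + g_m) + rng.multivariate_normal(zero, 0.5 * cov_g, size=n_trios)
    poly = np.stack([g_f, g_m, g_c], axis=1)

    # Environmental deviations, correlated across the two traits within a person.
    env = rng.multivariate_normal(zero, cov_e, size=(n_trios, 3))
    # Additive SNP effect on centered genotype counts.
    snp = (geno[..., None] - 2 * p) * effects
    return geno, snp + poly + env


# Example: a t1_t2-like setting at MAF 50% (effects 0.1414 and 0.2).
geno, traits = simulate_trios(1000, 0.5, 0.1414, 0.2, rho_g=0.6, rho_e=0.6, seed=42)
```

The child's polygenic value is built as the parental mean plus an independent term with half the polygenic covariance, which keeps the child's total polygenic covariance equal to the parents' and gives the expected parent-offspring covariance of one half.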

Next, for each replicate, 500 bootstrap resamples were drawn by sampling trios with replacement from the replicate's 1,000 unrelated trios, so that each resample has the same size as the original sample. VC-based analyses were performed on these bootstrap resamples. The interval between the 2.5th and 97.5th percentiles of the bootstrap distribution of the difference in genetic correlations was used as a 95% confidence interval (CI), and inference was based on whether zero was included in this CI. If the CI does not contain zero, we reject the null hypothesis that the difference is zero at the 5% significance level. We evaluated type I error and power as the proportion of the 1,000 replicate CIs that exclude zero. We used Bradley’s criterion to classify type I error rates as inflated or conservative (Bradley 1978): for a nominal $\alpha = 0.05$, an empirical rejection rate above 0.055 is termed “inflated” and one below 0.045 is termed “conservative”.
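The decision rules in the preceding paragraph can be sketched as follows. The function names and the "within range" label are hypothetical; the thresholds 0.045 and 0.055 from the text correspond to 0.9α and 1.1α for α = 0.05.

```python
import numpy as np


def percentile_ci(boot, level=0.95):
    """Percentile bootstrap CI: the (1 - level)/2 and (1 + level)/2 quantiles."""
    lower, upper = np.percentile(boot, [50 * (1 - level), 50 * (1 + level)])
    return float(lower), float(upper)


def rejects_null(ci):
    """Reject H0 (difference = 0) when the CI excludes zero."""
    lower, upper = ci
    return not (lower <= 0.0 <= upper)


def rejection_rate(cis):
    """Proportion of replicate CIs that exclude zero."""
    return sum(rejects_null(ci) for ci in cis) / len(cis)


def bradley_label(rate, alpha=0.05):
    """Bradley's criterion with bounds 0.9*alpha and 1.1*alpha."""
    if rate > 1.1 * alpha:
        return "inflated"
    if rate < 0.9 * alpha:
        return "conservative"
    return "within range"


# Example with four illustrative CIs: three exclude zero.
rate = rejection_rate([(-0.1, 0.2), (0.05, 0.3), (0.01, 0.4), (-0.2, -0.01)])
print(rate)
```

The same rejection rate is read as a type I error rate when the true difference is zero and as power otherwise; Bradley's labels only apply in the former case.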

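The bootstrap procedure for the genetic correlation approach can be summarized as follows:

Bootstrap Steps:

Step 1: Use the same 1,000 simulated replicates from the simulation procedure;

Step 2: Draw 1,000 trios with replacement from replicate $r$ (starting with $r = 1$);

Step 3: Compute $\Delta\hat{\rho}^{*(b)} = \hat{\rho}_G^{*(b)} - \hat{\rho}_{G_r}^{*(b)}$ from the $b$th bootstrap sample;

Step 4: Repeat Steps 2-3 $B = 500$ times to generate the empirical bootstrap distribution $\{\Delta\hat{\rho}^{*(1)}, \ldots, \Delta\hat{\rho}^{*(B)}\}$;

Step 5: Compute the mean of $\Delta\hat{\rho}^{*(1)}, \ldots, \Delta\hat{\rho}^{*(B)}$ as $\overline{\Delta\hat{\rho}}^{*}_r$;

Step 6: Construct the 95% confidence interval from the 2.5th and 97.5th percentiles of $\Delta\hat{\rho}^{*(1)}, \ldots, \Delta\hat{\rho}^{*(B)}$ as $\text{CI}_r = (\Delta\hat{\rho}^{*}_{r,0.025}, \Delta\hat{\rho}^{*}_{r,0.975})$;

Step 7: Repeat Steps 2-6 for $r = 1, \ldots, 1{,}000$, resulting in 1,000 confidence intervals $\text{CI}_1, \ldots, \text{CI}_{1000}$;

Step 8: Determine whether each confidence interval covers 0, and calculate the proportion of the 95% confidence intervals that exclude 0;

Step 9: Compute the mean of $\overline{\Delta\hat{\rho}}^{*}_1, \ldots, \overline{\Delta\hat{\rho}}^{*}_{1000}$ as the overall bootstrap estimate $\overline{\Delta\hat{\rho}}^{*}$.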
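The essential design choice in Steps 2-4 is that whole trios, not individuals, are resampled, which preserves the within-family correlation that the VC model relies on. The sketch below shows only that resampling loop. Because the SOLAR fits cannot be reproduced here, `delta_corr_stub` is a hypothetical stand-in (a phenotypic correlation difference before and after adjusting for the genotype), not the VC-based estimator, and the example data are synthetic.

```python
import numpy as np


def bootstrap_trios(data, stat, n_boot=500, seed=None):
    """Resample whole trios (rows of `data`) with replacement, n_boot times.

    Each resample contains as many trios as the original sample; `stat`
    maps a resampled array to a scalar estimate.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # trio indices drawn with replacement
        estimates[b] = stat(data[idx])
    return estimates


def delta_corr_stub(sample):
    """Hypothetical stand-in for the two SOLAR fits (not a VC estimator).

    Columns: child genotype, trait 1, trait 2. Returns the cross-trait
    correlation minus the correlation after regressing both traits on
    the genotype.
    """
    x, y1, y2 = sample[:, 0], sample[:, 1], sample[:, 2]
    design = np.column_stack([np.ones_like(x), x])
    beta1, *_ = np.linalg.lstsq(design, y1, rcond=None)
    beta2, *_ = np.linalg.lstsq(design, y2, rcond=None)
    r_raw = np.corrcoef(y1, y2)[0, 1]
    r_adj = np.corrcoef(y1 - design @ beta1, y2 - design @ beta2)[0, 1]
    return r_raw - r_adj


# Synthetic example: one SNP shared by both traits, one row per trio.
rng = np.random.default_rng(0)
x = rng.binomial(2, 0.5, size=1000).astype(float)
y1 = 0.5 * x + rng.normal(size=1000)
y2 = 0.5 * x + rng.normal(size=1000)
boot = bootstrap_trios(np.column_stack([x, y1, y2]), delta_corr_stub, n_boot=500, seed=1)
ci = np.percentile(boot, [2.5, 97.5])
```

In the actual procedure each call to the statistic would involve two SOLAR runs, which is what drives the computing cost.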

The polygenic VC analysis in SOLAR is computationally costly and time consuming. On average, one process (accessing storage and computing the analysis) takes 2 minutes on an Advanced Micro Devices (AMD) based compute cluster consisting of one head node (two quad-core 2.7 GHz 64-bit Opteron processors and 16 GB of RAM), 32 servers (two 1.8 GHz 64-bit Opteron processors and 2 GB of RAM each) and 56 blades (two dual-core 2.6 GHz 64-bit Opteron processors and 12 GB of RAM each, except for one compute node with 32 GB of RAM). Calculating the difference in genetic correlations requires two processes, one for the full model and one for the reduced model. The total time required for the bootstrap VC analyses depends on the number of simulation replicates and bootstrap resamples; we performed 1,000 simulation replicates with 500 bootstrap resamples each. With up to 20 processes running simultaneously across the compute nodes, the time for one scenario can be approximated by

$$T \approx \frac{1{,}000\ \text{replicates} \times 500\ \text{resamples} \times 2\ \text{processes} \times 2\ \text{min}}{20\ \text{concurrent processes} \times 60\ \text{min/h} \times 24\ \text{h/day}} \approx 69.4\ \text{days}.$$

Bootstrapping therefore takes about 69 days for each SNP heritability scenario, which motivated us to conduct the bootstrap analysis only for selected scenarios. We limit our discussion to nine scenarios with zero, low, medium or high SNP effects on one or both traits (t0_t0, t0_t3, t0_t10, t0_t30, t1_t1, t3_t3, t10_t10, t10_t30 and t30_t30).
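The run-time approximation is simple arithmetic. The sketch below uses a hypothetical function name, with defaults matching the figures given in the text: 1,000 replicates, 500 resamples, two 2-minute processes per estimate, and 20 concurrent processes.

```python
def bootstrap_runtime_days(n_replicates=1000, n_boot=500, processes_per_estimate=2,
                           minutes_per_process=2.0, concurrent=20):
    """Approximate wall-clock days for one scenario's bootstrap VC analyses."""
    total_minutes = n_replicates * n_boot * processes_per_estimate * minutes_per_process
    return total_minutes / concurrent / (60 * 24)


days = bootstrap_runtime_days()
print(f"{days:.1f} days per scenario")
print(f"{9 * days:.0f} days for nine scenarios, {20 * days:.0f} days for all twenty")
```

Under the same formula, the nine selected scenarios amount to roughly 625 days if run one after another, compared with about 1,389 days for all twenty.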