Materials and methods
2.7 Statistical methods II
2.7.1 Single marker association analyses
After QC filtering, single marker association and conditional data were generated using a case-control format and the continuous covariate function in SNPTEST v2 under the additive model (Marchini, 2010). A frequentist statistical paradigm was used, with a probabilistic method to treat genotype uncertainty. A logistic regression model that was additive on the log-odds scale was used to evaluate TNFSF4 variants. Under this model, the score test, an asymptotic test of the null hypothesis of no association, was used to test association of the variants with the binary phenotype (case-control or phenotype-control). For non-imputed variants and the high-certainty imputed SNPs (info > 0.7) that were included in the analysis post QC, the test statistic reduced to the Cochran-Armitage trend test statistic. The score test was presumed to produce a sensible result since the validity of the quadratic approximation to the log-likelihood curve was not undermined by small sample size, low allele frequency or increasing genotype uncertainty. The trend test exploited the suspected effect direction to increase power to detect association.
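As an illustration of the statistic that the score test reduces to when genotypes are known with certainty, the following is a minimal sketch of the Cochran-Armitage trend test with additive weights (0, 1, 2). It is not the SNPTEST v2 implementation; the function name, the input layout (counts of the three genotype classes) and the use of a 1 df χ² reference distribution for the p-value are assumptions made for this example.

```python
import math


def trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts (illustrative sketch).

    case_counts / control_counts: counts of the three genotype classes,
    ordered by number of copies of the tested allele (0, 1, 2).
    Returns (chi-square statistic, p-value on an assumed 1 df).
    """
    n_case = sum(case_counts)
    n_ctrl = sum(control_counts)
    n = n_case + n_ctrl
    totals = [a + b for a, b in zip(case_counts, control_counts)]

    # Mean genotype weight across all individuals.
    mean_w = sum(w * t for w, t in zip(weights, totals)) / n

    # Score: observed minus expected weighted count in cases.
    score = sum(w * a for w, a in zip(weights, case_counts)) - n_case * mean_w

    # Variance of the score under the null hypothesis of no association.
    ss = sum(t * (w - mean_w) ** 2 for w, t in zip(weights, totals))
    variance = n_case * n_ctrl * ss / n ** 2

    chi2 = score ** 2 / variance
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts, for illustration only.
    chi2, p = trend_test((30, 50, 20), (50, 40, 10))
    print(f"chi2 = {chi2:.3f}, p = {p:.4g}")
```

The statistic is unchanged if the case and control labels are swapped, and it equals zero when the genotype distribution is identical in the two groups.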
To preserve the type 1 error rate, the variance of the score test was adjusted using genomic control (GC) to correct for inflation. GC was calculated on null loci to estimate the variance inflation factor, λ. Association was computed at each of the null SNPs, and λ was calculated as the empirical median of the test statistics divided by its expectation under the χ² distribution (Balding, 2006). Association was then computed for the candidate SNPs; where λ > 1, the test statistics were divided by λ. Testing required 2 df. The quantile of the score test statistic was interpreted by calculating p-values. An arbitrary locus-wide significance level for rejecting the null hypothesis was set at P = 5 × 10^-5. Odds ratios (OR) with 95% confidence intervals (95% CI) were calculated by exponentiating the beta coefficient of the logistic regression model, together with its standard error. The significance of corrected p-values was based on permutation testing (5000 permutations). Data are represented as nominal uncorrected p-values and permuted (Pp) values.
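The genomic control adjustment and the conversion of a regression coefficient into an odds ratio can be sketched as follows. This is a minimal illustration, not the code used for the analyses: the function names are hypothetical, the null statistics are assumed to be 1 df χ² values (whose median is approximately 0.4549), and the 95% CI is assumed to be the usual Wald interval, beta ± 1.96 × SE.

```python
import math
import statistics

# Median of a chi-square distribution with 1 df (assumed reference value).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(null_chi2):
    """Inflation factor: empirical median of null statistics / expected median."""
    return statistics.median(null_chi2) / CHI2_1DF_MEDIAN


def apply_genomic_control(chi2_values, lam):
    """Divide candidate statistics by lambda, but only when lambda > 1."""
    lam = max(lam, 1.0)
    return [x / lam for x in chi2_values]


def odds_ratio_ci(beta, se, z=1.959964):
    """Odds ratio and Wald 95% CI from a log-odds coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical null-locus statistics and candidate statistics.
    null_stats = [0.1, 0.3, 0.5, 0.7, 1.2, 2.4, 0.6]
    lam = genomic_control_lambda(null_stats)
    print("lambda =", round(lam, 3))
    print("adjusted:", apply_genomic_control([12.0, 18.5], lam))
    print("OR (95% CI):", odds_ratio_ci(beta=0.30, se=0.08))
```

Capping λ at 1 from below mirrors the rule stated above that statistics were only deflated where λ > 1.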
The per-SNP significance level $\alpha'$ should satisfy $\alpha = 1 - (1 - \alpha')^{n}$, leading to the Bonferroni correction $\alpha' \approx \alpha / n$ for $n$ independent variants tested under a frequentist statistical paradigm. However, this correction was judged conservative for the genotyped variants at TNFSF4, many of which exhibited high LD (r² > 0.7).
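The relationship between the family-wise level α, the per-SNP level α′ and the number of independent tests n can be checked numerically. The sketch below solves α = 1 − (1 − α′)^n exactly for α′ and compares the result with the Bonferroni approximation α / n; the function names and the example values are illustrative assumptions.

```python
def exact_per_test_alpha(alpha, n):
    """Per-test level solving alpha = 1 - (1 - alpha_prime) ** n."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n)


def bonferroni_alpha(alpha, n):
    """Bonferroni approximation to the per-test level."""
    return alpha / n


def family_wise_error(alpha_prime, n):
    """Family-wise error rate for n independent tests at level alpha_prime."""
    return 1.0 - (1.0 - alpha_prime) ** n


if __name__ == "__main__":
    alpha, n = 0.05, 10  # hypothetical values for illustration
    print("exact:     ", exact_per_test_alpha(alpha, n))
    print("Bonferroni:", bonferroni_alpha(alpha, n))
```

The Bonferroni level is always slightly smaller than the exact solution, and the correction becomes more conservative still when the tests are correlated, as they are for variants in high LD.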
Instead, the type 1 error was approximated by a permutation procedure. Case-control status was randomised 5000 times whilst maintaining the LD structure of the variants in each dataset, so as to satisfy the null hypothesis, in order to estimate the false-positive rate (Balding, 2006).
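A minimal sketch of a label-permutation procedure of this kind is given below. Shuffling case-control status while leaving each individual's genotypes intact preserves the LD structure between variants. The sketch compares each observed statistic with the maximum statistic across all variants in each permutation, which is one common way of obtaining corrected p-values; whether the software used here applied exactly this max-statistic rule is an assumption, as are the function names and the use of the trend statistic computed as N times the squared genotype-phenotype correlation.

```python
import random


def trend_chi2(dosages, status):
    """Trend statistic as N * squared correlation of dosage and status."""
    n = len(status)
    mean_g = sum(dosages) / n
    mean_y = sum(status) / n
    sgy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, status))
    sgg = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((y - mean_y) ** 2 for y in status)
    if sgg == 0 or syy == 0:
        return 0.0  # monomorphic SNP or single-class phenotype
    return n * sgy ** 2 / (sgg * syy)


def permutation_p_values(genotypes, status, n_perm=5000, seed=0):
    """Corrected empirical p-values from case-control label permutation.

    genotypes: list of SNPs, each a list of allele dosages per individual.
    status: list of 0/1 phenotype labels, in the same individual order.
    """
    rng = random.Random(seed)
    observed = [trend_chi2(snp, status) for snp in genotypes]
    exceed = [0] * len(genotypes)
    shuffled = list(status)
    for _ in range(n_perm):
        # Permute labels only; genotype rows stay together, so LD is preserved.
        rng.shuffle(shuffled)
        max_stat = max(trend_chi2(snp, shuffled) for snp in genotypes)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    # Add one to numerator and denominator so that p is never exactly zero.
    return [(e + 1) / (n_perm + 1) for e in exceed]


if __name__ == "__main__":
    # Hypothetical toy data: one associated SNP, one unassociated SNP.
    status = [1] * 20 + [0] * 20
    snps = [[2] * 20 + [0] * 20, [0, 1] * 20]
    print(permutation_p_values(snps, status, n_perm=200))
```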
Technical details of the aforementioned score test used in SNPTEST v2 are found at URL:http://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.v2.pdf
2.7.2 Meta-analysis
A logistic regression model fitted with an interaction term (effect) in the R statistical package was used to investigate cross-study heterogeneity. P-values for individual associated SNPs were generated using the likelihood-ratio test.
The SNPs rs1234314, rs1234317, rs2205960, rs12039904 and rs10912580 were selected as representatives for this test. I implemented a fixed-effects meta-analysis combining the association results for African-Americans, East Asians, Europeans and Hispanics to estimate the true effect size with greater power; the results of these analyses are described in Table 4.6, chapter 4. The average effect size across all datasets was computed using inverse variance weighting of each study. SNPs were organised into two categories (TNFSF4 gene or 5′ region) and were highly correlated with one another (r² > 0.7) within each group. Associated SNPs from the African-American cohort were tested for heterogeneity and included in the meta-analysis where the associated allele was the same.
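The two calculations in this section can be sketched as follows: a likelihood-ratio test comparing a model with a SNP-by-study interaction term against one without it, and an inverse-variance-weighted fixed-effects estimate. This is an illustrative sketch rather than the R code used; the function names are hypothetical, the log-likelihoods are assumed to come from models fitted elsewhere, and the degrees of freedom are assumed to equal the number of interaction parameters.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                # closed form for 2 df
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))     # closed form for 1 df
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lrt_p_value(loglik_reduced, loglik_full, df):
    """Likelihood-ratio test of the full model against the reduced model."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return chi2_sf(stat, df)


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted pooled log odds ratio, its SE and p-value."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled_beta, pooled_se, p_value


if __name__ == "__main__":
    # Hypothetical per-population estimates for one SNP.
    beta, se, p = fixed_effects_meta([0.25, 0.31, 0.22, 0.28],
                                     [0.09, 0.07, 0.05, 0.11])
    print(f"pooled OR = {math.exp(beta):.3f}, SE = {se:.4f}, p = {p:.3g}")
    # Hypothetical log-likelihoods; 3 df for 4 studies is an assumption.
    print("heterogeneity p =", lrt_p_value(-1502.4, -1500.9, df=3))
```

Under inverse variance weighting, studies with smaller standard errors contribute more to the pooled estimate, and the pooled standard error is never larger than that of the most precise study.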
2.7.3 Haplotype bifurcation
The Long Range Haplotype (LRH) test was used to investigate common alleles with long-range linkage disequilibrium (LD), which allowed me to represent the breakdown of the risk and non-risk haplotypes. TNFSF4risk and TNFSF4non-risk were anchored in all groups by a core associated marker, rs1234314, which is conveniently positioned at the boundary of the TNFSF4 gene and 5′ region and also at the boundary of two haplotype blocks. Haplotype bifurcation diagrams were then generated in the program Sweep™.
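The LRH test is built on extended haplotype homozygosity (EHH): the probability that two randomly chosen chromosomes carrying the same core allele are identical at every marker from the core out to a given distance. The decay of EHH with distance is what a bifurcation diagram depicts. The sketch below computes EHH for phased haplotypes; it does not reproduce Sweep, and the function name and the representation of haplotypes as strings of alleles are assumptions for illustration.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity from the core marker to end_index.

    haplotypes: phased haplotypes as equal-length sequences of alleles.
    Only haplotypes carrying core_allele at core_index are considered.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # homozygosity is undefined for fewer than two chromosomes
    lo, hi = sorted((core_index, end_index))
    # Group carriers by their alleles across the whole interval.
    groups = Counter(tuple(h[lo:hi + 1]) for h in carriers)
    # Fraction of chromosome pairs that are identical over the interval.
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


if __name__ == "__main__":
    # Hypothetical phased haplotypes; the core marker is at position 0.
    haps = ["AAA", "AAA", "AAB", "BAA"]
    for end in range(3):
        print(end, ehh(haps, core_index=0, core_allele="A", end_index=end))
```

EHH equals 1 at the core marker itself and can only stay level or fall as the interval is extended, because each additional marker can split a group of identical haplotypes but never merge two groups.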
2.7.4 Haplotype association and conditional regression
Haplotypes in the TNFSF4 gene and 5′ region were constructed in Haploview 4.2 using a custom algorithm based on the r² measure of linkage disequilibrium (LD). Markers and haplotypes with frequencies greater than 5% and 4%, respectively, were included in the analyses. Haplotypes were anchored using tag SNP genotype data and boundaries were inferred using recombination data. SLE case-control association and step-wise conditional logistic regression data for each haplotype were generated in PLINK, as were OR (95% CI). These are represented as nominal uncorrected p-values and permuted (Pp) p-values (5000 permutations).
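For reference, the r² measure of LD between two biallelic markers can be computed from phased haplotypes as D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA pB. The sketch below illustrates this definition only; it is not the Haploview block-building algorithm, and the function name and string representation of haplotypes are assumptions.

```python
def ld_r2(haplotypes, i, j, allele_i, allele_j):
    """r-squared LD between markers i and j from phased haplotypes.

    allele_i / allele_j: the reference allele counted at each marker.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == allele_i for h in haplotypes) / n
    p_b = sum(h[j] == allele_j for h in haplotypes) / n
    p_ab = sum(h[i] == allele_i and h[j] == allele_j for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    return d * d / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical two-marker haplotypes.
    print(ld_r2(["AB", "AB", "ab", "ab"], 0, 1, "A", "B"))  # complete LD
    print(ld_r2(["AB", "Ab", "aB", "ab"], 0, 1, "A", "B"))  # no LD
```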
2.7.5 Sub-phenotype association
Searching for TNFSF4 alleles linked to specific clinical manifestations of lupus may prove informative with regard to mechanism, and may better resolve causal alleles because of greater genetic homogeneity compared to the disease per se. Phenotypes amenable to study, and relevant to the biologic function of TNFSF4 in lupus, are described in section 1.1.2 of chapter 1. TNFSF4 variants were tested for association with interquartile age at diagnosis using a case-only format and the methods described in section 2.7.1. The variants were also tested against leukopenia and lymphopenia; the anti-La, anti-Ro and anti-Sm autoantibody subsets, which are associated with SLE; and renal disease, using both case-only and phenotype-control formats. A covariate for the most associated marker for each of these phenotypes was included for each population to investigate independent effects. The SNPTEST v2 program was used for these tests.