CHAPTER TWO Materials and Methods
2.2.18 Association analysis
Statistical methods and formulae were taken from Bland (1987) and Gardner and Altman (1997). Probability (p) values were determined from tables of the $\chi^2$ distribution for $\nu$ degrees of freedom, and a significance level ($\alpha$) of 0.05 was used in all tests.
Calculation of allele frequencies: For a sample with $\alpha$ homozygous individuals of one type, $\beta$ heterozygous individuals and $\gamma$ homozygous individuals of the other type, the allele frequencies $p$ and $q$ were calculated as
$p = (2\alpha + \beta)/2n$ and $q = (2\gamma + \beta)/2n$, where $n = \alpha + \beta + \gamma$.
95% confidence intervals, for example for $p$, are given as $p - (1.96 \times SE)$ to $p + (1.96 \times SE)$, where SE is the standard error, calculated as $SE = \sqrt{p(1-p)/2n}$.
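As an illustration only, the allele frequency and confidence interval calculations can be sketched in Python. The function names and the genotype counts are hypothetical and were not part of the original analysis.

```python
import math


def allele_frequencies(n_hom1, n_het, n_hom2):
    """Return (p, q) from genotype counts (alpha, beta, gamma in the text)."""
    n = n_hom1 + n_het + n_hom2
    if n == 0:
        raise ValueError("sample contains no individuals")
    p = (2 * n_hom1 + n_het) / (2 * n)
    q = (2 * n_hom2 + n_het) / (2 * n)
    return p, q


def frequency_ci(p, n, z=1.96):
    """Approximate 95% CI for allele frequency p in n individuals (2n alleles)."""
    se = math.sqrt(p * (1 - p) / (2 * n))
    return p - z * se, p + z * se


# Hypothetical sample: 40 homozygotes, 50 heterozygotes, 10 homozygotes.
p, q = allele_frequencies(40, 50, 10)
low, high = frequency_ci(p, 100)
print(f"p = {p:.3f}, q = {q:.3f}, 95% CI for p = {low:.3f} to {high:.3f}")
```

The interval uses the normal approximation given in the text, so it is likely to be unreliable for very small samples or for frequencies close to 0 or 1.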
Comparison of allele frequencies: Allele frequencies were compared either by using the 2by2 program of the HGMP linkage utilities package, or by calculating the standard error of the difference in allele frequencies, $SE_{daf}$.
The 2by2 program performs a Fisher's exact test (Bland, 1987) on a 2x2 table and calculates a one-sided p value. The 2x2 table is constructed using the numbers of each of the alleles in the populations compared.
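The internals of the 2by2 program are not described here, so the following is only a minimal sketch of a one-sided Fisher's exact test written from the textbook definition. The function name, the choice of tail and the allele counts are assumptions made for illustration.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    With all margins held fixed, this is the probability of a table whose
    top-left cell is at least as large as the observed value a.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    upper = min(row1, col1)  # largest value the top-left cell can take
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / comb(total, col1)


# Hypothetical allele counts: rows are populations, columns are alleles.
p_value = fisher_exact_one_sided(130, 70, 120, 120)
print(f"one-sided p = {p_value:.4f}")
```

Which tail the 2by2 program reports is not stated in the text; the sketch tests for an excess in the top-left cell, and swapping the two rows gives the opposite tail.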
The standard error of the difference in allele frequencies is calculated using the formula
$SE_{daf} = \sqrt{p_1(1-p_1)/2n_1 + p_2(1-p_2)/2n_2}$
where $p_1$ and $p_2$ are the allele frequencies compared and $n_1$ and $n_2$ the numbers of individuals in the populations. The difference in the allele frequencies (DAF) is:
$DAF = p_1 - p_2$ [95% CI $p_1 - p_2 - (1.96 \times SE_{daf})$ to $p_1 - p_2 + (1.96 \times SE_{daf})$]
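A short Python sketch of the DAF calculation is given for illustration. The function name and the example frequencies are hypothetical; the check at the end uses the usual convention that an interval containing zero indicates no significant difference.

```python
import math


def daf_with_ci(p1, n1, p2, n2, z=1.96):
    """Return (DAF, lower limit, upper limit) for two allele frequencies.

    n1 and n2 are numbers of individuals, so each sample has 2n alleles.
    """
    se_daf = math.sqrt(p1 * (1 - p1) / (2 * n1) + p2 * (1 - p2) / (2 * n2))
    daf = p1 - p2
    return daf, daf - z * se_daf, daf + z * se_daf


# Hypothetical populations: p1 = 0.65 in 100 individuals, p2 = 0.50 in 120.
daf, low, high = daf_with_ci(0.65, 100, 0.50, 120)
includes_zero = low <= 0 <= high
print(f"DAF = {daf:.3f} [95% CI {low:.3f} to {high:.3f}], includes zero: {includes_zero}")
```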
If the confidence interval includes zero, the difference is not significant.
Fitness to Hardy-Weinberg equilibrium: Samples were checked for fitness to the Hardy-Weinberg equilibrium by calculating expected genotype frequencies from the allele frequencies. If $p$ and $q$ are the allele frequencies in a sample, $p^2$ and $q^2$ are the expected frequencies of the two homozygous types and $2pq$ the expected frequency of the heterozygotes. By multiplying $p^2$, $q^2$ and $2pq$ by the number of individuals in the sample, the expected number of individuals of each genotype is calculated. Fitness to Hardy-Weinberg equilibrium is tested by comparing observed (obs) and expected (exp) numbers of individuals of each genotype with a simple $\chi^2$ test, where $\chi^2 = \sum (obs - exp)^2/exp$. Probability p is determined for 1 degree of freedom.
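The Hardy-Weinberg check can be sketched as follows. This is an illustrative example with hypothetical names and counts; in place of a table lookup it uses the identity that the upper tail of the $\chi^2$ distribution with 1 degree of freedom equals erfc($\sqrt{\chi^2/2}$).

```python
import math


def hardy_weinberg_chi2(n_hom1, n_het, n_hom2):
    """Return (chi2, p_value, expected counts) for a biallelic marker."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("marker is monomorphic; the test is undefined")
    observed = (n_hom1, n_het, n_hom2)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected


# Hypothetical sample: 40 / 50 / 10 individuals of the three genotypes.
chi2, p_value, expected = hardy_weinberg_chi2(40, 50, 10)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, expected = {expected}")
```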
Contingency tables: Contingency tables were used to compare the distribution of genotypes in different samples. Tables were constructed with the numbers of individuals of each genotype. The total of each row and column and the grand total of the table were calculated and used to calculate the expected genotype frequency for each cell of the contingency table as:
(row total $\times$ column total) / grand total
A $\chi^2$ value is calculated by summing, over all cells, the squared difference between the observed and expected values divided by the expected value, $\chi^2 = \sum (obs - exp)^2/exp$. The p value is determined for (number of rows $- 1$) $\times$ (number of columns $- 1$) degrees of freedom. The test is only valid if at least 80% of the expected frequencies exceed 5 and all the expected frequencies exceed 1 (Cochran criterion; Bland, 1987). If the criterion is not satisfied, rows and columns are deleted or combined to give bigger expected values.
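The contingency table calculation, including the Cochran criterion, can be sketched in Python as below. The function name and the example table are hypothetical, and the sketch assumes that every row and column total is greater than zero. It returns the statistic and the degrees of freedom so that the p value can be read from tables, as described above.

```python
def contingency_chi2(table):
    """Return (chi2, degrees of freedom, expected table, cochran_ok).

    table is a list of rows of observed counts; all margins must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    cells = [e for row in expected for e in row]
    # Cochran criterion: all expected values > 1 and at least 80% of them > 5.
    cochran_ok = min(cells) > 1 and sum(e > 5 for e in cells) >= 0.8 * len(cells)
    return chi2, df, expected, cochran_ok


# Hypothetical genotype counts in two samples (rows) for three genotypes.
chi2, df, expected, cochran_ok = contingency_chi2([[40, 50, 10], [30, 50, 20]])
print(f"chi2 = {chi2:.3f} on {df} degrees of freedom, Cochran criterion met: {cochran_ok}")
```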
Transmission disequilibrium test (TDT): The TDT is a powerful association test, which is not affected by population stratification and can be used on multiplex and simplex families. In the TDT the transmission of alleles from heterozygous parents to their affected offspring is evaluated. Under the hypothesis of no association, an equal number of transmissions of the two variant alleles is expected. The TDT is performed using the formula
$\chi^2_{td} = (b - c)^2/(b + c)$
where $b$ is the total number of transmissions of one allele to affected offspring and $c$ is the total number of transmissions of the other variant allele. The p value for one degree of freedom is read from a $\chi^2$ distribution table. The TDT can be used in the same way to test for meiotic segregation distortion, by evaluating the transmission to unaffected offspring.
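For illustration, the TDT statistic can be computed with the short Python sketch below. The function name and the transmission counts are hypothetical. The p value is obtained from the identity for the $\chi^2$ distribution with 1 degree of freedom rather than from a table.

```python
import math


def tdt(b, c):
    """Return (chi2, p_value) for b and c transmissions from heterozygous parents."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: one allele transmitted 60 times, the other 40 times.
chi2, p_value = tdt(60, 40)
print(f"chi2_td = {chi2:.2f}, p = {p_value:.4f}")
```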
For the haplotype TDT, the transmission of each haplotype is evaluated separately. All parents heterozygous for a particular haplotype are considered, irrespective of the type of the other haplotype available for transmission; $b$ is the number of transmissions of a particular haplotype and $c$ is the number of times this haplotype was not transmitted.
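One way to tally $b$ and $c$ for each haplotype is sketched below. The data layout (one record per parental transmission, holding the two parental haplotypes and the one transmitted), the function name and the haplotype labels are all assumptions made for this example.

```python
from collections import Counter


def haplotype_tdt_counts(transmissions):
    """Return {haplotype: (b, c)} from ((hap_a, hap_b), transmitted) records.

    b counts transmissions of the haplotype and c counts non-transmissions,
    using only parents heterozygous for that haplotype.
    """
    b, c = Counter(), Counter()
    for (hap_a, hap_b), transmitted in transmissions:
        if transmitted not in (hap_a, hap_b):
            raise ValueError("transmitted haplotype not carried by the parent")
        if hap_a == hap_b:
            continue  # homozygous parents are uninformative
        for hap in (hap_a, hap_b):
            if hap == transmitted:
                b[hap] += 1
            else:
                c[hap] += 1
    return {hap: (b[hap], c[hap]) for hap in set(b) | set(c)}


# Hypothetical transmissions to affected offspring.
records = [
    (("H1", "H2"), "H1"),
    (("H1", "H3"), "H1"),
    (("H1", "H2"), "H2"),
    (("H2", "H3"), "H3"),
    (("H1", "H1"), "H1"),
]
for hap, (b_n, c_n) in sorted(haplotype_tdt_counts(records).items()):
    print(hap, b_n, c_n, (b_n - c_n) ** 2 / (b_n + c_n))
```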
Associate program: The associate program of the HGMP linkage utilities package was used to detect linkage disequilibrium between polymorphisms and to test for association between markers in the same or different genes. The program uses the numbers of individuals with each combination of genotypes to estimate allele frequencies and, assuming random association, expected haplotype frequencies. $D$, the linkage disequilibrium parameter, is the difference between observed and expected haplotype frequencies. $D'$, the normalised linkage disequilibrium parameter, is independent of the allele frequencies and is calculated as $D/D_{max}$, where $D_{max}$ is the lesser of $p_1q_2$ and $p_2q_1$ ($p_1$ and $p_2$ are the frequencies of the common alleles at the two loci and $q_1$ and $q_2$ the frequencies of the rare alleles).
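The relationship between $D$, $D_{max}$ and $D'$ can be illustrated with the following Python sketch. It does not reproduce the associate program: the haplotype frequency is taken as given rather than estimated from genotype counts, the function name and input values are hypothetical, and the bound used for negative $D$ is a commonly used convention that is not stated in the text.

```python
def linkage_disequilibrium(hap_freq_common, p1, p2):
    """Return (D, D') for the haplotype carrying both common alleles.

    hap_freq_common is the observed frequency of that haplotype; p1 and p2
    are the common-allele frequencies at the two loci.
    """
    q1, q2 = 1 - p1, 1 - p2
    d = hap_freq_common - p1 * p2  # observed minus expected frequency
    if d >= 0:
        d_max = min(p1 * q2, p2 * q1)  # bound given in the text
    else:
        d_max = min(p1 * p2, q1 * q2)  # assumed convention for negative D
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical values: p1 = 0.6, p2 = 0.7, observed haplotype frequency 0.5.
d, d_prime = linkage_disequilibrium(0.5, 0.6, 0.7)
print(f"D = {d:.3f}, D' = {d_prime:.3f}")
```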
In the investigation of interactions between polymorphisms (section 3.6), the “other associations” values in table 3.15 refer to causes of phenotypic association other than allelic association, for example, clustered sampling or admixed populations. The expected numbers of individuals with a particular genotype are calculated from the allele frequencies, assuming random association. For example, the expected number of individuals homozygous for the common allele at one locus and heterozygous at the other locus is
$p_1^2 \times 2p_2q_2 \times$ total number of individuals.
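The expected numbers for all nine two-locus genotype combinations can be tabulated in the same way. The sketch below is illustrative only; the function name, the genotype labels and the input values are hypothetical.

```python
def expected_two_locus_counts(p1, p2, n_individuals):
    """Return expected counts for each pair of genotypes at two loci.

    Assumes Hardy-Weinberg proportions at each locus and random association
    between loci; p1 and p2 are the common-allele frequencies.
    """
    q1, q2 = 1 - p1, 1 - p2
    locus1 = {"common/common": p1 * p1, "common/rare": 2 * p1 * q1, "rare/rare": q1 * q1}
    locus2 = {"common/common": p2 * p2, "common/rare": 2 * p2 * q2, "rare/rare": q2 * q2}
    return {
        (g1, g2): f1 * f2 * n_individuals
        for g1, f1 in locus1.items()
        for g2, f2 in locus2.items()
    }


# Hypothetical values: p1 = 0.6, p2 = 0.7, 200 individuals.
counts = expected_two_locus_counts(0.6, 0.7, 200)
print(counts[("common/common", "common/rare")])
```

The printed entry corresponds to the worked example in the text, $p_1^2 \times 2p_2q_2 \times$ total number of individuals.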