MATERIALS AND METHODS
2.4 STATISTICAL ANALYSIS
2.4.1 Parametric linkage analysis
The LINKAGE package (Lathrop & Lalouel, 1984; Lathrop et al., 1984)
includes computer programs that calculate lod scores at various
positions, conditional on the transmission parameters of the disease locus,
the relevant allele frequencies and the genetic distance between the disease locus and
the genetic markers. For this study, the linkage analysis was carried out using the
computer program MLINK. For the analyses carried out assuming dominant
transmission the gene frequency of the abnormal allele was set to 0.0085 and the
heterozygote penetrance was set to the same value as for the abnormal homozygote.
For analyses carried out assuming recessive transmission the gene frequency of the
abnormal allele was set to 0.13 and the heterozygote penetrance was set to the same
value as for the normal homozygote. The penetrance values used were 0.4 and 0.6 for
the DOMS and DOMSS models respectively. Sporadic cases were accounted for by
In order to investigate the possibility that only a subset of pedigrees might
have a susceptibility locus in the region studied, an admixture test was also carried out
(Ott, 1999). This is reported as the lod2 statistic in the presence of an estimated
proportion (alpha) of families linked (Risch, 1989).
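
As an illustration of the admixture idea, the sketch below uses the standard admixture formulation, in which each family contributes log10(alpha * 10^lod + 1 - alpha) and alpha is estimated by maximizing the total. It is not taken from the LINKAGE programs; the function names, the grid search and the per-family lod scores are assumptions made for the example.

```python
import math


def admixture_lod(family_lods, alpha):
    """Admixture lod at one test position for a given proportion alpha of linked families.

    family_lods holds one base-10 lod score per family at that position.
    """
    return sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha)) for z in family_lods)


def maximise_over_alpha(family_lods, steps=1000):
    """Grid search over alpha in [0, 1]; returns (alpha_hat, maximised lod)."""
    best_alpha, best_lod = 0.0, 0.0  # alpha = 0 always gives a lod of exactly 0
    for i in range(1, steps + 1):
        alpha = i / steps
        lod = admixture_lod(family_lods, alpha)
        if lod > best_lod:
            best_alpha, best_lod = alpha, lod
    return best_alpha, best_lod


# Hypothetical per-family lod scores, invented purely for illustration.
example_lods = [1.2, 0.8, -1.5, -0.9, 0.6]
alpha_hat, max_admixture_lod = maximise_over_alpha(example_lods)
```

With alpha fixed at 1 the expression reduces to the ordinary summed lod score under homogeneity, so the maximized admixture lod can never fall below the homogeneity lod or below zero.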
2.4.2 “Model-free” linkage analysis
“Model-free” linkage analysis was carried out using the computer program
MFLINK (Curtis & Sham, 1995). This analysis is model-free in the sense that the
transmission model parameters are not fixed in advance (as in the classical lod score
method) but are freed in both the numerator and denominator of the likelihood ratio.
MFLINK calculates the likelihood of the data with the disease locus at a given map
position using a range of different dominant and recessive transmission models, all
yielding the same disease prevalence (Kp) and parameterized using a single variable,
the heterozygote penetrance (f1, which is varied from 0 to 1).
The likelihoods of the observed data under the hypothesis that a locus at a
particular test position influences susceptibility in a proportion of families and under
the hypothesis that it has no effect are compared. Both likelihoods are maximized
independently over the heterozygote penetrance, f1, and the likelihood under linkage
is additionally maximized over alpha (the proportion of families linked). The
difference between the two maximized log likelihoods provides the “model-free” lod
score (MFLOD) for the position tested.
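
The verbal definition above can be written compactly as follows. The notation is an assumption made for illustration: x is the test position, f_1 the heterozygote penetrance, alpha the proportion of families linked, and the base-10 logarithm follows the usual lod score convention.

```latex
\mathrm{MFLOD}(x) =
  \log_{10} \max_{f_1,\,\alpha} L(\text{data} \mid x, f_1, \alpha)
  \;-\;
  \log_{10} \max_{f_1} L(\text{data} \mid f_1, \text{no effect})
```

The two maximizations are independent, so the value of f_1 that maximizes the first term need not equal the value that maximizes the second.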
MFLINK also reports the maximum lod scores under the assumptions of
homogeneity (MLOD) and admixture (MALOD), which are obtained over disease-
In this study, MFLINK analyses were carried out using both the core and
spectrum affection models, with population prevalence being set to 0.011 or 0.019
respectively, to match the prevalences used in the lod score analyses. Two-point
analyses were carried out with each marker using a test position at a recombination
fraction of 0.05 with the marker.
2.4.3 Allelic association analysis using CLUMP
The data obtained from the case-control allelic association studies were analysed
using the CLUMP program (Sham & Curtis, 1995b). CLUMP is designed to
assess the significance of the departure of observed values from the expected values
under the null hypothesis for a contingency table conditional on marginal totals.
The chi-squared statistic (χ²) associated with a contingency table is defined in the
usual way as the sum over all cells of the squared difference between observed and
expected value divided by the expected value. The expected value for a cell is the
total for its row multiplied by the total for its column divided by the overall total
number of observations.
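
The definition above translates directly into a few lines of code. This is a minimal sketch, not CLUMP's implementation; the function name and the example counts are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count: row total times column total over the grand total.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # cells in an empty row or column are skipped
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are cases and controls, columns are two alleles.
example_table = [[10, 20], [20, 10]]
example_stat = chi_squared(example_table)
```

For this example every expected count is 15, so each of the four cells contributes 25/15 to the statistic.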
The four statistics produced by CLUMP are:
(i) T1, a straightforward Pearson’s χ² statistic of the ‘raw’ contingency table,
(ii) T2, the statistic of a table with rare alleles grouped together to prevent small
expected cell counts,
(iii) T3, the largest of the statistics of 2x2 tables each of which compares one allele
against the rest grouped together and finally
(iv) T4, the largest of the statistics of all possible 2x2 tables comparing any
The significance of each of the values is evaluated using a Monte Carlo
approach, by performing repeated simulations to generate tables having the same
marginal totals as the one under consideration. The proportion of simulated tables
in which the statistic reaches or exceeds the value produced by the real data yields
an estimate of the significance of the departure of the observed data from the
expectation under the null hypothesis. In this way an empirical p value is produced.
This method obviates problems related to dealing with sparse contingency tables or to correcting for testing of multiple alleles.
One thousand simulations were performed for each analysis.
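
The Monte Carlo procedure can be sketched as follows for the raw-table statistic. This is an illustration under stated assumptions rather than CLUMP's own code: tables with fixed marginal totals are generated here by shuffling allele labels among individual observations, which is one simple way to do it, and a simulation counts as a hit when its statistic reaches or exceeds the observed one. All names and the seed are hypothetical.

```python
import random


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def empirical_p_value(table, n_simulations=1000, seed=0):
    """Proportion of simulated tables whose statistic reaches the observed value."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    n_cols = len(table[0])
    # Expand the table into one (row label, column label) pair per observation.
    row_labels = [i for i, row in enumerate(table) for count in row for _ in range(count)]
    col_labels = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    observed = chi_squared(table)
    hits = 0
    for _ in range(n_simulations):
        # Shuffling column labels keeps both sets of marginal totals unchanged.
        rng.shuffle(col_labels)
        simulated = [[0] * n_cols for _ in table]
        for i, j in zip(row_labels, col_labels):
            simulated[i][j] += 1
        if chi_squared(simulated) >= observed - 1e-12:  # tolerance for rounding
            hits += 1
    return hits / n_simulations
```

Because the same simulated tables can be scored with any statistic, the same loop would serve for statistics based on grouped or maximized sub-tables; only the scoring function changes.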
2.4.4 Pair-wise linkage disequilibrium calculations using EH
Pair-wise linkage disequilibrium was calculated between each pair of marker loci
using the computer program EH (Terwilliger & Ott, 1994). Maximum likelihood
values for the haplotype frequencies for each pairwise combination of alleles are
estimated under H0 (no allelic association) and H1 (allelic association). A χ² statistic
is produced, which is the difference in 2 ln(likelihood) and therefore pairwise p-