MATERIALS AND METHODS

2.4 STATISTICAL ANALYSIS

2.4.1 Parametric linkage analysis

The LINKAGE package (Lathrop & Lalouel, 1984; Lathrop et al., 1984)

includes computer programs which enable the calculation of lod scores at various

positions, conditional on the transmission parameters of the disease locus,

the relevant allele frequencies and the genetic distance between the disease locus and

the genetic markers. For this study, the linkage analysis was carried out using the

computer program MLINK. For the analyses carried out assuming dominant

transmission, the gene frequency of the abnormal allele was set to 0.0085 and the

heterozygote penetrance was set to the same value as for the abnormal homozygote.

For analyses carried out assuming recessive transmission, the gene frequency of the

abnormal allele was set to 0.13 and the heterozygote penetrance was set to the same

value as for the normal homozygote. The penetrance values used were 0.4 and 0.6 for

the DOMS and DOMSS models respectively. Sporadic cases were accounted for by

In order to investigate the possibility that only a subset of pedigrees might

have a susceptibility locus in the region studied, an admixture test was also carried out

(Ott, 1999). This is reported as the lod2 statistic in the presence of an estimated

proportion (alpha) of families linked (Risch, 1989).
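
The admixture calculation can be illustrated with a minimal sketch. It assumes the standard admixture likelihood, in which each family contributes log10(alpha * 10^Z + 1 - alpha) at a fixed test position, where Z is that family's lod score; whether this matches the exact lod2 computation used in this study is an assumption. The function names and the per-family lod scores are hypothetical and are not data from the study.

```python
import math


def admixture_lod(family_lods, alpha):
    """Admixture lod at one test position for a given proportion alpha of linked families."""
    # Each family is linked with probability alpha and unlinked with probability 1 - alpha.
    return sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha)) for z in family_lods)


def maximise_over_alpha(family_lods, steps=100):
    """Grid search for the alpha in [0, 1] that maximises the admixture lod."""
    best_alpha, best_lod = 0.0, 0.0  # alpha = 0 always gives a lod of exactly 0
    for i in range(steps + 1):
        alpha = i / steps
        lod = admixture_lod(family_lods, alpha)
        if lod > best_lod:
            best_alpha, best_lod = alpha, lod
    return best_alpha, best_lod


# Hypothetical per-family lod scores at a single test position.
family_lods = [1.2, 0.8, -0.9, -1.5, 0.6]
alpha_hat, lod_hat = maximise_over_alpha(family_lods)
print(f"estimated alpha = {alpha_hat:.2f}, admixture lod = {lod_hat:.3f}")
```

With alpha fixed at 1 the expression reduces to the sum of the family lod scores, that is, the lod score under homogeneity.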

2.4.2 “Model-free” linkage analysis

“Model-free” linkage analysis was carried out using the computer program

MFLINK (Curtis & Sham, 1995). This analysis is model-free in the sense that the

transmission model parameters are not fixed in advance (as in the classical lod score

method) but are freed in both the numerator and denominator of the likelihood ratio.

MFLINK calculates the likelihood of the data with the disease locus at a given map

position using a range of different dominant and recessive transmission models, all

yielding the same disease prevalence (Kp) and parameterized using a single variable,

the heterozygote penetrance (f1, which is varied from 0 to 1).

The likelihoods of the observed data under the hypothesis that a locus at a

particular test position influences susceptibility in a proportion of families and under

the hypothesis that it has no effect are compared. Both likelihoods are maximized

independently over the heterozygote penetrance, f1, and the likelihood under linkage

is additionally maximized over alpha (the proportion of families linked). The

difference between the two maximized log likelihoods provides the “model-free” lod

score (MFLOD) for the position tested.
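
The definition in the preceding paragraph can be written compactly. In the sketch below the symbols L_1 (likelihood under linkage) and L_0 (likelihood under no effect) are notation assumed here for illustration, the heterozygote penetrance is written f_1, and base-10 logarithms are assumed, as is conventional for lod scores.

```latex
\mathrm{MFLOD}
  = \log_{10}\Bigl[\max_{f_1,\,\alpha} L_1(f_1,\alpha)\Bigr]
  - \log_{10}\Bigl[\max_{f_1} L_0(f_1)\Bigr],
  \qquad 0 \le f_1 \le 1,\quad 0 \le \alpha \le 1
```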

MFLINK also reports the maximum lod scores under the assumptions of

homogeneity (MLOD) and admixture (MALOD), which are obtained over disease-

In this study, MFLINK analyses were carried out using both the core and

spectrum affection models, with the population prevalence set to 0.011 and 0.019

respectively, to match the prevalences used in the lod score analyses. Two-point

analyses were carried out with each marker using a test position at a recombination

fraction of 0.05 with the marker.

2.4.3 Allelic association analysis using CLUMP

The data obtained from the case-control allelic association studies were analysed

using the CLUMP program (Sham & Curtis, 1995b). It is a program designed to

assess the significance of the departure of observed values from the expected values

under the null hypothesis for a contingency table conditional on marginal totals.

The chi-squared statistic (χ²) associated with a contingency table is defined in the

usual way as the sum over all cells of the squared difference between observed and

expected value divided by the expected value. The expected value for a cell is the

total for its row multiplied by the total for its column divided by the overall total

number of observations.
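
The definition above can be sketched in a few lines of Python. This is an illustration of the formula only, not the CLUMP source code; the function names and the 2x2 table of counts are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts: row total * column total / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if e > 0  # skip cells whose row or column total is zero
    )


# Hypothetical counts: rows are cases and controls, columns are two alleles.
example_table = [[10, 20], [30, 40]]
print(expected_counts(example_table))
print(round(chi_squared(example_table), 4))
```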

The four statistics produced by CLUMP are:

(i) T1, a straightforward Pearson’s statistic of the ‘raw’ contingency table,

(ii) T2, the statistic of a table with rare alleles grouped together to prevent small

expected cell counts,

(iii) T3, the largest of the statistics of 2x2 tables each of which compares one allele

against the rest grouped together and finally

(iv) T4, the largest of the statistics of all possible 2x2 tables comparing any

The significance of each of the values is evaluated using a Monte Carlo

approach, by performing repeated simulations to generate tables having the same

marginal totals as the one under consideration. The proportion of simulated tables in which

the value produced by the real data is reached or exceeded yields an estimate of the significance of the

departure of the observed data from the expectation under the null hypothesis. In this

way an empirical p value is produced.

This method obviates problems related to dealing with sparse contingency tables or to correcting for testing of multiple alleles.

One thousand simulations were performed for each analysis.
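
The Monte Carlo procedure can be sketched as follows for the raw-table statistic. Note the assumption: the simulated tables here are produced by shuffling allele labels across individuals, which keeps both sets of marginal totals fixed, but CLUMP's own simulation routine may generate its tables differently. All function names and the example counts are hypothetical.

```python
import random


def chi_squared(table):
    """Pearson chi-squared statistic of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def empirical_p(table, n_sims=1000, seed=0):
    """Proportion of simulated tables whose statistic reaches the observed value."""
    rng = random.Random(seed)
    # Expand the table into one (group, allele) pair per observation.
    groups, alleles = [], []
    for g, row in enumerate(table):
        for a, count in enumerate(row):
            groups.extend([g] * count)
            alleles.extend([a] * count)
    observed = chi_squared(table)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(alleles)  # permuting labels preserves row and column totals
        simulated = [[0] * len(table[0]) for _ in table]
        for g, a in zip(groups, alleles):
            simulated[g][a] += 1
        if chi_squared(simulated) >= observed:
            hits += 1
    return hits / n_sims


# Hypothetical counts: rows are cases and controls, columns are three alleles.
example_table = [[25, 10, 5], [15, 15, 10]]
print(empirical_p(example_table, n_sims=1000))
```

The same loop would apply to the other statistics by substituting the corresponding function for `chi_squared`.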

2.4.4 Pair-wise linkage disequilibrium calculations using EH

Pair-wise linkage disequilibrium was calculated between each pair of marker loci

using the computer program EH (Terwilliger & Ott, 1994). Maximum likelihood

values for the haplotype frequencies for each pairwise combination of alleles are

estimated under H0 (no allelic association) and H1 (allelic association). A χ² statistic

is produced, which is the difference in 2 ln(likelihood) and therefore pairwise p-

In document Molecular genetics of the 8p21-22 schizophrenia susceptibility locus (Page 93-97)