4. Discussion
4.2.4. Concurrent developments in association analysis for common complex disease
4.2.4.1. Extent of LD
Recent estimations of the extent to which detectable LD extends away from a common
disease susceptibility locus vary. A population simulation of a common disease variant
arising prior to the expansion of the human population suggested that detectable LD
would extend only 3kb from the variant (Kruglyak, 1999). On the other hand, a genetic
epidemiological study comparing the results from several common disease association
studies suggested that detectable LD extends beyond 100kb from common disease loci,
possibly because of subsequent bottlenecks following population expansions (Collins et
al., 1999).
In the present candidate gene association studies, the denser of the two estimates of
required marker density was used to test each gene thoroughly for association to IBD.
In each study, inter-marker LD extended across the entire region (Figures 3.26. - 3.27.),
with the largest region being approximately 60kb. These results support the latter theory
of a larger extent of LD. Similarly, other recent studies have also observed larger than
in a European American population (McMahon et al., 2000). The extent of LD varied, with
the largest region of detectable LD being approximately 450kb.
The existence of large regions of inter-marker LD does not, however, suggest that
positive association can be detected with a marker set of lower density. One study
showed that even within large regions of inter-marker LD, the extent of LD required to
detect positive association declined at a rate such that beyond 10kb from the disease
variant, positive association would not be detected (Durocher et al., 2000). Instead,
studies have shown that LD between polymorphisms varies and is not solely a function of
distance. Two polymorphisms separated by a few bases can be in near equilibrium
while two other polymorphisms separated by 1cM can be in strong LD (Mohlke et al.,
2000; Pakstis et al., 2000). Therefore, it would seem that a better association analysis
strategy would be to genotype all polymorphisms showing incomplete disequilibrium
within a candidate gene region rather than just one polymorphism every n kb. In
agreement, Risch et al. (2000) have shown that negative association does not exclude a
significant gene effect in the region and that the genome-wide random polymorphism
approach, even at high density, would miss many disease-causing genes.
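The point that LD is not solely a function of distance can be illustrated with the standard pairwise LD measures. The following is a minimal sketch, assuming the usual definitions of D, D' and r² for two biallelic markers; the function name and the haplotype frequencies are hypothetical and are not taken from the studies cited above.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (marker 1) and allele B (marker 2)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both markers must be polymorphic")
    # D is the departure of the observed haplotype frequency from independence
    d = p_ab - p_a * p_b
    # The largest |D| possible depends on the sign of D and the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical pair a few bases apart, yet close to equilibrium
near_equilibrium = ld_measures(p_ab=0.26, p_a=0.5, p_b=0.5)
# Hypothetical pair far apart, yet in strong LD
strong_ld = ld_measures(p_ab=0.45, p_a=0.5, p_b=0.5)
```

Nothing in the calculation refers to physical distance: the measures depend only on haplotype and allele frequencies, which is why a fixed marker spacing cannot guarantee that a disease variant is tagged.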
4.2.4.2. Population choice
Previously, it was thought that isolated and founder-effect populations were the best
populations to choose for genetic disease association studies. It was believed that large
regions of LD were recently created by novel disease mutation within the isolated
populations and bottlenecks experienced by the founder-effect populations. However, it
has recently been shown that common disease variants probably existed before the
Models of these populations show that the marker-variant LD within these populations
would be the same as within the general population (Kruglyak, 1999).
It has now been suggested that European American populations, and populations that are
of constant size, are inbred, or have experienced recent admixture, may be better suited
for common disease association studies (Wright et al., 1999). This is for a number of
reasons. Firstly, LD has repeatedly been shown to increase from Africa and India through
Europe and East Asia to America due to the migration out of Africa and the
accumulation of random genetic drift at the front of the expansion (Jorde et al., 2000;
Pakstis et al., 2000). Secondly, populations that have recently undergone admixture after
long periods of separation (e.g. African American and Mexican American populations) may still have
sufficient LD remaining for detecting common disease association. Thirdly,
populations that have been of a constant size rather than expanding, like the Saami in
northern Fenno-Scandinavia, show greater extent of LD (Laan and Paabo, 1997).
Finally, inbred populations show extended regions of LD because of increased
homozygosity. Overall, because common disease variants probably arose prior to the
expansion of the human population, all populations have had the same amount of time
to decrease LD. Yet, because of events within certain populations that either decreased
the breakdown of LD or increased LD, certain populations may require fewer markers
to detect the same common disease association.
Still, it should be noted that the gains made by finding such a disease population might
easily be removed by lack of power due to the small size of such populations. Thus,
studies using large random European American populations may ultimately prove to be
In the present study it was a small European Caucasian population that resulted in
positive association to CCR5 (Section 4.2.1.1.). If instead, as commented throughout
this chapter, a population of greater power had been used for linkage and association
analysis, results of higher significance and greater certainty might have been obtained.
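The dependence of power on sample size can be made concrete with a rough calculation. The sketch below assumes a simple two-sided comparison of allele frequencies between equal numbers of cases and controls under a normal approximation; it is not the test used in the present study, and the function name, allele frequencies and sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(p_cases, p_controls, n_per_group, alpha=0.05):
    """Approximate power to detect an allele frequency difference
    between n_per_group cases and n_per_group controls."""
    alleles = 2 * n_per_group  # two alleles per diploid individual
    # Standard error of the difference between the two sample frequencies
    se = sqrt(p_cases * (1 - p_cases) / alleles
              + p_controls * (1 - p_controls) / alleles)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_cases - p_controls) / se
    # Normal approximation; the small contribution of the opposite tail is ignored
    return NormalDist().cdf(z_effect - z_crit)


# Same hypothetical effect (0.30 vs 0.25) in a small and a large sample
power_small = allelic_test_power(0.30, 0.25, n_per_group=100)
power_large = allelic_test_power(0.30, 0.25, n_per_group=1000)
```

Under these assumptions the same frequency difference is detected far more reliably in the larger sample, which is the trade-off described above between extended LD in a small special population and statistical power in a large general one.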
4.2.4.3. Biallelic marker allele frequency choice
It has recently been shown that for association analysis it is optimal for the allelic
frequencies of the biallelic polymorphisms chosen for genotyping to match the disease
variant frequency (Muller-Myhsok and Abel, 1997). However, because for a common
disease it is not known how many disease loci are present and how frequent the
respective variants are, it is impossible to know what frequency biallelic polymorphisms
to use for association analysis. Instead, it has been suggested that biallelic markers with
a range of allele frequencies from 0.25 to 0.75 should be used for analysis within a disease
region or across a candidate gene (Kruglyak and Lander, 1996). This would allow for
the frequent and most common variants to be detected with two-point association
analysis, while the less frequent variants could still be detected by haplotype association
analysis.
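One way to see why matching the marker allele frequency to the disease variant frequency matters is to look at the highest r² that two biallelic loci can reach given their allele frequencies. The sketch below assumes r² as the LD measure, which is not necessarily the measure used in the work cited above; the function name and the example frequencies are hypothetical.

```python
def max_r2(p_variant, p_marker):
    """Largest r^2 attainable between a disease variant and a biallelic
    marker, given only their allele frequencies (reached when D' = 1)."""
    if not (0 < p_variant < 1 and 0 < p_marker < 1):
        raise ValueError("both loci must be polymorphic")

    def bound(p, q):
        # With D' = 1, r^2 = lo(1 - hi) / (hi(1 - lo)) for frequencies lo <= hi
        lo, hi = min(p, q), max(p, q)
        return lo * (1 - hi) / (hi * (1 - lo))

    # Marker allele labelling is arbitrary, so consider either marker allele
    return max(bound(p_variant, p_marker), bound(p_variant, 1 - p_marker))


# Hypothetical disease variant of frequency 0.10 against three markers
matched = max_r2(0.10, 0.10)
moderate = max_r2(0.10, 0.25)
common = max_r2(0.10, 0.50)
```

Under this assumption a marker can only capture a variant completely when the frequencies match, and the ceiling on r² falls as the marker becomes more common than the variant. This is consistent with using haplotypes, whose frequencies can be lower than those of single markers, to detect less frequent variants.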
In the candidate gene association studies reported here across GNA12, CCR2, CCR5,
and CCRL2, allele frequencies between 0.10 and 0.90 were used. Both two-point
association analysis and haplotype association analysis were conducted. Thus, if the
associations detected with both CCR2 and CCR5 are real, the CCR5 variant with
stronger two-point association may be a more common disease variant and the CCR2
variant with stronger haplotype association may be a less common disease variant