Concurrent developments in association analysis for common complex disease

4. Discussion

4.2.4. Concurrent developments in association analysis for common complex disease

4.2.4.1. Extent of LD

Recent estimations of the extent to which detectable LD extends away from a common

disease susceptibility locus vary. A population simulation of a common disease variant

arising prior to the expansion of the human population suggested that detectable LD

would extend only 3kb from the variant (Kruglyak, 1999). On the other hand, a genetic

epidemiological study comparing the results from several common disease association

studies suggested that detectable LD extends beyond 100kb from common disease loci,

possibly because of subsequent bottlenecks following population expansions (Collins et al., 1999). In the present candidate gene association studies, the denser estimate of required marker density was used to thoroughly test each gene for association to IBD.

In each study, inter-marker LD extended across the entire region (Figures 3.26. - 3.27.),

with the largest region being approximately 60kb. These results support the latter theory

of a larger extent of LD. Similarly, other recent studies have also observed larger than

in a European American population (McMahon et al., 2000). The extent of LD varied, with the largest region of detectable LD being approximately 450kb.

The existence of large regions of inter-marker LD does not, however, suggest that

positive association can be detected with a marker set of lower density. One study

showed that even within large regions of inter-marker LD, the extent of LD required to

detect positive association declined at a rate such that beyond 10kb from the disease

variant, positive association would not be detected (Durocher et al., 2000). Instead, studies have shown that LD between polymorphisms varies and is not solely a function of

distance. Two polymorphisms separated by a few bases can be almost in equilibrium

while two other polymorphisms separated by 1cM can be in strong LD (Mohlke et al.,

2000; Pakstis et al., 2000). Therefore, it would seem that a better association analysis strategy would be to genotype all polymorphisms showing incomplete disequilibrium

within a candidate gene region rather than just one polymorphism every n kb. In agreement, Risch et al. (2000) have shown that negative association does not exclude a

significant gene effect in the region and that the genome-wide random polymorphism

approach, even at high density, would miss many disease-causing genes.
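
The point that LD is not solely a function of distance can be illustrated with the standard pairwise measures D, |D'| and r^2, which depend only on haplotype and allele frequencies and not on physical position. The following is a minimal Python sketch and is not part of the analyses reported here; the function name ld_stats and the example frequencies are assumptions chosen purely for illustration.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    if not (max(0.0, p_a + p_b - 1.0) <= p_ab <= min(p_a, p_b)):
        raise ValueError("haplotype frequency inconsistent with allele frequencies")

    # D is the deviation of the observed haplotype frequency from the
    # frequency expected under linkage equilibrium.
    d = p_ab - p_a * p_b

    # The largest |D| attainable depends on the sign of D and on the
    # allele frequencies at both loci.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))

    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


# Hypothetical marker pairs: one in complete LD, one in equilibrium.
complete = ld_stats(0.5, 0.5, 0.5)
equilibrium = ld_stats(0.25, 0.5, 0.5)
```

Nothing in the calculation refers to the distance between the two loci, which is consistent with the observation that closely spaced polymorphisms can be near equilibrium while distant ones remain in strong LD.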

4.2.4.2. Population choice

Previously, it was thought that isolated and founder-effect populations were the best

populations to choose for genetic disease association studies. It was believed that large

regions of LD were recently created by novel disease mutation within the isolated

populations and bottlenecks experienced by the founder-effect populations. However, it

has recently been shown that common disease variants probably existed before the

Models of these populations show that the marker-variant LD within these populations

would be the same as within the general population (Kruglyak, 1999).

It has now been suggested that European American populations, or populations that are of constant size,

are inbred, or have experienced recent admixture, may be better suited for common

disease association studies (Wright et al., 1999). This is for a number of reasons. Firstly, LD has repeatedly been shown to increase from Africa and India through

Europe and East Asia to America due to the migration out of Africa and the

accumulation of random genetic drift at the front of the expansion (Jorde et al., 2000; Pakstis et al., 2000). Secondly, populations that have recently undergone admixture after long periods of separation (e.g. African American, Mexican American, etc.) may still have

sufficient LD remaining for detecting common disease association. Thirdly,

populations that have been of a constant size rather than expanding, like the Saami in

northern Fenno-Scandinavia, show a greater extent of LD (Laan and Paabo, 1997).

Finally, inbred populations show extended regions of LD because of increased

homozygosity. Overall, because common disease variants probably arose prior to the

expansion of the human population, all populations have had the same amount of time

to decrease LD. Yet, because of events within certain populations that either decreased

the breakdown of LD or increased LD, certain populations may require fewer markers

to detect the same common disease association.
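
The role of time in the breakdown of LD can be made concrete with the standard deterministic expectation D_t = D_0 (1 - theta)^t, where theta is the recombination fraction between marker and variant and t is the number of generations. The sketch below is illustrative only: it ignores drift, selection, admixture and changes in population size, which are the events discussed above, and the values used (theta = 0.001, and 100 versus 5000 generations) are assumptions rather than estimates from this study.

```python
def ld_fraction_remaining(theta, generations):
    """Fraction of the initial disequilibrium D_0 expected to remain
    after a number of generations, under recombination alone.

    theta       : recombination fraction per generation (0 to 0.5)
    generations : number of generations elapsed (non-negative)
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - theta) ** generations


# Assumed values: a recent event (100 generations) versus an old
# variant (5000 generations), both at theta = 0.001.
recent = ld_fraction_remaining(0.001, 100)
ancient = ld_fraction_remaining(0.001, 5000)
```

Under these assumed values roughly 90% of the initial disequilibrium remains after 100 generations, but less than 1% after 5000, which is why events that regenerate LD or slow its decay matter more than the age of the variant itself.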

Still, it should be noted that the gains made by finding such a disease population might

easily be removed by lack of power due to the small size of such populations. Thus,

studies using large random European American populations may ultimately prove to be

In the present study it was a small European Caucasian population that resulted in

positive association to CCR5 (Section 4.2.1.1.). If instead, as commented throughout

this chapter, a population of greater power had been used for linkage and association

analysis, results of higher significance and greater certainty may have been detected.
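
The dependence of power on sample size can be sketched with a normal approximation to a two-sided test of the difference in allele frequency between cases and controls. This is a simplified illustration and not the method used for the analyses in this thesis; the function name allelic_test_power, the allele frequencies and the sample sizes are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(p_case, p_control, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation).

    p_case, p_control   : assumed true allele frequencies in each group
    n_cases, n_controls : number of individuals (each contributes 2 alleles)
    alpha               : two-sided significance level
    """
    # Standard error of the difference in allele frequencies, based on
    # 2N chromosomes per group.
    se = sqrt(p_case * (1.0 - p_case) / (2 * n_cases)
              + p_control * (1.0 - p_control) / (2 * n_controls))
    if se == 0.0:
        raise ValueError("at least one group must be polymorphic")

    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    z_effect = abs(p_case - p_control) / se

    # Probability that the test statistic falls in either rejection region.
    return nd.cdf(z_effect - z_crit) + nd.cdf(-z_effect - z_crit)


# Hypothetical comparison of a small and a large sample for the same
# assumed frequency difference (0.15 in cases, 0.10 in controls).
small = allelic_test_power(0.15, 0.10, 100, 100)
large = allelic_test_power(0.15, 0.10, 1000, 1000)
```

For a fixed difference in allele frequency the standard error shrinks with the square root of the sample size, so the larger sample always has the greater power under this approximation.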

4.2.4.3. Biallelic marker allele frequency choice

It has recently been shown that for association analysis it is optimal for the allelic

frequencies of the biallelic polymorphisms chosen for genotyping to match the disease

variant frequency (Muller-Myhsok and Abel, 1997). However, because for a common

disease it is not known how many disease loci are present and how frequent the

respective variants are, it is impossible to know what frequency biallelic polymorphisms

to use for association analysis. Instead, it has been suggested that biallelic markers with

a range of allele frequencies from 0.25-0.75 should be used for analysis within a disease

region or across a candidate gene (Kruglyak and Lander, 1996). This would allow for

the frequent and most common variants to be detected with two-point association

analysis, while the less frequent variants could still be detected by haplotype association

analysis.
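
One way to see why matching marker and variant frequencies matters is that the largest r^2 attainable between two biallelic loci is bounded by their allele frequencies, even when |D'| = 1. The sketch below computes that bound; it is an illustration and not a reconstruction of the cited method. The function name max_r2 and the example frequencies are assumptions, and the marker allele is taken to be the one in coupling with the disease variant.

```python
def max_r2(p_marker, p_variant):
    """Upper bound on r^2 between a marker allele and a disease variant
    in coupling, given only their allele frequencies (i.e. at |D'| = 1).
    """
    if not (0.0 < p_marker < 1.0 and 0.0 < p_variant < 1.0):
        raise ValueError("both loci must be polymorphic")

    # Largest positive D permitted by the two allele frequencies.
    d_max = min(p_marker * (1.0 - p_variant), (1.0 - p_marker) * p_variant)
    return d_max ** 2 / (p_marker * (1.0 - p_marker)
                         * p_variant * (1.0 - p_variant))


# Assumed frequencies: a matched pair and a mismatched pair.
matched = max_r2(0.30, 0.30)      # frequencies equal
mismatched = max_r2(0.50, 0.10)   # common marker, rarer variant
```

With matched frequencies the bound is 1, whereas a marker at frequency 0.50 can reach an r^2 of at most about 0.11 with a variant at frequency 0.10, so a single mismatched marker loses much of its two-point signal even in complete LD.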

In the candidate gene association studies reported here across GNA12, CCR2, CCR5,

and CCRL2, allele frequencies between 0.10-0.90 were used. Both two-point

association analysis and haplotype association analysis were conducted. Thus, if the

associations detected with both CCR2 and CCR5 are real, the CCR5 variant with

stronger two-point association may be a more common disease variant and the CCR2

variant with stronger haplotype association may be a less common disease variant.

In document Investigation of putative susceptibility regions to inflammatory bowel disease (Page 197-200)