Sequence Variation and Haplotype Structure at the Human HFE Locus
Christopher Toomajian*,1 and Martin Kreitman*,†
*Committee on Genetics, University of Chicago, Chicago, Illinois 60637 and †Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637
Manuscript received January 2, 2002
Accepted for publication May 3, 2002
ABSTRACT
The HFE locus encodes an HLA class-I-type protein important in iron regulation and segregates replacement mutations that give rise to the most common form of genetic hemochromatosis. The high frequency of one disease-associated mutation, C282Y, and the nature of this disease have led some to suggest a selective advantage for this mutation. To investigate the context in which this mutation arose and gain a better understanding of HFE genetic variation, we surveyed nucleotide variability in 11.2 kb encompassing the HFE locus and experimentally determined haplotypes. We fully resequenced 60 chromosomes of African, Asian, or European ancestry as well as one chimpanzee, revealing 41 variable sites and a nucleotide diversity of 0.08%. This indicates that linkage to the HLA region has not substantially increased the level of HFE variation. Although several haplotypes are shared between populations, one haplotype predominates in Asia but is nearly absent elsewhere, causing higher than average genetic differentiation among the three major populations. Our samples show evidence of intragenic recombination, so the scarcity of recombination events within the C282Y allele class is consistent with selection increasing the frequency of a young allele. Otherwise, the pattern of variability in this region does not clearly indicate the action of positive selection at this or linked loci.
HFE was the first gene to be associated with hereditary hemochromatosis, a recessive disease common in many populations of European descent and characterized by iron overload (Feder et al. 1996). Recently, rare alleles in additional genes have been identified that are associated with the hemochromatosis phenotype (Roetto et al. 1999; Camaschella et al. 2000; Njajou et al. 2001; Kato et al. 2001) and lead to distinct disorders. However, the majority of hereditary hemochromatosis cases in Europe are due to changes in the HFE gene (Beutler et al. 1996; Feder et al. 1996; Carella et al. 1997). HFE plays an important role in regulating iron levels in the body (Feder et al. 1998; Salter-Cid et al. 1999). Although much progress has been made in determining the function of HFE, questions concerning its specific mechanism of iron regulation still remain (e.g., Drakesmith and Townsend 2000). The potential for iron requirements or availability to change in novel environments makes HFE a possible target of local adaptation.
A feature of HFE relevant to population genetic inference is its chromosomal context. HFE is found telomeric to the human leukocyte antigen (HLA) class-I region on chromosome 6p21 in an area referred to as the extended class-I region. The HLA has been the focus of polymorphism studies for decades, since the most polymorphic loci in the human genome are found here. Initial studies revealed that balancing selection has acted at a number of HLA genes (e.g., Hughes and Nei 1988), and its effect can be seen in the increase in variation at neighboring loci (e.g., Grimsley et al. 1998; Satta et al. 1999). However, as diversity studies have broadened to include additional loci, the observation of many peaks and valleys of nucleotide diversity suggests that simple models of diversifying selection acting on a handful of exons cannot explain the full complexity of variation in this region (e.g., Gaudieri et al. 2000). Even though the HFE gene is ~4 Mb away from the highly polymorphic HLA-A locus, the genetic distance between them is ~1 cM (Malfroy et al. 1997). In fact, the HFE gene was first localized to chromosome 6 on the basis of the association between hemochromatosis and HLA-A3 (Simon et al. 1976). Another report notes an extended HLA haplotype common in Europeans, A1-B8, which maintains associations with microsatellites distal to HFE (Worwood et al. 1997). These associations between HFE and HLA alleles raise the question of whether selection on HLA alleles has influenced the pattern and level of variability found at HFE within and between populations.
Studies of HFE variation have focused on two amino acid polymorphisms that were discovered when the gene was mapped by Feder et al. (1996). One mutation, C282Y, disrupts an intramolecular disulfide bridge and renders the protein nonfunctional (Feder et al. 1997). This mutation, found at a frequency of up to 10% in European populations (Merryweather-Clarke et al. 1997), is by far the major mutation that leads to hemochromatosis. The H63D mutation is generally found at a higher frequency in Caucasians and also appears to be associated with hemochromatosis, although its penetrance is low (Risch 1997). Several other rare hemochromatosis-associated HFE mutations have been described (Barton et al. 1999; de Villiers et al. 1999; Mura et al. 1999). However, not all cases of hemochromatosis can be explained by the known HFE mutations, leaving open the possibility of additional minor disease-associated mutations. Effort to characterize HFE variation has concentrated on Caucasian populations, and the full spectrum of variation at HFE has not been considered. Therefore, it is difficult to make inferences about the forces governing this variation.
Several groups have proposed that selection has favored the C282Y mutation, but a detailed knowledge of the linked variation around this site is necessary to independently test this hypothesis at the nucleotide level. Two lines of evidence have led to the hypothesis of selection. One is based on the function of HFE, with the selective advantage for C282Y possibly stemming from its potential to prevent iron deficiency. The second is based on the seemingly incongruous observation that C282Y appears extremely young but is a relatively high-frequency mutation in European populations. Models of the decay of linkage disequilibrium (LD) over time estimate a young age (<100 generations) for the C282Y mutation (Ajioka et al. 1997; Thomas et al. 1998). In contrast, the expected age of an allele at 5% frequency in a population with an effective size of 10⁴ is >6000 generations (Kimura and Ohta 1973). Selection for the young C282Y mutation or hitchhiking with a linked mutation may have allowed it to reach its present frequency. While experiments to test the selective advantage of C282Y on the basis of functional differences are difficult and will not necessarily shed light on the historical reason for C282Y's high frequency, the patterns of nucleotide variability and LD in and around the HFE region can provide evidence for the selective advantage of particular mutations or the pattern of hitchhiking. For example, if a rapid change in the frequencies of alleles under diversifying selection at classical HLA loci were responsible for C282Y's high frequency, one might expect that other allele classes at HFE would display a lack of variation similar to that of the C282Y class.
In this report we describe the nucleotide variation and haplotype structure in HFE for a worldwide population sample. We test whether the pattern of variability is consistent with an equilibrium neutral model of evolution. We compare the level of population variation and [...] the origin of different alleles and the forces that have acted to produce their current global distribution and frequency.

The Pan troglodytes nucleotide sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession no. AF447807.
1 Corresponding author: Committee on Genetics, University of Chicago, 1101 E. 57th St., Chicago, IL 60637. E-mail: [email protected]

MATERIALS AND METHODS
DNA samples: A total of 30 samples (60 chromosomes) were chosen to represent ancestry from African, Asian, and European peoples. Identifiers in parentheses indicate the sample numbers from the Coriell Cell Repositories' National Institute of General Medical Sciences Human Genetic Mutant Cell Repository (http://arginine.umdnj.edu). Samples without these identifiers were either collected at the University of Chicago or provided by other labs. The 10 African samples include five Mbuti Pygmies (NA10492–NA10496) from the Ituri forest in northeast Zaire and one sample each of Kikuya (NA00522), Ghanaian (NA02064A), Zulu (NA02476), !Kung (NA03043), and Luo (NA03190A) descent. The 10 Asian samples include five individuals of Chinese descent (including NA11321–NA11323), two samples of Korean descent (including NA00726), and one sample each of Filipino (NA10798), Khmer (NA11373), and Vietnamese (NA03037) descent. The 10 European samples include four samples from the previous study of Edwards et al. (1988) that come from Utah and its surrounding states and six samples collected in Chicago from mixed European ancestry. The samples of Edwards et al. (1988) are members of hemochromatosis pedigrees that we selected on the basis of an apparently normal phenotype and the lack of the C282Y HFE mutation. All other samples were chosen either from healthy volunteers or from individuals with disorders that have no known association with variation in the HFE gene. In addition to the 10 European population samples, two unrelated samples of European descent (NA14620 and NA14621) that were homozygous for the HFE C282Y mutation were selected for sequencing. The Institutional Review Board of the University of Chicago approved this project.
PCR and sequencing: The region under study consists of bases 43,385–54,657 of the human hereditary hemochromatosis region (GenBank accession no. U91328; Lauer et al. 1997); for convenience we change the base numbering to 1–11,273, respectively. A large number of primers were designed throughout this region from the reference sequence, which does not carry the C282Y or H63D mutation. The primers and conditions used for amplification and sequencing are available upon request. For 23 of the 30 samples, overlapping diploid PCR products of ~1 kb were sequenced to determine the identity of nucleotides 31–11,244 (excluding external primer sequence). PCR products were used as templates for dRhodamine terminator cycle-sequencing reactions that were subsequently cleaned and run on an ABI 377 automated sequencer (Applied Biosystems, Foster City, CA). Chromatograms were imported into Sequencher v. 3.0 (Gene Codes, Ann Arbor, MI) for manual assembly of contigs and identification of polymorphic sites. Each base in the study was called, using at least single-fold coverage sequencing reads for each strand, except for a few small regions where sequence repeats made the reads in one direction of poor quality and bases were called using information primarily from one strand. Sequence from the HFE gene was also obtained from one common chimpanzee (Pan troglodytes) from DNA provided by Dr. D. H. Ledbetter. Most PCR and sequencing primers worked with the chimpanzee sample, and where gaps remained new PCR primers were designed.
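The renumbering described under PCR and sequencing (bases 43,385–54,657 of GenBank U91328 become 1–11,273) is a fixed offset, so positions quoted in this article can be mapped back to the GenBank record with simple arithmetic. A minimal sketch follows; the function and constant names are ours and purely illustrative, not part of any tool used in the study.

```python
# Fixed-offset conversion between GenBank U91328 coordinates and the
# 1-based numbering used in this article. Names are hypothetical helpers.
REGION_START = 43_385   # first base of the studied region in U91328
REGION_END = 54_657     # last base of the studied region in U91328


def to_local(genbank_pos: int) -> int:
    """Convert a U91328 coordinate to the article's 1-based numbering."""
    if not REGION_START <= genbank_pos <= REGION_END:
        raise ValueError("position lies outside the studied region")
    return genbank_pos - REGION_START + 1


def to_genbank(local_pos: int) -> int:
    """Convert the article's 1-based numbering back to a U91328 coordinate."""
    region_length = REGION_END - REGION_START + 1
    if not 1 <= local_pos <= region_length:
        raise ValueError("position lies outside the studied region")
    return local_pos + REGION_START - 1


if __name__ == "__main__":
    # Length of the renumbered region.
    print(to_local(REGION_END))
    # GenBank coordinates of the sequenced interval 31-11,244.
    print(to_genbank(31), to_genbank(11_244))
    # Number of bases surveyed per sample.
    print(11_244 - 31 + 1)
```

The last value, 11,214, is the length of the interval 31–11,244 that was sequenced in each sample.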
[...]lis), cloned with the Topo XL PCR cloning kit (Invitrogen, Carlsbad, CA), and the whole insert was sequenced directly. This method leads to the sequencing of PCR errors and may produce hybrid sequences derived from the maternal and paternal alleles. Therefore, we confirmed each difference from the reference sequence by either sequencing or DHPLC analysis (Transgenomic, Omaha) of smaller PCR products produced from genomic DNA.
Haplotype determination: For the 4 samples from the Edwards et al. (1988) study, pedigree analysis was performed to determine haplotypes. For the 19 other samples with more than one heterozygous site, the 11-kb long-range PCR product was amplified and cloned as described above. For at least two clones per sample, we directly sequenced small regions containing the heterozygous sites. Comparison of these sequences with the corresponding diploid sequence was used to determine if any clones were hybrids of the maternal and paternal alleles. This occurred for only one sample, and the sequencing of the heterozygous sites from a third clone resolved which clone was hybrid and which were true alleles. For the 7 samples that were cloned before being sequenced, initially one clone per sample was sequenced. To find the allelic clone for each of these samples, individuals were screened for heterozygosity at intermediate frequency polymorphisms by DHPLC. All 7 samples were found to be heterozygous at one or more sites, and a second clone for complete sequencing that carried the other base at a heterozygous site was chosen. To ensure that neither of the sequenced clones were hybrids, some sites that appeared homozygous and matched the published sequence after sequencing the two clones were tested for homozygosity by DHPLC. Analysis proved homozygosity for all sites tested in this way and demonstrated no further hybrid clones.
Data analysis: The program DnaSP, ver. 3.53 (Rozas and Rozas 1999) was used to estimate parameters and perform statistical analyses unless otherwise noted. Length polymorphisms in repetitive DNA (most frequently mono- or dinucleotide repeats) could not always be determined with certainty, are not reported here, and were ignored for these analyses. Also, length differences in mono- or dinucleotide runs between human and chimpanzee were not scored, and fixed differences that interrupt these repeat stretches have also been excluded. Only single-nucleotide polymorphisms have been included in summaries of nucleotide variability, and one singleton insertion/deletion (indel) polymorphism found in complex sequence (site 9681) has been excluded. For the computation of diversity estimates, the number of observed mutations is used in place of the number of segregating sites. The confidence interval for θ was calculated as described in Kreitman and Hudson (1991). Significance for Tajima's D (1989), Fu and Li's D (1993), and Fay and Wu's H (2000) tests was assessed by comparison to the output of neutral coalescent simulations of 10³ random samples with identical sample size and polymorphism level as the observed data, assuming constant population size and no recombination (which makes the tests conservative). The H test was performed using a program provided by J. Fay. The program K-Estimator v5.5 (Comeron 1999) was used to test the difference between human-chimp divergence levels with 10⁴ Monte Carlo simulation replicates. Noncoding regions conserved between human and mouse (see RESULTS) were compared to simulation results of random divergence between two sequences of the same length and G + C content as the HFE noncoding sequence (J. Comeron and M. Kreitman, unpublished results). Significance was assessed by measuring the longest region with at least 75% identity in each of 10³ replicates. Hudson-Kreitman-Aguadé (HKA) tests were performed via coalescent simulation using the program of J. Hey (http:// [...]
The mutational relationships among the experimentally determined haplotypes were visualized by using the reduced median (RM) algorithm of the program Network 2.0 (http://www.fluxus-engineering.com; Bandelt et al. 1995). This algorithm is designed for use with nonrecombining DNA types, but it provides a convenient means of displaying haplotype relationships, especially for very similar haplotypes, when recombination is not very prevalent in a region. In constructing the network, the algorithm links haplotypes that differ at only one site and then assumes that mutational events proceed from a more frequent haplotype to a less frequent one to choose between equally parsimonious mutational paths linking more distantly related haplotypes. In cases where this assumption does not resolve alternate paths, the uncertainty is shown as a reticulation, or loop, in the network.
The program Arlequin 2.000 (Schneider et al. 2000) was used to perform an AMOVA analysis of the genetic differentiation among population samples. C_HRM was estimated from coalescent simulations of a constant size population conditioned on the number of segregating sites with J. Wall's program hrmpg2, available on the Hudson lab homepage (http://home.uchicago.edu/~rhudson1/source/JWallCode.html). A total of 10⁵ replicates are run for each tested value of C. Initially, 31 values of C were tested (0, 1, 2, . . . , 30) and then an additional 11–21 values were tested around the above maximum-likelihood estimate [e.g., 4.0, 4.1, . . . , 6.0 for an initial maximum-likelihood estimate (MLE) of 5]. C_L (Hudson 2001) was estimated under a model without gene conversion with a program provided by R. Hudson. In assessing LD, we have assumed that at site 7633 a C to G mutation preceded a G to A mutation, while for site 3877 (also triallelic) the order of mutations is not clear, and we have excluded the site in populations segregating all three alleles. LD was calculated as D′ (Lewontin 1964) and significance was assessed by Fisher's exact test.

RESULTS
HFE diversity at the nucleotide level: A total of 41 variable sites were identified in the 11,214-bp region of the 60 chromosomes surveyed: 38 diallelic single-nucleotide polymorphisms (SNPs), two triallelic SNPs, and one diallelic single-base indel polymorphism. The two SNPs found triallelic in the pooled sample indicate that at least two mutations have occurred at these sites in the history of this sample. Of the 41 polymorphic sites, 8 segregate singletons, with the more frequent allele matching the chimpanzee sequence in each case. Only two SNPs were in exons: SNP 6724, which causes the H63D amino acid polymorphism (Feder et al. 1996), and SNP 3470, which is a synonymous change. The bottom of Figure 1 indicates the location of all polymorphisms relative to the intron-exon structure of HFE.
Summary statistics describing the sequence diversity in the pooled and individual populations are presented in Table 1. Overall, average per-nucleotide expected heterozygosity, θw, for the total sample, estimated from the observed number of mutations (Watterson 1975), was 0.080% [0.040–0.148%, 95% confidence interval (C.I.)]. Nucleotide diversity (π), an estimate of θ based on the average pairwise sequence difference (Nei 1987), [...] sequence is excluded, π increases by ~8% to 0.091%. These estimates of θ assume an infinite-sites model, which is clearly violated since the data contain two SNPs that are triallelic. However, estimates of θ derived from finite-sites models lead to only negligible differences from the infinite-sites estimates for our data (Tajima 1996). Even when we allow the mutation rate to vary extensively among sites (with mutation rates following a γ distribution with parameter α = 0.1), the finite-sites estimates of θ are not substantially changed (increases from 0.084 to 0.085%). Our estimate of θ for HFE is slightly lower than the average for fourfold degenerate coding sites in humans (0.11%; Li and Sadler 1991), but conforms to this value and the average estimated for similar gene regions in humans (0.081%; Przeworski et al. 2000), for the whole genome (0.075%; International SNP Map Working Group 2001), and for chromosome 6 in particular (0.074%; International SNP Map Working Group 2001) on the basis of our confidence intervals.
Diversity in continental populations: When the coding region is excluded, π increases by nearly the same percentage (~8%) for each population. As is commonly observed, the Africans have the highest nucleotide diversity and the largest number of population-specific SNPs, 11. However, only 2 of these 11 are singletons, so [...] nucleotide diversity for Europeans is only slightly lower than that for Africans, but they have many fewer population-specific SNPs than the Africans (4 vs. 11, Table 1). European HFE variability is not strictly a subset of that found in Africa, consistent with the previous finding of HFE polymorphisms with an apparent European origin (Fairbanks 2000). The Asian samples have the lowest nucleotide diversity, which is still 65% of the African value, indicating that all populations studied have substantial HFE variability. Asians also have relatively few population-specific polymorphisms (4, Table 1), but in this case, they are all singletons, contributing to a higher θw relative to π.
Allele frequency spectrum: Test statistics that utilize the frequency spectrum of alleles within a locus may detect departures from an equilibrium neutral model caused by demographic forces such as population growth, contraction, and subdivision or by the effect of diversifying or directional selection on linked sites. Tajima's (1989) D statistic compares the two estimates of θ described above, while Fu and Li's (1993) D statistic compares the number of singleton polymorphic sites with the estimate of θ based on segregating sites and incorporates outgroup information to infer derived alleles. Additionally, Fay and Wu's (2000) H statistic can detect departures in the frequency spectrum due to recent hitchhiking at

Figure 1.—Observed HFE haplotypes. Bases identical to the chimpanzee haplotype are marked with a dot. Haplotypes are numbered 1–19 to indicate relative frequency from highest to lowest and ordered so that closely related haplotypes are near each other. Population counts are shown at the right. Polymorphisms are grouped by different regions of the HFE locus, as indicated in the gene diagram at the bottom. The arrow marks the direction of transcription and its start site (the 5′ of the gene is at the right); solid regions are either flanking sequence or introns, striped regions are coding exons, and checked regions are the 3′ and 5′ UTRs. The locations of polymorphisms are shown below the gene diagram, with the location of C282Y (4762 in our notation; not observed in our population sample) indicated by a star.
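The Watterson (1975) estimate of θ and Tajima's (1989) D referred to in this section can be sketched in a few lines. The code below is an illustrative implementation of the standard published formulas, not the DnaSP code used for the analyses; the function names are ours. Taking 42 SNP mutations for the pooled sample (38 diallelic sites plus two mutations at each of the two triallelic sites, with the singleton indel excluded) is our reading of the counts given in RESULTS, and the inputs to the Tajima's D call are hypothetical values chosen only to show the interface.

```python
import math


def harmonic_sums(n: int) -> tuple[float, float]:
    """Return a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1 .. n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2


def watterson_theta(n: int, mutations: int, length: int) -> float:
    """Per-site Watterson estimate of theta from the number of mutations."""
    a1, _ = harmonic_sums(n)
    return mutations / a1 / length


def tajimas_d(n: int, seg_sites: int, mean_pairwise_diff: float) -> float:
    """Tajima's D from per-locus S and the mean number of pairwise differences."""
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Estimated variance of (pi - S/a1) under the neutral model.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_pairwise_diff - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Pooled sample: 60 chromosomes, 42 SNP mutations (assumed count), 11,214 bp.
    theta_w = watterson_theta(n=60, mutations=42, length=11_214)
    print(f"theta_w = {100 * theta_w:.3f}%")
    # Hypothetical inputs, only to demonstrate the call.
    d_value = tajimas_d(n=10, seg_sites=16, mean_pairwise_diff=3.888889)
    print(f"D = {d_value:.3f}")
```

With the pooled-sample counts assumed above, the first line of output is 0.080%, in agreement with the value of θw reported in RESULTS.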
[...] significant at the 5% level, so the equilibrium model of neutral evolution cannot be rejected. However, it should be noted that the small size of the individual populations (n = 20) limits the power of these tests (Simonsen et al. 1995). The pattern observed for the different populations is informative, though, since negative values of Tajima's and Fu and Li's test statistics indicate an excess of low-frequency polymorphisms, which is expected under a model of human population growth from its ancestral size. Only the Asian population shows a slight excess of low-frequency polymorphisms. Among their variable sites, 17 have a minor allele frequency of 10% or less in Asians, with a comparison to chimpanzee sequence implying the rare alleles are derived. However, 8 of the remaining 10 derived SNP alleles have a frequency in Asians of >50%. It is possible that a selective sweep or hitchhiking has boosted these 8 SNP alleles to high frequency. This pattern in Asians may be consistent with drift and a stronger founder effect than is seen in the other populations. For the African, European, and pooled populations, these test statistics show no clear sign of population expansion when all observed SNP alleles are assumed neutral.
Divergence from chimpanzee and haplotype variation: The SNP alleles in the total sample of 60 chromosomes were found to occur in 18 distinct haplotypes (19 if the singleton indel is included). These haplotypes are displayed in Figure 1 along with a haplotype composed of the ancestral state of each allele inferred from chimpanzee (P. troglodytes). The chimpanzee sequence for the complete region revealed 71 fixed differences from human, including 69 SNPs, a 2-bp indel, and a complex mutation involving a base change and a single-base indel at a neighboring site. Only one fixed difference was found in the coding region, and this was a synonymous change. The average number of nucleotide substitutions per site between human and chimpanzee is 0.690% (0.750% for noncoding sequence). A recent analysis of divergence levels between humans and chimpanzee reports an average distance of 1.03 ± 0.04% for introns, from 32 loci, with a combined length >41 kb once repetitive regions were removed (Chen and Li 2001). The same calculation for nonrepetitive sequence in HFE introns (4032 bp) is 0.623%, which is significantly lower than this average under a model where the mutation rate is constant among sites (P = 0.006). However, intron divergence values in 4 of the 32 loci from the Chen and Li study (including one region of 9556 bp) are lower than that for HFE, suggesting that the low divergence observed for HFE introns is not exceptionally low relative to the actual distribution of intron divergence values.
The low observed divergence might result from a high average degree of constraint for the whole HFE region, which is composed primarily of introns, untranslated regions (UTRs), and intergenic sequence. To address [...] homologous sequence in mouse. Sánchez et al. (1998) reported conservation between human and rodent in the HFE promoter region, and we estimated conservation with the published mouse HFE sequence (GenBank AF007558) across our entire 11-kb region, using available global alignment and visualization tools (Mayor et al. 2000). Because the extent of divergence for noncoding sequence between human and mouse prevents the creation of a definitive global alignment, one cannot calculate a net divergence in the same way one would for the human/chimp comparison. Aside from the exons, two conspicuously conserved regions were found in noncoding sequence, one 74.6% identical over 114 bp in the 3′ UTR and one 74.5% identical over 188 bp in intron 5. These regions are significantly longer (P < 0.001) than what is expected for unconstrained regions of equal or higher identity due only to common ancestry and given the observed human/mouse K_S for HFE (0.65). They provide strong evidence that some functional constraint exists outside the coding region that could contribute to the low divergence observed between human and chimp. The 3′ UTR conserved region contains two fixed differences between human and chimp while neither region contains polymorphisms in humans, although divergence with chimp and polymorphism within human throughout the whole region are too low to provide evidence for or against functional constraint of specific subregions.
In addition to functional constraint, a low neutral mutation rate could contribute to the low divergence. Both mutation rates and sequence divergence are influenced by G + C content (Wolfe et al. 1989), but HFE has an intermediate G + C content (45.1%) so that low divergence caused by a low mutation rate is not expected. If the low divergence did reflect a relatively low neutral mutation rate, then we might expect to see a level of polymorphism lower than the average level observed. HKA tests (Hudson et al. 1987) comparing the noncoding portion of the HFE region with the 10 "locus pairs" of Frisse et al. (2001) for each population

TABLE 2
Results of HKA tests

              HFE noncoding region
Population    n^a     S^b     D^c      P(HKA)^d
Africa        20      29      76.6     0.89
Asia          20      27      76.6     0.36
Europe        20      26      75.5     0.46

^a Number of chromosomes.
^b Number of segregating sites in humans.
^c Average pairwise divergence from chimpanzee.
^d Probability of the HKA test using the pooled data from 10 noncoding regions (Frisse et al. 2001) as the reference locus.
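As a rough cross-check of the intron divergence comparison reported in this section (0.623% over 4032 bp of HFE introns against an average of 1.03% from Chen and Li 2001), one can ask how often a region of this length would show so few differences if every site diverged independently at the average rate. The sketch below uses an exact binomial lower tail, a simplification that we substitute here for the Monte Carlo procedure of K-Estimator described in MATERIALS AND METHODS. The count of 25 differences is our rounding of 0.623% × 4032, and the function name is illustrative.

```python
import math


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed in log space so that large binomial
    coefficients do not overflow.
    """
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_coeff = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        )
        total += math.exp(log_coeff + i * log_p + (n - i) * log_q)
    return total


if __name__ == "__main__":
    intron_sites = 4032                        # nonrepetitive HFE intron sequence (bp)
    observed = round(0.00623 * intron_sites)   # about 25 differences (our rounding)
    average_rate = 0.0103                      # mean intron divergence, Chen and Li (2001)
    p_value = binomial_lower_tail(observed, intron_sites, average_rate)
    print(f"observed differences: {observed}")
    print(f"P(X <= {observed}) = {p_value:.4f}")
```

Under these assumptions the tail probability falls well below 0.05, in the same direction as the reported P = 0.006. The two values need not agree closely, since the binomial model ignores multiple hits and the uncertainty in the 1.03% average.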
[...] and thought to evolve neutrally, providing a suitable reference for the HKA test. Although HFE has a polymorphism to divergence ratio higher than that of the pooled locus pair data, particularly in Asians and Europeans, at least 1 locus pair has a ratio higher than that of HFE in each population. None of the HKA tests are significant, so we cannot conclude that the ratio of polymorphism to divergence is significantly higher for HFE.
For every SNP in humans, one of the alleles was present in the homologous position in chimpanzee. In all but three cases, the more common SNP allele in the pooled sample corresponds to the inferred ancestral allele. For these exceptions (SNP alleles 11204C, 519A, and 7451A), the derived allele frequencies are 58, 67, and 72%, respectively. Inference of ancestral state based on only one outgroup can be incorrect, but at least 3 derived neutral alleles out of 42 are expected to have >50% frequency in a sample of this size. No haplotypes in humans have the same configuration at the 41 polymorphic sites as has the chimpanzee; that closest to this configuration is haplotype 9, found exclusively in Africans, which differs at 3 polymorphic sites and has an additional 71 fixed differences from the chimpanzee sequence.
Because the C282Y hemochromatosis mutation was not observed in the random population samples, we sequenced two Caucasians known to be C282Y homozygotes and observed that this mutation occurs on the haplotype 3 background. Haplotype 3 is found in all three continental populations and is the second most common haplotype in Europeans. No additional polymorphisms were discovered by sequencing the complete 11,214 bp of these two homozygotes (four chromosomes), consistent with the previous conclusion (e.g., Ajioka et al. 1997) that samples carrying C282Y have a relatively recent common ancestry.
Number of haplotypes: Fu's (1997) Fs statistic compares the observed number of haplotypes in a sample to the number expected assuming an infinite-sites model of mutation under neutrality and no recombination and is useful for detecting population growth or hitchhiking. In each population as well as the pooled sample, no excess haplotype variation is observed at HFE, as seen by nonsignificant Fs values for all populations (see Table 1). In fact, in each case Fs is positive and therefore indicates a deficit of haplotypes given the observed level of nucleotide diversity. This finding is surprising, as both recurrent mutation and recombination have likely affected the samples, and both are expected to produce additional haplotypes. The deficit of haplotypes in this case suggests that either recombination and recurrent mutation have not increased the haplotype diversity of these samples greatly or other forces have kept the number of observed haplotypes low. However, Strobeck's statistic (1987), which tests for the opposite pattern of observing too few haplotypes, is also not significant for popula[...]lation expansion is apparent based on haplotype diversity in each population.
Haplotype network: Figure 2 displays an RM network of haplotypes constructed from the pooled samples. Most haplotypes are unique to one particular population, since the relatively small sample size of each population reduces the chance of including rare haplotypes in each population sample. But other features, such as a branch that leads to four haplotypes found exclusively in Africans and representing 40% of all African samples, suggest the population differentiation seen in this sample may be real. The branch leading to the chimpanzee haplotype (Figure 2, arrow) contains the fixed differences as well as site 11,204, which is polymorphic in humans. The presence of recombinant haplotypes complicates the inference of haplotype relationships and results in mutations that have occurred only once in the history of a sample to be displayed multiple times in the network. Of the 43 mutations inferred from the human sample (including the indel), 10 are found twice on the network. Excluding the loop, 5 mutations show up twice on the network, indicating either recurrent mutations or recombination events. The chance of recurrent mutation is appreciable, since 144 CpG sites are found in the region studied and two nucleotide sites segregate three alleles each. In fact, CpG sites have a mutation rate that is estimated from our data to be >15 times higher than that of other nucleotide sites, similar to results for the LPL gene (Templeton et al. 2000). A similarly high rate of mutation at these sites is apparent from the divergence data with chimp. Both triallelic SNPs are found in CpG sites, with 1 of the 2 inferred mutations at each site consistent with a transition from 5-methylcytosine to thymine. Of the diallelic sites that occur more than once in the network, 3 out of 10 are CpG sites but none are consistent with this type of transition. This makes recombination a more likely cause of the observed homoplasy. By pruning certain haplotypes from the network we can reduce the number of homoplasic sites and find evidence for which haplotypes are recombinant. Potential recombinant haplotypes include 17 and 19, both occurring only once and generating recurrent mutations. Haplotypes 1, 12, and 18 may all stem from one or more recombination events, as their exclusion will remove most instances of recurrent mutation. Potentially ancient recombination events that created haplotypes found at high frequency today make it difficult to produce a more accurate genealogy of HFE alleles. But the network facilitates the design of strategies to infer HFE haplotypes from samples by typing only a few polymorphisms.

Figure 2.—RM network of HFE haplotypes. Mutational relationships are indicated by lines linking the 19 unique haplotypes, represented in the network as circles. The size of each circle is proportional to the relative frequency of the haplotype in the total sample. Circle fill patterns show in which population(s) each haplotype was observed, with pie diagrams used for haplotypes found in multiple populations. Mutational differences between haplotypes are indicated on the branches of the network (amino acid replacement mutations are blocked, and mutations found on the network more than once are underlined). The arrow points to the node where the chimpanzee haplotype connects to this network.

Population subdivision: To investigate population differentiation at HFE, we describe the unequal distribution of variation among populations. In the sample, 19 (44%) derived SNP alleles were found in all three populations, while 11 were restricted to African populations (a total of 20 or 47% restricted to one population). Also, European and Asian populations shared 4 SNP
Recombination and linkage disequilibrium: Since we have unambiguously determined the haplotypes at HFE,
alleles that were not found in Africa, while Africans did we are more confident in making conclusions about the evidence for recombination and LD in the region. When not share any SNP alleles with only Europeans or Asians.
This supports the shared ancestry of the Asian and Euro- four gametic combinations are found in a sample of two-site haplotypes, this is an indication of recombination pean samples after their split from African populations,
which is also apparent since Europeans and Africans (crossing over between the two sites or gene conversion) or repeated mutation. Even with moderate mutation share 1 haplotype in common and Europeans and
Asians share 2. When alleles are grouped together as rate heterogeneity, one can assume the probability of repeated mutation at any one site is very small since this haplotypes, each haplotype is expected to have a more
limited distribution. We see 14 haplotypes (74%) probability is no greater than that of a single mutation at the same site. A moderate level of recombination is unique to single populations (7 in Africa, 3 in Asia, and
4 in Europe) while all three populations share only 2 consistent with the results of this four-gamete test, in which 25/528 site pairs (excluding singletons) have all haplotypes.
Wright’s (1931)FSTstatistic serves to quantify popu- four gametes found in the pooled sample (Table 4). A minimum number of recombination events (RM) can lation differentiation by expressing the genetic variance
among populations divided by the genetic variance of be inferred from the data to explain all instances of four gametes (Hudson andKaplan1985). The Asian the total population. Using an AMOVA analysis based
on polymorphic sites and in which our samples were and European populations show a number of site pairs with four gametes and, therefore, nonzeroRM’s (Table split into three continental groups (Weir 1996), we
estimate anFSTvalue of 0.23 (P⬍0.00001, by haplotype 1). Although no site pairs have four gametes in the African sample (RM⫽0), recombination may still have permutation). This value exceeds the average calculated
for many other genes (Cavalli-Sforzaet al. 1994). The occurred. In the pooled sample, one inferred crossing-over event is between sites⬍1 kb apart (6567 and 7451). FST estimate based on treating the region as a single
locus with many alleles corresponding to each different haplotype is 0.17 (P⬍0.00001). For both calculations,
TABLE 3 continental populations are found to be significantly
differentiated at HFE. Table 3 shows two estimates of Pairwise estimates of population subdivision (FST) pairwise populationFSTvalues. The comparison of
Afri-Africa Asia Europe
can and Asian samples produces the highest values. The Asian samples are distinguished by the extremely high
Africa — 0.352a 0.143
frequency (13/20) of haplotype 1, which is absent from Asia 0.237b — 0.151
Africans and at low frequency in Europeans. The African Europe 0.060 0.229 — samples tend to carry haplotypes unique to Africa (15/
aAbove-diagonal entries are generated from sequence data
20), and some of these haplotypes are not closely related
as described in Hudsonet al. (1992).
to European and Asian haplotypes (see Figure 2). Both bBelow-diagonal entries are generated by Arlequin AMOVA
of these patterns contribute to the higher-than-average analysis on the basis of haplotype frequencies (Weir 1996; Schneideret al. 2000).
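
As a rough illustration of how a sequence-based pairwise FST of the kind reported above the diagonal in Table 3 can be obtained, the sketch below uses one widely used estimator, FST = 1 − Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb is the mean number between populations. Whether this matches the exact computation behind Table 3 is an assumption; the function names and the toy alignments are hypothetical and are not the HFE data.

```python
from itertools import combinations, product


def n_differences(seq_a, seq_b):
    """Count sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mean_within(pop):
    """Mean pairwise differences among sequences sampled from one population."""
    pairs = list(combinations(pop, 2))
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop_1, pop_2):
    """Mean pairwise differences between sequences from two populations."""
    pairs = list(product(pop_1, pop_2))
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def pairwise_fst(pop_1, pop_2):
    """F_ST = 1 - H_w / H_b, with H_w averaged over the two populations."""
    h_w = (mean_within(pop_1) + mean_within(pop_2)) / 2
    h_b = mean_between(pop_1, pop_2)
    return 1 - h_w / h_b


# Hypothetical toy alignments (NOT the HFE data): three sequences per population.
pop_a = ["AAAA", "AAAA", "AAAT"]
pop_b = ["TTTT", "TTTT", "TTTA"]
fst = pairwise_fst(pop_a, pop_b)
print(round(fst, 4))
```

In this toy case the two populations share almost no variation, so the estimate is close to 1; values such as those in Table 3 correspond to far more sharing between populations.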
TABLE 4
Significant pairwise linkage disequilibria and four-gamete site-pair counts

                                            African   Asian   European   Pooled
Total pairwise four-gamete comparisons        378      351      351       780
Total excluding singletons                    231      120      171       528
No. with four gametes                           0       13       18        25
Total pairwise LD comparisons                 378      351      351       780
No. with power to detect LD (P < 0.001)        55       12       52       310
No. significant at P < 0.001                   35        8       21        86
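
The four-gamete counts summarized in Table 4 rest on a simple pairwise check, which can be sketched as follows. This is a minimal illustration, assuming phased haplotypes coded as equal-length strings of 0/1 alleles at diallelic sites; the function name and the example haplotypes are hypothetical, and the exclusion of singleton sites applied in the table is not implemented here.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Return the site pairs (i, j) at which all four two-site gametes occur.

    `haplotypes` is a list of equal-length strings, one per chromosome,
    with each character giving the allele (e.g. '0' or '1') at one site.
    """
    n_sites = len(haplotypes[0])
    hits = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site gametes observed at sites i and j.
        gametes = {(hap[i], hap[j]) for hap in haplotypes}
        if len(gametes) == 4:
            hits.append((i, j))
    return hits


# Hypothetical toy sample (NOT the HFE data): five chromosomes, three sites.
sample = ["000", "010", "100", "110", "111"]
pairs = four_gamete_pairs(sample)
print(pairs)
```

Only sites 0 and 1 show all four gametes in this toy sample, so under the infinite-sites assumption at least one recombination event (or a repeated mutation) is implied between them.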
All instances of site pairs with four-gamete types in which the sites flank both sides of this interval (22/25) could be explained by this one recombination event.
Since RM gives only a lower bound on the amount of recombination, we can use other methods to estimate the population recombination parameter C (= 4Nr, where N is the effective population size and r is the per-locus recombination rate per generation). Estimators of C that use patterns of sequence variation can avoid inaccuracies due to local variation in recombination rates found in estimates based on observed crossing-over events between distant markers. CHRM, which uses the observed number of haplotypes and RM from data to estimate C under a model without gene conversion, performs well against other estimators of C (Wall 2000). It is ~5 for HFE in different populations, although it is 0 in Africans, due to their lack of four-gamete site pairs (Table 1). Hudson (2001) has recently developed a composite-likelihood method for estimating C, CL, which performs better than another commonly used estimator of C (Hudson 1987) and that Frisse et al. (2001) have used to estimate crossing over and gene conversion rates in the human genome. When the gene conversion rate is held at 0, then this method's estimates of C are in good agreement with those produced by CHRM (Table 1). In the African sample, the likelihood of the CL estimate is not much greater than that of CL = 0, indicating little evidence for recombination and consistent with the CHRM value. This result is unusual, as the study of Frisse et al. (2001) found that this same estimate for the pooled data of 10 loci is much higher in Africans than in non-Africans. Their higher population recombination parameter estimate in Africans is also consistent with the finding that LD decays more rapidly in Africans than in non-Africans (Tishkoff et al. 1996, 1998, 2000; Kidd et al. 1998, 2000; Mateu et al. 2001; Reich et al. 2001).
A moderate level of LD is observed throughout the region, consistent with the estimates of the population recombination parameter. Due to the low frequency of most polymorphisms and our modest sample size, most pairwise LD comparisons (Table 4) do not have the power to detect significant LD at the 0.001 level (a

Figure 3.—Significance of pairwise linkage disequilibrium for the European samples. Cell shading highlights site pairs with higher significance levels (as assessed by Fisher's exact tests without any correction for multiple testing). Singleton sites have been removed. Site 7633 is triallelic in Europeans; given our data we infer the order of the two mutations at this site to be C to G and then G to A. We therefore code all 7633A alleles as 7633G alleles with another mutation at the

considered). The pooled sample has the most power to detect LD, but pooling can also cause spurious LD due only to allele frequency differences between populations. D′ values for all alleles that are found at least twice in each of a pair of populations are highly correlated between populations (African vs. European, r = 0.999; African vs. Asian, r = 0.888; Asian vs. European, r = 0.718), with no cases of significant LD in opposite directions for pairs of populations. The high correlations are not surprising since most pairs of alleles are in complete LD in each population.
Figure 3 shows the location of site pairs in significant LD for the European population. LD appears evenly distributed throughout the region, with a minor concentration of significant LD at the 5′ end of the gene, particularly among two sites in the first intron and two sites in the 5′ flanking region, which have a perfect
association (sites 9013, 10,047, 10,701, and 11,204). Plots from the other populations (not shown) reveal a similar, even distribution of LD, providing no evidence of a recombination hotspot within this region. The rate of decay of LD with distance varies depending on the measure of LD used and the minimum frequency of variants included in the analysis. We have plotted |D′| from the pooled population vs. distance for variants at 25% frequency or greater (Figure 4). This plot indicates that LD falls to one-half of its maximum value at ~6 kb, a rate of LD decline that lies in between that of the ACE and LPL regions (Figure 1 in Przeworski et al. 2000). Intermediate frequency variants tend to be the oldest ones in the total population, and therefore time has allowed more recombination events to break up their association. For lower-frequency variants, almost all site pairs have |D′| = 1. As expected, when a broader range of SNP frequencies is considered (10% and higher, see Figure 4), the distance at which LD reaches one-half of its maximum value substantially increases (>11 kb).

Figure 4.—Pairwise linkage disequilibrium as a function of physical distance. LD is measured by |D′|, which varies between 0 and 1, calculated for the pooled sample. Solid circles show |D′| only for pairs of polymorphic sites with both minor allele frequencies at 25% or greater. Solid triangles represent the average of these points in windows of 1 kb. Open triangles represent the average |D′| in windows of 1 kb when polymorphisms with minor allele frequency at 10% or greater are considered.

DISCUSSION
We have performed a full resequencing survey of nucleotide variation at the HFE locus using nonclinical samples from three major human population groups. The benefit of including noncoding variation in the study is that these polymorphisms are often more
making them more informative in resolving which evolutionary forces have affected patterns of variation and governed the fate of alleles that alter the amount or function of a protein. Another significant aspect of our study is the experimental determination of haplotypes for an autosomal gene. These haplotypes provide a level of resolution greater than that of SNPs alone when drawing conclusions from genomic variability.
Summary of HFE variation: The results of any survey of population variation can be roughly separated into three categories: the level of variation, the frequency spectrum of that variation, and the haplotype structure of the variants. These categories are not independent, as they all reflect the underlying population history of a sequence of DNA, but they do capture different aspects of the data. Before this study, we had little information about the level of nucleotide variation at HFE. Protein polymorphisms seemed few and rare, but this could be due to high conservation imposed by the protein function and the slightly deleterious nature of most amino acid substitutions. The HFE gene is thought to lie in or near a region of low recombination, as Malfroy et al. (1997) have estimated a rate of 0.2 cM/Mb for the region between HLA and HFE. Despite this fact, we found that its level of polymorphism is about average for the genome. Both hitchhiking and background selection are expected to lower neutral variation in regions of low recombination and produce a positive correlation between levels of recombination and variation (Maynard Smith and Haigh 1974; Charlesworth et al. 1993). Evidence for this correlation in humans is debated, but the increasing number of polymorphism studies at individual loci and SNP data gathered from the Human Genome Project may resolve this debate (reviewed in Nachman 2001). If the correlation holds in humans, this suggests that either recombination in and directly flanking HFE is not severely reduced or a simple model of either genetic hitchhiking or background selection may not apply to this region. This second alternative might result from the nature of diversifying selection acting on the linked HLA region. This could produce a deeper genealogy for the HFE locus and raise its level of variation. When contrasted with the below-average divergence from chimpanzee, HFE variation appears slightly increased. This is not reflected in the HKA test and due to the stochastic nature of the mutation process may not have any real biological basis.
Test statistics reveal no major deviations from the equilibrium neutral expectation in the frequency spectrum of SNP alleles, although when the haplotype structure is considered, the Asian population shows an unusual pattern that is not seen in Africans or Europeans and may be consistent with a founder effect or hitchhiking. Thus, HFE provides no evidence for the long-term growth of the human population. However, on the basis of their study, Frisse et al. (2001) conclude that non-
summarize results that support a bottleneck in these populations but point out that more complicated demographic scenarios must be invoked to account for all of their non-African data. This appears true of the HFE data as well, as our non-Africans do not show the expected deficit of rare variants produced by a bottleneck.
HFE haplotype structure reveals some evidence for recombination, although fewer haplotypes than expected are observed on the basis of the number of variants. Przeworski and Wall (2001) have noted that estimates of C derived from sequence variation for human genes tend to be much higher than those based on experimentally measured crossing-over rates. This pattern is true for the HFE gene if we assume an effective population size of 10^4 and a crossing-over rate that is lower than the genomewide average (0.2 cM/Mb for the region between HLA and HFE; Malfroy et al. 1997). But, if the crossing-over rate in the HFE gene were 1 cM/Mb (five times the estimated value), then the estimate of C would be 4.5, in good agreement with CHRM. Recent studies (Ardlie et al. 2001; Frisse et al. 2001; Przeworski and Wall 2001) find evidence that actual sequence data sets on an intragenic scale are more likely under a model of gene conversion and crossing over than under a model of crossing over alone. The data for HFE do not provide sufficient power to reject a model of crossing over without gene conversion, although gene conversion is known to have played a role in the allelic diversity of HLA genes (e.g., Zangenberg et al. 1995).
Our study detects the second most common HFE variant, H63D, at a moderate frequency in Europeans. Other reports have found this variant outside of Europe at a frequency consistent with gene flow from the Mediterranean region. We find this mutation on two different haplotypes, consistent with the conclusion that this allele has an origin much older than that of C282Y. The fact that HFE variation fits an equilibrium neutral model does not conclusively resolve the question of the history and potential fitness effects of HFE amino acid polymorphisms. Amino acid polymorphisms represent only a small proportion of HFE variation. Additionally, if the C282Y allele has been the target of positive selection, it is still far from fixation in any population, and therefore we expect the signature of this selection will be very subtle. Diversifying selection rapidly changing the frequency of alleles at linked HLA loci could have affected the frequency of several alleles at the HFE locus. But after investigating the pattern of HFE haplotype diversity, only haplotype 1 in the Asian populations (discussed below) indicates the lack of variation relative to the frequency of an allele class expected under this type of scenario, providing little evidence that the C282Y allele has increased in frequency due to this effect. An

been acting on the protein over the past few million years, although this does not exclude the possibility of much more recent or weaker selection on polymorphism.
The effect of sampling strategy: The structure of our population samples represents a compromise between sampling intensely from a few populations and broadly surveying many populations. For a number of analyses, samples are pooled on the basis of their continental origin as is frequently done for humans. This can have an effect on results when significant genetic subdivision exists among populations within a continent. Many genetic surveys are consistent with subdivision stronger in Africans than in Asians or Europeans (e.g., Jorde et al. 2000), although Zhao et al. (2000) suggest that the frequency spectrum of variation of a presumably neutral noncoding region on chromosome 22 indicates the opposite pattern. As implied by the study of Fu (1996) and demonstrated by Yu et al. (2001), population subdivision tends to reduce the proportion of low-frequency variants and therefore make Fu and Li's D positive. A positive DFL is found only in our African samples, while only the Africans have a significant FST value when it is calculated for individual populations. This indicates that the effect of population subdivision within continents may be limited to results from these samples. Additionally, a number of points indicate that our sampling scheme is not solely responsible for the observed patterns of variation within and between populations. First, for each continental group the sampling scheme is similar, with approximately one-half of the samples coming from one focal population and the other one-half pooled from several populations (or in the case of Europeans, from samples of highly mixed ancestry). Second, unusual patterns in our data are not due to the choice of the focal population for each continental group. In general, focal populations have slightly fewer SNPs and haplotypes than the other samples from the same continent, as would be expected given some amount of population differentiation within each continent. For example, the Asian samples have the lowest nucleotide diversity. This does not result from low diversity found only in the focal Chinese population, as the non-Chinese Asian samples have only a slight increase in nucleotide diversity (0.00058 vs. 0.00053 for the Chinese). The Asian sample is also characterized by low haplotype diversity, even given their low nucleotide diversity. The Chinese samples contain as many different haplotypes (four) as the non-Chinese Asian sample, and the high frequency of haplotype 1 is not unique to the Chinese sample (0.6 in Chinese vs. 0.7 in non-Chinese Asian samples). However, whether different sampling designs can lead to quite different conclusions about human diversity is a question worthy of further study.
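
The comparison made earlier in this section between sequence-based estimates of C and the crossing-over rate of Malfroy et al. (1997) can be checked with a short calculation of C = 4Nr. The sketch below assumes that r is obtained by scaling the per-megabase rate to the 11,214-bp surveyed region and that 1 cM corresponds to a recombination fraction of 0.01; the function name is hypothetical and the region length used for the published value of 4.5 is an assumption on our part.

```python
def population_recombination_parameter(n_e, cm_per_mb, length_bp):
    """Return C = 4 * N * r for a locus of `length_bp` base pairs.

    n_e        effective population size (N)
    cm_per_mb  crossing-over rate in centimorgans per megabase
    length_bp  length of the surveyed region in base pairs
    """
    r_per_bp = cm_per_mb * 0.01 / 1_000_000  # 1 cM taken as 0.01 crossovers
    r_locus = r_per_bp * length_bp           # per-locus rate per generation
    return 4 * n_e * r_locus


N_E = 10_000        # effective population size assumed in the text (10^4)
REGION_BP = 11_214  # length of the resequenced region (assumed to apply here)

c_measured = population_recombination_parameter(N_E, 0.2, REGION_BP)
c_fivefold = population_recombination_parameter(N_E, 1.0, REGION_BP)
print(round(c_measured, 1), round(c_fivefold, 1))
```

Under these assumptions the measured rate of 0.2 cM/Mb gives C of roughly 0.9, whereas 1 cM/Mb gives roughly 4.5, which is the value quoted above as agreeing with CHRM.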
Cavalli-Sforza et al. (1994) have reported average values of FST for a large number of DNA polymorphisms (0.139) and non-DNA polymorphisms (0.119). In most cases, less differentiation is seen between populations within continents than between continents, consistent with simple isolation by distance. Wakeley (1999) has found evidence that, as might be expected, the level of migration in humans underwent an increase in the past. At the HFE locus, significant genetic differentiation is seen among continental populations, and FST is higher than the average values reported by Cavalli-Sforza et al. (1994). When FST for HFE is recalculated by including only the Mbuti, Chinese, and Utah samples it is actually higher (0.27 vs. 0.23), so that this high value is not an artifact of the diverse sample composition from each continent. However, these populations do not necessarily represent all of the variation within each continent, making a comparison with the results from a larger number of populations difficult.
The comparison of variability in different populations can provide evidence for local adaptive evolution but can be complicated by population history. The greater variability of populations from sub-Saharan Africa is seen in almost every study, and African samples have been contrasted to non-African samples to reveal differences in their population history. In a recent study of >300 genes (Stephens et al. 2001), the African-American sample was found to have >1.5 times the number of both SNPs and haplotypes than either the Asian or the Caucasian sample. Also, the African-American sample had >2 times the number of rare alleles than the other samples. Results from HFE show the most variation in African samples, but this difference is minor, with Asians and Europeans having nearly as many SNPs. Africans also have fewer rare SNPs compared to the other populations, in contrast to the Stephens study (population subdivision within our African samples could contribute to this result). Thus, the contrast between African and non-African samples is very subtle, becoming apparent only in the measure of nucleotide diversity since the African SNP alleles tend to be shifted toward intermediate frequencies. Due to the similar level of variability in the different populations and the finding of SNPs and haplotypes specific to each population, non-African variation is not just a subset of African variation.

between estimates of the effective population size based on variation vs. recombination data would suggest a departure from an equilibrium neutral model of evolution. This finding is remarkable, since similar departures are not often found in African populations (e.g., Frisse et al. 2001). Wall (2001) has reported that population structure and population bottlenecks can lead to underestimates of the population recombination parameter. Our finding could result if our sampling strategy has captured population structure in Africa more than previous studies. However, of the above-cited references, most share our strategy of broadly sampling multiple populations within continents.
The Asian samples provide a more striking contrast to the other populations. They have the lowest variation, due to both fewer observed haplotypes and the low frequency of many SNPs. Over 60% of SNP alleles in Asians are rare (at 10% frequency or lower). Remarkably, when the direction of mutation is inferred from the chimpanzee sequence, nearly 30% of Asian-derived SNP alleles are at a frequency >60%. This proportion of high-frequency-derived alleles could result from a past hitchhiking event, although the H statistic does not reach significance. The low number of haplotypes observed in Asians and the predominance of haplotype 1 (rarely found outside of Asia) suggest that this haplotype has risen to a high frequency rapidly since the split of the Asian populations from the other populations studied. The large number of low-frequency variants could mean that, had a selective sweep occurred, it happened long enough ago that many new segregating sites have since arisen and thus decreased the power of the H test (Przeworski 2002). However, when the pattern of haplotypes in all samples is considered, the haplotype diversity of Asia is not consistent with mutation arising solely from the haplotype 1 background, but instead suggests haplotype 1 never reached fixation in Asians or migration from other populations has introduced new haplotypes. The pattern of haplotype variation in Asians is also consistent with drift following a relatively strong founder event for this group.
The high frequency of haplotype 1 reveals the advantage of haplotype data over SNP sharing when assessing the similarity of populations. Haplotype 1 can explain a great deal of the population subdivision seen at HFE and can provide clues to how the interaction of selection with population history may have differed for individual
Rochette et al. (1999) have observed this allele at 8% in French, 20% in Sri Lankans, and 32% in Burmese. This report is consistent with our observation of haplotype 1 at low frequency in Europeans and also suggests an increasing gradient in this allele moving east in Eurasia. Beutler and West (1997) have also found a high frequency of this SNP allele in Asians (63%), although they observe it at nearly 13% in Europeans and at 15% in a small sample of African-Americans. These higher frequencies outside of Asia make the Asian-specific origin of SNP 4600G less likely. The Beutler and West study infers HFE haplotypes defined using SNP 4600 and two additional sites. In Asian samples 4600G is always (34/34) found on a haplotype consistent with our haplotype 1, while in the Caucasian samples, 4600G is inferred to appear on haplotype 1 in 7 of 9 cases. This pattern is consistent with a more recent increase in 4600G in Asia, since it has not had time to recombine onto other haplotypes, and supports the idea of a founder effect following a bottleneck or a hitchhiking event.
Conclusions: The discovery of these additional polymorphisms in the HFE region and the haplotypes they create may help identify other alleles that have an effect on the iron regulation phenotype. Several polymorphisms described in this study are found in the 3′ UTR of the messenger RNA and could conceivably affect mRNA stability or levels of protein translation. Regulatory regions that affect levels of transcription can be found in introns or flanking a gene, where most of our polymorphisms are found. Also, tightly linked regulatory polymorphisms could be in disequilibrium with our observed haplotypes. The effects of these polymorphisms in regulatory regions are likely to be quite subtle, but they could help explain the finding of hemochromatosis in individuals that carry only one copy of a hemochromatosis-associated allele.
Finally, knowing more about the levels of variation and recombination at HFE will help evaluate the peculiar pattern of high frequency and young age estimated for the C282Y allele. Evidence for intragenic recombination at HFE has been sparse. Our estimates of local recombination rates based on sequence variation allow a reevaluation of the high LD around the C282Y allele, providing support for its young age. Although this pattern seems consistent with a selective advantage for C282Y, until an appropriate population genetic test for positive selection is performed, alternative causes of the pattern such as drift or a population bottleneck or growth cannot be ruled out. We have begun to use the polymorphisms and haplotypes reported here to study the extent of LD between HFE alleles besides C282Y and markers in the several megabases around HFE. By analyzing LD found around other HFE alleles, as well as around alleles produced by neutral coalescent simulations for different demographic parameters, we can eval-
relative to other possible explanations of the allele's high frequency given its young apparent age.

We thank all those who agreed to donate DNA for this study and Ami Rice for help collecting samples. Additionally, we thank R. Ajioka and L. Jorde for providing European DNA samples from hemochromatosis pedigrees; D. Ledbetter for the chimpanzee sample; J. Fay and R. Hudson for providing computer programs; E. Stahl, M. Fullerton, and members of A. Di Rienzo's laboratory for helpful discussions; J. Comeron for help with the analysis of divergence from chimp and mouse; and J. Comeron, A. Di Rienzo, K. Dyer, M. Hamblin, and two anonymous reviewers for helpful comments on the manuscript. This work was supported by National Institutes of Health grant GM39355 to M.K. and a National Science Foundation Doctoral Dissertation Improvement Grant (DEB-0073297) to C.T. and M.K. C.T. was partially supported by a Howard Hughes Medical Institute predoctoral fellowship and by National Institutes of Health training grant T32 GM07197 (genetics and regulation).

LITERATURE CITED
Ajioka, R. S., L. B. Jorde, J. R. Gruen, P. Yu, D. Dimitrova et al., 1997 Haplotype analysis of hemochromatosis: evaluation of different linkage-disequilibrium approaches and evolution of disease chromosomes. Am. J. Hum. Genet. 60:1439–1447.
Ardlie, K., S. N. Liu-Cordero, M. A. Eberle, M. Daly, J. Barrett et al., 2001 Lower-than-expected linkage disequilibrium between tightly linked markers in humans suggests a role for gene conversion. Am. J. Hum. Genet. 69:582–589.
Bandelt, H. J., P. Forster, B. C. Sykes and M. B. Richards, 1995 Mitochondrial portraits of human populations using median networks. Genetics 141:743–753.
Barton, J. C., R. Sawada-Hirai, B. E. Rothenberg and R. T. Acton, 1999 Two novel missense mutations of the HFE gene (I105T and G93R) and identification of the S65C mutation in Alabama hemochromatosis probands. Blood Cells Mol. Dis. 25:147–155.
Beutler, E., and T. Gelbart, 2000 A common intron 3 mutation (IVS3 -48c→g) leads to misdiagnosis of the c.845G→A (C282Y) HFE gene mutation. Blood Cells Mol. Dis. 26:229–233.
Beutler, E., and C. West, 1997 New diallelic markers in the HLA region of chromosome 6. Blood Cells Mol. Dis. 23:219–229.
Beutler, E., T. Gelbart, C. West, P. Lee, M. Adams et al., 1996 Mutation analysis in hereditary hemochromatosis. Blood Cells Mol. Dis. 22:187–194.
Beutler, E., C. West and T. Gelbart, 1997 HLA-H and associated proteins in patients with hemochromatosis. Mol. Med. 3:397–402.
Brookes, A. J., H. Lehväslaiho, M. Siegfried, J. G. Boehm, Y. P. Yuan et al., 2000 HGBASE: a database of SNPs and other variations in and around human genes. Nucleic Acids Res. 28:356–360.
Camaschella, C., A. Roetto, A. Cali, M. De Gobbi, G. Garozzo et al., 2000 The gene TFR2 is mutated in a new type of haemochromatosis mapping to 7q22. Nat. Genet. 25:14–15.
Carella, M., L. D'Ambrosio, A. Totaro, A. Grifa, M. A. Valentino et al., 1997 Mutation analysis of the HLA-H gene in Italian hemochromatosis patients. Am. J. Hum. Genet. 60:828–832.
Cavalli-Sforza, L. L., P. Menozzi and A. Piazza, 1994 The History and Geography of Human Genes. Princeton University Press, Princeton, NJ.
Charlesworth, B., M. T. Morgan and D. Charlesworth, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303.
Chen, F.-C., and W.-H. Li, 2001 Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet. 68:444–456.
Comeron, J. M., 1999 K-Estimator: calculation of the number of nucleotide substitutions per site and the confidence intervals. Bioinformatics 15:763–764.
de Villiers, J. N. P., R. Hillermann, L. Loubser and M. J. Kotze, 1999 Spectrum of mutations in the HFE gene implicated in haemochromatosis and porphyria. Hum. Mol. Genet. 8:1517–1522.