Sequence Variation and Haplotype Structure at the Human HFE Locus

Christopher Toomajian*,1 and Martin Kreitman*,†

*Committee on Genetics, University of Chicago, Chicago, Illinois 60637 and †Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637

Manuscript received January 2, 2002 Accepted for publication May 3, 2002

ABSTRACT

The HFE locus encodes an HLA class-I-type protein important in iron regulation and segregates replacement mutations that give rise to the most common form of genetic hemochromatosis. The high frequency of one disease-associated mutation, C282Y, and the nature of this disease have led some to suggest a selective advantage for this mutation. To investigate the context in which this mutation arose and gain a better understanding of HFE genetic variation, we surveyed nucleotide variability in 11.2 kb encompassing the HFE locus and experimentally determined haplotypes. We fully resequenced 60 chromosomes of African, Asian, or European ancestry as well as one chimpanzee, revealing 41 variable sites and a nucleotide diversity of 0.08%. This indicates that linkage to the HLA region has not substantially increased the level of HFE variation. Although several haplotypes are shared between populations, one haplotype predominates in Asia but is nearly absent elsewhere, causing higher than average genetic differentiation among the three major populations. Our samples show evidence of intragenic recombination, so the scarcity of recombination events within the C282Y allele class is consistent with selection increasing the frequency of a young allele. Otherwise, the pattern of variability in this region does not clearly indicate the action of positive selection at this or linked loci.

HFE was the first gene to be associated with hereditary hemochromatosis, a recessive disease common in many populations of European descent and characterized by iron overload (Feder et al. 1996). Recently, rare alleles in additional genes have been identified that are associated with the hemochromatosis phenotype (Roetto et al. 1999; Camaschella et al. 2000; Njajou et al. 2001; Kato et al. 2001) and lead to distinct disorders. However, the majority of hereditary hemochromatosis cases in Europe are due to changes in the HFE gene (Beutler et al. 1996; Feder et al. 1996; Carella et al. 1997). HFE plays an important role in regulating iron levels in the body (Feder et al. 1998; Salter-Cid et al. 1999). Although much progress has been made in determining the function of HFE, questions concerning its specific mechanism of iron regulation still remain (e.g., Drakesmith and Townsend 2000). The potential for iron requirements or availability to change in novel environments makes HFE a possible target of local adaptation.

A feature of HFE relevant to population genetic inference is its chromosomal context. HFE is found telomeric to the human leukocyte antigen (HLA) class-I region on chromosome 6p21 in an area referred to as the extended class-I region. The HLA has been the focus of polymorphism studies for decades, since the most polymorphic loci in the human genome are found here. Initial studies revealed that balancing selection has acted at a number of HLA genes (e.g., Hughes and Nei 1988), and its effect can be seen in the increase in variation at neighboring loci (e.g., Grimsley et al. 1998; Satta et al. 1999). However, as diversity studies have broadened to include additional loci, the observation of many peaks and valleys of nucleotide diversity suggests that simple models of diversifying selection acting on a handful of exons cannot explain the full complexity of variation in this region (e.g., Gaudieri et al. 2000). Even though the HFE gene is ~4 Mb away from the highly polymorphic HLA-A locus, the genetic distance between them is ~1 cM (Malfroy et al. 1997). In fact, the HFE gene was first localized to chromosome 6 on the basis of the association between hemochromatosis and HLA-A3 (Simon et al. 1976). Another report notes an extended HLA haplotype common in Europeans, A1-B8, which maintains associations with microsatellites distal to HFE (Worwood et al. 1997). These associations between HFE and HLA alleles raise the question of whether selection on HLA alleles has influenced the pattern and level of variability found at HFE within and between populations.

Studies of HFE variation have focused on two amino acid polymorphisms that were discovered when the gene was mapped by Feder et al. (1996). One mutation, C282Y, disrupts an intramolecular disulfide bridge and renders the protein nonfunctional (Feder et al. 1997). This mutation, found at a frequency of up to 10% in European populations (Merryweather-Clarke et al. 1997), is by far the major mutation that leads to hemochromatosis. The H63D mutation is generally found at a higher frequency in Caucasians and also appears to be associated with hemochromatosis, although its penetrance is low (Risch 1997). Several other rare hemochromatosis-associated HFE mutations have been described (Barton et al. 1999; de Villiers et al. 1999; Mura et al. 1999). However, not all cases of hemochromatosis can be explained by the known HFE mutations, leaving open the possibility of additional minor disease-associated mutations. Effort to characterize HFE variation has concentrated on Caucasian populations, and the full spectrum of variation at HFE has not been considered. Therefore, it is difficult to make inferences about the forces governing this variation.

Several groups have proposed that selection has favored the C282Y mutation, but a detailed knowledge of the linked variation around this site is necessary to independently test this hypothesis at the nucleotide level. Two lines of evidence have led to the hypothesis of selection. One is based on the function of HFE, with the selective advantage for C282Y possibly stemming from its potential to prevent iron deficiency. The second is based on the seemingly incongruous observation that C282Y appears extremely young but is a relatively high-frequency mutation in European populations. Models of the decay of linkage disequilibrium (LD) over time estimate a young age (<100 generations) for the C282Y mutation (Ajioka et al. 1997; Thomas et al. 1998). In contrast, the expected age of an allele at 5% frequency in a population with an effective size of 10^4 is >6000 generations (Kimura and Ohta 1973). Selection for the young C282Y mutation or hitchhiking with a linked mutation may have allowed it to reach its present frequency. While experiments to test the selective advantage of C282Y on the basis of functional differences are difficult and will not necessarily shed light on the historical reason for C282Y's high frequency, the patterns of nucleotide variability and LD in and around the HFE region can provide evidence for the selective advantage of particular mutations or the pattern of hitchhiking. For example, if a rapid change in the frequencies of alleles under diversifying selection at classical HLA loci were responsible for C282Y's high frequency, one might expect that other allele classes at HFE would display a lack of variation similar to that of the C282Y class.

In this report we describe the nucleotide variation and haplotype structure in HFE for a worldwide population sample. We test whether the pattern of variability is consistent with an equilibrium neutral model of evolution. We compare the level of population variation and [...] the origin of different alleles and the forces that have acted to produce their current global distribution and frequency.

The Pan troglodytes nucleotide sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession no. AF447807.

1 Corresponding author: Committee on Genetics, University of Chicago, 1101 E. 57th St., Chicago, IL 60637. E-mail: [email protected]

MATERIALS AND METHODS

DNA samples: A total of 30 samples (60 chromosomes) were chosen to represent ancestry from African, Asian, and European peoples. Identifiers in parentheses indicate the sample numbers from the Coriell Cell Repositories' National Institute of General Medical Sciences Human Genetic Mutant Cell Repository (http://arginine.umdnj.edu). Samples without these identifiers were either collected at the University of Chicago or provided by other labs. The 10 African samples include five Mbuti Pygmies (NA10492–NA10496) from the Ituri forest in northeast Zaire and one sample each of Kikuya (NA00522), Ghanaian (NA02064A), Zulu (NA02476), !Kung (NA03043), and Luo (NA03190A) descent. The 10 Asian samples include five individuals of Chinese descent (including NA11321–NA11323), two samples of Korean descent (including NA00726), and one sample each of Filipino (NA10798), Khmer (NA11373), and Vietnamese (NA03037) descent. The 10 European samples include four samples from the previous study of Edwards et al. (1988) that come from Utah and its surrounding states and six samples collected in Chicago from mixed European ancestry. The samples of Edwards et al. (1988) are members of hemochromatosis pedigrees that we selected on the basis of an apparently normal phenotype and the lack of the C282Y HFE mutation. All other samples were chosen either from healthy volunteers or from individuals with disorders that have no known association with variation in the HFE gene. In addition to the 10 European population samples, two unrelated samples of European descent (NA14620 and NA14621) that were homozygous for the HFE C282Y mutation were selected for sequencing. The Institutional Review Board of the University of Chicago approved this project.

PCR and sequencing: The region under study consists of bases 43,385–54,657 of the human hereditary hemochromatosis region (GenBank accession no. U91328; Lauer et al. 1997); for convenience we change the base numbering to 1–11,273, respectively. A large number of primers were designed throughout this region from the reference sequence, which does not carry the C282Y or H63D mutation. The primers and conditions used for amplification and sequencing are available upon request. For 23 of the 30 samples, overlapping diploid PCR products of ~1 kb were sequenced to determine the identity of nucleotides 31–11,244 (excluding external primer sequence). PCR products were used as templates for dRhodamine terminator cycle-sequencing reactions that were subsequently cleaned and run on an ABI 377 automated sequencer (Applied Biosystems, Foster City, CA). Chromatograms were imported into Sequencher v. 3.0 (Gene Codes, Ann Arbor, MI) for manual assembly of contigs and identification of polymorphic sites. Each base in the study was called, using at least single-fold coverage sequencing reads for each strand, except for a few small regions where sequence repeats made the reads in one direction of poor quality and bases were called using information primarily from one strand. Sequence from the HFE gene was also obtained from one common chimpanzee (Pan troglodytes) from DNA provided by Dr. D. H. Ledbetter. Most PCR and sequencing primers worked with the chimpanzee sample, and where gaps remained new PCR primers were designed.
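The renumbering described under "PCR and sequencing" can be illustrated with a minimal sketch. It assumes only the coordinates quoted in the text (GenBank U91328 bases 43,385–54,657 mapped to 1–11,273, with nucleotides 31–11,244 surveyed); the function and constant names are hypothetical and are not taken from any software used in the study.

```python
# Coordinates quoted in the Methods; names below are illustrative only.
REGION_START = 43_385   # first U91328 base of the studied region
REGION_END = 54_657     # last U91328 base of the studied region
SURVEY_FIRST = 31       # first surveyed base in local numbering
SURVEY_LAST = 11_244    # last surveyed base in local numbering


def to_local(genbank_pos: int) -> int:
    """Convert a U91328 coordinate to the 1-based local numbering."""
    if not REGION_START <= genbank_pos <= REGION_END:
        raise ValueError(f"{genbank_pos} lies outside the studied region")
    return genbank_pos - REGION_START + 1


def to_genbank(local_pos: int) -> int:
    """Convert a local coordinate back to the U91328 numbering."""
    region_length = REGION_END - REGION_START + 1
    if not 1 <= local_pos <= region_length:
        raise ValueError(f"{local_pos} lies outside 1-{region_length}")
    return local_pos + REGION_START - 1


REGION_LENGTH = REGION_END - REGION_START + 1     # 11,273 bases
SURVEYED_LENGTH = SURVEY_LAST - SURVEY_FIRST + 1  # 11,214 bases

if __name__ == "__main__":
    print(REGION_LENGTH, SURVEYED_LENGTH)
    print(to_local(REGION_START), to_local(REGION_END))
```

The surveyed length computed this way agrees with the 11,214-bp region referred to in the Results.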


[...]lis), cloned with the Topo XL PCR cloning kit (Invitrogen, Carlsbad, CA), and the whole insert was sequenced directly. This method leads to the sequencing of PCR errors and may produce hybrid sequences derived from the maternal and paternal alleles. Therefore, we confirmed each difference from the reference sequence by either sequencing or DHPLC analysis (Transgenomic, Omaha) of smaller PCR products produced from genomic DNA.

Haplotype determination: For the 4 samples from the Edwards et al. (1988) study, pedigree analysis was performed to determine haplotypes. For the 19 other samples with more than one heterozygous site, the 11-kb long-range PCR product was amplified and cloned as described above. For at least two clones per sample, we directly sequenced small regions containing the heterozygous sites. Comparison of these sequences with the corresponding diploid sequence was used to determine if any clones were hybrids of the maternal and paternal alleles. This occurred for only one sample, and the sequencing of the heterozygous sites from a third clone resolved which clone was hybrid and which were true alleles. For the 7 samples that were cloned before being sequenced, initially one clone per sample was sequenced. To find the allelic clone for each of these samples, individuals were screened for heterozygosity at intermediate frequency polymorphisms by DHPLC. All 7 samples were found to be heterozygous at one or more sites, and a second clone for complete sequencing that carried the other base at a heterozygous site was chosen. To ensure that neither of the sequenced clones were hybrids, some sites that appeared homozygous and matched the published sequence after sequencing the two clones were tested for homozygosity by DHPLC. Analysis proved homozygosity for all sites tested in this way and demonstrated no further hybrid clones.

Data analysis: The program DnaSP, ver. 3.53 (Rozas and Rozas 1999) was used to estimate parameters and perform statistical analyses unless otherwise noted. Length polymorphisms in repetitive DNA (most frequently mono- or dinucleotide repeats) could not always be determined with certainty, are not reported here, and were ignored for these analyses. Also, length differences in mono- or dinucleotide runs between human and chimpanzee were not scored, and fixed differences that interrupt these repeat stretches have also been excluded. Only single-nucleotide polymorphisms have been included in summaries of nucleotide variability, and one singleton insertion/deletion (indel) polymorphism found in complex sequence (site 9681) has been excluded. For the computation of diversity estimates, the number of observed mutations is used in place of the number of segregating sites. The confidence interval for θ was calculated as described in Kreitman and Hudson (1991). Significance for Tajima's D (1989), Fu and Li's D (1993), and Fay and Wu's H (2000) tests was assessed by comparison to the output of neutral coalescent simulations of 10^3 random samples with identical sample size and polymorphism level as the observed data, assuming constant population size and no recombination (which makes the tests conservative). The H test was performed using a program provided by J. Fay. The program K-Estimator v5.5 (Comeron 1999) was used to test the difference between human-chimp divergence levels with 10^4 Monte Carlo simulation replicates. Noncoding regions conserved between human and mouse (see results) were compared to simulation results of random divergence between two sequences of the same length and G+C content as the HFE noncoding sequence (J. Comeron and M. Kreitman, unpublished results). Significance was assessed by measuring the longest region with at least 75% identity in each of 10^3 replicates. Hudson-Kreitman-Aguadé (HKA) tests were performed via coalescent simulation using the program of J. Hey (http://[...]).

The mutational relationships among the experimentally determined haplotypes were visualized by using the reduced median (RM) algorithm of the program Network 2.0 (http://www.fluxus-engineering.com; Bandelt et al. 1995). This algorithm is designed for use with nonrecombining DNA types, but it provides a convenient means of displaying haplotype relationships, especially for very similar haplotypes, when recombination is not very prevalent in a region. In constructing the network, the algorithm links haplotypes that differ at only one site and then assumes that mutational events proceed from a more frequent haplotype to a less frequent one to choose between equally parsimonious mutational paths linking more distantly related haplotypes. In cases where this assumption does not resolve alternate paths, the uncertainty is shown as a reticulation, or loop, in the network.

The program Arlequin 2.000 (Schneider et al. 2000) was used to perform an AMOVA analysis of the genetic differentiation among population samples. C_HRM was estimated from coalescent simulations of a constant size population conditioned on the number of segregating sites with J. Wall's program hrmpg2, available on the Hudson lab homepage (http://home.uchicago.edu/~rhudson1/source/JWallCode.html). A total of 10^5 replicates are run for each tested value of C. Initially, 31 values of C were tested (0, 1, 2, . . . , 30) and then an additional 11–21 values were tested around the above maximum-likelihood estimate [e.g., 4.0, 4.1, . . . , 6.0 for an initial maximum-likelihood estimate (MLE) of 5]. ρ_CL (Hudson 2001) was estimated under a model without gene conversion with a program provided by R. Hudson. In assessing LD, we have assumed that at site 7633 a C to G mutation preceded a G to A mutation, while for site 3877 (also triallelic) the order of mutations is not clear, and we have excluded the site in populations segregating all three alleles. LD was calculated as D′ (Lewontin 1964) and significance was assessed by Fisher's exact test.

RESULTS

HFE diversity at the nucleotide level: A total of 41 variable sites were identified in the 11,214-bp region of the 60 chromosomes surveyed: 38 diallelic single-nucleotide polymorphisms (SNPs), two triallelic SNPs, and one diallelic single-base indel polymorphism. The two SNPs found triallelic in the pooled sample indicate that at least two mutations have occurred at these sites in the history of this sample. Of the 41 polymorphic sites, 8 segregate singletons, with the more frequent allele matching the chimpanzee sequence in each case. Only two SNPs were in exons: SNP 6724, which causes the H63D amino acid polymorphism (Feder et al. 1996), and SNP 3470, which is a synonymous change. The bottom of Figure 1 indicates the location of all polymorphisms relative to the intron-exon structure of HFE.

Summary statistics describing the sequence diversity in the pooled and individual populations are presented in Table 1. Overall, average per-nucleotide expected heterozygosity, θw, for the total sample, estimated from the observed number of mutations (Watterson 1975), was 0.080% [0.040–0.148%, 95% confidence interval (C.I.)]. Nucleotide diversity (π), an estimate of θ based on the average pairwise sequence difference (Nei 1987),


[...] sequence is excluded, π increases by ~8% to 0.091%. These estimates of θ assume an infinite-sites model, which is clearly violated since the data contain two SNPs that are triallelic. However, estimates of θ derived from finite-sites models lead to only negligible differences from the infinite-sites estimates for our data (Tajima 1996). Even when we allow the mutation rate to vary extensively among sites (with mutation rates following a γ distribution with parameter α = 0.1), the finite-sites estimates of θ are not substantially changed (π increases from 0.084 to 0.085%). Our estimate of θ for HFE is slightly lower than the average for fourfold degenerate coding sites in humans (0.11%; Li and Sadler 1991), but conforms to this value and the average estimated for similar gene regions in humans (0.081%; Przeworski et al. 2000), for the whole genome (0.075%; International SNP Map Working Group 2001), and for chromosome 6 in particular (0.074%; International SNP Map Working Group 2001) on the basis of our confidence intervals.

Diversity in continental populations: When the coding region is excluded, π increases by nearly the same percentage (~8%) for each population. As is commonly observed, the Africans have the highest nucleotide diversity and the largest number of population-specific SNPs, 11. However, only 2 of these 11 are singletons, so [...] nucleotide diversity for Europeans is only slightly lower than that for Africans, but they have many fewer population-specific SNPs than the Africans (4 vs. 11, Table 1). European HFE variability is not strictly a subset of that found in Africa, consistent with the previous finding of HFE polymorphisms with an apparent European origin (Fairbanks 2000). The Asian samples have the lowest nucleotide diversity, which is still 65% of the African value, indicating that all populations studied have substantial HFE variability. Asians also have relatively few population-specific polymorphisms (4, Table 1), but in this case, they are all singletons, contributing to a higher θw relative to π.

Allele frequency spectrum: Test statistics that utilize the frequency spectrum of alleles within a locus may detect departures from an equilibrium neutral model caused by demographic forces such as population growth, contraction, and subdivision or by the effect of diversifying or directional selection on linked sites. Tajima's (1989) D statistic compares the two estimates of θ described above, while Fu and Li's (1993) D statistic compares the number of singleton polymorphic sites with the estimate of θ based on segregating sites and incorporates outgroup information to infer derived alleles. Additionally, Fay and Wu's (2000) H statistic can detect departures in the frequency spectrum due to recent hitchhiking at [...]

Figure 1. Observed HFE haplotypes. Bases identical to the chimpanzee haplotype are marked with a dot. Haplotypes are numbered 1–19 to indicate relative frequency from highest to lowest and ordered so that closely related haplotypes are near each other. Population counts are shown at the right. Polymorphisms are grouped by different regions of the HFE locus, as indicated in the gene diagram at the bottom. The arrow marks the direction of transcription and its start site (the 5′ of the gene is at the right); solid regions are either flanking sequence or introns, striped regions are coding exons, and checked regions are the 3′ and 5′ UTRs. The locations of polymorphisms are shown below the gene diagram, with the location of C282Y (4762 in our notation; not observed in our population sample) indicated by a star.
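The two estimators of θ that Tajima's D compares can be sketched in a few lines. The sketch below assumes the standard formulas of Watterson (1975) and Tajima (1989). It is not the DnaSP implementation used in the study, and the function names are hypothetical. The pooled-sample inputs (42 SNP mutations, counting each triallelic site twice and excluding the indel, in 60 chromosomes over 11,214 bp) are taken from the text.

```python
from math import sqrt


def harmonic_sums(n: int) -> tuple[float, float]:
    """Return a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    return a1, a2


def watterson_theta(num_mutations: int, n: int, length: int) -> float:
    """Per-site Watterson estimate of theta from the number of mutations."""
    a1, _ = harmonic_sums(n)
    return num_mutations / (a1 * length)


def tajimas_d(mean_pairwise_diff: float, num_segregating: int, n: int) -> float:
    """Tajima's D from the mean pairwise difference k and S (whole region)."""
    if num_segregating < 1 or n < 3:
        raise ValueError("need at least one segregating site and n >= 3")
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = num_segregating
    return (mean_pairwise_diff - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Pooled sample values quoted in the text.
theta_w = watterson_theta(num_mutations=42, n=60, length=11_214)

if __name__ == "__main__":
    print(f"theta_w = {100 * theta_w:.3f}% per site")
```

With these inputs the Watterson estimate rounds to the 0.080% reported for the pooled sample. A negative D arises whenever the mean pairwise difference falls below S/a1, which is the excess of low-frequency variants expected under population growth.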


[...] significant at the 5% level, so the equilibrium model of neutral evolution cannot be rejected. However, it should be noted that the small size of the individual populations (n = 20) limits the power of these tests (Simonsen et al. 1995). The pattern observed for the different populations is informative, though, since negative values of Tajima's and Fu and Li's test statistics indicate an excess of low-frequency polymorphisms, which is expected under a model of human population growth from its ancestral size. Only the Asian population shows a slight excess of low-frequency polymorphisms. Among their variable sites, 17 have a minor allele frequency of 10% or less in Asians, with a comparison to chimpanzee sequence implying the rare alleles are derived. However, 8 of the remaining 10 derived SNP alleles have a frequency in Asians of >50%. It is possible that a selective sweep or hitchhiking has boosted these 8 SNP alleles to high frequency. This pattern in Asians may be consistent with drift and a stronger founder effect than is seen in the other populations. For the African, European, and pooled populations, these test statistics show no clear sign of population expansion when all observed SNP alleles are assumed neutral.

Divergence from chimpanzee and haplotype variation: The SNP alleles in the total sample of 60 chromosomes were found to occur in 18 distinct haplotypes (19 if the singleton indel is included). These haplotypes are displayed in Figure 1 along with a haplotype composed of the ancestral state of each allele inferred from chimpanzee (P. troglodytes). The chimpanzee sequence for the complete region revealed 71 fixed differences from human, including 69 SNPs, a 2-bp indel, and a complex mutation involving a base change and a single-base indel at a neighboring site. Only one fixed difference was found in the coding region, and this was a synonymous change. The average number of nucleotide substitutions per site between human and chimpanzee is 0.690% (0.750% for noncoding sequence). A recent analysis of divergence levels between humans and chimpanzee reports an average distance of 1.03 ± 0.04% for introns, from 32 loci, with a combined length >41 kb once repetitive regions were removed (Chen and Li 2001). The same calculation for nonrepetitive sequence in HFE introns (4032 bp) is 0.623%, which is significantly lower than this average under a model where the mutation rate is constant among sites (P = 0.006). However, intron divergence values in 4 of the 32 loci from the Chen and Li study (including one region of 9556 bp) are lower than that for HFE, suggesting that the low divergence observed for HFE introns is not exceptionally low relative to the actual distribution of intron divergence values.

The low observed divergence might result from a high average degree of constraint for the whole HFE region, which is composed primarily of introns, untranslated regions (UTRs), and intergenic sequence. To address [...] homologous sequence in mouse. Sánchez et al. (1998) reported conservation between human and rodent in the HFE promoter region, and we estimated conservation with the published mouse HFE sequence (GenBank AF007558) across our entire 11-kb region, using available global alignment and visualization tools (Mayor et al. 2000). Because the extent of divergence for noncoding sequence between human and mouse prevents the creation of a definitive global alignment, one cannot calculate a net divergence in the same way one would for the human/chimp comparison. Aside from the exons, two conspicuously conserved regions were found in noncoding sequence, one 74.6% identical over 114 bp in the 3′ UTR and one 74.5% identical over 188 bp in intron 5. These regions are significantly longer (P < 0.001) than what is expected for unconstrained regions of equal or higher identity due only to common ancestry and given the observed human/mouse K_S for HFE (0.65). They provide strong evidence that some functional constraint exists outside the coding region that could contribute to the low divergence observed between human and chimp. The 3′ UTR conserved region contains two fixed differences between human and chimp while neither region contains polymorphisms in humans, although divergence with chimp and polymorphism within human throughout the whole region are too low to provide evidence for or against functional constraint of specific subregions.

In addition to functional constraint, a low neutral mutation rate could contribute to the low divergence. Both mutation rates and sequence divergence are influenced by G + C content (Wolfe et al. 1989), but HFE has an intermediate G + C content (45.1%) so that low divergence caused by a low mutation rate is not expected. If the low divergence did reflect a relatively low neutral mutation rate, then we might expect to see a level of polymorphism lower than the average level observed. HKA tests (Hudson et al. 1987) comparing the noncoding portion of the HFE region with the 10 "locus pairs" of Frisse et al. (2001) for each population [...]

TABLE 2
Results of HKA tests

HFE noncoding region

Population   nᵃ    Sᵇ    Dᶜ     P(HKA)ᵈ
Africa       20    29    76.6   0.89
Asia         20    27    76.6   0.36
Europe       20    26    75.5   0.46

ᵃ Number of chromosomes.
ᵇ Number of segregating sites in humans.
ᶜ Average pairwise divergence from chimpanzee.
ᵈ Probability of the HKA test using the pooled data from 10 noncoding regions (Frisse et al. 2001) as the reference locus.
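The quantity an HKA test compares across loci is the ratio of polymorphism to divergence. As a rough illustration only, the sketch below computes that ratio for the HFE noncoding region from the S and D columns of Table 2. It does not perform the HKA test itself, which in the study relied on coalescent simulation and on reference-locus data not reproduced here. The variable and function names are hypothetical.

```python
# S (segregating sites) and D (average pairwise divergence from chimpanzee)
# for the HFE noncoding region, copied from Table 2.
TABLE2 = {
    "Africa": (29, 76.6),
    "Asia": (27, 76.6),
    "Europe": (26, 75.5),
}


def polymorphism_to_divergence(segregating: int, divergence: float) -> float:
    """Ratio of within-species polymorphism to between-species divergence."""
    if divergence <= 0:
        raise ValueError("divergence must be positive")
    return segregating / divergence


ratios = {
    population: polymorphism_to_divergence(s, d)
    for population, (s, d) in TABLE2.items()
}

if __name__ == "__main__":
    for population, ratio in ratios.items():
        print(f"{population}: S/D = {ratio:.3f}")
```

These raw ratios are not comparable across populations without the matching reference-locus values, because an HKA test asks whether the ratio at the test locus departs from the ratio at the reference loci within the same population sample.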


[...] and thought to evolve neutrally, providing a suitable reference for the HKA test. Although HFE has a polymorphism to divergence ratio higher than that of the pooled locus pair data, particularly in Asians and Europeans, at least 1 locus pair has a ratio higher than that of HFE in each population. None of the HKA tests are significant, so we cannot conclude that the ratio of polymorphism to divergence is significantly higher for HFE.

For every SNP in humans, one of the alleles was present in the homologous position in chimpanzee. In all but three cases, the more common SNP allele in the pooled sample corresponds to the inferred ancestral allele. For these exceptions (SNP alleles 11204C, 519A, and 7451A), the derived allele frequencies are 58, 67, and 72%, respectively. Inference of ancestral state based on only one outgroup can be incorrect, but at least 3 derived neutral alleles out of 42 are expected to have >50% frequency in a sample of this size. No haplotypes in humans have the same configuration at the 41 polymorphic sites as has the chimpanzee; that closest to this configuration is haplotype 9, found exclusively in Africans, which differs at 3 polymorphic sites and has an additional 71 fixed differences from the chimpanzee sequence.

Because the C282Y hemochromatosis mutation was not observed in the random population samples, we sequenced two Caucasians known to be C282Y homozygotes and observed that this mutation occurs on the haplotype 3 background. Haplotype 3 is found in all three continental populations and is the second most common haplotype in Europeans. No additional polymorphisms were discovered by sequencing the complete 11,214 bp of these two homozygotes (four chromosomes), consistent with the previous conclusion (e.g., Ajioka et al. 1997) that samples carrying C282Y have a relatively recent common ancestry.

Number of haplotypes: Fu's (1997) Fs statistic compares the observed number of haplotypes in a sample to the number expected assuming an infinite-sites model of mutation under neutrality and no recombination and is useful for detecting population growth or hitchhiking. In each population as well as the pooled sample, no excess haplotype variation is observed at HFE, as seen by nonsignificant Fs values for all populations (see Table 1). In fact, in each case Fs is positive and therefore indicates a deficit of haplotypes given the observed level of nucleotide diversity. This finding is surprising, as both recurrent mutation and recombination have likely affected the samples, and both are expected to produce additional haplotypes. The deficit of haplotypes in this case suggests that either recombination and recurrent mutation have not increased the haplotype diversity of these samples greatly or other forces have kept the number of observed haplotypes low. However, Strobeck's statistic (1987), which tests for the opposite pattern of observing too few haplotypes, is also not significant for [...] population expansion is apparent based on haplotype diversity in each population.

Haplotype network: Figure 2 displays an RM network of haplotypes constructed from the pooled samples. Most haplotypes are unique to one particular population, since the relatively small sample size of each population reduces the chance of including rare haplotypes in each population sample. But other features, such as a branch that leads to four haplotypes found exclusively in Africans and representing 40% of all African samples, suggest the population differentiation seen in this sample may be real. The branch leading to the chimpanzee haplotype (Figure 2, arrow) contains the fixed differences as well as site 11,204, which is polymorphic in humans. The presence of recombinant haplotypes complicates the inference of haplotype relationships and results in mutations that have occurred only once in the history of a sample to be displayed multiple times in the network. Of the 43 mutations inferred from the human sample (including the indel), 10 are found twice on the network. Excluding the loop, 5 mutations show up twice on the network, indicating either recurrent mutations or recombination events. The chance of recurrent mutation is appreciable, since 144 CpG sites are found in the region studied and two nucleotide sites segregate three alleles each. In fact, CpG sites have a mutation rate that is estimated from our data to be >15 times higher than that of other nucleotide sites, similar to results for the LPL gene (Templeton et al. 2000). A similarly high rate of mutation at these sites is apparent from the divergence data with chimp. Both triallelic SNPs are found in CpG sites, with 1 of the 2 inferred mutations at each site consistent with a transition from 5-methylcytosine to thymine. Of the diallelic sites that occur more than once in the network, 3 out of 10 are CpG sites but none are consistent with this type of transition. This makes recombination a more likely cause of the observed homoplasy. By pruning certain haplotypes from the network we can reduce the number of homoplasic sites and find evidence for which haplotypes are recombinant. Potential recombinant haplotypes include 17 and 19, both occurring only once and generating recurrent mutations. Haplotypes 1, 12, and 18 may all stem from one or more recombination events, as their exclusion will remove most instances of recurrent mutation. Potentially ancient recombination events that created haplotypes found at high frequency today make it difficult to produce a more accurate genealogy of HFE alleles. But the network facilitates the design of strategies to infer HFE haplotypes from samples by typing only a few polymorphisms.

Population subdivision: To investigate population differentiation at HFE, we describe the unequal distribution of variation among populations. In the sample, 19 (44%) derived SNP alleles were found in all three populations, while 11 were restricted to African populations [...]

Figure 2. RM network of HFE haplotypes. Mutational relationships are indicated by lines linking the 19 unique haplotypes, represented in the network as circles. The size of each circle is proportional to the relative frequency of the haplotype in the total sample. Circle fill patterns show in which population(s) each haplotype was observed, with pie diagrams used for haplotypes found in multiple populations. Mutational differences between haplotypes are indicated on the branches of the network (amino acid replacement mutations are blocked, and mutations found on the network more than once are underlined). The arrow points to the node where the chimpanzee haplotype connects to this network.

tions (a total of 20 or 47% restricted to one population). Recombination and linkage disequilibrium:Since we have unambiguously determined the haplotypes atHFE, Also, European and Asian populations shared 4 SNP

alleles that were not found in Africa, while Africans did we are more confident in making conclusions about the evidence for recombination and LD in the region. When not share any SNP alleles with only Europeans or Asians.

This supports the shared ancestry of the Asian and Euro- four gametic combinations are found in a sample of two-site haplotypes, this is an indication of recombination pean samples after their split from African populations,

which is also apparent since Europeans and Africans (crossing over between the two sites or gene conversion) or repeated mutation. Even with moderate mutation share 1 haplotype in common and Europeans and

Asians share 2. When alleles are grouped together as rate heterogeneity, one can assume the probability of repeated mutation at any one site is very small since this haplotypes, each haplotype is expected to have a more

limited distribution. We see 14 haplotypes (74%) probability is no greater than that of a single mutation at the same site. A moderate level of recombination is unique to single populations (7 in Africa, 3 in Asia, and

4 in Europe) while all three populations share only 2 consistent with the results of this four-gamete test, in which 25/528 site pairs (excluding singletons) have all haplotypes.

Wright’s (1931)FSTstatistic serves to quantify popu- four gametes found in the pooled sample (Table 4). A minimum number of recombination events (RM) can lation differentiation by expressing the genetic variance

among populations divided by the genetic variance of be inferred from the data to explain all instances of four gametes (Hudson andKaplan1985). The Asian the total population. Using an AMOVA analysis based

on polymorphic sites and in which our samples were and European populations show a number of site pairs with four gametes and, therefore, nonzeroRM’s (Table split into three continental groups (Weir 1996), we

estimate anFSTvalue of 0.23 (P⬍0.00001, by haplotype 1). Although no site pairs have four gametes in the African sample (RM⫽0), recombination may still have permutation). This value exceeds the average calculated

for many other genes (Cavalli-Sforzaet al. 1994). The occurred. In the pooled sample, one inferred crossing-over event is between sites⬍1 kb apart (6567 and 7451). FST estimate based on treating the region as a single

locus with many alleles corresponding to each different haplotype is 0.17 (P⬍0.00001). For both calculations,

TABLE 3 continental populations are found to be significantly

differentiated at HFE. Table 3 shows two estimates of Pairwise estimates of population subdivision (FST) pairwise populationFSTvalues. The comparison of

Afri-Africa Asia Europe

can and Asian samples produces the highest values. The Asian samples are distinguished by the extremely high

Africa — 0.352a _0.143

frequency (13/20) of haplotype 1, which is absent from _Asia _0.237b _— _0.151

Africans and at low frequency in Europeans. The African _Europe _0.060 _0.229 _— samples tend to carry haplotypes unique to Africa (15/

a_{Above-diagonal entries are generated from sequence data}

20), and some of these haplotypes are not closely related

as described in Hudsonet al. (1992).

to European and Asian haplotypes (see Figure 2). Both b_{Below-diagonal entries are generated by Arlequin AMOVA}

of these patterns contribute to the higher-than-average _{analysis on the basis of haplotype frequencies (W}_eir _1996; Schneideret al. 2000).
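The above-diagonal entries of Table 3 are described as sequence-based estimates following Hudson et al. (1992). A minimal sketch of one estimator of that kind, FST = 1 - Hw/Hb, is given here, where Hw is the mean number of pairwise differences between chromosomes from the same population and Hb is the mean between chromosomes from different populations. The toy haplotype strings, the function names, and the unweighted averaging of the two within-population means are illustrative assumptions, not the authors' data or code.

```python
from itertools import combinations, product


def n_diffs(seq_a, seq_b):
    """Number of sites at which two aligned haplotypes differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mean_within(pop):
    """Mean pairwise differences among distinct chromosomes of one population."""
    pairs = list(combinations(pop, 2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop_1, pop_2):
    """Mean pairwise differences between chromosomes of two populations."""
    pairs = list(product(pop_1, pop_2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop_1, pop_2):
    """Sequence-based FST = 1 - Hw / Hb for a pair of populations."""
    # Assumption: the two within-population means are averaged without weights.
    h_w = (mean_within(pop_1) + mean_within(pop_2)) / 2
    h_b = mean_between(pop_1, pop_2)
    if h_b == 0:
        raise ValueError("no between-population differences; FST is undefined")
    return 1 - h_w / h_b


# Hypothetical toy haplotypes (0 = ancestral, 1 = derived); not HFE data.
pop_a = ["0000", "0000", "0001"]
pop_b = ["1110", "1110", "1100"]
fst = hudson_fst(pop_a, pop_b)
```

Each population needs at least two chromosomes for the within-population mean to be defined. Values near 0 indicate little differentiation and values near 1 indicate nearly fixed differences.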


TABLE 4

Significant pairwise linkage disequilibria and four-gamete site-pair counts

                                            African   Asian   European   Pooled
Total pairwise four-gamete comparisons        378      351      351       780
Total excluding singletons                    231      120      171       528
No. with four gametes                           0       13       18        25
Total pairwise LD comparisons                 378      351      351       780
No. with power to detect LD (P < 0.001)        55       12       52       310
No. significant at P < 0.001                   35        8       21        86
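The two kinds of quantities tabulated in Table 4 can be illustrated with a short sketch: the four-gamete test on pairs of non-singleton sites, and Lewontin's |D'| as a measure of pairwise LD. The sample below is a hypothetical set of biallelic haplotypes coded 0/1, and the function names are made up for this illustration. Triallelic sites and significance testing are not handled, so this is not a reconstruction of the authors' analysis.

```python
from itertools import combinations


def informative_sites(haplotypes):
    """Indices of biallelic sites whose minor allele occurs at least twice."""
    n = len(haplotypes)
    sites = []
    for s in range(len(haplotypes[0])):
        derived = sum(h[s] == "1" for h in haplotypes)
        if min(derived, n - derived) >= 2:  # drop singletons and fixed sites
            sites.append(s)
    return sites


def has_four_gametes(haplotypes, i, j):
    """True if all four two-site combinations (00, 01, 10, 11) are observed."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def abs_d_prime(haplotypes, i, j):
    """Lewontin's |D'| for two biallelic sites, from phased haplotypes."""
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n
    p_j = sum(h[j] == "1" for h in haplotypes) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ij - p_i * p_j
    if d >= 0:
        d_max = min(p_i * (1 - p_j), (1 - p_i) * p_j)
    else:
        d_max = min(p_i * p_j, (1 - p_i) * (1 - p_j))
    return abs(d) / d_max if d_max > 0 else 0.0


# Hypothetical phased sample of six chromosomes at three sites; not HFE data.
sample = ["000", "011", "100", "111", "000", "111"]
sites = informative_sites(sample)
pairs = list(combinations(sites, 2))
four_gamete_pairs = [p for p in pairs if has_four_gametes(sample, *p)]
```

For k retained sites there are k(k-1)/2 pairs to compare. The pooled total of 528 comparisons excluding singletons in Table 4 equals 33 x 32 / 2.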

Figure 3. Significance of pairwise linkage disequilibrium for the European samples. Cell shading highlights site pairs with higher significance levels (as assessed by Fisher's exact tests without any correction for multiple testing). Singleton sites have been removed. Site 7633 is triallelic in Europeans; given our data we infer the order of the two mutations at this site to be C to G and then G to A. We therefore code all 7633A alleles as 7633G alleles with another mutation at the [...]

All instances of site pairs with four-gamete types in which the sites flank both sides of this interval (22/25) could be explained by this one recombination event.

Since RM gives only a lower bound on the amount of recombination, we can use other methods to estimate the population recombination parameter C (= 4Nr, where N is the effective population size and r is the per-locus recombination rate per generation). Estimators of C that use patterns of sequence variation can avoid inaccuracies due to local variation in recombination rates found in estimates based on observed crossing-over events between distant markers. CHRM, which uses the observed number of haplotypes and RM from data to estimate C under a model without gene conversion, performs well against other estimators of C (Wall 2000). It is ~5 for HFE in different populations, although it is 0 in Africans, due to their lack of four-gamete site pairs (Table 1). Hudson (2001) has recently developed a composite-likelihood method for estimating C, ρCL, which performs better than another commonly used estimator of C (Hudson 1987) and that Frisse et al. (2001) have used to estimate crossing over and gene conversion rates in the human genome. When the gene conversion rate is held at 0, then this method's estimates of C are in good agreement with those produced by CHRM (Table 1). In the African sample, the likelihood of the ρCL estimate is not much greater than that of ρCL = 0, indicating little evidence for recombination and consistent with the CHRM value. This result is unusual, as the study of Frisse et al. (2001) found that this same estimate for the pooled data of 10 loci is much higher in Africans than in non-Africans. Their higher population recombination parameter estimate in Africans is also consistent with the finding that LD decays more rapidly in Africans than in non-Africans (Tishkoff et al. 1996, 1998, 2000; Kidd et al. 1998, 2000; Mateu et al. 2001; Reich et al. 2001).

A moderate level of LD is observed throughout the region, consistent with the estimates of the population recombination parameter. Due to the low frequency of most polymorphisms and our modest sample size, most pairwise LD comparisons (Table 4) do not have the power to detect significant LD at the 0.001 level (a [...] considered). The pooled sample has the most power to detect LD, but pooling can also cause spurious LD due only to allele frequency differences between populations. D′ values for all alleles that are found at least twice in each of a pair of populations are highly correlated between populations (African vs. European, r = 0.999; African vs. Asian, r = 0.888; Asian vs. European, r = 0.718), with no cases of significant LD in opposite directions for pairs of populations. The high correlations are not surprising since most pairs of alleles are in complete LD in each population.

Figure 3 shows the location of site pairs in significant LD for the European population. LD appears evenly distributed throughout the region, with a minor concentration of significant LD at the 5′ end of the gene, particularly among two sites in the first intron and two sites in the 5′ flanking region, which have a perfect


association (sites 9013, 10,047, 10,701, and 11,204). Plots from the other populations (not shown) reveal a similar, even distribution of LD, providing no evidence of a recombination hotspot within this region. The rate of decay of LD with distance varies depending on the measure of LD used and the minimum frequency of variants included in the analysis. We have plotted |D′| from the pooled population vs. distance for variants at 25% frequency or greater (Figure 4). This plot indicates that LD falls to one-half of its maximum value at ~6 kb, a rate of LD decline that lies in between that of the ACE and LPL regions (Figure 1 in Przeworski et al. 2000). Intermediate frequency variants tend to be the oldest ones in the total population, and therefore time has allowed more recombination events to break up their association. For lower-frequency variants, almost all site pairs have |D′| = 1. As expected, when a broader range of SNP frequencies is considered (10% and higher, see Figure 4), the distance at which LD reaches one-half of its maximum value substantially increases (>11 kb).

Figure 4. Pairwise linkage disequilibrium as a function of physical distance. LD is measured by |D′|, which varies between 0 and 1, calculated for the pooled sample. Solid circles show |D′| only for pairs of polymorphic sites with both minor allele frequencies at 25% or greater. Solid triangles represent the average of these points in windows of 1 kb. Open triangles represent the average |D′| in windows of 1 kb when polymorphisms with minor allele frequency at 10% or greater are considered.

DISCUSSION

We have performed a full resequencing survey of nucleotide variation at the HFE locus using nonclinical samples from three major human population groups. The benefit of including noncoding variation in the study is that these polymorphisms are often more [...] making them more informative in resolving which evolutionary forces have affected patterns of variation and governed the fate of alleles that alter the amount or function of a protein. Another significant aspect of our study is the experimental determination of haplotypes for an autosomal gene. These haplotypes provide a level of resolution greater than that of SNPs alone when drawing conclusions from genomic variability.

Summary of HFE variation: The results of any survey of population variation can be roughly separated into three categories: the level of variation, the frequency spectrum of that variation, and the haplotype structure of the variants. These categories are not independent, as they all reflect the underlying population history of a sequence of DNA, but they do capture different aspects of the data. Before this study, we had little information about the level of nucleotide variation at HFE. Protein polymorphisms seemed few and rare, but this could be due to high conservation imposed by the protein function and the slightly deleterious nature of most amino acid substitutions. The HFE gene is thought to lie in or near a region of low recombination, as Malfroy et al. (1997) have estimated a rate of 0.2 cM/Mb for the region between HLA and HFE. Despite this fact, we found that its level of polymorphism is about average for the genome. Both hitchhiking and background selection are expected to lower neutral variation in regions of low recombination and produce a positive correlation between levels of recombination and variation (Maynard Smith and Haigh 1974; Charlesworth et al. 1993). Evidence for this correlation in humans is debated, but the increasing number of polymorphism studies at individual loci and SNP data gathered from the Human Genome Project may resolve this debate (reviewed in Nachman 2001). If the correlation holds in humans, this suggests that either recombination in and directly flanking HFE is not severely reduced or a simple model of either genetic hitchhiking or background selection may not apply to this region. This second alternative might result from the nature of diversifying selection acting on the linked HLA region. This could produce a deeper genealogy for the HFE locus and raise its level of variation. When contrasted with the below-average divergence from chimpanzee, HFE variation appears slightly increased. This is not reflected in the HKA test and due to the stochastic nature of the mutation process may not have any real biological basis.

Test statistics reveal no major deviations from the equilibrium neutral expectation in the frequency spectrum of SNP alleles, although when the haplotype structure is considered, the Asian population shows an unusual pattern that is not seen in Africans or Europeans and may be consistent with a founder effect or hitchhiking. Thus, HFE provides no evidence for the long-term growth of the human population. However, on the basis of their study, Frisse et al. (2001) conclude that non-[...]


[...] summarize results that support a bottleneck in these populations but point out that more complicated demographic scenarios must be invoked to account for all of their non-African data. This appears true of the HFE data as well, as our non-Africans do not show the expected deficit of rare variants produced by a bottleneck.

HFE haplotype structure reveals some evidence for recombination, although fewer haplotypes than expected are observed on the basis of the number of variants. Przeworski and Wall (2001) have noted that estimates of C derived from sequence variation for human genes tend to be much higher than those based on experimentally measured crossing-over rates. This pattern is true for the HFE gene if we assume an effective population size of 10^4 and a crossing-over rate that is lower than the genomewide average (0.2 cM/Mb for the region between HLA and HFE; Malfroy et al. 1997). But, if the crossing-over rate in the HFE gene were 1 cM/Mb (five times the estimated value), then the estimate of C would be 4.5, in good agreement with CHRM. Recent studies (Ardlie et al. 2001; Frisse et al. 2001; Przeworski and Wall 2001) find evidence that actual sequence data sets on an intragenic scale are more likely under a model of gene conversion and crossing over than under a model of crossing over alone. The data for HFE do not provide sufficient power to reject a model of crossing over without gene conversion, although gene conversion is known to have played a role in the allelic diversity of HLA genes (e.g., Zangenberg et al. 1995).

Our study detects the second most common HFE variant, H63D, at a moderate frequency in Europeans. Other reports have found this variant outside of Europe at a frequency consistent with gene flow from the Mediterranean region. We find this mutation on two different haplotypes, consistent with the conclusion that this allele has an origin much older than that of C282Y. The fact that HFE variation fits an equilibrium neutral model does not conclusively resolve the question of the history and potential fitness effects of HFE amino acid polymorphisms. Amino acid polymorphisms represent only a small proportion of HFE variation. Additionally, if the C282Y allele has been the target of positive selection, it is still far from fixation in any population, and therefore we expect the signature of this selection will be very subtle. Diversifying selection rapidly changing the frequency of alleles at linked HLA loci could have affected the frequency of several alleles at the HFE locus. But after investigating the pattern of HFE haplotype diversity, only haplotype 1 in the Asian populations (discussed below) indicates the lack of variation relative to the frequency of an allele class expected under this type of scenario, providing little evidence that the C282Y allele has increased in frequency due to this effect. An [...] been acting on the protein over the past few million years, although this does not exclude the possibility of much more recent or weaker selection on polymorphism.

The effect of sampling strategy: The structure of our population samples represents a compromise between sampling intensely from a few populations and broadly surveying many populations. For a number of analyses, samples are pooled on the basis of their continental origin as is frequently done for humans. This can have an effect on results when significant genetic subdivision exists among populations within a continent. Many genetic surveys are consistent with subdivision stronger in Africans than in Asians or Europeans (e.g., Jorde et al. 2000), although Zhao et al. (2000) suggest that the frequency spectrum of variation of a presumably neutral noncoding region on chromosome 22 indicates the opposite pattern. As implied by the study of Fu (1996) and demonstrated by Yu et al. (2001), population subdivision tends to reduce the proportion of low-frequency variants and therefore make Fu and Li's D positive. A positive DFL is found only in our African samples, while only the Africans have a significant FST value when it is calculated for individual populations. This indicates that the effect of population subdivision within continents may be limited to results from these samples. Additionally, a number of points indicate that our sampling scheme is not solely responsible for the observed patterns of variation within and between populations. First, for each continental group the sampling scheme is similar, with approximately one-half of the samples coming from one focal population and the other one-half pooled from several populations (or in the case of Europeans, from samples of highly mixed ancestry). Second, unusual patterns in our data are not due to the choice of the focal population for each continental group. In general, focal populations have slightly fewer SNPs and haplotypes than the other samples from the same continent, as would be expected given some amount of population differentiation within each continent. For example, the Asian samples have the lowest nucleotide diversity. This does not result from low diversity found only in the focal Chinese population, as the non-Chinese Asian samples have only a slight increase in nucleotide diversity (0.00058 vs. 0.00053 for the Chinese). The Asian sample is also characterized by low haplotype diversity, even given their low nucleotide diversity. The Chinese samples contain as many different haplotypes (four) as the non-Chinese Asian sample, and the high frequency of haplotype 1 is not unique to the Chinese sample (0.6 in Chinese vs. 0.7 in non-Chinese Asian samples). However, whether different sampling designs can lead to quite different conclusions about human diversity is a question worthy of further study.
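The comparison made earlier in this discussion between crossing-over rates and the sequence-based estimate of C can be checked with a few lines of arithmetic, using C = 4Nr as defined in the text. The sketch below assumes that the per-locus rate r is the per-base rate multiplied by the 11,214 bp that were resequenced; that choice of locus length and the function name are assumptions made for illustration only.

```python
def population_recombination(n_e, cm_per_mb, length_bp):
    """C = 4 * N * r, with r the per-locus crossing-over rate per generation."""
    # 1 cM/Mb corresponds to 0.01 crossovers per 1,000,000 bp, i.e. 1e-8 per bp.
    r_per_bp = cm_per_mb * 0.01 / 1_000_000
    return 4 * n_e * r_per_bp * length_bp


N_E = 10_000        # effective population size assumed in the text (10^4)
LENGTH_BP = 11_214  # assumption: locus length taken as the resequenced region

# Rate estimated for the HLA-HFE interval (0.2 cM/Mb) and five times that rate.
c_at_estimated_rate = population_recombination(N_E, 0.2, LENGTH_BP)
c_at_five_fold_rate = population_recombination(N_E, 1.0, LENGTH_BP)
```

Under these assumptions the five-fold rate gives a value that rounds to the 4.5 quoted in the text, while the rate estimated for the HLA-HFE interval gives a value below 1.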


[...] Cavalli-Sforza et al. (1994) have reported average values of FST for a large number of DNA polymorphisms (0.139) and non-DNA polymorphisms (0.119). In most cases, less differentiation is seen between populations within continents than between continents, consistent with simple isolation by distance. Wakeley (1999) has found evidence that, as might be expected, the level of migration in humans underwent an increase in the past. At the HFE locus, significant genetic differentiation is seen among continental populations, and FST is higher than the average values reported by Cavalli-Sforza et al. (1994). When FST for HFE is recalculated by including only the Mbuti, Chinese, and Utah samples it is actually higher (0.27 vs. 0.23), so that this high value is not an artifact of the diverse sample composition from each continent. However, these populations do not necessarily represent all of the variation within each continent, making a comparison with the results from a larger number of populations difficult.

The comparison of variability in different populations can provide evidence for local adaptive evolution but can be complicated by population history. The greater variability of populations from sub-Saharan Africa is seen in almost every study, and African samples have been contrasted to non-African samples to reveal differences in their population history. In a recent study of >300 genes (Stephens et al. 2001), the African-American sample was found to have >1.5 times the number of both SNPs and haplotypes than either the Asian or the Caucasian sample. Also, the African-American sample had >2 times the number of rare alleles than the other samples. Results from HFE show the most variation in African samples, but this difference is minor, with Asians and Europeans having nearly as many SNPs. Africans also have fewer rare SNPs compared to the other populations, in contrast to the Stephens study (population subdivision within our African samples could contribute to this result). Thus, the contrast between African and non-African samples is very subtle, becoming apparent only in the measure of nucleotide diversity since the African SNP alleles tend to be shifted toward intermediate frequencies. Due to the similar level of variability in the different populations and the finding of SNPs and haplotypes specific to each population, non-African variation is not just a subset of African variation.

[...] between estimates of the effective population size based on variation vs. recombination data would suggest a departure from an equilibrium neutral model of evolution. This finding is remarkable, since similar departures are not often found in African populations (e.g., Frisse et al. 2001). Wall (2001) has reported that population structure and population bottlenecks can lead to underestimates of the population recombination parameter. Our finding could result if our sampling strategy has captured population structure in Africa more than previous studies. However, of the above-cited references, most share our strategy of broadly sampling multiple populations within continents.

The Asian samples provide a more striking contrast to the other populations. They have the lowest variation, due to both fewer observed haplotypes and the low frequency of many SNPs. Over 60% of SNP alleles in Asians are rare (at 10% frequency or lower). Remarkably, when the direction of mutation is inferred from the chimpanzee sequence, nearly 30% of Asian derived SNP alleles are at a frequency >60%. This proportion of high-frequency derived alleles could result from a past hitchhiking event, although the H statistic does not reach significance. The low number of haplotypes observed in Asians and the predominance of haplotype 1 (rarely found outside of Asia) suggest that this haplotype has risen to a high frequency rapidly since the split of the Asian populations from the other populations studied. The large number of low-frequency variants could mean that, had a selective sweep occurred, it happened long enough ago that many new segregating sites have since arisen and thus decreased the power of the H test (Przeworski 2002). However, when the pattern of haplotypes in all samples is considered, the haplotype diversity of Asia is not consistent with mutation arising solely from the haplotype 1 background, but instead suggests haplotype 1 never reached fixation in Asians or migration from other populations has introduced new haplotypes. The pattern of haplotype variation in Asians is also consistent with drift following a relatively strong founder event for this group.

The high frequency of haplotype 1 reveals the advantage of haplotype data over SNP sharing when assessing the similarity of populations. Haplotype 1 can explain a great deal of the population subdivision seen at HFE and can provide clues to how the interaction of selection with population history may have differed for individual [...]


Rochette et al. (1999) have observed this allele at 8% in French, 20% in Sri Lankans, and 32% in Burmese. This report is consistent with our observation of haplotype 1 at low frequency in Europeans and also suggests an increasing gradient in this allele moving east in Eurasia. Beutler and West (1997) have also found a high frequency of this SNP allele in Asians (63%), although they observe it at nearly 13% in Europeans and at 15% in a small sample of African-Americans. These higher frequencies outside of Asia make the Asian-specific origin of SNP 4600G less likely. The Beutler and West study infers HFE haplotypes defined using SNP 4600 and two additional sites. In Asian samples 4600G is always (34/34) found on a haplotype consistent with our haplotype 1, while in the Caucasian samples, 4600G is inferred to appear on haplotype 1 in 7 of 9 cases. This pattern is consistent with a more recent increase in 4600G in Asia, since it has not had time to recombine onto other haplotypes, and supports the idea of a founder effect following a bottleneck or a hitchhiking event.

Conclusions: The discovery of these additional polymorphisms in the HFE region and the haplotypes they create may help identify other alleles that have an effect on the iron regulation phenotype. Several polymorphisms described in this study are found in the 3′ UTR of the messenger RNA and could conceivably affect mRNA stability or levels of protein translation. Regulatory regions that affect levels of transcription can be found in introns or flanking a gene, where most of our polymorphisms are found. Also, tightly linked regulatory polymorphisms could be in disequilibrium with our observed haplotypes. The effects of these polymorphisms in regulatory regions are likely to be quite subtle, but they could help explain the finding of hemochromatosis in individuals that carry only one copy of a hemochromatosis-associated allele.

Finally, knowing more about the levels of variation and recombination at HFE will help evaluate the peculiar pattern of high frequency and young age estimated for the C282Y allele. Evidence for intragenic recombination at HFE has been sparse. Our estimates of local recombination rates based on sequence variation allow a reevaluation of the high LD around the C282Y allele, providing support for its young age. Although this pattern seems consistent with a selective advantage for C282Y, until an appropriate population genetic test for positive selection is performed, alternative causes of the pattern such as drift or a population bottleneck or growth cannot be ruled out. We have begun to use the polymorphisms and haplotypes reported here to study the extent of LD between HFE alleles besides C282Y and markers in the several megabases around HFE. By analyzing LD found around other HFE alleles, as well as around alleles produced by neutral coalescent simulations for different demographic parameters, we can evaluate [...] relative to other possible explanations of the allele's high frequency given its young apparent age.

We thank all those who agreed to donate DNA for this study and Ami Rice for help collecting samples. Additionally, we thank R. Ajioka and L. Jorde for providing European DNA samples from hemochromatosis pedigrees; D. Ledbetter for the chimpanzee sample; J. Fay and R. Hudson for providing computer programs; E. Stahl, M. Fullerton, and members of A. Di Rienzo's laboratory for helpful discussions; J. Comeron for help with the analysis of divergence from chimp and mouse; and J. Comeron, A. Di Rienzo, K. Dyer, M. Hamblin, and two anonymous reviewers for helpful comments on the manuscript. This work was supported by National Institutes of Health grant GM39355 to M.K. and a National Science Foundation Doctoral Dissertation Improvement Grant (DEB-0073297) to C.T. and M.K. C.T. was partially supported by a Howard Hughes Medical Institute predoctoral fellowship and by National Institutes of Health training grant T32 GM07197 (genetics and regulation).

LITERATURE CITED

Ajioka, R. S., L. B. Jorde, J. R. Gruen, P. Yu, D. Dimitrova et al., 1997 Haplotype analysis of hemochromatosis: evaluation of different linkage-disequilibrium approaches and evolution of disease chromosomes. Am. J. Hum. Genet. 60: 1439–1447.

Ardlie, K., S. N. Liu-Cordero, M. A. Eberle, M. Daly, J. Barrett et al., 2001 Lower-than-expected linkage disequilibrium between tightly linked markers in humans suggests a role for gene conversion. Am. J. Hum. Genet. 69: 582–589.

Bandelt, H. J., P. Forster, B. C. Sykes and M. B. Richards, 1995 Mitochondrial portraits of human populations using median networks. Genetics 141: 743–753.

Barton, J. C., R. Sawada-Hirai, B. E. Rothenberg and R. T. Acton, 1999 Two novel missense mutations of the HFE gene (I105T and G93R) and identification of the S65C mutation in Alabama hemochromatosis probands. Blood Cells Mol. Dis. 25: 147–155.

Beutler, E., and T. Gelbart, 2000 A common intron 3 mutation (IVS3 -48c→g) leads to misdiagnosis of the c.845G→A (C282Y) HFE gene mutation. Blood Cells Mol. Dis. 26: 229–233.

Beutler, E., and C. West, 1997 New diallelic markers in the HLA region of chromosome 6. Blood Cells Mol. Dis. 23: 219–229.

Beutler, E., T. Gelbart, C. West, P. Lee, M. Adams et al., 1996 Mutation analysis in hereditary hemochromatosis. Blood Cells Mol. Dis. 22: 187–194.

Beutler, E., C. West and T. Gelbart, 1997 HLA-H and associated proteins in patients with hemochromatosis. Mol. Med. 3: 397–402.

Brookes, A. J., H. Lehväslaiho, M. Siegfried, J. G. Boehm, Y. P. Yuan et al., 2000 HGBASE: a database of SNPs and other variations in and around human genes. Nucleic Acids Res. 28: 356–360.

Camaschella, C., A. Roetto, A. Cali, M. De Gobbi, G. Garozzo et al., 2000 The gene TFR2 is mutated in a new type of haemochromatosis mapping to 7q22. Nat. Genet. 25: 14–15.

Carella, M., L. D'Ambrosio, A. Totaro, A. Grifa, M. A. Valentino et al., 1997 Mutation analysis of the HLA-H gene in Italian hemochromatosis patients. Am. J. Hum. Genet. 60: 828–832.

Cavalli-Sforza, L. L., P. Menozzi and A. Piazza, 1994 The History and Geography of Human Genes. Princeton University Press, Princeton, NJ.

Charlesworth, B., M. T. Morgan and D. Charlesworth, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134: 1289–1303.

Chen, F.-C., and W.-H. Li, 2001 Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet. 68: 444–456.

Comeron, J. M., 1999 K-Estimator: calculation of the number of nucleotide substitutions per site and the confidence intervals. Bioinformatics 15: 763–764.

de Villiers, J. N. P., R. Hillermann, L. Loubser and M. J. Kotze, 1999 Spectrum of mutations in the HFE gene implicated in haemochromatosis and porphyria. Hum. Mol. Genet. 8: 1517–1522.