Distribution of genome-wide linkage disequilibrium based on microsatellite loci in the Samoan population
Hui-Ju Tsai,1 Guangyun Sun,2 Diane Smelser,2 Satupaitea Viali,3 Joseph Tufa,4 Li Jin,2 Daniel E. Weeks,1,5*† Stephen T. McGarvey6† and Ranjan Deka2†

1 Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15261, USA
2 Department of Environmental Health, University of Cincinnati, Cincinnati, OH 45267, USA
3 Department of Health, Apia, Samoa
4 Department of Health, Pago Pago, American Samoa
5 Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
6 International Health Institute and Department of Community Health, Brown University, Providence, RI 02912, USA
* Correspondence to: Tel: +1 412 624 5388; Fax: +1 412 624 3020; E-mail: [email protected]
† These three authors have contributed equally to and co-directed this study.
Date received (in revised form): 28th May 2004
Abstract
Whole genome-wide scanning for susceptibility loci based on linkage disequilibrium (LD) has been proposed as a powerful strategy for mapping common complex diseases, especially in isolated populations. We recruited 389 individuals from 175 families in the US territory of American Samoa, and 96 unrelated individuals from American Samoa and the independent country of Samoa, in order to examine background LD by using a 10 centimorgan (cM) map containing 381 autosomal and 18 X-linked microsatellite markers. We tested the relationship between LD and recombination fraction by fitting a regression model. We estimated a slope of −0.021 (SE 0.00354; p < 0.0001). Based on our results, LD in the Samoan population decays steadily as the recombination fraction between autosomal markers increases. The patterns of LD observed in the Samoan population are quite similar to those previously observed in Palau but contrast markedly with those observed in a non-isolated Caucasian sample, where there is essentially no marker-to-marker LD. Our analyses support the hypothesis of a recent bottleneck, which is consistent with the known demographic history of the Samoan population. Furthermore, population substructure tests support the hypothesis that self-identified Samoans represent one homogeneous genetic population.
Keywords: association mapping, linkage disequilibrium, isolated population, Samoa
Introduction
Linkage disequilibrium (LD), the non-random association between loci, can aid in genetic mapping of complex diseases. It has been proposed that, with high-density genetic maps, genome-wide LD scanning can be a powerful approach for searching for susceptibility genes determining complex diseases.1 There has been considerable debate about the usefulness of isolated populations for mapping susceptibility genes determining complex diseases2,3 and about whether the extent of genome-wide LD is indeed larger in such populations compared with general populations.4–6 Boehnke7 notes that negative findings from a few populations, or simulation results,8 should be interpreted with caution and that, even if the extent of LD is similar across populations, the benefits of an isolated population living in a relatively homogeneous environment and the ease of study should not be ignored.9–13
The Samoans of the Western Pacific represent one of the best examples of an isolated population. Archaeological and linguistic evidence14 indicates a rapid eastward migration of populations into the western Pacific from Southern China, which took place about 4,000 to 5,000 years before the present (BP). By about 3,000 years BP, archaeological evidence indicates that Polynesian culture was established and flourished in Tonga and Samoa, well before further eastward expansion.15,16 This Express Train model of Polynesian settlement17 is supported by mtDNA data.18–20 An alternative model, the Entangled Bank, proposed by Terrell,21 asserts a neighbouring homeland for the Polynesians in Melanesia, in which the Polynesians evolved in a complex nexus of interactions among the already settled Pacific islanders. This is supported by nuclear DNA data.22 Our previous work based on [...] the Samoans.23 Samoans had only four haplotypes out of the 15 observed in the entire region of study comprising South-East Asia, Melanesia, Micronesia and Polynesia. In other work with microsatellite data, we also reported a significant reduction in genetic diversity among the Samoans compared with large cosmopolitan populations.24–26 Further analysis of microsatellite data indicated a small effective population size and associated bottleneck events during Samoan history.27
Reconstruction of the population history of the Samoan islands is difficult, and estimates of population size through time, including at the time of first European contact and for the subsequent 100 years, remain debatable.28,29 Nonetheless, it is important to describe the archaeological evidence and population genetic interpretations of Samoan demographic history. The original settlers of Polynesia were thought to be small in number; however, the ideal nutritional and disease ecology allowed rapid and sustained population growth up until the time of first European contact.16 Detailed and well-dated archaeological and ethnohistorical evidence indicates that, prior to European contact, Samoan villages were located at all levels on the mountain slopes (including at the top) in both what is now Samoa and American Samoa.15,16 The archaeological findings of widespread population, beyond the littoral area, are consistent with estimates of the maximum population density, or human carrying capacity, derived from ethnographic work on agricultural intensity and population on other Polynesian islands.16 Based on the wealth of archaeological and ethnographic data, contemporary scholars assert that the pre-European contact population of the Samoan islands could not have comprised fewer than 100,000 people and could have been as large as 300,000 people.15,16
The Samoan population suffered a significant depopulation after European contact, which has been attributed largely to the impact of introduced diseases.16 The earliest reports of Samoan population size were made in the middle of the 19th century, many decades after European contact, which occurred first in 1722, and again in 1768 and 1787, with steady contact established only in 1836. From 1849 until 1900, the population size estimates for the Samoan islands vary between 30,000 and 40,000 individuals.28 Throughout the 19th century, there were documented epidemics of infectious disease, chiefly influenza, and estimates of high mortality attributable to these waves of disease.29,30 The impact of infectious diseases is assumed to have started soon after European contact in the later 18th century. By the early 20th century, the population had again increased. Approximate population sizes for the Samoan islands in the 20th century are provided by McArthur29 for several time periods; these include: about 39,800 for the period 1900–1910; about 50,000 in 1930; and about 69,500 in 1940. Thus, it is quite clear that a massive depopulation occurred in the Samoas after European contact and that population growth was stagnant until the early 20th century, when the population grew rapidly. Based on the 2000 census of American Samoa (performed with the aid of the US Census Bureau), the population of ethnic Polynesians in American Samoa is 55,704. Based on projections from the 2001 census of Samoa, the population of ethnic Samoans in 2003 was estimated at approximately 165,000 in independent Samoa.
The combination of genetic, archaeological and demographic evidence strongly suggests that the Samoan population was established by a relatively small number of founders, has been an isolated population and experienced a reduction in population size about 200–300 years ago. These population history events are very likely to have influenced the patterns of LD in the contemporary Samoan population.

In this study, we examined the distribution of genome-wide LD in Samoa as a function of recombination based on marker data from a 10 centimorgan (cM) genome-wide scan. We also tested for bottleneck effects and population substructure.
Materials and methods

Subjects
A sample of 390 individuals (201 males, 189 females) from 177 families was recruited in American Samoa using the Department of Health diabetes registry as part of a genome-wide study of type 2 diabetes susceptibility genes.31 Subjects were asked about their Samoan ancestry in order to limit study participation to those who reported that all four grandparents were of Samoan ethnicity, without European or Asian ancestry. Study protocols were approved by the Institutional Review Board of the Miriam Hospital, Providence, RI, USA. Written informed consent was obtained from all participants.

We also sampled 96 unrelated individuals (50 males, 46 females), 40 recruited from American Samoa and 56 from Samoa. None of these individuals had diabetes and they all self-reported that all four grandparents were Samoan. These samples were derived from our previous longitudinal study of adiposity and cardiovascular disease risk factors in American Samoa and Samoa from 1990 to 1995.32,33
Genotyping
Buffy coats were prepared from 10 ml of blood in the field following standard protocols and shipped on dry ice to the laboratory in Cincinnati. DNA was isolated from buffy coats using the Puregene DNA isolation kit (Gentra Systems Inc), quantitated and arrayed in 96-well microtitre plates.

The genome scan was conducted with the Applied Biosystems Inc (ABI) PRISM linkage mapping set version 2, consisting of 400 microsatellite markers, with fluorescently labelled primers, spaced at an average distance of 10 cM between markers. We conducted multiplex polymerase chain reaction (PCR) amplification (three to five markers in each PCR reaction) in a 7.5 µl final reaction volume containing ~20 ng of genomic DNA and ~0.8–1.0 µl of AmpliTaq Gold DNA polymerase (5 U/µl). Initial incubation occurred for 12 minutes at 95°C; the first amplification was carried out for approximately 10–15 cycles in the PE GeneAmp 9600 thermal cycler using the following parameters: denaturation at 94°C for 15 seconds, annealing at 55°C for 15 seconds and extension at 72°C for 30 seconds. The second amplification was carried out for approximately 20–23 cycles using the following parameters: first denaturation at 89°C for 15 seconds, annealing at 55°C for 15 seconds and extension at 72°C for 30 seconds; then denaturation at 94°C for 15 seconds, annealing at 55°C for 15 seconds and extension at 72°C for 30 seconds; and then a final extension at 72°C for ten minutes and an overnight incubation at 4°C.

Amplified DNA products underwent gel electrophoresis on an ABI 377 DNA sequencer using internal standard GeneScan-400 ROX (PE Applied Biosystems) for 2.5 hours at constant power (3000 V, 60 mA, 200 W) and at 51°C. For quality control, a negative control and two positive control samples of known genotype [Centre d'Etude du Polymorphisme Humain (CEPH) sample 1347-02] were run on each gel. GeneScan 3.1 and Genotyper 2.5 (PE Applied Biosystems) software were used for sizing and genotyping, respectively.
Statistical methods
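Measure of LD. We computed a multi-allelic version of the $D'$ statistic,34 which we define here using the notation found in the GOLD program documentation.35 Consider two markers A and B. Let $n_{i\cdot}$ be the number of haplotypes carrying allele $i$ at locus A, $n_{\cdot j}$ be the number of haplotypes carrying allele $j$ at locus B and $n_{ij}$ be the number of haplotypes carrying allele $i$ at locus A and allele $j$ at locus B. If $N$ is the total number of haplotypes, then the allele frequencies $p_{i\cdot}$ and $p_{\cdot j}$ and haplotype frequencies $p_{ij}$ can be estimated as:

$$p_{i\cdot} = \frac{n_{i\cdot}}{N}, \qquad p_{\cdot j} = \frac{n_{\cdot j}}{N}, \qquad p_{ij} = \frac{n_{ij}}{N}$$

The multi-allelic definition of $D'$ is then:

$$D_{ij} = p_{ij} - p_{i\cdot}\,p_{\cdot j}$$

$$D_{ij,\max} = \begin{cases} \min\left[p_{i\cdot}\,p_{\cdot j},\ (1 - p_{i\cdot})(1 - p_{\cdot j})\right] & D_{ij} < 0 \\ \min\left[(1 - p_{i\cdot})\,p_{\cdot j},\ p_{i\cdot}(1 - p_{\cdot j})\right] & D_{ij} \ge 0 \end{cases}$$

$$D' = \sum_i \sum_j p_{i\cdot}\,p_{\cdot j}\,\frac{|D_{ij}|}{D_{ij,\max}}$$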
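As an illustration of the multi-allelic definition, the following minimal Python sketch computes $D'$ from a table of haplotype counts. It is not the GOLD/ldmax implementation: the function name `multiallelic_d_prime` and the input layout (a list of rows, with `counts[i][j]` holding $n_{ij}$) are assumptions made for this example, and the haplotype counts are taken as already known rather than estimated by EM.

```python
def multiallelic_d_prime(counts):
    """Multi-allelic D' from haplotype counts.

    counts[i][j] is the number of haplotypes carrying allele i at locus A
    and allele j at locus B (hypothetical input layout for this sketch).
    """
    n_total = sum(sum(row) for row in counts)
    # Marginal allele frequencies p_i. and p_.j
    row_freq = [sum(row) / n_total for row in counts]
    col_freq = [sum(col) / n_total for col in zip(*counts)]

    d_prime = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            p_i, p_j = row_freq[i], col_freq[j]
            d_ij = n_ij / n_total - p_i * p_j
            if d_ij < 0:
                d_max = min(p_i * p_j, (1 - p_i) * (1 - p_j))
            else:
                d_max = min((1 - p_i) * p_j, p_i * (1 - p_j))
            # Skip alleles with frequency 0 or 1, for which D_ij,max is zero.
            if d_max > 0:
                d_prime += p_i * p_j * abs(d_ij) / d_max
    return d_prime


if __name__ == "__main__":
    # Two alleles per locus in complete association, then in equilibrium.
    print(multiallelic_d_prime([[50, 0], [0, 50]]))
    print(multiallelic_d_prime([[25, 25], [25, 25]]))
```

Weighting each allele pair by $p_{i\cdot}\,p_{\cdot j}$ means that rare alleles contribute little to the summary, which is one reason the statistic is convenient for highly polymorphic microsatellites.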
Haplotype frequency estimation and LD testing
For both the Samoan and the North American Spondylitis Consortium (NASC) sample, we used the same sets of 381 autosomal microsatellite markers and 18 X-linked microsatellite markers to evaluate LD between all pairs of markers on the same chromosome; there were 3,531 autosomal marker pairs and 153 X-linked marker pairs. The ‘ldmax’ program, which is part of the GOLD package,35 was used to estimate haplotype frequencies for each marker pair, using the expectation-maximisation (EM) algorithm,36,37 and to calculate the multi-allelic version of the $D'$ statistic34 for the autosomal marker data and females’ X-linked marker data (GOLD website). Since haplotypes of X-linked data for males can be established unequivocally, we computed the $D'$ statistic for males’ X-linked data using a function we wrote in R.38 Haplotype frequencies were estimated ignoring familial relationships; Broman39 has shown that such estimates, while they may be slightly less precise, are unbiased.40,41
When the sample size is small, Lewontin’s $D'$ measure can be biased upwards.42 We corrected this bias by performing a permutation analysis. We permuted the alleles at the first marker of each pair, calculated and recorded the new $D'$, repeated these two steps 1,000 times and took the average of the $D'$ over the 1,000 permutations as the permuted $D'$. We also recorded the corresponding empirical p-value for each marker pair. We then computed an adjusted $D'$ by taking the difference between the observed $D'$ and the permuted $D'$; the adjusted $D'$, which we denote as $D'_c$, should be an unbiased measure of LD. Teare et al.42 evaluated this permutation correction and found that it works well under the null hypothesis and is generally an over-correction under alternative hypotheses, which suggests that the levels of LD presented here may be underestimated.
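The permutation correction can be sketched as follows. This is a simplified illustration rather than the analysis code used in the study: it assumes phase-known haplotypes supplied as `(allele_A, allele_B)` tuples (as is the case for male X-linked data), whereas the autosomal analysis estimated haplotype frequencies by EM. The function names, the fixed random seed and the add-one form of the empirical p-value are all assumptions introduced for this example.

```python
import random
from collections import Counter


def d_prime(haplotypes):
    """Multi-allelic D' for a list of phase-known (allele_A, allele_B) tuples."""
    n = len(haplotypes)
    pair_counts = Counter(haplotypes)
    a_counts = Counter(a for a, _ in haplotypes)
    b_counts = Counter(b for _, b in haplotypes)
    total = 0.0
    for a, n_a in a_counts.items():
        for b, n_b in b_counts.items():
            p_a, p_b = n_a / n, n_b / n
            d = pair_counts[(a, b)] / n - p_a * p_b
            if d < 0:
                d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
            else:
                d_max = min((1 - p_a) * p_b, p_a * (1 - p_b))
            if d_max > 0:
                total += p_a * p_b * abs(d) / d_max
    return total


def permutation_adjusted_d_prime(haplotypes, n_perm=1000, seed=0):
    """Return (adjusted D', empirical p-value) for one marker pair."""
    rng = random.Random(seed)
    observed = d_prime(haplotypes)
    first = [a for a, _ in haplotypes]
    second = [b for _, b in haplotypes]
    permuted = []
    for _ in range(n_perm):
        rng.shuffle(first)  # permute alleles at the first marker only
        permuted.append(d_prime(list(zip(first, second))))
    mean_permuted = sum(permuted) / n_perm
    # Assumed convention: add-one empirical p-value.
    p_value = (1 + sum(v >= observed for v in permuted)) / (n_perm + 1)
    return observed - mean_permuted, p_value


if __name__ == "__main__":
    # Hypothetical data: 40 haplotypes in complete association.
    sample = [(1, 1)] * 20 + [(2, 2)] * 20
    print(permutation_adjusted_d_prime(sample, n_perm=200))
```

Because the permuted $D'$ is positive even when the markers are independent, subtracting its mean removes the small-sample inflation; the adjusted value can therefore be slightly negative for pairs with no association.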
To examine the distribution of LD across the entire autosomal genome, we regressed the $D'_c$ from the autosomal markers against the inter-marker recombination fractions. We also investigated the distribution of LD on the X chromosome by using the $D'_c$ measures obtained from male and female data.
We used five Y-chromosomal short tandem repeat (STR) markers (DYS388, DYS390, DYS391, DYS394, DYS395) in 20 unrelated Samoan males to estimate the diversity of Y haplotypes, which we computed as $1 - \sum_i P_i^2$, where $P_i$ is the frequency of the $i$th observed haplotype.
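A minimal Python sketch of this haplotype diversity calculation is given below. The function name and the representation of each haplotype as a tuple of repeat counts are assumptions for illustration, and the example haplotypes are invented rather than taken from the Samoan data.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Diversity 1 - sum(P_i^2), with P_i the frequency of the i-th haplotype."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical five-locus Y-STR haplotypes (repeat counts per marker).
    males = [
        (12, 24, 10, 14, 11),
        (12, 24, 10, 14, 11),
        (12, 23, 10, 14, 11),
        (13, 24, 11, 15, 11),
    ]
    print(haplotype_diversity(males))
```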
Population bottleneck
A population bottleneck followed by population expansion creates an imbalance between heterozygosity and allele size variance. Under such conditions, the variance will, for a time, be higher than expected. We measured this by computing the imbalance index (β) of Kimmel et al.43 using, after data cleaning, 371 autosomal markers for the Samoan population and 380 autosomal markers for the NASC population.
Population substructure
Using 25 unlinked autosomal loci, we used a correlated allele frequency model to make inferences about the underlying population substructure. We chose the 25 unlinked loci from our microsatellite marker data; all of these loci were in Hardy–Weinberg equilibrium (it was necessary to use unlinked loci because the statistical tests we employed assume that all loci are unlinked). We picked two loci from each of chromosomes 1–3 that are at least 200 cM apart, so that the two loci are essentially unlinked. For the other chromosomes, we picked one locus from each chromosome. For these analyses, we used the ‘structure’ program (structure website) to fit clustering models with K = 1, 2 and 3 clusters; the ‘structure’ program has been shown to produce accurate population assignments with modest numbers of loci.44
Results
We used 381 autosomal microsatellite markers and 18 X-linked microsatellite markers to evaluate LD between all pairs of markers on the same chromosome. For each pair of markers, we computed a permutation-adjusted measure of LD, $D'_c$, which adjusts for biases in Lewontin’s $D'$ measure due to small sample sizes. For the autosomal marker pairs, we found that $D'_c$ declines with increasing recombination fractions between marker pairs. When we averaged the $D'_c$ measures within recombination fraction intervals, the average $D'_c$ in the Samoans decreased as the recombination fraction increased (Table 1a). In general, 15.87 per cent of the $D'_c$ measures are significantly different from zero. By contrast, in the NASC sample, the average $D'_c$ is essentially zero and the percentage of significant values is close to the percentage expected by chance (Table 1a). Note that the per cent significant is a function of sample size and so interpretation of apparent differences can be confounded by sample size differences; however, the Samoan sample size and the NASC sample size are of the same order of magnitude.
We fitted a regression model to examine the relationship between $D'_c$ and recombination fraction, and found that $D'_c$ significantly decreases as the recombination fraction increases (Figure 1); we estimated a slope of −0.021 (SE 0.00354; p < 0.0001). If we remove the potentially influential outlier with a $D'_c$ of 0.38, the results remain essentially the same, with a slope of −0.019 (SE 0.00346; p < 0.0001).
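The text states only that a regression model was fitted; as a hedged illustration, the sketch below assumes ordinary least squares of $D'_c$ on recombination fraction and shows how a slope and its standard error of the kind reported above can be obtained. The function name `ols_slope` and the example values are hypothetical and are not the study data.

```python
import math


def ols_slope(x, y):
    """Simple linear regression of y on x.

    Returns (slope, intercept, standard error of the slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, se_slope


if __name__ == "__main__":
    # Hypothetical recombination fractions and adjusted D' values.
    theta = [0.05, 0.15, 0.25, 0.35, 0.45]
    d_c = [0.027, 0.019, 0.015, 0.015, 0.013]
    slope, intercept, se = ols_slope(theta, d_c)
    print(slope, intercept, se, slope / se)
```

Dividing the slope by its standard error gives the t statistic from which a p-value would be obtained; with several thousand marker pairs, even a shallow slope can be highly significant.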
We evaluated whether there was heterogeneity in the $D'_c$ values across the autosomal chromosomes by using analysis of covariance. The predictors in the analysis were the chromosome number (treated as a set of indicator functions) and inter-marker recombination fractions. We did not find evidence of heterogeneity across chromosomes (F = 0.96; df = 21; p = 0.509), whereas the recombination fraction was significant (F = 38.10; df = 1; p < 0.0001).
Table 1a. Mean $D'_c$ and per cent significant (at the 0.05 level) as a function of the recombination fraction between all possible pairs of autosomal markers drawn from the same chromosome.

Population         Statistic        Recombination fraction interval
                                    <0.1     0.1–<0.2   0.2–<0.3   0.3–<0.4   >0.4
Samoa (N = 482)    Mean $D'_c$      0.026    0.018      0.016      0.014      0.014
                   % significant    24.79    16.43      17.79      12.90      15.06
Palau* (N = 84)    Mean $D'_c$      0.031    0.019      0.017      0.012      0.009
                   % significant    16.2     11.6       11.6       7.1        4.4
NASC (N = 333)     Mean $D'_c$      −0.001   0.001      0.000      0.000      0.000
                   % significant    4.48     3.97       3.87       4.44       4.63

For the 18 X-linked microsatellite markers, we did not observe any steady pattern in the Samoans between the averaged LD measure and recombination fractions, either in the male or female data (Table 1b). The average $D'_c$ values and the per cent significant are much higher in the Samoans than in the NASC sample (Table 1b). We also fitted a regression model to examine the relationship between $D'_c$ and recombination fraction in X-linked data. By regressing $D'_c$ on recombination fractions, we obtained negative slopes in males and females, but the p-values of the slopes were not significant (data not shown).
We estimated the heterozygosity of the autosomal (X-linked) markers using data from the (female) unrelated individuals and the first (female) sib from each family. We then compared the Samoan heterozygote frequencies with the observed heterozygote frequencies in the CEPH and NASC families. We found the average heterozygosity (0.67) in the Samoan data was 0.12 less than in the CEPH families (0.79) and 0.10 less than in the NASC families (0.77). Eighty-seven per cent of markers in the CEPH data and 83 per cent of markers in the NASC data had greater heterozygosity than the corresponding marker in the Samoan data; a sign test showed that both of these differences were highly significant (p < 0.0001 for both tests). A previous study in the Palauans found similar results.9
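The sign test used for the marker-by-marker comparison can be illustrated with the short Python sketch below. It assumes an exact two-sided binomial sign test with ties discarded; the exact variant used in the study is not specified, and the function name and the marker counts in the usage example (chosen to be roughly 87 per cent of 381 markers) are hypothetical.

```python
from math import comb


def sign_test_p_value(n_positive, n_negative):
    """Exact two-sided sign test p-value under H0: P(positive) = 0.5.

    n_positive: markers with higher heterozygosity in the comparison sample.
    n_negative: markers with higher heterozygosity in the Samoan sample.
    Ties are assumed to have been discarded beforehand.
    """
    n = n_positive + n_negative
    k = max(n_positive, n_negative)
    # Probability of a split at least as extreme in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical split: 331 of 381 markers more heterozygous in the
    # comparison sample than in the Samoan sample.
    print(sign_test_p_value(331, 50))
```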
The imbalance index (β), an index sensitive to bottleneck events,43 was estimated for both the Samoan and NASC populations using our autosomal markers. In the presence of a bottleneck event, β is expected to exceed 1. The imbalance index in the Samoan population is 3.86, which is much larger than that in the NASC population (β = 1.31), indicating that the bottleneck event that occurred in the Samoans is a much more recent event. The β estimated in the NASC population is very similar to an earlier estimate of 1.33 in Europeans.43 It should be noted that the presence of population substructure would also lead to an elevated β value, which is not the case for the Samoan population, given that it is genetically more homogeneous than the other populations.
Using 25 unlinked autosomal loci, we also tested for population substructure.44 The number of alleles at these markers ranged from four to 17, with a median of ten, in the Samoan population; and from five to 18, with a median of 11, in the NASC population. Our population substructure analyses strongly support the hypothesis that the Samoans represent one genetic population, even though members of our sample are drawn from two different countries. Based on the ‘structure’ results, assuming a uniform prior on the number of clusters K, we obtain a posterior probability of 1.0 that K = 1. Note also that we have enhanced homogeneity by selecting individuals with four Samoan grandparents. The very low level of non-Polynesian alleles seen in our sample24 indicates that our selection scheme reduced admixture to very low levels.

Figure 1. The adjusted linkage disequilibrium measure, $D'_c$ …
Discussion
The results we present here are relevant both to the potential use of LD to map disease genes in the Samoan population and to inferences about the evolutionary history of the Samoan population. From our results, we find that LD, as measured using multi-allelic markers, extends over substantial distances across the whole genome in Samoans. When we divide the LD values into groups according to recombination fractions, we observe that the average LD in most intervals is quite similar to that observed in Palau9 (Table 1a). Compared with the autosomal data, we find a higher level of LD in the X-linked data (as would be expected, based on its smaller effective population size), but when we regress LD on the recombination fractions, the slopes for the X-linked data are not significantly negative. Since we estimated LD for males and females separately, it is possible that the sample sizes were too small. The patterns of LD observed in the Samoans are quite different from those observed in the non-isolated Caucasian NASC sample (Tables 1a and 1b), with the NASC sample displaying essentially no marker-to-marker LD above that which would be expected by chance (except, perhaps, for markers close to each other on the X chromosome).

We have also typed 43 single nucleotide polymorphisms (SNPs) from a fragment spanning 104 kb of DNA on chromosome 21 to study the relationship between LD and physical distance between SNP markers in Samoans and four outbred populations, namely, Benin from Nigeria, German, Japanese and Chinese, representing the major continental populations.45 The results show that the Samoans have significantly elevated $D'$ values compared with all four continental populations throughout the 100 kb region, without any sign of attenuation. It is very likely that this pattern of LD will extend much further than 100 kb in the Samoans. By contrast, in the other four populations, $D'$ shows a declining slope when plotted against increasing physical distance.

Based on our Y-linked STR data, the Y haplotype diversity in Samoans is 0.855, which is lower than the observed diversity of 0.96 in the Palauan study; however, it is comparable to that of other isolated populations. For five Y-linked STRs typed in two isolated Iberian samples (Basque and Catalan), the diversities of Y haplotypes are 0.85 and 0.89, respectively.46

Our statistical tests on our genetic data support the scenario of a recent bottleneck, which is consistent with the known demographic history of the Samoan population. Furthermore, our population substructure results support the hypothesis that self-identified Samoans represent one homogeneous genetic population. These results are consistent with our prior reports on the genetic structure of the Samoans.24
Our finding of high levels of genome-wide LD among Samoans is consistent with the multiple lines of evidence from genetic, archaeological and demographic history research that indicate that the Samoan population was founded by a relatively small number of founders, remained isolated for about 3,000 years and experienced a reduction in population size about 200–300 years ago.15,16,23–27 The findings of genome-wide LD in the Samoan population are similar to levels of LD observed in another isolated Pacific population, namely, Palau of Micronesia,9 as well as an isolated population from the Central Valley of Costa Rica.10 The present Samoan findings and these others9 support the hypothesis that LD extends further in isolated populations than in continental populations.4–6

Table 1b. Mean $D'_c$ and per cent significant (at the 0.05 level) as a function of the recombination fraction between all possible pairs of X-linked markers.

Population                  Statistic        Recombination fraction interval
                                             <0.1     0.1–<0.2   0.2–<0.3   0.3–<0.4   >0.4
Samoan females (N = 201)    Mean $D'_c$      0.129    0.062      0.080      0.059      0.082
                            % significant    55.6     33.3       56.3       48.2       48.2
Samoan males (N = 249)      Mean $D'_c$      0.116    0.050      0.071      0.036      0.067
                            % significant    44.4     38.9       43.8       37.0       41.0
Palauan males* (N = 54)     Mean $D'_c$      0.123    0.041      0.041      0.020      0.026
                            % significant    44.0     0          13.6       13.6       21.4
NASC females (N = 161)      Mean $D'_c$      −0.002   0.000      −0.002     0.006      0.002
                            % significant    12.5     7.69       9.09       4.55       9.52
NASC males (N = 172)        Mean $D'_c$      −0.021   −0.003     −0.014     −0.018     0.000
                            % significant    11.11    11.11      0          0          4.82
The results presented here on patterns of LD using a relatively sparse set of markers indicate that a more thorough characterisation of LD patterns in the Samoans is likely to be of interest. In particular, it would be of interest to compare the Samoan population with other isolated populations that have grown rapidly. While we have begun to generate higher density data in limited regions, as described above for chromosome 21,45 the results presented here provide important baseline information about the levels of background LD, but provide little information about the extent of ‘useful’ LD, as required for genome-wide association scans.47 While the precise definition of what levels of LD are useful for association mapping remains debatable, in outbred northern European populations LD is thought to be useful only over a relatively short range of 10–30 kb.47 The dramatic difference between our results for the Samoan population and those from the NASC sample favours the hypothesis that the level of useful LD will be higher in the Samoan population.
Acknowledgments
This research was supported by NIH grants AG09375, HL52611, DK55406 and DK59642. We thank the members of the Department of Health, Government of Samoa and the American Samoan Department of Health for their assistance in data collection; local political officials for their cooperation; and the participants for their patience. We would like to thank Dr John Reveille, the Principal Investigator of the North American Spondylitis Consortium (NASC) project, for his generous support in sharing the genotype data. The genotyping of the NASC samples was supported by NIH grant RO1-AR46208. We thank Dr Ning Wang of the Center for Genome Information, University of Cincinnati, for her help in computing the imbalance indices. We thank the two anonymous reviewers whose comments helped us to markedly improve this manuscript.
Electronic database information
URLs for data presented herein are as follows:
GOLD, Abecasis and Cookson, http://www.sph.umich.edu/csg/abecasis/GOLD/index.html, University of Michigan (accessed 20th April, 2004).
structure, Pritchard, J.K., Stephens, M. and Donnelly, P., http://pritch.bsd.uchicago.edu/software.html, University of Chicago (accessed 20th April, 2004).
References
1. Risch, N. and Merikangas, K. (1996), ‘The future of genetic studies of complex human diseases’, Science Vol. 273, pp. 1516–1517.
2. Kruglyak, L. (1999), ‘Genetic isolates: Separate but equal?’, Proc. Natl. Acad. Sci. USA Vol. 96, pp. 1170–1172.
3. Peltonen, L., Palotie, A. and Lange, K. (2000), ‘Use of population isolates for mapping complex traits’, Nat. Rev. Genet. Vol. 1, pp. 182–190.
4. Taillon-Miller, P., Bauer-Sardina, I., Saccone, N.L. et al. (2000), ‘Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28’, Nat. Genet. Vol. 25, pp. 324–328.
5. Eaves, I.A., Merriman, T.R., Barber, R.A. et al. (2000), ‘The genetically isolated populations of Finland and Sardinia may not be a panacea for linkage disequilibrium mapping of common disease genes’, Nat. Genet. Vol. 25, pp. 320–323.
6. Lonjou, C., Collins, A. and Morton, N.E. (1999), ‘Allelic association between marker loci’, Proc. Natl. Acad. Sci. USA Vol. 96, pp. 1621–1626.
7. Boehnke, M. (2000), ‘A look at linkage disequilibrium’, Nat. Genet. Vol. 25, pp. 246–247.
8. Kruglyak, L. (1999), ‘Prospects for whole-genome linkage disequilibrium mapping of common disease genes’, Nat. Genet. Vol. 22, pp. 139–144.
9. Devlin, B., Roeder, K., Otto, C. et al. (2001), ‘Genome-wide distribution of linkage disequilibrium in the population of Palau and its implications for gene flow in Remote Oceania’, Hum. Genet. Vol. 108, pp. 521–528.
10. Service, S.K., Ophoff, R.A. and Freimer, N.B. (2001), ‘The genome-wide distribution of background linkage disequilibrium in a population isolate’, Hum. Mol. Genet. Vol. 10, pp. 545–551.
11. Shifman, S. and Darvasi, A. (2001), ‘The value of isolated populations’, Nat. Genet. Vol. 28, pp. 309–310.
12. Mohlke, K.L., Lange, E.M., Valle, T.T. et al. (2001), ‘Linkage disequilibrium between microsatellite markers extends beyond 1 cM on chromosome 20 in Finns’, Genome Res. Vol. 11, pp. 1221–1226.
13. Hall, D., Wijsman, E.M., Roos, J.L. et al. (2002), ‘Extended intermarker linkage disequilibrium in the Afrikaners’, Genome Res. Vol. 12, pp. 956–961.
14. Bellwood, P.S. (1979), Man’s Conquest of the Pacific: The Prehistory of Southeast Asia and Oceania, Oxford University Press, New York, NY.
15. Green, R.C., Davidson, J.M. and Bernice Pauahi Bishop Museum (1969), Archaeology in Western Samoa, Auckland Institute and Museum, Auckland, NZ.
16. Kirch, P.V. (2000), On the Road of the Winds: An Archaeological History of the Pacific Islands before European Contact, University of California Press, Berkeley, CA.
17. Diamond, J.M. (1988), ‘Express train to Polynesia’, Nature Vol. 336, pp. 307–308.
18. Sykes, B., Leiboff, A., Low-Beer, J. et al. (1995), ‘The origins of the Polynesians: An interpretation from mitochondrial lineage analysis’, Am. J. Hum. Genet. Vol. 57, pp. 1463–1475.
19. Redd, A.J., Takezaki, N., Sherry, S.T. et al. (1995), ‘Evolutionary history of the COII/tRNALys intergenic 9 base pair deletion in human mitochondrial DNAs from the Pacific’, Mol. Biol. Evol. Vol. 12, pp. 604–615.
20. Melton, T., Clifford, S., Martinson, J. et al. (1998), ‘Genetic evidence for the proto-Austronesian homeland in Asia: mtDNA and nuclear DNA variation in Taiwanese aboriginal tribes’, Am. J. Hum. Genet. Vol. 63, pp. 1807–1823.
21. Terrell, J. (1988), ‘History as a family tree, history as an entangled bank: constructing images and interpretations of prehistory in the South Pacific’, Antiquity Vol. 62, pp. 642–657.
22. Martinson, J.J. (1996), ‘Molecular perspectives on the colonization of the Pacific’, In: Boyce, A.J. and Mascie-Taylor, C.G.N. (eds.), Molecular Biology and Human Diversity, Cambridge University Press, Cambridge, UK, pp. 171–195.
23. Su, B., Jin, L., Underhill, P. et al. (2000), ‘Polynesian origins: Insights from the Y chromosome’, Proc. Natl. Acad. Sci. USA Vol. 97, pp. 8225–8228.
24. Deka, R., McGarvey, S.T., Ferrell, R.E. et al. (1994), ‘Genetic characterization of American and Western Samoans’, Hum. Biol. Vol. 66, pp. 805–822.
25. Deka, R., Jin, L., Shriver, M.D. et al. (1995), ‘Population genetics of dinucleotide (dC-dA)n.(dG-dT)n polymorphisms in world populations’,
26. Deka, R., Shriver, M.D., Yu, L.M. et al. (1999), ‘Genetic variation at twenty three microsatellite loci in sixteen human populations’, J. Genet. Vol. 78, pp. 99–121.
27. Shriver, M.D., Jin, L., Ferrell, R.E. et al. (1997), ‘Microsatellite data support an early population expansion in Africa’, Genome Res. Vol. 7, pp. 586–591.
28. McArthur, N. (1967), Island Populations of the Pacific, Australian National University Press, Canberra, Australia.
29. McArthur, N.A. (1956), The Populations of the Pacific. Part III American Samoa and Part IV Western Samoa and the Tokelau Islands, Australian National University, Department of Demography, Canberra, Australia.
30. Gilson, R.P. (1970), Samoa 1830 to 1900: The Politics of a Multi-cultural Community, Oxford University Press, New York, NY.
31. Tsai, H.-J., Sun, G., Weeks, D.E. et al. (2001), ‘Type 2 diabetes and three calpain-10 gene polymorphisms in Samoans: No evidence of association’, Am. J. Hum. Genet. Vol. 69, pp. 1236–1244.
32. McGarvey, S.T., Levinson, P.D., Bausserman, L. et al. (1993), ‘Population-change in adult obesity and blood-lipids in American-Samoa from 1976–1978 to 1990’, Am. J. Hum. Biol. Vol. 5, pp. 17–30.
33. Galanis, D.J., McGarvey, S.T., Quested, C. et al. (1999), ‘Dietary intake of modernizing Samoans: Implications for risk of cardiovascular disease’, J. Am. Diet Assoc. Vol. 99, pp. 184–190.
34. Lewontin, R.C. (1964), ‘The interaction of selection and linkage. I. General considerations; heterotic models’, Genetics Vol. 49, pp. 49–67.
35. Abecasis, G.R. and Cookson, W.O. (2000), ‘GOLD – Graphical overview of linkage disequilibrium’, Bioinformatics Vol. 16, pp. 182–183.
36. Excoffier, L. and Slatkin, M. (1995), ‘Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population’, Mol. Biol. Evol. Vol. 12, pp. 921–927.
37. Long, J.C., Williams, R.C. and Urbanek, M. (1995), ‘An E-M algorithm and testing strategy for multiple-locus haplotypes’, Am. J. Hum. Genet. Vol. 56, pp. 799–810.
38. Ihaka, R. and Gentleman, R. (1996), ‘R: A language for data analysis and graphics’, J. Comput. Graph. Stat. Vol. 5, pp. 299–314.
39. Broman, K.W. (2001), ‘Estimation of allele frequencies with data on sibships’, Genet. Epidemiol. Vol. 20, pp. 307–315.
40. Chakraborty, R. (1978), ‘Number of independent genes examined in family surveys and its effect on gene frequency estimation’, Am. J. Hum. Genet. Vol. 30, pp. 550–552.
41. Chakraborty, R. (1991), ‘Inclusion of data on relatives for estimation of allele frequencies’, Am. J. Hum. Genet. Vol. 49, pp. 242–244.
42. Teare, M.D., Dunning, A.M., Durocher, F. et al. (2002), ‘Sampling distribution of summary linkage disequilibrium measures’, Ann. Hum. Genet. Vol. 66, pp. 223–233.
43. Kimmel, M., Chakraborty, R., King, J.P. et al. (1998), ‘Signatures of population expansion in microsatellite repeat data’, Genetics Vol. 148, pp. 1921–1930.
44. Pritchard, J.K., Stephens, M. and Donnelly, P. (2000), ‘Inference of population structure using multilocus genotype data’, Genetics Vol. 155, pp. 945–959.
45. Deka, R., McGarvey, S.T., Weeks, D.E. et al. (2003), ‘Genetic variation in an isolated population, the Samoans of Polynesia: Implications for mapping complex traits’, Am. J. Hum. Genet. Vol. 73 (Suppl.), pp. 187.
46. Perez-Lezaun, A., Calafell, F., Seielstad, M. et al. (1997), ‘Population genetics of Y-chromosome short tandem repeats in humans’, J. Mol. Evol. Vol. 45, pp. 265–270.