Daniel Ross Fisher
In Partial Fulllmentof the Requirements
for the Degree of
Dotor of Philosophy
California Instituteof Tehnology
Quantum Monte Carlo is a relatively new lass of eletroni struture methods that
has thepotentialtoalulate expetationvaluesforatomi,moleular,and materials
systems to within hemial auray. QMC sales as O(N 3
) or better with the size
of the system, whih is muh more favorable than traditional eletroni struture
methods apableofomparable auray. In addition,the stohasti natureof QMC
makes itrelatively easy toparallelizeovermultipleproessors.
QMC alulations use the Metropolis algorithm to sample the eletron density
of the system. This method has an inherent equilibration phase, during whih the
ongurations donot represent the desired density and must be disarded. Beause
the time spentonequilibrationinreases linearly withthe number ofproessors, this
phase limits the eÆieny of parallel alulations, making it impossible to use large
numbers of proessors tospeed onvergene.
This thesis presents an algorithmthat generates statistially independent walker
equi-libration phase and ensuring the auray of alulations. Shortening the length of
the equilibration phase greatly improves the eÆieny of large parallel alulations,
hetero-geneous, anddistributedomputingresoures toondut highlyauratesimulations
The most ommonformulationof diusion MonteCarlo has two souresof error:
the time step used to propagate the walkers and the nodes of the trialfuntion. In
multionguration self-onsistent eld trial funtions and time steps ranging from
to 10 2
au. The results are ompared to values from experiment and high
quality ab initio alulations, as well as the reently developed X3LYP, M06, and
XYG3 density funtionals. The appropriate time step and trial funtions for the
1 Eletroni Struture Theory 1
1.1 Quantum Mehanis . . . 2
1.1.1 Cusp Conditions . . . 3
1.1.2 The VariationalTheorem. . . 4
1.2 Approximate Methods . . . 5
1.2.1 Hartree-Fok . . . 6
1.2.2 Post Hartree-Fok Methods . . . 8
220.127.116.11 Conguration Interation . . . 8
18.104.22.168 Coupled Cluster . . . 9
22.214.171.124 Multionguration SCF . . . 10
1.2.3 Perturbation Theory . . . 11
1.2.4 ExtrapolatedMethods . . . 12
1.2.5 Density Funtional Theory . . . 13
2 Parallel Computing 16 2.1 Designing Parallel Algorithms . . . 17
2.2 Analyzing Parallel Algorithms . . . 18
3 Random Number Generation 23
3.1 RandomNumber Generation. . . 23
3.1.1 UniformRandom Numbers . . . 24
3.1.2 The Transformation Method . . . 25
3.1.3 The VonNeumann Method . . . 25
3.1.4 The Metropolis Algorithm . . . 26
4 Quantum Monte Carlo 28 4.1 VariationalMonteCarlo . . . 29
4.2 VMCTrialFuntions . . . 31
4.3 DiusionMonte Carlo . . . 33
4.4 GeneratingCongurations inDMC . . . 36
5 An OptimizedInitializationAlgorithmtoEnsure Aurayin Quan-tum Monte Carlo Calulations 40 Abstrat 41 5.1 Introdution . . . 42
5.2 The Metropolis Algorithmand the InitializationCatastrophe . . . 43
5.3 WalkerInitialization . . . 46
5.3.1 STRAW . . . 48
5.3.2 EquilibrationBehavior . . . 51
5.3.3 Timingand SpatialCorrelation . . . 57
5.3.4 Parallel Calulation EÆieny . . . 59
5.4 Conlusion . . . 62
6.3 Experimental and ComputationalResults. . . 68
6.4 ComputationalMethods . . . 69
6.4.1 Quantum Monte Carlo . . . 72
6.4.2 QMC Proedures . . . 76
6.5 Results and Disussion . . . 77
6.5.1 Reation1 . . . 78
6.5.2 Reation2 . . . 81
6.5.3 Reation3 . . . 82
6.5.4 Disussion . . . 83
6.6 Conlusion . . . 85
Eletroni Struture Theory
other properties of moleules and materials based on simulations of their eletroni
struture. The behaviorofpartilesonthis sale isgovernedby the laws of quantum
mehanis. Although these laws are well understood, applying them to nontrivial
systemsleads toequationstooompliatedtosolveexatly. Sine exatsolutionsare
Approximate eletroni struture methods are lassied as ab initio methods,
whih are based only on the laws of quantum mehanis, orsemiempirial methods,
whih use experimental results to determine funtional forms and t parameters.
Methodsof both types are used to understand and preditexperimentalphenomena
suh as reationmehanisms, eletrialproperties, and biologial ativity for a wide
variety of systems. This hapter ontains a very basi introdution to the laws of
quantum mehanis and the approximate methods used to apply them to moleular
and solid statesystems. Further informationonquantummehanis an befound in
1.1 Quantum Mehanis
beobtained bysolving theShrodingerequation. Exatsolutionsforthe Shrodinger
equationare possiblefor onlythe simplestsystems. Largersystems leadveryquikly
toequations with too many dimensions tobe solvable.
Quantummehanisditatesseveral neessary featuresfor ,the wavefuntionof
=j j 2
interpreted as the probabilitydensity funtion for the position of the partile. Sine
the partile must exist somewhere, integrating j j 2
over all spae must give unity.
Wavefuntions that satisfythis ondition are referred toasnormalized. Aordingly,
we multiply by a normalizationonstant so that R
dj j 2
The mathematial framework of a many partile wavefuntion, , must aount
forthe fatthat eletronsare indistinguishablefromeah other. Thismeansthat the
spatial probabilitydensity, , annotvarywith the interhange of any two eletrons:
(1;2)=j (1;2)j 2
=j (2;1)j 2
(1;2)= (2;1): (1.2)
Partiles suh as eletrons with half-integer spin are fermions, for whih the
wave-funtion isantisymmetri: (1;2)= (2;1).
The time-dependent Shrodinger equation determines the evolution of the
wave-funtion of asystem with time [8℄:
j (X;t) i= ^
Hj (X;t)i; (1.3)
of the system.
The solution tothe time-dependentShrodinger equation an be expanded as
j (X;t)i= X j j e iE j t j j
where the oeÆient
j = h
(X)j (X;0)i and the E
(X)i are the
eigen-values and eigenfuntions of the time-independent Shrodingerequation:
Beause they do not hange with time, the eigenfuntionsj
(X)i are known as
the stationary states of the system. Eah stationary state has an assoiated
, whih an be interpreted as its energy. The stationary states are usually
ordered so that E
, with the lowest energy state, j
alled the groundstate.
H is aHermitianoperator,itseigenvalues are realand itseigenfuntions
are orthogonal to eah other and span the spae of all possible solutions. They an
alsobehosen to be normalized,so that
is the Kronekerdelta: Æ
equals 1 if i=j and 0otherwise.
1.1.1 Cusp Conditions
The Hamiltonianoperatorfora system ofN eletrons andK nulei with harges Z
and masses M
L is ^ H= 1 2 N X i=1 r 2 i 1 2 K X L=1 1 M L r 2 L N X i=1 K X L=1 Z L r iL + N X i=1 N X j>i 1 r ij + K X I=1 K X J>I Z I Z J r IJ ; (1.7)
whereloweraseindiesrefer toeletrons, upperase indiesrefertonulei, and r
In equation 1.7 and throughoutthis work, atomi unitsare used, in whih h=1,
= 1, jej = 1, and 4
= 1, where m
is the mass of an eletron, jej is the
magnitude of itsharge, and
isthe permittivity of freespae.
The Coulombtermsinthe Hamiltoniandiverge whentwopartilesapproaheah
other. The Shrodinger equation an be solved analytially for these ongurations
others. In order for the energy of the system to benite, a divergene inthe kineti
energy must exatly anel the divergene in the potential. Solving the Shrodinger
equation for these ongurations to ahieve this anellation leads to the following
usp ondition for the wavefuntion [9℄:
lim r ij !0 r ij = ij q i q j
l+1 lim r ij !0 ; (1.8) where ij
is the redued mass of the partiles, q
i and q
are their harges, and l is 1
for same-spin eletrons and 0 otherwise. An aurate wave funtionmust satisfy the
usp ondition for eahpair of partilesinthe system.
1.1.2 The Variational Theorem
The most powerful tool researhers have in onstruting approximate ground state
wavefuntions is the variational priniple, whih provides a way to ompare their
quality. The exat eigenfuntions, j
(X)i, of the Hamiltonianspan the spae of all
possible wavefuntions for the system. Therefore, any normalizable trial
(X)i,that satisesthe boundaryonditions of thesystem an beexpanded
in termsof the j
(X)i = X i b i j i
where the equality appliesif j
The expetation value of the energy of a trialwavefuntion is anupperboundto
the ground state energy. The loser the trial wavefuntion is to the atual ground
state, the lower its energy will be. This provides a way to approximate the ground
state. First,a parametrizedwavefuntion isonstrutedwith a formthatan easily
be evaluated. Then the parameters are adjusted to minimize the expetation value
of the energy. This is the losest approximation to the ground state in the spae of
the adjustableparameters. Physialarguments must beused inhoosing the formof
the trial wavefuntion: it determines the restritions on the interations that an be
desribed and thereforerepresents a model.
1.2 Approximate Methods
than that of an eletron. The eletrons see the heavy, slow-moving nulei as almost
stationaryharges,and thenulei seethe muhfaster eletronsasessentiallya
three-dimensionaldistribution ofharge. The Born-Oppenheimer approximation simplies
the moleularproblembytreatingtheeletroniand nulearmotionsseparately[10℄.
In this method, one assumes a xed onguration for the nulei, and for this
onguration solves an eletroni Shrodinger equation to nd the eletroni wave
funtion and energy. This proess isrepeated for dierent ongurations togive the
energy levelsfor a given eletroni state.
Thebasis foralmostallmethodstosolvetheeletroni partofthe Shrodinger
equa-tion is the Hartree-Fok (HF) method. In HF, the trial wavefuntion is expressed
as an antisymmetri produt of normalized, orthogonal moleular orbitals,
i . The
simplestway toonstrutatrialwavefuntionfromaset oforbitalsistouse aSlater
determinant, a framework that ensures the antisymmetry of the overall
wavefun-tion [11℄: AS (x 1 ;x 2 ;:::;x
N )= 1 p N! 1 (x 1 ) 2 (x 1 ) N (x 1 ) 1 (x 2 ) 2 (x 2 ) N (x 2 ) . . . . . . . . . 1 (x N ) 2 (x N ) N (x N ) : (1.12)
In equation 1.12, x
ontains the spae and spin oordinates of eletron i. Sine the
determinant of a matrix hanges sign if two rows or olumns are interhanged, the
overall wavefuntion willhave the properantisymmetrywith respet topermutation
of the eletrons.
The moleularorbitals an be fatored intospatial and spin omponents:
(x)= (~r;)=(~r)(); (1.13)
are written aslinear ombinationsof basis funtions:
= X ; (1.14) wherethe
type orbitals (GTO) are usually used as basis funtions. GTOs have the following
Sine the derivativeof a Gaussianis zero atitsorigin, these funtions annotsatisfy
the eletron-nulear usp ondition of setion 1.1.1. The resulting multienter
inte-grals an be evaluated analytially, however, so alarge number of Gaussians an be
used in the basis set with littleomputationalexpense.
Slater type orbitals (STO) have the orret form to satisfy both the
eletron-nulear usp ondition and the long range behavior of moleular orbitals, but are
not typially used in basis sets beause they lead to very ompliated integrals in
restrits them toertain shapes and regions of spae. The more funtions in a basis
set, the more exibility it has to approximate moleular orbitals. Larger basis sets
generally produe better results in omputations, but require more omputer time.
Sine an eletron has a nite probability of existing anywhere in spae, an innite
basis set would be neessary to ompletelydesribe its possible position.
In order to solve for the orbital expansion oeÆients, the Hartree-Fok method
FC =SC; (1.17)
where eah element is a matrix. The Fok matrix, F, represents the average eets
of the eld of all the eletrons on eah orbital. The matrix C ontains the orbital
oeÆients, S indiates the overlap between the orbitals,and is a diagonal matrix
Thus equation 1.17 is not linear and must be solved with an iterative proedure
alled the self-onsistent eld (SCF) method. First, an initial guess for the orbital
oeÆients isformed, and the orrespondingdensity matrixis onstruted. Usingit,
the Fok matrix is formed. Then, solving the eigenvalue problemyieldsa new set of
orbitaloeÆients. This proedure is repeated untilboth the orbitaloeÆients and
the energy have onverged. Atthis point,the orbitalsgenerate a eld that produes
the same orbitals. This method produes both oupied and virtual (unoupied)
Solving the eigenvalue problem is the slowest step of the proess. It involves
diagonalizinga matrix, aproess that sales asO(N 3
),where N is the linear size of
the matrix. In this ase, N is the number of basis funtions.
1.2.2 Post Hartree-Fok Methods
The errors of Hartree-Fok are due to the fat that it treats the repulsion of the
eletrons for eah other in an average way and neglets the details of their motion.
The shape of the orbital an eletron oupies is determined by the potential eld
of the nulei and the density of the other oupied orbitals. An eletron sees only
the \mean eld" of the other eletrons, whih allows them to ome lose together
more oftenthan they shouldand makesit impossible forthe wavefuntion tosatisfy
the eletron-eletron usp onditions of setion 1.1.1. The dierene in energy that
would result from properly allowing the eletrons to avoid eah other is alled the
orrelation energy. Several methods go beyond Hartree-Fok and attempt to treat
this phenomenon properly.
126.96.36.199 Conguration Interation
The onguration interation (CI) method uses the virtual orbitals generated by
Hartree-Fok in addition to the oupied orbitals to onstrut a wavefuntion as a
expansion oeÆientsare found by diagonalizing the resultingHamiltonianmatrix:
N )= X m a m AS m (~x 1 ;~x
If a Slater determinant orreponding to every possible oupation of the orbitals is
inluded in the expansion, the alulation is a \full CI." In most ases, full CI is
impossiblebeausethe number of possible Slaterdeterminantsis too large.
In pratie, CIalulations are usually arriedout by inludinga limitednumber
of determinantsinthe expansion. ACI singles(CIS) alulation exites one eletron
al-ulationinludes singles and doubles, a CISDTalulation inludes singles, doubles,
and triples, et. CI alulations an providequantitative results (within2 kal/mol)
for energies of moleules, but are extremely time onsuming and require immense
amounts of memory,even for small systems and minimalbasis sets. In addition, the
188.8.131.52 Coupled Cluster
In oupled luster (CC) alulations, the trial wavefuntion is expressed as a linear
is used togenerate the ongurations and alulate the energy [12℄:
i=exp ^ T j HF i: (1.19)
groundstate into virtual orbitals. Equation 1.19 an be expanded ina Taylor series:
i = exp ^ T j HF i = j HF i+ ^ T 1 j HF i+ ^ T 2 + 1 2 ^ T 2 1 j HF i + ^ T 3 + ^ T 1 ^ T 2 + 1 ^ T 3 1 j HF
reatesdoubleexitations,and soon. In
equa-tion 1.20, the termsare grouped into levelsof exitation. At eah level of exitation,
several terms ontribute. At the seond level, for example, ^
double exitations, while ^
generates two disonneted single exitations.
Coupled luster makes it easy to treat moleules of dierent sizes with the same
level of orrelation, whih is important for hemial reations, in whih bonds may
form or a large moleule may dissoiate into fragments. Treating the produts and
reatants of areation onsistently is neessary toget aurate energy dierenes.
of exitation inluded in the expansion. A CCSD alulation inludes single and
doubleexitations, while CCSD(T)inludes triplesasa perturbation. CCSD(T) isa
very popular method for onduting aurate alulations with reasonable ost, and
is oftenused asa benhmark to ompare the results of other methods. The expense
of CCSD(T) sales as O(N 7
) with the number of basis funtions, whih limits its
applition tosmall moleules and basis sets.
184.108.40.206 Multionguration SCF
Ina multiongurationselfonsistenteld (MCSCF) alulation,the userdenes an
and both the orbitals and CI expansion oeÆients are variationallyoptimized [13℄.
If a full CI is arried out on the ative spae, and all possible oupations of the
ative orbitals are onsidered, the alulation is alled a omplete ative spae SCF
(CASSCF) [14℄ or fully optimized reation spae (FORS) [15℄ alulation. Beause
both the orbitals and CI oeÆients are optimized, MCSCF oers the most general
approah available to omputing eletroni struture. The large number of
varia-tional parameters makes the optimization a hallenge, so users must be areful to
onlyinludethe eletronsand orbitalsinvolved inthe reationunderinvestigationin
form of MCSCF in whih eletrons are exited pairwise from valene orbitals into
virtual orbitals [16℄. Although the seletion of ongurations is onstrained, the
optimization proedure for GVB alulations is muh more systemati and reliable
thanageneralMCSCFalulation. TheGVBwavefuntionisthe simplestformthat
allows moleules to dissoiate into open shell fragments, whih allows it to produe
aurate dissoiation urvesfor hemial bonds.
1.2.3 Perturbation Theory
Mller-Plesset (MP) perturbation theory is a non-iterative method for alulating
the orrelationenergy of aset oforbitals. In perturbation theory, the Hamiltonianis
divided into two parts:
H = ^
is exatly solvable and ^
V is a perturbation that is assumed to be small
ompared toit. Theperturbed wavefuntionan beexpanded asa powerseries in:
+ 2 (2)
+ 3 ( 3)
The perturbed wavefuntion is substitutedintothe Shrodingerequation:
+E ( 1)
Equatingtermswith thesamepowerofgivesformulasfororretions tothe energy
for varying lengths of the expansion.
the Fok operator and its ground state Slater determinant. The perturbation, ^
integrals between determinants:
where the index t sums over determinants in whih two eletrons have been exited
intovirtual orbitals. It iseasytosee fromthe denominatorof equation1.24 that the
states, whose energy islose to the ground state energy.
Mller-Plesset perturbationtheory is referred toby the order of the expansion of
(MP4) orderare implementedinmany quantumhemistryprograms. Multireferene
MP theory, in whih an MCSCF or CI wavefuntion is used as the unperturbed
wavefuntion, has alsobeen developed [17℄.
While MP2 generally gives good results for moleular geometries and hanges
in energy for hemial reations, reent studies omparing levels of perturbation for
en-ergiesarenot neessarilyonvergentinthe limitofhigherordersofperturbation[18℄.
1.2.4 Extrapolated Methods
The omplete basis set (CBS) methods address the errors due to using a nite
basis set in alulations. They extrapolate to an innite basis using expressions
for the orrelation energy reovered for eletron pairs as funtions of higher angular
Hartree-Fok alulation with a large basis set, an MP2 alulation with a moderate basis
obtained for a high level alulation withan innitebasis set.
The Gaussian-1 (G1) method approximates a quadrati CISD(T) result with a
large basis set using four smaller alulations [20℄. It orrets for trunation of the
basis set by arrying out MP4 alulationswith three dierent basis sets and for the
limitedleveloforrelation by arryingout aquadrati CISD(T) alulationwith the
and G4 [23℄ methodshave subsequently been developed.
The foal point method expliitly examines the onvergene of the energy with
respet to both the basis set and level of orrelation to estimate the ab initio limit
within the Born-Oppenheimer approximation [24℄. In the foal point proedure, HF
energies areextrapolatedtothe CBSlimit,andCCSDTand CCSDT(Q)alulations
are arriedout using a moderatebasis set. The results are ombinedtoestimate the
speial relativisti eets [26℄ are added togivethe foal point result.
1.2.5 Density Funtional Theory
Density funtional theory (DFT) is another widely used lass of methods for
tak-ing into aount the eets of eletron orrelation. DFT is based on the theorem
of Hohenberg and Kohn, whih proves the existene of a funtionalthat determines
the exat eletron density and energy for a given a nulear potential eld [27℄.
Un-fortunately, the theorem does not provide the form of the exat funtional. While
the exat funtional would take an eletron density as input and return the energy,
approximate funtionalspartitionthe energy into several terms[28℄:
E =E T
The rst three terms orrespond to the kineti energy, the attration between the
between the eletrons.
density, a funtion of the three spatial variables. No orbitals would be involved, and
alulations would sale linearly with the size of the system. In pratie, however,
a method similar to Hartree-Fok is used. The wavefuntion is written as a Slater
determinant of orbitals, and the Fok operator is replaed with one that takes the
eets of eletron orrelationintoaount.
The exhange-orrelation energy of equation 1.25 is separated intoexhange and
orrelation terms. The exhange energy arises from the interations between same
spin eletrons, whih are kept apart by the antisymmetry of the spatial part of the
wave funtion. The orrelation energy is due to the interations between opposite
The exhange and orrelation energy terms are alulated by funtionals of the
density. The basis for most funtionals is the loal density approximation, in whih
eletrons uniformlyoupy a volume with a positive bakground harge to keep the
overall harge neutral. Forthis system, the exhange energy has a simple form:
Loal orrelationfuntionals are more ompliated,but are alsoinuse [29℄.
The eletron density of atoms and moleules, however, is not uniform, so
re-searhers have developed exhange and orrelation funtionalsthat use the gradient
of the density aswell as itsvalue [30, 31℄.
Some of the most aurate density funtional methods in use are hybrid
fun-tionals, in whih the Hartree-Fok denition of the exhange energy, whih is based
on moleular orbitals, is inluded as a omponent of the exhange-orrelation
where E X
is a gradient-orreted exhange term, E C
is a loal orrelation
term,and E C
isa gradient-orretedorrelation term. The oeÆients
X , and
Density funtionaltheory isaverypopularway forresearherstoinludeeletron
orrelation in alulations and obtain results that are aurate enough for many
ap-pliationswith moderate omputationalexpense. These methods have been applied
suessfullytoalargevarietyof systemsand havebeenabenettomanyareasof
re-searh. Whilepost-Hartree-Fokmethodsan alwaysbeimproved byinludingmore
ongurations, using a larger basis set, or alulating higher orders of perturbation,
DFT suers from the fat that there is no systemati way to improve its results.
New density funtionalsare ontinuously being developed [34,35, 36℄, but none give
The extraordinary inrease in omputingpoweravailableto researhers over the last
eo-nomis, and many other elds. The long term trend in the number of iruits that
The omplexity forminimum omponentosts has inreased atarate
of roughly a fator of two peryear .... Certainly overthe short term this
ratean beexpetedtoontinue, ifnottoinrease. Overthelongerterm,
therate ofinrease isabitmore unertain,although thereisnoreasonto
by 1975, the number of omponents per integrated iruit for minimum
ost will be 65,000. I believe that suh a large iruit an be built on a
single wafer [37℄.
In 1975, Moore's predition for the time to double the number of transistors on a
iruit was revised to 18 months. The trends in almost every measure of eletroni
devies, suhasproessing speed, memoryapaity,and omputing performane per
unit ost, are losely relatedto Moore's law.
It has oftenbeen preditedthat hip designers would notbeabletokeep up with
hip makers prediting new proessors onsistent with Moore's law for another ten
have developed tehniques for parallel omputing, in whih multiple proessors
on-neted with a network are used together tosolve a problem. This hapter desribes
some of the ways parallel programs are designed and analyzed. Additional
informa-tion an be found in referene [38℄.
2.1 Designing Parallel Algorithms
Inorderforanalgorithmtobeexeuted inparallel,theprogrammermust deompose
the work into tasks and identify whih tasks an be exeuted onurrently. The
onurrent tasks an be assignedto dierent proessors to be exeuted. In order for
a proessor to be able to omplete itstask, the appropriate instrutions, input, and
output must be ommuniated.
The granularity of aproblemreferstothenumberandsize ofthetasksintowhih
itan bedeomposed. The degree of onurreny isthe numberof tasksthat an be
exeuted simultaneously. This number isusually less than the total numberof tasks
due to dependenies amongthem.
There are many tehniques for deomposing aproblemintotasks. The nature of
the problem determines how it an be divided and how the dierent tasks interat
with eah other. A problemforwhiha ne-graineddeompositionintoindependent
tasks is possible is well suited for parallel omputing, and will benet greatly from
the parallel environment. In espeially unfavorable irumstanes, suh as if many
proessors are idle while they waitfor another task tosupply them with input, or if
2.2 Analyzing Parallel Algorithms
Usingtwie as many proessors to exeute a programrarely results init ompleting
in half the time or generating twie as muh useful output. Overhead expenses,
unavoidable in the parallel environment, subtrat from the performane. Computer
sientists have developed several metristomeasure the expense and performane of
The most basi measure of performane is the speedup, the ratio of the serial
, tothe parallel exeutiontime, T
The largest soure of overhead is usually ommuniation of data between
pro-essors. In addition, some proessors may beome idle if they nish their task and
must wait for a new one. The parallel algorithmmay also have to arry out exess
omputationomparedtothe serialalgorithm. Forexample,if theresult ofaertain
eah proessor in a parallel alulation, while the serial algorithm only has to arry
out the alulation one.
The overhead for a parallel algorithm is the dierene between the parallel and
where pis the numberof proessors.
Animportantmeasure fortheeetivenessof aparallelalgorithmisitseÆieny,
whihis the ratio of the serial ost tothe parallel ost:
E = T
inreases with the numberof proessors. As an be seen inequation 2.4, aninrease
in overhead auses a derease in eÆieny. The lossof eÆieny leads to dereasing
returns asmore proessors are used to exeutean algorithm.
The dereasing gain in performane as the number of proessors inreases is
ex-pressed in a slightly dierent way in Amdahl's Law [39℄. If P is the fration of
an algorithm that an be parallelized and S is the fration that must be omputed
serially, the speedup an be writtenas afuntion of the numberof proessors:
As the number of proessors inreases, the speedup asymptotially approahes 1
Aording to this formula, if the serial portion of an algorithmis 10%, the greatest
possible speedup is ten times,no matterhow many proessorsare used. As aresult,
muh of the eort in designing parallel algorithmsgoes intoparallelizing as muh of
the work as possible.
After examiningEqs 2.4 and 2.6, it would be easy to beome skeptial as to the
viability of massively parallel omputers, sine the benet of using more proessors
is bounded. In pratie, however, the size of the problem usually inreases with the
number of proessors. When given more proessors,researhers willusually inrease
the problem size inreases, the fration of the run time spent onoverhead dereases,
whihimproves the eÆieny for large numbers of proessors.
Sine 1993, the Top500 list has kept trak of the most powerful superomputers in
the AdvanedSimulationand Computingprogram[41℄. Theseare homogeneous
ma-hines, meaningthey are onstruted fromone typeof proessor, with huge memory
and fast interonnetion hardware.
Several alulations presented later in this work were arried out using Blue
Gene/L at Lawrene Livermore National Laboratory. The unlassied portion of
this mahine onsists of 81,920 IBM PowerPC proessors running at 700 MHz [42℄.
Although the proessor speed is slow, the great number of proessors gives uBGL
almost 230 TFlopsof omputing power.
The most powerful omputer in the world at the moment is Roadrunner at Los
Alamos NationalLaboratory. Roadrunner has 13,824 1.8GHzAMD Opteron
proes-sors tohandleoperations,ommuniations, andsome omputationand 116,640IBM
PowerXCell 8iproessorstohandleoatingpoint-operations. Roadrunneristhe rst
mahine tohave over1 petaop sustained performane [43℄.
Not to be outdone, LLNL has announed they will be onstruting Sequoia, a
Blue Gene/Q mahine that will exeed 20 petaops, to go online in 2011. Sequoia
willhave more omputingpower than the urrent Top500 listombined [44℄.
The rate ofesalation inomputing poweris easytosee. Aess tothe DOE
inter-onnetion hardware, and management software brought about by the DOE projet
have improved the performane of the mahines an individual researh group an
2.4 Beowulf Clusters and Grid Computing
In ontrast to the massively expensive homogeneous omputers of the previous
se-tion, researhers an assemble a low ost luster using o-the-shelf proessors and
onnetion hardware in a Beowulf framework [45℄. Suh a luster an be
Another development in parallel omputing is the use of loosely oupled, widely
distributed grids of proessors to arry out omputations. Some examples inlude
to searh for extraterrestrial intelligene and simulate protein folding. BOINC is a
projet at UC Berkeley that has tools to help researhers develop software for and
onnet to distributed volunteer omputing resoures [48℄.
Parallelalgorithmsmusthave ertainharateristisinordertoperform wellina
heterogeneous, looselyoupled environment. An appliationthat must ommuniate
large amounts of data among the tasks or is unable to balane the work between
very poorly. A parallel algorithm with low ommuniations requirements and the
ablity to use proessors running at dierent speeds will be able to eÆiently use
inexpensive omputing resoures toarry out large omputing jobs.
The bulkof the omputingeort in traditionaleletroni struture methodssuh
suh as DFT and oupled luster are unable to eÆiently use more than a few tens
of proessors. Sine oupled luster sales as O(N 7
) with the size of the system,
the inability to use large numbers of proessors prevents researhers from arrying
out highly aurate alulations on large systems, suh as nanodevies or biologial
Quantum Monte Carlo (QMC) is an alternate approah to eletroni struture
simulationsthatalulates expetationvalues stohastiallyratherthan analytially.
an,inpriniple, alulateexatexpetationvaluesandsalesasO(N 3
of the system. QMC an beformulatedtohave very small memory and
ommunia-tions requirements and automatially balane the work between proessors running
Random Number Generation
and statistial sampling and analysis. In addition, stohasti, or non-deterministi,
simulationsan beused tomodel manytypes ofphysialand mathematialsystems.
In thesesimulations,the behavior ofsome part of thesystem is randomlygenerated.
Beause of the essential role played by random numbers, they are grouped into a
lass alled \Monte Carlo" methods. Eletroni struture appliations inlude
Vari-ational Monte Carlo (VMC), in whih the parameters of a trial wavefuntion are
optimized, and Diusion Monte Carlo (DMC), whih has the potential to alulate
exat expetation values for many-bodyquantum mehanial systems.
3.1 Random Number Generation
suh as the noise of an analog iruit, the deay of radioative nulei [51℄, or
bak-ground atmospheri radio noise [52℄. Computers, on the other hand, only operate
basedonprogrammedinstrutions. Theyan generatesequenes of\pseudorandom"
numbers that lak patterns, but are determined by a formula. Statistial tests have
beendeveloped todetetorrelationsinsequenesofnumbers. Thequalityofa
3.1.1 Uniform Random Numbers
inthe rangehaving the same probability ofbeing generated. Virtually every sheme
to generate random numbers with respet to a desired probability density relies on
onverting uniform random numbers.
The mostommonwaytogenerate uniformrandomnumbers iswithalinear
on-gruential, or modulo, generator, whih generates a series of integers, fI
by the reurrenerelation
wherem,a,and arepositiveintegersalledthe modulus,multiplier,and inrement.
They denethe linearongruentialgenerator. The rst integer,I
,is alledthe seed.
Using the same seed with a ertain generator willalways give the same sequene of
< m for all j. Therefore, the algorithm an generate at most m
dis-tintintegers. Thesequene ofintegersistransformed intouniformrandomnumbers
between 0and 1by lettingu
The sequene fI
g generated by equation 3.1 will eventually repeat itself with a
periodpthatislessthanorequaltom. Ifm,a,andare properlyhosen,theperiod
will be of maximal length. Several rules have been developed and implemented to
maximizep and givethe best results in statistialtests for randomness [53℄.
Poor hoies of a, , and m, an result in random number sequenes with very
short periods. Many linear ongruential generators implemented as library routines
3.1.2 The Transformation Method
MonteCarlosimulationsoftenrequire randomnumbers distributedwith respet toa
given probability density funtion, (x). The most eÆient way to generate suh a
sequene iswiththetransformationmethod,whihdiretlyonverts uniformrandom
numbers tothe desireddensity.
The umulative distribution funtion represents the probability that a point in
the given density isless than orequal toy:
If (x) is normalized,P (y) willinrease monotonially from 0to 1.
To generate a random number, w, distributed with respet to (x) , a uniform
random number, u, is generated. Then w =P 1
(u), where P 1
is the inverse of P.
This methodrequires that the funtionP beknown and invertible,whihis the ase
more ompliatedfuntions, dierent algorithmsmust be used.
3.1.3 The Von Neumann Method
The Von Neumann, or rejetion, method is a less eÆient but more generally
appli-able way to generate points with respet to a probability density funtion that is
known and an bealulated. The umulative distributionfuntiondoesnot haveto
be known orinvertible.
In order to use this method, one rst nds a funtion, h(x), that is everywhere
greater than and preferably lose to the desired probability density funtion, (x),
and for whih the transformation method an be used. A random number, z, is
generatedwithrespet toh(x)andthe ratioA(z)= (z)
is always greaterthan (x) , this ratio willbe between 0and 1.
z ifu<A(z) andrejetingz ifu>A(z). Theeetofthe rejetionstepistoweight
the density h(x) by ( x)
so that (x) emerges. This method is very simple, but will
leadtoexessive rejetionand beveryineÆientifh(x)isnot loseto(x), inwhih
ase the aeptane probability A(z) will often be small. This loss of eÆieny is
partiularlyimportantfor high-dimensional spaes.
3.1.4 The Metropolis Algorithm
Quantum Monte Carlo alulations require random eletroni ongurations
dis-tributed with respet to the quantum mehanial probability density, the square of
the magnitude of the eletroni wavefuntion. This is an extremely ompliated
and tightly oupled 3N-dimensional funtion, where N is the number of eletrons.
Furthermore, it has appreiable magnitude only in a very small fration of the total
toeÆiently generate randompoints with respet tothis sort of probability density.
In order to distribute eletroni ongurations with respet to their quantum
mehanial probability density, the idea of generating statistially independent
on-gurations must be abandoned. Instead, a Markov hain is used, in whih eah new
previous onguration. The sequene of ongurations forms a \random walk" that
isproportionaltothedesireddensity. Beause eahongurationdepends onthe one
before it, they will have some degree of serial orrelation,whih must be onsidered
when the variane of quantities derived from these ongurations is alulated.
A Markov hain is dened in terms of the transition probability T (x!x 0
the walk. The Metropolisalgorithmisaseriesof rulesfor generatingaMarkovhain
satisfy the followingrelationship:
T (x!x 0
Eq 3.3 is known as the detailed balane ondition. Using it, the probability for
aepting a proposed move fromx tox 0
)=min 1; T (x
T (x!x 0
It should be noted that in equation 3.4, only the ratio (x
is alulated, rather
does not have to be normalized.
In the simplest version of the Metropolis algorithm, the transition probabilities
are hosen so that T (x!x 0
) = T (x 0
!x). The aeptane probability an be
inreased by using importane sampling algorithms, whihmanipulate the transition
probabilities todiret the proposed moves into regionsof high density [55℄.
The Metropolis algorithm guarantees the Markov hain will equilibrate to a
sta-tionary distribution, whih will represent the desired probability density funtion.
This method allows virtually any probability density to be sampled, whih makes it
Quantum Monte Carlo
Quantum Monte Carlo (QMC) is a relatively new lass of methods for onduting
highly aurate quantum mehanial simulations on atomi and moleular systems.
Variationaland diusionMonte Carlo,the mostommonly used eletroni struture
QMC variants, use stohasti methods to optimize wavefuntions and alulate
Beause QMC trialfuntions donot have to be analytially integrable, there is
on-siderable freedomastotheirform. The inlusionofexpliit interpartileoordinates,
whih is impossible with traditional eletroni struture methods, allows QMC trial
)orbetter [61, 62,63, 64℄. Mean eld methodsapableof omparable
au-ray, suh as oupled luster, sale muh less favorably, as O(N 6
) to O(N!). Sine
proessors. Although QMC is not \perfetly parallel,"as has been laimed[65℄, the
parallel overhead funtion an be very small, and large numbers of proessors an
be used with high eÆieny. The use of large numbers of proessors allows QMC
alulations to nish in a reasonable amount of time, despite the slow onvergene
of Monte Carlo. The ombinationof favorable saling and parallelizability of QMC
Many parallelsienti appliationsrequirelarge memoryand fastinterproessor
ommuniation to run. Superomputers that satisfy these needs are very
expen-sive to onstrut and maintain. The most powerful superomputers in the world
today are owned by the United States Department of Energy, whih has
ommit-ted vast resoures to onstruting them in order to ondut simulations on nulear
weapons [42,43, 44℄.
and interproessor ommuniation requirements [49, 50℄. It is reasonable toenvision
aninexpensive QMC spei superomputer made of nodes with no hard drives and
inexpensiveonnetionhardware. Suhamahine wouldgiveresearherswho donot
haveaess to nationallabomputers the ability to ondut QMC alulations.
4.1 Variational Monte Carlo
VariationalquantumMonteCarlo (VMC)uses theMetropolisalgorithmtominimize
the expetation value of the energy of a trial wavefuntion with respet to its
ad-justable parameters. Beause the high dimensional integrals are done using Monte
Carlo methods, some of the restritions on the form of the wavefuntion that are
neessary when the integrals are evaluatedanalytially an be relaxed.
The expetationvalue for the energy of atrialeletroni wavefuntion, j
T i,is hEi= h T j ^ Hj T i h T j T i = R d ~ R T ~ R ^ H T ~ R R d ~ R T ~ R T ~ R ; (4.1) where ^
H is theHamiltonianoperatorforthe system and ~
R isa vetor ontaining the
3N spatial oordinates of the N eletrons of the moleule. The loal energy of an
eletroni ongurationis dened as
R is aneigenfuntion of H,the loalenergy willbe onstantwith respet to
R . Using the loalenergy, the expetation value an be rewritten:
hEi = R d ~ R T ~ R T ~ R E L ~ R R d ~ R T ~ R T ~ R (4.3) = R d ~ R j T ~ R j 2 E L ~ R R d ~ R j T ~ R j 2 (4.4) = Z d ~ R VMC ~ R E L ~ R ; (4.5) where
is the probability density for the onguration ~ R : VMC ~ R = j T ~ R j 2 R d ~ R j T ~ R j 2 : (4.6)
VMC employs the Metropolis algorithmto generate a series of eletroni
ong-urations, n ~ R i o
, distributed with respet to
. The expetation value of the
energy an then beevaluated as
hEi VMC = 1 M M X i=0 E L ~ R i O 1 p M ! : (4.7)
As thewavefuntionissampled, theexpetationvalueofthe energywillutuate
withinits statistialunertainty, whihmakes omparing dierent sets ofvariational
parameters diÆult. This eet an be mitigated by using orrelated sampling, in
whih expetation values for several sets of parameters are alulated using one set
ofongurations [66℄. Correlatedsamplingallowsthe dierenebetween theenergies
of two sets of parameters to be alulated with muh less variane than if the two
expetation values are ompared after being alulated separately.
Beause the loalenergy foraneigenfuntion of ^
H isonstant,the varianeofits
the exatgroundstate,itsloalenergywillvaryless stronglywith ~
optimizethe parametersof the wavefuntionrather than its expetationvalue. This
method works well for Monte Carlo optimization beause the exat minimum value
of the variane of the energy is known, while the minimum of the expetation value
for the energy is unknown. Some optimization methods minimizea ombination of
theenergyand itsvariane[67℄. Algorithmsthatsamplethederivativesofthe energy
with respet to the adjustable parameters in the wavefuntion an be used to speed
onvergene and ensure the true minimum isfound [68, 69℄.
4.2 VMC Trial Funtions
omponents. Sine the QMC Hamiltonian does not have spin terms, we integrate
over spin and onsider only the spatial omponent. The spatial trialfuntions used
in VMCtypiallyhave the form
are CI expansion oeÆients, the AS
are Slater determinant
wave-funtions, and J is a symmetri funtion of the distanes between partiles alled
the Jastrowfuntion. This results inan overall antisymmetrifuntion with expliit
interpartile terms. The
an be obtained through standard eletroni
struture methodssuh asHartree-Fok, GVB, MCSCF, CI, orDFT.
There are many adjustable parameters in
, the orbital oeÆients
and basis funtions in the AS
, and the parameters in the Jastrow funtion an all
be optimized through VMC. Even when orrelated sampling is used, optimizing a
wavefuntion with a large number of adjustable parameters is a hallenging task.
The form of the trial wavefuntion must be hosen with are, so that time is not
The two body Jastrowfuntion is writtenas
J =exp 2 4 X i;j u ij (r ij ) 3 5 ; (4.9)
wherethe sum isoverallpairs ofpartiles, r
isthe distane between partilesi and
j, and u
)is a funtionthat desribesthe interation between partilesi and j.
The two body Jastrowfuntion makesit straightforward to onstrut trial
Satisfyingthese usp onditionsremovesthe singularitiesinthe loalenergy that
o-ur when partilesollide, whih lowers the variane of the energy.
IfGaussianorbitalsareusedintheSCFpart ofthewavefuntion,the usp
ondi-tion for partiles i and j approahing eah other leads to the following ondition for
the funtion u
ij : lim r ij !0 u ij (r ij ) r ij = ij q i q j
is the redued mass of the partiles, q
i and q
are their harges, and l is 1
for same spin eletrons and 0 otherwise.
is the Pade-Jastrow funtion:
u ij (r ij )= ij r ij + P N k=2 a ij;k r k ij 1+ P M l =1 b ij;l r l ij ; (4.11)
are adjustableparameters. Theonstant
isset tothe value
of the usp ondition for the partiles i and j. To ensure that the limit as r ! 1
remains nite, M and N are usually hosen so that M N. Many other forms for
orrelation funtions are in use, inludingsome with saled variablesand some with
three- and higher-body terms [70, 71℄.
freedom in the form of the orbitalsthat make up the Slater determinant part of the
Gaussian funtionshavezero derivativeattheir origin,so moleularorbitals
ondition. Although these usps an be satised by two-body orrelation funtions,
replaing the Gaussianorbitalswith exponentialfuntions thatsatisfy the usp near
the nulei gives muhbetterresults inQMC alulations[72℄. When the orbitals are
modied in this way, the eletron-nuleus usp values in the two body orrelation
funtions are set to zero.
4.3 Diusion Monte Carlo
Diusion MonteCarlo (DMC) does not rely onthe variationalprinipleto alulate
expetation values, but itsonvergene depends on aurate trialfuntions.
DMC starts with the time dependent Shrodinger equation:
i t j ~
R ;t i= ^ Hj ~
Withahangeof variablestoimaginarytime, =it,equation4.12 takesthe formof
j ~ R ; i= ^ Hj ~ R ; i: (4.13)
The formalsolution toequation 4.13 an be written:
i=e ^ H j ~
, thestate j
iisexpanded ineigenstatesof the Hamiltonian:
j ~ R ; 1 i= X i i j i i; (4.15) where ^
and i =h i j ~ R ; 1 i: (4.17)
The expansion in equation4.15 is substituted intoequation 4.14:
j ~ R ; 1 +d i= X i i e E i d j i i: (4.18)
As equation 4.18 is propagated with , ontributions to
with i>0 willdie
out exponentially, leaving
, the ground state.
Propagating equation 4.18 with Monte Carlo methods is ineÆient beause the
potential part of the Hamiltonian varies widely throughout onguration spae and
diverges whenhargedpartilesapproaheahother. EÆientDMCalulations use
state, is used asa guide funtion. A mixed distributionis dened:
DMC ~ R = 0 ~ R T ~ R R d ~ R 0 ~ R T ~ R : (4.19)
The mixed expetationvalue for anoperator, ^
A, has the form
hAi DMC = h 0 j ^ Aj T i h 0 j T i : (4.20)
For operators that ommute with the Hamiltonian, the DMC expetation value
equals the expetationvalue of the true groundstate:
hAi DMC = h 0 j ^ Aj T i h 0 j T i = h 0 j ^ Aj 0 i h 0 j 0 i : (4.21)
The DMC expetation value for the energy an be rewritten in a mannersimilar
tothe VMC expetationvalue:
hEi DMC = h 0 j ^ Hj T i
h j i
= dR 0 R H T R R d ~ R 0 ~ R T ~ R (4.23) = R d ~ R 0 ~ R T ~ R ^ H T( ~ R ) T ( ~ R) R d ~ R 0 ~ R T ~ R (4.24) = Z d ~ R DMC ~ R E L ~ R : (4.25)
A series of eletroni ongurations is generated with respet to
DMC ~ R , whih
allows the expetation value to be evaluated. Generating eletroni ongurations
with respet to
willbedisussed inthe next setion.
Interpreting DMC ~ R
for all ~
R . Forbosons, this property is easilysatisedbeause the groundstate
wave-funtionhasone signeverywhere inongurationspae. Ifthetrialwavefuntion has
the same sign,
willbenonnegative for all ~
R . Ground state wavefuntions
for fermions, however, have positiveand negative regions separated by nodes. If the
nodes of T ~ R and 0 ~ R
are idential,the two funtionswillhave the same sign
in every nodalregion, and
willbe nonnegativefor all ~
If the nodal strutures of
T ~ R and 0 ~ R are dierent, DMC ~ R will have
positive and negative regions. This is known as the fermion problem in DMC. In
The nodalsurfaeof aneletroni wavefuntionisa(3N 1)-dimensional
hyper-surfae wherethe wavefuntionvanishes. Thespatial antisymmetryof the
Althoughthese pointsare known, nogeneraltehniquesexist foronstruting atrial
wavefuntion with the same nodalstruture asthe true ground state.
The simplest and most widely used solution to this problem is the xed-node
approximation, in whih the nodes of the true ground state are assumed to be the
R . Thexed-nodeapproximationisenforedinaDMC
alulation by rejeting any proposed move that rosses a node and auses
tohange sign. The resultingenergy liesabovethe exat energy andisvariationalin
the nodalstruture of the trialfuntion [73, 74℄.
Other solutions tothe nodalproblem that do not rely on the xed node
approx-imation have been developed. For example, the transient estimator method
propa-gatestwobosonlikewalkerensembles, representing thepositiveand negativeparts of
0 ~ R
. This method isnot stable with respet to , the imaginarytime variablein
whih the ensembles are propagated, beause both parts of the simulation onverge
tothenodelessbosongroundstate. Expetationvaluesan onlybealulatedduring
the intermediate regimebeforethis ours, whihlimitsthe statistialauray that
an be attained.
For most small moleules, the nodes of trialwavefuntions obtained by standard
SCF methods are of good enough quality for xed node DMC alulations to yield
resultswithin hemialauray[57,59℄. In someases, suhas theberylliumatom,
multi-onguration wavefuntionsare needed to obtainnodes of suÆient quality.
4.4 Generating Congurations in DMC
Eletroni ongurations are generated with respet to
using the distribution
f ~ R ; = ~ R ; T ~ R ; (4.26) where j ~ R ;
i is a solution to the time-dependent Shrodinger equation,
The distribution f
is a solutiontoa Fokker-Plank equation:
where ^ L= 1 2 r 2
+rV ~ R +E L ~ R : (4.28) V ~ R
is the loal veloity of the trialfuntion at ~ R : V ~ R = r T ~ R T ~ R ; (4.29) and E L ~ R
E L ~ R = ^ H T ~ R T ~ R : (4.30)
In the ase
= 1, equations 4.27 and 4.28 redue to equation 4.13, the
Shrodinger equation inimaginary time.
The operator ^
L denes aneigenvalue equation:
^ LjK i ~ R i= i jK i ~ R i: (4.31) The K i ~ R = i ~ R T ~ R and the i = E i
, where the
i ~ R and E i are the
eigenvetors and eigenvalues of the Hamiltonian.
Equations 4.27 and 4.28 desribe a diusion proess in a potential. As with
equation 4.13, the formalsolutionto equation 4.27 an be written:
f ~ R ; =e ( ^ L E T ) f ~
This solution an be expanded in the eigenvetors of ^ L: f ~ R ; = X i i e ( i E T ) K i ~ R (4.33) = X i e
( Ei E
where i =hK i ~ R jf ~
i ~ R j f ~
R ;0 T ~ R i: (4.35)
It is easyto see that if E
, ontributions to f ~ R ; from the i ~ R with
DMC ~ R = 0 ~ R T ~ R .
In order to propagate Eq. 4.32 with and obtain
, it is rewritten in
Y; +d =e dE T ( +d) Z d ~ R G ~ Y; ~
R ;d f ~ R ; ; (4.36) whereG ~ Y; ~
proesses, annotbe written for arbitraryd.
Green's funtions an be written for eah proess individually, and an approximate
Green's funtionan bewritten astheir produt:
G ~ Y; ~ R; 1 (2) 3N =2 Æ h ~ Z ~ R V ~ R d i Z d ~ Z exp 2 6 4 ~ Y ~ Z 2 2d 3 7 5 (4.37) e 1 2 [ E L( ~ Y ) +E L( ~ R )℄ d +O d 2 :
Thefatorization ofthe Green'sfuntionneglets the fatthatthe termsof ^
ommute,soequation4.37 isexat onlyinthe limitd !0. Equation4.34, however,
isonlyexat inthe limit !1. Any hoie oftime stepis atradeobetween these
two onsiderations. Inpratie, runs withseveral valuesof d must bedone, and the
results are extrapolated tod =0.
eahonsisting of aneletroni ongurationand a statistialweight:
weight-ing, and branhing. In the drift step, the eletrons are moved aording tothe time
step and the loal veloity. In the diusion step, the eletrons are moved to new
positions with transitionprobabilities given by the kineti part of the Green's
fun-tion. The weighting and branhing step takes into aount the potential part of the
After awalkerismoved from onguration ~
R to ~
Y,its weight isalulated based
ontheloalenergy at ~
R and ~
Y. Inthe branhing step,walkers withhigh weightgive
birth tonew walkers, while lowweightwalkers are deleted.
The trial energy, E
, serves as a normalization fator and is adjusted after eah
step based onthe sum of the weights of the walkers in order to keep the population
stable. The average value of E
after many steps will onverge to the ground-state
Several DMC algorithms, eah with slightly dierent shemes for fatoring the
Green's funtion, proposing ongurations, alulating the weights, and branhing
the walkers have been published [66, 75℄. The DMC alulations presented later in
An Optimized Initialization
Algorithm to Ensure Auray in
Quantum Monte Carlo (QMC) alulations require the generation of random
ele-troniongurations withrespet toadesiredprobability density,usually the square
of the magnitude of the wavefuntion. In most ases, the Metropolis algorithm is
used to generate a sequene of ongurations in a Markov hain. This method has
an inherent equilibrationphase, during whih the ongurations are not
representa-tive of the desired density and must be disarded. If statistis are gathered before
the walkers have equilibrated, ontamination by nonequilibrated ongurations an
greatly redue the auray of the results. Beause separate Markov hains must be
equilibrated for the walkers on eah proessor, the use of a long equilibration phase
has a profoundly detrimental eet on the eÆieny of large parallel alulations.
The stratied atomi walker initialization (STRAW) shortens the equilibration
by avoiding ontamination by nonequilibratedongurations. Shortening the length
of the equilibrationphasealso resultsinsigniant improvementsinthe eÆieny of
parallel alulations, whih redues the total omputationalrun time. For example,
using STRAW rather than a standard initializationmethod in 512 proessor
alu-lations redues the amount of time needed to alulate the energy expetation value
78℄ an inprinipleprovideenergies towithinhemial auray(2 kal/mol)[57,
58, 59℄. The omputational expense of QMC sales with system size as O(N 3
better [61, 62, 63, 64℄, albeit with a large prefator. This is muh more favorable
ou-pled luster, whih tend to sale very poorly with the size of the system, generally
to N!) [79℄. Moreover, the stohasti nature of QMC makes it relatively easy
in areasonable amount of time despite the slowonvergene ofMonteCarlo.
80℄, QMC willbeomeapowerfultoolfor ondutingaurate simulationson
straightforward and eÆient onheterogeneous and homogeneous omputers. To this
end, a nite all-eletron QMC program, QMBeaver, has been written and used to
develop and demonstrateseveral new algorithms[49, 50, 81℄.
Before statistis gathering begins in a QMC alulation, the walkers must be
allowed to equilibrate so that their ongurations are proportional to the desired
density. It is impossible to alulate aurate expetation values if nonequilibrated
ongurationsontaminatethe statistis. Inordertoensuretheirstatistial
indepen-dene, the walkers must equilibrate separately. This makes the equilibration phase
a serial step of the alulation and a major limiting fator in the eÆieny of
par-allel alulations. These onsiderations make it imperative that the equilibration
proess be fast and reliable. For example, we show that for the energeti material
ongurations are generatedby a standard method.
We present here a simple method for hoosing initial eletroni ongurations