Carlo

Thesis by

Daniel Ross Fisher

In Partial Fulllmentof the Requirements

for the Degree of

Dotor of Philosophy

California Instituteof Tehnology

Pasadena, California

2010

Aknowledgements

Abstrat

Quantum Monte Carlo is a relatively new lass of eletroni struture methods that

has thepotentialtoalulate expetationvaluesforatomi,moleular,and materials

systems to within hemial auray. QMC sales as O(N 3

) or better with the size

of the system, whih is muh more favorable than traditional eletroni struture

methods apableofomparable auray. In addition,the stohasti natureof QMC

makes itrelatively easy toparallelizeovermultipleproessors.

QMC alulations use the Metropolis algorithm to sample the eletron density

of the system. This method has an inherent equilibration phase, during whih the

ongurations donot represent the desired density and must be disarded. Beause

the time spentonequilibrationinreases linearly withthe number ofproessors, this

phase limits the eÆieny of parallel alulations, making it impossible to use large

numbers of proessors tospeed onvergene.

This thesis presents an algorithmthat generates statistially independent walker

ongurationsinregionsofhighprobabilitydensity,shorteningthelengthofthe

equi-libration phase and ensuring the auray of alulations. Shortening the length of

the equilibration phase greatly improves the eÆieny of large parallel alulations,

whihwillallowQMCalulationstousethenextgenerationofhomogeneous,

hetero-geneous, anddistributedomputingresoures toondut highlyauratesimulations

onlarge systems.

The most ommonformulationof diusion MonteCarlo has two souresof error:

the time step used to propagate the walkers and the nodes of the trialfuntion. In

multionguration self-onsistent eld trial funtions and time steps ranging from

10 4

to 10 2

au. The results are ompared to values from experiment and high

quality ab initio alulations, as well as the reently developed X3LYP, M06, and

XYG3 density funtionals. The appropriate time step and trial funtions for the

Contents

Aknowledgements iii

Abstrat iv

1 Eletroni Struture Theory 1

1.1 Quantum Mehanis . . . 2

1.1.1 Cusp Conditions . . . 3

1.1.2 The VariationalTheorem. . . 4

1.2 Approximate Methods . . . 5

1.2.1 Hartree-Fok . . . 6

1.2.2 Post Hartree-Fok Methods . . . 8

1.2.2.1 Conguration Interation . . . 8

1.2.2.2 Coupled Cluster . . . 9

1.2.2.3 Multionguration SCF . . . 10

1.2.3 Perturbation Theory . . . 11

1.2.4 ExtrapolatedMethods . . . 12

1.2.5 Density Funtional Theory . . . 13

2 Parallel Computing 16 2.1 Designing Parallel Algorithms . . . 17

2.2 Analyzing Parallel Algorithms . . . 18

3 Random Number Generation 23

3.1 RandomNumber Generation. . . 23

3.1.1 UniformRandom Numbers . . . 24

3.1.2 The Transformation Method . . . 25

3.1.3 The VonNeumann Method . . . 25

3.1.4 The Metropolis Algorithm . . . 26

4 Quantum Monte Carlo 28 4.1 VariationalMonteCarlo . . . 29

4.2 VMCTrialFuntions . . . 31

4.3 DiusionMonte Carlo . . . 33

4.4 GeneratingCongurations inDMC . . . 36

5 An OptimizedInitializationAlgorithmtoEnsure Aurayin Quan-tum Monte Carlo Calulations 40 Abstrat 41 5.1 Introdution . . . 42

5.2 The Metropolis Algorithmand the InitializationCatastrophe . . . 43

5.3 WalkerInitialization . . . 46

5.3.1 STRAW . . . 48

5.3.2 EquilibrationBehavior . . . 51

5.3.3 Timingand SpatialCorrelation . . . 57

5.3.4 Parallel Calulation EÆieny . . . 59

5.4 Conlusion . . . 62

6.3 Experimental and ComputationalResults. . . 68

6.4 ComputationalMethods . . . 69

6.4.1 Quantum Monte Carlo . . . 72

6.4.2 QMC Proedures . . . 76

6.5 Results and Disussion . . . 77

6.5.1 Reation1 . . . 78

6.5.2 Reation2 . . . 81

6.5.3 Reation3 . . . 82

6.5.4 Disussion . . . 83

6.6 Conlusion . . . 85

Chapter 1

Eletroni Struture Theory

Thegoalofeletronistruturetheoryistounderstandthegeometries,reations,and

other properties of moleules and materials based on simulations of their eletroni

struture. The behaviorofpartilesonthis sale isgovernedby the laws of quantum

mehanis. Although these laws are well understood, applying them to nontrivial

systemsleads toequationstooompliatedtosolveexatly. Sine exatsolutionsare

notpossible,researhersuse approximatationsthattradeaurayforomputational

tratability.

Approximate eletroni struture methods are lassied as ab initio methods,

whih are based only on the laws of quantum mehanis, orsemiempirial methods,

whih use experimental results to determine funtional forms and t parameters.

Methodsof both types are used to understand and preditexperimentalphenomena

suh as reationmehanisms, eletrialproperties, and biologial ativity for a wide

variety of systems. This hapter ontains a very basi introdution to the laws of

quantum mehanis and the approximate methods used to apply them to moleular

and solid statesystems. Further informationonquantummehanis an befound in

1.1 Quantum Mehanis

Thefundamentalpostulateofquantummehanisisthatthe energyandallother

ob-servablepropertiesofanatomormoleuleareexpressedinitswavefuntion,whihan

beobtained bysolving theShrodingerequation. Exatsolutionsforthe Shrodinger

equationare possiblefor onlythe simplestsystems. Largersystems leadveryquikly

toequations with too many dimensions tobe solvable.

Quantummehanisditatesseveral neessary featuresfor ,the wavefuntionof

apartile. Theprodutofthewavefuntionwithitsomplexonjugate,

=j j 2

,is

interpreted as the probabilitydensity funtion for the position of the partile. Sine

the partile must exist somewhere, integrating j j 2

over all spae must give unity.

Wavefuntions that satisfythis ondition are referred toasnormalized. Aordingly,

we multiply by a normalizationonstant so that R

dj j 2

=1.

The mathematial framework of a many partile wavefuntion, , must aount

forthe fatthat eletronsare indistinguishablefromeah other. Thismeansthat the

spatial probabilitydensity, , annotvarywith the interhange of any two eletrons:

(1;2)=j (1;2)j 2

=j (2;1)j 2

=(2;1): (1.1)

Therefore,

(1;2)= (2;1): (1.2)

Partiles suh as eletrons with half-integer spin are fermions, for whih the

wave-funtion isantisymmetri: (1;2)= (2;1).

The time-dependent Shrodinger equation determines the evolution of the

wave-funtion of asystem with time [8℄:

i

t

j (X;t) i= ^

Hj (X;t)i; (1.3)

where ^

of the system.

The solution tothe time-dependentShrodinger equation an be expanded as

j (X;t)i= X j j e iE j t j j

(X)i; (1.4)

where the oeÆient

j = h

j

(X)j (X;0)i and the E

j

and j

j

(X)i are the

eigen-values and eigenfuntions of the time-independent Shrodingerequation:

^

Hj

j

(X)i=E

j j

j

(X)i: (1.5)

Beause they do not hange with time, the eigenfuntionsj

j

(X)i are known as

the stationary states of the system. Eah stationary state has an assoiated

eigen-value, E

j

, whih an be interpreted as its energy. The stationary states are usually

ordered so that E

0 E

1 E

2

, with the lowest energy state, j

0

(X)i, being

alled the groundstate.

Beause ^

H is aHermitianoperator,itseigenvalues are realand itseigenfuntions

are orthogonal to eah other and span the spae of all possible solutions. They an

alsobehosen to be normalized,so that

h

i

(X)j

j

(X)i=Æ

ij

; (1.6)

where Æ

ij

is the Kronekerdelta: Æ

ij

equals 1 if i=j and 0otherwise.

1.1.1 Cusp Conditions

The Hamiltonianoperatorfora system ofN eletrons andK nulei with harges Z

L

and masses M

L is ^ H= 1 2 N X i=1 r 2 i 1 2 K X L=1 1 M L r 2 L N X i=1 K X L=1 Z L r iL + N X i=1 N X j>i 1 r ij + K X I=1 K X J>I Z I Z J r IJ ; (1.7)

whereloweraseindiesrefer toeletrons, upperase indiesrefertonulei, and r

In equation 1.7 and throughoutthis work, atomi unitsare used, in whih h=1,

m

e

= 1, jej = 1, and 4

0

= 1, where m

e

is the mass of an eletron, jej is the

magnitude of itsharge, and

0

isthe permittivity of freespae.

The Coulombtermsinthe Hamiltoniandiverge whentwopartilesapproaheah

other. The Shrodinger equation an be solved analytially for these ongurations

beausethekinetiandpotentialtermsofthetwoapproahingpartilesdominatethe

others. In order for the energy of the system to benite, a divergene inthe kineti

energy must exatly anel the divergene in the potential. Solving the Shrodinger

equation for these ongurations to ahieve this anellation leads to the following

usp ondition for the wavefuntion [9℄:

lim r ij !0 r ij = ij q i q j

l+1 lim r ij !0 ; (1.8) where ij

is the redued mass of the partiles, q

i and q

j

are their harges, and l is 1

for same-spin eletrons and 0 otherwise. An aurate wave funtionmust satisfy the

usp ondition for eahpair of partilesinthe system.

1.1.2 The Variational Theorem

The most powerful tool researhers have in onstruting approximate ground state

wavefuntions is the variational priniple, whih provides a way to ompare their

quality. The exat eigenfuntions, j

i

(X)i, of the Hamiltonianspan the spae of all

possible wavefuntions for the system. Therefore, any normalizable trial

wavefun-tion, j

T

(X)i,that satisesthe boundaryonditions of thesystem an beexpanded

in termsof the j

i (X)i:

j

T

(X)i = X i b i j i

(X)i; (1.9)

b

i

= h

i

(X)j

T

the trialwavefuntion:

hE

T i=

h

T (X)j

^

Hj

T (X)i

h

T

(X)j

T (X)i

= P

i jb

i j

2

E

i

P

i jb

i j

2 E

0

; (1.11)

where the equality appliesif j

T

(X)i=j

0 (X)i.

The expetation value of the energy of a trialwavefuntion is anupperboundto

the ground state energy. The loser the trial wavefuntion is to the atual ground

state, the lower its energy will be. This provides a way to approximate the ground

state. First,a parametrizedwavefuntion isonstrutedwith a formthatan easily

be evaluated. Then the parameters are adjusted to minimize the expetation value

of the energy. This is the losest approximation to the ground state in the spae of

the adjustableparameters. Physialarguments must beused inhoosing the formof

the trial wavefuntion: it determines the restritions on the interations that an be

desribed and thereforerepresents a model.

1.2 Approximate Methods

Inmostases,therstsimpliationtotheShrodingerequationistheBorn-Oppenheimer

approximation,whihmakesuseofthefatthatthemassesofnuleiaremuhgreater

than that of an eletron. The eletrons see the heavy, slow-moving nulei as almost

stationaryharges,and thenulei seethe muhfaster eletronsasessentiallya

three-dimensionaldistribution ofharge. The Born-Oppenheimer approximation simplies

the moleularproblembytreatingtheeletroniand nulearmotionsseparately[10℄.

In this method, one assumes a xed onguration for the nulei, and for this

onguration solves an eletroni Shrodinger equation to nd the eletroni wave

funtion and energy. This proess isrepeated for dierent ongurations togive the

eletronienergyasafuntionofthepositionsofthenulei. Thenulearonguration

thatminimizestheenergyistheequilibriumgeometryofthemoleule. Theeletroni

energy levelsfor a given eletroni state.

1.2.1 Hartree-Fok

Thebasis foralmostallmethodstosolvetheeletroni partofthe Shrodinger

equa-tion is the Hartree-Fok (HF) method. In HF, the trial wavefuntion is expressed

as an antisymmetri produt of normalized, orthogonal moleular orbitals,

i . The

simplestway toonstrutatrialwavefuntionfromaset oforbitalsistouse aSlater

determinant, a framework that ensures the antisymmetry of the overall

wavefun-tion [11℄: AS (x 1 ;x 2 ;:::;x

N )= 1 p N! 1 (x 1 ) 2 (x 1 ) N (x 1 ) 1 (x 2 ) 2 (x 2 ) N (x 2 ) . . . . . . . . . 1 (x N ) 2 (x N ) N (x N ) : (1.12)

In equation 1.12, x

i

ontains the spae and spin oordinates of eletron i. Sine the

determinant of a matrix hanges sign if two rows or olumns are interhanged, the

overall wavefuntion willhave the properantisymmetrywith respet topermutation

of the eletrons.

The moleularorbitals an be fatored intospatial and spin omponents:

(x)= (~r;)=(~r)(); (1.13)

whereisaspatialorbitalandisaspinfuntion,eitheror. Thespatialorbitals

are written aslinear ombinationsof basis funtions:

= X ; (1.14) wherethe

arethemoleularorbitalexpansionoeÆients. Thebasisfuntions,

,

type orbitals (GTO) are usually used as basis funtions. GTOs have the following

radial part:

GTO

(r)=d

r

l

exp

r

2

: (1.15)

Sine the derivativeof a Gaussianis zero atitsorigin, these funtions annotsatisfy

the eletron-nulear usp ondition of setion 1.1.1. The resulting multienter

inte-grals an be evaluated analytially, however, so alarge number of Gaussians an be

used in the basis set with littleomputationalexpense.

Slater type orbitals (STO) have the orret form to satisfy both the

eletron-nulear usp ondition and the long range behavior of moleular orbitals, but are

not typially used in basis sets beause they lead to very ompliated integrals in

alulations:

STO

(r)=d

r

l

exp(

r): (1.16)

Sinethemoleularorbitalsareonstrutedfromthebasisfuntions,thebasisset

restrits them toertain shapes and regions of spae. The more funtions in a basis

set, the more exibility it has to approximate moleular orbitals. Larger basis sets

generally produe better results in omputations, but require more omputer time.

Sine an eletron has a nite probability of existing anywhere in spae, an innite

basis set would be neessary to ompletelydesribe its possible position.

In order to solve for the orbital expansion oeÆients, the Hartree-Fok method

makesuseofthevariationalpriniple. Minimizingtheexpetationvalueoftheenergy

ofthewavefuntionleadstoaseriesofequations,whihanbewritteninmatrixform:

FC =SC; (1.17)

where eah element is a matrix. The Fok matrix, F, represents the average eets

of the eld of all the eletrons on eah orbital. The matrix C ontains the orbital

oeÆients, S indiates the overlap between the orbitals,and is a diagonal matrix

Thus equation 1.17 is not linear and must be solved with an iterative proedure

alled the self-onsistent eld (SCF) method. First, an initial guess for the orbital

oeÆients isformed, and the orrespondingdensity matrixis onstruted. Usingit,

the Fok matrix is formed. Then, solving the eigenvalue problemyieldsa new set of

orbitaloeÆients. This proedure is repeated untilboth the orbitaloeÆients and

the energy have onverged. Atthis point,the orbitalsgenerate a eld that produes

the same orbitals. This method produes both oupied and virtual (unoupied)

orbitals. Thetotalnumberoforbitalsformedisequaltothenumberofbasisfuntions.

Solving the eigenvalue problem is the slowest step of the proess. It involves

diagonalizinga matrix, aproess that sales asO(N 3

),where N is the linear size of

the matrix. In this ase, N is the number of basis funtions.

1.2.2 Post Hartree-Fok Methods

The errors of Hartree-Fok are due to the fat that it treats the repulsion of the

eletrons for eah other in an average way and neglets the details of their motion.

The shape of the orbital an eletron oupies is determined by the potential eld

of the nulei and the density of the other oupied orbitals. An eletron sees only

the \mean eld" of the other eletrons, whih allows them to ome lose together

more oftenthan they shouldand makesit impossible forthe wavefuntion tosatisfy

the eletron-eletron usp onditions of setion 1.1.1. The dierene in energy that

would result from properly allowing the eletrons to avoid eah other is alled the

orrelation energy. Several methods go beyond Hartree-Fok and attempt to treat

this phenomenon properly.

1.2.2.1 Conguration Interation

The onguration interation (CI) method uses the virtual orbitals generated by

Hartree-Fok in addition to the oupied orbitals to onstrut a wavefuntion as a

expansion oeÆientsare found by diagonalizing the resultingHamiltonianmatrix:

CI (~x

1 ;~x

2

;:::;~x

N )= X m a m AS m (~x 1 ;~x

2

;:::;~x

N

): (1.18)

If a Slater determinant orreponding to every possible oupation of the orbitals is

inluded in the expansion, the alulation is a \full CI." In most ases, full CI is

impossiblebeausethe number of possible Slaterdeterminantsis too large.

In pratie, CIalulations are usually arriedout by inludinga limitednumber

of determinantsinthe expansion. ACI singles(CIS) alulation exites one eletron

atatimeintoavirtualorbital,aCIdoubles(CID)exitestwoatatime,aCISD

al-ulationinludes singles and doubles, a CISDTalulation inludes singles, doubles,

and triples, et. CI alulations an providequantitative results (within2 kal/mol)

for energies of moleules, but are extremely time onsuming and require immense

amounts of memory,even for small systems and minimalbasis sets. In addition, the

orrelationenergyreoveredsalespoorlywiththenumberofongurationsinluded.

1.2.2.2 Coupled Cluster

In oupled luster (CC) alulations, the trial wavefuntion is expressed as a linear

ombinationofSlaterdeterminants,butanexponentialformofanexitationoperator

is used togenerate the ongurations and alulate the energy [12℄:

j

CC

i=exp ^ T j HF i: (1.19)

Theexitationoperator, ^

T,makesSlaterdeterminantsbyexitingeletrons fromthe

groundstate into virtual orbitals. Equation 1.19 an be expanded ina Taylor series:

j

CC

i = exp ^ T j HF i = j HF i+ ^ T 1 j HF i+ ^ T 2 + 1 2 ^ T 2 1 j HF i + ^ T 3 + ^ T 1 ^ T 2 + 1 ^ T 3 1 j HF

whereT

1

reatessingle exitations,T

2

reatesdoubleexitations,and soon. In

equa-tion 1.20, the termsare grouped into levelsof exitation. At eah level of exitation,

several terms ontribute. At the seond level, for example, ^

T

2

generates onneted

double exitations, while ^

T 2

1

generates two disonneted single exitations.

Coupled luster makes it easy to treat moleules of dierent sizes with the same

level of orrelation, whih is important for hemial reations, in whih bonds may

form or a large moleule may dissoiate into fragments. Treating the produts and

reatants of areation onsistently is neessary toget aurate energy dierenes.

Likeongurationinteration,oupledlusteralulationsarenamedbythelevels

of exitation inluded in the expansion. A CCSD alulation inludes single and

doubleexitations, while CCSD(T)inludes triplesasa perturbation. CCSD(T) isa

very popular method for onduting aurate alulations with reasonable ost, and

is oftenused asa benhmark to ompare the results of other methods. The expense

of CCSD(T) sales as O(N 7

) with the number of basis funtions, whih limits its

applition tosmall moleules and basis sets.

1.2.2.3 Multionguration SCF

Ina multiongurationselfonsistenteld (MCSCF) alulation,the userdenes an

\ativespae"onsistingofasubsetoftheeleronsandorbitalsofamoleule. The

a-tiveeletronsareexitedintotheativevirtualorbitalstoformasetofdeterminants,

and both the orbitals and CI expansion oeÆients are variationallyoptimized [13℄.

If a full CI is arried out on the ative spae, and all possible oupations of the

ative orbitals are onsidered, the alulation is alled a omplete ative spae SCF

(CASSCF) [14℄ or fully optimized reation spae (FORS) [15℄ alulation. Beause

both the orbitals and CI oeÆients are optimized, MCSCF oers the most general

approah available to omputing eletroni struture. The large number of

varia-tional parameters makes the optimization a hallenge, so users must be areful to

onlyinludethe eletronsand orbitalsinvolved inthe reationunderinvestigationin

form of MCSCF in whih eletrons are exited pairwise from valene orbitals into

virtual orbitals [16℄. Although the seletion of ongurations is onstrained, the

optimization proedure for GVB alulations is muh more systemati and reliable

thanageneralMCSCFalulation. TheGVBwavefuntionisthe simplestformthat

allows moleules to dissoiate into open shell fragments, whih allows it to produe

aurate dissoiation urvesfor hemial bonds.

1.2.3 Perturbation Theory

Mller-Plesset (MP) perturbation theory is a non-iterative method for alulating

the orrelationenergy of aset oforbitals. In perturbation theory, the Hamiltonianis

divided into two parts:

^

H = ^

H

0 +

^

V; (1.21)

where ^

H

0

is exatly solvable and ^

V is a perturbation that is assumed to be small

ompared toit. Theperturbed wavefuntionan beexpanded asa powerseries in:

= 0

+ (1)

+ 2 (2)

+ 3 ( 3)

+: (1.22)

The perturbed wavefuntion is substitutedintothe Shrodingerequation:

^

H

0 +

^

V

0

+ (1)

+

=

E 0

+E ( 1)

+

0

+ (1)

+

: (1.23)

Equatingtermswith thesamepowerofgivesformulasfororretions tothe energy

for varying lengths of the expansion.

Ineletronistruturetheory,theunperturbed Hamiltonianandwavefuntionare

the Fok operator and its ground state Slater determinant. The perturbation, ^

V, is

integrals between determinants:

E (2)

= X

t jh

0

jr 1

12 j

t ij

2

E

t E

0

; (1.24)

where the index t sums over determinants in whih two eletrons have been exited

intovirtual orbitals. It iseasytosee fromthe denominatorof equation1.24 that the

greatestontributionstotheseond-orderorretionwillomefromlow-lyingexited

states, whose energy islose to the ground state energy.

Mller-Plesset perturbationtheory is referred toby the order of the expansion of

theperturbation. Seondorder(MP2)isommonlyused,andthird(MP3)andfourth

(MP4) orderare implementedinmany quantumhemistryprograms. Multireferene

MP theory, in whih an MCSCF or CI wavefuntion is used as the unperturbed

wavefuntion, has alsobeen developed [17℄.

While MP2 generally gives good results for moleular geometries and hanges

in energy for hemial reations, reent studies omparing levels of perturbation for

dierenthemialsystemsandbasissetshaveshownthatMPperturbationtheory

en-ergiesarenot neessarilyonvergentinthe limitofhigherordersofperturbation[18℄.

1.2.4 Extrapolated Methods

Severalmethodshavebeendevelopedtoapproximateanextremelyexpensive

alula-tionbysystematiallyombininglessaurateresults. Althoughmultiplealulations

arerun,theoverallostanbesigniantlylessthanthatofthesinglehighlyaurate

alulation.

The omplete basis set (CBS) methods address the errors due to using a nite

basis set in alulations. They extrapolate to an innite basis using expressions

for the orrelation energy reovered for eletron pairs as funtions of higher angular

momentumareinludedinthebasisset[19℄. ACBSalulationonsistsofa

Hartree-Fok alulation with a large basis set, an MP2 alulation with a moderate basis

obtained for a high level alulation withan innitebasis set.

The Gaussian-1 (G1) method approximates a quadrati CISD(T) result with a

large basis set using four smaller alulations [20℄. It orrets for trunation of the

basis set by arrying out MP4 alulationswith three dierent basis sets and for the

limitedleveloforrelation by arryingout aquadrati CISD(T) alulationwith the

smallestbasisset. Theresultsareenteredintoaformulathatinludessomeempirial

orretionsfortheremainingsystematierrorstogivetheG1energy. G2[21℄,G3[22℄,

and G4 [23℄ methodshave subsequently been developed.

The foal point method expliitly examines the onvergene of the energy with

respet to both the basis set and level of orrelation to estimate the ab initio limit

within the Born-Oppenheimer approximation [24℄. In the foal point proedure, HF

energies areextrapolatedtothe CBSlimit,andCCSDTand CCSDT(Q)alulations

are arriedout using a moderatebasis set. The results are ombinedtoestimate the

CBSlimitoftheCCSDT(Q)energy. Corretionsfornon-Born-Oppenheimer[25℄and

speial relativisti eets [26℄ are added togivethe foal point result.

1.2.5 Density Funtional Theory

Density funtional theory (DFT) is another widely used lass of methods for

tak-ing into aount the eets of eletron orrelation. DFT is based on the theorem

of Hohenberg and Kohn, whih proves the existene of a funtionalthat determines

the exat eletron density and energy for a given a nulear potential eld [27℄.

Un-fortunately, the theorem does not provide the form of the exat funtional. While

the exat funtional would take an eletron density as input and return the energy,

approximate funtionalspartitionthe energy into several terms[28℄:

E =E T

+E V

+E J

+E XC

: (1.25)

The rst three terms orrespond to the kineti energy, the attration between the

between the eletrons.

Inpriniple,apuredensityfuntionalmethodwoulddealdiretlywiththeeletron

density, a funtion of the three spatial variables. No orbitals would be involved, and

alulations would sale linearly with the size of the system. In pratie, however,

a method similar to Hartree-Fok is used. The wavefuntion is written as a Slater

determinant of orbitals, and the Fok operator is replaed with one that takes the

eets of eletron orrelationintoaount.

The exhange-orrelation energy of equation 1.25 is separated intoexhange and

orrelation terms. The exhange energy arises from the interations between same

spin eletrons, whih are kept apart by the antisymmetry of the spatial part of the

wave funtion. The orrelation energy is due to the interations between opposite

spin eletrons.

The exhange and orrelation energy terms are alulated by funtionals of the

density. The basis for most funtionals is the loal density approximation, in whih

eletrons uniformlyoupy a volume with a positive bakground harge to keep the

overall harge neutral. Forthis system, the exhange energy has a simple form:

E X

LDA =

3

2

3

4

1

3 Z

d 4

3

: (1.26)

Loal orrelationfuntionals are more ompliated,but are alsoinuse [29℄.

The eletron density of atoms and moleules, however, is not uniform, so

re-searhers have developed exhange and orrelation funtionalsthat use the gradient

of the density aswell as itsvalue [30, 31℄.

Some of the most aurate density funtional methods in use are hybrid

fun-tionals, in whih the Hartree-Fok denition of the exhange energy, whih is based

on moleular orbitals, is inluded as a omponent of the exhange-orrelation

Fok terms:

E XC

B3LYP =E

X

LDA +

0

E X

HF E

X

LDA

+

X E

X

B88 +E

C

VWN3 +

C

E C

LYP E

C

VWN3

;

(1.27)

where E X

B88

is a gradient-orreted exhange term, E C

VWN3

is a loal orrelation

term,and E C

LYP

isa gradient-orretedorrelation term. The oeÆients

0 ,

X , and

C

were ttoexperimentaldata.

Density funtionaltheory isaverypopularway forresearherstoinludeeletron

orrelation in alulations and obtain results that are aurate enough for many

ap-pliationswith moderate omputationalexpense. These methods have been applied

suessfullytoalargevarietyof systemsand havebeenabenettomanyareasof

re-searh. Whilepost-Hartree-Fokmethodsan alwaysbeimproved byinludingmore

ongurations, using a larger basis set, or alulating higher orders of perturbation,

DFT suers from the fat that there is no systemati way to improve its results.

New density funtionalsare ontinuously being developed [34,35, 36℄, but none give

Chapter 2

Parallel Computing

The extraordinary inrease in omputingpoweravailableto researhers over the last

ftyyearshas revolutionizedengineering,astronomy,biology,hemistry,physis,

eo-nomis, and many other elds. The long term trend in the number of iruits that

anbeplaedonanintegratediruitinexpensivelywasdesribedin1965byGordon

Moore:

The omplexity forminimum omponentosts has inreased atarate

of roughly a fator of two peryear .... Certainly overthe short term this

ratean beexpetedtoontinue, ifnottoinrease. Overthelongerterm,

therate ofinrease isabitmore unertain,although thereisnoreasonto

believeitwillnotremainnearlyonstantforatleast10years. Thatmeans

by 1975, the number of omponents per integrated iruit for minimum

ost will be 65,000. I believe that suh a large iruit an be built on a

single wafer [37℄.

In 1975, Moore's predition for the time to double the number of transistors on a

iruit was revised to 18 months. The trends in almost every measure of eletroni

devies, suhasproessing speed, memoryapaity,and omputing performane per

unit ost, are losely relatedto Moore's law.

It has oftenbeen preditedthat hip designers would notbeabletokeep up with

hip makers prediting new proessors onsistent with Moore's law for another ten

years.

Notsatisedwiththeomputingpoweravailableinasingleproessor,researhers

have developed tehniques for parallel omputing, in whih multiple proessors

on-neted with a network are used together tosolve a problem. This hapter desribes

some of the ways parallel programs are designed and analyzed. Additional

informa-tion an be found in referene [38℄.

2.1 Designing Parallel Algorithms

Inorderforanalgorithmtobeexeuted inparallel,theprogrammermust deompose

the work into tasks and identify whih tasks an be exeuted onurrently. The

onurrent tasks an be assignedto dierent proessors to be exeuted. In order for

a proessor to be able to omplete itstask, the appropriate instrutions, input, and

output must be ommuniated.

The granularity of aproblemreferstothenumberandsize ofthetasksintowhih

itan bedeomposed. The degree of onurreny isthe numberof tasksthat an be

exeuted simultaneously. This number isusually less than the total numberof tasks

due to dependenies amongthem.

There are many tehniques for deomposing aproblemintotasks. The nature of

the problem determines how it an be divided and how the dierent tasks interat

with eah other. A problemforwhiha ne-graineddeompositionintoindependent

tasks is possible is well suited for parallel omputing, and will benet greatly from

beingarriedoutonmultipleproessors. Alessidealappliationmaybenetlessfrom

the parallel environment. In espeially unfavorable irumstanes, suh as if many

proessors are idle while they waitfor another task tosupply them with input, or if

2.2 Analyzing Parallel Algorithms

Usingtwie as many proessors to exeute a programrarely results init ompleting

in half the time or generating twie as muh useful output. Overhead expenses,

unavoidable in the parallel environment, subtrat from the performane. Computer

sientists have developed several metristomeasure the expense and performane of

parallelalgorithms.

The most basi measure of performane is the speedup, the ratio of the serial

exeutiontime, T

S

, tothe parallel exeutiontime, T

P :

Speedup= T

S

T

P

: (2.1)

The largest soure of overhead is usually ommuniation of data between

pro-essors. In addition, some proessors may beome idle if they nish their task and

must wait for a new one. The parallel algorithmmay also have to arry out exess

omputationomparedtothe serialalgorithm. Forexample,if theresult ofaertain

alulationmustbeavailabletoeahtask,itmayhavetobearriedoutseparatelyon

eah proessor in a parallel alulation, while the serial algorithm only has to arry

out the alulation one.

The overhead for a parallel algorithm is the dierene between the parallel and

serial osts:

T

o =pT

P T

S

; (2.2)

where pis the numberof proessors.

Animportantmeasure fortheeetivenessof aparallelalgorithmisitseÆieny,

whihis the ratio of the serial ost tothe parallel ost:

E = T

S

pT

P

(2.3)

= T

S

T

S +T

o

inreases with the numberof proessors. As an be seen inequation 2.4, aninrease

in overhead auses a derease in eÆieny. The lossof eÆieny leads to dereasing

returns asmore proessors are used to exeutean algorithm.

The dereasing gain in performane as the number of proessors inreases is

ex-pressed in a slightly dierent way in Amdahl's Law [39℄. If P is the fration of

an algorithm that an be parallelized and S is the fration that must be omputed

serially, the speedup an be writtenas afuntion of the numberof proessors:

Speedup(p) =

S+P

S+ P

p

(2.5)

= 1

S+ P

p

: (2.6)

As the number of proessors inreases, the speedup asymptotially approahes 1

S .

Aording to this formula, if the serial portion of an algorithmis 10%, the greatest

possible speedup is ten times,no matterhow many proessorsare used. As aresult,

muh of the eort in designing parallel algorithmsgoes intoparallelizing as muh of

the work as possible.

After examiningEqs 2.4 and 2.6, it would be easy to beome skeptial as to the

viability of massively parallel omputers, sine the benet of using more proessors

is bounded. In pratie, however, the size of the problem usually inreases with the

number of proessors. When given more proessors,researhers willusually inrease

thesizeoromplexityoftheproblemtokeeptheruntimeapproximatelyonstant. As

the problem size inreases, the fration of the run time spent onoverhead dereases,

whihimproves the eÆieny for large numbers of proessors.

2.3 Superomputers

Sine 1993, the Top500 list has kept trak of the most powerful superomputers in

the AdvanedSimulationand Computingprogram[41℄. Theseare homogeneous

ma-hines, meaningthey are onstruted fromone typeof proessor, with huge memory

and fast interonnetion hardware.

Several alulations presented later in this work were arried out using Blue

Gene/L at Lawrene Livermore National Laboratory. The unlassied portion of

this mahine onsists of 81,920 IBM PowerPC proessors running at 700 MHz [42℄.

Although the proessor speed is slow, the great number of proessors gives uBGL

almost 230 TFlopsof omputing power.

The most powerful omputer in the world at the moment is Roadrunner at Los

Alamos NationalLaboratory. Roadrunner has 13,824 1.8GHzAMD Opteron

proes-sors tohandleoperations,ommuniations, andsome omputationand 116,640IBM

PowerXCell 8iproessorstohandleoatingpoint-operations. Roadrunneristhe rst

mahine tohave over1 petaop sustained performane [43℄.

Not to be outdone, LLNL has announed they will be onstruting Sequoia, a

Blue Gene/Q mahine that will exeed 20 petaops, to go online in 2011. Sequoia

willhave more omputingpower than the urrent Top500 listombined [44℄.

The rate ofesalation inomputing poweris easytosee. Aess tothe DOE

ma-hinesisdiÆulttoobtain,however, andmostresearhersdonothavetheresouresto

onstrutandmaintainthissortofsuperomputer. Theadvanesinproessors,

inter-onnetion hardware, and management software brought about by the DOE projet

have improved the performane of the mahines an individual researh group an

aord.

2.4 Beowulf Clusters and Grid Computing

In ontrast to the massively expensive homogeneous omputers of the previous

se-tion, researhers an assemble a low ost luster using o-the-shelf proessors and

onnetion hardware in a Beowulf framework [45℄. Suh a luster an be

appliationsdemandandresouresallow,orretireproessorsastheybeomeobsolete.

Another development in parallel omputing is the use of loosely oupled, widely

distributed grids of proessors to arry out omputations. Some examples inlude

SETIhome[46℄andFoldinghome[47℄,whihuseidleinternetonnetedomputers

to searh for extraterrestrial intelligene and simulate protein folding. BOINC is a

projet at UC Berkeley that has tools to help researhers develop software for and

onnet to distributed volunteer omputing resoures [48℄.

Parallelalgorithmsmusthave ertainharateristisinordertoperform wellina

heterogeneous, looselyoupled environment. An appliationthat must ommuniate

large amounts of data among the tasks or is unable to balane the work between

proessorsrunningatdierentspeedswillenounterlargeoverheadostsandperform

very poorly. A parallel algorithm with low ommuniations requirements and the

ablity to use proessors running at dierent speeds will be able to eÆiently use

inexpensive omputing resoures toarry out large omputing jobs.

The bulkof the omputingeort in traditionaleletroni struture methodssuh

asthosedisussedinhapter1isspentdiagonalizingmatries. Thisoperationisvery

diÆulttoparallelize,sineeahstepinvolvesalloftherows. Asaresult,alulations

suh as DFT and oupled luster are unable to eÆiently use more than a few tens

of proessors. Sine oupled luster sales as O(N 7

) with the size of the system,

the inability to use large numbers of proessors prevents researhers from arrying

out highly aurate alulations on large systems, suh as nanodevies or biologial

systems.

Quantum Monte Carlo (QMC) is an alternate approah to eletroni struture

simulationsthatalulates expetationvalues stohastiallyratherthan analytially.

ThestohastinatureofQMCmakesitwellsuitedforparallelimplementation. QMC

an,inpriniple, alulateexatexpetationvaluesandsalesasO(N 3

)withthe size

of the system. QMC an beformulatedtohave very small memory and

ommunia-tions requirements and automatially balane the work between proessors running

Chapter 3

Random Number Generation

Randomnumbershaveappliationsinareassuhasryptography,eletroni gaming,

and statistial sampling and analysis. In addition, stohasti, or non-deterministi,

simulationsan beused tomodel manytypes ofphysialand mathematialsystems.

In thesesimulations,the behavior ofsome part of thesystem is randomlygenerated.

Beause of the essential role played by random numbers, they are grouped into a

lass alled \Monte Carlo" methods. Eletroni struture appliations inlude

Vari-ational Monte Carlo (VMC), in whih the parameters of a trial wavefuntion are

optimized, and Diusion Monte Carlo (DMC), whih has the potential to alulate

exat expetation values for many-bodyquantum mehanial systems.

3.1 Random Number Generation

Trulyrandomnumbersanbegeneratedbasedonunpreditablephysialphenomena,

suh as the noise of an analog iruit, the deay of radioative nulei [51℄, or

bak-ground atmospheri radio noise [52℄. Computers, on the other hand, only operate

basedonprogrammedinstrutions. Theyan generatesequenes of\pseudorandom"

numbers that lak patterns, but are determined by a formula. Statistial tests have

beendeveloped todetetorrelationsinsequenesofnumbers. Thequalityofa

3.1.1 Uniform Random Numbers

Uniformrandomnumbers liewithinaspeiedrange,usually0to1,withallnumbers

inthe rangehaving the same probability ofbeing generated. Virtually every sheme

to generate random numbers with respet to a desired probability density relies on

onverting uniform random numbers.

The mostommonwaytogenerate uniformrandomnumbers iswithalinear

on-gruential, or modulo, generator, whih generates a series of integers, fI

0 ;I

1 ;I

2 ;:::g,

by the reurrenerelation

I

j+1 =aI

j

+(modm); (3.1)

wherem,a,and arepositiveintegersalledthe modulus,multiplier,and inrement.

They denethe linearongruentialgenerator. The rst integer,I

0

,is alledthe seed.

Using the same seed with a ertain generator willalways give the same sequene of

numbers.

Clearly, I

j

< m for all j. Therefore, the algorithm an generate at most m

dis-tintintegers. Thesequene ofintegersistransformed intouniformrandomnumbers

between 0and 1by lettingu

j =

I

j

m .

The sequene fI

j

g generated by equation 3.1 will eventually repeat itself with a

periodpthatislessthanorequaltom. Ifm,a,andare properlyhosen,theperiod

will be of maximal length. Several rules have been developed and implemented to

maximizep and givethe best results in statistialtests for randomness [53℄.

Poor hoies of a, , and m, an result in random number sequenes with very

short periods. Many linear ongruential generators implemented as library routines

3.1.2 The Transformation Method

MonteCarlosimulationsoftenrequire randomnumbers distributedwith respet toa

given probability density funtion, (x). The most eÆient way to generate suh a

sequene iswiththetransformationmethod,whihdiretlyonverts uniformrandom

numbers tothe desireddensity.

The umulative distribution funtion represents the probability that a point in

the given density isless than orequal toy:

P(y)= Z

y

1

dx(x): (3.2)

If (x) is normalized,P (y) willinrease monotonially from 0to 1.

To generate a random number, w, distributed with respet to (x) , a uniform

random number, u, is generated. Then w =P 1

(u), where P 1

is the inverse of P.

This methodrequires that the funtionP beknown and invertible,whihis the ase

forsomeverysimpledistributions,suhasexponentialorGaussiandistributions. For

more ompliatedfuntions, dierent algorithmsmust be used.

3.1.3 The Von Neumann Method

The Von Neumann, or rejetion, method is a less eÆient but more generally

appli-able way to generate points with respet to a probability density funtion that is

known and an bealulated. The umulative distributionfuntiondoesnot haveto

be known orinvertible.

In order to use this method, one rst nds a funtion, h(x), that is everywhere

greater than and preferably lose to the desired probability density funtion, (x),

and for whih the transformation method an be used. A random number, z, is

generatedwithrespet toh(x)andthe ratioA(z)= (z)

h( z)

isalulated. Beauseh(x)

is always greaterthan (x) , this ratio willbe between 0and 1.

z ifu<A(z) andrejetingz ifu>A(z). Theeetofthe rejetionstepistoweight

the density h(x) by ( x)

h(x)

so that (x) emerges. This method is very simple, but will

leadtoexessive rejetionand beveryineÆientifh(x)isnot loseto(x), inwhih

ase the aeptane probability A(z) will often be small. This loss of eÆieny is

partiularlyimportantfor high-dimensional spaes.

3.1.4 The Metropolis Algorithm

Quantum Monte Carlo alulations require random eletroni ongurations

dis-tributed with respet to the quantum mehanial probability density, the square of

the magnitude of the eletroni wavefuntion. This is an extremely ompliated

and tightly oupled 3N-dimensional funtion, where N is the number of eletrons.

Furthermore, it has appreiable magnitude only in a very small fration of the total

availableongurationvolume. Thetransformationandrejetionmethodsareunable

toeÆiently generate randompoints with respet tothis sort of probability density.

In order to distribute eletroni ongurations with respet to their quantum

mehanial probability density, the idea of generating statistially independent

on-gurations must be abandoned. Instead, a Markov hain is used, in whih eah new

ongurationisgeneratedwithrespettoaprobabilitydistributiondependingonthe

previous onguration. The sequene of ongurations forms a \random walk" that

isproportionaltothedesireddensity. Beause eahongurationdepends onthe one

before it, they will have some degree of serial orrelation,whih must be onsidered

when the variane of quantities derived from these ongurations is alulated.

A Markov hain is dened in terms of the transition probability T (x!x 0

) for

havingthepointx 0

afterthepointxinthehain. Thetransitionprobabilitiesdepend

onlyontheurrentstateofthesystemand areindependentoftimeandthehistoryof

the walk. The Metropolisalgorithmisaseriesof rulesfor generatingaMarkovhain

satisfy the followingrelationship:

T (x!x 0

)(x)=T(x 0

!x)(x 0

): (3.3)

Eq 3.3 is known as the detailed balane ondition. Using it, the probability for

aepting a proposed move fromx tox 0

is

A(x!x 0

)=min 1; T (x

0

!x)(x 0

)

T (x!x 0

)(x) !

: (3.4)

It should be noted that in equation 3.4, only the ratio (x

0

)

( x)

is alulated, rather

thanthevalues(x)and(x 0

)separately. Asaresult,theprobabilitydensityfuntion

does not have to be normalized.

In the simplest version of the Metropolis algorithm, the transition probabilities

are hosen so that T (x!x 0

) = T (x 0

!x). The aeptane probability an be

inreased by using importane sampling algorithms, whihmanipulate the transition

probabilities todiret the proposed moves into regionsof high density [55℄.

The Metropolis algorithm guarantees the Markov hain will equilibrate to a

sta-tionary distribution, whih will represent the desired probability density funtion.

This method allows virtually any probability density to be sampled, whih makes it

aninvaluabletoolforhighdimensionalsimulations. TheMetropolisalgorithmis

Chapter 4

Quantum Monte Carlo

Quantum Monte Carlo (QMC) is a relatively new lass of methods for onduting

highly aurate quantum mehanial simulations on atomi and moleular systems.

Variationaland diusionMonte Carlo,the mostommonly used eletroni struture

QMC variants, use stohasti methods to optimize wavefuntions and alulate

ex-petationvalues[56℄andanprovideenergiestowithinhemialauray[57,58,59℄.

Beause QMC trialfuntions donot have to be analytially integrable, there is

on-siderable freedomastotheirform. The inlusionofexpliit interpartileoordinates,

whih is impossible with traditional eletroni struture methods, allows QMC trial

funtionstohaveaveryompatformomparedtoSCFwavefuntionsofomparable

auray [60℄.

TheomputationalexpenseofQMCalulationssaleswiththesizeofthesystem

asO(N 3

)orbetter [61, 62,63, 64℄. Mean eld methodsapableof omparable

au-ray, suh as oupled luster, sale muh less favorably, as O(N 6

) to O(N!). Sine

QMCisastohastimethod,itlendsitselfnaturallytoparallelizationarossmultiple

proessors. Although QMC is not \perfetly parallel,"as has been laimed[65℄, the

parallel overhead funtion an be very small, and large numbers of proessors an

be used with high eÆieny. The use of large numbers of proessors allows QMC

alulations to nish in a reasonable amount of time, despite the slow onvergene

of Monte Carlo. The ombinationof favorable saling and parallelizability of QMC

Many parallelsienti appliationsrequirelarge memoryand fastinterproessor

ommuniation to run. Superomputers that satisfy these needs are very

expen-sive to onstrut and maintain. The most powerful superomputers in the world

today are owned by the United States Department of Energy, whih has

ommit-ted vast resoures to onstruting them in order to ondut simulations on nulear

weapons [42,43, 44℄.

QuantumMonteCarlo,however, anbeformulatedtorunwithverysmallmemory

and interproessor ommuniation requirements [49, 50℄. It is reasonable toenvision

aninexpensive QMC spei superomputer made of nodes with no hard drives and

inexpensiveonnetionhardware. Suhamahine wouldgiveresearherswho donot

haveaess to nationallabomputers the ability to ondut QMC alulations.

4.1 Variational Monte Carlo

VariationalquantumMonteCarlo (VMC)uses theMetropolisalgorithmtominimize

the expetation value of the energy of a trial wavefuntion with respet to its

ad-justable parameters. Beause the high dimensional integrals are done using Monte

Carlo methods, some of the restritions on the form of the wavefuntion that are

neessary when the integrals are evaluatedanalytially an be relaxed.

The expetationvalue for the energy of atrialeletroni wavefuntion, j

T i,is hEi= h T j ^ Hj T i h T j T i = R d ~ R T ~ R ^ H T ~ R R d ~ R T ~ R T ~ R ; (4.1) where ^

H is theHamiltonianoperatorforthe system and ~

R isa vetor ontaining the

3N spatial oordinates of the N eletrons of the moleule. The loal energy of an

eletroni ongurationis dened as

If

T

R is aneigenfuntion of H,the loalenergy willbe onstantwith respet to

~

R . Using the loalenergy, the expetation value an be rewritten:

hEi = R d ~ R T ~ R T ~ R E L ~ R R d ~ R T ~ R T ~ R (4.3) = R d ~ R j T ~ R j 2 E L ~ R R d ~ R j T ~ R j 2 (4.4) = Z d ~ R VMC ~ R E L ~ R ; (4.5) where

VMC

~

R

is the probability density for the onguration ~ R : VMC ~ R = j T ~ R j 2 R d ~ R j T ~ R j 2 : (4.6)

VMC employs the Metropolis algorithmto generate a series of eletroni

ong-urations, n ~ R i o

, distributed with respet to

VMC

~

R

. The expetation value of the

energy an then beevaluated as

hEi VMC = 1 M M X i=0 E L ~ R i O 1 p M ! : (4.7)

As thewavefuntionissampled, theexpetationvalueofthe energywillutuate

withinits statistialunertainty, whihmakes omparing dierent sets ofvariational

parameters diÆult. This eet an be mitigated by using orrelated sampling, in

whih expetation values for several sets of parameters are alulated using one set

ofongurations [66℄. Correlatedsamplingallowsthe dierenebetween theenergies

of two sets of parameters to be alulated with muh less variane than if the two

expetation values are ompared after being alulated separately.

Beause the loalenergy foraneigenfuntion of ^

H isonstant,the varianeofits

expetationvaluewillbezero. Asatrialwavefuntionisoptimizedanditapproahes

the exatgroundstate,itsloalenergywillvaryless stronglywith ~

optimizethe parametersof the wavefuntionrather than its expetationvalue. This

method works well for Monte Carlo optimization beause the exat minimum value

of the variane of the energy is known, while the minimum of the expetation value

for the energy is unknown. Some optimization methods minimizea ombination of

theenergyand itsvariane[67℄. Algorithmsthatsamplethederivativesofthe energy

with respet to the adjustable parameters in the wavefuntion an be used to speed

onvergene and ensure the true minimum isfound [68, 69℄.

4.2 VMC Trial Funtions

Asdisussedinsetion1.2.1,quantummehanialwavefuntionshavespae andspin

omponents. Sine the QMC Hamiltonian does not have spin terms, we integrate

over spin and onsider only the spatial omponent. The spatial trialfuntions used

in VMCtypiallyhave the form

T =

X

i

i AS

i !

J; (4.8)

where the

i

are CI expansion oeÆients, the AS

i

are Slater determinant

wave-funtions, and J is a symmetri funtion of the distanes between partiles alled

the Jastrowfuntion. This results inan overall antisymmetrifuntion with expliit

interpartile terms. The

i and

AS

i

an be obtained through standard eletroni

struture methodssuh asHartree-Fok, GVB, MCSCF, CI, orDFT.

There are many adjustable parameters in

T

. The

i

, the orbital oeÆients

and basis funtions in the AS

i

, and the parameters in the Jastrow funtion an all

be optimized through VMC. Even when orrelated sampling is used, optimizing a

wavefuntion with a large number of adjustable parameters is a hallenging task.

The form of the trial wavefuntion must be hosen with are, so that time is not

The two body Jastrowfuntion is writtenas

J =exp 2 4 X i;j u ij (r ij ) 3 5 ; (4.9)

wherethe sum isoverallpairs ofpartiles, r

ij

isthe distane between partilesi and

j, and u

ij (r

ij

)is a funtionthat desribesthe interation between partilesi and j.

The two body Jastrowfuntion makesit straightforward to onstrut trial

wave-funtionsthatsatisfythequantummehanialusponditionsforpairsofpartiles[9℄.

Satisfyingthese usp onditionsremovesthe singularitiesinthe loalenergy that

o-ur when partilesollide, whih lowers the variane of the energy.

IfGaussianorbitalsareusedintheSCFpart ofthewavefuntion,the usp

ondi-tion for partiles i and j approahing eah other leads to the following ondition for

the funtion u

ij : lim r ij !0 u ij (r ij ) r ij = ij q i q j

l+1

; (4.10)

where

ij

is the redued mass of the partiles, q

i and q

j

are their harges, and l is 1

for same spin eletrons and 0 otherwise.

Aformforthetwo-bodyorrelationfuntionommonlyusedformoleularsystems

is the Pade-Jastrow funtion:

u ij (r ij )= ij r ij + P N k=2 a ij;k r k ij 1+ P M l =1 b ij;l r l ij ; (4.11)

wherethea

ij;k andb

ij;l

are adjustableparameters. Theonstant

ij

isset tothe value

of the usp ondition for the partiles i and j. To ensure that the limit as r ! 1

remains nite, M and N are usually hosen so that M N. Many other forms for

orrelation funtions are in use, inludingsome with saled variablesand some with

three- and higher-body terms [70, 71℄.

Inadditiontoallowing

T

tohaveexpliitintereletronioordinates,QMCallows

freedom in the form of the orbitalsthat make up the Slater determinant part of the

Gaussian funtionshavezero derivativeattheir origin,so moleularorbitals

on-strutedfromGaussianbasisfuntionsareunabletosatisfytheeletron-nuleususp

ondition. Although these usps an be satised by two-body orrelation funtions,

replaing the Gaussianorbitalswith exponentialfuntions thatsatisfy the usp near

the nulei gives muhbetterresults inQMC alulations[72℄. When the orbitals are

modied in this way, the eletron-nuleus usp values in the two body orrelation

funtions are set to zero.

4.3 Diusion Monte Carlo

Diusion MonteCarlo (DMC) does not rely onthe variationalprinipleto alulate

expetation values, but itsonvergene depends on aurate trialfuntions.

DMC starts with the time dependent Shrodinger equation:

i t j ~

R ;t i= ^ Hj ~

R ;t

i: (4.12)

Withahangeof variablestoimaginarytime, =it,equation4.12 takesthe formof

a diusionequation:

j ~ R ; i= ^ Hj ~ R ; i: (4.13)

The formalsolution toequation 4.13 an be written:

j

~

R;

i=e ^ H j ~

R ;0

i: (4.14)

Atsome time

1

, thestate j

~

R ;

1

iisexpanded ineigenstatesof the Hamiltonian:

j ~ R ; 1 i= X i i j i i; (4.15) where ^

and i =h i j ~ R ; 1 i: (4.17)

The expansion in equation4.15 is substituted intoequation 4.14:

j ~ R ; 1 +d i= X i i e E i d j i i: (4.18)

As equation 4.18 is propagated with , ontributions to

~

R ;

with i>0 willdie

out exponentially, leaving

0

, the ground state.

Propagating equation 4.18 with Monte Carlo methods is ineÆient beause the

potential part of the Hamiltonian varies widely throughout onguration spae and

diverges whenhargedpartilesapproaheahother. EÆientDMCalulations use

importanesampling,inwhih

T

~

R

,atrialfuntionthatapproximatestheground

state, is used asa guide funtion. A mixed distributionis dened:

DMC ~ R = 0 ~ R T ~ R R d ~ R 0 ~ R T ~ R : (4.19)

The mixed expetationvalue for anoperator, ^

A, has the form

hAi DMC = h 0 j ^ Aj T i h 0 j T i : (4.20)

For operators that ommute with the Hamiltonian, the DMC expetation value

equals the expetationvalue of the true groundstate:

hAi DMC = h 0 j ^ Aj T i h 0 j T i = h 0 j ^ Aj 0 i h 0 j 0 i : (4.21)

The DMC expetation value for the energy an be rewritten in a mannersimilar

tothe VMC expetationvalue:

hEi DMC = h 0 j ^ Hj T i

h j i

= dR 0 R H T R R d ~ R 0 ~ R T ~ R (4.23) = R d ~ R 0 ~ R T ~ R ^ H T( ~ R ) T ( ~ R) R d ~ R 0 ~ R T ~ R (4.24) = Z d ~ R DMC ~ R E L ~ R : (4.25)

A series of eletroni ongurations is generated with respet to

DMC ~ R , whih

allows the expetation value to be evaluated. Generating eletroni ongurations

with respet to

DMC

willbedisussed inthe next setion.

Interpreting DMC ~ R

asaprobabilitydensity isonlypossibleifitisnonnegative

for all ~

R . Forbosons, this property is easilysatisedbeause the groundstate

wave-funtionhasone signeverywhere inongurationspae. Ifthetrialwavefuntion has

the same sign,

DMC

~

R

willbenonnegative for all ~

R . Ground state wavefuntions

for fermions, however, have positiveand negative regions separated by nodes. If the

nodes of T ~ R and 0 ~ R

are idential,the two funtionswillhave the same sign

in every nodalregion, and

DMC

~

R

willbe nonnegativefor all ~

R .

If the nodal strutures of

T ~ R and 0 ~ R are dierent, DMC ~ R will have

positive and negative regions. This is known as the fermion problem in DMC. In

VMC,themagnitudesquaredofthetrialfuntionissampled, sothereisnoanalagous

nodalproblem.

The nodalsurfaeof aneletroni wavefuntionisa(3N 1)-dimensional

hyper-surfae wherethe wavefuntionvanishes. Thespatial antisymmetryof the

wavefun-tiondenesasetof(3N 3)-dimensionalhyperpointsembeddedinthenodalsurfae.

Althoughthese pointsare known, nogeneraltehniquesexist foronstruting atrial

wavefuntion with the same nodalstruture asthe true ground state.

The simplest and most widely used solution to this problem is the xed-node

approximation, in whih the nodes of the true ground state are assumed to be the

sameasthenodesofthetrialwavefuntion. Whenthisapproximationisused,

0

~

itvanishatthenodesof

T

R . Thexed-nodeapproximationisenforedinaDMC

alulation by rejeting any proposed move that rosses a node and auses

T

~

R

tohange sign. The resultingenergy liesabovethe exat energy andisvariationalin

the nodalstruture of the trialfuntion [73, 74℄.

Other solutions tothe nodalproblem that do not rely on the xed node

approx-imation have been developed. For example, the transient estimator method

propa-gatestwobosonlikewalkerensembles, representing thepositiveand negativeparts of

0 ~ R

. This method isnot stable with respet to , the imaginarytime variablein

whih the ensembles are propagated, beause both parts of the simulation onverge

tothenodelessbosongroundstate. Expetationvaluesan onlybealulatedduring

the intermediate regimebeforethis ours, whihlimitsthe statistialauray that

an be attained.

For most small moleules, the nodes of trialwavefuntions obtained by standard

SCF methods are of good enough quality for xed node DMC alulations to yield

resultswithin hemialauray[57,59℄. In someases, suhas theberylliumatom,

multi-onguration wavefuntionsare needed to obtainnodes of suÆient quality.

4.4 Generating Congurations in DMC

Eletroni ongurations are generated with respet to

DMC

using the distribution

f ~ R ; = ~ R ; T ~ R ; (4.26) where j ~ R ;

i is a solution to the time-dependent Shrodinger equation,

equa-tion 4.12.

The distribution f

~

R ;

is a solutiontoa Fokker-Plank equation:

where ^ L= 1 2 r 2

+rV ~ R +E L ~ R : (4.28) V ~ R

is the loal veloity of the trialfuntion at ~ R : V ~ R = r T ~ R T ~ R ; (4.29) and E L ~ R

isits loalenergy:

E L ~ R = ^ H T ~ R T ~ R : (4.30)

In the ase

T

~

R

= 1, equations 4.27 and 4.28 redue to equation 4.13, the

Shrodinger equation inimaginary time.

The operator ^

L denes aneigenvalue equation:

^ LjK i ~ R i= i jK i ~ R i: (4.31) The K i ~ R = i ~ R T ~ R and the i = E i

, where the

i ~ R and E i are the

eigenvetors and eigenvalues of the Hamiltonian.

Equations 4.27 and 4.28 desribe a diusion proess in a potential. As with

equation 4.13, the formalsolutionto equation 4.27 an be written:

f ~ R ; =e ( ^ L E T ) f ~

R ;0

: (4.32)

This solution an be expanded in the eigenvetors of ^ L: f ~ R ; = X i i e ( i E T ) K i ~ R (4.33) = X i e

( Ei E

where i =hK i ~ R jf ~

R ;0

i=h

i ~ R j f ~

R ;0 T ~ R i: (4.35)

It is easyto see that if E

T =E

0

, ontributions to f ~ R ; from the i ~ R with

i>0willdieoexponentiallyas inreases,leavingthedesireddensity,

DMC ~ R = 0 ~ R T ~ R .

In order to propagate Eq. 4.32 with and obtain

DMC

~

R

, it is rewritten in

integral form:

f

~

Y; +d =e dE T ( +d) Z d ~ R G ~ Y; ~

R ;d f ~ R ; ; (4.36) whereG ~ Y; ~

R ;d

istheGreen'sfuntionorrespondingtotheoperator ^

L.

Unfortu-nately,thisGreen'sfuntion,liketheGreen'sfuntionsformostompliatedphysial

proesses, annotbe written for arbitraryd.

Thethreetermsofequation4.28desribediusion,drift,andbranhingproesses.

Green's funtions an be written for eah proess individually, and an approximate

Green's funtionan bewritten astheir produt:

G ~ Y; ~ R; 1 (2) 3N =2 Æ h ~ Z ~ R V ~ R d i Z d ~ Z exp 2 6 4 ~ Y ~ Z 2 2d 3 7 5 (4.37) e 1 2 [ E L( ~ Y ) +E L( ~ R )℄ d +O d 2 :

Thefatorization ofthe Green'sfuntionneglets the fatthatthe termsof ^

Ldonot

ommute,soequation4.37 isexat onlyinthe limitd !0. Equation4.34, however,

isonlyexat inthe limit !1. Any hoie oftime stepis atradeobetween these

two onsiderations. Inpratie, runs withseveral valuesof d must bedone, and the

results are extrapolated tod =0.

eahonsisting of aneletroni ongurationand a statistialweight:

f

~

R;

= X

n w

n; Æ

~

R ~

R

n;

: (4.38)

EahiterationinaDMCalulationonsistsoffourstages: drift,diusion,

weight-ing, and branhing. In the drift step, the eletrons are moved aording tothe time

step and the loal veloity. In the diusion step, the eletrons are moved to new

positions with transitionprobabilities given by the kineti part of the Green's

fun-tion. The weighting and branhing step takes into aount the potential part of the

Greens's funtion.

After awalkerismoved from onguration ~

R to ~

Y,its weight isalulated based

ontheloalenergy at ~

R and ~

Y. Inthe branhing step,walkers withhigh weightgive

birth tonew walkers, while lowweightwalkers are deleted.

The trial energy, E

T

, serves as a normalization fator and is adjusted after eah

step based onthe sum of the weights of the walkers in order to keep the population

stable. The average value of E

T

after many steps will onverge to the ground-state

energy.

Several DMC algorithms, eah with slightly dierent shemes for fatoring the

Green's funtion, proposing ongurations, alulating the weights, and branhing

the walkers have been published [66, 75℄. The DMC alulations presented later in

Chapter 5

An Optimized Initialization

Algorithm to Ensure Auray in

Abstrat

Quantum Monte Carlo (QMC) alulations require the generation of random

ele-troniongurations withrespet toadesiredprobability density,usually the square

of the magnitude of the wavefuntion. In most ases, the Metropolis algorithm is

used to generate a sequene of ongurations in a Markov hain. This method has

an inherent equilibrationphase, during whih the ongurations are not

representa-tive of the desired density and must be disarded. If statistis are gathered before

the walkers have equilibrated, ontamination by nonequilibrated ongurations an

greatly redue the auray of the results. Beause separate Markov hains must be

equilibrated for the walkers on eah proessor, the use of a long equilibration phase

has a profoundly detrimental eet on the eÆieny of large parallel alulations.

The stratied atomi walker initialization (STRAW) shortens the equilibration

phaseofQMCalulationsby generatingstatistiallyindependenteletroni

ongu-rationsinregionsofhighprobabilitydensity. Thisensurestheaurayofalulations

by avoiding ontamination by nonequilibratedongurations. Shortening the length

of the equilibrationphasealso resultsinsigniant improvementsinthe eÆieny of

parallel alulations, whih redues the total omputationalrun time. For example,

using STRAW rather than a standard initializationmethod in 512 proessor

alu-lations redues the amount of time needed to alulate the energy expetation value

5.1 Introdution

QuantumMonteCarlomethodsforsimulatingtheeletronistrutureofmoleules[77,

78℄ an inprinipleprovideenergies towithinhemial auray(2 kal/mol)[57,

58, 59℄. The omputational expense of QMC sales with system size as O(N 3

) or

better [61, 62, 63, 64℄, albeit with a large prefator. This is muh more favorable

thanothereletronistruturemethodsapableofomparableauray, suhas

ou-pled luster, whih tend to sale very poorly with the size of the system, generally

O(N 6

to N!) [79℄. Moreover, the stohasti nature of QMC makes it relatively easy

toparallelizeoveralargenumberofproessors,whihan allowalulationstonish

in areasonable amount of time despite the slowonvergene ofMonteCarlo.

Assuperomputingresouresimproveandbeomemoreaessibletoresearhers[42,

80℄, QMC willbeomeapowerfultoolfor ondutingaurate simulationson

hemi-allyinterestingsystems. Reenteortshavefousedmakingthesealulationsmore

straightforward and eÆient onheterogeneous and homogeneous omputers. To this

end, a nite all-eletron QMC program, QMBeaver, has been written and used to

develop and demonstrateseveral new algorithms[49, 50, 81℄.

Before statistis gathering begins in a QMC alulation, the walkers must be

allowed to equilibrate so that their ongurations are proportional to the desired

density. It is impossible to alulate aurate expetation values if nonequilibrated

ongurationsontaminatethe statistis. Inordertoensuretheirstatistial

indepen-dene, the walkers must equilibrate separately. This makes the equilibration phase

a serial step of the alulation and a major limiting fator in the eÆieny of

par-allel alulations. These onsiderations make it imperative that the equilibration

proess be fast and reliable. For example, we show that for the energeti material

RDX,approximately30,000iterationsareneessary forequilibrationwhentheinitial

ongurations are generatedby a standard method.

We present here a simple method for hoosing initial eletroni ongurations