The RBC/UKQCD light quark physics program:
Algorithms, methods and performance
Taku Izubuchi/Chulwoo Jung Brookhaven National Laboratory for RBC/UKQCD collaborations
Introduction Action & Algorithm Performance Reweighting Aux. Det. Conclusions
Introduction
The RBC/UKQCD collaborations have been studying Domain Wall Fermion (DWF) configurations, which we believe achieve an optimal balance between preservation of chiral symmetry and practicality.
Quenched:
  2002: PRD65:014504, PRD66:014504
  2003: PRD68:114506
  2004: PRD69:074504, PRD69:074502
  2006: PRD73:094507
2f:
  2005: PRD72:114505
(2+1)f:
  2007: PRD76:014504, PRD75:114501, arXiv:0507.2340
  2008: PRD78:114509, PRD77:014509, PRD78:054510, PRL 100:032001
Currently, (2+1)f DWF configurations are available at 2 lattice spacings and 2 volumes:

L/a           m_s a   m_l a   m_s/m_l   m_PS a   τ(MD)     Accept.

β = 2.13, a ∼ (1.73 GeV)^-1 ∼ 0.114 fm, a m_res = 0.0031
16^3×32×16    0.04    0.01    3.3       0.247    4000      57%
                      0.02    1.86      0.325    4000      56%
                      0.03    1.3       0.387    7500      82%
24^3×64×16    0.04    0.005   5.4       0.192    8980      73%
                      0.01    3.3       0.242    8540      70%
                      0.02    1.86      0.323    2800      71%
                      0.03    1.3       0.388    2800      72%

β = 2.25, a ∼ (2.34 GeV)^-1 ∼ 0.084 fm, a m_res = 0.00066
32^3×64×16    0.03    0.004   7.9       0.128    3428×2    72%
                      0.006   5.5       0.153    3825×2    76%
                      0.008   4.2       0.172    2965×2    73%
Action & Algorithm
Gauge action: Iwasaki action
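\[
S_G[U] = -\frac{\beta}{3}\,\mathrm{Re\,Tr}\Big[ (1 - 8c_1) \sum_{x;\,\mu<\nu} U_\mu(x)\, U_\nu(x+\hat\mu)\, U_\mu^\dagger(x+\hat\nu)\, U_\nu^\dagger(x)
+ c_1 \sum_{x;\,\mu\neq\nu} U_\mu(x)\, U_\mu(x+\hat\mu)\, U_\nu(x+2\hat\mu)\, U_\mu^\dagger(x+\hat\mu+\hat\nu)\, U_\mu^\dagger(x+\hat\nu)\, U_\nu^\dagger(x) \Big],
\qquad c_1 = -0.331
\]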
Fermion action : Domain Wall Fermion with 5D preconditioning
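\[
D^{dwf}_{x,s;x',s'}(M_5, m_f) = \delta_{s,s'}\, D^{\parallel}_{x,x'}(M_5) + \delta_{x,x'}\, D^{\perp}_{s,s'}(m_f)
\]
\[
D^{\parallel}_{x,x'}(M_5) = \frac{1}{2} \sum_{\mu=1}^{4} \Big[ (1-\gamma_\mu)\, U_{x,\mu}\, \delta_{x+\hat\mu,x'} + (1+\gamma_\mu)\, U^\dagger_{x',\mu}\, \delta_{x-\hat\mu,x'} \Big] + (M_5 - 4)\, \delta_{x,x'}
\]
\[
D^{\perp}_{s,s'}(m_f) = \frac{1}{2} \Big[ (1-\gamma_5)\, \delta_{s+1,s'} + (1+\gamma_5)\, \delta_{s-1,s'} - 2\delta_{s,s'} \Big]
- \frac{m_f}{2} \Big[ (1-\gamma_5)\, \delta_{s,L_s-1}\, \delta_{0,s'} + (1+\gamma_5)\, \delta_{s,0}\, \delta_{L_s-1,s'} \Big].
\]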
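\[
D(m_f) = D_{DWF}^\dagger(M_5, m_f)\, D_{DWF}(M_5, m_f)
\]
\[
\int dU\, d\psi\; e^{-(S_G[U] - S_F[U,\psi] + S_{PV}[U,\psi])}
= \int dU\; e^{-S_G[U]}\, \det\left[ \frac{D(m_s)^{1/2}\, D(m_f)}{D(1)^{3/2}} \right]
\]
\[
\det\left[ \frac{D(m_s)^{1/2}\, D(m_f)}{D(1)^{3/2}} \right]
= \det\left[ \frac{D(m_s)}{D(1)} \right]^{3/2} \det\left[ \frac{D(m_f)}{D(m_s)} \right]
\sim \left\{ \det R_{1/2}\left[ \frac{D(m_s)}{D(1)} \right] \right\}^{3} \det\left[ \frac{D(m_f)}{D(m_s)} \right]
\]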
The Omelyan integrator with λ = 0.22 is used, with step sizes
∆t(gauge) : ∆t(Rational Quotient) : ∆t(Quotient) = 1 : 6 : 6.
Routines:
CG: Quotient inversion
MInv: Multimass inversion for Rational Quotient
GF: Gauge force
RF: Rational force
HF: Quotient force
A typical sequence of routines called for a 16-step trajectory is
6 MInv + 1 CG + [[12 GF + [3 MInv + 2 RF]×3]×2 + 12 GF + 1 CG + HF]×32
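As a minimal sketch of the integrator named above, the Python below applies one common form of the second-order Omelyan scheme (kick λ, drift 1/2, kick 1−2λ, drift 1/2, kick λ) to a toy harmonic oscillator. The oscillator, the function names and the unit trajectory length are illustrative assumptions and are not taken from the production code; only λ = 0.22 and the 16 steps come from the slides.

```python
def omelyan_step(q, p, dt, force, lam=0.22):
    """One second-order Omelyan step: kick, drift, kick, drift, kick."""
    p = p + lam * dt * force(q)
    q = q + 0.5 * dt * p
    p = p + (1.0 - 2.0 * lam) * dt * force(q)
    q = q + 0.5 * dt * p
    p = p + lam * dt * force(q)
    return q, p


def trajectory(q, p, n_steps, dt, force, lam=0.22):
    """Apply n_steps Omelyan steps of size dt."""
    for _ in range(n_steps):
        q, p = omelyan_step(q, p, dt, force, lam)
    return q, p


def harmonic_force(q):
    """Force -dV/dq for the toy potential V(q) = q^2 / 2 (assumption)."""
    return -q


def energy(q, p):
    """Toy Hamiltonian H = p^2/2 + q^2/2."""
    return 0.5 * (p * p + q * q)


q0, p0 = 1.0, 0.0
n_steps = 16                      # 16-step trajectory, as in the slides
dt = 1.0 / n_steps                # unit trajectory length (assumption)

q1, p1 = trajectory(q0, p0, n_steps, dt, harmonic_force)
dH = energy(q1, p1) - energy(q0, p0)   # enters the Metropolis accept/reject step

# Reversibility check: flip the momentum and integrate again.
q_back, p_back = trajectory(q1, -p1, n_steps, dt, harmonic_force)

print("dH =", dH)
print("reversibility error =", abs(q_back - q0), abs(p_back + p0))
```

When consecutive steps are chained, the trailing and leading λ-kicks can be merged, which leaves two force evaluations per step; this may be the origin of the ×32 count for a 16-step trajectory.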
[Figure: topological charge Qtop versus molecular dynamics time (0 to 5000), with normalized Qtop histograms alongside. Legend: RHMC 0, RHMC I, RHMC II.]
Performance
Red numbers show the routines which are duplicated on different 5th dimension slices.

24^3×64×16 (m_s = 0.04), local volume = 6^3×2×8, 4096-node QCDOC

m_l                    0.03      0.02                 0.01      0.005
Routines               time(s)   time(s)   MFlops/s   time(s)   time(s)   MFlops/s
MInv                   1225      1213      221        1195      1367      225
CG                     173       223       273        370       634       258
GF                     60        60        257        62        73        250
RF                     218       218       36         232       274       34
HF                     10        10        4.5        10        12        4.5
Total time (seconds)   1941      1983                 2124      2635
Total flops (×10^12)   1366      1411                 1557      2006
32^3×64×16 (m_s = 0.03), QCDOC

m_l                    0.006                    0.004
Local volume           8^3×2×8                  4^3×8×16
Routines               time(sec)   MFlops/s     time(sec)   MFlops/s
MInv                   5062        172          4263        205
CG                     1964        213          2038        268
GF                     214         256          104         263
RF                     1130        25           939         28
HF                     39          4.3          10          16
Total time (seconds)   9035                     7733
32^3×64×16 (m_s = 0.03), BG/P

Machine                2048×4 core BG/P                    4096×4 core BG/P
m_l                    0.008       0.006                   0.004
Local volume           4^4×16                              4^4×8
Routines               time(sec)   time(sec)   MFlops/s    time(sec)   MFlops/s
MInv                   851         950         443         563         374
CG                     320         433         464         342         391
GF                     34          39          350         45          302
RF                     142         162         83          306         23
HF                     2           3           30          12          4
Total time (seconds)   1406        1646                    1355
Total flops (×10^12)   4450        5260                    5868
                 48^3×64×16 DWF                  32^3×64×32 AuxDet DWF
                 m_s = 0.03, 2048×2 core BG/L    m_s = 0.045, 4096×4 core BG/P
m_l              0.002                           0.0042
Local volume     6×12^2×2×16                     4^4×16
Routines         time(sec)   MFlops/s            time(sec)   MFlops/s
MInv             44254       261                 3460        479
CG               31280       286                 1974        518
CG(AuxDet)                                       223         375
GF               2179        231                 68          346
RF               6256        58                  630         68
HF               39          37                  28          6
Total time (s)   84851                           6760
Observations
Fermion inversions consume ≥80% of time/total flops.
While total flops grow by a factor of ∼2.9 between the 24^3 and 32^3 lattices, the 48^3 simulation is more expensive by a factor of >10 compared to 32^3 so far. It appears that the increasing volume necessitates more tuning of running parameters such as stopping conditions. Parameter tuning is ongoing. A cost reduction of up to a factor of 2 is expected.
A 5D preconditioning scheme is employed instead of 4D. While 4D preconditioning makes it possible to use a more general (Moebius) formalism, 5D preconditioning allows an efficient implementation of the Dirac operator even when the 5th dimension is spread over more than 1 node. This has been very useful in running DWF simulations on massively parallel machines such as the IBM Blue Gene machines while keeping the local volume as symmetric as possible.
Example: from Aux. Det. DWF 32^3×64×32
DWF CG, local volume 4^4×16: 527 MFlops/core
DWF CG, local volume 8×4×2^2×32: 437 MFlops/core
A mixed precision scheme, in which all the projected spinors are kept in single precision, is used for the MD step. This has had very little effect on acceptance while giving a 10-15% performance boost. This may be explained by the fact that the error introduced by the single precision Dirac operator is not correlated with the lowest eigenvectors of the Dirac operator.
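The cost growth quoted in the observations can be checked against the "Total flops" rows of the performance tables. The short Python sketch below does this arithmetic; pairing the lightest-mass run at each volume is an assumption about which runs were compared, and the variable names are illustrative.

```python
# Total flops per trajectory in units of 10^12, copied from the performance tables.
total_flops_24 = {0.03: 1366, 0.02: 1411, 0.01: 1557, 0.005: 2006}   # 24^3x64x16, QCDOC
total_flops_32 = {0.008: 4450, 0.006: 5260, 0.004: 5868}             # 32^3x64x16, BG/P

# Assumption: compare the lightest light-quark mass available at each volume.
lightest_24 = min(total_flops_24)
lightest_32 = min(total_flops_32)

cost_ratio = total_flops_32[lightest_32] / total_flops_24[lightest_24]

# Pure volume scaling for comparison: both lattices have T = 64 and Ls = 16,
# so only the spatial volume changes.
volume_ratio = (32 / 24) ** 3

print(f"flops ratio 32^3 / 24^3 : {cost_ratio:.2f}")
print(f"spatial volume ratio    : {volume_ratio:.2f}")
```

Under this pairing the flops ratio comes out close to the quoted ∼2.9, somewhat above the bare volume ratio, which is expected since the 32^3 run also has a lighter quark mass in lattice units.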
Reweighting of dynamical strange quark
Motivation: The lattice spacing is a nontrivial function of β, and one does not know it until it is measured on thermalized configurations. While multiple ensembles with different light quark masses are typically generated to extrapolate to the chiral limit, it is in principle possible to simulate at the physical strange quark mass if one guesses the lattice spacing correctly.
In practice, ensembles with different strange quark masses are needed to interpolate (simple linear interpolation or SU(3) ChPT). It would save a lot of computing resources if this could be avoided. Reweighting of the light quark to approach the chiral limit has been tried by various groups (Hasenfratz et al., PRD78 014515 (2008); Luscher et al., arXiv:0810.0946). Here we try to apply the same technique to the strange quark of DWF (2+1)f ensembles.
Reweighting: Basics
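\[
w = \det\left( \frac{D_2^\dagger D_2}{D_1^\dagger D_1} \right)^{1/2} = \det(\Omega)^{-1/2},
\qquad \Omega = D_2^{-1} D_1 D_1^\dagger (D_2^\dagger)^{-1}
\]
\[
D_1 = D(m_l, m_s), \qquad D_2 = D(m_l, m_s')
\]
\[
w = \frac{\int d\xi\; e^{-\xi^\dagger \Omega^{1/2} \xi}}{\int d\xi\; e^{-\xi^\dagger \xi}}
  = \left\langle e^{-\xi^\dagger (\Omega^{1/2} - 1)\xi} \right\rangle
\]
Observables for the reweighted ensemble are now calculated by
\[
\langle O \rangle(m_s') = \frac{\sum_i O[U_i]\, w_i}{\sum_i w_i}
\]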
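The two ingredients above, the Gaussian-noise average and the weighted ensemble average, can be illustrated on a toy problem. In the sketch below Ω is replaced by a handful of made-up eigenvalues (an assumption; in the real calculation applying Ω^{1/2} requires Dirac operator inversions), and the function names are hypothetical. For a Hermitian positive-definite Ω, the noise average of exp(−ξ†(Ω^{1/2} − 1)ξ) with weight exp(−ξ†ξ) equals the product of ω_i^{−1/2} over the eigenvalues, which the code uses as the exact reference.

```python
import math
import random


def noise_estimate(omega_eigs, n_noise, rng):
    """Average exp(-xi^dagger (Omega^{1/2} - 1) xi) over complex Gaussian noise.

    Omega is taken to be diagonal with the given eigenvalues (toy assumption).
    """
    sigma = math.sqrt(0.5)  # Re and Im parts have variance 1/2 for weight exp(-|xi|^2)
    total = 0.0
    for _ in range(n_noise):
        exponent = 0.0
        for w in omega_eigs:
            xi_sq = rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
            exponent += (math.sqrt(w) - 1.0) * xi_sq
        total += math.exp(-exponent)
    return total / n_noise


def reweighted_average(observables, weights):
    """<O> = sum_i O_i w_i / sum_i w_i over configurations i."""
    return sum(o * w for o, w in zip(observables, weights)) / sum(weights)


rng = random.Random(12345)
omega_eigs = [1.10, 0.95, 1.20, 1.05]          # made-up spectrum close to 1
estimate = noise_estimate(omega_eigs, 20000, rng)
exact = math.prod(w ** -0.5 for w in omega_eigs)

print("noise estimate:", estimate)
print("exact product :", exact)
print("toy <O>       :", reweighted_average([1.0, 2.0, 3.0], [1.0, 1.0, 2.0]))
```

The estimator is well behaved here because the toy eigenvalues sit close to 1; for a broad spectrum the fluctuations grow quickly, which is the motivation for the noise-reduction steps that follow.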
Noise reduction
We could think of different ways of evaluating $w_i$:
\[
w_i = \sqrt{ e^{-\xi^\dagger (\Omega[U_i] - 1)\xi} } \qquad (2)
\]
or
\[
w_i = \left\langle e^{-\xi^\dagger (\sqrt{\Omega[U_i]} - 1)\xi} \right\rangle \qquad (3)
\]
(2) can be evaluated by a very slight modification of the quotient (2 flavor) part of the DWF evolution and needs only 1 CG, while (3) uses the Rational Quotient part and needs 2 multimass inversions.
However, (2) is a biased estimator which converges only when evaluated multiple times, while (3) is unbiased.
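One way to see where the bias in (2) comes from, under the assumption that (2) is used as the square root of an average over $n$ noise vectors: the square root is concave, so Jensen's inequality applies. The symbols $X_k$ and $\bar X_n$ are notation introduced for this note only.

```latex
% X_k: one noise sample of the exponential in (2); E[X_k] = 1/det(Omega)
X_k = e^{-\xi_k^\dagger (\Omega - 1)\xi_k}, \qquad
\bar X_n = \frac{1}{n}\sum_{k=1}^{n} X_k, \qquad
\mathbb{E}\!\left[\sqrt{\bar X_n}\,\right] \;\le\; \sqrt{\mathbb{E}\left[\bar X_n\right]} = \det(\Omega)^{-1/2}
```

Equality is approached only as $n \to \infty$, whereas the noise average in (3) has expectation $\det(\Omega)^{-1/2}$ for any number of noise vectors.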
Also, we can split the determinant into multiple factors by using intermediate masses:
\[
w_i = \prod_{j=1}^{N} \frac{1}{n} \sum_{k=1}^{n} e^{-\xi_{jk}^\dagger \left( \sqrt{\Omega_j[U_i]} - 1 \right) \xi_{jk}} \qquad (4)
\]
\[
\Omega_j = D(m_j)^{-1} D(m_{j-1}) D(m_{j-1})^\dagger (D(m_j)^\dagger)^{-1},
\qquad m_0 = m_s, \quad m_N = m_s'
\]
While this requires more inversions per measurement, each term has a smaller condition number, which helps in reducing the overall noise. This also gives the reweighting factors for the intermediate masses automatically.
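The bookkeeping of the mass splitting can be sketched as follows. Equal spacing of the intermediate masses is an assumption (the slides give only the end points, N and n), the per-step noise samples are made-up numbers standing in for the exponentials of Eq. (4), and the function names are hypothetical.

```python
def intermediate_masses(m_start, m_end, n_steps):
    """Masses m_0 = m_start, ..., m_N = m_end, equally spaced (assumption)."""
    return [m_start + (m_end - m_start) * j / n_steps for j in range(n_steps + 1)]


def combine_factors(samples_per_step):
    """Multiply per-step noise averages, as in the product over j of Eq. (4).

    samples_per_step[j][k] is noise sample k for the step m_j -> m_{j+1}.
    Returns the cumulative weight after each step, i.e. the reweighting
    factor to every intermediate mass.
    """
    cumulative = []
    weight = 1.0
    for samples in samples_per_step:
        weight *= sum(samples) / len(samples)
        cumulative.append(weight)
    return cumulative


# End points and N from the reweighting parameters of the slides.
masses = intermediate_masses(0.03, 0.025, 10)

# Made-up noise samples for three steps with n = 2 hits each (illustration only).
toy_samples = [[1.2, 1.0], [0.9, 0.7], [1.0, 1.0]]
cumulative = combine_factors(toy_samples)

print("masses:", [round(m, 4) for m in masses])
print("cumulative weights:", cumulative)
```

Because each step's average is multiplied into a running product, the weights for all intermediate masses come for free, which is the "automatically" noted above.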
Reweighting parameters:
Volume     m_s    m_s'    N    n
32^3×64    0.03   0.025   10   2

[Figure: Heff = -1/2 log(Det(D(0.025)/D(0.03))) versus trajectory (2700 to 3300). Curves: Rational Quotient, 1 step × 40; Quotient, 10 steps × 4; Rational Quotient, 10 steps × 2.]
Reweighting factors for different methods of evaluation. Rational Quotient: Eq. (3). Quotient: Eq. (2).
[Figure: two panels over the range 0 to 4000, labeled m'_s = 0.027 and m'_s = 0.025, for the 32^3×64, m_s = 0.03, m_l = 0.004 ensemble.]
[Figure: Omega mass, 32^3×64, m_s = 0.03. Legend: Box 0.03; Box 0.03, Reweighted to 0.025; Box 0.025; Box 0.025, Reweighted to 0.025.]
[Figure: Pseudoscalar, Nucleon and Omega masses versus reweighted mass (0.024 to 0.032), 32^3×64, m_s = 0.03, m_l = 0.004. Omega panel legend: No Reweighting; 0.025.]
[Figure: Omega propagator (ms=0.025, t=14) and Pseudoscalar propagator (ms=0.004, t=26), 32^3×64, m_s = 0.03, m_l = 0.004. Legend: No Reweighting; 0.025.]
DWF with Auxiliary Determinant
Renfrew et al., arXiv:0902.2587
Motivation: Dislocations which induce chiral symmetry breaking are the biggest hurdle for Ginsparg-Wilson fermions at larger lattice spacing. Suppressing dislocations is crucial for DWF studies of quantities which require large lattice spacing and/or large volume. Examples:
QCD thermodynamics
Nucleon matrix elements
Weak matrix elements (K → ππ)
Various approaches:
Change the gauge action to suppress dislocations (DBW2, ...)
Use link smearing to suppress the coupling between dislocations and fermions (HYP, stout, ...)
Use an additional fermion action to suppress dislocations. γ5 D_W(−M5) has been suggested by Vranas.
Problems:
Too strong suppression near −M5 → blocks topology tunneling
Enhancement for larger eigenvalues
Use a ratio of Dirac operators with imaginary Wilson masses to control the suppression of eigenvalues near −M5 while preserving larger eigenvalues.
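\[
W(M_5, \epsilon_f, \epsilon_b)
= \frac{\det[D_W(-M_5 + i\epsilon_f\gamma_5)^\dagger D_W(-M_5 + i\epsilon_f\gamma_5)]}
       {\det[D_W(-M_5 + i\epsilon_b\gamma_5)^\dagger D_W(-M_5 + i\epsilon_b\gamma_5)]}
= \frac{\det[D_W(-M_5)^\dagger D_W(-M_5) + \epsilon_f^2]}
       {\det[D_W(-M_5)^\dagger D_W(-M_5) + \epsilon_b^2]}
= \prod_i \frac{\lambda_i^2 + \epsilon_f^2}{\lambda_i^2 + \epsilon_b^2}
\]
∼ 1 for $\lambda_i \gg \epsilon_b, \epsilon_f$; ∼ $\epsilon_f^2/\epsilon_b^2$ for $\lambda_i \ll \epsilon_f$.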
Compared toQ
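The two limits of the per-eigenvalue weight can be checked numerically. The sketch below evaluates (λ² + ε_f²)/(λ² + ε_b²), where ε_f and ε_b are the two imaginary-mass parameters (written f and b in the extracted text), for ε_f = 0.005 and ε_b = 0.50, one of the parameter pairs appearing in the slides. The function name and the sample λ values are illustrative assumptions.

```python
def aux_det_weight(lam, eps_f, eps_b):
    """Per-eigenvalue factor (lam^2 + eps_f^2) / (lam^2 + eps_b^2) of the auxiliary determinant."""
    return (lam ** 2 + eps_f ** 2) / (lam ** 2 + eps_b ** 2)


eps_f, eps_b = 0.005, 0.50

w_zero = aux_det_weight(0.0, eps_f, eps_b)       # lam << eps_f: tends to eps_f^2 / eps_b^2
w_large = aux_det_weight(50.0, eps_f, eps_b)     # lam >> eps_b: tends to 1
f_mid = 1.0 / aux_det_weight(0.05, eps_f, eps_b) # inverse weight (suppression factor) at lam = 0.05

print("weight at lam = 0    :", w_zero)
print("weight at lam = 50   :", w_large)
print("1/weight at lam = 0.05:", f_mid)
```

Small eigenvalues are damped by roughly ε_f²/ε_b² while large ones are left almost untouched, which is the behaviour the ratio is designed to have.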
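[Figure: suppression factor versus |λ| (0 to 0.2). Curves: ε_f = 0.040, ε_b = 0.50; ε_f = 0.005, ε_b = 0.50; ε_f = 0.010, ε_b = 0.100.]
Suppression factor $F_i = (\lambda_i^2 + \epsilon_b^2)/(\lambda_i^2 + \epsilon_f^2)$ versus $|\lambda|$ for $\epsilon_f/\epsilon_b$ of 0.001/0.10,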
We have done extensive parameter searches on small volumes. A factor of 5-7 decrease in $m_{res}$ is observed after the scales are matched by locating the transition temperature.
16^3×8×32    16^4×32

no weighting factor   ε_f/ε_b = 0.01/0.10   ε_f/ε_b = 0.005/0.50   β = 1.75, ε_b = 0.50
β      m_res          β       m_res         β       m_res          ε_f     m_res
                      1.875   0.0101(5)                            0.040   0.0025(1)
                      1.900   0.0072(3)                            0.020   0.0018(1)
1.95   0.0252(1)      1.925   0.0054(4)                            0.005   0.0014(10)
2.00   0.0102(1)      1.950   0.0035(3)     1.750   0.0015(1)
2.05   0.0046(2)      1.975   0.0026(4)     1.800   0.0009(3)
2.08   0.0022(2)      2.000   0.0020(2)     1.850   0.0003(4)
2.11   0.0011(1)
2.14   0.0007(1)
Possible run plan
Aux. Det., β = 1.75, $a^{-1}$ ∼ 1.4 GeV, ε_f/ε_b = 0.02/0.5, m_res ∼ 0.0019
L/a           m_s a   m_l a    L (fm)   m_PS (MeV)   τ(MD)   Accept.
32^3×64×32    0.045   0.0042   ∼4.5     ∼250         ∼200    ∼85%
Conclusions
The RBC/UKQCD/LHP collaborations have generated 32^3×64×16 DWF configurations, which allow for more accurate continuum extrapolations. We greatly benefited from the newly available IBM BG/P at Argonne. Analysis is ongoing.
Reweighting for the strange quark appears to be practical even for 32^3×64 DWF configurations. We are working on a full analysis with reweighted data.
What's the next step? → We are exploring the auxiliary determinant, which improves chiral symmetry at larger lattice spacings. Algorithm tuning is ongoing.