CCP4 Workshop Chicago 2011
Phasing & Substructure Solution
Tim GrüneDept. of Structural Chemistry, University of Göttingen June 2011
http://shelx.uni-ac.gwdg.de
Data Integration (Anomalous)
Differences SubstructureSolution Phasing & DensityModification RefinementBuilding & mosflm, xds, HKL2000,. . . xprep, shelxc, solve,. . . shelxd, SnB, HySS,. . .
shelxe, resolve, pi-rate,. . .
arp/warp, coot, ref-mac5,phenix, YOU!
Molecular Replacement
An electron density map is required to create a crystallographic model. The electron density is calculated from the complex structure factor F(hkl):
ρ(x, y, z) = 1 Vcell
X
h,k,l
|F(h, k, l)|eiφ(h,k,l)e−2πi(hx+ky+lz) (1)
Experiment measures: I(hkl) i.e. real numbers
Model Building requires: F(hkl) i.e. complex numbers:
Motivation: The Phase Problem
The phase problem is one of the critical steps in macromolecular crystallography:A diffraction experiment only delivers the amplitude |F(hkl)|, but not the phase φ(hkl) of the structure factor. Without phases, a map cannot be calculated and no model can be built.
Solutions to the Phase Problem
The main methods to overcome the phase problem in macromolecular crystallography (i.e. to obtain initial
phases):
• (multi/ single) wavelength anomalous dispersion (MAD, SAD)
• (multiple/ single) isomorphous replacement (MIR, SIR)
• molecular replacement (MR)
and combinations thereof (SIRAS, MR-SAD, . . . ).
The first two methods are so-called experimental phasing methods. They require the solution of a substructure.
~a
~b ~c
The Substructure of a (crystal-) structure are the coordi-nates of a subset of atoms within the same unit cell.
~a
~b ~c
The Substructure of a (crystal-) structure are the coordi-nates of a subset of atoms within the same unit cell.
It can be any part of the actual structure. In the usual sense, substructure refers to the marker atoms that are used for phasing.
A real crystal with the properties of the substructure (atom distribution vs. unit cell dimensions) cannot exist: the atoms are too far apart for a stable crystal.
Small Molecule Crystallography
The substructure consists of a small number of atoms in a large unit cell.In small molecule crystallography, the phase problem is usually easily solved:
A structure with not too many atoms (< 2000 non-hydrogen atoms) can be solved by means of ab initio methods
The Terms “Ab initio” and “Direct” Methods
ab initio Methods: phase determination directly from amplitudes, without prior knowledge of any atomic posi-tions. Includes direct methods and the Patterson method. The most widely used of all ab initio methods are:
Direct Methods: phase determination using probabilistic phase relations — usually the tangent formula (Nobel prize for H. A. Hauptman and J. Karle in Chemistry, 1985).
The Substructure
Central to phasing based on both anomalous dispersion and isomorphous replacement is the determination of the substructure.
The steps involve
1. Collect data set(s)
2. Create an artificial substructure data set
3. Determine the substructure coordinates with direct methods 4. Calculate phases (estimates) for the actual data
What about Sheldrick’s Rule?
The Sheldrick’s Rule says that direct methods only work when the data resolution is 1.2 Å or better. Macromolecules seldomly diffract to this resolution.
Sheldrick’s Rule
Sheldrick’s Rule refers the real structures and the 1.2 Å-limit is about the distance where single atoms can be resolved in the data.
The substructure is an artificial crystal and atom distances
within the substructure are well above the usual diffraction limits of macromolecules, even for 4 Å data.
Solving a Macromolecular Phase Problem with a Substructure
The substructure represents only a very small fraction of all atoms in the unit cell (one substructure atom per a few thousand atoms). Yet, knowing the coordinates of the substructure allows to solve the phase problem for the full structure.
The Independent Atom Model (IAM)
“Ordinary” crystallography (other than ultra-high resolution charge density studies) is based on the “Independent Atom Model”: The total structure factor F(hkl) of each reflection (hkl) is the sum the independent contributions
from each atom in the unit cell.
(x, y, z)
x y
(hkl) = (430)
φ = 2π(hx + ky + lz)
= rel. distance to Miller-plane
Re(F) Im(F)
The Independent Atom Model (IAM)
x y
(hkl) = (430)
The (vectorial) sum from each atom contribution yields the total scattering factor F(hkl). The experiment delivers only its modulus |F(hkl)| =
q
Isomorphous Replacement (1/2)
The Harker Construction represents one way to determine the phase of F(hkl) from an isomorphous replace-ment experireplace-ment. It requires two data sets:
q
Inat(hkl) qIder(hkl)
Native: lacking one or several atoms (the sub-structure)
Derivative: Same unit cell, same atoms plus substructure atoms
Isomorphous Replacement (2/2)
The substructure coordinates can derived from the difference data set (qIder(hkl) − qInat(hkl)). The substructure phases φ(hkl) can be calculated from the substructure coordinates.
The phases of the native (or derivative) data set can be derived from
• measured native intensities Inat(hkl)
• measured derivative intensities Ider(hkl)
Harker Construction (1/4)
Im(F(hkl))
Re(F(hkl))
Harker Construction (2/4)
q
Inat(hkl)
Draw the structure factor F(hkl) of the substructure.
Fder(hkl) points somewhere on the circle around the tip
of F(hkl) with radius qInat(hkl)
because
Harker Construction (3/4)
Draw the structure factor F(hkl) of the substructure.
Fder(hkl) points somewhere on the circle around the tip
of F(hkl) with radius qInat(hkl)
At the same time, Fder(hkl) also lies somewhere on the circle around the origin with radius qIder(hkl), because
Harker Construction (4/4)
The two intersections of the two circles are the two possible structure factors Fder(hkl).
How to select the correct one:
MIR a second Harker construction with |Fder-2(hkl)| has exactly one of the two possibilities in common with the first Harker construction.
SIR the mean phase of the two possibilities serve as starting point for density modification, but if the substructure coordinates are centrosymmetric the
two-fold ambiguity cannot always be resolved (this problem is specific to SIR).
Substructure Contribution to Data
The electron density ρ(x, y, z) can be calculated from the (complex) structure factors F(hkl). Inversely the structure factor amplitudes can be calculated once the atom content of the unit cell is known:
F(hkl) = atoms j X in unit cell fj(θhkl)e−Bj sin2θhkl λ2 e2πi(hxj+kyj+lzj) (2)
• fj atomic scattering factor: depends on atom type and scattering angle θ. Values tabulated in [3], Tab. 6.1.1.1 • Bj atom displacement parameter, ADP: reflects vibrational motion of atoms
• 2π(hxj + kyj + lzj) phase shift: relative distance from origin • (xj, yj, zj): fractional coordinates of jth atom
Extraction of Substructure Contribution
Experimental determination (i.e. other than by molecular replacement) of the contribution of the substructure to the measured intensities is achieved by
• Anomalous Dispersion: Deviation from Friedel’s Law and comparison of
|F(hkl)| with |F(¯h¯k¯l)|
• Isomorphous replacement: Changing the intensities without changing the unit cell and com-parison of
|Fnat| with |Fder|
The alteration is usually achieved by the introduction of some heavy atom type into the crystal, either by soaking or by co-crystallisation.
Anomalous Scattering
The atomic scattering factors fj(θ) describe the reaction of an atom’s electrons to X-rays.
They are different for each atom type (C, N, P,. . . ), but normally they are real numbers and do not vary signifi-cantly when changing the wavelength λ.
If, for one atom type, the wavelength λ matches the transition energy of one of the electron shells, anomalous scattering occurs.
This behaviour can be described by splitting fj(θ) into three parts:
fjanom = fj(θ) + fj0(λ) + ifj00(λ)
SAD and MAD the following “grouping” has turned out to be useful, with its phase diagram on the right (ADP
e−8π 2U
j
sin2θhkl
λ2 omitted for clarity):
F(hkl) = non-X substructure fµe2πihrµ | {z } FP + substructure X normal fνe2πihrν | {z } FA + substructure X anomalous fν0e2πihrν | {z } F0 Im(F) FP FA F0 iF00 F
Since the equation for (Eq. 2) is a “simple” sum, one can group it into sub-sums. In the case of SAD and MAD the following “grouping” has turned out to be useful, with its phase diagram on the right (ADP
e−8π 2U
j
sin2θhkl
λ2 omitted for clarity):
F(hkl) = non-X substructure fµe2πihrµ | {z } FP + substructure X normal fνe2πihrν | {z } FA + substructure X anomalous fν0e2πihrν | {z } F0 +i substructure X f00e2πihrν Im(F) FP FA F0 iF00 F
Normal presentation, because f0 usu-ally negative
Now compare F(hkl) with F(¯h¯k¯l) in the presence of anomalous scattering: F(¯h¯k¯l) = non-X substructure fµe2πi(−h)rµ | {z } FP− + substructure X normal fνe2πi(−h)rν | {z } FA− + substructure X anomalous fν0e2πi(−h)rν | {z } F0− substructure 00 2πi(−h)r Im(F) Re(F) FP+ FA+ F0+
Friedel’s Law is valid for the fµ, fν, and fν0 parts:
Now compare F(hkl) with F(¯h¯k¯l) in the presence of anomalous scattering: F(¯h¯k¯l) = non-X substructure fµe2πi(−h)rµ | {z } FP− + substructure X normal fνe2πi(−h)rν | {z } FA− + substructure X anomalous fν0e2πi(−h)rν | {z } F0− +i substructure X anomalous fν00e2πi(−h)rν Im(F) Re(F) FP+ FA+ F0+ iF00+ F+
The complex contribution of ifν00 violates Friedel’s Law:
FP−
F−
iF00−
F−
Karle (1980) and Hendrickson, Smith, Sheriff (1985) published the following formula∗: |F+|2 = |FT|2 + a|FA|2 + b|FA||FT| + c|FA||FT|sinα |F−|2 = |FT|2 + a|FA|2 + b|FA||FT| − c|FA||FT|sinα Im(F) Re(F) α FT+ FA+ FP+ F+
with FT = FP + FA the non-anomalous contribution of structure and substructure.
Combining Theory and Experiment
The diffraction experiment measures the Bijvoet pairs |F+| and |F−| for many reflections. In order to “simulate” a small-molecule experiment for the substructure, we must know |FA|.
With the approximation 12(|F+| + |F−|) ≈ |FT| the difference between the two equations above yields
Status Quo – A Summary
• Our experiment measures |F+| = q
I(hkl) and |F−| = q
I(¯h¯k¯l).
• We are looking for (an estimate) of |FA| for all measured reflections.
– These |FA| mimic a small-molecule data set from the substructure alone.
– The small-molecule data set can be solved with direct methods, even at moderate resolution.
• With the help of Karle and Hendrickson, Smith, Sheriff, we already derived
|F+| − |F−| ≈ c|FA|sinα
The Factor
c
Before we know |FA|, we must “get rid of” c and sin α.
c = 2f00(λ)
f(θ(hkl)) can be calculated for every reflection provided f
00(λ).
During a MAD experiment, f00(λ) is usually measured by a fluorescence scan.
To avoid the dependency on f00(λ), the program shelxd [1] uses normalised structure factor amplitudes
The Angle
α
In the case of MAD, multi-wavelength anomalous dispersion, we can calculate sinα because there is one equation for each wavelength.
In the case of SAD, the program shelxd approximates |FA|sin(α) ≈ |FA|. Why is this justified?
Bijvoet pairs with a strong anomalous difference (
|F
+| − |F−|
) have greater impact in direct methods. The
difference is large, however, when α is close to 90◦ or 270◦, i.e. when sin(α) ≈ ±1. This coarse approximation
The Angle
α
In the case of MAD, multi-wavelength anomalous dispersion, we can calculate sinα because there is one equation for each wavelength.
In the case of SAD, the program shelxd approximates |FA|sin(α) ≈ |FA|. Why is this justified?
Bijvoet pairs with a strong anomalous difference ( |F
+| − |F−|
) have greater impact in direct methods. The
difference is large, however, when α is close to 90◦ or 270◦, i.e. when sin(α) ≈ ±1. This coarse approximation
In order to find the wavelength with the strongest response of the anomalous scatterer, the values of f0 and f00
are determined from a fluorescence scan.
−10 −8 −6 −4 −2 0 2 4 Signal [e] f’ and f’’ for Br λ = 1Å * 12398 eV/E Br f" Br f’
In order to get the strongest contrast in the differ-ent data sets, MAD experimdiffer-ents collect data at
1. maximum for f00 (peak wavelength) 2. minimum for f0 (inflection point)
Single isomorphous replacement requires two data sets:
1. Macromolecule in absence of heavy metal (Au, Pt, Hg, U,. . . ), called the native data set.
Measures |FP|.
2. Macromolecule in presence of heavy metal, called the deriva-tive data set.
Measures |F|.
|FA| can then be estimated from the difference |F| − |FP|
Im(F(hkl))
F
FA FP
SAD and MAD
vs.
SIR and MIR
SAD/ MAD SIR/ MIR
• |FA| from |F+| vs. |F−| (one data set) • |FA| from native vs. derivative (two data sets)
•signal improved by wavelength selection •signal improved by heavy atom (large atomic num-ber Z)
•requires tunable wavelength (synchrotron) •independent from wavelength (inhouse data)
SIRAS (1/2)
Isomorphous replacement is hardly used for phasing anymore:• Incorporation of heavy atom into crystal often alters the unit cell ⇒ non-isomorphism between native and derivative ruins usability of data sets
SIRAS (2/2)
However, one often has a high-resolution data set from a native crystal and one (lower resolution) data set with anomalous signal.
They can be combined into a SIRAS experiment:
• Anomalous data set: SAD
• Anomalous data set as derivative vs. native: SIR
set from a crystal with exactly the same (large) unit cell as our actual macromolecule but with only very few atoms inside.
Normalised Structure Factors
Experience shows that direct methods produce better results if, instead of the normal structure factor F(hkl),
the normalised structure factor is used.
The normalised structure factor is calculated as E(hkl)2 = F(hkl) 2/ε
hF(hkl)2/εi
ε is a statistical constant used for the proper treatment of centric and acentric reflections; the denominator F(hkl)2/ε as is averaged per resolution shell. It is calculated per resolution shell (≈ 20 shells over the whole resolution range).
Starting Point for Direct Methods: The Sayre Equation
In 1952, Sayre published what now has become known as the Sayre-EquationF(h) = q(sin(θ)/λ) X
h0
Fh0Fh−h0
This equation is exact for an “equal-atom-structure” (like the substructure generally is).
It requires, however, complete data including F(000), which is hidden by the beam stop, so per se the Sayre-equation is not very useful.
Tangent Formula (Karle & Hauptman, 1956)
The Sayre-equation serves to derive the tangent formulatan(φh) ≈
P
h0 |Eh0Eh−h0|sin(φh0 + φh−h0)
P
h0 |Eh0Eh−h0| cos(φh0 + φh−h0)
which relates three reflections h, h0, and h − h0.
H. A. Karle & J. Hauptman were awarded the Nobel prize in Chemistry in 1985 for their work on the tangent formula (and many other contributions).
Solving the Substructure
Direct methods (and in particular shelxd) do the following:1. Assign a random phase to each reflection. They will not fulfil the tangent formula.
2. Refine the phases using |FA| (or rather |EA|) to improve their fit to the tangent formula 3. Calculate a map, pick the strong peaks (as putative substructure)
Solving the Substructure
Steps 1-4 are repeated many times, each time with a new set of random phases.
Every such attempt is evaluated and the best attempt is kept as solution for the substructure.
The substructure is thus found by chance. It works even if the data quality is only medium — but may require more trials.
The cycling between steps 2-3, i.e. phase refinement in reciprocal space vs. peak picking in direct space, is called dual space recycling [4, section 16.1]
dual space
recycling
Real space: select atoms reciprocal space refine phases random atoms atoms consistentwith Patterson
repeat a few 100−1000 times
No
FFT and peak search
from atoms phases(map)
between E obs and E calc selection criterion: Correlation Coefficient
or (better)
e.g. with the Harker Construction. With those phases, an initial electron density map for the new structure can be calculated: I(hkl) |FA| (x, y, z)A
R
f
µ
φ(hkl)A φ(hkl) ρ(x, y, z)Phasing the Rest
References
1. G. M. Sheldrick, A short history of SHELX, Acta Crystallogr. (2008), A64, pp.112–133
2. R. J. Morris & G. Bricogne, Sheldrick’s 1.2Å rule and beyond, Acta Crystallogr. (2003), D59, pp. 615–617 3. E. Prince (ed.), International Tables for Crystallography, Vol. C, Union of Crystallography
4. M. G. Rossmann & E. Arnold (eds.), International Tables for Crystallography, Vol. F, Union of Crystallogra-phy