Phasing & Substructure Solution

(1)

CCP4 Workshop Chicago 2011

Phasing & Substructure Solution

Tim Grüne

Dept. of Structural Chemistry, University of Göttingen June 2011

http://shelx.uni-ac.gwdg.de

(2)

Data Integration (Anomalous)

Differences SubstructureSolution Phasing & DensityModification RefinementBuilding & mosflm, xds, HKL2000,. . . xprep, shelxc, solve,. . . shelxd, SnB, HySS,. . .

shelxe, resolve, pi-rate,. . .

arp/warp, coot, ref-mac5,phenix, YOU!

Molecular Replacement

(3)

An electron density map is required to create a crystallographic model. The electron density is calculated from the complex structure factor F(hkl):

ρ(x, y, z) = 1 V_cell

X

h,k,l

|F(h, k, l)|eiφ(h,k,l)e−2πi(hx+ky+lz) (1)

Experiment measures: I(hkl) i.e. real numbers

Model Building requires: F(hkl) i.e. complex numbers:

(4)

Motivation: The Phase Problem

The phase problem is one of the critical steps in macromolecular crystallography:

A diffraction experiment only delivers the amplitude |F(hkl)|, but not the phase φ(hkl) of the structure factor. Without phases, a map cannot be calculated and no model can be built.

(5)

Solutions to the Phase Problem

The main methods to overcome the phase problem in macromolecular crystallography (i.e. to obtain initial

phases):

• (multi/ single) wavelength anomalous dispersion (MAD, SAD)

• (multiple/ single) isomorphous replacement (MIR, SIR)

• molecular replacement (MR)

and combinations thereof (SIRAS, MR-SAD, . . . ).

The first two methods are so-called experimental phasing methods. They require the solution of a substructure.

(6)

~a

~b ~c

The Substructure of a (crystal-) structure are the coordi-nates of a subset of atoms within the same unit cell.

(7)

~a

~b ~c

The Substructure of a (crystal-) structure are the coordi-nates of a subset of atoms within the same unit cell.

It can be any part of the actual structure. In the usual sense, substructure refers to the marker atoms that are used for phasing.

A real crystal with the properties of the substructure (atom distribution vs. unit cell dimensions) cannot exist: the atoms are too far apart for a stable crystal.

(8)

Small Molecule Crystallography

The substructure consists of a small number of atoms in a large unit cell.

In small molecule crystallography, the phase problem is usually easily solved:

A structure with not too many atoms (< 2000 non-hydrogen atoms) can be solved by means of ab initio methods

(9)

The Terms “Ab initio” and “Direct” Methods

ab initio Methods: phase determination directly from amplitudes, without prior knowledge of any atomic posi-tions. Includes direct methods and the Patterson method. The most widely used of all ab initio methods are:

Direct Methods: phase determination using probabilistic phase relations — usually the tangent formula (Nobel prize for H. A. Hauptman and J. Karle in Chemistry, 1985).

(10)

The Substructure

Central to phasing based on both anomalous dispersion and isomorphous replacement is the determination of the substructure.

The steps involve

1. Collect data set(s)

2. Create an artificial substructure data set

3. Determine the substructure coordinates with direct methods 4. Calculate phases (estimates) for the actual data

(11)

What about Sheldrick’s Rule?

The Sheldrick’s Rule says that direct methods only work when the data resolution is 1.2 Å or better. Macromolecules seldomly diffract to this resolution.

(12)

Sheldrick’s Rule

Sheldrick’s Rule refers the real structures and the 1.2 Å-limit is about the distance where single atoms can be resolved in the data.

The substructure is an artificial crystal and atom distances

within the substructure are well above the usual diffraction limits of macromolecules, even for 4 Å data.

(13)

Solving a Macromolecular Phase Problem with a Substructure

The substructure represents only a very small fraction of all atoms in the unit cell (one substructure atom per a few thousand atoms). Yet, knowing the coordinates of the substructure allows to solve the phase problem for the full structure.

(14)

The Independent Atom Model (IAM)

“Ordinary” crystallography (other than ultra-high resolution charge density studies) is based on the “Independent Atom Model”: The total structure factor F(hkl) _{of each reflection} (hkl) _{is the sum the} _independent _{contributions}

from each atom in the unit cell.

(x, y, z)

x y

(hkl) = (430)

φ = 2π(hx + ky + lz)

= rel. distance to Miller-plane

Re(F) Im(F)

(15)

The Independent Atom Model (IAM)

x y

(hkl) = (430)

The (vectorial) sum from each atom contribution yields the total scattering factor F(hkl). The experiment delivers only its modulus |F(hkl)| =

q

(16)

Isomorphous Replacement (1/2)

The Harker Construction represents one way to determine the phase of F(hkl) from an isomorphous replace-ment experireplace-ment. It requires two data sets:

q

I_nat(hkl) qI_der(hkl)

Native: lacking one or several atoms (the sub-structure)

Derivative: Same unit cell, same atoms plus substructure atoms

(17)

Isomorphous Replacement (2/2)

The substructure coordinates can derived from the difference data set (qI_der(hkl) − qI_nat(hkl)). The substructure phases φ(hkl) can be calculated from the substructure coordinates.

The phases of the native (or derivative) data set can be derived from

• measured native intensities I_nat(hkl)

• measured derivative intensities I_der(hkl)

(18)

Harker Construction (1/4)

Im(F(hkl))

Re(F(hkl))

(19)

Harker Construction (2/4)

q

I_nat(hkl)

Draw the structure factor F(hkl) of the substructure.

F_der(hkl) _points _somewhere _{on the circle around the tip}

of F(hkl) with radius qI_nat(hkl)

because

(20)

Harker Construction (3/4)

Draw the structure factor F(hkl) of the substructure.

F_der(hkl) _points _somewhere _{on the circle around the tip}

of F(hkl) with radius qI_nat(hkl)

At the same time, F_der(hkl) also lies somewhere on the circle around the origin with radius qI_der(hkl), because

(21)

Harker Construction (4/4)

The two intersections of the two circles are the two possible structure factors F_der(hkl)_.

How to select the correct one:

MIR a second Harker construction with |F_der-2(hkl)| has exactly one of the two possibilities in common with the first Harker construction.

SIR the mean phase of the two possibilities serve as starting point for density modification, but if the substructure coordinates are centrosymmetric the

two-fold ambiguity cannot always be resolved (this problem is specific to SIR).

(22)

Substructure Contribution to Data

The electron density ρ(x, y, z) can be calculated from the (complex) structure factors F(hkl). Inversely the structure factor amplitudes can be calculated once the atom content of the unit cell is known:

F(hkl) = atoms j X in unit cell f_j(θ_hkl)e−Bj sin2_θhkl λ2 e2πi(hxj+kyj+lzj) (2)

• fj atomic scattering factor: depends on atom type and scattering angle θ. Values tabulated in [3], Tab. 6.1.1.1 • Bj atom displacement parameter, ADP: reflects vibrational motion of atoms

• 2π(hxj + kyj + lzj) phase shift: relative distance from origin • (xj, yj, zj): fractional coordinates of jth atom

(23)

Extraction of Substructure Contribution

Experimental determination (i.e. other than by molecular replacement) of the contribution of the substructure to the measured intensities is achieved by

• Anomalous Dispersion: Deviation from Friedel’s Law and comparison of

|F(hkl)| with |F(¯h¯k¯l)|

• Isomorphous replacement: Changing the intensities without changing the unit cell and com-parison of

|F_nat| with |F_der|

The alteration is usually achieved by the introduction of some heavy atom type into the crystal, either by soaking or by co-crystallisation.

(24)

Anomalous Scattering

The atomic scattering factors f_j(θ) describe the reaction of an atom’s electrons to X-rays.

They are different for each atom type (C, N, P,. . . ), but normally they are real numbers and do not vary signifi-cantly when changing the wavelength λ.

If, for one atom type, the wavelength λ matches the transition energy of one of the electron shells, anomalous scattering occurs.

This behaviour can be described by splitting f_j(θ) into three parts:

f_janom = f_j(θ) + f_j0(λ) + if_j00(λ)

(25)

SAD and MAD the following “grouping” has turned out to be useful, with its phase diagram on the right (ADP

e−8π 2_U

j

sin2_θhkl

λ2 omitted for clarity):

F(hkl) = non-X substructure f_µe2πihrµ | {z } F_P + substructure X normal f_νe2πihrν | {z } F_A + substructure X anomalous f_ν0e2πihrν | {z } F0 Im(F) F_P F_A F0 iF00 F

(26)

Since the equation for (Eq. 2) is a “simple” sum, one can group it into sub-sums. In the case of SAD and MAD the following “grouping” has turned out to be useful, with its phase diagram on the right (ADP

e−8π 2_U

j

sin2_θhkl

λ2 omitted for clarity):

F(hkl) = non-X substructure f_µe2πihrµ | {z } F_P + substructure X normal f_νe2πihrν | {z } F_A + substructure X anomalous f_ν0e2πihrν | {z } F0 +i substructure X f00e2πihrν Im(F) F_P F_A F0 iF00 F

Normal presentation, because f0 usu-ally negative

(27)

Now compare F(hkl) _with F(¯h¯k¯l) _{in the presence} _{of anomalous scattering:} F(¯h¯k¯l) = non-X substructure f_µe2πi(−h)rµ | {z } F_P− + substructure X normal f_νe2πi(−h)rν | {z } F_A− + substructure X anomalous f_ν0e2πi(−h)rν | {z } F0− substructure 00 2πi(−h)r Im(F) Re(F) F_P+ F_A+ F0+

Friedel’s Law is valid for the f_µ, f_ν, and f_ν0 parts:

(28)

Now compare F(hkl) _with F(¯h¯k¯l) _{in the presence} _{of anomalous scattering:} F(¯h¯k¯l) = non-X substructure f_µe2πi(−h)rµ | {z } F_P− + substructure X normal f_νe2πi(−h)rν | {z } F_A− + substructure X anomalous f_ν0e2πi(−h)rν | {z } F0− +i substructure X anomalous f_ν00e2πi(−h)rν Im(F) Re(F) F_P+ F_A+ F0+ iF00+ F+

The complex contribution of if_ν00 violates Friedel’s Law:

F_P−

F−

iF00−

F−

(29)

Karle (1980) and Hendrickson, Smith, Sheriff (1985) published the following formula∗: |F+|2 = |F_T|2 + a|F_A|2 + b|F_A||F_T| + c|F_A||F_T|sinα |F−|2 = |F_T|2 + a|F_A|2 + b|F_A||F_T| − c|F_A||F_T|sinα Im(F) Re(F) α F_T+ F_A+ F_P+ F+

with F_T = F_P + F_A the non-anomalous contribution of structure and substructure.

(30)

Combining Theory and Experiment

The diffraction experiment measures the Bijvoet pairs |F+| and |F−| for many reflections. In order to “simulate” a small-molecule experiment for the substructure, we must know |F_A|.

With the approximation 1₂(|F+| + |F−|) ≈ |F_T| the difference between the two equations above yields

(31)

Status Quo – A Summary

• Our experiment measures |F+| = q

I(hkl) and |F−| = q

I(¯h¯k¯l).

• We are looking for (an estimate) of |F_A| for all measured reflections.

– These |F_A| mimic a small-molecule data set from the substructure alone.

– The small-molecule data set can be solved with direct methods, even at moderate resolution.

• With the help of Karle and Hendrickson, Smith, Sheriff, we already derived

|F+| − |F−| ≈ c|F_A|sinα

(32)

The Factor

c

Before we know |F_A|, we must “get rid of” c and sin α.

c = 2f00(λ)

f(θ(hkl)) can be calculated for every reflection provided f

00₍_λ₎_.

During a MAD experiment, f00(λ) _{is usually measured by a fluorescence scan.}

To avoid the dependency on f00(λ), the program shelxd [1] uses normalised structure factor amplitudes

(33)

The Angle

α

In the case of MAD, multi-wavelength anomalous dispersion, we can calculate sinα because there is one equation for each wavelength.

In the case of SAD, the program shelxd approximates |F_A|sin(α) ≈ |F_A|. Why is this justified?

Bijvoet pairs with a strong anomalous difference (

|F

+_{| − |F}−_|

) have greater impact in direct methods. The

difference is large, however, when α is close to 90◦ _or 270◦_, _i.e. _when sin(α) ≈ ±1_{. This coarse approximation}

(34)

The Angle

α

In the case of MAD, multi-wavelength anomalous dispersion, we can calculate sinα because there is one equation for each wavelength.

In the case of SAD, the program shelxd approximates |F_A|sin(α) ≈ |F_A|. Why is this justified?

Bijvoet pairs with a strong anomalous difference ( |F

+_{| − |}_F−_|

) have greater impact in direct methods. The

difference is large, however, when α is close to 90◦ _or 270◦_, _i.e. _when sin(α) ≈ ±1_{. This coarse approximation}

(35)

In order to find the wavelength with the strongest response of the anomalous scatterer, the values of f0 and f00

are determined from a fluorescence scan.

−10 −8 −6 −4 −2 0 2 4 Signal [e] f’ and f’’ for Br λ = 1Å * 12398 eV/E Br f" Br f’

In order to get the strongest contrast in the differ-ent data sets, MAD experimdiffer-ents collect data at

1. maximum for f00 (peak wavelength) 2. minimum for f0 (inflection point)

(36)

Single isomorphous replacement requires two data sets:

1. Macromolecule in absence of heavy metal (Au, Pt, Hg, U,. . . ), called the native data set.

Measures |F_P|.

2. Macromolecule in presence of heavy metal, called the deriva-tive data set.

Measures |F|.

|F_A| can then be estimated from the difference |F| − |F_P|

Im(F(hkl))

F

F_A F_P

(37)

SAD and MAD

vs.

SIR and MIR

SAD/ MAD SIR/ MIR

• |F_A| from |F+| vs. |F−| (one data set) • |F_A| from native vs. derivative (two data sets)

•signal improved by wavelength selection •signal improved by heavy atom (large atomic num-ber Z)

•requires tunable wavelength (synchrotron) •independent from wavelength (inhouse data)

(38)

SIRAS (1/2)

Isomorphous replacement is hardly used for phasing anymore:

• Incorporation of heavy atom into crystal often alters the unit cell ⇒ non-isomorphism between native and derivative ruins usability of data sets

(39)

SIRAS (2/2)

However, one often has a high-resolution data set from a native crystal and one (lower resolution) data set with anomalous signal.

They can be combined into a SIRAS experiment:

• Anomalous data set: SAD

• Anomalous data set as derivative vs. native: SIR

(40)

(41)

set from a crystal with exactly the same (large) unit cell as our actual macromolecule but with only very few atoms inside.

(42)

Normalised Structure Factors

Experience shows that direct methods produce better results if, instead of the normal structure factor F(hkl)_,

the normalised structure factor is used.

The normalised structure factor is calculated as E(hkl)2 = F(hkl) 2_/ε

hF(hkl)2/εi

ε is a statistical constant used for the proper treatment of centric and acentric reflections; the denominator F(hkl)2/ε as is averaged per resolution shell. It is calculated per resolution shell (≈ 20 shells over the whole resolution range).

(43)

Starting Point for Direct Methods: The Sayre Equation

In 1952, Sayre published what now has become known as the Sayre-Equation

F(h) = q(sin(θ)/λ) X

h0

F_h0F_h₋_h0

This equation is exact for an “equal-atom-structure” (like the substructure generally is).

It requires, however, complete data including F(000), which is hidden by the beam stop, so per se the Sayre-equation is not very useful.

(44)

Tangent Formula (Karle & Hauptman, 1956)

The Sayre-equation serves to derive the tangent formula

tan(φ_h) ≈

P

h0 |Eh0Eh−h0|sin(φh0 + φh−h0)

P

h0 |Eh0Eh−h0| cos(φh0 + φh−h0)

which relates three reflections h, h0, and h − h0.

H. A. Karle & J. Hauptman were awarded the Nobel prize in Chemistry in 1985 for their work on the tangent formula (and many other contributions).

(45)

Solving the Substructure

Direct methods (and in particular shelxd) do the following:

1. Assign a random phase to each reflection. They will not fulfil the tangent formula.

2. Refine the phases using |F_A| (or rather |E_A|) to improve their fit to the tangent formula 3. Calculate a map, pick the strong peaks (as putative substructure)

(46)

Solving the Substructure

Steps 1-4 are repeated many times, each time with a new set of random phases.

Every such attempt is evaluated and the best attempt is kept as solution for the substructure.

The substructure is thus found by chance. It works even if the data quality is only medium — but may require more trials.

The cycling between steps 2-3, i.e. phase refinement in reciprocal space vs. peak picking in direct space, is called dual space recycling [4, section 16.1]

(47)

dual space

recycling

Real space: select atoms reciprocal space refine phases random atoms atoms consistent

with Patterson

repeat a few 100−1000 times

No

FFT and peak search

from atoms phases(map)

between E _obs and E _calc selection criterion: Correlation Coefficient

or (better)

(48)

e.g. with the Harker Construction. With those phases, an initial electron density map for the new structure can be calculated: I(hkl) |F_A| (x, y, z)_A

R

f

_µ

φ(hkl)_A φ(hkl) ρ(x, y, z)

(49)

Phasing the Rest

(50)

References

1. G. M. Sheldrick, A short history of SHELX, Acta Crystallogr. (2008), A64, pp.112–133

2. R. J. Morris & G. Bricogne, Sheldrick’s 1.2Å rule and beyond, Acta Crystallogr. (2003), D59, pp. 615–617 3. E. Prince (ed.), International Tables for Crystallography, Vol. C, Union of Crystallography

4. M. G. Rossmann & E. Arnold (eds.), International Tables for Crystallography, Vol. F, Union of Crystallogra-phy