Density modification - Automation in X-ray structure solution

1.3 Automation in X-ray structure solution

1.3.3 Density modification

The experimentally phased electron density map may initially be of insufficient quality to allow automatic tracing of a three dimensional atomic model. DM aims to improve phase estimates with an itera- tive procedure that cycles between real space incorporation of chemical knowledge and combination of the real space modified phases or phase distributions with the experimentally obtained phase probabilities in reciprocal space.

Figure1.9 illustrates DM in its most traditional form, starting with the generation of the electron density map by Fourier summation of the structure factors calculated with the centroid phases from the experimentally obtained phase probabilities (see Equation1.4). Three procedures frequently used to modify the resulting electron density map are solvent flattening (Leslie 1987; Wang 1985), NCS averaging (Muirhead et al.1967) and histogram matching (Zhang et al. 1997).

Solvent flattening (Leslie1987; Wang1985) is one of the earlier density modification procedures and relies on the disorder of the solvent in a protein crystal. In theory the disordered solvent molecules do not give rise to constructive interference in the diffraction experiment and hence the electron density in the protein crystal solvent channels should be featureless and flat. However, at the start of DM the experimental phase estimates are not very accurate and the initial electron density map is noisy. This may make it difficult to distinguish the boundary between solvent and protein. Rather than looking at the absolute value of

|F|, αex |F|, α ρ(x) ρmod(x) |Fmod|, αmod F Modify ρ(x) F−1 α=f(αmod, αex)

Figure 1.9: Steps in traditional density modification. Structure factor amplitudes, phases and electron density are indicated with |F|, α and ρ(x), respectively. The updated phase probabilities are obtained by multiplication of the original phase probabilities with the modified phases weighted according to the agreement between original and modified structure factor magnitudes.

the electron density to distinguish solvent and protein, the solvent mask may be defined by looking at local variance throughout the unit cell. Indeed, the featureless solvent region will have a low variance compared to the regions with protein. In the flattening stage, the electron density in the solvent region is set to the average solvent density, effectively removing the noise not associated with any structural information.

In cases where there are several copies of a molecule or domain in the asymmetric unit, the multiple redundant samples of highly similar electron density may be combined to improve the signal-to-noise ratio. The usefulness of non-crystallographic symmetry (NCS) averaging in gener- ating improved electron density maps has long been known (Muirhead et al.1967). By comparison, histogram matching (Zhang et al.1997) has been described only recently. The premiss in histogram matching is that the electron density values for a well phased map follow characteristic distributions as opposed to the electron density values of a randomly phased map that follow a normal distribution. By non-linear rescaling

of the electron density the histogram is made to look more like that of a well phased map. This suppresses negative electron density and tends to sharpen electron density peaks (Cowtan2010).

Note that the application of scaling and masking functions in real space is equivalent to combining, in reciprocal space, many structure factors through a convolution. This is an intrinsic property of the re- lation between reciprocal and direct space by Fourier transformation. Consequently, the random error component of the structure factors will average out, whereas the true values of the structure factors will add up systematically.

After modification of the electron density map the modified structure factors, obtained by inverse Fourier transform, need to be combined with the experimental phase probabilities. Because there is no explicit modified phase probability distribution, one is generated by es- timating the errors in the density-modified phases from the agreement from the between the observed and modified structure factor magnitudes. In the final stage of a DM cycle the probability distribution of modified phases and original experimental phases are multiplied to give an updated distribution. In the popularσA (Lunin and Urzhumt- sev1984; Read1986) algorithm, the original experimentally determined phases, represented by Hendrickson-Lattman coefficients (Hendrickson and Lattman 1970), are combined with the density-modified structure factors through a heuristic weighting scheme (Read1997).

There are two major problems associated with the traditional DM approach. All information about the phase probabilities is discarded by using only the centroid phases when calculating the electron density map. Furthermore, when multiplying probabilities care must be taken to ensure that the those probabilities are independent so as to avoid introducing bias. Yet, clearly the modified phase probabilities ultimately depends on the experimental phase probabilities through the DM cycle, resulting in ever narrower distributions without a real improvement of the phase estimates.

Several methods have been developed to reduce the correlation between the original and density-modified structure factors. Some of the earlier approaches include the reflection omit method (Cowtan and Main 1996), solvent flipping (Abrahams and Leslie 1996) and the γ- correction (Abrahams 1997) method. The γ parameter is an estimate of the contribution of the initial experimental structure factor to the density-modified structure factor in solvent flattening. By subtracting this contribution from the density-modified structure factor the correlation with the initial structure factor is suppressed. The γ-perturbation method (Cowtan 1999) is a generalization of this method for any type of density modification. Recently, Skubák and Pannu (2011) introduced the β correction parameter, a novel cross-validation method aiming to reduce bias in maximum-likelihood phase combination functions.

The statistical density modification techniques pioneered by Ter- williger (2004); Terwilliger and Berendzen (1999) and Cowtan (2000) are a different approach to reduce bias in DM. Rather than using a sin- gle modified map to express the a priori knowledge incorporated in the electron density map, statistical DM expresses the newly introduced information in terms of probability distributions, which are subsequently carried forward into reciprocal space illustrated in Figure 1.10. Since the density-modified phase probabilities are no longer estimated from the agreement between the observed and density-modified structure factor magnitudes, the link between the experimental and density-modified phase information is weakened.

Maximum likelihood allows for a rational incorporation of other sources of information. The use of maximum-likelihood in the phase- combination scheme described by Pannu et al. (1998) allows for incorporation of experimentally determined phase information in the form of Hendrickson-Lattman coefficients. The method has been shown to out- perform theσAphase combination traditionally used in classical density modification (Cowtan 2010).

It is clear that the problem of a poor electron density map can be partially mitigated by using prior knowledge about the expected distri-

|F|, P(αex) |F|, P(α) |Fbest|, αbest ρ(x) P(ρ(x)) Pmod(α) Centroid F Infer Transform distribution P(α) =P(αmod)×P(αex)

Figure 1.10: Steps in statistical density modification. Structure factor amplitudes and phases are indicated with|F|andα, respectively. Thea priori information inferred from the electron density, ρ(x) is expressed as probability distributions, P(ρ(x)). These are carried forward into reciprocal space, P(αmod) and multiplied with the experimental phase probabilities,P(αex) to obtain the updated phase probabilities, P(α). bution of electron density and typical arrangement of a protein crystal and combining this density-modified phases with the experimental phases. Current methods assume that these sources of information are independent, yet it is clear that they are not, which results in model bias and escalated figures of merit. Chapter3 describes a DM scheme based on a novel multivariate equation that does not assume independence of initial and density-modified phases. Large scale testing of the new DM method convincingly shows that this procedure generates dramatically improved electron density maps, allowing many more macromolecular structures to be traced without manual intervention from the user.

In document Software developments in automated structure solution and crystallographic studies of the Sso10a2 and human C1 inhibitor protein (Page 38-43)