1.6.1 Methods of Structure Determination
Many of the studies described above and nearly all of the studies described in this thesis rely on information about the three-dimensional conformations proteins adopt. Obtaining this information is neither easy nor straightforward, and several methods have been developed to garner it. Understanding these methods and their limitations is vitally important to interpreting the data obtained in these studies, which is why they will be described here.
The majority of structures used in this study are taken from X-ray crystal studies, which rely on passing X-rays through protein crystals and determining an electron density map of the protein from the pattern the diffracted X-rays make. This technique will be described briefly in section 1.6.2, which follows.
Nuclear Magnetic Resonance (NMR) spectroscopy can determine the structure of proteins in solution. It involves exposing a protein in solution within a magnetic field to a radio frequency pulse. Nuclei that have a magnetic moment (principally the isotopes ¹H, ¹⁵N, and ¹³C) become aligned either with (lower energy) or against (higher energy) the magnetic field. The radio frequency pulse energises some of the nuclei from the low to the high energy state. When these nuclei relax they radiate the energy at frequencies that vary with the magnetic field, the radio pulse, and the atomic environment of the atoms. COSY (correlation spectroscopy) discovers interactions between atoms that are close together in covalent structure, whilst distant interactions are investigated with NOE (Nuclear Overhauser Effect) spectroscopy. This gives a set of distance constraints that can be used to produce a set of models. Because emissions of similar frequency from the same protein are hard to separate, solving the structures of even medium-sized proteins, of above 200 residues, is difficult. Despite this, a recent study assigned resonances for a 269 residue protein (Remerowski et al. 1994). Comparisons of X-ray and NMR structures (Billeter 1992) show remarkably close agreement.
Neutron diffraction crystallography is similar in concept to X-ray crystallography, but uses a beam of neutrons which are diffracted by atomic nuclei rather than X-rays diffracted by electron shells. Neutron beams, unfortunately, are scattered to similar extents by all atomic nuclei, and so cannot reveal sufficient information by themselves to solve a protein structure. This technique is much more rarely used than X-ray crystallography and usually augments the X-ray data rather than replaces it (Wlodawer 1982), but is useful for locating hydrogen atoms.
1.6.2 X-Ray Crystallography
The vast majority of protein structure models used in this thesis were determined by X-ray crystallography, as is true of the vast majority of structures in the Protein Data Bank (Bernstein et al. 1977). X-ray crystallography (Blundell and Johnson, 1976) relies on deducing the structure of a protein from the patterns an X-ray beam makes after being fired through a crystal of the protein. The 'scattering pattern' of the crystal is a combination of the scattering pattern of the molecular motif and the crystal lattice.
X-rays are scattered by the electron clouds that surround atoms, and therefore are scattered more by atoms with high atomic numbers. Nitrogen, oxygen, and carbon have similar atomic numbers, so they scatter X-rays to a similar extent and are difficult to distinguish. Hydrogen scatters X-rays much less efficiently than carbon, so hydrogen atom positions are not easily located using X-rays.
The crystal lattice amplifies the scattering of a single cell through the regular repetition of near-identical cells throughout the crystal. Because the quality of the crystal lattice is important, the crystallisation technique must be chosen with care. Crystallisation techniques involve a solution of the protein changing slowly from a concentration slightly below its saturation level to supersaturation. The main factors that are used to
Chapter 1: Introduction
bring about this change are the concentrations of protein, added solutes, temperature, and the concentrations of precipitants. Finding conditions that will produce high quality protein crystals uses a combination of the knowledge of crystallisation and trial and error. The crystals must be large enough to be usable (usually, at least 100 µm) and of sufficiently high quality to produce a usable diffraction pattern. The most popular crystallisation method, vapour diffusion, typically entails a drop of protein solution, containing a concentration of precipitant below the level needed to cause crystallisation, suspended in a sealed container above a reservoir of solution of the precipitant at a concentration just above the precipitation point. The reservoir exerts osmotic pressure on the air within the container, dehydrating it. This in turn exerts osmotic pressure on the drop of protein solution, increasing the concentrations in the drop of protein and precipitant to cause slow crystallisation.
The X-rays used have wavelengths of the order of 1.54 Å, and hence occupy the part of the electromagnetic spectrum able to resolve the detail on the scale of Ångströms that is important in protein structures. The intensity of each reflection is directly proportional to the square of the amplitude of its structure factor. Only intensities are measured, however, and solving the structure requires information about the phases of reflections as well as their amplitudes.
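The relationship between structure factors, intensities, and the lost phases can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional unit cell with three point atoms whose scattering factors are simply set to their electron counts; the function names and atom list are illustrative and not taken from any crystallographic package. Shifting the origin of the cell changes every phase but leaves every intensity unchanged, which is why intensities alone cannot fix the phases.

```python
import cmath
import math

def structure_factor(h, atoms):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for a 1D unit cell.

    atoms is a list of (scattering_factor, fractional_coordinate) pairs.
    """
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

def intensity(h, atoms):
    """Intensity is taken as |F(h)|^2 (proportionality constant set to 1)."""
    return abs(structure_factor(h, atoms)) ** 2

# Hypothetical atoms: (approximate electron count, fractional position)
atoms = [(6.0, 0.10), (7.0, 0.35), (8.0, 0.60)]
# The same arrangement with the cell origin shifted by 0.2 of the cell length
shifted = [(f, (x + 0.2) % 1.0) for f, x in atoms]

for h in range(1, 4):
    f1 = structure_factor(h, atoms)
    f2 = structure_factor(h, shifted)
    print(
        f"h={h}  I={intensity(h, atoms):8.3f}  I_shifted={intensity(h, shifted):8.3f}"
        f"  phase={cmath.phase(f1):6.3f}  phase_shifted={cmath.phase(f2):6.3f}"
    )
```

In this toy case the two intensity columns agree while the phase columns differ, so a detector that records only intensities cannot tell the two arrangements apart.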
The process of recovering the phases is known as the "phase problem". The two most common solutions to this problem are multiple isomorphous replacement (M.I.R.) and molecular replacement (M.R.). M.I.R. relies on crystals of the protein that also include heavy metal atoms. Because heavy metal atoms scatter X-rays much more efficiently than protein atoms and are present in very low numbers in each crystal cell, the phases for the reflections can be determined. These can be used as the basis for determining a preliminary set of phases for reflections from the diffraction pattern of the protein, and then discarded. Alternatively, the phases from one or more solved structures of homologous proteins can be used as the starting point for refinement, in a technique called molecular replacement.
Once both the amplitudes and the phases of the reflections are available, a Fourier series allows an electron density map to be calculated. The covalent structure of the protein is then fitted into the map.
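The Fourier synthesis step can be sketched in one dimension. This is an illustration only, assuming a hypothetical unit cell of unit length with two point atoms, structure factors computed directly from the known positions (so the phases are available by construction), and a series truncated at an assumed limit `h_max`. The density is evaluated on a grid and its peaks fall at the atomic positions.

```python
import cmath
import math

def structure_factor(h, atoms):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for a 1D unit cell."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

def density(x, factors):
    """rho(x) = sum_h F(h) * exp(-2*pi*i*h*x), in arbitrary units."""
    total = sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items())
    return total.real  # imaginary parts cancel because F(-h) is the conjugate of F(h)

# Hypothetical atoms: (scattering factor, fractional position)
atoms = [(6.0, 0.10), (8.0, 0.60)]
h_max = 10  # assumed truncation; a higher limit corresponds to higher resolution
factors = {h: structure_factor(h, atoms) for h in range(-h_max, h_max + 1)}

grid = [i / 100 for i in range(100)]
rho = [density(x, factors) for x in grid]
peak = grid[max(range(len(grid)), key=lambda i: rho[i])]
print("highest density at fractional x =", peak)
```

Lowering `h_max` broadens the peaks, which is the one-dimensional analogue of a lower resolution map.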
The final step in modelling protein structure is called refinement. This fine tunes the model so that it agrees as far as possible with both the electron density map and expectations about protein structure. Refinement applies minimisation methods, principally linear least-squares refinement and molecular dynamics (see section 1.5), to the sum of two terms. The first term is a structure factor term representing the disagreement between the measured reflections and the reflections that would correspond to the model. The second is a stereochemical term representing unfavourable bond angles, lengths, and contacts. Often, an energy formulation (see section 1.5) is used as the stereochemical term. Least-squares refinement has a very small radius of convergence and finds only local minima (e.g. Tronrud et al. 1987). Sometimes, particularly in the latter stages of refinement, this method is chosen because only a small radius of convergence is required. Molecular dynamics simulations of annealing can explore conformational space and have a much larger radius of convergence (Brünger et al. 1987). Fitting the covalent structure of the protein to the electron density is sometimes done by alternating between manual intervention in model building and automated refinement procedures.
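The two-term target described above can be written schematically as follows. This is a generic form given for illustration, not the exact function of any particular refinement program; the weights $w_X$, $w_b$, $w_\theta$, the scale factor $k$, and the ideal values $d_0$ and $\theta_0$ are assumed symbols.

```latex
E_{\mathrm{total}}(\mathbf{x}) =
  w_X \sum_{hkl} \bigl( |F_{\mathrm{obs}}(hkl)| - k\,|F_{\mathrm{calc}}(hkl;\mathbf{x})| \bigr)^2
  + E_{\mathrm{stereo}}(\mathbf{x}),
\qquad
E_{\mathrm{stereo}}(\mathbf{x}) =
  \sum_{\mathrm{bonds}} w_b\,(d - d_0)^2
  + \sum_{\mathrm{angles}} w_\theta\,(\theta - \theta_0)^2
  + \dots
```

Here $\mathbf{x}$ denotes the atomic coordinates of the model. The first sum is the structure factor term and $E_{\mathrm{stereo}}$ is the stereochemical term; the weight $w_X$ sets the balance between fitting the data and preserving ideal geometry.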
Useful as X-ray models are, it is very important to understand their limitations. The electron density map is rarely accurate enough to show the single electron around a hydrogen nucleus. The crystal lattice is rarely perfect, and some atoms do not occupy the same position in all unit cells. Such atoms, if in more than one position, will show up in the electron density map as alternative positions in the model. The 'occupancy' of each model atom position can be calculated. Each atom also has a temperature factor ("B-factor") to show the degree of thermal motion that blurs the image of the atom in the electron density map. Atoms that form few contacts with the protein, such as solvent atoms, loop regions, and exposed side-chains, will have a high degree of molecular motion and not appear in the electron density map. Because nitrogen, oxygen, and carbon atoms scatter X-rays with similar efficiency the images of the
side-chains of Asn, Gln, and His appear approximately symmetrical. Which atom of the terminal amide group is nitrogen and which is oxygen, or which atoms of the imidazole group of His are nitrogen and which are carbon, must be inferred from the environment around the side-chain, principally from the location of potential hydrogen bonding partners. This will be analysed in the body of the thesis.
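The temperature factor mentioned in the paragraph above can be related to a physical displacement. As a small sketch, assuming the standard isotropic relation B = 8π²⟨u²⟩ (where ⟨u²⟩ is the mean-square displacement of the atom along one direction), the following converts a B-factor in Å² to a root-mean-square displacement in Å. The function name and the example value are illustrative.

```python
import math

def rms_displacement(b_factor):
    """Return sqrt(<u^2>) in Angstroms for an isotropic B-factor in Angstroms^2.

    Uses B = 8 * pi^2 * <u^2>, so <u^2> = B / (8 * pi^2).
    """
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8 * math.pi ** 2))

# Illustrative value only: a B-factor of 20 Angstroms^2
print(f"{rms_displacement(20.0):.2f}")
```

For the illustrative value of 20 Å² this gives a displacement of roughly half an Ångström, which conveys how strongly thermal motion can blur an atom relative to a typical covalent bond length.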
The level of detail shown by electron density maps differs with resolution. An electron density map with a resolution of 3.0 Å allows the path of the main-chain to be determined; a resolution of 1.5 Å allows resolution of individual atoms; one of 1.2 Å even allows resolution of hydrogens. Knowledge of covalent structure often allows atoms to be positioned with more confidence than resolution would suggest. At this stage other values of the quality of a model are calculated, including the occupancies and B-factors described above. The agreement between model and X-ray data is quantified as a residual factor or 'R-factor'. The R-factor varies with the resolution of the structure; a well refined protein structure should have an R-factor of, at most, 0.2. Nearly all the X-ray structures in the Protein Data Bank have R-factors of 0.2 or less.
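The R-factor is conventionally defined as the sum of the absolute differences between observed and calculated structure factor amplitudes, divided by the sum of the observed amplitudes. A minimal sketch follows, assuming the calculated amplitudes have already been placed on the same scale as the observed ones; the amplitude lists are invented purely for illustration.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|) over matched reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have equal length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator

# Invented amplitudes for four reflections (arbitrary units)
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 66.0, 35.0]
print(round(r_factor(f_obs, f_calc), 3))  # 26 / 280, i.e. about 0.093
```

A model that reproduced the observed amplitudes exactly would give R = 0; the invented example above falls below the 0.2 threshold quoted in the text.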