4.3 The Shift-Alignment Algorithm
4.4.2 Comparison to TALOS
Because TALOS predicts backbone angles rather than producing an alignment, it could not be included in the tests of the previous section. Nevertheless, since the ultimate goal of this work is homology modeling based on chemical shift alignments, it is important to evaluate whether there are cases in which SimShift provides higher-quality torsion angle predictions than TALOS. We therefore split each pair in the benchmark set: one protein is designated as the modeling target and the other serves as a potential template. This leaves a set of 363 target proteins, each with at least one associated template structure.
Because the true structures of the targets are known, we can compare the backbone angle RMSD of the best alignment produced by SimShift with that of the TALOS predictions. For this comparison we use only residues for which both methods provide torsion angles. For 178 of the 363 targets, SimShift achieves a lower RMSD than TALOS for both the φ and the ψ angle. Averaged over the targets for which SimShift is better, the RMSD difference is 18.18° for φ and as much as 45.58° for ψ. This indicates that SimShift can assist the structure determination process even when TALOS predictions are available.
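The per-target comparison described above can be sketched as follows. This is a minimal illustration, not the code used for the benchmark: it assumes that angles are given in degrees, that `None` marks a residue without a prediction, and that angular differences are wrapped into the range -180° to 180° because φ and ψ are periodic (the text does not state how differences were computed). The names `angle_diff`, `angular_rmsd` and `compare_methods` are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

def angle_diff(a: float, b: float) -> float:
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def angular_rmsd(pred: Sequence[Optional[float]],
                 true: Sequence[float],
                 mask: Sequence[bool]) -> float:
    """Root-mean-square angular deviation (degrees) over residues where mask is True."""
    # The mask is checked before angle_diff is called, so p is never None here.
    diffs = [angle_diff(p, t) for p, t, m in zip(pred, true, mask) if m]
    if not diffs:
        raise ValueError("no residues to compare")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def compare_methods(simshift: Sequence[Optional[float]],
                    talos: Sequence[Optional[float]],
                    true: Sequence[float]) -> Tuple[float, float]:
    """RMSDs of both methods, restricted to residues that BOTH methods predict."""
    mask = [s is not None and t is not None for s, t in zip(simshift, talos)]
    return angular_rmsd(simshift, true, mask), angular_rmsd(talos, true, mask)

# Hypothetical phi angles for one target (None = no prediction for that residue).
true_phi = [-60.0, -65.0, -120.0, 170.0]
simshift_phi = [-62.0, None, -115.0, -175.0]
talos_phi = [-70.0, -60.0, -90.0, 150.0]
rmsd_simshift, rmsd_talos = compare_methods(simshift_phi, talos_phi, true_phi)
print(f"SimShift: {rmsd_simshift:.2f} deg, TALOS: {rmsd_talos:.2f} deg")
```

In this toy input the second residue is excluded because SimShift gives no angle for it, and the last residue counts as a 15° error rather than 345°, which is why the wrap-around matters. Repeating the call for ψ and counting the targets for which SimShift's RMSD is lower for both angles would give a tally of the kind reported above.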
4.5 Discussion
We aimed to answer the question: “Is it possible to create structurally correct alignments from chemical shift data alone when sequence similarity is low?” We have argued that this is indeed the case. The comparison to other methods further suggests that information about long-range interactions can be extracted from chemical shift data and used to create structurally meaningful alignments.
46 Chapter 4. SimShift
The shift data used here is derived from the BMRB, which is known to contain both high-quality and low-quality entries. Because we were interested in the performance of our approach on experimental data, we did not include any intermediate processing steps. Moreover, since only a limited number of proteins have associated chemical shifts, it is not advisable to reduce this set further by restricting it to confirmed high-quality entries. As the results presented here were obtained on shifts that probably contain erroneous data, even more accurate alignments can be expected when curated shift data is used.
What has been presented is a first step towards automating the structure determination process in NMR spectroscopy. Chemical shift alignments can be a useful tool for a spectroscopist who searches a database of chemical shifts before performing additional experiments. If similarities can be identified, a model of the protein of interest may be created. This model can then be validated (or invalidated), for example, by comparison to NOE maps.
Further work is needed before structure determination can be automated. In the following chapter we present SimShiftDB, a database search tool based on chemical shift alignments. Using the similarities identified by the database search, we are able to transfer structural information from database proteins to the target protein. We also apply a statistical model to assess the significance of each identified similarity.
5 SimShiftDB
5.1 Introduction
NMR spectroscopy is an established method for determining protein structures at atomic resolution. The NMR structure determination process consists of several steps of varying complexity. A quantity that is measured routinely at the beginning of this process is the chemical shift. Chemical shifts are available on a per-atom basis and inherently carry structural information. In general, however, chemical shifts alone do not suffice to calculate the structure of biological macromolecules such as proteins. Additional experiments of increased complexity, as well as human expert knowledge, are necessary to determine the structure.
In Chapter 4, we evaluated the performance of a pairwise chemical shift alignment algorithm and showed that the information hidden in chemical shift data can indeed be used to construct structurally meaningful alignments. We now present a method, called SimShiftDB, that searches for similarities between a target protein with assigned chemical shifts and a database of template proteins for which both chemical shift data and 3D coordinates are available. The alignment algorithm used in the previous chapter was adapted to the requirements of database searching, and additional constraints derived from the template structure were incorporated into the calculations.
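To make the search step concrete, the following is a toy sketch of aligning one target against every template in a database and ranking the hits. It swaps in a plain Smith-Waterman local alignment with a linear gap penalty and a made-up per-residue score based on per-atom shift differences. The tolerances in `TOL`, the gap penalty `GAP`, and all names (`Template`, `pair_score`, `local_align`, `search`) are hypothetical; this does not reproduce SimShiftDB's scoring, which uses chemical shift substitution matrices and template-derived constraints.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Residue = Dict[str, float]  # atom name -> chemical shift (ppm)

# Hypothetical per-atom tolerances (ppm) and gap penalty; NOT SimShiftDB's values.
TOL = {"CA": 1.0, "CB": 1.0, "C": 1.0, "N": 2.5, "H": 0.3, "HA": 0.2}
GAP = 1.0

@dataclass
class Template:
    name: str
    shifts: List[Residue]                     # assigned shifts, one dict per residue
    coords: List[Tuple[float, float, float]]  # e.g. one CA position per residue

def pair_score(a: Residue, b: Residue) -> float:
    """Positive when shared atoms have similar shifts, negative otherwise."""
    shared = [atom for atom in a if atom in b and atom in TOL]
    if not shared:
        return -0.5  # hypothetical penalty for residues with no comparable shifts
    return sum(1.0 - abs(a[atom] - b[atom]) / TOL[atom] for atom in shared)

def local_align(x: List[Residue], y: List[Residue]) -> Tuple[float, List[Tuple[int, int]]]:
    """Smith-Waterman with a linear gap penalty; returns best score and aligned index pairs."""
    n, m = len(x), len(y)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0.0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + pair_score(x[i - 1], y[j - 1]),
                          H[i - 1][j] - GAP,
                          H[i][j - 1] - GAP)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until a zero cell ends the local alignment.
    pairs: List[Tuple[int, int]] = []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(x[i - 1], y[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - GAP:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs

def search(target: List[Residue], database: List[Template]):
    """Align the target against every template and rank hits by score (best first)."""
    hits = []
    for tpl in database:
        score, pairs = local_align(target, tpl.shifts)
        if pairs:  # skip templates without any positively scoring local alignment
            hits.append((score, tpl.name, pairs))
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits

# Hypothetical toy data: one template with identical shifts, one very different.
target = [{"CA": 58.0, "N": 120.0}, {"CA": 62.0, "N": 115.0}, {"CA": 55.0, "N": 125.0}]
db = [
    Template("similar", [dict(r) for r in target], [(float(k), 0.0, 0.0) for k in range(3)]),
    Template("dissimilar", [{"CA": 40.0, "N": 100.0}] * 3, [(0.0, 0.0, 0.0)] * 3),
]
for score, name, pairs in search(target, db):
    print(name, score, pairs)
```

The template coordinates are carried along in `Template` only so that an alignment can later be turned into a structural model; the search itself uses shifts alone.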
For each target-template pair we calculate a chemical shift alignment. Such an alignment maps a set of target residues to a set of template residues, so we can build a structural model for the aligned target residues from the coordinates of the associated template residues. To allow the spectroscopist to judge the statistical significance of an alignment with shift similarity score S, we calculate the expected number of alignments with score ≥ S that occur by chance. To evaluate the performance of our approach, we compare its backbone angle prediction quality to that of 123D [Alexandrov et al., 1996], a threading approach, and of TALOS [Cornilescu et al., 1999], which predicts backbone angles from the amino acid sequence and the associated chemical shift data. We show empirically that our method significantly outperforms 123D. Compared to TALOS, SimShiftDB performs significantly better for 36% of the target residues. Our results suggest that TALOS and SimShiftDB both have their strengths and should therefore be used in parallel in the NMR structure determination process.
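The two steps described in this paragraph, building a model from an alignment and estimating how many equally good alignments would arise by chance, can be sketched as below. The sketch assumes that an alignment is a list of (target index, template index) pairs and that the expectation follows the Karlin-Altschul form E = K·m·n·exp(-λS), which is standard for local alignment scores; whether SimShiftDB uses exactly this form, and the values of K and λ, are assumptions here. The function names `transfer` and `expected_by_chance` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple, TypeVar

T = TypeVar("T")

def transfer(pairs: Sequence[Tuple[int, int]], template_values: Sequence[T]) -> Dict[int, T]:
    """Assign each aligned target residue the value (e.g. coordinates or phi/psi)
    of its template partner; unaligned target residues stay unmodeled."""
    return {target_i: template_values[template_j] for target_i, template_j in pairs}

def expected_by_chance(score: float, m: int, n: int, K: float, lam: float) -> float:
    """Karlin-Altschul-style expectation E = K * m * n * exp(-lam * score):
    expected number of chance alignments scoring >= score for a query of length m
    and a (total) database length n. K and lam must be fitted to the scoring scheme."""
    return K * m * n * math.exp(-lam * score)

# Hypothetical alignment (target index, template index) and template CA coordinates.
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = transfer([(0, 1), (1, 2), (3, 3)], template_ca)
print(model)  # target residue 2 is unaligned and therefore absent from the model

# Hypothetical parameters: K and lam are placeholders, not fitted values.
print(expected_by_chance(score=20.0, m=100, n=10_000, K=0.1, lam=0.3))
```

A lower expectation means the alignment is less likely to be a chance hit; because E decays exponentially in S, a modest score increase can change the verdict substantially.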
In the following, we describe the template database, the chemical shift substitution matrices used for scoring chemical shift alignments, the calculation of the expected value, and the SimShiftDB algorithm. Afterwards, we present and discuss the results on our test set.