Results: ADP in SCPP - Capturing atomic interactions with a graphical framework in computationa

I tested adaptive dynamic programming on the rotamer relaxation task (where the amino acid sequence is held fixed) and on the redesign task (where the amino acid sequence is allowed to vary). For both tasks, I selected 15 surface residues from ubiquitin’s β-sheet, pictured in Fig. 4.9a. To keep the treewidth of the interaction graphs low, I excluded arginine, lysine, and methionine. I chose this optimization problem for two reasons: DEE has been reported to perform poorly at protein surfaces (Gordon and Mayo, 1998), and the interaction graph defined by this problem has a small treewidth. If dynamic programming or adaptive dynamic programming is to beat dead-end elimination, it will be for problems like this one. Indeed, my own implementation of dead-end elimination is unable to solve this problem. Loren Looger’s implementation, which contains theorems I have not implemented (Looger and Hellinga, 2001), solves this problem in 30 minutes (Loren Looger, personal communication).
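Why a small treewidth matters can be illustrated with a generic min-sum variable elimination over pairwise energy tables. This is a minimal sketch of textbook dynamic programming on an interaction graph, not the implementation evaluated in this section; the function name `min_energy`, the toy energies, and the elimination order are assumptions made for illustration. Each elimination step builds a table over the neighbors of the eliminated residue, so the cost is exponential only in the size of the largest such neighborhood, which a good elimination order keeps close to the treewidth.

```python
from itertools import product

def min_energy(n_states, factors, order):
    """Minimum total energy by min-sum variable elimination.

    n_states: dict residue -> number of rotamers at that residue
    factors:  list of (scope, table); scope is a tuple of residues and
              table maps a tuple of rotamer indices (in scope order) to energy
    order:    elimination order that covers every residue
    """
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        # New scope: every other residue that shares a factor with var.
        scope = tuple(sorted({v for s, _ in touching for v in s} - {var}))
        table = {}
        for assign in product(*(range(n_states[v]) for v in scope)):
            ctx = dict(zip(scope, assign))
            best = float("inf")
            for state in range(n_states[var]):
                ctx[var] = state
                total = sum(t[tuple(ctx[v] for v in s)] for s, t in touching)
                best = min(best, total)
            table[assign] = best
        factors.append((scope, table))
    # Every residue has been eliminated, so only constant factors remain.
    return sum(t[()] for _, t in factors)

# Hypothetical toy problem: a chain of three residues with two rotamers each.
n_states = {0: 2, 1: 2, 2: 2}
e01 = {(0, 0): 1.0, (0, 1): -0.5, (1, 0): 0.2, (1, 1): 0.0}
e12 = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): -0.2, (1, 1): 0.4}
factors = [((0, 1), e01), ((1, 2), e12)]
best = min_energy(n_states, factors, order=[0, 1, 2])
print(best)
```

On this chain the largest intermediate table covers a single residue, and the minimum found is about -0.7 in the toy units. On a treewidth-4 graph with hundreds of rotamers per residue the same tables become very large, which is where the memory pressure of exact dynamic programming comes from.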

Figure 4.9: Ubiquitin’s β-sheet. The β-sheet in (a) is flattened in (b) with its 15 surface residues shown. Rosetta’s energy function defined the treewidth-4 interaction graph in (c) when it included edges between residues for any pair of rotamers that interact with an energy magnitude of at least µ = 0.2 kcal/mol. I artificially defined the treewidth-3 interaction graph in (d).

For the rotamer relaxation task, I first created 100 sequences for the ubiquitin backbone, asking the design module of the Rosetta molecular modeling program (Kuhlman and Baker, 2000) to redesign these 15 surface residues. I then evaluated Rosetta’s experimentally validated energy function (Kuhlman et al., 2003) between all pairs of sub-rotamers, and included degree-2 hyperedges that met the interaction magnitude threshold µ = 0.2 kcal/mol. This produced a treewidth-4 interaction graph, shown in Fig. 4.9c. I set the irresolvable collision cutoff to τ = 1 kcal/mol. I compared the standard dynamic programming algorithm against the adaptive algorithm with ε values of 0, 10⁻⁴, 10⁻³, 10⁻², 0.1, and 1.0 kcal/mol. Against dynamic programming, I also compared the time spent in a single simulated annealing trajectory and the score simulated annealing produced.
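The thresholding rule used to build the interaction graph can be sketched as follows. This is an illustration under stated assumptions, not Rosetta's code: the function name `interaction_edges`, the input layout, and the toy energies are hypothetical. An edge joins two residues whenever at least one pair of their rotamers interacts with magnitude at or above µ.

```python
def interaction_edges(pair_energies, mu=0.2):
    """Return the set of residue pairs that receive an edge.

    pair_energies: dict mapping (res_i, res_j) -> iterable of interaction
                   energies in kcal/mol, one per rotamer pair of the two residues
    mu:            magnitude threshold in kcal/mol
    """
    edges = set()
    for (i, j), energies in pair_energies.items():
        # One sufficiently strong rotamer pair is enough to keep the edge.
        if any(abs(e) >= mu for e in energies):
            edges.add((min(i, j), max(i, j)))
    return edges

# Hypothetical energies for three residues (not taken from the experiments).
toy = {
    (1, 2): [0.05, -0.31, 0.00],  # one rotamer pair exceeds the threshold
    (1, 3): [0.10, -0.19, 0.02],  # every pair is below the threshold
    (2, 3): [1.40, 0.00, 0.00],
}
edges = interaction_edges(toy)
print(sorted(edges))
```

Raising µ removes edges and can lower the treewidth, at the price of ignoring weak interactions.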

In the relaxation problem, the average residue had 32 total rotamers, breaking down into 3 base-states and 10 sub-states per base-state. The median state space size was ∼10¹⁸. I measured performance on a dual 2 GHz AMD Athlon with 2 GB RAM.

In Fig. 4.10, I plot the relative running time of the adaptive and standard dynamic programming algorithms against the actual error observed. In Table 4.1, I present the actual running times. Except for three instances of the 100 rotamer-relaxation tasks, simulated annealing produced the optimal answer when run for as long as standard dynamic programming.

Run time   DP      ADP ε=0   ADP ε=10⁻⁴   ADP ε=10⁻³   ADP ε=10⁻²   ADP ε=0.1   ADP ε=1.0   SA
Mean       206.2   63.7      62.9         63.0         61.2         17.5        6.4         65.1
Median     117.3   53.7      53.2         53.7         50.7         11.3        4.8         65.0
Std Dev    399.3   49.1      48.6         48.9         48.8         38.2        7.6         6.5

Table 4.1: ADP Runtimes at Rotamer Relaxation. Average running time comparison of dynamic programming (DP), adaptive dynamic programming (ADP), and simulated annealing (SA), in milliseconds, at the rotamer-relaxation task (fixed sequence side-chain placement) for the 15 surface residues of ubiquitin’s β-sheet.
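The speedups implied by Table 4.1 can be read off as ratios against the DP column. The sketch below only redoes that arithmetic on the tabulated means and medians; the helper name `speedup_over_dp` and the column labels are illustrative.

```python
# Mean and median running times in milliseconds, copied from Table 4.1.
columns = ["DP", "ADP eps=0", "ADP eps=1e-4", "ADP eps=1e-3",
           "ADP eps=1e-2", "ADP eps=0.1", "ADP eps=1.0", "SA"]
mean_ms = dict(zip(columns, [206.2, 63.7, 62.9, 63.0, 61.2, 17.5, 6.4, 65.1]))
median_ms = dict(zip(columns, [117.3, 53.7, 53.2, 53.7, 50.7, 11.3, 4.8, 65.0]))

def speedup_over_dp(times):
    """Ratio of the DP time to each other method's time (above 1 means faster than DP)."""
    return {name: times["DP"] / t for name, t in times.items() if name != "DP"}

mean_speedup = speedup_over_dp(mean_ms)
median_speedup = speedup_over_dp(median_ms)
for name in mean_speedup:
    print(f"{name:>13}: mean x{mean_speedup[name]:.1f}, median x{median_speedup[name]:.1f}")
```

On the means, ADP with ε = 1.0 is roughly 32 times faster than DP, and even ε = 0 is roughly 3 times faster. The median ratios are smaller, which is consistent with the large standard deviation of the DP times.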

For the redesign task, I artificially created a treewidth-3 interaction graph on the problem, pictured in Figure 4.9d. This interaction graph differs by a single edge from the graph in Figure 4.9c. The absence of this edge may decrease the quality of the design. I nonetheless include the task as it pushes the DP algorithm’s limits. (After I made the decision to drop this edge and published these observations, Loren Looger kindly ran dead-end elimination on exactly this problem instance and found that the answer produced by dynamic programming in the absence of this edge was the same as that produced with the complete graph, that is, the graph including both this edge and all other dropped edges that failed to meet the 0.2 kcal/mol magnitude threshold [Looger, personal communication].)

Figure 4.10: Speedup vs Error for ADP. Increasing ε to as high as 1.0 kcal/mol gives a theoretical error bound of ±32 kcal/mol, but in practice it largely preserves accuracy while greatly decreasing running time.

Each residue in the design problem had on average 680 rotamers to choose from. This broke down into about 57 base-rotamers per residue and 12 sub-rotamers per base-rotamer. The size of the state space was ∼10⁴². I measured the performance of both the standard and the adaptive dynamic programming algorithms on a dual 900 MHz 64-bit Itanium-2 with 10 GB of RAM. I compared the running time, the memory use, and the score produced by ADP against simulated annealing, measuring simulated annealing on the same 2 GHz Athlon described above. Because both DP and ADP required so much memory, they had to be run on a 64-bit machine. This state-of-the-art machine is slower than the commodity 32-bit machines available, a fact that a protein designer should take into consideration when choosing among optimization algorithms. Table 4.2 contains the results of this experiment.
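The quoted size of the redesign state space can be checked with a short calculation. The sketch assumes, purely for illustration, that each of the 15 residues has exactly the average of 680 rotamers; the actual per-residue counts vary. The helper name `log10_state_space` is hypothetical.

```python
import math

def log10_state_space(rotamers_per_residue):
    """log10 of the number of rotamer assignments (the product of per-residue counts)."""
    return sum(math.log10(n) for n in rotamers_per_residue)

# Assumption: all 15 residues have the average rotamer count of 680.
redesign_log10 = log10_state_space([680] * 15)
print(f"state space is about 10^{redesign_log10:.1f}")
```

Working in logarithms avoids forming the huge product directly and gives an exponent between 42 and 43, in line with the order of magnitude stated above.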

Measure             DP           ADP, ε = 0.1   ADP, ε = 1.0   SA
Run Time            15.99 hrs.   5.07 hrs.      1.52 hrs.      3.42 seconds
Memory Usage        3.7 GB       3.4 GB         1.5 GB         0.2 GB
Score (kcal/mol)    -42.5893     -42.5893       -42.5579       -42.5692
Error (kcal/mol)    n/a          0.0000         0.0314         0.0201

Table 4.2: Redesign Task Performance Comparison. A comparison of the running time and memory usage for dynamic programming and adaptive dynamic programming at the redesign task for the 15 surface-pointing residues on ubiquitin’s β-sheet.
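The error row of Table 4.2 is the difference between each method's score and the DP optimum, and the speedups follow from the run times. The sketch below redoes that arithmetic on the tabulated values only; the variable names are illustrative, and the simulated annealing time is left out of the ratios because it was measured on different hardware.

```python
# Scores in kcal/mol and run times in hours, copied from Table 4.2.
dp_score = -42.5893
scores = {"ADP eps=0.1": -42.5893, "ADP eps=1.0": -42.5579, "SA": -42.5692}
hours = {"DP": 15.99, "ADP eps=0.1": 5.07, "ADP eps=1.0": 1.52}

# Error: how far each score lies above the exact DP optimum.
errors = {name: round(score - dp_score, 4) for name, score in scores.items()}

# Speedup of ADP over DP; both were timed on the same Itanium-2 machine.
speedups = {name: hours["DP"] / t for name, t in hours.items() if name != "DP"}

for name, err in errors.items():
    print(f"{name:>11}: error {err:.4f} kcal/mol")
for name, s in speedups.items():
    print(f"{name:>11}: x{s:.1f} faster than DP")
```

The differences reproduce the three tabulated errors, which confirms that they belong to the two ADP columns and the SA column, with DP as the reference.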
