
The next subsection describes the details of the class hierarchy as it exists in Rosetta today. The interaction graph class hierarchy contains six groups of classes providing three sets of concrete classes for use in protein design (Figure 5.3). Two of these concrete classes I describe in greater detail in Chapters 7 and 8. I described the concrete

Between the InteractionGraphBase and the PDInteractionGraph classes lie the PrecomputedPairEnergiesInteractionGraph classes. These classes define an interface between the rotamer-pair-energy evaluation code and the graphs that store the energies. The non-pairwise decomposable energy function I describe in Chapter 8 still includes the pairwise decomposable portion of Rosetta’s energy function. The graph I use to represent the non-pairwise decomposable function stores the pairwise decomposable energies in edges in the same way as the PDInteractionGraph class. Both graphs inherit from the PrecomputedPairEnergiesInteractionGraph classes.
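The layering described above can be sketched roughly as follows. This is a minimal illustration, not Rosetta's actual code: the three class names come from the text, but every method name, signature, and storage choice (set_pair_energy, get_pair_energy, fill_pair_energies, the std::map of dense tables) is a hypothetical stand-in chosen to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the classes named in the text.
class InteractionGraphBase {
public:
    explicit InteractionGraphBase(const std::vector<int>& states_per_node)
        : states_per_node_(states_per_node) {}
    virtual ~InteractionGraphBase() = default;
    int num_states(int node) const { return states_per_node_.at(node); }
private:
    std::vector<int> states_per_node_;
};

// Abstract layer: the only thing the energy-evaluation code needs to know.
class PrecomputedPairEnergiesInteractionGraph : public InteractionGraphBase {
public:
    explicit PrecomputedPairEnergiesInteractionGraph(const std::vector<int>& s)
        : InteractionGraphBase(s) {}
    virtual void set_pair_energy(int node1, int node2,
                                 int state1, int state2, float energy) = 0;
    virtual float get_pair_energy(int node1, int node2,
                                  int state1, int state2) const = 0;
};

// Concrete graph: one dense table per edge (assumed layout, for illustration).
class PDInteractionGraph : public PrecomputedPairEnergiesInteractionGraph {
public:
    explicit PDInteractionGraph(const std::vector<int>& s)
        : PrecomputedPairEnergiesInteractionGraph(s) {}

    void set_pair_energy(int node1, int node2,
                         int state1, int state2, float energy) override {
        assert(node1 < node2);  // edges are keyed with the lower node first
        std::vector<float>& table = edges_[std::make_pair(node1, node2)];
        if (table.empty()) {
            table.assign(static_cast<std::size_t>(num_states(node1)) *
                         static_cast<std::size_t>(num_states(node2)), 0.0f);
        }
        table[state1 * num_states(node2) + state2] = energy;
    }

    float get_pair_energy(int node1, int node2,
                          int state1, int state2) const override {
        auto edge = edges_.find(std::make_pair(node1, node2));
        if (edge == edges_.end()) return 0.0f;  // no edge: no interaction
        return edge->second[state1 * num_states(node2) + state2];
    }

private:
    std::map<std::pair<int, int>, std::vector<float>> edges_;
};

// Energy-evaluation code written against the abstract interface only.
void fill_pair_energies(PrecomputedPairEnergiesInteractionGraph& graph,
                        int node1, int node2, float (*pair_energy)(int, int)) {
    for (int s1 = 0; s1 < graph.num_states(node1); ++s1) {
        for (int s2 = 0; s2 < graph.num_states(node2); ++s2) {
            graph.set_pair_energy(node1, node2, s1, s2, pair_energy(s1, s2));
        }
    }
}
```

Because fill_pair_energies takes a reference to the abstract class, the same evaluation routine could fill any graph that derives from it, which is the point of placing the interface between the base class and the concrete graphs.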

The AdditionalBackgroundNodesInteractionGraph classes provide general functionality for the incorporation of non-pairwise decomposable energy functions into the packer. This group of classes adds two new classes to the parallel inheritance hierarchy: the SecondClassNode and the SecondClassEdge classes. The SecondClassNode class defines the set of additional scoring functions that contribute to a non-pairwise decomposable energy function. These nodes are second class in that they do not carry state spaces and remain invisible to the annealer. The SecondClassEdge class defines relationships between the second class nodes (the non-pairwise decomposable functions) and the first class nodes (those with state spaces) that affect those functions.

The FirstClassNode adds to the NodeBase class an edge list data member (and an edge vector data member) to maintain the second class edges that connect first class nodes to second class nodes. Like the NodeBase and EdgeBase classes, the FirstClassNode, SecondClassNode, and SecondClassEdge classes maintain all of their data privately and provide read access to derived classes through protected, inlined methods. The classes used to incorporate the non-pairwise decomposable energy function described in Chapter 8 derive from the AdditionalBackgroundNodesInteractionGraph and its associated classes.
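The relationship between first class nodes, second class nodes, and second class edges might look something like the sketch below. Only the three class names come from the text; the methods (assign_state, notify, first_class_state_changed, energy) and the toy CrowdingPenaltyNode are hypothetical, and the NodeBase parent of FirstClassNode is omitted for brevity. The sketch keeps the convention described above: data members are private, and derived classes read them through protected inline accessors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A second class node has no state space of its own; it only reacts
// to state changes on the first class nodes it is connected to.
class SecondClassNode {
public:
    virtual ~SecondClassNode() = default;
    virtual void first_class_state_changed(int old_state, int new_state) = 0;
    virtual double energy() const = 0;
};

// A second class edge links one first class node to one second class node.
class SecondClassEdge {
public:
    explicit SecondClassEdge(SecondClassNode* target) : target_(target) {
        assert(target_ != nullptr);
    }
    void notify(int old_state, int new_state) const {
        target_->first_class_state_changed(old_state, new_state);
    }
private:
    SecondClassNode* target_;
};

// A first class node carries a state space and is visible to the annealer.
class FirstClassNode {
public:
    explicit FirstClassNode(int num_states) : num_states_(num_states) {}
    void add_second_class_edge(const SecondClassEdge& edge) {
        second_class_edges_.push_back(edge);
    }
    void assign_state(int new_state) {
        assert(new_state >= 0 && new_state < num_states_);
        for (const SecondClassEdge& edge : second_class_edges_) {
            edge.notify(current_state_, new_state);
        }
        current_state_ = new_state;
    }
protected:
    // Read-only access for derived classes; the data itself stays private.
    int current_state() const { return current_state_; }
    std::size_t num_second_class_edges() const {
        return second_class_edges_.size();
    }
private:
    int num_states_;
    int current_state_ = 0;
    std::vector<SecondClassEdge> second_class_edges_;
};

// Toy scoring term (an assumption for illustration): a penalty once two
// or more connected first class nodes leave state 0.
class CrowdingPenaltyNode : public SecondClassNode {
public:
    void first_class_state_changed(int old_state, int new_state) override {
        if (old_state != 0) --occupied_;
        if (new_state != 0) ++occupied_;
    }
    double energy() const override { return occupied_ >= 2 ? 1.0 : 0.0; }
private:
    int occupied_ = 0;
};
```

In this sketch the annealer would only ever call assign_state on first class nodes; the second class node is updated as a side effect through the edges and is never chosen for a state change itself.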

Figure 5.3: Interaction Graph Class Diagram. The six sets of interaction graph classes, composed of three concrete sets (the FlexBB graph, the PD graph, and the SASA graph) and three abstract sets. With Jenny Hu, I have refactored the simulated annealing code into the SimAnnealerBase class using the strategy design pattern to facilitate later incorporation of variants on simulated annealing. The FixBBSimAnnealer interfaces with the interaction graph as described in this chapter; I describe the interface between the FlexBBInteractionGraph and the FlexBBSimAnnealer in Chapter 7. All three concrete edge classes use the AminoAcidNeighborSparseMatrix class to store the pairwise decomposable portions of the energy function; memory is always an issue in protein design. Refactoring the efficient sparse matrix representation into its own class makes it easy for new interaction graphs to take advantage of the amino-acid-neighbor memory-saving technique. Image generated with VirtualParadigm.

Chapter 6

Fast Rotamer-Pair-Energy Calculation

This chapter¹ describes a technique for rapid rotamer-pair-energy computation between pairs of residues. The rotamer-pair-energy-computation phase dominates the running time in a typical call to the packer; I have sped up the computations in this phase by a factor of ∼3.5×.

6.1 Introduction

Designers divide the side-chain-placement problem into two phases. In phase 1, they precompute all possible rotamer-pair interaction energies for their rotamer library, and in phase 2, they search for the (globally) optimal side-chain placement. Significant work has gone into exact algorithms for the side-chain-placement problem (Desmet et al., 1992; Goldstein, 1994; Looger and Hellinga, 2001; Gordon and Mayo, 1999; Leaver-Fay et al., 2005a). Still, the problem is NP-Complete (Pierce and Winfree, 2002), and many researchers choose fast stochastic optimization techniques (Bradley et al., 2003; Dahiyat and Mayo, 1997b; Holm and Sander, 1992; Saven and Wolynes, 1997; Desjarlais and Handel, 1995). The rotamer-pair-energy computation of phase 1 can be a significant fraction of the running time for both exact and stochastic techniques, and usually dominates the running time of stochastic techniques.

¹ Portions of this chapter were previously published in the Workshop on Algorithms in Bioinformatics (WABI) 2005, held in Mallorca, Spain. This work represents a collaboration with Brian Kuhlman and Jack Snoeyink.

The interaction energy between two rotamers, A and B, is the sum of the atom/atom interaction energies over all atoms of A with all atoms of B. When a pair of rotamers on the same residue share torsional angles, they share atoms. Repeated atoms imply repeated atom/atom energy evaluations when computing all rotamer-pair energies.
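To make the redundancy concrete, here is a minimal sketch of the straightforward all-pairs computation. The Atom and Rotamer types, the function names, and in particular the inverse-square toy_atom_pair_energy are assumptions made for illustration; the toy energy is a placeholder and bears no relation to Rosetta's energy function. The evaluation counter is there only to show how often atom/atom terms are computed.

```cpp
#include <cassert>
#include <vector>

struct Atom {
    double x, y, z;
};
using Rotamer = std::vector<Atom>;  // one rotamer = its list of atoms

// Placeholder atom/atom term (hypothetical): inverse squared distance.
double toy_atom_pair_energy(const Atom& a, const Atom& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    const double dz = a.z - b.z;
    const double d2 = dx * dx + dy * dy + dz * dz;
    assert(d2 > 0.0);  // atoms on different residues should not coincide
    return 1.0 / d2;
}

// Energy of rotamer A with rotamer B: sum over all atom pairs.
double rotamer_pair_energy(const Rotamer& A, const Rotamer& B,
                           long& evaluations) {
    double total = 0.0;
    for (const Atom& a : A) {
        for (const Atom& b : B) {
            total += toy_atom_pair_energy(a, b);
            ++evaluations;
        }
    }
    return total;
}

// All rotamer-pair energies between two residues, computed independently.
std::vector<std::vector<double>> all_pair_energies(
        const std::vector<Rotamer>& residue1,
        const std::vector<Rotamer>& residue2,
        long& evaluations) {
    std::vector<std::vector<double>> table;
    for (const Rotamer& A : residue1) {
        std::vector<double> row;
        for (const Rotamer& B : residue2) {
            row.push_back(rotamer_pair_energy(A, B, evaluations));
        }
        table.push_back(row);
    }
    return table;
}
```

As a small example, suppose one residue has two three-atom rotamers that share their first two atoms, and the other residue has a single two-atom rotamer. The loop above performs 2 × 1 × 3 × 2 = 12 atom/atom evaluations, although only 4 × 2 = 8 distinct atom pairs exist.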

The obvious way to avoid repeating atom/atom energy computations is to store in a table the result of atom/atom energy computations for unique atom pairs. When a unique atom pair is encountered for the first time, calculate the pair’s interaction energy and store it. When a unique atom pair is encountered any subsequent time, simply look up the old result. However, with a moderately large rotamer library of 2K rotamers, which we use as our running example, a single residue can generate 10K unique atoms. A unique-atom by unique-atom table with 10K × 10K entries would occupy 400 MB (at 4 bytes per entry). This table does not fit in a processor’s cache (512 KB). Although storing energies avoids repeated computation, retrieving the table entries incurs cache misses, eroding any savings in running time.
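The table-based approach and its memory cost can be sketched as follows. This is an illustrative sketch under stated assumptions: entries are 4-byte floats (which is what the 400 MB figure implies for 10K × 10K entries), megabytes are decimal, and the names AtomPairEnergyTable, lookup_or_compute, and table_bytes are hypothetical. A NaN sentinel marks entries that have not been computed yet.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Size in bytes of a dense n1-by-n2 table.
constexpr std::size_t table_bytes(std::size_t n1, std::size_t n2,
                                  std::size_t bytes_per_entry) {
    return n1 * n2 * bytes_per_entry;
}

// 10K x 10K entries at 4 bytes each: 4 * 10^8 bytes = 400 MB.
static_assert(table_bytes(10000, 10000, 4) == 400000000,
              "10K x 10K table of 4-byte entries is 400 MB");

// Lazily filled table of atom/atom energies for unique atom pairs.
class AtomPairEnergyTable {
public:
    AtomPairEnergyTable(std::size_t n1, std::size_t n2)
        : n1_(n1), n2_(n2),
          energies_(n1 * n2, std::numeric_limits<float>::quiet_NaN()) {}

    // Compute on first encounter, look up on every later encounter.
    template <typename EnergyFn>
    float lookup_or_compute(std::size_t i, std::size_t j, EnergyFn compute) {
        assert(i < n1_ && j < n2_);
        float& slot = energies_[i * n2_ + j];
        if (std::isnan(slot)) {
            slot = compute(i, j);
            ++computed_;
        }
        return slot;
    }

    std::size_t computed() const { return computed_; }

private:
    std::size_t n1_;
    std::size_t n2_;
    std::vector<float> energies_;
    std::size_t computed_ = 0;
};
```

At 400 MB the table is roughly 800 times larger than a 512 KB cache, so most lookups in a table of this shape would be expected to miss the cache, which is the cost the paragraph above describes.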

We use a trie to represent all the rotamers on a single residue. With a pair of these “rotamer tries” we can rapidly compute the rotamer-pair energies, while reducing our memory usage. We have implemented our algorithm within the Rosetta molecular modeling software (Bradley et al., 2003). Because our memory use is minimal, and because we reuse atom/atom energy computations, our algorithm runs nearly 4 times faster than Rosetta’s existing method.
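As a rough illustration of the data structure only, the sketch below builds a trie in which each rotamer is a root-to-node path of atoms, so atoms shared by several rotamers are stored once. It does not implement the trie-versus-trie energy computation. The class and method names are hypothetical, the atom ordering is assumed to be fixed per residue, and atoms are treated as shared only when their type and coordinates match exactly, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Atom {
    int type;
    double x, y, z;
};

// Simplifying assumption: shared atoms have bit-identical coordinates.
inline bool same_atom(const Atom& a, const Atom& b) {
    return a.type == b.type && a.x == b.x && a.y == b.y && a.z == b.z;
}

class RotamerTrie {
public:
    RotamerTrie() {
        // Node 0 is a sentinel root that holds no real atom.
        nodes_.push_back(Node{Atom{-1, 0.0, 0.0, 0.0}, {}, false});
    }

    // Insert one rotamer as a path of atoms, reusing any shared prefix.
    void insert(const std::vector<Atom>& rotamer) {
        assert(!rotamer.empty());
        std::size_t current = 0;
        for (const Atom& atom : rotamer) {
            std::size_t next = find_child(current, atom);
            if (next == kNone) {
                next = nodes_.size();
                nodes_.push_back(Node{atom, {}, false});
                nodes_[current].children.push_back(next);
            }
            current = next;
        }
        nodes_[current].ends_rotamer = true;
        ++num_rotamers_;
    }

    std::size_t num_rotamers() const { return num_rotamers_; }
    std::size_t num_unique_atoms() const { return nodes_.size() - 1; }

private:
    struct Node {
        Atom atom;
        std::vector<std::size_t> children;  // indices into nodes_
        bool ends_rotamer;
    };

    static constexpr std::size_t kNone = static_cast<std::size_t>(-1);

    std::size_t find_child(std::size_t parent, const Atom& atom) const {
        for (std::size_t child : nodes_[parent].children) {
            if (same_atom(nodes_[child].atom, atom)) return child;
        }
        return kNone;
    }

    std::vector<Node> nodes_;
    std::size_t num_rotamers_ = 0;
};
```

Inserting two three-atom rotamers that share their first two atoms yields a trie with four unique atoms instead of six, which hints at why the representation is compact when many rotamers share torsional angles.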