We briefly describe the protein design program in which we implemented our method, our extension of the algorithm of Le Grand and Merz, and the definition of SASApack scores.
8.2.1 Protein Design Algorithm
We performed all our design simulations with the Rosetta program (Kuhlman et al., 2003). Rosetta uses a Monte Carlo (MC) algorithm with simulated annealing to search sequence space. As in most programs developed for protein design, amino acid side chains are modeled in their most preferred conformations, called rotamers. Rosetta uses Dunbrack’s backbone-dependent rotamer library, with additional rotamers created by varying the angles $\chi_1$ and $\chi_2$ by one standard deviation in each direction from their preferred values (Dunbrack and Cohen, 1997). Starting with a random sequence, single amino acid substitutions or rotamer switches are evaluated in turn, and each is accepted if its energy change passes the Metropolis criterion (Metropolis et al., 1953). The standard Rosetta energy function has been described previously; the most important terms in the potential are a 12-6 Lennard-Jones potential, the Lazaridis-Karplus implicit solvation model, an explicit hydrogen-bonding term, and knowledge-based torsion potentials (Kuhlman et al., 2003; Lazaridis and Karplus, 1999; Kortemme et al., 2003).
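The Metropolis acceptance test used in such an MC search can be sketched as follows. This is a generic illustration, not Rosetta's code: the function name `metropolis_accept` and its parameters are hypothetical, and the annealing schedule that lowers `kT` over the course of a simulation is omitted.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Decide whether to accept a move that changes the energy by delta_e.

    Moves that do not raise the energy are always accepted; uphill moves
    are accepted with probability exp(-delta_e / kT).
    `rng` is any object with a random() method returning a float in [0, 1).
    """
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)
```

As `kT` falls during annealing, the acceptance probability of an uphill move with a fixed energy change shrinks toward zero, so late in the simulation most unfavorable substitutions are rejected.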
8.2.2 Extending Le Grand and Merz to Maintain SASA
The solvent-accessible surface (SAS) is the surface traced by the center of a spherical solvent molecule as it rolls across the molecular (van der Waals) surface of the solute molecule. Lee and Richards first introduced the construction of this surface by increasing the van der Waals radii of the solute atoms by the radius of the solvent (Lee and Richards, 1971). Shrake and Rupley introduced a numerical approximation to the solvent-accessible surface area (SASA) that takes dot samples on the surfaces of these enlarged spheres (Shrake and Rupley, 1973). Each dot contained within the enlarged sphere of any other atom is inaccessible to solvent; call such a dot covered. Each dot not contained within the enlarged sphere of any other atom is accessible to solvent; call such a dot exposed. In this approximation, each exposed dot represents a patch of the solvent-accessible surface.
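A minimal sketch of the dot-sampling approximation described above is given below. Several details are assumptions made for illustration only: the dots are placed with a golden-spiral lattice rather than any particular published dot set, the probe radius defaults to 1.4 Å (a common choice for water), and the function names `sphere_dots` and `shrake_rupley` are hypothetical. The brute-force neighbor loop is quadratic in the number of atoms and is meant only to make the definition of covered and exposed dots concrete.

```python
import math


def sphere_dots(n):
    """Return n roughly uniform points on the unit sphere (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dots = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n          # z runs from near +1 down to near -1
        r = math.sqrt(1.0 - z * z)             # radius of the circle at height z
        phi = i * golden_angle
        dots.append((r * math.cos(phi), r * math.sin(phi), z))
    return dots


def shrake_rupley(atoms, probe=1.4, n_dots=200):
    """Estimate per-atom SASA for atoms given as (x, y, z, vdw_radius) tuples."""
    unit = sphere_dots(n_dots)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe                      # enlarged radius of atom i
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            covered = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                big_rj = rj + probe
                dist2 = (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2
                if dist2 < big_rj * big_rj:     # dot lies inside another enlarged sphere
                    covered = True
                    break
            if not covered:
                exposed += 1
        # each exposed dot stands for an equal patch of the enlarged sphere
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_dots)
    return areas
```

For a single isolated atom every dot is exposed, so the estimate equals the full area of the enlarged sphere; bringing a second atom close enough to overlap covers some dots and lowers the estimate.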
Le Grand and Merz gave a rapid technique for computing which dots on the surface of one sphere are contained within a nearby sphere (Le Grand and Merz, 1993). The technique uses the same discretization for every sphere, so it can precompute Boolean overlap masks that record which dots on a sphere are covered by an intersecting sphere of a given radius, orientation, and distance. To determine the set of exposed dots on a sphere, one finds its intersecting neighbors and ORs their overlap masks. This Boolean OR is a semigroup operation: it is associative but does not have an inverse. If one instead keeps a count of the number of residues whose spheres cover a dot, updating the count is a group operation (integer addition) that does have an inverse, and one can return to a previous state by decrementing a count that was previously incremented. Thus, our algorithm maintains these coverage counts for each dot; a dot with a coverage count of zero is exposed, and a dot with a coverage count of one or more is covered.
Figure 8.1: SAS Update Algorithm. Updating dot-coverage counts for a red atom after a rotamer substitution at the green residue. The substitution increases the coverage count for dot A and decreases it for dots B and C. This covers dot A and exposes dot B; the count indicates that dot C remains covered by another residue.
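The difference between ORing masks and keeping counts can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: masks are represented as Python integers with one bit per dot, each mask is taken to be the coverage contributed by one neighbor, and the names `exposed_from_masks` and `CoverageCounts` are hypothetical.

```python
def exposed_from_masks(n_dots, masks):
    """OR the overlap masks together; dots whose bit stays 0 are exposed."""
    covered = 0
    for mask in masks:
        covered |= mask
    return [d for d in range(n_dots) if not (covered >> d) & 1]


class CoverageCounts:
    """Per-dot coverage counts that support both adding and removing a mask."""

    def __init__(self, n_dots):
        self.counts = [0] * n_dots

    def add(self, mask):
        self._apply(mask, +1)

    def remove(self, mask):
        self._apply(mask, -1)

    def _apply(self, mask, step):
        # adjust the count of every dot whose bit is set in the mask
        for d in range(len(self.counts)):
            if (mask >> d) & 1:
                self.counts[d] += step

    def exposed(self):
        return [d for d, c in enumerate(self.counts) if c == 0]


# Two neighbors cover dots {0, 1} and {1, 2} of a 4-dot sphere.
mask_a, mask_b = 0b0011, 0b0110
counts = CoverageCounts(4)
counts.add(mask_a)
counts.add(mask_b)
counts.remove(mask_b)   # undo neighbor b without recomputing neighbor a
```

Once the two masks have been ORed into `0b0111`, there is no way to tell whether dot 1 was covered by one neighbor or both, so removing a neighbor would require recomputing the OR from scratch. With counts, `remove` leaves dot 1 at a count of one and returns dot 2 to zero.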
Consider a rotamer-substitution step in which a current structure is compared against an alternate structure created by substituting a current rotamer with an alternate rotamer at a single residue, as depicted in Figure 8.1. The SASA calculation must determine the dot-coverage counts for both 1) the set of spheres for the alternate rotamer and 2) the set of spheres for the rest of the protein that overlap the spheres from either the current or the alternate rotamer. The dots on spheres that are not in these two sets are unaffected by the rotamer substitution. The spheres in the first set are absent from the current structure; for these spheres we perform Le Grand and Merz sphere-overlap computations. The spheres in the second set are present in the current structure; we decrement the coverage counts for the dots covered by the current rotamer, place the alternate rotamer onto the structure, perform sphere-overlap computations, and increment the coverage counts for those dots covered by the alternate rotamer.
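The update for a sphere in the second set can be sketched with the scenario of Figure 8.1. In this illustration the sets of dots covered by each rotamer are supplied directly instead of being computed from sphere-overlap masks, and the function name `substitute_rotamer` and the starting counts are hypothetical.

```python
def substitute_rotamer(counts, current_cover, alternate_cover):
    """Update dot-coverage counts in place for one rotamer substitution.

    counts          -- dict mapping dot label -> number of covering residues
    current_cover   -- set of dots covered by the current rotamer
    alternate_cover -- set of dots covered by the alternate rotamer
    Returns (newly_exposed, newly_covered) as two sets of dot labels.
    """
    touched = current_cover | alternate_cover
    covered_before = {d for d in touched if counts[d] > 0}
    for dot in current_cover:        # remove the current rotamer
        counts[dot] -= 1
    for dot in alternate_cover:      # place the alternate rotamer
        counts[dot] += 1
    covered_after = {d for d in touched if counts[d] > 0}
    return covered_before - covered_after, covered_after - covered_before


# Dots on the red atom: A is exposed, B is covered only by the current
# rotamer, and C is covered by the current rotamer and one other residue.
counts = {"A": 0, "B": 1, "C": 2}
newly_exposed, newly_covered = substitute_rotamer(
    counts, current_cover={"B", "C"}, alternate_cover={"A"}
)
```

After the call, dot A is covered, dot B is exposed, and dot C keeps a count of one, matching the figure. If the MC optimizer rejects the substitution, calling the function again with the two cover sets swapped restores the original counts.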
8.2.3 SAS-Update Procrastination
A fruitful algorithmic optimization is to avoid SAS-update computations whenever possible. At low temperature ($kT < 0.5$) in our Monte Carlo (MC) optimization, we avoid SAS computations for collision-inducing rotamer substitutions. The MC optimizer decides whether to commit a considered rotamer substitution based on the change in energy $\Delta E = \Delta E_{PD} + \Delta E_{SASA}$. Here, $\Delta E_{PD}$ represents the change in the pairwise-decomposable portions of Rosetta’s energy function, and $\Delta E_{SASA}$ represents the change in the non-pairwise-decomposable SASA term. If $\Delta E_{PD}$ exceeds some positive threshold $t$, then we procrastinate the SAS-update computations and approximate $\Delta E_{SASA} = 0$. The MC optimizer then decides whether to accept the substitution on the basis of $\Delta E_{PD}$ alone. If it accepts the substitution, we perform the procrastinated dot-coverage count update. At low temperatures, 75% of rotamer substitutions are rejected without having to compute $\Delta E_{SASA}$, giving a 4-fold speedup in design simulations on complete proteins (Table 8.1).
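The control flow of this procrastination step can be sketched as follows. The sketch is an assumption-laden illustration, not Rosetta code: `consider_substitution` and `sasa_update` are hypothetical names, `sasa_update` stands in for whatever routine performs the dot-coverage update and returns the SASA energy change, and the threshold value is left to the caller because the text does not state it. Only the $kT < 0.5$ cutoff comes from the text.

```python
import math
import random


def consider_substitution(delta_e_pd, sasa_update, kT, threshold, rng=random):
    """Decide on one rotamer substitution, deferring SASA work when possible.

    delta_e_pd  -- change in the pairwise-decomposable energy
    sasa_update -- callable that performs the dot-coverage update and
                   returns the change in the SASA energy term
    Returns (accepted, procrastinated).
    """
    procrastinate = kT < 0.5 and delta_e_pd > threshold
    if procrastinate:
        delta_e = delta_e_pd                   # approximate the SASA change as zero
    else:
        delta_e = delta_e_pd + sasa_update()   # full energy change
    accepted = delta_e <= 0.0 or rng.random() < math.exp(-delta_e / kT)
    if accepted and procrastinate:
        sasa_update()                          # catch up on the deferred count update
    return accepted, procrastinate
```

In the common low-temperature case, a colliding substitution has a large positive `delta_e_pd`, is rejected on that basis alone, and `sasa_update` is never called.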
8.2.4 Shared Atoms in Rotamers
A further algorithmic optimization is to reuse all possible sphere-overlap computations. When replacing the current rotamer with an alternate rotamer, we count the number of shared atoms between the two and compute sphere overlaps only for those atoms in the alternate rotamer that are not contained in the current rotamer. In the best scenario, replacing a current tyrosine rotamer with an alternate tyrosine rotamer that differs only at $\chi_3$, our algorithm computes sphere overlaps for the single terminal hydroxyl hydrogen only. A tree (or trie) data structure can be used to find the shared atoms in time proportional to their number (Leaver-Fay et al., 2005b). Shared-atom optimization produces a 1.87-fold speedup in design simulations on complete proteins (Table 8.1).
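The idea of skipping shared atoms can be sketched with a simple prefix comparison. This is a simplification and not the trie of the cited work: it assumes each rotamer is stored as a list of (atom name, coordinates) pairs ordered outward from the backbone, so that two rotamers of the same amino acid share a common prefix up to the first atom whose position differs. The function names and the toy coordinates are made up for illustration.

```python
def shared_prefix_length(current, alternate):
    """Count leading atoms that have the same name and coordinates in both rotamers."""
    n = 0
    for atom_a, atom_b in zip(current, alternate):
        if atom_a != atom_b:
            break
        n += 1
    return n


def atoms_needing_overlap(current, alternate):
    """Atoms of the alternate rotamer whose sphere overlaps must be recomputed."""
    return alternate[shared_prefix_length(current, alternate):]


# Toy tyrosine-like rotamers (heavy atoms abbreviated, coordinates invented)
# that differ only in the position of the hydroxyl hydrogen HH.
heavy = [("CB", (0.0, 0.0, 0.0)), ("CG", (1.5, 0.0, 0.0)),
         ("CZ", (4.3, 0.0, 0.0)), ("OH", (5.7, 0.0, 0.0))]
current_rotamer = heavy + [("HH", (6.0, 0.9, 0.0))]
alternate_rotamer = heavy + [("HH", (6.0, -0.9, 0.0))]
todo = atoms_needing_overlap(current_rotamer, alternate_rotamer)
```

The comparison stops at the first differing atom, so its cost is proportional to the number of shared atoms, and only the hydroxyl hydrogen is left in `todo` for sphere-overlap computation.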
Together, the SAS-update procrastination and shared-atom optimizations produce a 6.92× speedup. Since this is slightly less than the product of the individual speedups, 4 × 1.87 ≈ 7.5, the two optimizations are not completely separable.