1.3 Computational protein design
1.3.3 Improving designs
Natural, well-folded proteins typically have native states that are significantly, but only modestly, more stable than their unfolded states (∼3-10 kcal/mol). Because the stabilization resulting from the burial of a hydrophobic group or the formation of a hydrogen bond can be on the order of ∼1 kcal/mol [86], even small improvements, for example in packing efficiency [87], can substantially improve protein design outcomes (Figure 1.3a). As described below, many tools and methods have been developed that may improve designs and increase the likelihood of obtaining a soluble and well-folded protein; most are native-centric, but some also consider the transition and denatured states.
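To make the scale of these numbers concrete, the following minimal sketch converts an unfolding free energy into a folded population. It assumes a simple two-state (folded/unfolded) equilibrium at 25 °C, which is an illustrative simplification and is not taken from the cited works; the function names are hypothetical. Under this assumption, each additional 1 kcal/mol of stability multiplies the folding equilibrium constant by roughly five.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_folded(dg_unfold, temperature=298.15):
    """Folded population of a two-state protein (assumed model).

    dg_unfold: free energy of unfolding in kcal/mol (positive = stable).
    """
    x = dg_unfold / (R_KCAL * temperature)
    return 1.0 / (1.0 + math.exp(-x))


def unfolded_ratio(dg_before, dg_after, temperature=298.15):
    """Factor by which the unfolded population shrinks after stabilization."""
    before = 1.0 - fraction_folded(dg_before, temperature)
    after = 1.0 - fraction_folded(dg_after, temperature)
    return before / after


if __name__ == "__main__":
    for dg in (1.0, 3.0, 4.0, 10.0):
        print(f"dG = {dg:4.1f} kcal/mol -> folded fraction {fraction_folded(dg):.6f}")
    ratio = unfolded_ratio(3.0, 4.0)
    print(f"unfolded population shrinks {ratio:.1f}-fold from 3 to 4 kcal/mol")
```

For a marginally stable design, a gain of this size may therefore noticeably reduce the population of unfolded, and potentially aggregation-prone, molecules.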
Using native-centric energy functions
Considerable progress both in whole sequence design and in subsequent optimization has been achieved using atomistic energy functions coupled with extensive side-chain/backbone configurational searching. In a relatively computationally tractable case, the backbone is kept fixed, limiting the search to side-chain degrees of freedom only. While successful designs have been realized using this approach [17, 44], the resulting sequences are frequently very similar to natural ones. Often, the ability of sequence design algorithms to recapitulate natural sequence statistics for a given topology is used as a scoring metric [88]. Such natural sequence statistics can also be used directly as components of energy functions in design. For instance, Mitra et al. combined sequence information with the empirical FoldX energy function [89] to computationally redesign 243 proteins; 5 were tested experimentally, of which ∼3 appeared by NMR to be quite well-folded [90]. Protein design has also been accomplished using only statistical energy functions, as in the recent redesign of four natural targets [91]. Statistical energy functions, empirical or physical energy functions (like FoldX), and machine learning approaches have been applied as tools for stabilizing protein native structure, with mixed results [92, 93]; it is widely thought that more accurate force fields are needed to improve design outcomes [34, 94, 95]. Nevertheless, the abundance of computational tools for improving protein stability offers considerable scope for optimizing native structure (Figure 1.3b).
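As a rough illustration of how natural sequence statistics might enter an energy function, the sketch below derives a position-specific statistical term from a gap-free alignment and adds it, with a weight, to a physical energy supplied by the caller. This is a generic toy scheme and not the method of Mitra et al. or of any cited work; the function names, the pseudocount, and the weight are assumptions made for illustration, and the physical energy is treated as an opaque number (for example, one computed by an external program).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def negative_log_frequencies(alignment, pseudocount=1.0):
    """Per-column -ln(frequency) of each amino acid, smoothed by pseudocounts."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Count only standard residues; other characters are ignored.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: -math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS}
        )
    return profile


def statistical_energy(sequence, profile):
    """Sum of per-position terms; lower means more 'natural-like'."""
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile lengths differ")
    return sum(profile[i][aa] for i, aa in enumerate(sequence))


def combined_score(physical_energy, sequence, profile, weight=1.0):
    """Physical energy plus a weighted statistical term (hypothetical scheme)."""
    return physical_energy + weight * statistical_energy(sequence, profile)


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "ACD"]  # made-up example data
    prof = negative_log_frequencies(toy_alignment)
    print(combined_score(-10.0, "ACD", prof, weight=0.5))
```

The weight controls how strongly candidate sequences are pulled toward natural sequence statistics; setting it to zero recovers the purely physical score.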
The observation that fixed-backbone design approaches, as mentioned above, typically produce sequences with appreciable identity to natural proteins of the same topology suggests that the range of sequence innovation accessible to them is limited. Varying the degrees of freedom of both the backbone and side chains, on the other hand, may greatly expand the range of designable targets, as even small (1-2 Å) perturbations of the backbone have enabled the exploration of de novo sequences with low identity to natural counterparts or templates [23, 38, 46]. Flexible backbone design can nevertheless recover natural sequences, and can also recapitulate the covariation between amino acids at different positions [96]. The benefits of flexible backbone design, however, come at the cost of a more difficult optimization problem. This cost can be reduced by taking advantage of modularity and symmetry, as described for the largest completely de novo globular protein design to date (∼200 amino acids, see “Globular folds with internal symmetry”) [21] (Figure 1.2d). This design was also aided by previously developed rational rules for the design of backbone templates compatible with well-funnelled energy landscapes, which demonstrated impressive success for various αβ topologies [21, 39] (Figure 1.2h). Collectively, there has been much progress in applying native-centric approaches both for optimization of existing native structure and for de novo design.
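Sequence identity to a natural counterpart, and the related notion of native sequence recovery, can be computed in a few lines. The sketch below assumes the two sequences are already aligned, of equal length, and gap-free; real comparisons would normally start from a proper alignment. The function name and the example sequences are hypothetical.

```python
def sequence_identity(designed, reference):
    """Fraction of positions at which two pre-aligned sequences agree."""
    if len(designed) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not designed:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(designed, reference) if a == b)
    return matches / len(designed)


if __name__ == "__main__":
    native = "MKTAYIAKQR"    # made-up example sequence
    designed = "MKSAYLAKQR"  # made-up design differing at two positions
    print(f"identity: {sequence_identity(designed, native):.2f}")
```

A high value for a fixed-backbone design would be consistent with the limited sequence innovation noted above, whereas designs on perturbed backbones may score much lower.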
Beyond native-centric design, using coarse-grained simulations
Coarse-grained simulations are increasingly being used to move beyond the native state and analyze the energy landscape of designed proteins. These simulations have indicated less cooperative folding of the de novo designed Top7 topology compared to natural proteins, which may be a consequence of non-native interactions [97] and/or an imbalance of local and long-range interactions [98]. Similarly, other designed proteins may suffer from complex/non-cooperative folding kinetics [97, 99]. Coarse-grained simulations showed that designed surface electrostatic interactions in various proteins may reduce frustration and improve both equilibrium stability and folding kinetics [100] (Figure 1.3c). In contrast, non-native electrostatic interactions markedly slowed the folding of another designed protein in all-atom molecular dynamics (MD) simulations [101]. In general, coarse-grained simulations have shown that the folding energy landscape is modulated in complex and quite often detrimental ways by functional features [102]; yet, foldability can be critical for achieving function [54]. In addition, slower unfolding kinetics can protect proteins from degradation, modification, and aggregation, even if their thermodynamic stability is low; simulations illuminated how kinetic stability may be predicted and enhanced by increasing the proportion of contacts between residues distant in sequence [70] (Figure 1.3d). Thus, simulations are providing valuable insights into mechanistic details of folding, and so have the potential to be a valuable tool for improving future designs and for assessing the impact of designing functional features into idealized but functionless scaffolds [21, 39, 46].
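The idea of "contacts between residues distant in sequence" can be quantified in a simple way. The sketch below computes the fraction of native contacts whose residues are separated by at least a chosen number of positions. It is not the specific measure used in [70]; the contact list is assumed to be precomputed from a structure, and the separation threshold of 12 residues is an arbitrary illustrative choice.

```python
def long_range_contact_fraction(contacts, min_separation=12):
    """Fraction of contacts (i, j) with |i - j| >= min_separation.

    contacts: iterable of residue-index pairs, assumed precomputed
    (for example from a distance cutoff applied to a structure).
    """
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    long_range = sum(1 for i, j in contacts if abs(i - j) >= min_separation)
    return long_range / len(contacts)


if __name__ == "__main__":
    # Made-up contact list: two local and two long-range contacts.
    toy_contacts = [(1, 3), (10, 12), (2, 30), (5, 40)]
    print(long_range_contact_fraction(toy_contacts))
```

Comparing this fraction between an initial and a revised design gives a crude indication of whether the topology has shifted toward more long-range contacts.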
Avoiding unintended oligomerization
A critical area for further development is controlling the population of the most stable folded protein structure relative to alternative conformations, e.g. oligomers, aggregates, or functional states. Many current designs fail to express as a soluble protein or in the intended monomeric form, often forming oligomeric species or specific domain-swapped dimers [22, 23, 39, 45, 47]. To address the common and specific problem of domain-swapped dimers, which may arise from highly native-centric design approaches, Mou et al. used atomistic MD to redesign a domain-swapped dimer into the intended monomer [103] (Figure 1.3e). To combat the tendency of force fields used in design to generate hydrophobic patches on the protein surface, which lead to unwanted oligomerization, force fields have been re-parameterized to penalize such patches; this has improved the solubility of designs [104] (Figure 1.3f). Similarly, the design of high net-charge surfaces has resulted in increased protein solubility [105]. Aberrant oligomerization is a prevalent problem in design, and these approaches may be widely applicable moving forward.
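Two simple sequence-level quantities related to the strategies above can be sketched as follows: an approximate net charge, and the fraction of surface positions occupied by hydrophobic residues. Both are crude proxies and not the penalty terms or design criteria of [104] or [105]. The net charge counts only Lys/Arg and Asp/Glu (ignoring His, the termini, and pKa shifts), the hydrophobic residue set is one of several possible definitions, and the surface mask is a hypothetical input that would have to come from a structure-based exposure calculation.

```python
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AVILMFWY")  # assumed definition; conventions vary


def net_charge(sequence):
    """Approximate net charge near neutral pH from K/R and D/E counts."""
    positive = sum(1 for aa in sequence if aa in POSITIVE)
    negative = sum(1 for aa in sequence if aa in NEGATIVE)
    return positive - negative


def surface_hydrophobic_fraction(sequence, exposed):
    """Fraction of exposed positions that carry a hydrophobic residue.

    exposed: one boolean per residue marking surface positions
    (hypothetical input, e.g. from a solvent-accessibility calculation).
    """
    if len(sequence) != len(exposed):
        raise ValueError("sequence and exposure mask lengths differ")
    surface = [aa for aa, is_exposed in zip(sequence, exposed) if is_exposed]
    if not surface:
        raise ValueError("no exposed positions in mask")
    return sum(1 for aa in surface if aa in HYDROPHOBIC) / len(surface)


if __name__ == "__main__":
    seq = "LKAVDE"  # made-up example sequence
    mask = [True, True, False, False, True, True]
    print(net_charge(seq), surface_hydrophobic_fraction(seq, mask))
```

Note that a surface fraction does not capture spatial clustering; detecting actual patches would require three-dimensional neighbour information.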
Figure 1.3: Optimizing designs. Energy profiles for initial (orange lines) and optimized (blue lines) protein designs; in each panel the folded state is at the right (and shown as structures) relative to the denatured state (D) at the left, separated by the transition state (‡). Features of the initial designed or natural proteins (orange) can be optimized (blue parts of structures) by stabilizing the native state using CPD to (a) improve packing efficiency and eliminate voids (residues shown in space-filling representation) [87], or (b) more generally improve properties such as polar contacts, sterics, and backbone angles, among others, to improve energetics or reduce unwanted flexibility. Coarse-grained simulations of the entire energy landscape may also be used to (c) optimize electrostatic interactions [100] or (d) modulate topological complexity to control kinetic stability by tuning the energy barrier for unfolding [70]. Atomistic molecular dynamics can be employed to (e) eliminate unwanted oligomerization caused by local opening/domain swapping [103]. (f) Aggregation caused by exposed hydrophobic patches (orange) on designed surfaces may be eliminated (blue) by adding penalty terms to existing force fields [104].