The scoring function in Rosetta contains a combination of terms that consider van der Waals interactions, solvation, hydrogen bonding, electrostatics, and sterics (Figure 2.1). Some of the terms are physically-based, such as the van der Waals energy, while the remaining terms are knowledge-based, derived from statistics gathered from the Protein Data Bank (PDB) (1). The expanded form of each energy term can be found in the Appendix. In a design run, all of the energy terms are evaluated for every candidate sequence and their sum becomes the final score for that sequence.
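$$
\begin{aligned}
E_{\mathrm{protein}} ={} & W_{\mathrm{lj\,atr}}E_{\mathrm{lj\,atr}} + W_{\mathrm{lj\,rep}}E_{\mathrm{lj\,rep}} + W_{\mathrm{hbond}}E_{\mathrm{hbond}} + W_{\mathrm{solvation}}E_{\mathrm{solvation}} + W_{\mathrm{aa}}E_{\mathrm{aa}} \\
& + W_{\mathrm{pair}}E_{\mathrm{pair}} + W_{\mathrm{rama}}E_{\mathrm{rama}} + W_{\mathrm{rotamer}}E_{\mathrm{rotamer}} - W_{\mathrm{reference}}E_{\mathrm{reference}}
\end{aligned}
$$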
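As a minimal sketch of how the scoring equation above combines its terms, the following Python function computes the weighted sum and subtracts the weighted reference term. The term names mirror the subscripts in the equation; the function name, the dictionary layout, and any numeric values passed to it are illustrative assumptions, not Rosetta's actual data structures or weights.

```python
from typing import Dict

# Additive terms, named after the subscripts in the scoring equation.
ADDITIVE_TERMS = [
    "lj_atr", "lj_rep", "hbond", "solvation", "aa", "pair", "rama", "rotamer",
]


def total_score(energies: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of energy terms minus the weighted reference term."""
    score = sum(weights[t] * energies[t] for t in ADDITIVE_TERMS)
    # The reference term enters the equation with a minus sign.
    score -= weights["reference"] * energies["reference"]
    return score
```

In a design run, a function of this shape would be evaluated once per candidate sequence, with the per-term energies recomputed for each candidate.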
2.1.1 van der Waals energy
Rosetta uses the standard 12-6 Lennard-Jones potential to model van der Waals interactions. The Lennard-Jones energy favors having atoms close to each other, but not so close that they clash. This term is important for ensuring well-packed, hydrophobic cores in designed proteins. In some applications, Rosetta uses a dampened Lennard-Jones potential, in which the steep repulsive component of the potential is linearized. This modification helps alleviate the large clashes that can result when designing with a fixed-backbone scaffold and discrete side-chain conformations (2; 3). Well depths for the potential are taken from the CHARMM19 parameter set (4; 5) and atom radii are taken from (6).
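The dampening described above can be illustrated with a short Python sketch. It evaluates the 12-6 potential in its well-depth/minimum-distance form and, below a switch distance, continues the curve along its tangent line so that the energy grows linearly rather than as the inverse twelfth power. The function names and the `switch_ratio` default of 0.6 are assumptions made for illustration; the text does not specify where the linear regime begins.

```python
def lj_12_6(r: float, r_min: float, eps: float) -> float:
    """Standard 12-6 Lennard-Jones energy with minimum -eps at r = r_min."""
    x6 = (r_min / r) ** 6
    return eps * (x6 * x6 - 2.0 * x6)


def lj_12_6_slope(r: float, r_min: float, eps: float) -> float:
    """Derivative dE/dr of the 12-6 potential."""
    x6 = (r_min / r) ** 6
    return 12.0 * eps / r * (x6 - x6 * x6)


def damped_lj(r: float, r_min: float, eps: float, switch_ratio: float = 0.6) -> float:
    """12-6 potential whose repulsive wall is linear below switch_ratio * r_min."""
    r_switch = switch_ratio * r_min  # hypothetical switch point
    if r >= r_switch:
        return lj_12_6(r, r_min, eps)
    # Tangent-line extension: continuous in value and slope at r_switch.
    return lj_12_6(r_switch, r_min, eps) + lj_12_6_slope(r_switch, r_min, eps) * (r - r_switch)
```

Because the linear branch matches both the value and the slope at the switch distance, the dampened curve stays smooth while assigning far smaller penalties to severe overlaps than the unmodified potential.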
2.1.2 solvation energy
The solvation energy of a protein is the change in free energy observed when transferring the protein from a vacuum to solvent water. Generally speaking, an accurate solvation energy term favors the burial of hydrophobic surface area, ensuring a mostly hydrophobic protein core. While burial of polar surface area is penalized, favorable electrostatic interactions can overcome the unfavorable energy of desolvating a polar group. The most precise way of calculating solvation free energy is through molecular dynamics (MD) simulations using explicit water. However, MD simulations can take thousands of CPU hours for just one structure, which makes them impractical for design, where many candidate sequences must be scored.
Instead, protein design programs use implicit solvent models, also called continuum models because they treat water as a continuous medium. The first implicit solvent models estimated the solvation energy by combining the solvent-accessible surface area (SASA) of every atom with a solvation parameter for that atom (7). The atomic solvation parameters represent the amount of energy per unit area a given atom type contributes to the free energy of solvation. More recently, continuum electrostatics models such as Poisson-Boltzmann (PB) and Generalized Born (GB), an approximation to the PB equation, have been incorporated into protein design programs (8; 9; 10). These methods model only the electrostatic contribution to solvation free energy, and are therefore sometimes augmented with a surface area term to model the nonpolar (or solvent entropy) contribution to solvation free energy (11). Both of these methods are still very demanding computationally. Furthermore, solving the PB equation analytically for proteins is not possible, so numerical methods must be used to obtain solutions.
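The SASA-based estimate described earlier in this section can be written compactly as a sum over atoms. The symbols below are notational assumptions introduced here for illustration: $\sigma_i$ is the atomic solvation parameter (energy per unit area) for the atom type of atom $i$, and $A_i$ is that atom's solvent-accessible surface area.

```latex
\Delta G_{\mathrm{solv}} \approx \sum_{i \,\in\, \mathrm{atoms}} \sigma_i \, A_i
```

The cost of such a model lies almost entirely in computing the areas $A_i$, which must be updated whenever neighboring atoms change.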
Instead of using one of the above implicit solvation models, Rosetta uses the Lazaridis-Karplus solvent-exclusion solvation model, also called EEF1 (12). EEF1 estimates the solvation free energy by taking the solvation free energy of a group i in a fully solvent-exposed reference state and subtracting some energy to account for neighboring desolvating groups. The total solvation free energy for the protein is then obtained by summing over all groups in the protein. How much energy is subtracted from the reference state energy is determined by looking at how much volume is excluded by each neighbor j around group i. The model is parameterized so that the solvation energy of deeply buried groups is zero. The Lazaridis-Karplus method for calculating solvation energy is very fast because it does not require solving the Poisson-Boltzmann equation or calculating the SASA.
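The reference-state-minus-neighbors bookkeeping described above can be sketched in Python. The sketch assumes the Gaussian solvation free-energy density of the published EEF1 model, in which the amount subtracted for neighbor j is the density of group i at the distance to j multiplied by the volume of j. The `Group` class, its field names, and all parameter values a caller might supply are hypothetical; this is not Rosetta's implementation, and the expanded form in the Appendix remains the authoritative definition.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Group:
    xyz: Tuple[float, float, float]  # position
    dg_ref: float    # solvation free energy when fully solvent-exposed
    dg_free: float   # sets the magnitude of the free-energy density
    radius: float    # van der Waals radius R_i
    lam: float       # correlation length lambda_i
    volume: float    # volume V of the group


def density(g: Group, r: float) -> float:
    """Gaussian solvation free-energy density of group g at distance r."""
    prefactor = 2.0 * g.dg_free / (4.0 * math.pi * math.sqrt(math.pi) * g.lam * r * r)
    return prefactor * math.exp(-(((r - g.radius) / g.lam) ** 2))


def solvation_energy(groups: List[Group]) -> float:
    """Sum over groups of reference energy minus neighbor-exclusion corrections."""
    total = 0.0
    for i, gi in enumerate(groups):
        dg = gi.dg_ref
        for j, gj in enumerate(groups):
            if i != j:
                # Neighbor j excludes solvent from volume V_j around group i.
                dg -= density(gi, math.dist(gi.xyz, gj.xyz)) * gj.volume
        total += dg
    return total
```

The double loop needs only pairwise distances, which is why this style of model avoids both the SASA calculation and a numerical Poisson-Boltzmann solve; a practical implementation would also restrict the inner loop to neighbors within a cutoff.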
2.1.3 electrostatics energy
Electrostatics are modeled in Rosetta using two energy terms: an orientation-dependent hydrogen-bond potential (13) and a residue-pair potential (14). Hydrogen bonds are important for the stabilization of secondary and tertiary structure in proteins and provide specificity to protein-protein interactions. The pair energy term captures electrostatic interactions not scored by the EEF1 solvation energy term. Both of these terms are knowledge-based terms derived from statistics of structures deposited in the PDB. The hbond potential is a linear combination of a distance-dependent energy term for the hydrogen-acceptor atom distance, and three angular-dependent energy terms. The angular-dependent energy terms capture preferences for the angle at the hydrogen bond, the angle at the acceptor atom, and the acceptor/acceptor-base dihedral angle (for sp2 hybridized acceptors). The pair energy term is based on the probability of seeing two amino acids close together after adjusting for the probability of seeing those two amino acids in the given environment. The pair residue potential is only evaluated for polar residues.
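One common way to turn such adjusted probabilities into an energy is a log-odds score: the negative logarithm of the observed joint probability divided by the product of the individual probabilities. The sketch below illustrates that idea only; the function name and arguments are hypothetical, and the exact conditioning on distance and environment used by Rosetta is defined in the cited reference and the Appendix rather than here.

```python
import math


def pair_energy(p_joint: float, p_a: float, p_b: float) -> float:
    """Log-odds pair score from PDB-derived probabilities.

    p_joint: probability of seeing both amino acids close together
             in the given environment.
    p_a, p_b: probabilities of seeing each amino acid alone in that
              environment.
    """
    # Pairs seen more often than expected by chance get negative
    # (favorable) scores; pairs seen less often get positive scores.
    return -math.log(p_joint / (p_a * p_b))
```

A pair that occurs exactly as often as independence would predict scores zero, so the term rewards only contacts that are enriched in known structures.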
2.1.4 torsional energy
Bonded atom interactions in Rosetta are evaluated by the Ramachandran torsional energy term and a rotamer self-energy term. The Ramachandran energy is the negative log of the probability of seeing specific φ and ψ backbone angles given a particular amino acid and secondary structure (helix, strand, or loop). The rotamer self-energy term measures the internal energy of a side chain. More specifically, it measures the probability of an amino acid type in a specific rotamer given specific φ and ψ backbone torsion angles, adjusted for amino acid frequencies in the PDB. Rosetta uses the rotamer probabilities derived by Dunbrack and Cohen (15).
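The conversion from a backbone-angle probability to a Ramachandran energy can be sketched as a table lookup followed by a negative logarithm. The table layout, the 10-degree bin size, and the function name below are assumptions made for illustration; the probabilities in a real table would come from PDB statistics.

```python
import math
from typing import Dict, Tuple

# Key: (amino acid, secondary structure, phi bin index, psi bin index).
ProbTable = Dict[Tuple[str, str, int, int], float]


def rama_energy(table: ProbTable, aa: str, ss: str,
                phi: float, psi: float, bin_size: float = 10.0) -> float:
    """Return -ln P(phi, psi | amino acid, secondary structure)."""
    # Bin the angles (in degrees) so they can index the probability table.
    phi_bin = math.floor(phi / bin_size)
    psi_bin = math.floor(psi / bin_size)
    return -math.log(table[(aa, ss, phi_bin, psi_bin)])
```

Frequently observed angle combinations have high probability and therefore low energy, while rarely observed combinations are penalized.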
2.1.5 reference energy
The reference energy term in the energy function serves as a pseudo unfolded state energy and as a way to bias surface amino acid composition. The reference energies were originally parameterized to reproduce native protein sequences(16). The reference energies, which favor the placement of polar residues, offset the solvation energy term which, in general, favors the design of hydrophobic residues.