
Weights for optimized ff03/HB force field

DEVELOPMENT OF A PHYSICS BASED FORCE FIELD FOR THE SCORING AND REFINEMENT OF PROTEIN MODELS

3.3 Results and Discussion

3.3.6 Weights for optimized ff03/HB force field

Among the many weight sets obtained during the optimization procedure that minimize the target function F (Eq. 3.2), the best decoy-scoring performance was achieved by sets in which some of the weights are negative, for both the ff03 (Table 3.3, Wgt-0) and ff03/HB (Table 3.3, Wgt-1) force fields. The performance of these weight sets was discussed above. In the best weight set for the ff03/HB potential (Table 3.3, Wgt-1), the van der Waals (VDW), short-distance van der Waals (VDW1-4), and hydrogen bond (HB) energies have positive and relatively large weights. The remaining weights, those of the dihedral (DIH), electrostatic (ELE, ELE1-4), generalized Born solvation (GB), and surface area (SA) energy terms, are negative. The negative weights indicate that these terms are not individually useful in generating a funnel-like shape of the potential. By assigning negative, nonphysical weights, the optimization procedure creates a linear combination of the energy terms that correlates more strongly with native-likeness than the individual components do. Although only the total energy, and not any single energy component, is expected to correlate with native-likeness, analysis of the individual correlations helps to interpret the weights in the optimized potential. In the case of the dihedral energy, the negative weight most likely reflects its initial anti-correlation with TM-score (Table 3.1, DIH). As discussed earlier (see section “Correlation of energy with native-likeness in the original ff03 force field”), the weak anti-correlation of the dihedral energy reflects distortion of the backbone torsional angles in the ff03 force field from their equilibrium values, especially for near-native and native conformations. This effect was also noticed earlier by comparison with high-level quantum mechanical calculations for short polypeptide helices and strands [26]. These results suggest that the dihedral parameters may need reoptimization to better describe near-native and native conformations.
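The role of the signed weights can be made concrete with a minimal Python sketch. It uses the Wgt-1 weights from Table 3.3, but the decoy energies and TM-scores are invented toy numbers, and the function names and the sign convention (the correlation is reported as the negative Pearson coefficient, so that a term whose energy drops toward the native state scores positive) are assumptions made for illustration, not the procedure used in this work.

```python
from math import sqrt

# Wgt-1 relative weights for ff03/HB, copied from Table 3.3.
WGT_1 = {"DIH": -1.17, "VDW": 1.00, "VDW1-4": 0.88, "ELE": -0.40,
         "ELE1-4": -0.23, "GB": -0.23, "SA": -2.07, "HB": 6.25}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def total_energy(components, weights):
    """Weighted linear combination of the energy components of one decoy."""
    return sum(weights[name] * value for name, value in components.items())


def funnel_cc(energies, tm_scores):
    """Assumed convention: positive when energy drops as TM-score rises."""
    return -pearson(energies, tm_scores)


# Invented toy decoys (NOT data from this work): TM-score plus three components.
TOY_DECOYS = [
    (0.30, {"DIH": 40.0, "VDW": -100.0, "HB": -5.0}),
    (0.50, {"DIH": 42.0, "VDW": -112.0, "HB": -7.0}),
    (0.70, {"DIH": 45.0, "VDW": -115.0, "HB": -12.0}),
    (0.90, {"DIH": 47.0, "VDW": -131.0, "HB": -14.0}),
]

tm = [t for t, _ in TOY_DECOYS]
for name in ("DIH", "VDW", "HB"):
    print(name, round(funnel_cc([c[name] for _, c in TOY_DECOYS], tm), 2))

weighted = [total_energy(c, WGT_1) for _, c in TOY_DECOYS]
print("weighted sum", round(funnel_cc(weighted, tm), 2))
```

In this toy set the dihedral energy rises toward the native state, so on its own it is anti-correlated with native-likeness; the negative DIH weight flips the sign of its contribution inside the weighted sum.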

Table 3.3 Relative weights* of energy components for the optimized force fields.

            ff03         ff03/HB optimized‡                ff03/HB reduced
            optimized†                                     optimized§
            Wgt-0¶       Wgt-1¶     Wgt-2||    Wgt-3**     Wgt-R||
DIH§§       -1.25        -1.17      -0.32      0.28        -0.42
VDW¶¶       1.00*        1.00*      1.00*      1.00*       1.00*
VDW1-4|| || 1.04         0.88       0.56       0.56        4.33
ELE***      -0.27        -0.40      -0.25      0.03        0
ELE1-4†††   -0.16        -0.23      -0.22      0.17        0
GB‡‡‡       -0.22        -0.23      -0.14      0.18        0
SA§§§       -0.51        -2.07      0.14       3.39        0.51
HB¶¶¶       0            6.25       1.32       2.56        4.26

*All weights were scaled so that the weight for the van der Waals energy is equal to 1, for easier comparison; weights for the bond and angle energy terms were set to 0 and are not presented; † optimized original ff03 force field; ‡ optimized ff03/HB force field (ff03 with added hydrogen bond potential); § optimized reduced ff03/HB force field (ff03 with added hydrogen bond potential, and with the electrostatic (ELE and ELE1-4) and GB solvation energy components turned off); ¶ Wgt-0 and Wgt-1 - the best weight sets for the ff03 and ff03/HB potentials, respectively (no restriction on the sign of the weights); || Wgt-2 and Wgt-R - the weight sets with allowed negative weights for the dihedral (DIH) energy, and for Wgt-2 also for the electrostatic (ELE, ELE1-4) and generalized Born solvation (GB) energies; ** Wgt-3 - the weight set with all the weights positive; §§ DIH - dihedral angle energy; ¶¶ VDW - van der Waals energy; || || VDW1-4 - short-distance van der Waals energy (for atom pairs separated by less than four bonds); *** ELE - electrostatic energy; ††† ELE1-4 - short-distance electrostatic energy (for atom pairs separated by less than four bonds); ‡‡‡ GB - generalized Born solvation energy; §§§ SA - surface area dependent solvation energy; ¶¶¶ HB - hydrogen bond energy.
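The scaling convention stated in the table footnote (all weights divided so that the VDW weight equals 1) can be sketched in a few lines of Python. The function name and the raw input values are hypothetical; only the convention itself comes from the footnote.

```python
def scale_to_reference(weights, reference="VDW"):
    """Divide every weight by the reference weight so the reference becomes 1.

    Assumes the reference weight is positive, so the sign of every other
    weight (and hence the ranking of decoys by total energy) is preserved.
    """
    ref = weights[reference]
    if ref <= 0:
        raise ValueError("reference weight must be positive")
    return {name: w / ref for name, w in weights.items()}


# Hypothetical unscaled weights, chosen only to illustrate the rescaling.
raw = {"DIH": -0.84, "VDW": 2.0, "HB": 2.64}
scaled = scale_to_reference(raw)
print(scaled)
```

Because every weight is divided by the same positive constant, the rescaled potential orders decoys exactly as the unscaled one does, and its Pearson correlation with TM-score is unchanged.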

The electrostatic (ELE, ELE1-4) and GB energies are uncorrelated with TM-score and their weights are relatively small; therefore, the sign of these weights has little physical meaning. As expected, the weights of the energy terms with a low initial correlation coefficient (ELE, ELE1-4, GB) are smaller in magnitude than the weights of the terms showing a larger initial correlation of energy with TM-score (VDW, SA, HB, VDW1-4, DIH).

The negative weight for the surface area energy (Table 3.1 and Table 3.3, SA) is partly an artifact of the optimization procedure and partly reflects the weak average correlation of the SA energy with TM-score for our decoy set (CC = 0.36). The SA energy landscape is flat over a wide range of native similarity, out to RMSD values from native greater than 8 Å, reflecting the limited variation in radius of gyration and compactness among the decoys in our set. Only in the near-native region does the SA-dependent energy component have a noticeable correlation with TM-score. The values of the SA energy are small compared to the other energy components (roughly two orders of magnitude smaller than the electrostatic energy), and assigning it a negative weight probably helps to balance some deficiencies in the correlation of the other energy terms. For a physical potential, we require a positive weight for the SA energy term: it represents the hydrophobic energy and should energetically favor the transition from the unfolded conformation to a more globular one, not the opposite. Including more unfolded decoys in the force field optimization process should help to obtain a positive weight for the surface area dependent energy term.

Although a linear combination of the components with some negative weights may produce a potential that correctly scores compact decoy structures, such a potential may not be useful for applications that involve generating new structures (e.g., the refinement of protein decoys). The most important future goal is to use the optimized force field for the refinement of protein decoys. For this purpose, we need a potential with the smallest possible number of negative weights that still performs very well in decoy scoring and shows a good correlation between energy and native-likeness. Restricting more weights to positive values decreases the performance of the potential; therefore, we also chose a weight set that is a compromise between the number of negative weights and the performance, set ff03/HB Wgt-2. In Table 3.4 we compare its performance with the best unrestricted set, ff03/HB Wgt-1, and with the ff03/HB potential optimized keeping all weights positive, Wgt-3. Set Wgt-2 is the best performing weight set under the requirement that the weights of the van der Waals (VDW), short-distance van der Waals (VDW1-4), surface area (SA), and hydrogen bond (HB) energies are positive (Table 3.3, Wgt-2). The negative weights for the electrostatic (ELE, ELE1-4) and GB solvation (GB) energies have no physical meaning, because these energy components are uncorrelated with native-likeness and their relative weights are small. We also allowed small negative values of the weight for the dihedral angle energy, because this energy is weakly anti-correlated with native-likeness and can be partially compensated by long- and short-distance van der Waals interactions. As shown in Table 3.4, the performance of the Wgt-2 set is slightly worse than that of the Wgt-1 set; however, it is still significantly better than that of the unoptimized force field (see Table 3.2, ff03/HB). For the entire set of 58 proteins, the average correlation coefficient between the energy and TM-score, CCave, is 0.61, the fraction of proteins with a significant correlation coefficient, CCfr, is 0.54, and the ability to indicate the native structure (TMfr) and native cluster (RMSDfr) remains very high, at 0.90 or above. These results mean that using the ff03/HB Wgt-2 potential should allow for the refinement of decoy structures for about 54% of proteins.
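The two correlation-based metrics quoted above can be computed from a list of per-protein correlation coefficients as in the following Python sketch. The 0.6 threshold for a significant correlation is the one given in the footnote of Table 3.4; the function names and the per-protein values are hypothetical and are not data from this work.

```python
def cc_average(per_protein_cc):
    """CCave: mean correlation coefficient over all proteins in the set."""
    return sum(per_protein_cc) / len(per_protein_cc)


def cc_fraction(per_protein_cc, threshold=0.6):
    """CCfr: fraction of proteins whose correlation exceeds the threshold."""
    significant = sum(1 for cc in per_protein_cc if cc > threshold)
    return significant / len(per_protein_cc)


# Hypothetical per-protein correlation coefficients for a five-protein set.
toy_cc = [0.72, 0.55, 0.61, 0.40, 0.80]

print("CCave =", round(cc_average(toy_cc), 3))
print("CCfr  =", round(cc_fraction(toy_cc), 3))
```

Read this way, a CCfr of 0.54 is simply the statement that 54% of the proteins in the set have an energy correlation with TM-score above the threshold.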

Table 3.4 Comparison of scoring performance of the ff03/HB optimized force fields with different weight sets.

               Wgt-1*                      Wgt-2†                      Wgt-3‡
               Train§   Test¶   Set58||    Train§   Test¶   Set58||    Train§   Test¶   Set58||
CCave**        0.67     0.64    0.65       0.65     0.59    0.61       0.62     0.55    0.57
Z-scoreave††   2.59     2.19    2.29       2.49     1.86    2.02       2.12     1.49    1.65
CCfr‡‡         0.60     0.65    0.64       0.73     0.47    0.54       0.60     0.42    0.47
TMfr§§         1.00     0.86    0.90       1.00     0.86    0.90       0.73     0.72    0.72
RMSDfr¶¶       1.00     0.88    0.91       1.00     0.88    0.91       0.80     0.79    0.79

* Wgt-1 - the best weight set (no restriction on the sign of the weights); † Wgt-2 - the weight set with allowed negative weights for the dihedral (DIH), electrostatic (ELE, ELE1-4), and generalized Born solvation (GB) energies; ‡ Wgt-3 - the weight set with all the weights positive; § Train - training protein set; ¶ Test - testing protein set; || Set58 - the entire set of 58 proteins; ** CCave - average correlation coefficient of the energy with TM-score; †† Z-scoreave - average Z-score between the native cluster and the remaining decoys; ‡‡ CCfr - fraction of proteins with a correlation coefficient of energy with TM-score greater than 0.6; §§ TMfr - fraction of proteins for which the lowest energy structure had a TM-score to the native state greater than 0.90; ¶¶ RMSDfr - fraction of proteins for

We also analyzed the performance of the force field containing only positive weights. The best potential with all the weights positive, ff03/HB Wgt-3, performs slightly worse than the Wgt-1 and Wgt-2 potentials (see Table 3.4, column Wgt-3). For the entire set of 58 proteins, the average correlation coefficient between the energy and TM-score, CCave, is 0.57, the fraction of proteins with a significant correlation coefficient, CCfr, is 0.47, and the ability to indicate the native structure (TMfr) and native cluster (RMSDfr) is still good, above 0.70. These results mean that using the ff03/HB Wgt-3 potential should allow for the refinement of decoy structures for about 47% of proteins. The weights for this potential are listed in Table 3.3, Wgt-3.
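The compromise between the number of negative weights and scoring performance can be tabulated directly from Tables 3.3 and 3.4. The Python sketch below is only a bookkeeping aid: the numbers are copied from those tables (Set58 columns), while the dictionary layout and the helper name are assumptions made for illustration.

```python
# Relative weights for the three ff03/HB weight sets, copied from Table 3.3.
WEIGHTS = {
    "Wgt-1": {"DIH": -1.17, "VDW": 1.00, "VDW1-4": 0.88, "ELE": -0.40,
              "ELE1-4": -0.23, "GB": -0.23, "SA": -2.07, "HB": 6.25},
    "Wgt-2": {"DIH": -0.32, "VDW": 1.00, "VDW1-4": 0.56, "ELE": -0.25,
              "ELE1-4": -0.22, "GB": -0.14, "SA": 0.14, "HB": 1.32},
    "Wgt-3": {"DIH": 0.28, "VDW": 1.00, "VDW1-4": 0.56, "ELE": 0.03,
              "ELE1-4": 0.17, "GB": 0.18, "SA": 3.39, "HB": 2.56},
}

# Scoring metrics on the entire 58-protein set, copied from Table 3.4.
SET58 = {
    "Wgt-1": {"CCave": 0.65, "CCfr": 0.64, "TMfr": 0.90, "RMSDfr": 0.91},
    "Wgt-2": {"CCave": 0.61, "CCfr": 0.54, "TMfr": 0.90, "RMSDfr": 0.91},
    "Wgt-3": {"CCave": 0.57, "CCfr": 0.47, "TMfr": 0.72, "RMSDfr": 0.79},
}


def count_negative(weights):
    """Number of energy terms that carry a negative (nonphysical) weight."""
    return sum(1 for w in weights.values() if w < 0)


for name, weights in WEIGHTS.items():
    metrics = SET58[name]
    print(name, "negative weights:", count_negative(weights),
          "CCave:", metrics["CCave"], "CCfr:", metrics["CCfr"],
          "TMfr:", metrics["TMfr"], "RMSDfr:", metrics["RMSDfr"])
```

Laid out this way, the counts of negative weights (five, four, and zero) fall together with CCave and CCfr, while TMfr and RMSDfr are identical for Wgt-1 and Wgt-2 and drop only for Wgt-3.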

A comparison of the performance of the Wgt-1, Wgt-2, and Wgt-3 potentials is shown in Figure C.3 (Appendix C). Compared with the unoptimized ff03/HB potential, the CC is still improved for the great majority of proteins, and the CC distribution is shifted toward significant values for both Wgt-2 (Figure C.3, A’ and B’) and Wgt-3 (Figure C.3, A” and B”). The Z-score is improved and is positive for all the proteins for both sets (Figure C.3, C’ and C”). The scoring of the native structure and of the native cluster is as good for Wgt-2 as for Wgt-1 (Figure C.3, D’ and E’), and becomes somewhat worse for Wgt-3 (Figure C.3, D” and E”).
