Selection of crystallographic structures and generation of model structures used for predictions of substrate stability


4.3.3. Selection of crystallographic structures and generation of model structures used for predictions of substrate stability

Sixteen dimeric crystallographic structures, each crystallized with one of 7 endogenous peptide substrates, were used for all calculations of substrate stability (Supplementary Table C.2). We generated structural models for the three peptide cleavage sequences without crystallographic structures (CTLNF-PISPI, PQITL-WQRPL, and VSFNF-PQITL) in three steps: computationally threading each peptide cleavage sequence onto each of the 16 crystallographic structures (omitting sequence positions for which crystallographic density was missing on any of the 16 peptide template structures), performing an initial round of side-chain minimization, and selecting the structural template with the lowest resulting Rosetta interface score. The interface score is the sum of score contributions over all pairwise interactions between residues i and j, where residue i is located on HIV-1 protease and residue j on the bound peptide. 1MT9.pdb was the best template for both CTLNF-PISPI and PQITL-WQRPL, while 1F7A.pdb was selected as the optimal template for VSFNF-PQITL. Peptide interface scores for the resulting three models (-18.3 to -25) were within the range observed for the 16 crystallographic peptides (-18.5 to -33).
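The template-selection step can be illustrated with a minimal sketch. It assumes the pairwise score contributions have already been computed elsewhere (for example, exported from a Rosetta run); the function names, the dictionary layout, and the residue numbering are hypothetical and are not part of the original protocol.

```python
from typing import Dict, Set, Tuple


def interface_score(
    pair_scores: Dict[Tuple[int, int], float],
    protease_residues: Set[int],
    peptide_residues: Set[int],
) -> float:
    """Sum pairwise score contributions across the protease-peptide interface.

    Only pairs with one residue on the protease and the other on the bound
    peptide are counted; intra-protease and intra-peptide pairs are ignored.
    """
    total = 0.0
    for (i, j), score in pair_scores.items():
        crosses_interface = (
            (i in protease_residues and j in peptide_residues)
            or (j in protease_residues and i in peptide_residues)
        )
        if crosses_interface:
            total += score
    return total


def select_best_template(scores_by_template: Dict[str, float]) -> str:
    """Return the template whose threaded model has the lowest interface score."""
    return min(scores_by_template, key=scores_by_template.get)
```

For example, `select_best_template({"templateA": -18.3, "templateB": -25.0})` returns `"templateB"`, because lower (more negative) scores are treated as more favorable.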

4.3.4. Generation of backrub structural ensembles

Computational ensembles of “near-native” backbones were generated starting from each of 11 crystallographic structures originally crystallized with the native consensus sequence (1A8G.pdb, 1EBY.pdb, 1HXW.pdb, 1IZH.pdb, 1PRO.pdb, 1SBG.pdb, 1VIJ.pdb, 1VIK.pdb, 4PHV.pdb, 5HVP.pdb and 9HVP.pdb) by using a previously described backrub protocol [33]. Briefly, the backrub protocol consisted of repeatedly selecting Cα atoms of two residues (separated by 1-10 intervening residues), performing a rigid body rotation of the selected protein segment (of up to 40 degrees), optimizing the location of related Cβ and hydrogen atoms, and accepting or rejecting the backbone move based on the Rosetta scoring function and the Monte Carlo Metropolis criterion. Using the atomic coordinates of each crystallographic structure above as a starting conformation, 100 independent backrub simulations were run at each of two Monte Carlo temperatures (kT=0.6 and kT=1.2) until a total of 10,000 moves per simulation had been sampled. At each temperature, the lowest energy conformation sampled and the last conformation accepted during each simulation were saved, yielding a computational ensemble of 400 backbone conformations per starting crystallographic structure (100 simulations x 2 temperatures x 2 saved conformations).
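The acceptance rule and the bookkeeping behind the ensemble size can be sketched as follows. This is a toy illustration only: the score and move functions are placeholders standing in for the Rosetta scoring function and the backrub move, and all names are assumptions.

```python
import math
import random


def metropolis_accept(delta_score: float, kT: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill moves, accept uphill
    moves with probability exp(-delta_score / kT)."""
    if delta_score <= 0:
        return True
    return rng.random() < math.exp(-delta_score / kT)


def run_toy_simulation(score, propose, start, kT, n_moves, rng):
    """Run n_moves Monte Carlo moves; return (lowest-score state, last accepted state).

    `score` and `propose` are placeholders for the scoring function and the
    backbone move; they are not the actual Rosetta implementations.
    """
    current, current_score = start, score(start)
    lowest, lowest_score = current, current_score
    for _ in range(n_moves):
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        if metropolis_accept(candidate_score - current_score, kT, rng):
            current, current_score = candidate, candidate_score
            if current_score < lowest_score:
                lowest, lowest_score = current, current_score
    return lowest, current


def ensemble_size(n_simulations=100, temperatures=(0.6, 1.2), saved_per_simulation=2):
    """Conformations saved per starting structure."""
    return n_simulations * len(temperatures) * saved_per_simulation
```

With the defaults taken from the text, `ensemble_size()` gives 400. A higher kT makes uphill moves more likely to be accepted, so simulations at kT=1.2 are expected to sample conformations further from the starting structure than those at kT=0.6.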

Average RMSD values of backrub-generated conformations relative to the starting crystal structures depended on the starting template and the Monte Carlo temperature, but typically ranged from 0.2 to 0.6 Å for conformations generated at kT=0.6 and from 0.3 to 0.8 Å for conformations generated at kT=1.2.
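For reference, the RMSD between a generated conformation and its starting structure can be computed as in the sketch below. It assumes the two coordinate sets list the same atoms in the same order and are already in a common frame; the text does not state which atoms were compared or whether a superposition step was applied, so both choices here are assumptions.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coordinate], coords_b: Sequence[Coordinate]) -> float:
    """Root-mean-square deviation between two equally sized coordinate sets.

    No superposition is performed; the inputs are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Averaging this value over the conformations generated from one template at one temperature gives a per-template, per-temperature average of the kind reported above.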

4.3.5. Energy Function and Preparation of Crystallographic Structures:

All calculations were performed using the Rosetta scoring function, which is dominated by attractive and repulsive Lennard-Jones interactions [29], an orientation-dependent hydrogen bonding term, and an implicit solvation model [30]. Simulations consisted of sampling and scoring side chains, taken from a rotamer library covering the chi1 and chi2 side-chain torsion angles [5], using a Monte Carlo simulated annealing optimization protocol as described in [73]. Minimization and optimization protocols were performed with respect to the sum of the total score over all atoms within the dimeric HIV-1 protease (or dimeric HIV-1 protease-substrate) complex.

In preparation for calculations of fold and dimer stability, all water molecules, heteroatoms, bound inhibitors or substrates, and hydrogens present in the original 263 crystallographic PDB structures were removed, and hydrogen atoms were added as previously described [73]. An initial round of side-chain optimization was performed using the Rosetta scoring function as described above, keeping all amino acid identities and backbone coordinates fixed while selecting the optimal rotamer at each side-chain position from the rotamer set described above. After this initial minimization, all structures containing amino acid identities differing from the consensus HIV-1 subtype B sequence (see above) were computationally reverted to the consensus sequence and the structures were side-chain minimized a second time. In preparation for calculations of peptide binding stability, an identical protocol, except that all bound substrates were left in place, was repeated separately for the 19 crystallographic and model structures listed in Supplementary Table C.2.
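The stripping step can be sketched as a simple filter over fixed-column PDB records. This is an illustrative helper, not the procedure used in the original work: the function name and the `keep_chains` argument are assumptions, and the subsequent hydrogen placement and side-chain optimization are not shown.

```python
from typing import Iterable, List, Set


def strip_pdb(lines: Iterable[str], keep_chains: Set[str]) -> List[str]:
    """Keep only non-hydrogen ATOM records belonging to the requested chains.

    HETATM records (waters, inhibitors, other heteroatoms) and all non-ATOM
    records are dropped. Passing only the protease chain IDs removes a bound
    peptide substrate; including the peptide chain ID keeps it.
    """
    kept = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]                       # PDB column 22
        atom_name = line[12:16].strip()           # PDB columns 13-16
        element = line[76:78].strip().upper()     # PDB columns 77-78
        if element:
            is_hydrogen = element == "H"
        else:
            # Fall back to the atom name when the element field is blank.
            is_hydrogen = atom_name.lstrip("0123456789").startswith("H")
        if chain_id in keep_chains and not is_hydrogen:
            kept.append(line)
    return kept
```

For the fold and dimer stability calculations this would be called with the two protease chains only; for peptide binding stability the substrate chain would be added to `keep_chains`.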

4.3.6. Selection of model parameters:

Optimal values for the six model parameters (WFOLD, WDIMER, WPEPTIDE, FAVORNATIVE, FAVORPOLAR NATIVE, and PENALTYNATIVE POLAR, HYDRO) were selected by simultaneously varying each model parameter (Supplementary Table C.3) and computing predicted amino acid frequencies over 97 (non-cysteine) HIV protease sites. For each combination of parameters, the 99 HIV-1 residue sites were computationally classified as displaying either low (1-5%), medium (5-20%), or high (>20%) mutation rates, and the number of residue sites correctly matching the experimentally observed mutation rates was calculated. The percentage of sites correctly classified in each bin was then averaged over the bins and used to determine a parameter set for both the “neutral” and “selective pressure” computational models (see Supplementary Figure C.1 and Supplementary Table C.3).
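The scoring of one parameter combination can be sketched as below. The bin boundaries follow the text, but the handling of edge cases is an assumption: rates below 5% (including those under 1%) are treated as low, rates from 5% up to and including 20% as medium, and rates above 20% as high. Function names are hypothetical.

```python
from typing import Sequence


def classify_rate(rate_percent: float) -> str:
    """Assign a mutation rate (in percent) to the low, medium, or high bin."""
    if rate_percent > 20:
        return "high"
    if rate_percent >= 5:
        return "medium"
    return "low"  # assumption: everything under 5% counts as low


def mean_bin_accuracy(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Fraction of sites placed in the correct bin, computed per observed bin
    and then averaged over the bins that actually occur."""
    correct, total = {}, {}
    for pred, obs in zip(predicted, observed):
        obs_bin = classify_rate(obs)
        total[obs_bin] = total.get(obs_bin, 0) + 1
        if classify_rate(pred) == obs_bin:
            correct[obs_bin] = correct.get(obs_bin, 0) + 1
    per_bin = [correct.get(b, 0) / n for b, n in total.items()]
    return sum(per_bin) / len(per_bin)
```

Averaging per bin rather than over all sites keeps the many low-rate sites from dominating the score; a parameter scan would evaluate `mean_bin_accuracy` for each combination and keep the best one.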

4.3.7. Evaluation of ROC curves and AUC values:

True positive rates (TPR) and false positive rates (FPR) of mutation recovery were calculated by using the parameter values determined above for the “neutral” and “selective pressure” computational models (Supplementary Table C.3) and considering all mutations computationally predicted to occur at frequencies greater than or equal to the following cutoffs: 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, and 0.01%. Mutations determined to occur at an experimental frequency of >0% (i.e., greater than a single occurrence), >=1%, and >=5% were considered as true positives in constructing ROC curves. ROC curves were constructed for mutations observed within the Stanford HIV-1 database after treatment with 1-9 protease inhibitors (Figure 4.3). AUC values were calculated for each ROC curve using the trapezoid method.
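A minimal sketch of the ROC construction and the trapezoid AUC follows. The cutoffs are those listed above; the data layout (one predicted frequency and one true/false label per mutation) and the function names are assumptions, as is the choice to anchor the curve at (0, 0) and (1, 1), which the text does not specify.

```python
from typing import Dict, List, Sequence, Tuple

CUTOFFS = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.5, 0.1, 0.01]  # percent


def roc_points(
    predicted: Dict[str, float],
    observed_positive: Dict[str, bool],
    cutoffs: Sequence[float] = CUTOFFS,
) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs, one per cutoff, plus the (0,0) and (1,1) anchors."""
    n_pos = sum(1 for m in predicted if observed_positive[m])
    n_neg = len(predicted) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative mutation")
    points = [(0.0, 0.0)]
    for cutoff in sorted(cutoffs, reverse=True):
        called = [m for m, freq in predicted.items() if freq >= cutoff]
        tp = sum(1 for m in called if observed_positive[m])
        fp = len(called) - tp
        points.append((fp / n_neg, tp / n_pos))
    points.append((1.0, 1.0))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoid rule over points sorted by FPR."""
    pts = sorted(points)
    return sum(
        (x2 - x1) * (y1 + y2) / 2.0
        for (x1, y1), (x2, y2) in zip(pts, pts[1:])
    )
```

Under this scheme one curve would be built for each true-positive definition (>0%, >=1%, >=5%) by changing only the labels in `observed_positive`.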

TPR and FPR values were also calculated for two null models by considering, at each of the 99 HIV-1 residue sites, (1) the set of mutations one nucleotide mutation away from the native codon and (2) the set of all mutations to amino acid types chemically similar to the native, grouped as follows: (A,G,P), (D,E,N,Q), (F,W,Y), (L,I,V,M), (R,K,H), (S,T).
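Both null models can be written down compactly. The sketch below uses the standard genetic code and the similarity groups listed above; excluding stop codons and synonymous changes from the one-nucleotide set is an assumption, as are the function names.

```python
from typing import Set

BASES = "TCAG"
# Standard genetic code, codons ordered with T, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

SIMILAR_GROUPS = ["AGP", "DENQ", "FWY", "LIVM", "RKH", "ST"]


def single_nucleotide_neighbors(codon: str) -> Set[str]:
    """Amino acids reachable from `codon` by exactly one nucleotide change.

    Stop codons and synonymous changes are excluded (an assumption).
    """
    native = CODON_TABLE[codon]
    neighbors = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            amino_acid = CODON_TABLE[mutant]
            if amino_acid != "*" and amino_acid != native:
                neighbors.add(amino_acid)
    return neighbors


def chemically_similar(native: str) -> Set[str]:
    """Other members of the native residue's similarity group (empty if none)."""
    for group in SIMILAR_GROUPS:
        if native in group:
            return set(group) - {native}
    return set()
```

Each null model predicts, for every site, that exactly the mutations in the returned set occur; TPR and FPR then follow from comparing that set with the experimentally observed mutations.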

4.4. Discrimination between low, medium, and high frequency
