4.5 Improvements in firestar algorithm

4.5.6 CASP10 experiment

4.5.6.2 Evaluating problematic targets

According to our results, firestar performed poorly (MCC below 0.7) for six targets: T0659, T0661, T0694, T0715, T0720 and T0732. We reviewed these predictions to identify possible improvements and limitations of our algorithm. Table 11 shows, for each target, how many servers submitted a prediction and the best MCC obtained.

Target ID   Ligand ID   Site size   firestar MCC   Server predictions   Best MCC
T0659       ZN          3           -              4/11                 0.455
T0661       PEF         31          0.572          6/11                 0.623
T0694       6KY-HIS     8(6)        0.660          11/11                0.797
T0715       NAD         26          0.505          10/11                0.866
T0720       SF4, MN     13          0.542          8/11                 0.608
T0732       5GP         10(4)       0.662          10/11                0.665

Table 11 List of targets where the firestar algorithm obtained an MCC < 0.7. Next to the firestar MCC, the total number of server groups that submitted a prediction and the overall best MCC are reported.
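For reference, the MCC values reported here can be reproduced from a residue-level comparison between predicted and observed binding residues. The following is a minimal sketch that assumes the standard definition of the Matthews correlation coefficient; the function name `mcc`, its arguments and the example residue numbers are illustrative and are not taken from the firestar or CASP evaluation code.

```python
import math


def mcc(predicted, observed, sequence_length):
    """Matthews correlation coefficient for a residue-level site prediction.

    predicted, observed: iterables of residue positions (hypothetical inputs).
    sequence_length: total number of residues in the target.
    """
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)          # predicted and truly binding
    fp = len(predicted - observed)          # predicted but not binding
    fn = len(observed - predicted)          # binding but missed
    tn = sequence_length - tp - fp - fn     # everything else
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: an undefined MCC (zero denominator) is scored as 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up example: 100-residue protein, 4 binding residues, 3 of them predicted
# plus one false positive.
example = mcc({10, 20, 30, 50}, {10, 20, 30, 40}, 100)
```

Under this definition, missing part of a large site lowers the score quickly, even when every predicted residue is correct.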

We present here the analysis of two target predictions, T0659 and T0720. They are particularly interesting because they are metal binding sites. As detailed previously, sites of this type show higher conservation because of their size and their specific features, and firestar usually predicts them very well: over the other 7 metal binding targets it had a mean MCC of 0.960.

T0659 (PDB code 4ESN) is a hypothetical protein from Ruminococcus gnavus that contains a novel ZN binding site formed by three cysteines. Although no site passed the reliability threshold we established for CASP, firestar did detect the conserved cysteines. The ZN binding site has either diverged from an FE2/S2 inorganic cluster (FES) binding site or converged on the same residues, as figure 28 shows. Almost all templates found by firestar bound FES using the same ligand binding residues that form the novel ZN site in T0659. Two of the three cysteines in the ZN binding site were highly conserved, while the third cysteine appeared in only a few alignments. However, firestar did not predict any residues because the binding site was treated as an FES binding site. The average size of the FES binding sites is 9 residues, and firestar discarded the binding residues detected in those alignments (figure 28) because the coverage of the FES binding residues was 33% or lower. A human predictor would have noticed the FES conservation footprint in the information generated by firestar and possibly made a prediction, but target T0659 was a server-only target.
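The coverage argument can be made concrete with a short sketch: three matched cysteines against an average FES site of 9 residues give a coverage of about 33%. The function names and the `min_coverage` cutoff below are hypothetical placeholders; the actual threshold used by firestar is not stated in this section.

```python
def site_coverage(matched_residues, template_site_size):
    """Fraction of a template's binding residues matched in the target."""
    if template_site_size <= 0:
        raise ValueError("template_site_size must be positive")
    return matched_residues / template_site_size


def passes_coverage_filter(matched_residues, template_site_size, min_coverage=0.5):
    """Keep a template site only if enough of it is covered.

    min_coverage is an assumed placeholder, not firestar's real cutoff.
    """
    return site_coverage(matched_residues, template_site_size) >= min_coverage


# T0659-like case: 3 cysteines matched against a 9-residue FES site.
coverage = site_coverage(3, 9)
kept = passes_coverage_filter(3, 9)
```

With any cutoff above one third, a small ZN site aligned to a larger FES site is rejected even when every residue of the small site is conserved.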

Figure 28 Extended output from the analysis of target T0659 from the CASP10 experiment. These are the templates found by firestar for the target sequence, which is the first line of each alignment. The three cysteines that constitute the ZN binding site in T0659 are shown in the red and yellow boxes. All the aligned templates bind FES (FE2/S2 cluster). Cysteines 43 and 63 (red highlights) are conserved in all the alignments, while the position corresponding to cysteine 48 (yellow highlights) binds FES in the majority of templates, although it is usually occupied by a different amino acid.

These results suggest that this could be a case of divergent evolution: the binding site specificity drifted and lost the ability to bind iron-sulfur, while maintaining the capability to bind a metal. Or it could be exactly the opposite: it was originally a ZN binding site that gained the ability to bind iron-sulfur. This target prompted us to work on a possible improvement of the algorithm. When firestar cannot generate a canonical consensus prediction but detects conservation for binding site residues in complex with biologically relevant compounds, it should now report these residues without predicting the possible ligand.
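The fallback behaviour described above can be sketched as follows. This is only an illustration of the idea, not the firestar implementation: the function names, the `min_fraction` cutoff and the per-position support counts are all hypothetical, with the counts chosen to resemble the T0659 pattern (two cysteines supported by every template, one supported by only a few).

```python
def conserved_binding_positions(support, n_templates, min_fraction=0.5):
    """Positions whose binding role is supported by enough templates.

    support: dict mapping target position -> number of templates in which the
             aligned, identical residue binds a biologically relevant ligand.
    min_fraction: assumed cutoff, not a firestar parameter.
    """
    if n_templates <= 0:
        raise ValueError("n_templates must be positive")
    return sorted(
        pos for pos, hits in support.items() if hits / n_templates >= min_fraction
    )


def predict_with_fallback(consensus, support, n_templates, min_fraction=0.5):
    """Return the consensus prediction, or conserved residues with no ligand."""
    if consensus is not None:
        return consensus
    residues = conserved_binding_positions(support, n_templates, min_fraction)
    return {"ligand": None, "residues": residues}


# Made-up support counts over 10 hypothetical templates.
result = predict_with_fallback(None, {43: 10, 48: 2, 63: 10}, 10)
```

Leaving the ligand unset keeps the output honest: the residues are reported as conserved binding positions, without claiming which compound they bind.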

The other interesting case is T0720, a CRISPR-associated exonuclease Cas4 from Sulfolobus solfataricus. The protein was co-crystallised with manganese and an iron-sulfur cluster (SF4), and it had very few remote homologs in the PDB (only 9 unique hits from the HHsearch/PSI-BLAST analyses had functional information). While the conserved manganese binding site was predicted, missing only histidine 62, information for the other site was insufficient and noisy, and the filters discarded all the templates. Furthermore, the site was split: 3 of the 10 binding residues are around sequence position 35, while the rest are located around position 182, and no alignment spanned all 202 residues of the target. This was a difficult prediction; in fact, firestar had the 3rd best server result. In our human prediction, using the firestar server output, we included four cysteines of the SF4 binding site based on the conservation detected in the extended firestar output pages, as shown in figure 29.

Figure 29 Extended output from the analysis of target T0720. These are 2 template fragments found only in the HHsearch analysis that support functional information for three of four cysteines included in our human prediction. Even though both bind SF4 (FE4/S4 cluster), only part of the site is conserved.

The other 4 problematic cases are similar: these proteins contain large binding sites, and the information coming from FireDB is insufficient or noisy. One good example is T0715: even though there are many templates containing NAD in the PDB, no close homolog was found. The occurrence and conservation filters discarded all but the core binding residues, so firestar lost more than 50% of the site.

All servers had the same problem with T0732: few remote templates were rescued, and the majority of them bound adenosine monophosphate (AMP). 5GP binding proteins were also present, but they were too few to allow firestar to focus on the specific functional residues.

For T0694 and T0661 no templates containing the bound ligands were found. In fact, firestar was not able to predict the compound in the binding site, and the prediction was based only on residue conservation.