Fig. 2 Proposed topology models. a The earliest proposed topology was based on a bioinformatics investigation, which concluded that the enzyme had seven membrane helices, a cytoplasmic N-terminus and an active site facing the cytoplasm 61. The putative active site included the BacA2 sequence motif, while the conserved BacA1 region mapped to the first membrane helix and was proposed to be involved in catalysis and in substrate specificity. b The second proposed model relied on bioinformatics, modeling, and functional and site-directed mutation work employing constructs of BacA fused with bacteriorhodopsin and green fluorescent protein 19. This led to the conclusion that the protein had eight membrane helices with both N- and C-termini in the cytoplasm and an outward-facing active site. c The third model involved an experimental investigation focussed on biochemistry, with functional and mutational studies employing hybrids prepared from C-terminal truncations of BacA fused to β-lactamase 18. This facilitated in vivo screening based on ampicillin resistance or susceptibility, depending on which side of the membrane the β-lactamase resided. The model to emerge from this study had seven membrane helices with the N-terminus in the cytoplasm and the active site facing the periplasm. d The latest model was based on residue-residue co-evolution constraints applied in the Rosetta structure prediction program to generate a structure of BacA 24. The modeled protein includes a pair of inverted helices and has twofold pseudosymmetry, as observed in the structure determined experimentally in the current study. The membrane orientation in d was not defined. Putative catalytic residues Ser27 and Arg174 are indicated by red and blue dots, respectively.
Trichoderma spp. are considered efficient biocontrol agents that can significantly reduce the growth of several soil-borne plant pathogens, such as Rhizoctonia solani, Sclerotium rolfsii, Pythium aphanidermatum and Fusarium oxysporum, through several modes of action. Trichoderma harzianum is one of the important species in the genus Trichoderma; it produces several effective lytic enzymes and antifungal antibiotics, competes with other fungal pathogens and promotes plant growth. The aim of the present study is to predict and analyze the tertiary structures of these enzymes and their potential binding sites using bioinformatic tools and techniques. The protein sequences of the enzymes were retrieved from the UniProt database, their tertiary structures were modelled with SWISS-MODEL Workspace, and the models were validated using the PROCHECK server, which showed that 75% to 90% of residues lie in the favored regions of the Ramachandran plot. These validation steps indicate that the predicted models are stable and can provide insight into the functional aspects of the enzymes that depend on tertiary structure. Furthermore, functional and conserved motifs were predicted using the PROSITE database. These findings allow us to determine the protein families and domains that remain conserved throughout evolution and that may induce or suppress the biological activity of the protein. Ligand binding sites of the enzymes were predicted using the SiteHound server with four different chemical probes, which allows different types of ligand binding site to be studied. Thus, this study provides a scientific basis for 3D structure modelling of lytic and defence enzymes and opens new opportunities for further investigation of the biological control of phytopathogens.
The Li et al. method uses machine learning on feature vectors derived from sequence windows. These data vectors include amino acid properties (from AAIndex [114,115]), conservation data, solvent accessibility values, and structural information. First, for every residue in each protein, a peptide is extracted with the residue in question serving as its center, accounting for the local environment of each amino acid. The peptides are labeled as active or inactive based on the label of their central residue and filtered for homology. Then, large feature vectors are extracted to represent each peptide, by extracting and concatenating a variety of attributes per residue, in the order of the peptide. Their dimensionality D is given by D = NL, where N is the number of residue-level features and L is the length of each peptide (here, 34 and 21, respectively). To remove irrelevant and redundant features from this high-dimensional feature space, the MRMR-IFS feature selection procedure is applied (see Feature Selection). In the reduced space, the attribute vectors of the peptides are then used to construct a Random Forest (RF) classifier, an ensemble learner based on combining the output of multiple decision trees.
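The window-based construction described above can be illustrated with a minimal sketch. It assumes the per-residue features are already available as one list of N numbers per residue, and it pads the chain termini with zeros so that every residue gets a full window; the padding scheme and the function name `window_feature_vectors` are assumptions for illustration, not details taken from Li et al. With N = 34 and L = 21, each vector has D = 34 × 21 = 714 entries.

```python
def window_feature_vectors(residue_features, window=21):
    """Concatenate per-residue features over a window centered on each residue.

    residue_features: list of equal-length feature lists, one per residue.
    Returns one vector of length N * window per residue, in peptide order.
    Zero padding at the termini is an assumption of this sketch.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd so a central residue exists")
    n_feat = len(residue_features[0])
    half = window // 2
    pad = [[0.0] * n_feat for _ in range(half)]
    padded = pad + [list(row) for row in residue_features] + pad

    vectors = []
    for i in range(len(residue_features)):
        vec = []
        # padded[i + half] is residue i, so this slice is centered on it
        for row in padded[i:i + window]:
            vec.extend(row)
        vectors.append(vec)
    return vectors


# Toy protein: 50 residues, 34 features each (feature value = residue index)
toy = [[float(r)] * 34 for r in range(50)]
vectors = window_feature_vectors(toy, window=21)
print(len(vectors), len(vectors[0]))
```

Feature selection and the Random Forest classifier would then operate on these vectors; those steps are left out here because they depend on the specific implementation used by the authors.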
The proximity of Arg515 and Gln518 to the dimerization interface suggests that these residues may play a role in oligomerization. Oligomerization of BChE is known to play an important role in the turnover of BChE in serum. However, these two residues are not conserved in BChE from other species. Gly365 is at the end of a short helix. It is quite far from the cavity defining the active site and is close to the surface. Asanuma et al. have modeled the structure of hBChE based on the structure of AChE, and on the basis of this model they have suggested a role for steric effects in the reduced activity of G365R and other mutants. Prediction of the thermal stability of the G365R mutant does not indicate that this mutation will be strongly destabilizing. However, the model of the tetramer based on molecular dynamics calculations indicates that this residue is in a helix that has close contacts with another monomeric unit. Therefore, the loss of activity in the G365R mutant can be explained by the inability of the mutant to adopt the native oligomeric form. Mutation of Thr250 to Pro produces a silent phenotype, which can be explained by the predicted loss of thermal stability (Table 2). However, Thr250 is not conserved in BChE of other species. Lys267 occurs in a loop on the surface of BChE. This residue is far from the active site. It is spatially close to the terminus of a helix and may play a role in its stabilization. Also, the amino group of the lysine side chain may form hydrogen bonds to some neighboring residues, such as Thr250. The prediction of thermal stability indicates that this mutation can be slightly destabilizing. However, its dRMS value (0.41 Å) is higher than average, indicating a possible role in long-range perturbation of the active site geometry. K267R is listed as a silent mutation in the Esther database.
However, examination of the data in the paper cited in Esther indicates the possibility that the mutation of interest, K267R, may have low activity. Glu255 occurs in a loop on the surface of BChE and its side chain points away from the rest of the protein. Glu255 is not conserved in BChE sequences of other species. In addition, the predictions of thermal stability (Table 2) indicate this mutation would have little or no effect on the thermal stability of the monomer. The computed model of the tetramer of BChE indicates that Glu255, Thr250 and Lys267 are in close proximity to another monomer in the tetramer. This indicates a possible role for these three residues in the stabilization of the tetramer. Failure to form a tetramer may lead to rapid clearance of the
salinity, Sypro-Orange (Life Technologies, Grand Island, NY) was diluted 1:1000 in 1 mL of a 2 mg/mL VvDxr stock. The ninety-six-condition buffer screen created in our lab contained conditions with pH increasing from 4.0 to 9.5 and NaCl concentrations increasing from 0.0 to 1.0 mM. The screen was transferred into the respective wells of a BioRad (Hercules, CA) Hardshell 96-well PCR plate along with VvDxr to give a 1:1 ratio of Sypro-Orange/VvDxr to buffer. The plate was then loaded into the BioRad (Hercules, CA) CFX96 real-time PCR machine and heated in 2°C increments per minute, starting at 30°C and ending at 90°C. As the VvDxr is heated, Sypro-Orange binds to the hydrophobic ends and fluoresces. The melting temperature of the protein is determined by the temperature at which the probe fluoresces. Dr. Nicholas Mank (unpublished) developed a protocol for interpreting protein melting using the derivative results calculated by the CFX96 software. VvDxr stability was also tested by the same method using small-molecule compounds. The compound screens were developed by solubilizing the compounds before putting them in Tris buffer at pH 7.5. The result is three hundred and eighty-four conditions divided into four separate screens. Appendix I gives a list of these conditions.
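As an illustration of derivative-based interpretation of a melt curve, the sketch below estimates the melting temperature as the temperature at which fluorescence rises most steeply (the maximum of dF/dT, estimated by central differences). This is a generic approach under stated assumptions; it is not the CFX96 software's algorithm and not the unpublished protocol mentioned above, and the function name and sample values are hypothetical.

```python
def melting_temperature(temps, fluorescence):
    """Return the temperature at which dF/dT (central difference) is largest.

    temps: increasing temperatures in degrees C.
    fluorescence: fluorescence readings at those temperatures.
    """
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired readings")
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_slope, best_temp = slope, temps[i]
    return best_temp


# Hypothetical readings taken every 2 degrees C around the unfolding transition
temps = [50, 52, 54, 56, 58, 60]
signal = [0.05, 0.10, 0.30, 0.50, 0.90, 0.95]
print(melting_temperature(temps, signal))
```

Comparing the estimate across the wells of a screen would then rank buffer conditions or compounds by how much they shift the transition.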
this Mintseris dataset there were some protein complexes that have multiple chains, such as 1qfw AB:IM. For our study we have taken only the first chains, such as 1qfw A:I, and discarded the other chains. We have generated feature sets of different sizes. In Table 5.7, "6 features" refers to the features proposed in NOXClass. We then added 20 number-based amino acid composition features of the interface, giving 26 features. After this we added 20 area-based amino acid composition features of the interface, giving 46 features. Table 5.7 shows that with 6 features FDA with a quadratic classifier achieved the highest accuracy, 77.96%, and with 26 features CDA with a linear classifier achieved 77.54% accuracy. A slight decrease in accuracy is thus seen when the number-based amino acid composition features are added. This is expected because these 20 features are very small integers, and adding them increases the linear dependency among the columns of the feature matrix, lowering its column rank. This phenomenon results in slightly lower classification accuracy. With 46 features, HDA and CDA with a quadratic classifier achieved the highest accuracy of 79.25%. This is an increase of 1.29 percentage points in accuracy from 6 features to 46 features. From these results it can be concluded that our proposed 40 features have helped to predict obligate and non-obligate protein-protein interactions with higher accuracy.
Docking programs are in high demand for computer-aided structure-based drug design [78, 80]. The latter is used at the initial stage of the long (10–15 years) and expensive (> 1 billion USD) rational drug design pipeline. This initial stage is cheaper than all the following stages (preclinical testing in animals, three phases of clinical trials in humans, and the approval and post-approval stages), but it plays the key role in the success of the whole pipeline because it defines the diversity of the active candidate compounds, their selectivity and their low toxicity. The main idea of rational drug development is to find a compound whose molecules bind specifically to a definite region (the active site) of a given biomolecule, e.g. a protein responsible for disease progression, and stop or change the latter. The larger the binding energy is, the lower the concentration of the active compound needed to achieve the desired effect. Docking programs position the candidate molecules (ligands) in the definite site of the target protein and calculate the protein-ligand binding free energy, which is directly connected with the measured binding (dissociation) constant of the active compound. There are several dozen docking programs and a dozen Internet docking sites available either for free or on a commercial basis [14, 66]. Despite the obvious progress of the docking technique and plenty of success stories, there are still various challenges [9, 96, 107]. While in many cases the positioning accuracy of docking is satisfactory, the accuracy of binding energy calculations is insufficient to perform hit-to-lead optimization on the basis of docking results. An increase in docking accuracy should result in an increase in the effectiveness of the whole process of new drug development.
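The connection between binding free energy and the dissociation constant mentioned above is usually written as ΔG = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The short sketch below applies this textbook relation; the function names and the default temperature of 298.15 K are choices made for illustration and do not come from any particular docking program.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return GAS_CONSTANT * temperature * math.log(kd_molar)


def dissociation_constant(delta_g, temperature=298.15):
    """Dissociation constant in mol/L from a binding free energy in kcal/mol."""
    return math.exp(delta_g / (GAS_CONSTANT * temperature))


# A nanomolar binder corresponds to roughly -12.3 kcal/mol at room temperature
print(round(binding_free_energy(1e-9), 2))
```

Because the relation is exponential, an error of about 1.4 kcal/mol in a computed binding free energy changes the predicted dissociation constant by roughly a factor of ten at room temperature, which is one way to see why the accuracy of energy calculations matters for hit-to-lead optimization.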
Multiple interactome analysis is another popular way to integrate the knowledge derived from heterogeneous protein or gene networks. In a similar attempt, integrated protein-protein interaction and gene expression data were derived from the literature and public databases. The study started with data related to three existing viruses (SARS-COV, MERS-COV, HCOV-229E) to infer the interactome of SARS-COV-2. It also integrated an additional PPI database to reconstruct the action of SARS-COV-2 at the proteome level, obtaining a network consisting of 13,020 nodes and 71,496 interactions. In parallel, the authors inferred a gene co-expression network using the random walk with restart (RWR) algorithm, with the S-glycoproteins of SARS-COV, MERS-COV, and HCOV-229E as seeds. Similarly, to build the SARS-COV-2 interactome, a phylogenetically close HCOV-host interactome network was built by assembling four known HCOVs (SARS-COV, MERS-COV, HCOV-229E, and HCOV-NL63), one mouse MHV, and one avian IBV (N protein). In a novel attempt, the codon usage pattern was used to infer possible interactions between 26 SARS-COV-2 proteins and selected host proteins involved in 17 major cell signaling pathways. The authors used the RSCU score as a measure of codon usage bias to assess the proximity between a pair of host and viral proteins. The MAPK pathway is highlighted as the worst-affected pathway during COVID-19.
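For reference, the relative synonymous codon usage (RSCU) of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes RSCU for a single coding sequence using the standard genetic code; how the cited study compared RSCU profiles between host and viral proteins is not described here, so only the score itself is shown, and the function name is an assumption.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(coding_sequence):
    """Return {codon: RSCU} for every amino acid observed in the sequence."""
    seq = coding_sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded
            families[aa].append(codon)

    scores = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not present in this sequence
        for c in codons:
            scores[c] = counts[c] * len(codons) / total
    return scores


# Four alanine codons: GCT is used three times, GCC once
print(rscu("GCTGCTGCTGCC")["GCT"])
```

A value above 1 means the codon is used more often than expected under uniform synonymous usage, and a value below 1 means it is used less often.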
Lysine residues play an important role in maintaining protein structure and function, through hydrogen bonds or electrostatic interactions (Patel et al., 2011). The role of the PDI active-site-flanking lysines has not been well studied. In fact, to our knowledge only one study has been conducted on the importance of this residue in PDI (Kimura et al., 2004). In that earlier study, the researchers mutated away the cysteine residues in the a’ domain and focused on the lysine in the a domain (Kimura et al., 2004). The lysine residue in the a domain was then mutated to glutamine and arginine, and a decrease in activity was observed (Kimura et al., 2004). The assay used in that study involved the reduction of insulin by PDI, which was monitored by spectrophotometry. However, neither that study nor, to our knowledge, any other has examined how lysine residues affect the redox activity of PDI or how they may be used to regulate PDI activity.
tyrosine kinase, is frequently overexpressed in non-small cell lung cancer (NSCLC). These receptors play an important role in tumor cell survival, and activated, phosphorylated EGFR results in the phosphorylation of downstream proteins that cause cell proliferation, invasion, metastasis, and inhibition of apoptosis. Expression appears to depend on histological subtype: it is most frequent in squamous cell carcinoma but is also frequent in adenocarcinomas and large cell carcinomas. Protein-ligand interaction plays an important part in protein function. Both the ligand and its binding site are essential components for understanding how the protein-ligand complex functions. Most cancers are highly invasive, and recurrence remains a problem even after surgery, chemotherapy and radiation treatment. MMPs are a family of highly homologous metal-dependent endopeptidases that can cleave most of the constituents of the extracellular matrix, such as collagen, fibronectin, laminin and elastin, and are inhibited by endogenous tissue inhibitors of metalloproteinases (TIMPs) or by synthetic inhibitors such as EDTA and phenanthroline. Comprehension of the exact mechanisms involved in MMP activity has been complicated by the differing expression patterns and roles of these proteases within the tumor. Further complicating the situation, these enzymes have overlapping substrate specificities, which makes it difficult to design inhibitors that are appropriate for only one protease. In addition, MMPs are present at low concentrations globally, but they are concentrated on the surface of cells at highly elevated levels and in activated form. Cancer is usually caused by altered interactions among multiple genes rather than by changes in a single causal gene, and functional interactions predict the priority of highly connected nodes and their neighbors.
We generated fixed-length protein sequences using a window size of 9, each with a phosphorylatable residue (serine, threonine, or tyrosine) at its center. If the center residue of the sequence is known to be phosphorylated, the sequence is “positive”; otherwise it is “negative”. For both positive and negative sequences, redundant ones were removed using skipredundant. The parameters for redundancy removal were as follows: the acceptable threshold percentage of similarity was set to 0%–20%, the gap opening penalty to 10, and the gap extension penalty to 0.5. Table 2 shows the number of positive and negative sequences before and after removing redundant sequences for each residue.
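The window extraction and labeling step can be sketched as follows. The sketch assumes that known phosphorylation sites are given as 0-based positions and that candidate residues too close to a terminus to fill a complete window are skipped; both are assumptions for illustration, as is the function name. Redundancy removal with skipredundant is a separate step and is not reproduced here.

```python
def extract_windows(sequence, phospho_sites, window=9):
    """Return (peptide, label) pairs for every S/T/Y with a full window.

    sequence: protein sequence as a string of one-letter codes.
    phospho_sites: set of 0-based positions known to be phosphorylated.
    Residues without a complete window are skipped (an assumption).
    """
    half = window // 2
    samples = []
    for i, residue in enumerate(sequence):
        if residue not in "STY":
            continue
        if i - half < 0 or i + half >= len(sequence):
            continue
        peptide = sequence[i - half:i + half + 1]
        label = "positive" if i in phospho_sites else "negative"
        samples.append((peptide, label))
    return samples


# Hypothetical sequence with one annotated phosphoserine at position 4
for peptide, label in extract_windows("MKRASPEELTGAYLK", {4}):
    print(peptide, label)
```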
Out of all the genes that have been found to be overexpressed in hepatocellular carcinoma (HCC), two genes, namely Ube2c and gankyrin, are of utmost importance. The Ube2c gene, located on chromosome 20q13, belongs to the E2 gene family and codes for a 19.6 kDa protein involved in ubiquitin-dependent proteolysis. Gankyrin was identified as the p28 component of the 26S proteasome [6-8]. Since gankyrin is involved in the ubiquitylation of the tumor suppressor protein p53, it promises to be a potential target for drug therapy against hepatocellular carcinoma. Gankyrin is identical to one of the components of the 26S proteasome. It contains an ankyrin repeat stack (6 repeats) with a 38-amino-acid N-terminal domain, and the first letter “g” stands for “gann”, meaning cancer in Japanese. It specifically interacts with the S6b ATPase of the 19S regulatory complex of the 26S proteasome [7,8]. Gankyrin is highly conserved throughout evolution and is localized on human chromosome Xq22.3. Gankyrin expression is increased in all HCCs compared with non-cancerous liver tissues. Hence it was found that almost all HCCs overexpress this novel gene, gankyrin. Gankyrin binds to and potentiates the transcriptional activity of p53. It facilitates the binding of MDM2 to p53 and enhances the ability of MDM2 to mono- and poly-ubiquitylate p53. Further, gankyrin recruits an MDM2-p53 complex to the 26S proteasome and accelerates the degradation of p53 in an MDM2-dependent manner. In vitro, MDM2 catalyzes the addition of single ubiquitin moieties to a cluster of six C-terminal lysines in p53. MDM2 does not efficiently poly-ubiquitylate p53 under usual in vitro conditions, and the E4 activity of p300 is required in addition to MDM2 for p53 polyubiquitylation.
performed using computer programs such as AutoDock, ArgusLab and Discovery Studio 3.1. 25-26 All of these molecules were taken from a ligand database or drawn with chemical drawing software such as ChemDraw Ultra (2D and 3D) in mol or pdb format, and were stored in an MOE database in mdb format or in the PubChem database. All of the molecules were docked against the same pocket in which the reference drug was bound. Molecules were selected from a library of molecules and were further assessed by interaction analysis. The finalized molecules showed interactions with the active residue and with other residues. 27
Homology, in the context of biology, denotes a relationship of similarity between two objects that arises through evolutionary history. Our forearm bones are homologous with the forearm bones of chimpanzees. Genes from humans that encode the same gene product as a related gene in E. coli can be homologous. Proteins that are nearly identical can be homologous, depending on the database definition of homology. While two proteins are either homologous or not, there is often a measure of dissimilarity between homologous proteins. In this project, the databases referenced contain designations for proteins based on 3D homology rather than sequence homology. When looking at the amino acid sequence of a protein, it is tough to determine true relationships until you get to a 3D view. As such, this project is
The prediction of acceptor splice sites is, indeed, a difficult task. There are certain characteristics, however, that have recently been observed which may offer some assistance. For example, the first AG dinucleotide downstream of the BPS is usually part of the actual acceptor splice site [6,7]. Also, a search does in fact take place in the absence of the authentic acceptor splice site, and those candidates that are more distantly located than others compete less efficiently. These findings have helped to shape a hypothesis that a scanning
Sequence alignment is a way of arranging protein sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. For homology modeling, template selection and sequence alignment between the target and template are among the main criteria. Hence, PSI-BLAST was performed to select the template. Based on the BLAST results, the structure with PDB ID 4PYP, which has 65% similarity with the target protein and an E-value of 0.0 (indicating a highly significant match to the template), was selected as the template for further analysis. The alignment between the target and the selected template was carried out using Clustal Omega. The alignment, including the residues conserved in both the template and query sequences, is shown in Figure 1. Identical residues in the query and template sequences are highlighted in the same color.
Figure 2 Overexpression, purification, functional activity and pH-dependent structural changes of EhPSAT. (A) SDS-PAGE analysis of E. coli lysate overexpressing EhPSAT and of the purified protein. Lanes 1-4 represent molecular weight markers, uninduced culture, induced culture and purified protein, respectively. (B) pH-dependent enzymatic activity profile of EhPSAT. The data are represented as percentage relative activity, with the highest activity, observed at pH 8.5, taken as 100%; each point represents the mean ± SD of three independent measurements. (C) pH-induced changes in the secondary structure of EhPSAT: the effect of pH on the CD signal at 222 nm. The inset shows far-UV CD spectra at pH 6 (solid line), 7 (dashed line), 8 (dotted line) and 9 (dash-dotted line). (D) and (E) pH-induced changes in PyP and Trp fluorescence polarization, respectively. In both panels the filled and open symbols represent protein samples in the absence and presence of 200 mM NaCl, respectively. (F) pH-induced changes in the fluorescence emission spectra of EhPSAT excited at 295 nm. Curves 1-4 represent protein samples incubated at pH 6, 7, 8 and 9, respectively.
atom, accurately recapitulating the experimentally-observed substrate binding orientation in the TxtC-thaxtomin D structure (Figure S12). In another state, arising from rotation about the β-γ bond of the 4-nitrotryptophan residue, the diketopiperazine is flipped such that the phenyl group is pointing towards the roof of the active site (Figures 8 and S12). This positions the -hydrogen atom of the Phe residue in close proximity to the heme iron atom and it was found to adopt a near attack conformation in 6% of the frames in MD simulations (Figures 8 and Table S9). No interconversion between the different substrate binding orientations was observed in the MD simulations. Close examination of the substrate binding site in TxtC indicates that a steric clash between the side chain of the Phe residue in the substrate and the heme / residues 231-236 in the I helix prevents this. The switch in conformation of the monohydroxylated intermediate that enables hydroxylation of its phenyl group thus appears to require release from the active site of TxtC.
To test if protein B2 is required for spherule formation, we first generated infectious FHV virions bearing a viral genome that did not express protein B2 (FHVΔB2) due to two previously documented mutations (Fig. 4A) (8, 31) that change the B2 initiating methionine to a threonine (M1T) and serine 58 to a stop codon (S58stop). These nucleotide substitutions could be made without affecting the protein A amino acid sequence because B2 is in a +1 reading frame with respect to the protein A ORF. FHV RNA1 with these B2 mutations and WT RNA2 were in vitro transcribed and cotransfected into Drosophila cells whose RNAi machinery was inhibited by depleting the RNAi effector Argonaute2 (ago2) by RNA silencing, which allows FHV RNA replication in the absence of protein B2 (31). FHVΔB2 virions were collected from the transfected cells, and titers were determined on ago2-silenced Drosophila cells. Drosophila DL-1 cells were used for these experiments because their extensive lysis provides a more accurate titer by plaque assay than S2 cells (13).
Various proteins were downloaded from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB), which contains X-ray crystal structures of proteins and other macromolecules. The structures were then corrected by adding missing hydrogens and atoms and fixing incorrect bond types, and the charges were balanced.
2. Ligand preparation