HOMOLOGY MODELING OF PHASEOLIN FROM KIDNEY BEAN (Phaseolus vulgaris L.): ENERGY MINIMIZATION AND STRUCTURE ANALYSIS

Sudipta Mondal, Department of Biotechnology, The University of Burdwan, Golapbag, Burdwan-713104, West Bengal, India (sudipta.mondal1988@gmail.com)

Buddhadev Mondal, Burdwan Raj College, Burdwan-713104, West Bengal, India (buddhadev.mondal26@gmail.com)

Amal Kumar Bandyopadhyay*, Assistant Professor, Department of Biotechnology, The University of Burdwan, Golapbag, Burdwan-713104, West Bengal, India (akbanerjee@biotech.buruniv.ac.in)

Abstract

Kidney bean, also known as French bean, is a very popular legume worldwide. Phaseolin is its major seed storage protein, also known as a storage globulin. Very little is known about its structure-function relationship. The protein sequence of phaseolin was obtained in FASTA format from the UniProt database. Because structural information on the protein is lacking, structure prediction was necessary, as protein structure plays an important role in function. The present work reports a high-quality model structure of phaseolin produced with the SWISS MODEL web server. The model was first minimized with AMMP, then placed in an explicit water box and minimized again with NAMD. After this dual minimization, the model was subjected to a series of quality tests. The Ramachandran plot was calculated with the VegaZZ user interface, and ProQ was used to assess the quality of the model. Structural superimposition was performed in the STRAP interface. The CASTp web tool was used to predict active sites with their respective volumes and areas. Finally, the ProFunc web tool was used to analyze the structure. Thus, the model decodes the sequence information correctly and represents the true functional state.

Keywords: Phaseolin; SWISS MODEL; AMMP Minimization; NAMD Minimization; CASTp; ProFunc.

1. Introduction

Phaseolin is the main reserve globulin in seeds of the French bean. It was first isolated, characterized and named by Thomas Burr Osborne in 1894. Protein structure plays an important role in function. Many sequences of this protein are available in the UniProt database; however, structural information on the protein is very limited.


expected on the basis of sequence conservation alone. The sequence alignment and template structure were then used to produce a structural model of the target. Because protein structures are more conserved than DNA sequences, detectable levels of sequence similarity usually imply significant structural similarity. The quality of the homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by alignment gaps (indels), which indicate a structural region present in the target but not in the template, and further by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography).

Model quality declines with decreasing sequence identity: a typical model has a root mean square deviation (RMSD) of ~1–2 Å between matched Cα atoms at 70% sequence identity, but only 2–4 Å agreement at 25% sequence identity. The errors are significantly higher in loop regions, where the amino acid sequences of the target and template proteins may be completely different. Of the many online servers available for homology modeling, SWISS MODEL was used in the present work to generate the structure. SWISS-MODEL (http://swissmodel.expasy.org) is a server for automated comparative modeling of three-dimensional (3D) protein structures. It pioneered automated modeling in 1993 and is the most widely used free web-based automated modeling facility today. After the model structure was obtained, AMMP minimization was performed in vacuum. Finally, NAMD minimization was performed with the conjugate gradient algorithm in an explicit water box (dimensions 58 × 78 × 89 Å).
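For reference, RMSD values like those quoted above are conventionally computed over the N matched Cα atom pairs after the model and template have been optimally superimposed. Here x_i and y_i denote the coordinates of the i-th matched Cα atom in the model and in the template:

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^{2}}
```

For example, if three matched atoms deviate by 1, 2 and 2 Å, then RMSD = √((1 + 4 + 4)/3) = √3 ≈ 1.73 Å.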

The Ramachandran plot is a plot of the main-chain dihedral angles phi (φ) and psi (ψ) of the amino acid residues in a protein. Ramachandran et al. (1963) developed the plot, which is essentially φ versus ψ. In structural biology the plot is of great value because it helps to judge the quality and geometry of a solved structure.
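The φ and ψ values on a Ramachandran plot are ordinary torsion angles, so they can be computed directly from backbone coordinates. The sketch below assumes plain (x, y, z) tuples rather than any particular PDB library, and the function names are illustrative. φ of residue i uses the atoms C(i-1), N(i), Cα(i), C(i). ψ uses N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone (phi, psi) of one residue from its own and neighbouring atoms."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)
```

A trans arrangement of four atoms gives ±180°, a cis arrangement gives 0°. Positive angles follow the usual clockwise-when-viewed-along-the-central-bond convention.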

ProQ combines structural features to predict both the global and the local quality of a model. It feeds features such as atom-atom contacts, residue-residue contacts, surface area exposure and secondary structure agreement into a neural network trained to predict quality. The term structural superposition refers to the rotations and translations performed on one molecular structure to make it match another structure or structures. Inherent in the definition of structural superposition is an assumed measure of structural similarity.
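The optimal rotation and translation for a superposition can be found in closed form. The sketch below uses Horn's unit-quaternion method, an alternative to the better-known Kabsch SVD approach, chosen here so that only the Python standard library is needed. It is not the specific algorithm used by STRAP or TM-align, and the function names are illustrative. The inputs are equal-length lists of matched (x, y, z) coordinates, for example Cα atoms of model and template.

```python
import math


def _centroid(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]


def _jacobi_eigen(a, sweeps=50, tol=1e-20):
    """Cyclic Jacobi: eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def superpose(mobile, reference):
    """Rotation R, translation t and RMSD minimizing sum |R x + t - y|^2 (Horn's method)."""
    cm, cr = _centroid(mobile), _centroid(reference)
    x = [[p[k] - cm[k] for k in range(3)] for p in mobile]
    y = [[p[k] - cr[k] for k in range(3)] for p in reference]
    # Cross-covariance S[i][j] = sum over atoms of x_i * y_j
    s = [[sum(a[i] * b[j] for a, b in zip(x, y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    values, vectors = _jacobi_eigen(n)
    best = max(range(4), key=lambda i: values[i])
    q0, qx, qy, qz = (vectors[k][best] for k in range(4))  # optimal unit quaternion
    r = [[q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
         [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
         [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz]]
    t = [cr[i] - sum(r[i][j] * cm[j] for j in range(3)) for i in range(3)]
    moved = [[sum(r[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)] for p in mobile]
    sq = sum((a[k] - b[k]) ** 2 for a, b in zip(moved, reference) for k in range(3))
    return r, t, math.sqrt(sq / len(mobile))
```

The returned RMSD is measured after applying the optimal rotation and translation to the mobile structure. When the two coordinate sets differ only by a rigid-body motion, the RMSD is essentially zero.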

Binding sites and active sites of proteins and DNA are often associated with structural pockets and cavities. The CASTp server uses the weighted Delaunay triangulation and the alpha complex for shape measurements. It identifies and measures both surface-accessible pockets and interior inaccessible cavities in proteins and other molecules. It analytically measures the area and volume of each pocket and cavity, in both the solvent accessible surface (SA, Richards' surface) and the molecular surface (MS, Connolly's surface). For each pocket it also measures the number of mouth openings, the area of the openings and the circumference of the mouth lips, in both SA and MS surfaces. Finally, the ProFunc online web tool was used to predict the function of the model protein. The ProFunc web server (http://www.ebi.ac.uk/thornton-srv/databases/profunc/index.html) was developed to help identify the likely biochemical function of a protein from its three-dimensional structure.

2. Materials and Methods

2.1 Obtaining sequence

To perform effective bioinformatics analysis of sequence and structure, it is essential to collect protein sequences from well-annotated sequence databases. The protein sequence of phaseolin (ID Q43633) was collected from the UniProt database (http://www.uniprot.org/) and downloaded in FASTA format. The protein is 430 amino acids long, of which the first 24 amino acids form the signal peptide. The signal peptide was deleted from the FASTA sequence, and the modified sequence was used for all further experiments.
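The signal-peptide trimming step can be sketched as a small FASTA edit. This is a minimal illustration assuming a single-record FASTA file. The function names are hypothetical, and the example record below is a synthetic placeholder, not the Q43633 sequence. For the 430-residue precursor, removing 24 residues should leave a 406-residue mature sequence.

```python
def parse_fasta(text):
    """Parse a single-record FASTA string into (header, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    return lines[0][1:], "".join(lines[1:])


def remove_signal_peptide(seq, signal_length=24):
    """Drop the N-terminal signal peptide (24 residues for phaseolin, per the text)."""
    if signal_length >= len(seq):
        raise ValueError("signal peptide longer than sequence")
    return seq[signal_length:]


def to_fasta(header, seq, width=60):
    """Format a sequence as FASTA with fixed-width lines."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{header}\n{body}\n"


# Synthetic example: 24 placeholder residues followed by a toy "mature" chain.
record = ">toy_precursor\n" + "X" * 24 + "ACDEFGHIKL\nMNPQRSTVWY\n"
header, precursor = parse_fasta(record)
mature = remove_signal_peptide(precursor)
print(to_fasta(header + "_mature", mature, width=10))
```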

2.2 BLAST for identification of template

BLAST (Basic Local Alignment Search Tool) is an effective bioinformatics tool. On the BLAST homepage (http://blast.ncbi.nlm.nih.gov/Blast.cgi), the protein BLAST option was selected. On the next page, the FASTA-format sequence was uploaded, the database was changed to Protein Data Bank proteins (pdb), and the BLAST button was clicked. From the BLAST results, 2PHL_A was selected as the template (E-value 0.0; maximum identity 96%). The protein sequence and structure of 2PHL_A were then downloaded.
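In this work the template was chosen by inspecting the web BLAST results. If the same search were run locally with BLAST+ and tabular output (`-outfmt 6`, whose 12 default columns end with `evalue` and `bitscore`), the selection could be scripted as below. Two parts are assumptions for illustration, not criteria stated by the authors: the ranking rule (lowest E-value, then highest identity, then highest bit score) and the 30% identity floor. The hit rows in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    subject: str     # column 2: subject (template) identifier
    pident: float    # column 3: percent identity
    evalue: float    # column 11: E-value
    bitscore: float  # column 12: bit score


def parse_outfmt6(text: str) -> List[Hit]:
    """Parse BLAST+ tabular output with the 12 default -outfmt 6 columns."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        hits.append(Hit(cols[1], float(cols[2]), float(cols[10]), float(cols[11])))
    return hits


def pick_template(hits: List[Hit], min_identity: float = 30.0) -> Optional[Hit]:
    """Best hit: lowest E-value, ties broken by higher identity, then bit score."""
    candidates = [h for h in hits if h.pident >= min_identity]
    if not candidates:
        return None
    return min(candidates, key=lambda h: (h.evalue, -h.pident, -h.bitscore))


# Hypothetical rows (query, subject, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore).
rows = "\n".join([
    "query\ttemplate_A\t96.0\t397\t16\t0\t1\t397\t1\t397\t0.0\t800",
    "query\ttemplate_B\t45.0\t350\t190\t3\t10\t360\t5\t355\t1e-80\t300",
])
print(pick_template(parse_outfmt6(rows)))
```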

2.3 CLUSTALW2 for sequence alignment


2.4 SWISS MODEL

SWISS-MODEL (http://swissmodel.expasy.org/), an effective structural bioinformatics web server, was used to produce the structure from the protein sequence. It offers three modes for structure production: Automated Mode, Alignment Mode and Project Mode. Alignment Mode was selected. The e-mail address and project title were entered, and the alignment input format was changed to CLUSTALW. The alignment file was then uploaded and the submit alignment option was clicked. On the next page, the template PDB code and chain ID were entered, and submit alignment was clicked again. It was clicked once more on the following page. After a few minutes the result page appeared, and the model structure was downloaded from it in PDB format.
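Before an alignment is submitted in Alignment Mode, it can be useful to check the target-template sequence identity that the alignment implies. The function below is a minimal, hypothetical sketch. It takes two already-aligned, equal-length strings with gaps written as '-'. Identity definitions differ between tools: this one counts identical residues over columns where neither sequence has a gap, so it need not reproduce the identity figure reported by BLAST exactly.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_target.upper(), aligned_template.upper())
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Toy pair: 5 gap-free columns, 4 identical -> 80.0
print(percent_identity("ACD-EF", "ACDGEY"))
```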

2.5 Energy minimization

The model structure obtained as described in section 2.4 was refined by energy minimization. A two-step energy minimization was performed: first in vacuum (i.e. dielectric constant 1.0) with AMMP, and then in an explicit water box with NAMD. The vacuum minimization (AMMP) was run for 3000 steps using the conjugate gradient optimizer, with a tolerance (Toler) of 0.01 and 0 steepest-descent steps. This minimization was considered complete when Δβ < 1.5% between two consecutive models and IMAXF (the absolute maximum force on any atom) < 10% of the total potential energy. The best frame thus obtained was refined by NAMD minimization for 5000 iterations. After minimization, the lowest-energy structure was selected and subjected to quality assessment with respect to its geometry and energy.
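The minimizers in AMMP and NAMD are far more elaborate, but the core idea of conjugate-gradient minimization with a maximum-force stopping rule can be sketched in a few lines. The example below is a generic Polak-Ribière (PR+) nonlinear conjugate gradient with a backtracking line search. It is applied to a hypothetical two-variable quadratic "energy" that stands in for a force field. The `max_steps` and `tol` parameters mirror the 3000 steps and 0.01 tolerance quoted above only by analogy. Nothing here reproduces either program's force field or exact convergence test.

```python
def conjugate_gradient(energy, gradient, x0, max_steps=3000, tol=0.01):
    """Polak-Ribiere (PR+) nonlinear conjugate gradient with backtracking line search.

    Stops when the largest absolute gradient component (a "maximum force")
    falls below tol, or after max_steps iterations. Returns (x, E(x), steps).
    """
    x = [float(v) for v in x0]
    g = gradient(x)
    d = [-gi for gi in g]
    for step in range(max_steps):
        if max(abs(gi) for gi in g) < tol:
            return x, energy(x), step
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:  # not a descent direction: restart along steepest descent
            d = [-gi for gi in g]
            slope = sum(gi * di for gi, di in zip(g, d))
        e0, alpha = energy(x), 1.0
        while True:  # halve the step until the Armijo sufficient-decrease test passes
            x_new = [xi + alpha * di for xi, di in zip(x, d)]
            if energy(x_new) <= e0 + 1e-4 * alpha * slope or alpha < 1e-12:
                break
            alpha *= 0.5
        g_new = gradient(x_new)
        # PR+ coefficient, clipped at zero so the method restarts automatically
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g))
                   / sum(go * go for go in g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x, energy(x), max_steps


# Hypothetical two-variable "energy" with its minimum at (1, -2).
def toy_energy(p):
    return (p[0] - 1.0) ** 2 + 10.0 * (p[1] + 2.0) ** 2


def toy_gradient(p):
    return [2.0 * (p[0] - 1.0), 20.0 * (p[1] + 2.0)]


x_min, e_min, n_steps = conjugate_gradient(toy_energy, toy_gradient, [0.0, 0.0])
```

Conjugate gradient is generally preferred over pure steepest descent for this kind of refinement. Reusing the previous search direction avoids the zig-zagging that steepest descent shows in narrow energy valleys.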

2.6 Ramachandran Plot calculation by using VegaZZ

The final model protein was opened in the VegaZZ software, and the Calculate option followed by the Ramachandran plot option was selected. The resulting Ramachandran plot was saved.
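The two percentages reported for the Ramachandran plot are simple fractions of residue counts. The first counts residues inside the plot (red + yellow regions). The second counts residues without steric clashes (red region only). Below is a minimal sketch that assumes the per-residue region colour has already been exported from the plot as a list of labels. The label strings and the function name are hypothetical, not a VegaZZ output format.

```python
from collections import Counter


def ramachandran_summary(regions):
    """Return (% residues in red + yellow regions, % residues in red regions)."""
    total = len(regions)
    if total == 0:
        raise ValueError("no residues")
    counts = Counter(regions)  # missing labels count as 0
    inside = 100.0 * (counts["red"] + counts["yellow"]) / total
    red = 100.0 * counts["red"] / total
    return inside, red


# Toy input: 2 red, 1 yellow, 1 outside -> (75.0, 50.0)
print(ramachandran_summary(["red", "red", "yellow", "outside"]))
```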

2.7 Protein structure validation by using ProQ

ProQ (http://www.sbc.su.se/~bjornw/ProQ/ProQ.html) is a neural-network-based predictor that estimates the quality of a protein model from a number of structural features. ProQ is optimized to find correct models, in contrast to other methods that are optimized to find native structures. It predicts two quality measures, LGscore and MaxSub. On the home page, the ProQ web server option was selected. The final model structure was uploaded on the next page and the submit button was clicked. The predicted LGscore and MaxSub values were then obtained.
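ProQ predicts MaxSub without a reference structure. For orientation, when a reference structure is available, MaxSub is computed from per-residue distances d_i (Å) between model and reference after superposition. Each residue within the 3.5 Å cutoff contributes 1/(1 + (d_i/3.5)²), and the sum is divided by the length of the reference structure. The sketch below only scores a given list of distances. The full MaxSub procedure also searches for the largest well-superimposed subset of residues, which is omitted here, and the function name is illustrative.

```python
def maxsub_score(distances, n_reference, cutoff=3.5):
    """Simplified MaxSub from per-residue distances (Å) of an existing superposition."""
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    total = sum(1.0 / (1.0 + (d / cutoff) ** 2) for d in distances if d < cutoff)
    return total / n_reference


# 0.0 Å contributes 1.0, 1.75 Å contributes 0.8, 5.0 Å is beyond the cutoff -> 1.8 / 3 = 0.6
print(maxsub_score([0.0, 1.75, 5.0], 3))
```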

2.8 Structural Superimposition

Structural superimposition is a very effective bioinformatics tool for comparing two structures. Here it was performed in the STRAP interface. The model protein and the template protein (2PHL_A) were loaded in STRAP. The align option and then superimpose structures were clicked, and the Superimpose3D_TM_align option was selected. The reference and mobile proteins were chosen and the Go option was clicked. The superimposition was then viewed with the PyMOL plug-in.
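TM-align, used through the Superimpose3D_TM_align option, searches for the alignment and superposition that maximize the TM-score. Given per-residue Cα distances d_i for an alignment that has already been superimposed, the score is TM = (1/L) Σ 1/(1 + (d_i/d0)²). Here d0 = 1.24 (L - 15)^(1/3) - 1.8, and L is the length of the reference structure. The sketch below only evaluates this formula; it does not perform TM-align's alignment or superposition search. The 0.5 Å floor on d0 for very short chains follows the TM-score programs as we understand them, and is worth checking against their documentation. The function names are illustrative.

```python
def tm_d0(length):
    """Length-dependent distance scale d0 (Å) used by the TM-score."""
    if length <= 15:
        return 0.5  # formula undefined or negative for very short chains
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, reference_length):
    """TM-score from distances (Å) of aligned residue pairs after superposition."""
    d0 = tm_d0(reference_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / reference_length


print(round(tm_d0(100), 3))         # d0 for a 100-residue reference, about 3.65 Å
print(tm_score([0.0] * 100, 100))   # perfect match of all residues -> 1.0
```

Unlike RMSD, the TM-score weights small deviations more than large ones, so a few badly placed loop residues do not dominate the comparison.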

2.9 Structure Analysis by using CASTp

CASTp (Computed Atlas of Surface Topography of proteins) is a web tool used to predict active sites with their respective volumes and areas. On the home page (http://sts-fw.bioengr.uic.edu/castp/calculation.php), the model structure was uploaded and the “Submit” button was clicked. The results appeared on the next page.

2.10 Function prediction by using ProFunc

The ProFunc server was developed to identify the likely biochemical function of a protein from its three-dimensional structure. The model structure was uploaded on the ProFunc homepage. On the next page, the name, institute/company, e-mail address and protein name were entered, and the RUN button was clicked. The results were sent to the e-mail address provided. Finally, the sequence motifs, matching fold data and 3D functional templates were obtained.

3. Results and Discussion


good LGscore and MaxSub values, shown in Table 4.2, were obtained from ProQ. These values indicate that the structure is good according to the ProQ validation criteria.

Structural superimposition was performed in the STRAP interface; the superimposition, viewed with the PyMOL plug-in, is shown in Fig.4.6. The CASTp result for this structure is shown in Fig.4.7. From the CASTp result, the active site of the protein can easily be identified. It covers an area of about 385.3 Å² and a volume of about 720.9 Å³.

The ProFunc result page reports the following. Five motifs were matched in the scan against PROSITE, PRINTS, Pfam-A, TIGRFAM, PROFILES and PRODOM motifs. The FASTA search found 35 matching sequences and the BLAST search found 50. There were 424 significant structural matches, and 1 nest was located in the structure. The generated protein structure was thus passed through a series of computational experiments, from which we conclude that the model is of high quality. We hope it will help researchers in further studies, all the more so because the databases contain very few experimentally solved structures of this protein.

4. Figures, Tables and Photographs

Fig.4.1 QMEAN Z-score plot obtained from SWISS MODEL.

Fig.4.3 Structure of the final model protein.

Fig.4.4 Model protein inside the water box.

Fig.4.5 Ramachandran plot of the model protein.

Fig.4.6 Structural superimposition between the model protein and the template (2PHL_A).

STRUCTURE GENERATED FROM    LG SCORE    MAX SUB
SWISS MODEL                 4.126       0.260

Table 4.2 The LG SCORE and MAX SUB values obtained from ProQ.

STRUCTURE OBTAINED FROM    Residues inside the plot (red + yellow)    Residues without steric clashes (red)
SWISS MODEL                77.90%                                     49.06%


5. Conclusion

Experimentally, protein structures are determined by X-ray crystallography and NMR (nuclear magnetic resonance) spectroscopy. Both methods are time-consuming and laborious, and they require expert technicians. In the bioinformatics approach, by contrast, protein structures are produced by homology modeling using established web servers, which is comparatively easy and time-saving. The RCSB database has only one experimentally solved structure for phaseolin (in duplicate). That is why homology modeling is essential for obtaining insight into the structural details of the protein. The protein sequence of phaseolin was retrieved from the UniProt web site. BLAST was performed for template identification, and the sequence alignment was performed with CLUSTALW2. The alignment was then uploaded to the SWISS MODEL web server to obtain the structure. This structure was subjected to a two-step energy minimization, starting with AMMP and followed by NAMD, to improve its quality. The final structure passes most of the established web-based quality tests (such as ProQ, CASTp and ProFunc). To the best of our knowledge, this model of phaseolin is reported for the first time. We further claim that our phaseolin model is of high quality and expect it to help researchers extract atomic-level information on the structure-function relationship of the protein.

Acknowledgments

The authors are grateful for the computer facility laboratory of DBT, Government of India, in the Department of Biotechnology, The University of Burdwan. SM thanks Sanchari Chowdhury for her inspiration during the execution of the present work.

References

[1] Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990): "Basic local alignment search tool", J. Mol. Biol., 215:403-410.

[2] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997): "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res., 25:3389-3402.

[3] Arnold, K., Bordoli, L., Kopp, J. & Schwede, T. (2006): "The SWISS-MODEL Workspace: a web-based environment for protein structure homology modeling", Bioinformatics, 22:195-201.

[4] Pedretti, A., Villa, L. & Vistoli, G. (2004): "VEGA - an open platform to develop chemo-bio-informatics applications, using plug-in architecture and script programming", J.C.A.M.D., 18:167-173.

[5] Pedretti, A., Villa, L. & Vistoli, G. (2002): "VEGA: a versatile program to convert, handle and visualize molecular structure on Windows-based PCs", J. Mol., 21:47-49.

[6] Pedretti, A., Villa, L. & Vistoli, G. (2003): "Atom-type description language: a universal language to recognize atom types implemented in the VEGA program", Theor. Chem., 109(4):229-32.

[7] Benkert, P., Biasini, M. & Schwede, T. (2011): "Toward the estimation of the absolute quality of individual protein structure models", Bioinformatics, 27(3):343-50.

[8] Cristobal S, Zemla A, Fischer D, Rychlewski L, Elofsson A.(2001): A study of quality measures for protein threading models, BMC Bioinformatics, 2(1):5.

[9] Gish, W. & States, D.J. (1993): "Identification of protein coding regions by database similarity search", Nature Genet., 3:266-272.

[10] Goujon, M., McWilliam, H., Li, W., Valentin, F., Squizzato, S., Paern, J. & Lopez, R. (2010): "A new bioinformatics analysis tools framework at EMBL-EBI", Nucleic Acids Res., 38:W695-9.

[11] Guex, N. and Peitsch, M. C. (1997): SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modeling, Electrophoresis, 18: 2714-2723.

[12] Harrison, R.W. (1993): "Stiffness and energy conservation in molecular dynamics: an improved integrator", J. Comp. Chem., 14:1112-1122.

[13] Harrison, R.W., Chatterjee D., and Weber I.T (1995): "Analysis of six protein structures predicted by comparative modeling techniques." Proteins: Structure Function and Genetics, 23:463-471.

[14] James C. Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D. Skeel, Laxmikant Kale, and Klaus Schulten (2005): Scalable molecular dynamics with NAMD, Journal of Computational Chemistry, 26:1781-1802.

[15] Joe Dundas, Zheng Ouyang, Jeffery Tseng, Andrew Binkowski, Yaron Turpaz, and Jie Liang (2006): CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues, Nucl. Acids Res., 34:W116-W118.

[16] Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009): The SWISS-MODEL Repository and associated resources, Nucleic Acids Research, 37, D387-D392.

[17] Laskowski R A, MacArthur M W, Moss D S, Thornton J M (1993): PROCHECK - a program to check the stereochemical quality of protein structures, J. Appl. Cryst., 26, 283-291.

[18] Laskowski R A, Rullmann J A, MacArthur M W, Kaptein R, Thornton J M (1996): AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR, J Biomol NMR, 8, 477-486.

[19] Laskowski R A, MacArthur M W, Thornton J M (2001): PROCHECK: validation of protein structure coordinates, in International Tables of Crystallography, Volume F. Crystallography of Biological Macromolecules, eds. Rossmann M G & Arnold E, Dordrecht, Kluwer Academic Publishers, 722-725.

[20] Lovell SC, Davis IW, Arendall WB 3rd, de Bakker PI, Word JM, Prisant MG, Richardson JS, Richardson DC (2003): Structure validation by Calpha geometry: phi, psi and Cbeta deviation, Proteins, 50(3):437-450.

[21] Madden, T.L., Tatusov, R.L. & Zhang, J. (1996): "Applications of network BLAST server", Meth. Enzymol., 266:131-141.

[22] Peitsch, M.C. (1995): Protein modeling by E-mail, Bio/Technology, 13:658-660.

[23] Ramachandran, G.N., Ramakrishnan, C. & Sasisekharan, V. (1963): "Stereochemistry of polypeptide chain configurations", J. Mol. Biol., 7:95-99.

[24] Schwede T, Kopp J, Guex N, and Peitsch MC (2003): SWISS-MODEL: an automated protein homology-modeling server, Nucleic Acids Research, 31: 3381-3385.

[25] Zhang Z., Schwartz S., Wagner L., & Miller W. (2000): "A greedy algorithm for aligning DNA sequences", J Comput Biol, 7(1-2):203-14.
