P G DIPLOMA IN BIOINFORMATICS

Module Name    Course Code    Name of the Course    Credits

Module I Basic Bioinformatics

PGD BINF 301 Introduction to Bioinformatics and Databases 2

PGD BINF 302 Genome and Protein Sequence Analysis 2

PGD BINF 303 Genomics and Proteomics 2

PGD BINF 304 Lab : Bioinformatics Databases 2

PGD BINF 305 Lab : Sequence Analysis 2

Module II Applied Bioinformatics

PGD BINF 401 Systems Biology 2

PGD BINF 402 Structural Biology 2

PGD BINF 403 Computer Aided Drug Design 2

PGD BINF 404 Lab : Structural Biology 2

Introduction to Bioinformatics and Databases (2 Credits)

Unit-I

Bioinformatics: an overview - Introduction to Computational Biology and Bioinformatics; some of the biological problems that require computational methods for their solution; Role of internet and www in bioinformatics.

Unit-II

Biological Data Acquisition – The form of biological information; DNA sequencing methods – basic DNA sequencing, automated DNA sequencing, DNA sequencing by capillary array electrophoresis; Types of DNA sequences – genomic DNA, cDNA, recombinant DNA, Expressed sequence tags (ESTs), Genomic survey sequences (GSSs); RNA sequencing methods; Protein structure determination methods; Gene expression data.

Unit-III

Databases : Format and Annotation – Conventions for database indexing and specification of search terms; Common sequence file formats – NBRF/PIR, FASTA, GDE; Files for multiple sequence alignment – multiple sequence format (MSF), ALN format; Files for structural data – PDB format and NMR files; Annotated sequence databases – primary sequence databases (GenBank-NCBI, the nucleotide sequence database-EMBL, DNA Data Bank of Japan-DDBJ); Subsidiary data storage (ESTs, dbEST, GSSs), unfinished genomic sequence data, organism-specific databases (EcoGene, SGD, MatDB, TAIR, FlyBase, OMIM, etc.); Protein sequence and structure databases (PDB, SWISS-PROT and TrEMBL); List of Gateways (NCBI, GOLD, MIPS, TIGR, UniGene).
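As a minimal illustration of the FASTA format named in this unit, the sketch below reads records in which a header line starts with ">" and the sequence follows on one or more lines. The function name parse_fasta and the two example records are hypothetical and chosen only for illustration; established toolkits handle many more edge cases.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical example data, not taken from any database.
example = """>seq1 hypothetical example record
ATGGCGTAA
CCGT
>seq2
MKTAYIAK
"""
records = parse_fasta(example)
for name, sequence in records:
    print(name, len(sequence))
```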

Unit-IV

Data : Access, Retrieval and Submission – Data access – standard search engines, Data retrieval tools – Entrez, DBGET and SRS (sequence retrieval systems); Software for data building; Submission of new and revised data.

Unit-V

Sequence Similarity Searches – Sequence homology as product of molecular evolution; Sequence similarity searches; Significance of sequence alignment; Sequence alignment – global, local and free-space; Alignment scores and gap penalties; Measurement of sequence similarity; Similarity and homology.
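To make the idea of alignment scores and gap penalties concrete, the following sketch scores an alignment that has already been written out with "-" for gaps, and reports percent identity as one simple measure of similarity. The scoring values and the convention that the first position of a gap costs gap_open and each further position costs gap_extend are assumptions made for this example; real tools use substitution matrices such as PAM or BLOSUM.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Score two aligned rows of equal length; '-' marks a gap.

    Returns (score, percent_identity). Percent identity is computed over
    all alignment columns, which is only one of several conventions.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    identical = 0
    gap_row = None  # which row the currently open gap is in, if any
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            current = "a" if a == "-" else "b"
            # Continuing a gap in the same row is cheaper than opening one.
            score += gap_extend if gap_row == current else gap_open
            gap_row = current
        else:
            gap_row = None
            if a == b:
                score += match
                identical += 1
            else:
                score += mismatch
    return score, 100.0 * identical / len(row_a)


score, identity = score_alignment("ACGTTGA", "AC--TGA")
print(score, round(identity, 1))
```

Here five matching columns contribute +5 and the two-column gap contributes -2 and -1, giving a score of 2.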

Text Books :

1. Mount, D. (2004) “Bioinformatics: Sequence and Genome Analysis”; Cold Spring Harbor Laboratory Press, New York. (ISBN 0-87969-712-1)

2. Baxevanis, A.D. and Ouellette, B.F.F. (1998) “Bioinformatics – a practical guide to the analysis of Genes and Proteins”; John Wiley and Sons, New Jersey, USA.

Genome and Protein Sequence Analysis (2 Credits)

Unit-I

Sequence Analysis: Basic concepts of sequence similarity, identity and homology; definitions of homologues, orthologues, paralogues and xenologues. Scoring matrices: basic concept of a scoring matrix, matrices for nucleic acid and protein sequences, PAM and BLOSUM series, matrix derivation methods and principles. Database Searches: Keyword-based: Entrez and SRS; Sequence-based: BLAST & FASTA; use of these methods for sequence analysis, including the on-line use of the tools and interpretation of results from various sequence and structural as well as bibliographic databases.

Unit-II

Sequence alignment : Basic concepts of sequence alignment; Needleman and Wunsch, Smith and Waterman algorithms for pairwise alignments; use of pairwise alignments for analysis of nucleic acid and protein sequences and interpretation of results; basic concepts of various approaches for multiple sequence alignment (e.g. progressive, iterative). Algorithms of CLUSTALW and PileUp and their applications for sequence analysis.
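The two pairwise algorithms named in this unit differ mainly in how the dynamic programming table is initialised and whether scores may fall below zero. The sketch below computes only the optimal scores (no traceback), assuming a simple match/mismatch scheme and a linear gap penalty; the parameter values are illustrative, not prescribed.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    # First row: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first column: prefix of a against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal local alignment score (Smith-Waterman, linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)  # local alignment may start anywhere at no cost
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)  # never below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(needleman_wunsch_score("GATTACA", "GATACA"))
print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```

Keeping only two rows of the table is enough for the score; recovering the alignment itself would need the full table or a traceback structure.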

Unit-III

Sequence patterns and profiles: Basic concept and definition of sequence patterns, motifs and profiles, various types of pattern representations viz. consensus, regular expression (Prosite-type) and sequence profiles; profile-based database searches using PSI-BLAST, analysis and interpretation of profile-based searches. Tools for searching sequence patterns: MEME, PHI-BLAST, ScanProsite and PRATT.
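A Prosite-type pattern is close to a regular expression, so a rough translation is short. The sketch below handles the common elements only (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) and (n,m) for repetition, and < or > at the pattern ends); the function name prosite_to_regex is hypothetical, and the pattern used is an illustrative one written in Prosite syntax.

```python
import re

# One pattern element: a core (residue, x, [...] or {...}) plus optional (n) or (n,m).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple Prosite-style pattern into a Python regular expression."""
    body = pattern.strip().rstrip(".")
    prefix = "^" if body.startswith("<") else ""  # '<' anchors to the N-terminus
    suffix = "$" if body.endswith(">") else ""    # '>' anchors to the C-terminus
    body = body.lstrip("<").rstrip(">")
    parts = []
    for element in body.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("cannot parse pattern element: %r" % element)
        core, low, high = m.groups()
        if core == "x":
            regex = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"      # excluded residues
        else:
            regex = core                         # single residue or [..] class
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return prefix + "".join(parts) + suffix


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)
```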

Unit-IV

Comparative Genomics: Basic concepts, Applications of Comparative Genomics: Identification of Genes, Regulatory regions, Virulence factors/Pathogenicity Islands, Reconstruction of metabolic networks. Genome Analysis Tools: Artemis, BLAST2, MegaBLAST, GenePlot. Comparative genomics databases: KEGG, DEG, COG.

Unit-V

Phylogeny: Terminology; Steps in phylogenetic analysis; Different tree construction methods: Distance-based methods – Neighbor Joining, UPGMA and Fitch–Margoliash; Character-based methods – Maximum Parsimony and Maximum Likelihood. Phylogenetic programs: PHYLIP, MEGA, PAUP, Phylodraw.
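Of the distance-based methods listed, UPGMA is the simplest to sketch: the two closest clusters are merged repeatedly, and the distance from the merged cluster to every other cluster is the size-weighted average of the old distances. The code below returns only the tree topology as nested tuples (no branch lengths); the function name, the data layout and the four-taxon distance table are assumptions made for illustration.

```python
def upgma(names, distances):
    """Build a UPGMA tree topology as nested tuples.

    names:     list of taxon labels
    distances: dict mapping frozenset({x, y}) to the distance between x and y
    """
    sizes = {name: 1 for name in names}  # cluster -> number of taxa it contains
    d = dict(distances)
    while len(sizes) > 1:
        pair = min(d, key=d.get)             # closest pair of clusters
        a, b = sorted(pair, key=str)         # fixed order for reproducible output
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Size-weighted average of the distances from a and b to 'other'.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        # Drop every entry that still refers to the clusters just merged.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical pairwise distances for four taxa.
toy = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree = upgma(["A", "B", "C", "D"], toy)
print(tree)
```

UPGMA assumes a constant rate of change along all lineages; Neighbor Joining relaxes that assumption and is not shown here.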

Text Books:

1. Mount, D. (2004) “Bioinformatics: Sequence and Genome Analysis”; Cold Spring Harbor Laboratory Press, New York.

2. Baxevanis, A.D. and Ouellette, B.F.F. (1998) “Bioinformatics – a practical guide to the analysis of Genes and Proteins”; John Wiley & Sons, UK.

Reference Books

1. Pevzner, P.A. (2004) “Computational Molecular Biology”; Prentice Hall of India Ltd, New Delhi.

2. Lesk, A.M. (2002) “Introduction to Bioinformatics”, First edition, Oxford University Press, UK.

3. Sensen, C.W. (2002) “Essentials of Genomics and Bioinformatics”; Wiley-VCH Publishers, USA.

Proteomics and Genomics (2 Credits)

Unit-I

Genomics and Metagenomics: Large scale genome sequencing strategies. Genome assembly and annotation. Genome databases of plants, animals and pathogens. Metagenomics. Gene networks: basic concepts, computational models such as lambda repressor and lac operon. Prediction of genes, promoters, splice sites, regulatory regions: basic principles, application of methods to prokaryotic and eukaryotic genomes and interpretation of results. Basic concepts of identification of disease genes, role of bioinformatics – OMIM database, reference genome sequence, integrated genomic maps, gene expression profiling; identification of SNPs, SNP database (dbSNP). Role of SNPs in pharmacogenomics, SNP arrays. Basic concepts in identification of drought stress response genes, insect resistance genes, nutrition enhancing genes.

Unit-II

Epigenetics: DNA microarray: database and basic tools, Gene Expression Omnibus (GEO), ArrayExpress, SAGE databases. DNA microarray: understanding of microarray data, normalizing microarray data, detecting differential gene expression, correlation of gene expression data to biological process and computational analysis tools (especially clustering approaches).

Unit-III

Comparative genomics: Basic concepts and applications, whole genome alignments: understanding the significance; Artemis, BLAST2, MegaBlast algorithms, PipMaker, AVID, Vista, MUMmer, applications of suffix tree in comparative genomics, synteny and gene order comparisons. Comparative genomics databases: COG, VOG.

Unit-IV

Functional genomics: Application of sequence-based and structure-based approaches to assignment of gene functions – e.g. sequence comparison, structure analysis (especially active sites, binding sites) and comparison, pattern identification, etc. Use of various derived databases in function assignment, use of SNPs for identification of genetic traits. Gene/Protein function prediction using machine learning tools viz. neural network, SVM, etc.

Unit-V

Proteomics: Protein arrays: basic principles. Computational methods for identification of polypeptides from mass spectrometry. Protein arrays: bioinformatics-based tools for analysis of proteomics data (tools available at ExPASy Proteomics server); databases (such as InterPro) and analysis tools. Protein-protein interactions: databases such as DIP, PPI server and tools for analysis of protein-protein interactions.

Text Books:

1. Discovering Genomics, Proteomics and Bioinformatics, 2nd edition, by A. Malcolm Campbell and Laurie J. Heyer. Cold Spring Harbor Laboratory Press, 2006.

Lab : Bioinformatics Databases (2 Credits)

Exercises:

1. Entrez and Literature Searches.

2. Sequence Retrieval System of Biological Databases

3. File format conversion

4. Sequence Analysis

5. Phylogenetic analysis using PHYLIP, Phylodraw, PAUP, Treeview, JalView.

6. Usage of Software:

a. BioEdit

b. GeneDoc

c. ClustalW / X, MEGA, MEME

7. Usage of Visualization Tools

a. RasMol

b. Cn3D

c. MolMol

Lab : Sequence Analysis (2 Credits)

Exercises:

1. Sequence Analysis Packages – EMBOSS, NCBI ToolKit

2. Dynamic programming.

3. Analysis of Biological Sequences.

4. FASTA

5. Multiple sequence alignment

6. MEME/MAST, eMotif, InterproScan, ProSite, ProDom, Pfam

7. Phylogenetic analysis – PAUP, PHYLIP, MacClade, MEGA

8. Genome annotation – Artemis.

9. Hypothetical Protein analysis

10. Genome Comparison

Systems Biology (2 Credits)

Unit-I

Systems Biology - Objectives of Systems Biology, Strategies relating to In silico Modeling of biological processes, Metabolic Networks, Signal Transduction Pathways, E-cell and V-cell.

Unit-II

Reconstruction of pathways and annotation – Reconstructing metabolic pathways from sequence and function information in microbial species; statistical profiling and function annotation of genomes with a microbial genome as an example.

Unit-III

Profile analysis – Expression profile analysis of cells, Microarray and genome wide expression analysis, Genomics and Proteomics in medicine, Connectivity maps, high throughput sequencing and assembly, SNPs and their applications.

Unit-IV

Databases for Systems Biology – Genome databases (NCBI Entrez Genome databases, Ensembl), Metabolic pathways databases (KEGG, EMP, EcoCyc, MetaCyc) and Expression databases (Gene Expression Omnibus and ArrayExpress).

Unit-V

Biological simulation and network – Constraint-based modeling (Flux balance analysis): Stoichiometric matrix, Linear optimization, Elementary flux modes, Extreme pathways. Biological network: scale-free network, Properties of a network (nodes, edges, hubs, clustering coefficient and diameter).
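The network properties listed in this unit can be computed directly on a small example. The sketch below covers only the network part of the unit (not flux balance analysis) and assumes an undirected, connected network stored as a dictionary of neighbour sets; the five-node toy network is hypothetical.

```python
from collections import deque


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of 'node' that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u in neighbours for v in neighbours
                if u < v and v in graph[u])
    return 2.0 * links / (k * (k - 1))


def diameter(graph):
    """Longest shortest path (in edges), found by breadth-first search from each node."""
    longest = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        longest = max(longest, max(dist.values()))
    return longest


# Hypothetical toy network: H is linked to every other node, and A is linked to B.
graph = {
    "H": {"A", "B", "C", "D"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "C": {"H"},
    "D": {"H"},
}
hub = max(graph, key=lambda n: len(graph[n]))           # node with the highest degree
n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2  # each edge is stored twice
print(hub, n_edges, clustering_coefficient(graph, "H"), diameter(graph))
```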

Text Books:

1. Alon, U. (2006) An Introduction to Systems Biology. Chapman & Hall/CRC

2. M.E.J. Newman (2010) Networks: An Introduction. Oxford University Press.

Reference Books:

1. Wilkins, M.R., Williams, K.L., Appel, R.D. and Hochstrasser, D.F. (1997) “Proteome Research: New frontiers in Functional Genomics”, Springer Verlag, New York, USA.

2. Witten, I.H. and Frank, E. (2005) “Data mining: Practical Machine Learning Tools and Techniques”, Morgan Kaufmann Publishers, USA.

Structural Biology (2 Credits)

Unit-I

Basic structural features of macromolecules like proteins, nucleic acids and carbohydrates; concepts of secondary structures of proteins – α-helix, β-sheet; motifs, domains, tertiary and quaternary structures; Structure validation – Ramachandran Plot.

Unit-II

Introduction to Experimental Methods: X-ray Diffraction, NMR, Electron microscopy. Database of experimental structures (PDB, NDB).

Intermolecular interactions – Hydrogen bonding, van der Waals forces, hydrophobic and hydrophilic factors, ionic interactions; introduction to membrane proteins.

Unit-III

Structures of DNA: A-, B- and Z-DNA; DNA bending. Structure of RNA. Structure of Ribosome.

Unit-IV

Methods for prediction of secondary and tertiary structures of proteins – knowledge-based structure prediction; fold recognition; ab initio methods for structure prediction; comparative protein modeling.

Unit-V

Methods for comparison of 3D structures of proteins; Methods to predict three dimensional structures of nucleic acids, rRNA; Electrostatic energy surface generation.

Text Books :

1. Andrew R. Leach (2001) “Molecular Modeling – Principles and Applications”; Second Edition, Prentice Hall, USA.

2. George H. Stout and Lyle H. Jensen (1989) “X-ray Structure Determination: A Practical Guide”; Second Edition. Wiley-Interscience Publication.

3. Creighton, T.E. (1993) “Proteins: structure and molecular properties”; Second edition, W.H. Freeman and Company, New York, USA.

Reference Books :

1. Mount, D. (2004) “Bioinformatics: Sequence and Genome Analysis”; Cold Spring Harbor Laboratory Press, New York.

2. Lesk, A.M. (2001) “Introduction to Protein Architecture”, Oxford University Press, UK.

3. McPherson, A. (2003) “Introduction to Macromolecular Crystallography”, John Wiley Publications,

Computer Aided Drug Design (2 Credits)

Unit-I

Introduction to Drugs: How drugs work - Drug targets, drug-target interaction and dose-response relationships; ADME & Bioavailability of drugs – drug-drug interaction & drug toxicity.

New Drug Discovery & Development: Lead Discovery – Drug likeness concept & Lipinski’s rule of 5 - Preclinical & Clinical Testing of New Drugs - New Drug Approval.
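Lipinski's rule of 5 is easy to express as a check on four descriptors. The sketch below assumes the descriptors have already been calculated by some other tool and uses the commonly quoted thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors); the function name and the descriptor values in the example are hypothetical.

```python
def lipinski_violations(mol_weight, log_p, h_bond_donors, h_bond_acceptors):
    """Return a list describing which rule-of-5 criteria a compound violates."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if log_p > 5:
        violations.append("logP > 5")
    if h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    return violations


def is_drug_like(mol_weight, log_p, h_bond_donors, h_bond_acceptors):
    """Usual reading of the rule: at most one violation is tolerated."""
    return len(lipinski_violations(
        mol_weight, log_p, h_bond_donors, h_bond_acceptors)) <= 1


# Hypothetical descriptor values for two made-up compounds.
print(lipinski_violations(350.0, 2.1, 2, 5))
print(lipinski_violations(720.0, 6.3, 2, 8))
```

The rule is a filter for likely oral bioavailability, not a predictor of activity, so it is normally applied alongside the other lead discovery criteria in this unit.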

CADD: Introduction, Computer hardware and software for CADD.

Unit-II

Molecular Mechanics: Introduction to molecular mechanics. Coordinate systems; molecular graphics & potential energy surfaces. Force fields and their components; Bonded and non-bonded interactions – importance of hydrogen bonding; Energy minimization algorithms.

Unit-III

Molecular Dynamics Simulation Methods – Molecular Dynamics using simple models; Molecular Dynamics with continuous potentials and at constant temperature and pressure; Time-dependent properties; Solvent effects in Molecular Dynamics; Conformational changes from Molecular Dynamics simulation.
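Molecular dynamics with a continuous potential amounts to integrating Newton's equations in small time steps. The sketch below applies the velocity Verlet scheme to the simplest possible model, a single particle on a harmonic spring in one dimension with reduced units; the force constant, mass and time step are arbitrary illustrative values, and temperature or pressure control is not included.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate one particle in 1D; returns a list of (position, velocity)."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(steps):
        x = x + v * dt + 0.5 * (f / mass) * dt * dt   # position update
        f_new = force(x)                              # force at the new position
        v = v + 0.5 * (f + f_new) / mass * dt         # velocity from averaged force
        f = f_new
        trajectory.append((x, v))
    return trajectory


# Illustrative harmonic "bond": potential energy U(x) = 0.5 * k * x**2.
k = 1.0
mass = 1.0


def spring_force(x):
    return -k * x


def total_energy(x, v):
    return 0.5 * mass * v * v + 0.5 * k * x * x


trajectory = velocity_verlet(1.0, 0.0, spring_force, mass, dt=0.01, steps=1000)
drift = total_energy(*trajectory[-1]) - total_energy(*trajectory[0])
print(abs(drift) < 1e-3)
```

The small energy drift is the usual practical check that the time step is short enough; time-dependent properties are then computed from the stored trajectory.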

Unit-IV

Analog based drug design: QSAR and QSPR Methodology for drug design - Various Descriptors used in QSAR studies - deriving & validating QSAR equations – 3D QSAR – application in drug design and ADME prediction. 3D Pharmacophore development: Conformation generation - deriving and using 3D Pharmacophores.

Unit-V

Structure based drug design: Molecular Docking: Docking approaches - Search algorithm - Scoring function; de novo ligand design - Linking & growing methods - applications; and Virtual screening - database searching to identify leads.

Text Books:

1. Molecular Modeling – Principles and Applications by Andrew R. Leach Second Edition, Prentice Hall, USA, 2001.

2. Computational Drug Design: A guide for Computational & Medicinal Chemists by David C Young, John Wiley & Sons, Inc. 2009.

Reference Books

1. The organic chemistry of drug design and drug action by Richard B. Silverman, Elsevier, 2004.

2. Molecular Modeling: Basic Principles and Applications, by Hans-Dieter Höltje, Wolfgang Sippl, Didier Rognan, 3rd Edition, Wiley-VCH Verlag GmbH, 2008.

3. Burger’s Medicinal Chemistry and Drug discovery. Volume 2, Drug Discovery and development. 6th Edition. by Andre I. Khuri, Donald J. Abraham, Alfred Burger, Wiley-Interscience, 2003.

Lab : Structural Biology (2 Credits)

Exercises

1. Advanced Visualization Software and 3D representations.

2. Coordinate generations and inter-conversions.

3. Secondary Structure Prediction

4. Fold Recognition, ab initio (Rosetta Server)

5. Homology based comparative protein modeling.

6. Energy minimizations.

7. Validation of models.

a. WHATIF

b. PROSA

c. PROCHECK

d. VERIFY 3D

8. Protein Structure Alignment.

9. Modeller

10. Geno-3D

Lab : Computer Aided Drug Design (2 Credits)

Exercises

Molecular modeling: Viewers, generating conformations, file format conversion, databases

Molecular mechanics: Structural characterization

Molecular Dynamics: Simulation of small systems and conformational analysis

Docking and Drug Design: QSAR analysis & Docking