Bengt Persson
Sequence Information
Linköping University & Karolinska Institutet 2
Sequence information
Sequence comparisons
Pair-wise
Multiple
Database searches
SRS
Entrez
Protein families
Patterns
Post-translational modifications
Organelle localisation
Orthologue clusters
InterPro
Pfam
Prosite
Membrane attachment
Secondary structure
Good web sites
www.expasy.org
www.ebi.ac.uk
www.ncbi.nlm.nih.gov
Protein family databases
Protein families, nomenclature
Super-family
– Family
• Sub-family
InterPro
Prosite
– Amos Bairoch, Geneva, Switzerland
Pfam
– Erik Sonnhammer, KI and Sanger Institute, UK
PRINTS
– Terri Attwood, UCL, London, UK
ProDom
– Daniel Kahn, INRA, Toulouse, France
SMART
– Peer Bork, EMBL
Swiss-Prot + TrEMBL
InterPro entry
InterPro entry, cont.
InterPro entry, cont.
InterPro -- protein matches
InterPro -- protein matches, graphical
Prosite
Database of protein families and domains
Release 16, September 1999
1035 documentation entries
1375 different patterns
http://www.expasy.ch/prosite/
Amos Bairoch, University of Geneva
Prosite
Prosite
ScanProsite
Prosite, documentation entry
Examples of Prosite pattern categories
Post-translational modifications
Domains
DNA or RNA associated proteins
Enzymes
Electron transport proteins
Other transport proteins
Structural proteins
Receptors
Hormones and active peptides
Toxins
Inhibitors
Protein secretion and chaperones
Cytokines and growth factors
Others
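Prosite patterns use a compact syntax: elements are separated by "-", "x" matches any residue, [ST] lists allowed residues, {P} lists forbidden ones, x(2,4) gives a repeat range, and "<" or ">" anchor the pattern at the N- or C-terminus. The minimal Python sketch below translates that syntax into a regular expression and scans a sequence, a much simplified version of the pattern scanning that ScanProsite offers. The function names are hypothetical, and the converter assumes upper-case sequences and ignores rarer forms such as ">" inside brackets. The example uses the N-glycosylation site pattern N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite pattern such as 'N-{P}-[ST]-{P}.' into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # '<' anchors the pattern at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # '>' anchors the pattern at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat, e.g. x(3) or x(2,4)
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = match.groups()
        if core == "x":                      # any residue
            atom = "."
        elif core.startswith("{"):           # {..}: any residue except those listed
            atom = "[^" + core[1:-1] + "]"
        else:                                # single residue or [..] class: same in regex
            atom = core
        if low:
            atom += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + atom + suffix)
    return "".join(parts)


def scan(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every, possibly overlapping, match."""
    regex = prosite_to_regex(pattern)
    # Wrapping the regex in a lookahead lets finditer report overlapping matches too
    return [(m.start() + 1, m.group(1)) for m in re.finditer("(?=(" + regex + "))", sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))          # N[^P][ST][^P]
print(scan("AANGSAANPSANVTP", "N-{P}-[ST]-{P}."))   # [(3, 'NGSA')]
```

In the example sequence, the asparagines at positions 8 and 12 are rejected because a proline occupies one of the forbidden {P} positions.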
Pfam
A collection of protein families and domains.
Pfam contains multiple protein alignments and
profile-HMMs of these families.
Pfam is a semi-automatic protein family database,
which aims to be comprehensive as well as accurate.
http://www.sanger.ac.uk/Software/Pfam/index.shtml
http://www.cgr.ki.se/Pfam
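Pfam's profile-HMMs are built from seed alignments: each conserved column becomes a match state whose emission probabilities reflect the residues seen in that column. The Python sketch below estimates such probabilities for every column of a small, hypothetical seed alignment using simple add-one (Laplace) pseudocounts. This is a deliberate simplification and not Pfam's or HMMER's actual procedure, which also weights sequences, uses Dirichlet mixture priors, and decides which columns become match or insert states.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_emissions(column: str, pseudocount: float = 1.0) -> dict[str, float]:
    """Emission probabilities for one alignment column; gap characters are skipped."""
    counts = Counter(res for res in column.upper() if res in AMINO_ACIDS)
    # Every amino acid receives the same pseudocount, so unseen residues keep p > 0
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def profile_from_alignment(seqs: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """One emission distribution per column of an alignment whose rows share one length."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [column_emissions("".join(s[i] for s in seqs), pseudocount) for i in range(length)]


seed = ["ACD", "ACE", "GC-"]        # hypothetical three-sequence seed alignment
profile = profile_from_alignment(seed)
print(round(profile[0]["A"], 3))    # 3/23 -> 0.13
print(round(profile[1]["C"], 3))    # 4/23 -> 0.174
```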
Hidden Markov Models (HMMs)
Statistical profile method
Enables database searches
Enables multiple alignment creation
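A core HMM calculation behind such database searches is scoring how probable a sequence is under a model. The Python sketch below implements the forward algorithm for a hypothetical two-state model ("M" for a membrane-like state, "S" for a soluble-like state) over a two-letter alphabet ("h" hydrophobic, "p" polar); all probabilities are invented for illustration. A real profile-HMM has match, insert and delete states for each alignment column and works in log space to avoid numerical underflow on long sequences.

```python
def forward(observations: str, start: dict, trans: dict, emit: dict) -> float:
    """Total probability of the observations, summed over all hidden state paths."""
    states = list(start)
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans[prev][s] for prev in states) * emit[s][symbol]
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: M = membrane-like state, S = soluble-like state
START = {"M": 0.5, "S": 0.5}
TRANS = {"M": {"M": 0.9, "S": 0.1}, "S": {"M": 0.1, "S": 0.9}}
EMIT = {"M": {"h": 0.8, "p": 0.2}, "S": {"h": 0.3, "p": 0.7}}

print(forward("hp", START, TRANS, EMIT))   # about 0.1975
```

For "hp" the four possible state paths (MM, MS, SM, SS) contribute 0.072, 0.028, 0.003 and 0.0945, which sum to the same 0.1975; the forward algorithm reaches that total without enumerating paths.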
Pfam
Pfam
Pfam
COG--Clusters of Orthologous Groups
Functional groups of protein families
COG
COG
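The COG approach builds on genome-specific best hits: roughly, proteins from three different genomes that are mutual best hits form a triangle, and triangles sharing a side are merged into a cluster. The Python sketch below shows only the simpler two-genome building block, reciprocal best hits, on hypothetical similarity scores (for example BLAST bit scores). The gene names and scores are invented, and ties are resolved by taking the first hit encountered.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query protein to its highest-scoring subject (first one wins on ties)."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Hypothetical scores for genome A searched against genome B, and vice versa
A_VS_B = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 120,
          ("a2", "b1"): 60, ("a3", "b2"): 100}
B_VS_A = {("b1", "a1"): 245, ("b2", "a2"): 118, ("b2", "a3"): 101}

print(reciprocal_best_hits(A_VS_B, B_VS_A))   # [('a1', 'b1'), ('a2', 'b2')]
```

Protein a3 finds b2 as its best hit, but b2's best hit is a2, so the pair (a3, b2) is not reciprocal and is left out.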
Predictions of structure and post-translational modifications
Structure predictions
Secondary structure
Hydrophilicity
Membrane-spanning regions
Antigenicity
Glycosylation
Acetylation
… and much more
Secondary structure predictions
Chou & Fasman (CF)
Garnier, Osguthorpe & Robson (GOR)
– http://pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html
Neural networks (e.g. PHD)
– http://dodo.cpmc.columbia.edu/predictprotein/
Artificial Neural Networks (ANNs)
Statistical method
Pattern recognition, e.g. secondary structure predictions
Figure: network with an input layer, a hidden layer and an output layer (modified from Yvonne Kallberg)
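A minimal Python sketch of the three-layer network on this slide, set up the way window-based secondary structure predictors are often described: a window of residues around each position is one-hot encoded in the input layer, passed through one hidden layer, and the output layer gives probabilities for helix (H), strand (E) and coil (C). The 13-residue window, the 10 hidden units and the random weights are illustrative assumptions. The weights are untrained, so the predictions are meaningless until the network is trained (for example by back-propagation on proteins of known structure), a step the sketch omits.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = ("H", "E", "C")  # helix, strand, coil


def encode_window(sequence: str, centre: int, half_width: int) -> list[float]:
    """One-hot encode the residues around `centre`: 21 inputs per window position.

    The 21st input is set for positions beyond the chain ends (or non-standard residues).
    """
    inputs = []
    for i in range(centre - half_width, centre + half_width + 1):
        bits = [0.0] * (len(AMINO_ACIDS) + 1)
        if 0 <= i < len(sequence) and sequence[i] in AMINO_ACIDS:
            bits[AMINO_ACIDS.index(sequence[i])] = 1.0
        else:
            bits[-1] = 1.0
        inputs.extend(bits)
    return inputs


def init_layer(n_in: int, n_out: int, rng: random.Random) -> list[list[float]]:
    """Random weights; the extra last weight in each row is the unit's bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]


def apply_layer(inputs, weights, activation):
    # zip stops after the n_in input weights, so row[-1] is added separately as the bias
    return [activation(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in weights]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores: list[float]) -> list[float]:
    top = max(scores)                      # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(sequence: str, half_width: int = 6, n_hidden: int = 10, seed: int = 0):
    """Per-residue H/E/C probabilities from an untrained input-hidden-output network."""
    rng = random.Random(seed)
    n_inputs = (2 * half_width + 1) * (len(AMINO_ACIDS) + 1)
    hidden_weights = init_layer(n_inputs, n_hidden, rng)
    output_weights = init_layer(n_hidden, len(CLASSES), rng)
    predictions = []
    for i in range(len(sequence)):
        hidden = apply_layer(encode_window(sequence, i, half_width), hidden_weights, sigmoid)
        scores = apply_layer(hidden, output_weights, lambda z: z)  # linear output units
        predictions.append(dict(zip(CLASSES, softmax(scores))))
    return predictions


for residue, probs in zip("MKTAYIAK", predict("MKTAYIAK")):
    print(residue, max(probs, key=probs.get))
```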
The PredictProtein server
Default submission form
Hydrophilicity
Kyte & Doolittle
Hopp & Woods
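Both scales assign each amino acid a value, which is usually averaged over a sliding window. The signs run in opposite directions: on the Kyte & Doolittle hydropathy scale positive values are hydrophobic, while on the Hopp & Woods hydrophilicity scale positive values are hydrophilic. The Python sketch below uses the Hopp & Woods values with a six-residue window, the window length originally proposed for locating likely antigenic determinants, and reports the most hydrophilic window. The function names and the test sequence are hypothetical, and non-standard residue letters are not handled (they raise a KeyError).

```python
# Hopp & Woods (1981) hydrophilicity values; positive = hydrophilic
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def window_averages(sequence: str, scale: dict[str, float],
                    window: int = 6) -> list[tuple[int, float]]:
    """(1-based window start, mean scale value) for every full-length window."""
    values = [scale[aa] for aa in sequence.upper()]
    return [(i + 1, sum(values[i:i + window]) / window)
            for i in range(len(values) - window + 1)]


def most_hydrophilic_window(sequence: str, window: int = 6) -> tuple[int, str, float]:
    """Start position, segment and mean value of the highest-scoring window."""
    start, score = max(window_averages(sequence, HOPP_WOODS, window), key=lambda t: t[1])
    return start, sequence[start - 1:start - 1 + window], score


print(most_hydrophilic_window("LLLLLLKDEKRELLLLLL"))   # (7, 'KDEKRE', 3.0)
```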
Example of hydrophilicity and secondary structure plots
ProtScale
A general tool for plotting sequence properties,
e.g. hydrophilicity
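ProtScale lets the user choose an amino acid scale, a window size and how much weight the residues at the window edges get relative to the centre. The Python sketch below shows one plausible way such a weighted window could work: weights rise linearly from the edges to the centre, and the weighted sum is divided by the total weight. This formula is an assumption for illustration, not ProtScale's documented implementation, and the two-letter scale is hypothetical.

```python
def linear_weights(window: int, edge_weight: float) -> list[float]:
    """Weights that rise linearly from `edge_weight` at both ends to 1.0 at the centre."""
    half = window // 2
    return [edge_weight + (1.0 - edge_weight) * (half - abs(i - half)) / half
            for i in range(window)]


def weighted_profile(sequence: str, scale: dict[str, float], window: int = 9,
                     edge_weight: float = 1.0) -> list[tuple[int, float]]:
    """(1-based centre position, weighted mean scale value) for each full window."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number >= 3")
    weights = linear_weights(window, edge_weight)
    values = [scale[aa] for aa in sequence.upper()]
    half = window // 2
    profile = []
    for centre in range(half, len(values) - half):
        segment = values[centre - half:centre + half + 1]
        mean = sum(w * v for w, v in zip(weights, segment)) / sum(weights)
        profile.append((centre + 1, mean))
    return profile


TOY_SCALE = {"A": 1.0, "G": 0.0}   # hypothetical two-residue scale
print(linear_weights(5, 0.0))                         # [0.0, 0.5, 1.0, 0.5, 0.0]
print(weighted_profile("AAGAA", TOY_SCALE, 3, 0.0))   # [(2, 1.0), (3, 0.0), (4, 1.0)]
```

With edge_weight=1.0 every residue in the window counts equally, which reduces to a plain window average; lowering the edge weight makes the profile follow the centre residue more closely.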
ProtScale, selection of property to plot
ProtScale, results
ProtScale, Graphic view
Membrane protein prediction, TMAP
Membrane protein prediction, TMAP
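TMAP predicts transmembrane segments using multiple sequence alignments. The simpler, single-sequence approach sketched below is the classic hydropathy test of Kyte and Doolittle, who suggested that a 19-residue window with an average hydropathy above about 1.6 marks a candidate membrane-spanning segment. It only illustrates the idea and is not the TMAP algorithm; the function name and the test sequence are hypothetical, and the window length is assumed to be odd.

```python
# Kyte & Doolittle (1982) hydropathy values; positive = hydrophobic
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(sequence: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """Runs of window centres (1-based, inclusive) whose mean hydropathy exceeds threshold."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    half = window // 2
    segments, run_start = [], None
    for centre in range(half, len(values) - half):
        mean = sum(values[centre - half:centre + half + 1]) / window
        if mean > threshold and run_start is None:
            run_start = centre                         # a run of high-scoring windows begins
        elif mean <= threshold and run_start is not None:
            # 0-based `centre` equals the 1-based position of the last high-scoring centre
            segments.append((run_start + 1, centre))
            run_start = None
    if run_start is not None:                          # run continues to the last window
        segments.append((run_start + 1, len(values) - half))
    return segments


test_protein = "K" * 15 + "L" * 25 + "K" * 15         # hypothetical polar-apolar-polar chain
print(hydrophobic_segments(test_protein))              # [(20, 36)]
```

The reported positions are window centres; the hydrophobic stretch itself extends roughly half a window on either side of each run.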
TMAP, graphics output
Prediction servers at CBS
www.cbs.dtu.dk/services/
SignalP
SignalP -- Results
SignalP -- Results, cont.
TargetP
TargetP -- Results
Phobius
Phobius, results
ExPASy site map
Post-translational modifications
Primary structure analysis
Secondary structure prediction
Transmembrane regions & sequence alignments