A look at data for quantitative
analysis using
MSight and Phenyx
Atelier Protéomique Quantitative
25-27 June 2007
La Grande Motte
Pierre-Alain Binz
Institut Suisse de Bioinformatique
GeneBio SA
Already said
• Importance of biological question, sample choice, experimental
strategy
• Complexity of sample is a challenge for MS
– Peak capacity, concentration range, chemical properties,…
• Many methods, each with strengths and weaknesses
– iTRAQ, SILAC, ICAT, MRM, label-free, …
• Many instrumental settings: heterogeneity of data
– type, amount, resolution
• Many bioinformatics tools
– Identification, signal detection, quantitation
• Validation methods
P-A Binz, Atelier Proteomique Quantitative, juin 2007
Outlook
• Visualise LC-MS data
• Detect signal
• Align LC-MS runs
• Match images (differential analysis)
• Add identification results
• Quantitation with search engine
What data for quantitation?
• MS data: dimensions:
– m/z
– Intensity
– Rt, pI, scan number
• Secondary data
– Sample (one, more than one)
– Molecular interpretation (peptide, protein)
– Quantitation method (label description, comparison
method, thresholds, corrections)
Look at LC-MS data
• Raw MS traces or peaklists (spectrum view or
gel view)
• Chromatographic profiles (TIC, XIC )
• 2D images (LC-MS)
• Annotated spectra
• Overlapped spectra, head-to-head view
• Overlapped images
Visualise LC-MS data:
spectrum view, gel view, chromatograms
[Figure axes: m/z, intensity (I), Rt]
2D representation
[Figure: 38x26 matrix of intensity values displayed as a 2D image; axes: m/z, Rt]
Example: LC-ESI-Q-TOF
[Figure: full LC-MS image; axes: time 0-40 min, m/z 400-1200; scale bars: 200 Da, 20 min]
Dimension   Interval       Sampling rate   Count
time        0-45 min       3 s             900 spectra
mass        400-1200 m/z   0.025 m/z       32’000 measures
Total: 28’800’000 measures (55 MB)
42-59 kDa extract of human BJAB B-cell line
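The grid size in this example follows directly from the acquisition settings. A minimal Python sketch of the arithmetic; the function name `lcms_grid` is illustrative and not part of MSight:

```python
def lcms_grid(rt_start_min, rt_end_min, scan_period_s, mz_start, mz_end, mz_step):
    """Return (number of spectra, measures per spectrum, total measures)."""
    n_spectra = round((rt_end_min - rt_start_min) * 60 / scan_period_s)
    n_points = round((mz_end - mz_start) / mz_step)
    return n_spectra, n_points, n_spectra * n_points


# Settings from the LC-ESI-Q-TOF example: 0-45 min with one spectrum every 3 s,
# 400-1200 m/z sampled every 0.025 m/z.
n_spectra, n_points, total = lcms_grid(0, 45, 3, 400, 1200, 0.025)
print(n_spectra, n_points, total)  # 900 32000 28800000
```

At 2 bytes per measure this would be about 57.6 million bytes, close to the 55 MB quoted; the storage encoding is an assumption, as the slide does not state it.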
Data display principle
• MS data: 32000x900
• Image part to display: 6000x400
• Projection onto screen size: 800x600
• Axes: Time, m/z
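Projecting a large data grid onto a much smaller screen means many measures share one pixel. The sketch below is one possible projection, not MSight's documented method: it keeps the maximum intensity per pixel so that narrow peaks stay visible. The function name `project_max` and the choice of the maximum are assumptions.

```python
def project_max(image, out_rows, out_cols):
    """Downsample a 2D grid of non-negative intensities to out_rows x out_cols.

    Each output cell holds the maximum of the input cells that map onto it.
    """
    n_rows, n_cols = len(image), len(image[0])
    out = [[0] * out_cols for _ in range(out_rows)]
    for r in range(n_rows):
        rr = r * out_rows // n_rows  # output row for input row r
        row = image[r]
        for c in range(n_cols):
            cc = c * out_cols // n_cols  # output column for input column c
            if row[c] > out[rr][cc]:
                out[rr][cc] = row[c]
    return out


small = project_max(
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]],
    2, 2,
)
print(small)  # [[6, 8], [14, 16]]
```

Summing or averaging per pixel would be equally simple; the maximum is used here only because it does not dilute isolated peaks.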
Full image
[Figure: axes: time 0-40 min, m/z 400-1200; scale bars: 200 Da, 20 min]
Zoom 256x
Less than 0.001 % of the data displayed
[Figure: zoomed region; axes: m/z 657.5-660.5, time 32.5-33 min; scale bars: 0.5 Da, 30 s]
Isotopic peak spacing: 0.33 (3+), 0.5 (2+)
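The spacing between isotopic peaks gives the charge state: neighbouring isotopes differ by about 1 Da in mass, so by about 1/z on the m/z axis. A minimal sketch; the function name and the `max_charge` limit are illustrative choices:

```python
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference, in Da


def charge_from_spacing(delta_mz, max_charge=6):
    """Estimate the charge state from the m/z gap between adjacent isotopic peaks."""
    if delta_mz <= 0:
        raise ValueError("delta_mz must be positive")
    z = round(ISOTOPE_SPACING / delta_mz)
    # Reject implausible results instead of returning a misleading charge.
    return z if 1 <= z <= max_charge else None


print(charge_from_spacing(0.33), charge_from_spacing(0.5))  # 3 2
```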
MSight
• LC-MS data analysis tool
• Developed by the
Proteome Informatics Group of the
Swiss Institute of Bioinformatics
• Based on Melanie 2D gel analysis software
It looks a bit like Melanie
http://www.expasy.org
Why MSight?
• Generate and evaluate LC-MS images
– Import LC-MS and MS/MS runs from various MS instruments and formats
– Workspace to manage experiments and data
– Rich visualisation and annotation
– Visualise the complexity of an LC-MS run
– Detect contaminants, run aberrations
• Perform peak detection from raw LC-MS data
– Improve Rt and m/z accuracy using 2D
• Quantitation and comparison
– Alignment and matching of LC-MS “images”
– Quantitation reports for differential expression analysis
– Label-free quantitation
– Generation of inclusion/exclusion lists
• Integrate with identification tools (Phenyx)
– Annotate MS “peaks” with peptide identity labels
– Use the annotations to validate matching peaks across LC-MS experiments
Import
• Raw LC-MS and MS/MS data formats
– Native formats (yep, baf, fid, T2D, dat)
– mzXML, mzData
– Ascii exports
• Handle big original files (100MB-1GB)
• Include profile LC-MS trace and MS/MS spectra
Visualisation
• Open multiple images
• Zoom in/out
• Chromatographic profile (« XIC »)
• Spectrum view
• Editable and searchable annotations
– landmarks, Rt, m/z, peptide sequence, hyperlinks, others
• Synchronisation between views
• Superpose images in transparency mode and
complementary colors
• 3D view
Artefacts
[Figure: scale bars 1 min, 100 Da]
Artefacts
Mass calibration
Example: SDS-MALDI-TOF, 90 spectra
Dimension   Interval       Sampling rate   Count
mass        560-3000 m/z   0.05 m/z        48’800 measures
Total: 4’392’000 measures
[Figure: scale bars 500 Da and 2 Da; marked value 0.15]
Contaminants
Polymer (PEG): repeating 44 Da spacing
[Figure: scale bars 100 Da, 5 min and 2 Da, 30 s]
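A polymer such as PEG shows up as a ladder of peaks separated by its repeat unit (44 Da, or 44/z on the m/z axis). The sketch below flags such ladders in a list of peak m/z values. It is only an illustration: the function name, tolerance and minimum ladder length are assumptions, not MSight parameters.

```python
PEG_UNIT = 44.0262  # monoisotopic mass of the C2H4O repeat unit, in Da


def peg_ladders(mz_values, charge=1, tol=0.02, min_length=3):
    """Return runs of peaks spaced by PEG_UNIT / charge (within tol)."""
    step = PEG_UNIT / charge
    peaks = sorted(mz_values)
    used = set()
    ladders = []
    for start in peaks:
        if start in used:
            continue
        ladder = [start]
        while True:
            target = ladder[-1] + step
            candidates = [p for p in peaks if abs(p - target) <= tol]
            if not candidates:
                break
            # Extend the ladder with the peak closest to the expected position.
            ladder.append(min(candidates, key=lambda p: abs(p - target)))
        if len(ladder) >= min_length:
            ladders.append(ladder)
            used.update(ladder)
    return ladders


# Hypothetical peak list: four peaks on a 44 Da ladder plus two unrelated peaks.
peaks = [500.30, 544.33, 588.35, 632.38, 610.00, 700.10]
print(peg_ladders(peaks))
```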
Contaminants (2)
[Figure: scale bars 5 min, 100 Da]
Redundancy: Peptide modifications
Spot from 2DE gel
[Figure: scale bars 10 min, 100 Da]
Redundancy: Peptide modifications
[Figure: scale bars 2 min, 5 Da]
Oxidation: two shifts of 5.33 (3+)
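The 5.33 shift for a 3+ ion is the mass of one oxygen atom divided by the charge. A short sketch of that arithmetic, with a helper that tests whether two m/z values could be an oxidation pair; the helper name and tolerance are illustrative:

```python
OXYGEN = 15.9949  # monoisotopic mass of one oxygen atom, in Da


def mz_shift(delta_mass, charge):
    """m/z shift produced by a mass change delta_mass on an ion of given charge."""
    return delta_mass / charge


def is_oxidation_pair(mz_low, mz_high, charge, tol=0.02):
    """True if the two m/z values differ by one oxidation at this charge."""
    return abs((mz_high - mz_low) - mz_shift(OXYGEN, charge)) <= tol


print(round(mz_shift(OXYGEN, 3), 2))  # 5.33
```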
Redundancy: Peptide modifications
[Figure: scale bars 10 min, 100 Da; charge-state labels 3+, 2+, 4+, 3+, 5+, 4+, 2+]
Oxidation
Outlook
• Visualise LC-MS data
• Detect signal
• Align LC-MS runs
• Match images (differential analysis)
• Add identification results
• Quantitation with search engine
Peak detection
• Detect and quantify MS peaks in a 2D image
• Interactive use
• Manual validation via visualisation
• Export in centroid mode
Peak detection variability
• High vs low resolution in m/z axis
– Isotopic profile vs bump
• Sampling resolution (Rt and m/z)
– LC-MALDI < ESI-MS with MS/MS < ESI-MS (QTOF<LTQ)
• Noise (chemical, electronic)
• Shape (rectangle, circle, other)
• Intensity (max, sum, fit max, integrate)
• And for quantitation:
– Detect peaks in each sample separately and then compare, or
align first and use one single shape per aligned feature
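The choice of shape and intensity measure changes the number reported for a peak. A minimal sketch for a rectangular shape with two of the measures listed above (max and sum); the function name and the index-based ranges are assumptions made for illustration:

```python
def feature_intensity(image, rt_range, mz_range, mode="sum"):
    """Summarise a rectangular feature in a 2D intensity grid.

    image[r][c] is the intensity at Rt index r and m/z index c;
    rt_range and mz_range are half-open index intervals (start, stop).
    """
    r0, r1 = rt_range
    c0, c1 = mz_range
    values = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    if not values:
        raise ValueError("empty feature")
    if mode == "max":
        return max(values)
    if mode == "sum":
        return sum(values)
    raise ValueError(f"unknown mode: {mode}")


image = [
    [0, 1, 0, 0],
    [1, 8, 3, 0],
    [2, 9, 4, 0],
    [0, 2, 1, 0],
]
print(feature_intensity(image, (1, 3), (1, 3), "max"))  # 9
print(feature_intensity(image, (1, 3), (1, 3), "sum"))  # 24
```

Using the same rectangle on every aligned run corresponds to the "one single shape per aligned feature" option.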
Locating the source of noise
[Figure: scale bars 5 min, 5 Da and 15 s; marked value 37.15]
Locating the source of noise
[Figure: scale bars 2 min, 1 Da; marked value 37.15]
Streak
[Figure: streak; scale bars 10 min, 1 Da; m/z axis 807-810; labelled signals 2000 (5+), 12000 (2+), 3000 (2+) and 80; peaks lettered a-n, shown at 28 min and at time 31.9 min]
Peptide deconvolution
[Figure: scale bars 1 Da, 1 min; charge-state labels 2+, 2+, 4+, 2+]
Outlook
• Visualise LC-MS data
• Detect signal
• Align LC-MS runs
• Match images (relative quantitation)
• Add identification results
• Quantitation with identification results
Alignment and comparison
• Align images via landmarks (corrections for
local deviations)
• Match images (pair peaks together)
• Report relative quantification information
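Landmark-based alignment can correct local deviations by interpolating between pairs of corresponding points. The sketch below maps a retention time from one run onto another by piecewise-linear interpolation. This is a generic technique chosen for illustration; the slides do not describe the transformation MSight actually uses, and the landmark values are hypothetical.

```python
from bisect import bisect_right


def align_rt(rt, landmarks):
    """Map a retention time from run B onto run A.

    landmarks is a list of (rt_in_b, rt_in_a) pairs sorted by rt_in_b.
    Between landmarks the mapping is linear; outside them the shift of
    the nearest landmark is applied.
    """
    if not landmarks:
        raise ValueError("at least one landmark is required")
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    if rt <= xs[0]:
        return rt + (ys[0] - xs[0])
    if rt >= xs[-1]:
        return rt + (ys[-1] - xs[-1])
    i = bisect_right(xs, rt) - 1
    t = (rt - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Hypothetical landmarks: run B elutes early around 10-20 min, on time at 30 min.
landmarks = [(10.0, 10.5), (20.0, 21.0), (30.0, 30.0)]
print(align_rt(15.0, landmarks))  # 15.75
```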
Alignment
[Figure: alignment transformation; m/z axis 620-632; scale bar 4 min]
Migration variability
[Figure: runs A and B and their difference A - B; scale bars 1 min, 2 Da]
Outlook
• Visualise LC-MS data
• Detect signal
• Align LC-MS runs
• Match images (differential analysis)
• Add identification results
• Quantitation with identification results
• Protein Mixture
– 32-45 kDa fraction of lysate from a culture of a
B-cell line
– ~ 1 pmol
– up to 180 proteins detectable in this sample
when analysed extensively by LC-MS/MS
Quantitation: BSA added at +26 fmol, +83 fmol, +520 fmol
Peptide shown: LGEYGFQNAL, 740.35 (2+)
[Figure: scale bars 10 Da, 5 min]
Quantitation
[Figure: +26 fmol, +83 fmol, +520 fmol; scale bars 2 Da, 2 min; charge-state labels 3+, 3+]
Quantitation
+26 fmol
+83 fmol
+520 fmol
Differential (low resolution)
[Figure: BSA and BSA+Lyz; scale bars 5 min, 20 Da]
Differential analysis
[Figure: images A and B, and the difference image A - B; scale bar 100 Da]
Differential analysis
[Figure: image A and difference image A - B; scale bar 2 Da]
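Once two runs are aligned on the same grid, a differential image is a cell-by-cell subtraction: signals common to both runs cancel and signals specific to one run remain. A minimal sketch with hypothetical intensities, assuming alignment and any intensity normalisation have already been done:

```python
def difference_image(a, b):
    """Return A - B, cell by cell, for two aligned intensity grids."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


a = [[0, 5, 0], [0, 9, 2]]
b = [[0, 5, 0], [0, 1, 2]]
print(difference_image(a, b))  # [[0, 0, 0], [0, 8, 0]]
```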
Outlook
• Visualise LC-MS data
• Detect signal
• Align LC-MS runs
• Match images (differential analysis)
• Add identification results
• Quantitation with search engine
Coupling with identification
• So far, quantitation has been done without considering the molecular interpretation
• To quantitate a protein, signals must be selected and coupled with peptide identifications
Phenyx
A software platform dedicated to the identification and characterization of proteins and peptides from mass spectrometry data
• Developed by GeneBio, in collaboration with the Swiss Institute of Bioinformatics (SIB)
• Launched in September 2004 (version 1.8)
• Version 2.3 in April 2007
• Rapidly developing and recognized tool
• Integrated in a number of third-party software packages (Scaffold, TPP, MSight, ProteinScape, Proteus LIMS, …)
• Adopted by a number of large renowned proteomics centres
http://www.phenyx-ms.com
http://phenyx.vital-it.ch/pwi
Some features
• Core calculation
– Robust and flexible scoring, including log likelihood measures
– Conflict resolution algorithm
– Use of annotations in databases (PTMs, variants, AA modifs…)
• Flexible and interactive interface: the “Phenyx Web Interface”
– User and job properties (user privileges, job sharing)
– Manual validation functionality
– Import of third-party jobs (Mascot, Sequest, X!Tandem, Popitam, …)
– Many exports (native Phenyx, Excel, XML, text…)
– Results comparison functionality
• Integration of Phenyx into workflows: a job follows a suite of configurable events (pre-processing, processing and post-processing)
The Phenyx Web Interface:
• Desktop
• Submission
• Management console
• Results views
• Results comparison
• Excel, xml and text exports
Integrate MSight and Phenyx
• Example: Annotate LC-MS images with peptide identifications
– Raw LC-MS data are exported as peaklists and searched through the Phenyx interface
– Exported peptide identifications are loaded back to give annotated images
• Phenyx results are stored as annotations in the images
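Attaching identifications to image peaks amounts to matching two lists on m/z and retention time within a tolerance. The sketch below illustrates the idea only. It does not reproduce the actual MSight or Phenyx file formats, and the function name, tolerances, coordinates and sequences are all hypothetical.

```python
def annotate_peaks(peaks, identifications, mz_tol=0.05, rt_tol=0.5):
    """Attach peptide labels to detected peaks.

    peaks: list of (mz, rt) tuples from peak detection.
    identifications: list of (sequence, precursor_mz, rt) tuples.
    Returns a list of (mz, rt, [sequences]) tuples.
    """
    annotated = []
    for mz, rt in peaks:
        labels = [
            seq
            for seq, id_mz, id_rt in identifications
            if abs(mz - id_mz) <= mz_tol and abs(rt - id_rt) <= rt_tol
        ]
        annotated.append((mz, rt, labels))
    return annotated


# Hypothetical values for illustration only.
peaks = [(630.32, 25.4), (641.80, 30.1)]
identifications = [("PEPTIDEK", 630.30, 25.5), ("SAMPLER", 650.00, 22.0)]
for mz, rt, labels in annotate_peaks(peaks, identifications):
    print(mz, rt, labels)
```

Peaks left with an empty label list are the ones detected in the MS survey but never identified.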
LC-MS and MS/MS: undersampling
[Figure: axes: m/z 621-655, time 21.15-34.85 min]
LC-MS and LC-MS/MS on a QStar of 49-62 kDa SDS-separated and trypsin-digested proteins from a human B-cell line
Focus on a small time x m/z region (about 1/250 of the full run)
LC-MS and MS/MS: undersampling
[Figure: axes: m/z 621-655, time 21.15-34.85 min]
• 7/40 peptides analysed
• 3/7 identified
• < 10% positively identified using stringent criteria (3/40 = 7.5%)
• Identified peptides: FFADLLDYIK, SLDLDSIIAEVK, LALDLEIATYR
Outlook
• Visualise LC-MS data
• Detect signal
• Align LC-MS runs
• Match images (relative quantitation)
• Add identification results
• Quantitation with search engine
Quantitation with search engine
• Use of MS/MS data
– Reporter ions: isobaric labeling (iTRAQ, TMT)
– emPAI (~ratio observed/predicted peptides)
– Multiplex (SILAC, 18O)
• Use of MS raw traces
– Stable isotope labeling (ICAT, SILAC, AQUA, 18O, ICPL, …)
– Label-free
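emPAI turns the ratio of observed to observable peptides into an abundance estimate using only identification results. A minimal sketch of the published definition, emPAI = 10^(observed/observable) - 1; the function name and the peptide counts are illustrative, and how "observable" peptides are counted is left to the caller:

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


# Hypothetical counts for one protein in two samples.
sample_a = empai(3, 12)
sample_b = empai(6, 12)
print(sample_b / sample_a)
```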
Quantitation: needed information
• Need identified peptides
• Need access to intensities (MS/MS and MS)
• Need quantitation method
– Labeling method (fixed, variable mode)
– Definition of “pairs”
– Intensity correction factors
– Thresholds for what peptides to consider (confidence levels,
scores, #pep / protein)
– Create report, calculate ratios, evaluate outliers
– Include in search engine GUI
A quantitation module for Phenyx
• Generic quantitation methods
• Prediction of co-peptides
• Extraction of intensities: MS level
• Extraction of intensities: MS/MS level
• Calculation of ratios; export
Quantitation module
• APIs: InSilicoSpectro, PhenyxPerl
• Inputs: (Phenyx) result file; labeling config file (xml); InSilicoDef definition file (xml)
• Output: quantitation result file (text)
• External statistics (R)
One possible integration with MSight (label-free)
• For each run: raw LC-MS, peaklists, exported peptide identifications, annotated images
• Then: align, compare, annotated peptide ratios
Phenyx: generate reports from identification results
Perl scripts to generate many kinds of exports
Example for iTRAQ
Examples of filters and search parameters that alter quantitation results
• Minimal number of peptides per protein
• Minimal number of proteotypic peptides
• Minimal score for each peptide
• Filter on redundancy
– same sequence (same or different charge states)
– same exact primary structure,
– Embedded sequences (missed cleavages, etc.)
• Remove outliers (quantitation values beyond a CV threshold)
• Number of missed cleavages allowed
• Semi-tryptic peptides and fully non-specific cleavages
• Number of queried modifications
Effect of filters
Filter (cumulative)                       # proteins   # peptides
Z-score (only valid peptides)             6            22
+ 3 peptides (min. 3 valid peptides)      4            19
+ Intensity (intensities >10’000)         4            15
+ CV (CV<20%)                             2            7
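The filters above are applied cumulatively, and each one removes proteins or peptides from the quantitation report. A minimal sketch of such a cascade on hypothetical data; the thresholds are the ones quoted on the slide, but the function names, the data layout and the use of the sample standard deviation for the CV are assumptions:

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation, in percent."""
    return 100 * stdev(values) / mean(values)


def filter_proteins(proteins, min_peptides=3, min_intensity=10000, max_cv=20.0):
    """Apply the three filters in sequence.

    proteins maps a protein name to a list of (ratio, intensity) tuples,
    one per valid peptide.
    """
    kept = {}
    for name, peptides in proteins.items():
        if len(peptides) < min_peptides:  # + 3 peptides
            continue
        strong = [(r, i) for r, i in peptides if i > min_intensity]  # + Intensity
        if len(strong) < 2:  # a CV needs at least two values
            continue
        if cv_percent([r for r, _ in strong]) >= max_cv:  # + CV
            continue
        kept[name] = strong
    return kept


# Hypothetical peptide ratios and intensities.
proteins = {
    "P1": [(1.0, 20000), (1.1, 30000), (0.9, 15000)],
    "P2": [(1.0, 20000), (2.0, 50000), (3.0, 40000)],
    "P3": [(1.0, 20000), (1.0, 30000)],
    "P4": [(1.0, 5000), (1.05, 20000), (0.95, 30000)],
}
print(sorted(filter_proteins(proteins)))  # ['P1', 'P4']
```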
False discovery rate export
• FDR = # peptides in decoy database / # peptides in forward database
[Figure: Number of valid hits as a function of z-score; x axis: z-score (4.0-14.0); y axis: # hits (0-10000); series: True hits, Hits in reverse]
[Figure: FDR (hits in rev / hits in fwd); x axis: z-score (5.0-10.0); y axis: FDR (0%-20%)]
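The FDR curve is obtained by counting, for each z-score threshold, the hits in the reverse (decoy) database and in the forward database, and taking their ratio. A minimal sketch on hypothetical scores; the function name and the data are illustrative:

```python
def fdr_curve(forward_scores, reverse_scores, thresholds):
    """Return (threshold, n_forward, n_reverse, fdr) for each threshold.

    FDR is estimated as hits in reverse / hits in forward, counting
    hits with a score at or above the threshold.
    """
    curve = []
    for t in thresholds:
        n_fwd = sum(s >= t for s in forward_scores)
        n_rev = sum(s >= t for s in reverse_scores)
        fdr = n_rev / n_fwd if n_fwd else 0.0
        curve.append((t, n_fwd, n_rev, fdr))
    return curve


# Hypothetical z-scores.
forward = [5, 6, 7, 8, 9, 10]
reverse = [5, 6]
for row in fdr_curve(forward, reverse, [5, 7]):
    print(row)
```

Raising the threshold lowers the FDR at the cost of fewer accepted hits, which is the trade-off the two plots show.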
Calibration status of instrument (3 datasets)
[Figure: z-score (3 to 19) plotted against delta m/z (-0.6 to 0.4)]
Effect of the search parameters
Search                                       Valid   Coverage
1 round, only 3 fixed mods                   131     75%
2 rounds, add variable mods                  205     84%
2 rounds, with all mods and half cleaved     348     90%
Import jobs into Phenyx
Mascot
X!Tandem
Sequest
Phenyx
Manual validation and then quantitation, as for a native Phenyx job
Results comparison tool
What protein in what job?
What peptide in what protein/job?
Concatenate results from different runs/search engines
And then go to quantitation…
Summary
• LC-MS data and 2D image analysis (MSight)
– Rich source of information
– Detect strange behaviors (discontinuity, contaminations, QC issues)
– Using 2 dimensions is efficient for signal detection
– Alignment of multiple MS runs: consider local aberrations
– Quantitation possible for pairs and for groups (statistics)
• Quantitation with protein identification tool only
(Phenyx)
– Quantitation methods limited to information in peaklists
(isobaric labeling, emPAI, Multiplex)
• Quantitation with MSight and Phenyx
– Get access to raw data information
– Full panel of quantitation methods
– Need tight integration (annotation, statistics, filters)
– Thanks to import functionality, access to other search
engines
Take-home messages
• Error to appreciate: biological variability, experimental variability, quantitation method tolerance
• Many tools available, make your choice according to:
– biological question
– capacity to analyse data from the chosen quantitation method
– capacity to analyse data from your instruments
– possibility to validate generated data (interactivity)
• Understand, evaluate
Acknowledgements
• Phenyx development team
– Alexandre Masselot
– Nicolas Budin
– Anne Niknejad
– Olivier Evalet
• PIG group
– Ron Appel
– Daniel Walther
– Gerard Bouchet
– Sébastien Catherinet
– Stéphane Pelhâtre
– Patricia Palagi
• BPRG
– Ali Vaezzadeh
• PAF
– Manfredo Quadroni
• University Bern
– Manfred Heller
• IPBS
– David Bouyssié
Thank you for your attention!
MSight: http://www.expasy.org
Phenyx: http://phenyx.vital-it.ch/pwi