Joselin Noirel provided key insights into the tag-merging techniques and conceived of the proteomic data analysis method described in this section. The datasets used in this section
4.6. MERGING TAG-BASED EXPERIMENTS 147
were generated by Filipe Pinto (IBMC, Portugal) and Narciso Couto (Sheffield, UK); and Bagmi Patternak and Pia Lindberg (Uppsala, Sweden).
4.6.1
Abstract
Within tag-based quantitative proteomics, the experimenter is typically limited by the number of samples that can be compared within a single experiment. With some consideration of experimental design, it is possible to chain multiple experiments together so that the results being compared are meaningful not only on a qualitative level (proteins that are in higher or lower abundance) but also on a quantitative level (the fold-changes also being proportional between experimental studies). In this section, a number of these methods are explored and compared, utilising median correction and mean-ratio scaling. Experimental datasets exemplifying each case are shown, demonstrating the improvements gained by each step of processing.
4.6.2
Introduction
There is currently something of an arms race taking place between the companies that produce isobaric tag reagents. Originally, tandem mass tags (TMT) enabled the simultaneous investigation of two samples within a single experiment, and these reagents were later improved to enable the analysis of 6 samples simultaneously (Thompson et al., 2003). Shortly after this tagging technology was released, the rival iTRAQ label was released as a 4-plex set of reagents, coupled with a bioinformatic processing tool from Applied Biosystems (now AB Sciex) (Ross et al., 2004a). This led to the iTRAQ reagents becoming far more popular than the TMT reagents, despite coming later to the market.
On the back of its relative success, iTRAQ was extended to an 8-plex tag. In the meantime, TMT has been buoyed by becoming the platform-standard isobaric tag for Thermo instruments, and reagents capable of up to 10-plex analysis have recently been announced. Whilst this ever-increasing sample multiplexing may be something that the research-consumer is interested in, there are significant limitations with this ever-expanding multiplex capacity, as pooling more samples comes at the cost of direct signal identifications; for a more in-depth discussion of this phenomenon, please see chapter 4.

Beyond physically creating new chemistry for generating these isobaric tags, it is mathematically feasible to increase the multiplexing ability of these tags by combining multiple experiments into a single analysis. Although technically simple to perform, such a combination can be challenging, as it depends on the experiment being designed with this combination in mind from the start. In some cases, datasets from separate experiments can be tied together; however, to produce meaningful results, these datasets should be generated on the same mass spectrometer, with the same protein extraction procedures and the same experimental conditions during growth. Merging datasets that do not meet these requirements, whilst feasible in principle (since cells should respond to the same conditions in the same manner), is very challenging in practice. The difficulty is that it becomes hard to find a cohesive set of changes within the proteome, particularly at the finer level of protein regulation where the interesting findings are observed.

The major issue is that there are multiple ‘ground states’ that a cell can exist in, which are controlled by kinetic features rather than fixed ‘pathways’. This means that whilst broad-spectrum changes, such as the up-regulation of carbon uptake machinery or the down-regulation of antennae proteins in Synechocystis, are conserved across different analyses because they are kinetically the most favourable inputs to the system, smaller-scale changes such as responses in metal-binding proteins might not be observed systematically. This is not to say that the finer-scale changes are not informative for a researcher; indeed, they give valuable insights into the control processes present within the organism being studied. However, it can be more challenging to observe identical responses when the system starts from slightly different conditions. These features are usually grouped together into a pseudo-random variable referred to as ‘biological variation’: unexplained changes that occur in a systematic manner at a level that cannot be reliably observed.
Such issues can lead some researchers to state that ‘biology can’t be modelled, as biological systems are unpredictable’, a statement that is clearly incorrect, or else scientific endeavours into biology would be consistently fruitless!

When merging different experiments together, assuming that the data are produced and analysed in a consistent manner, there are a number of approaches that can be taken. Regardless of the method used to merge multiple experiments, there are still a number of weaknesses with such an approach compared with an increased multiplex capacity that is chemically generated.
Firstly, and most prominently, there is the shared protein requirement for analysis. ‘You can’t compare what isn’t there’ is a strong mantra in proteomics, and whilst techniques can be attempted to account for missing labels in multiplex experiments (discussed in the previous section of this chapter), if the protein to be compared is not present in both analyses then that usually signifies the end of the analysis. This also introduces a weakest-link effect, where if one dataset is of far lower quality (i.e. has fewer protein identifications) than the others, it limits the maximum quality of the overall data. Arguably, this effect is a suitable compromise to enable comparisons that could not otherwise be made, and it can be mitigated by the increased rate of observation in lower multiplex experiments; notably, the number of identified peptides can be as much as 50% lower in an iTRAQ 8-plex than in an iTRAQ 4-plex.
Secondly, whilst proteins might show the same direction of change, the magnitude of such a change is not always consistent. For example, a protein in two independent analyses may show a statistically significant up-regulation in 2 different experimental conditions, but with a fold-change of 2 in one and 10 in the other. This can be caused by a number of features that are not actually derived from the proteomic state of the organisms being investigated, including fewer peptide identifications in one dataset compared to another, additional noise contamination, or stronger ratio compression effects. This can have a number of consequences for the data, since the fold-change is inherently linked to the statistical significance.
Finally, since shared samples are needed across experiments in a number of designs, the full range of the multiplex comparison is not additive. For two 4-plex iTRAQ experiments, $a$ and $b$, a completely independent pair of experiments would be $\{a_1, a_2, a_3, a_4\}$ and $\{b_1, b_2, b_3, b_4\}$, where $a$ and $b$ refer to the iTRAQ experiments and the subscript refers to the sample number. The same experiments with a shared control would be $\{a_1, a_2, a_3, a_4\}$ and $\{a_1, b_1, b_2, b_3\}$, where $a_1$ is the shared control and a total of 7 unique samples are compared. This is further reduced if an additional control is shared between the samples. Adding additional multiplex experiments into the design continues this trend, due to the consistent requirement for the shared sample, and so the number of experimental comparisons available in a given number of experiments would be:

\[ (\mathit{lab} - \mathit{con}) \times \mathit{exp} + \mathit{con} \]
Where lab is the number of labels in the multiplex, con is the number of controls shared between the experiments, and exp is the number of separate multiplex experiments being merged.
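As a quick check of this expression, the sketch below evaluates it for the 4-plex example given above and for a pair of 8-plex experiments. The function name `unique_samples` is purely illustrative and does not come from the original analysis.

```python
def unique_samples(lab: int, con: int, exp: int) -> int:
    """Number of distinct samples across `exp` merged multiplex experiments.

    lab: labels per multiplex
    con: controls shared by every experiment
    exp: number of multiplex experiments being merged
    """
    if con > lab:
        raise ValueError("cannot share more controls than there are labels")
    return (lab - con) * exp + con


# Two 4-plex experiments sharing one control: {a1..a4} and {a1, b1, b2, b3}
print(unique_samples(lab=4, con=1, exp=2))  # 7
# Two 8-plex experiments sharing two controls
print(unique_samples(lab=8, con=2, exp=2))  # 14
```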
In this section, three methods for merging labels are described – all with 8-plex iTRAQ analyses. The first utilises two shared samples quantified against a single label to determine ratios, the second uses a similar comparison but instead of being quantified against a single label the samples are quantified against the mean quantification from both of the controls. The final method takes the second method even further, removing identical controls entirely through a direct repetition of the experiment and normalising the experiments using the mean value of all the labels in the experiment. This method essentially sacrifices the maximum number of conditions to be compared for a greater increase in the number of experimental replicates. To close this section, an experimental design is proposed where 12 conditions, each with 2 replicates, are compared over 3 iTRAQs with an external reference comparison being conducted in a TMT 6-plex.
4.6.3
Methods
The simplest method to attain this type of comparison is to perform two independent analyses which share a control condition, where no further data analytics is performed.
\[ \frac{a_n}{a_c}, \quad \frac{b_n}{a_c} \]

Where $a_n$ refers to the set of all labels in the first iTRAQ, $b_n$ refers to the set of all labels in the second iTRAQ, and $a_c$ is the control sample which is identical in both iTRAQ experiments. The protein lists are then filtered to only contain $a_{prot} \cap b_{prot}$, the proteins that appear in both iTRAQ $a$ and $b$.
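A minimal sketch of this step is given below, assuming a hypothetical layout in which each experiment is a dictionary of protein names mapped to label intensities, and in which the shared control is stored under the label `"ctrl"` in both experiments. The intensities are made up for illustration and the function names are not taken from the original analysis code.

```python
def shared_proteins(a, b):
    """Keep only proteins quantified in both experiments (a_prot ∩ b_prot)."""
    common = sorted(set(a) & set(b))
    return {p: a[p] for p in common}, {p: b[p] for p in common}


def ratios_to_control(intensities, control_label):
    """Divide every label intensity of each protein by its control intensity."""
    return {
        protein: {label: value / labels[control_label]
                  for label, value in labels.items()}
        for protein, labels in intensities.items()
    }


# Made-up intensities; "ctrl" is the shared control sample in each experiment.
itraq_a = {"P1": {"ctrl": 100.0, "a1": 200.0}, "P2": {"ctrl": 50.0, "a1": 25.0}}
itraq_b = {"P1": {"ctrl": 80.0, "b1": 40.0}, "P3": {"ctrl": 10.0, "b1": 30.0}}

a_shared, b_shared = shared_proteins(itraq_a, itraq_b)
a_ratio = ratios_to_control(a_shared, "ctrl")
b_ratio = ratios_to_control(b_shared, "ctrl")
print(a_ratio)  # {'P1': {'ctrl': 1.0, 'a1': 2.0}}
print(b_ratio)  # {'P1': {'ctrl': 1.0, 'b1': 0.5}}
```

Only P1 survives the filter, since P2 and P3 each appear in a single experiment.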
Protein Quantification
The datasets are then analysed in a standard manner for proteins that are consistently and statistically significantly up- or down-regulated between the experimental conditions from both analyses. This is the method used to detect significantly changing proteins, and it is also used in other studies within this thesis, in chapter 5 and the appendix. Initially, all intensities are converted to ratios through division by a control sample. This is done to convert the values obtained for each peptide to a relative change, rather than an absolute quantification, since the intensity of a peptide signal in the MS2 scan is not necessarily proportional to the amount of protein present and can be affected by peptide-specific effects that distort the values. The standard method for quantifying proteins is to generate a relative ratio to a single control sample. The major issue with this in a typical ratio-derived investigation is that the noise or variance in a single label is completely collapsed, which masks cases where protein measurements in the control sample are erroneous or missing. Alternative methods are described below.
These ratios are then log-transformed. Without this transformation, half of the observed values lie in the range of 0 – 1, whilst the other half lie between 1 – ∞; whereas afterwards the data is split evenly around 0.
For example, take the values 1 and 100. If 1 is the control sample then the ratio between the two is 100, whereas if 100 is the control sample then the ratio is 0.01; demonstrating that these values are unevenly distributed in linear space. Following a log transform, these ratios become:
\[ \log_{10}(100) = 2, \qquad \log_{10}(0.01) = -2 \]
These values are symmetric about 0, and can be passed through the inverse transformation to obtain the original ratios again.
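The worked example with the values 1 and 100 can be reproduced with a few lines of Python. This is only an illustrative sketch; the helper name `log_ratio` is an assumption and not part of the original pipeline.

```python
import math


def log_ratio(sample: float, control: float) -> float:
    """log10 of the ratio between a sample and its control."""
    return math.log10(sample / control)


up = log_ratio(100, 1)    # control is 1, ratio is 100
down = log_ratio(1, 100)  # control is 100, ratio is 0.01

print(round(up, 6), round(down, 6))  # 2.0 -2.0
# The inverse transformation (10 ** value) recovers the original ratio.
print(round(10 ** up, 6), round(10 ** down, 6))  # 100.0 0.01
```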
Identifying statistically significant changes
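To determine an up-regulated protein, $a_{up} \subset a_{prot}$, for a given protein $prot$ under experimental conditions $x$ and $y$,

\[ a_{up} = \frac{\overline{prot_x} - \overline{prot_y}}{\sigma_p \sqrt{2/n}} \le test \]

Where $\overline{prot_x}$ and $\overline{prot_y}$ are the mean quantifications of the protein under each condition, $\sigma$ is the standard deviation, $n$ is the number of quantifications, $test$ is the statistical cut-off decided by the experimenter for significance (traditionally 0.05 for arbitrary reasons), which is applied to the P-value obtained from this statistic, and

\[ \sigma_p = \sqrt{\frac{\sigma_x^2 + \sigma_y^2}{2}} \]

The order of $prot_x$ and $prot_y$ is reversed to identify a statistically significant reduction in protein level between two samples.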
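The statistic can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the analysis code used here: the quantifications are made up, both groups are assumed to contain the same number of quantifications $n$, and sample standard deviations (with an $n - 1$ denominator) are assumed. Converting the statistic into a P-value requires the t distribution with $2n - 2$ degrees of freedom, which is not in the standard library, so that step is left out.

```python
import math
from statistics import mean, stdev


def pooled_t(x, y):
    """t statistic for two equally sized groups of quantifications.

    Assumption: sample standard deviations (n - 1 denominator) are used.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two groups of equal size with n >= 2")
    n = len(x)
    sigma_p = math.sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2)
    return (mean(x) - mean(y)) / (sigma_p * math.sqrt(2 / n))


# Made-up log ratios for one protein under conditions x and y.
prot_x = [1.0, 1.2, 0.8, 1.0]
prot_y = [0.1, -0.1, 0.0, 0.0]

t_up = pooled_t(prot_x, prot_y)    # large positive value: candidate for a_up
t_down = pooled_t(prot_y, prot_x)  # reversed order: same magnitude, opposite sign
print(round(t_up, 2), round(t_down, 2))
```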
In the majority of cases in proteomics, multiple replicates are included within the same experiment. There are two methods for resolving the multiple comparisons generated in this case. The first is to merge the peptide quantifications from the replicates together when performing the statistical test, although this is not recommended because it collapses the hierarchical error associated with both the physical experiment and the extraction protocol.
An alternative method is to consider each replicate independently, running multiple comparisons; for example, with 2 replicates this generates 4 separate t-test P-values. Depending on how much stringency is wanted in the downstream analysis, either the highest of the four P-values can be taken (highest stringency), or the median of the 4 values for a more relaxed stringency. If the latter method is used, then the P-values should be converted into z-scores before being combined, to remove sample-specific bias (Pascovici et al., 2015).
A z-score simply expresses a value as the number of standard deviations ($\sigma$) by which it falls away from the mean of the data. This conversion ensures that the averaging is performed on a linear scale, and avoids introducing further bias into the data. The z-score is calculated as
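\[ z = \frac{x_i - \overline{prot_x}}{\sigma_x} \]

Where for a given protein $x$, $x_i$ is the label intensity, $\overline{prot_x}$ is the mean intensity of the protein, and $\sigma_x$ is the standard deviation of those intensities.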
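As an illustration of the z-score expression, the sketch below standardises a small set of made-up values. The use of the sample standard deviation is an assumption, since the text does not state which estimator was used, and the function name is hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Express each value as a number of standard deviations from the mean.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Made-up intensities: mean 4.0, standard deviation 2.0
z = z_scores([2.0, 4.0, 6.0])
print(z)  # [-1.0, 0.0, 1.0]
```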
Merging by statistically significant proteins
Using one of the methods mentioned above, a complete list of proteins that are found to be significantly ‘up’ or ‘down’ between a test case and some control sample (usually the WT) is compiled for each iTRAQ. The control samples in the two iTRAQs are considered to be identical; indeed, they are often the same samples in both studies, although they do not necessarily have to be. As a result, samples from different iTRAQs can be compared not directly to each other, but through their statistical differences from the control sample, as exemplified in (Mota et al., 2015).
Scaling by control samples for quantification
Cases where the protein changes move in the same direction in each of the samples ($a_{up} \land b_{up}$) are considered to be unchanging, whereas proteins that are present in both but change with different signs can be considered different. Cases where no significant change was observed are more difficult to assign, as they lack conclusive evidence either way.
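The comparison of significance calls can be sketched as below. The call labels (`"up"`, `"down"`, `"ns"` for not significant), the output categories and the decision to mark a protein as inconclusive whenever either call is not significant are assumptions made for this illustration.

```python
def compare_direction(a_calls, b_calls):
    """Compare per-protein significance calls from two iTRAQ experiments.

    a_calls, b_calls: {protein: "up" | "down" | "ns"} (hypothetical layout).
    Only proteins present in both experiments are compared.
    """
    result = {}
    for protein in sorted(set(a_calls) & set(b_calls)):
        a, b = a_calls[protein], b_calls[protein]
        if "ns" in (a, b):
            result[protein] = "inconclusive"  # no conclusive evidence either way
        elif a == b:
            result[protein] = "same"          # e.g. a_up and b_up
        else:
            result[protein] = "different"     # opposite signs
    return result


a_calls = {"P1": "up", "P2": "up", "P3": "ns", "P4": "down"}
b_calls = {"P1": "up", "P2": "down", "P3": "up", "P5": "up"}
merged = compare_direction(a_calls, b_calls)
print(merged)  # {'P1': 'same', 'P2': 'different', 'P3': 'inconclusive'}
```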
A more advanced version of this method was published as part of (Pinto et al., 2015), where two control samples were added to each of the iTRAQs being combined, as per the standard method. Briefly, the paper investigated the effects of stable integration of protein production constructs into 5 putatively neutral sites within the Synechocystis genome. The aim of this study was to ensure that no significant background effects occurred as a result of this integration at any of the sites; however, since replication was required, there were more test cases than iTRAQ labels available within an 8-plex. The full investigation report that was produced for this work is available as an appendix to this thesis.
To maintain the variation present within every label, instead of being divided by one of the control samples, each label was divided by the mean of the two control values:
\[ \frac{ms_{x,i}}{\mu(ms_{x,c1}, ms_{x,c2})} \]

Where in a given MS2 scan $x$, $ms_{x,i}$ is the vector of the label intensities and $\mu(ms_{x,c1}, ms_{x,c2})$ is the mean of the control intensities.
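A minimal sketch of this division for a single MS2 scan is shown below. The label names (`"c1"`, `"c2"`, `"t1"`, `"t2"`) and the intensities are invented for illustration, and the function name is hypothetical.

```python
def ratio_to_control_mean(scan, controls=("c1", "c2")):
    """Divide every label intensity in one MS2 scan by the mean control intensity."""
    control_mean = sum(scan[c] for c in controls) / len(controls)
    return {label: value / control_mean for label, value in scan.items()}


# Made-up reporter intensities for one scan; c1 and c2 are the two controls.
scan = {"c1": 90.0, "c2": 110.0, "t1": 200.0, "t2": 50.0}
ratios = ratio_to_control_mean(scan)
print(ratios)  # {'c1': 0.9, 'c2': 1.1, 't1': 2.0, 't2': 0.5}
```

Unlike division by a single control, the two control labels keep their own variation (0.9 and 1.1 here) instead of one of them collapsing to exactly 1.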
Between the two iTRAQ experiments, the values were normalised against each other to enable inclusion of the quantitative data as follows. Following log transformation of the values:
\[ \mathit{iTRAQ}_{b,x,i} \times \frac{\mu(\mathit{iTRAQ}_{a.Control,x,i})}{\mu(\mathit{iTRAQ}_{b.Control,x,i})} \]

Where for each protein $x$, $\mathit{iTRAQ}_b$ is the full set of labels for each protein in the second iTRAQ experiment, $\mu(\mathit{iTRAQ}_{a.Control})$ is the mean of the peptide quantifications for each protein in the ‘control sample’ labels in the first iTRAQ, and $\mu(\mathit{iTRAQ}_{b.Control})$ is the mean of the peptide quantifications for each protein in the ‘control sample’ labels in the second iTRAQ.
This simple scaling vector, when applied to the second iTRAQ, is sufficient to cluster the proteins – a full break-down of this process in code form can be found in the appendices.
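For a single protein, the scaling amounts to multiplying the values from the second iTRAQ by the ratio of the two control means. The sketch below illustrates this with made-up numbers; it is not the code from the appendices, and the function and variable names are assumptions.

```python
from statistics import mean


def scale_by_controls(b_values, a_controls, b_controls):
    """Scale one protein's values in iTRAQ b onto the level of iTRAQ a.

    a_controls, b_controls: control-label quantifications of the same protein
    in the first and second iTRAQ, respectively.
    """
    b_mean = mean(b_controls)
    if b_mean == 0:
        raise ValueError("mean of the iTRAQ b controls is zero; cannot scale")
    factor = mean(a_controls) / b_mean
    return [value * factor for value in b_values]


# Made-up quantifications for one protein.
a_controls = [1.0, 3.0]        # mean 2.0
b_controls = [0.5, 1.5]        # mean 1.0
b_values = [0.5, 1.5, 3.0]     # all labels of iTRAQ b for this protein

scaled = scale_by_controls(b_values, a_controls, b_controls)
print(scaled)  # [1.0, 3.0, 6.0]
```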
Scaling by mean protein intensity
Initially, the iTRAQ datasets were subsetted to only contain proteins present in both experiments. The data within each of the labels was then median-corrected as follows:
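\[ \frac{\mathit{iTRAQ}_{y,i} - \widetilde{\mathit{iTRAQ}_y}}{\mathit{iTRAQ}_{y,0.6} - \mathit{iTRAQ}_{y,0.4}} \]

Where $\mathit{iTRAQ}_{y,i}$ is the vector of relative protein quantifications for label $y$ in a given iTRAQ, $\widetilde{\mathit{iTRAQ}_y}$ is the median value in that vector, and $\mathit{iTRAQ}_{y,0.6}$ and $\mathit{iTRAQ}_{y,0.4}$ are the 60% and 40% quantiles, respectively.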
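The median correction can be sketched as below for one label. The quantile definition (linear interpolation between order statistics) is an assumption, since the text does not say which definition was used, and the input values are invented so that the result is easy to verify by hand.

```python
def quantile(ordered, q):
    """Quantile of an already sorted list, using linear interpolation (assumed)."""
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def median_correct(values):
    """Centre a label on its median and scale by the 40%-60% quantile range."""
    ordered = sorted(values)
    med = quantile(ordered, 0.5)
    spread = quantile(ordered, 0.6) - quantile(ordered, 0.4)
    if spread == 0:
        raise ValueError("40% and 60% quantiles are equal; cannot scale")
    return [(v - med) / spread for v in values]


# Made-up label vector 0, 1, ..., 10: median 5, quantile range 6 - 4 = 2
label = [float(v) for v in range(11)]
corrected = median_correct(label)
print([round(v, 6) for v in corrected[:3]])  # [-2.5, -2.0, -1.5]
```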
As described above, the protein data is converted to ratios prior to log-transformation; however, instead of using a single control sample or the mean of the control samples, the data is divided by the mean of the entire set of labels measured in the iTRAQ for each peptide, $pept_i / \mu(pept_i)$, where $pept_i$ is the list of label ratios for a given peptide and $\mu$ is the mean. The two iTRAQ experiments were then scaled against each other with a slightly modified version of the scalar used for control samples above:
\[ \mathit{iTRAQ}_{b,x,i} \times \frac{\mu(\mathit{iTRAQ}_{a,x,i})}{\mu(\mathit{iTRAQ}_{b,x,i})} \]

Where $\mathit{iTRAQ}_{x,i}$ is the vector of the label intensities for each protein $x$, in iTRAQ $a$ and $b$, and $\mu$ is the mean.
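The two steps of this method, division of each peptide by the mean of all of its labels followed by scaling of the second iTRAQ by the ratio of protein means, can be sketched as follows. The numbers are made up and the function names are hypothetical; the sketch works on plain lists for a single peptide and a single protein.

```python
from statistics import mean


def ratio_to_label_mean(peptide):
    """Divide every label value of one peptide by the mean of all its labels."""
    label_mean = mean(peptide)
    return [value / label_mean for value in peptide]


def scale_to_first_itraq(b_values, a_values):
    """Scale one protein's labels in iTRAQ b by mean(a) / mean(b)."""
    factor = mean(a_values) / mean(b_values)
    return [value * factor for value in b_values]


# Step 1: one peptide with four label intensities (mean 2.0).
peptide_ratios = ratio_to_label_mean([1.0, 2.0, 3.0, 2.0])
print(peptide_ratios)  # [0.5, 1.0, 1.5, 1.0]

# Step 2: one protein quantified in both experiments (means 3.0 and 1.5).
a_protein = [2.0, 4.0]
b_protein = [1.0, 2.0]
b_scaled = scale_to_first_itraq(b_protein, a_protein)
print(b_scaled)  # [2.0, 4.0]
```

After scaling, the mean of the protein in the second iTRAQ matches its mean in the first, while the relative differences between the labels within the second iTRAQ are preserved.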
4.6.4
Results
The control scaling and mean protein intensity scaling methods are shown here. For the control scaling method, the data was clustered after being scaled by control sample. Since the method used identical samples across the two iTRAQs as the control, and the datasets were scaled by the mean of these two samples, the control samples from the first experiment show very close clustering with their counterparts in the second experiment, indicating that the experimental variation for an identical sample is negligible. The other interesting finding from this analysis was the consistent clustering between the biological replicates across the two experiments, with the test cases clustering against each other more closely than they did against the control samples.
This observation led to the question of whether the requirement for identical samples across multiple experiments was truly necessary, or whether replicates were sufficient. A follow-up experiment was therefore conducted on two experimental iTRAQs that had no identical shared samples, but were full experimental replicates of the same experiment. These samples were treated slightly differently to the case used in Pinto et al. (2015), as no individual labels stood out as controls, since all the experimental replicates were theoretically equally similar to their experimental counterparts. For this experiment, the samples were therefore controlled against the mean of all intensities after median correction (fig. 4.10 p. 156), as described in the methods.
Following median correction, the data were scaled to balance the measured intensities, as described in the methods. These were checked visually with a box-whisker chart to ensure that the scaling had not disrupted the median correction applied in the previous step, as shown in fig. 4.11 (p. 157). As can be seen from the data, the median for each label remained stable; however, in certain cases the tails of the data moved. This was to be expected, as the majority of proteins within the dataset should lie around the centre of the data and should therefore not change significantly, whilst the proteins changing by a significant amount move a much greater distance when a scalar is applied.

Whilst figures 4.10 (p. 156) and 4.11 (p. 157) show an overview of the data, they do not show the internal quantification relationships between the individual proteins. For all datasets, the protein quantifications were plotted against each other, as shown in fig. 4.12 (p. 158). In this graphics grid, the 4 test conditions are plotted against each other, the first two rows showing internal comparisons. Note the final scatter-plot on the second row shows more variation than the other seven comparisons on the first two rows,