CHAPTER 2: Current Tools & Workflows Employed for Analysis of Large-Scale Shotgun Proteomics Experiments
2.1.4. Differential Protein Expression Algorithms
Among advocates of label-free relative quantification, most prefer spectral counts for evaluating protein abundance. Most efforts in analyzing differential protein expression have therefore focused on significance tests that determine whether two spectral count measurements for a protein differ statistically from each other. A central premise of all of these tests is that the expected distribution of spectral counts within the dataset is known or can be estimated. Despite the widespread adoption of spectral counts as the metric of choice for these tests, there is substantial disagreement within the community about which distribution spectral counts do and should follow, and about whether likelihoods should be estimated with Bayesian probabilities rather than with cumulative distribution functions. Consequently, normalization methods that attempt to adjust spectral count distributions are as numerous as the proposed statistically robust measures for evaluating differences among values. Among the more noteworthy approaches to date, the beta-binomial model, generalized linear mixed-effects models, the quasi-Poisson model, and normal-distribution models have been compared most often.
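One practical way to frame the distributional question is to ask whether replicate spectral counts vary more than Poisson sampling alone would allow. The Python sketch below estimates a quasi-Poisson-style dispersion factor: the Pearson chi-square statistic of an intercept-only Poisson fit per protein, pooled across proteins and divided by the total degrees of freedom. A value near 1 is consistent with Poisson sampling, while values well above 1 indicate overdispersion, the situation that motivates quasi-Poisson, beta-binomial, or mixed-effects alternatives. This is a minimal sketch that assumes replicate runs of a single sample with comparable total spectral counts (no normalization). The function names and example counts are hypothetical.

```python
from statistics import mean


def pearson_chi2(counts):
    """Pearson statistic sum((y - ybar)^2 / ybar) for an intercept-only Poisson fit."""
    ybar = mean(counts)
    return sum((y - ybar) ** 2 for y in counts) / ybar


def pooled_dispersion(protein_counts):
    """Quasi-Poisson-style dispersion: total Pearson chi-square / total degrees of freedom.

    protein_counts maps a (hypothetical) protein ID to its spectral counts across
    replicate runs. A value near 1 is consistent with Poisson sampling; values
    well above 1 indicate overdispersion.
    """
    chi2_total, df_total = 0.0, 0
    for counts in protein_counts.values():
        # Unreplicated or never-detected proteins carry no dispersion information.
        if len(counts) < 2 or mean(counts) == 0:
            continue
        chi2_total += pearson_chi2(counts)
        df_total += len(counts) - 1
    if df_total == 0:
        raise ValueError("need at least one replicated, detected protein")
    return chi2_total / df_total


# Hypothetical replicate spectral counts for three proteins.
example = {"PROT_A": [10, 10, 10], "PROT_B": [4, 6, 5], "PROT_C": [0, 20, 1, 25]}
print(round(pooled_dispersion(example), 2))  # prints 6.23
```

Here the erratic counts of the third protein push the pooled factor far above 1, so a pure Poisson model would understate the variance of these data.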
In 2008, Choi et al. from Nesvizhskii's lab introduced QSpec,90 quantitation software that uses spectral counting to measure protein expression differences between two datasets. In their view, one of the biggest weaknesses of previous quantitative efforts was that the statistics relied too heavily on signal-to-noise ratios to adjust spectral count distributions within a run, introducing biases that favored large differences in highly abundant proteins. They argue that signal-to-noise methods lose power because they operate on a per-protein basis rather than taking into account all of the proteins within a replicate. While they acknowledge that most other algorithms focus on highly abundant proteins because these are the most reproducibly detected across replicates and across samples, a primary aim of QSpec is to provide a model robust enough to handle the absence of replicate samples. In short, QSpec uses hierarchical Bayes estimation of a generalized linear mixed-effects model (GLMM) in which spectral counts are treated as Poisson-distributed random variables, with the model informed by the large population of proteins identified within a replicate. Regression parameters are therefore modeled for each protein as random effects, and if replicate information is available for a protein, the coefficients are "shared" by each instance of that protein so that intrasubject variation is preserved and kept consistent across the dataset. Random effects are also specified for every sample and for every treatment or condition. Model parameters are estimated with a Markov chain Monte Carlo (MCMC) method, and the number of iterations can be specified by the researcher. In particular, the treatment term, which is modeled as a Gaussian random variable with an inverse-gamma-distributed variance parameter, is tested for significance; if it does not contribute to describing the data, the model is "reduced." For each protein, the evidence for the "full" model is weighed against the evidence for the "reduced" model, and proteins with more evidence for the full model are considered differentially expressed.90 A primary disadvantage of this approach is that it requires pooling statistical information across all identified proteins. Another contentious decision is how QSpec handles "missing" data: it randomly generates a count from a Poisson distribution using the mean of the protein's replicates. Although this ensures that the protein will not be called significantly different, the resulting non-significant call is not equivalent to a true negative.
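QSpec's full-versus-reduced comparison relies on a hierarchical GLMM fitted by MCMC, which is too involved to reproduce here. As a deliberately simplified stand-in, the Python sketch below computes a closed-form Bayes factor for a single protein under a conjugate Poisson-Gamma model: each count is assumed Poisson with a rate scaled by its run's total spectral count, and the rate has a Gamma(a, b) prior. The "full" model gives each condition its own rate, and the "reduced" model shares one rate. This is not QSpec's algorithm (there are no random effects, no pooling across proteins, and no MCMC). The prior values a = 1 and b = 0.01, the exposure scaling, and all function names are illustrative assumptions, and the resulting Bayes factor is sensitive to the prior.

```python
import math


def log_marginal(counts, exposures, a=1.0, b=0.01):
    """Log marginal likelihood of y_i ~ Poisson(lam * t_i) with lam ~ Gamma(a, rate=b).

    The data-only terms sum(y_i * log t_i - log y_i!) are omitted because they are
    identical under both models and cancel in the Bayes factor.
    """
    s, t = sum(counts), sum(exposures)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + t))


def log_bayes_factor(counts_a, totals_a, counts_b, totals_b, a=1.0, b=0.01):
    """log BF > 0 favours the full model (one rate per condition) over the reduced one.

    totals_* are the total spectral counts of each run; they are rescaled to
    exposures relative to the average run size (an illustrative normalization).
    """
    all_totals = list(totals_a) + list(totals_b)
    scale = sum(all_totals) / len(all_totals)
    exp_a = [x / scale for x in totals_a]
    exp_b = [x / scale for x in totals_b]
    full = log_marginal(counts_a, exp_a, a, b) + log_marginal(counts_b, exp_b, a, b)
    reduced = log_marginal(list(counts_a) + list(counts_b), exp_a + exp_b, a, b)
    return full - reduced


totals = [10000, 10000, 10000]  # hypothetical total spectral counts per run
print(log_bayes_factor([0, 1, 0], totals, [20, 25, 22], totals))  # large, positive
print(log_bayes_factor([5, 6, 5], totals, [5, 5, 6], totals))     # negative
```

The negative value for the near-identical protein reflects the automatic penalty a Bayes factor applies to the extra parameter of the full model, which is the same intuition behind QSpec's preference for the reduced model when the treatment term adds nothing.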
Much like the debate between single aggregate descriptive metrics and individual, specific scores discussed in the context of False Discovery Rates and False Positive Rates (Section 2.1.4), a similar debate exists in protein quantitation. In 2009, Pham et al. proposed a beta-binomial method to describe spectral count data collected from label-free tandem mass spectrometry-based proteomics, citing as their primary contribution the distinction between within- and between-sample variation.91 Instead of pooling statistical information across proteins as QSpec does, this method separates the variation arising from the random sampling process within each biological sample from the variation among random biological samples within a sample group. Both types of variation are modeled by the beta-binomial distribution, in which the binomial component describes within-sample variation and the beta component describes between-sample variation, with the parameters evaluated by a likelihood ratio test (G-test). Using the beta-binomial distribution, one can achieve true detection rates for differentially expressed proteins comparable to those of the local-pooled-error (LPE) test, the t-test (on log-transformed data), and the G-test,91, 92 while also estimating a false positive rate, which is not possible with the LPE test or the t-test. An important consideration, however, is the number of replicates. The beta-binomial model can be applied to single-replicate comparisons, and in multi-replicate experiments it outperforms QSpec and performs comparably to one-way ANOVA.91
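The separation of within-sample (binomial) and between-sample (beta) variation can be illustrated with a generic beta-binomial likelihood ratio test. In the Python sketch below, each run contributes k spectra for a protein out of n total spectra; the mean proportion p and an overdispersion parameter rho are fitted by maximum likelihood, either separately per group (full model) or shared by both groups (reduced model). This is an illustration of the beta-binomial idea, not Pham et al.'s published test procedure. The crude grid search, the chi-square approximation (which is unreliable with only three replicates), and all function names are assumptions made for brevity; a real analysis would use a numerical optimizer and a validated implementation.

```python
import math


def log_beta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def bb_loglik(counts, totals, p, rho):
    """Beta-binomial log-likelihood of counts k_i out of run totals n_i.

    p is the mean proportion of a run's spectra assigned to the protein and rho in
    (0, 1) is the overdispersion: rho -> 0 recovers the binomial (within-sample
    sampling only), larger rho adds between-sample (beta) variation.
    """
    scale = 1.0 / rho - 1.0
    alpha, beta = p * scale, (1.0 - p) * scale
    ll = 0.0
    for k, n in zip(counts, totals):
        ll += (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))
    return ll


def log_grid(lo, hi, steps):
    return [lo * (hi / lo) ** (i / (steps - 1)) for i in range(steps)]


# Crude, log-spaced search grids (an assumption; a real fit would use an optimizer).
P_GRID = log_grid(1e-6, 0.5, 120)
RHO_GRID = log_grid(1e-6, 0.5, 40)


def max_loglik(groups):
    """Maximize the summed log-likelihood of (counts, totals) groups sharing one (p, rho)."""
    return max(sum(bb_loglik(c, n, p, r) for c, n in groups)
               for p in P_GRID for r in RHO_GRID)


def bb_lrt_pvalue(counts_a, totals_a, counts_b, totals_b):
    """Likelihood ratio test: separate (p, rho) per group vs. one shared (p, rho)."""
    full = max_loglik([(counts_a, totals_a)]) + max_loglik([(counts_b, totals_b)])
    reduced = max_loglik([(counts_a, totals_a), (counts_b, totals_b)])
    lr = max(0.0, 2.0 * (full - reduced))
    # Asymptotic chi-square with 2 df (4 vs. 2 parameters); its survival function is exp(-x/2).
    return math.exp(-lr / 2.0)


totals = [10000, 10000, 10000]  # hypothetical total spectral counts per run
print(bb_lrt_pvalue([0, 1, 0], totals, [20, 25, 22], totals))  # small p-value
```

Because the run totals n enter the likelihood directly, this formulation normalizes for differences in total spectral counts between runs without a separate normalization step.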
As a default, most researchers use ANOVA as the significance test for differential protein expression. Although many computational groups have proposed alternative methods for testing label-free data (spectral counts in particular), ANOVA is straightforward and well understood, and it can be implemented easily in many existing software packages, including Excel, R, and MATLAB. ANOVA tests whether the variation between group means is larger than would be expected from the variation within groups. It therefore gains power as replicates are added, and its assumptions are best met when the data consist of independent measurements drawn from a normal distribution. A log transformation of spectral counts can approximate a normal distribution, especially if the filtering criteria are stringent enough to retain only the most abundant (and therefore more reproducible) protein measurements. Setting the minimum spectral count and reproducibility thresholds too high, however, can discard a substantial fraction of the data. Currently there is no "gold standard" for determining which proteins pass an appropriate identification cutoff or whether an additional, more stringent filter should be applied before quantifying proteins. Even on a well-filtered dataset, ANOVA treats a protein repeatedly measured at 0 spectral counts (never detected) as just as consistent as one repeatedly measured at 3. Conversely, a protein that is consistently undetected in sample 1 but consistently observed at 100 spectral counts in sample 2 may fail filtering criteria that require detection in every sample, removing this otherwise striking change in expression from the final report.
Care is therefore needed to ensure that the chosen measure of protein abundance is consistent with both the normalization method and the filtering criteria used to generate the final datasets.
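The interaction between log transformation, one-way ANOVA, and detection filtering described above can be sketched in a few lines of Python. The block computes the one-way ANOVA F statistic on log2-transformed spectral counts and applies a filter requiring detection in every replicate. The pseudocount of 1 (needed because log2(0) is undefined), the filter rule, and all function names are hypothetical choices for illustration; p-values are omitted to keep the sketch dependency-free, and packages such as R would normally supply them.

```python
import math
from statistics import mean


def log_transform(counts, pseudocount=1.0):
    """log2(count + pseudocount); the pseudocount keeps zero counts finite."""
    return [math.log2(c + pseudocount) for c in counts]


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        # Perfectly repeated values (e.g. 0, 0, 0) leave no within-group variation.
        return math.inf if ms_between > 0 else math.nan
    return ms_between / ms_within


def passes_filter(groups, min_count=1):
    """Hypothetical filter: require detection in every replicate of every group."""
    return all(c >= min_count for g in groups for c in g)


on_off = [[0, 0, 0], [100, 100, 100]]  # never detected vs. consistently detected
print(passes_filter(on_off))                                # False
print(one_way_anova_f([log_transform(g) for g in on_off]))  # inf
```

The example shows both problems at once: ANOVA treats the repeated zeros as perfectly consistent measurements (within-group variance of zero, hence an infinite F), while the detection filter removes the protein before the test is ever run.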