CHAPTER 3. Increasing Peak Capacities for Peptide Separations Using Long
3.2 Materials and methods
3.3.8 Separations with long columns
The greatest benefit of running ultrahigh pressure separations was observed with a long column. In the red chromatogram in Figure 3.13, the standard protein digest was separated on a 44.1 cm x 75 µm ID column with 1.9 µm BEH C18 particles at 15 kpsi. The blue trace was from a 30 kpsi separation of the same sample on a 98.2 cm x 75 µm ID column with 1.9 µm BEH C18 particles. Because the pressure was increased for the longer column, the flow rates and run times were similar between the two separations. As is evident from the inset graph, the width of a representative peak decreased at higher pressure, yet peak intensity remained the same. Several gradient volumes were run on the 98.2 cm column. The results are summarized in Table 3.5 and Figure 3.14, which also includes data from a shorter commercial column run on the standard nanoAcquity. With the higher operating pressure, separations on a longer column reached a higher peak capacity in the same amount of time as separations on a shorter column at lower pressures. Also, the peak capacity plateaued at a higher value for the longer columns than for the shorter columns.
The E. coli digestion standard was also run on the 98.2 cm column at varying gradient volumes, as seen in Figure 3.15. An enlarged view of a portion of the longest chromatogram is shown in Figure 3.16. The return of the signal to baseline between several adjacent peaks demonstrated the gain in resolution from using long columns at elevated pressures and temperature for proteomics analysis. The numbers of peptide and protein identifications plotted in Figure 3.17 were higher for separations on the modified UHPLC than on the commercial system, with an increase of nearly 50%. However, there was little difference in the number of protein identifications between the 98.2 cm column run at 30 kpsi and the 44.1 cm column run at 15 kpsi, even though the 98.2 cm column had a larger peak capacity.
The number of protein identifications is not the only metric by which to compare the results of two proteomics analyses. Improvement of protein coverage, or the percent amino acid sequence coverage, can also describe the merit of the experiment. For a large data set containing hundreds of proteins, comparing the coverage for each protein is not straightforward. For example, reducing protein coverage to an average can be misleading. The additional proteins identified in a separation with higher peak capacity were usually of lower abundance and had lower coverage, bringing down the average. Alternatively, comparing only proteins identified by both methods would limit the analysis to easily detectable proteins, which usually had higher coverage, and would thus mute the difference between the methods. Herein, an original method to compare protein coverage based on the mathematical concept of a normalized difference is described. We named this metric the normalized difference protein coverage (NDPC) and define it as the difference in coverage of a protein found in two methods divided by the sum of the two coverages. For example, consider the protein pyruvate kinase, which is involved in E. coli glycolysis.38 For a 360 minute separation, pyruvate kinase had 47% coverage on the 98.2 cm column and 27% coverage on the 44.1 cm column. The NDPC is 0.27, as calculated in Equation 3-6.
\[
\mathrm{NDPC} = \frac{\mathrm{Coverage}_1 - \mathrm{Coverage}_2}{\mathrm{Coverage}_1 + \mathrm{Coverage}_2} = \frac{47 - 27}{47 + 27} = 0.27 \tag{3-6}
\]
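The calculation in Equation 3-6 can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the original analysis; the function name `ndpc` and the decision to reject a protein with no coverage in either method are assumptions made here.

```python
def ndpc(coverage_1: float, coverage_2: float) -> float:
    """Normalized difference protein coverage for one protein.

    coverage_1 and coverage_2 are the sequence coverages of the same
    protein in method 1 and method 2, in the same units (e.g. percent).
    """
    total = coverage_1 + coverage_2
    if total <= 0:
        # Assumption: NDPC is undefined when neither method covers the protein.
        raise ValueError("at least one coverage must be positive")
    return (coverage_1 - coverage_2) / total


# Pyruvate kinase, 360 minute separation:
# 47% coverage on the 98.2 cm column, 27% on the 44.1 cm column.
print(round(ndpc(47, 27), 2))  # prints 0.27
```

Swapping the two arguments flips the sign of the result, which matches the convention that positive values favor method 1.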
The Normalized Difference Protein Coverage (NDPC) is plotted in Figure 3.18 for each protein identified with the 360 minute gradient separation. If a protein was identified with higher sequence coverage from the separation on the 98.2 cm column run at 30 kpsi, its NDPC value was positive (blue bars). The red bars signified higher coverage with the separation on the 44.1 cm column at 15 kpsi. Proteins were plotted in order of decreasing coverage, i.e., proteins with higher coverage were plotted on the left and proteins with lower coverage on the right.
Differences in coverage were minimal for highly covered proteins. As protein coverage decreased, more proteins had higher coverage with the 98.2 cm column. Similar comparisons were made for the 90 minute and 180 minute gradient separations and can be found in Appendix B.1 and Appendix B.2, respectively. To provide a better visual of the trend in coverage, the protein identifiers were removed from the graphs, and the NDPC values were plotted in Figure 3.19, parts a, b, and c, for the 90, 180, and 360 minute gradient separations, respectively. As is evident from the larger portion of blue bars in part c, the greatest improvement in coverage between the long and shorter columns was with the shallowest gradient.
In an attempt to further simplify the comparison of coverage between multiple methods while maintaining the meaning of the values, we propose the Grand NDPC, which is calculated as the difference between the grand total protein coverage in method one and method two, normalized by the grand sum of protein coverage in both methods. A formula for the Grand NDPC is shown in Equation 3-7:
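\[
\text{Grand NDPC} = \frac{\left(\sum \mathrm{Coverage}_{\text{method 1}}\right) - \left(\sum \mathrm{Coverage}_{\text{method 2}}\right)}{\sum \mathrm{Coverage}_{\text{method 1}} + \sum \mathrm{Coverage}_{\text{method 2}}} \tag{3-7}
\]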
Perhaps a more relevant interpretation of the Grand NDPC would be to relate it to a fold-change improvement in coverage as follows:
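\[
\text{Fold-Change in Coverage} = \frac{\sum \mathrm{Coverage}_{\text{method 1}}}{\sum \mathrm{Coverage}_{\text{method 2}}} = \frac{1 + \text{Grand NDPC}}{1 - \text{Grand NDPC}} \tag{3-8}
\]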
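The link between the ratio of summed coverages and the Grand NDPC follows from the definition in Equation 3-7. As a short check, write $S_1$ and $S_2$ for the summed coverage in methods one and two (shorthand introduced here for brevity, not notation from the original text) and $G$ for the Grand NDPC:

\[
1 + G = \frac{2 S_1}{S_1 + S_2}, \qquad
1 - G = \frac{2 S_2}{S_1 + S_2}, \qquad
\frac{1 + G}{1 - G} = \frac{S_1}{S_2}
\]

A Grand NDPC of zero therefore corresponds to a fold-change of one, that is, equal total coverage in both methods.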
If the Fold-Change was less than one, the negative reciprocal of the value was used, as is conventional with fold-change calculations. The Grand NDPC and Fold-Change in Coverage are listed in Table 3.6 for the E. coli digest standard 90, 180, and 360 min gradient separations on the 98.2 cm column run at 30 kpsi and the 44.1 cm column at 15 kpsi. Positive values represented higher coverage on the long column, and negative values represented higher coverage on the shorter column. Grand NDPC and Fold-Change in Coverage increased in favor of the long column as gradient length increased.
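The Grand NDPC and the fold-change convention described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, each list is assumed to hold the coverage of every protein identified by that method (so the lists need not be the same length), and the example coverages are made-up values, not data from Table 3.6.

```python
def grand_ndpc(coverages_1, coverages_2):
    """Grand NDPC: difference of the summed coverages over their grand sum."""
    sum_1 = sum(coverages_1)
    sum_2 = sum(coverages_2)
    if sum_1 + sum_2 <= 0:
        # Assumption: undefined when neither method has any coverage.
        raise ValueError("total coverage must be positive")
    return (sum_1 - sum_2) / (sum_1 + sum_2)


def fold_change(g):
    """Fold-change in coverage from a Grand NDPC value g.

    Ratios below one are reported as their negative reciprocal,
    so a negative result means higher coverage in method 2.
    """
    if abs(g) >= 1:
        # One method has zero total coverage; the ratio is not finite.
        raise ValueError("Grand NDPC must lie strictly between -1 and 1")
    ratio = (1 + g) / (1 - g)
    return ratio if ratio >= 1 else -1 / ratio


# Hypothetical coverages (percent) for illustration only.
method_1 = [60, 40, 20]   # sums to 120
method_2 = [55, 30, 5]    # sums to 90

g = grand_ndpc(method_1, method_2)
print(round(g, 3), round(fold_change(g), 2))  # prints 0.143 1.33

# Swapping the methods flips the sign of both values.
g_rev = grand_ndpc(method_2, method_1)
print(round(g_rev, 3), round(fold_change(g_rev), 2))  # prints -0.143 -1.33
```

Because the sums are taken over each method separately, proteins found by only one method still contribute to the result, which is the property that distinguishes this summary from a comparison restricted to proteins found by both methods.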