Development of a new HDX-MS processing algorithm

In an ideal scenario, comparisons between states in HDX-MS experiments would be made between individual backbone amides, the highest-resolution unit of the experiment. A single deuterium uptake value per time point would then be obtained for each exchangeable backbone amide, in contrast to overlapping peptide level data, which generate multiple uptake values per time point for groups of backbone amides. This would allow a simple, residue-by-residue comparison of deuterium uptake between states that can readily be mapped onto protein structures for interpretation and visualisation, avoiding the difficulty of developing robust interpretation and presentation strategies for complex, overlapping peptide level data [254].

Indeed, HDX-MS experiments can be conducted at residue level resolution by fragmenting deuterated peptides using alternative fragmentation techniques, such as ETD, and quantifying deuterium incorporation at each backbone amide from the isotope distributions of the resulting fragment ions [216, 255]. However, these experiments are plagued by poor, charge-dependent fragmentation efficiency, often requiring the addition of non-volatile supercharging reagents to achieve adequate peptide fragmentation [255]. Similarly, MS source conditions must be carefully tuned to avoid transferring excess internal energy to peptide ions and to minimise gas-phase scrambling of the labile deuterium label [216, 256]. This additional experimental complexity means that, despite the problems of interpretation and visualisation, HDX-MS experiments are typically performed at peptide level resolution [206, 257].

Numerous efforts have been made to consolidate peptide level deuterium uptake data into a single uptake value per amino acid, and so simplify comparisons between states. One popular approach is subtraction analysis, whereby the deuterium content of the region covered by one peptide but not by another, overlapping peptide is calculated from the difference in deuterium content of the two peptides [258]. In the example presented in Figure 4.12, this would mean that the deuterium content of residues 81-93, and any differences in uptake in this region between the three variants of β2m, could be assessed by subtracting the deuterium content of peptide 94-99 from that of peptide 81-99. While conceptually appealing, this approach has several problems. Firstly, where more than two peptides cover the same region, multiple subtraction analyses are possible, which can still yield multiple uptake values per exchangeable amide. For example, two further peptides covering the C-terminal ~10 amino acids of β2m (in addition to the two listed above) were also identified (Figure 4.10), so a simple subtraction of peptides 81-99 and 94-99 would discard these otherwise valid data. Secondly, recent evidence suggests that, except where almost complete deuterium retention on the overlapping peptides can be achieved (i.e. negligible back exchange), subtraction analyses often provide unreliable results, as peptides of different lengths but similar sequence can form different secondary structures, such as α-helices, under quench conditions [257]. The resulting differential hydrogen […] the deuterium uptake values for identical amide groups present on different peptides cannot be considered equivalent [257].
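The first problem can be illustrated with a minimal sketch that enumerates every subtraction available from peptides sharing the same C-terminal residue (99). Peptides 81-99 and 94-99 are those named above; peptide 91-99 and all uptake values are hypothetical, and spreading each difference evenly over the residues of a region is a naive simplification used only to show that one residue can still receive more than one estimate.

```python
from itertools import permutations

# Hypothetical deuterium uptake (Da) at one time point; invented values,
# not measured beta-2m data. Peptide 91-99 is a hypothetical extra peptide.
uptake = {(81, 99): 4.2, (91, 99): 2.0, (94, 99): 1.1}

def subtraction_regions(uptake):
    """Return {(start, end): uptake_Da} for every region obtainable by
    subtracting a shorter peptide from a longer one with the same C-terminus."""
    regions = {}
    for (s1, e1), (s2, e2) in permutations(uptake, 2):
        if e1 == e2 and s1 < s2:
            # Residues s1..s2-1 are covered by the longer peptide only.
            regions[(s1, s2 - 1)] = uptake[(s1, e1)] - uptake[(s2, e2)]
    return regions

regions = subtraction_regions(uptake)
for (start, end), d in sorted(regions.items()):
    print(f"{start}-{end}: {d:.2f} Da")

# Residue 92 lies in two derived regions (81-93 and 91-93), so it still
# receives two different estimates (naively spread evenly per residue).
residue = 92
estimates = [d / (end - start + 1) for (start, end), d in regions.items()
             if start <= residue <= end]
print(sorted(round(x, 3) for x in estimates))  # [0.238, 0.3]
```

Adding a third peptide turns a single subtraction into three overlapping ones, which is exactly the ambiguity described in the text.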

Many commercial HDX-MS processing packages, such as the Waters DynamX software used in our studies, circumvent this issue entirely by plotting data for all observed peptides side by side along the x axis, ordered by the starting residue of each peptide, against change in absolute mass on the y axis for each time point [259]. These so-called 'butterfly plots' have the advantage of presenting the raw data in minimally processed form. As a result, however, they can be challenging to interpret and visualise, as peptide lengths, and the regions of the protein each peptide covers, are not readily identifiable. Similarly, a threshold for the difference in absolute mass between states for a given peptide is typically set as the criterion for a statistically significant difference [260, 261]. This can be misleading, however, as the biological significance of a mass change depends partly on the length of the peptide, which is not taken into account when only changes in absolute mass are measured. For instance, a 0.5 Da difference between two states in the weighted average mass of a 20 amino acid peptide would appear, on a butterfly plot, to be as significant as the same mass difference on a five amino acid peptide, even though the latter corresponds to roughly four times the mass increase per exchangeable site.
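The arithmetic behind the length argument can be checked with a short sketch. Treating every residue as one exchangeable site (the approximation used in the comparison above) gives a factor of four; the second calculation, which additionally drops the N-terminal amide of each peptide, is an illustrative assumption that shows the gap only widens.

```python
def mass_per_site(delta_mass_da: float, n_sites: int) -> float:
    """Mass difference between states divided by the number of exchangeable sites."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return delta_mass_da / n_sites

delta = 0.5  # Da, the same absolute difference observed on both peptides

# Approximation: one exchangeable site per residue (20 vs 5 residues).
long_pep = mass_per_site(delta, 20)   # 0.025 Da per site
short_pep = mass_per_site(delta, 5)   # 0.1 Da per site
print(short_pep / long_pep)           # 4.0

# Assumption for comparison: exclude the N-terminal amide (19 vs 4 sites).
ratio_nterm = mass_per_site(delta, 4) / mass_per_site(delta, 19)
print(round(ratio_nterm, 2))          # 4.75
```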

One novel approach, however, designed to consolidate peptide level data across multiple time points into a single plot, sums the mass increase for a given peptide across all measured time points before correcting for the number of exchangeable sites in the peptide, generating a single fractional mass increase value for each observed peptide [170, 262]. These values are then averaged across all peptides covering a particular amino acid in the protein sequence to generate a single uptake value for each residue (Equation 4.2).

$$\bar{M}_j = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{q_i}\sum_{t}\left(m_{it} - m_{i0}\right) \tag{4.2}$$

Equation 4.2 Previously published HDX-MS peptide level data consolidation approach, where $\bar{M}_j$ is the mean mass increase at amino acid j, summed across all measured time points, n is the number of overlapping peptides covering amino acid j, $q_i$ is the number of exchangeable amides in peptide i, $m_{it}$ is the weighted average mass of peptide i at time t and $m_{i0}$ is the weighted average mass of peptide i at time 0.
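Equation 4.2 can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the peptide dictionaries, their masses and their $q_i$ values are hypothetical, with time 0 taken as the undeuterated reference.

```python
from statistics import mean

def peptide_value(masses, q):
    """Per-peptide term of Eq. 4.2: (1/q_i) * sum over t of (m_it - m_i0)."""
    m0 = masses[0]
    # The t = 0 term contributes zero, so summing over all keys is safe.
    return sum(m - m0 for m in masses.values()) / q

def mean_mass_increase(peptides, j):
    """Eq. 4.2: average the per-peptide terms over the n peptides covering residue j."""
    covering = [p for p in peptides if p["start"] <= j <= p["end"]]
    if not covering:
        return None  # residue j is not covered by any peptide
    return mean(peptide_value(p["masses"], p["q"]) for p in covering)

# Hypothetical peptides: masses maps time (min) -> weighted average mass (Da).
peptides = [
    {"start": 10, "end": 15, "q": 5, "masses": {0: 700.0, 1: 701.0, 10: 702.0}},
    {"start": 12, "end": 20, "q": 8, "masses": {0: 1000.0, 1: 1001.2, 10: 1002.0}},
]

for j in (11, 13, 18):
    print(j, round(mean_mass_increase(peptides, j), 3))
# 11 0.6
# 13 0.5
# 18 0.4
```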

The advantage of this approach is not only that it corrects mass increase for peptide length, but also that it consolidates peptide level HDX-MS data into the desired single value per amino acid, which can easily be compared between states. However, although its usefulness has been demonstrated in numerous publications [170, 262, 263], the method has perhaps three significant limitations. Firstly, the statistical significance of any differences identified cannot be calculated, as there is at present no way of propagating errors in the peptide deuterium uptake measurements to the final uptake value obtained for each residue. Secondly, without statistical analysis, robust methods of data presentation and visualisation, such as heat maps on protein structures, are difficult to develop. Lastly, summing uptake measurements across time points means that minor differences observed over multiple time points accumulate, and may appear considerably larger in the processed data than the peptide deuterium uptake plot would suggest [170].
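The third limitation can be illustrated with invented numbers: a constant difference of 0.03 in fractional uptake between two states at each of five time points, which may lie within measurement error at any single time point, becomes a fivefold larger difference once the time points are summed as in Equation 4.2.

```python
# Hypothetical fractional uptake of one peptide in two states at five time points.
state_a = [0.20, 0.35, 0.50, 0.62, 0.70]
state_b = [0.23, 0.38, 0.53, 0.65, 0.73]

# Evaluated per time point, the difference stays small.
per_time_point = [round(b - a, 2) for a, b in zip(state_a, state_b)]

# Summed across time points, the same small difference accumulates.
summed = round(sum(state_b) - sum(state_a), 2)

print(per_time_point)  # [0.03, 0.03, 0.03, 0.03, 0.03]
print(summed)          # 0.15
```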

To address these issues, the following algorithm was developed to obtain the desired single uptake measurement per amino acid from overlapping peptide level data, for processing, visualising and comparing the HDX-MS data for the β2m variants using relative fractional uptake data exported from DynamX. This simple approach not only evaluates each time point independently, but also allows statistical analysis of the processed data to determine significant differences between states, offering robust criteria for data visualisation and presentation across datasets.

$$X_C = \frac{\sum_{i=1}^{z} n_i \bar{X}_i}{\sum_{i=1}^{z} n_i} \tag{4.3}$$

Equation 4.3 PAVED algorithm: calculating the combined mean relative fractional uptake per residue, where $X_C$ is the combined mean relative fractional uptake for a given residue at a given time point, $n_i$ is the number of replicates for peptide i and $\bar{X}_i$ is the mean relative fractional uptake of peptide i at that time point. The sums run over peptides i = 1 to z, i.e. all peptides covering the amino acid position, excluding the N-terminal residue of each peptide due to back exchange.

$$S_C = \sqrt{\frac{\sum_{i=1}^{z} n_i\left[S_i^2 + \left(\bar{X}_i - X_C\right)^2\right]}{\sum_{i=1}^{z} n_i}} \tag{4.4}$$

Equation 4.4 PAVED algorithm: calculating the combined standard deviation per residue, where $S_C$ is the combined standard deviation for a given residue at a given time point, $S_i$ is the standard deviation of peptide i at that time point, $X_C$ is the combined mean relative fractional uptake for the residue (Equation 4.3), $n_i$ is the number of replicates for peptide i and $\bar{X}_i$ is the mean relative fractional uptake of peptide i at that time point. The sums run over peptides i = 1 to z, i.e. all peptides covering the amino acid position, excluding the N-terminal residue of each peptide due to back exchange.
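Equations 4.3 and 4.4 can be sketched in Python for a single residue at a single time point. This is an illustrative sketch only, not the R code of Section 8.1 or the PAVED software itself; the input layout, a list of $(n_i, \bar{X}_i, S_i)$ tuples, and the numbers in the example are assumptions.

```python
from math import sqrt

def combine(peptides):
    """Combined mean (Eq. 4.3) and standard deviation (Eq. 4.4) for one residue.

    peptides: list of (n_i, mean_i, sd_i) tuples, one per peptide covering
    the residue (hypothetical input layout).
    """
    total_n = sum(n for n, _, _ in peptides)
    if total_n == 0:
        raise ValueError("no replicate measurements for this residue")
    # Eq. 4.3: replicate-weighted mean of the per-peptide means.
    x_c = sum(n * x for n, x, _ in peptides) / total_n
    # Eq. 4.4: pooled within-peptide variance plus between-peptide spread.
    s_c = sqrt(sum(n * (s ** 2 + (x - x_c) ** 2) for n, x, s in peptides) / total_n)
    return x_c, s_c

# Hypothetical residue covered by two peptides, each measured in triplicate.
x_c, s_c = combine([(3, 0.40, 0.02), (3, 0.50, 0.03)])
print(round(x_c, 3), round(s_c, 4))  # 0.45 0.0561
```

The $(\bar{X}_i - X_C)^2$ term means that disagreement between overlapping peptides widens $S_C$, even when each peptide is individually very reproducible.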

For a given time point and state, Equation 4.3 calculates the combined mean relative fractional uptake (deuterium uptake corrected for the number of exchangeable amides in the peptide) for each amino acid by averaging, weighted by the number of replicates, the relative fractional uptake values of the peptides that cover the amino acid in question. Equation 4.4 then uses this combined mean, together with the standard deviation of each peptide (arising from replicate measurements as well as from variation between charge states within the same measurement), to calculate a combined standard deviation. This process is repeated for each residue in the sequence, for every measured time point and for all states (wild-type, ΔN6 and D76N in these experiments). Knowing the combined mean relative fractional uptake, the combined standard deviation and the total number of measurements for each residue (n = replicates × peptides covering the residue), statistical analysis can be performed using one-way ANOVA and post hoc Tukey tests to determine significant differences, on a per-residue basis, between states at equivalent time points [264].
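Because only a mean, standard deviation and n are available per residue and state, the ANOVA and Tukey statistics have to be computed from summary statistics. The sketch below shows one way to do this for a single residue at one time point, with invented values for the three states. It assumes each standard deviation can be treated as a sample standard deviation (Equation 4.4 divides by $\sum n_i$, so this is an approximation), and it uses the Tukey-Kramer form of the studentized range statistic to allow unequal n. P-values would then come from the F and studentized range distributions, for example `scipy.stats.f.sf(F, df_between, df_within)` and `scipy.stats.studentized_range.sf(q, k, df_within)` (SciPy 1.7 or later).

```python
from itertools import combinations
from math import sqrt

def anova_from_summary(groups):
    """One-way ANOVA F statistic from {state: (mean, sd, n)} summaries.

    Assumes sd is a sample standard deviation (n - 1 denominator).
    Returns (F, df_between, df_within, ms_within).
    """
    stats = list(groups.values())
    k = len(stats)
    n_total = sum(n for _, _, n in stats)
    grand = sum(m * n for m, _, n in stats) / n_total
    ss_between = sum(n * (m - grand) ** 2 for m, _, n in stats)
    ss_within = sum((n - 1) * s ** 2 for _, s, n in stats)
    df_b, df_w = k - 1, n_total - k
    ms_within = ss_within / df_w
    return (ss_between / df_b) / ms_within, df_b, df_w, ms_within

def tukey_kramer_q(groups, ms_within):
    """Studentized range statistic q for every pair of states."""
    out = {}
    for a, b in combinations(groups, 2):
        (ma, _, na), (mb, _, nb) = groups[a], groups[b]
        se = sqrt(ms_within / 2 * (1 / na + 1 / nb))
        out[(a, b)] = abs(ma - mb) / se
    return out

# Hypothetical (X_C, S_C, n) for one residue at one time point.
groups = {"WT": (0.45, 0.03, 9), "ΔN6": (0.60, 0.04, 6), "D76N": (0.47, 0.03, 9)}

f_stat, df_b, df_w, ms_w = anova_from_summary(groups)
q_stats = tukey_kramer_q(groups, ms_w)
print(round(f_stat, 4), df_b, df_w)  # 42.1875 2 21
for pair, q in q_stats.items():
    print(pair, round(q, 2))
```

Each q would be compared with the critical studentized range value for k = 3 states and the within-group degrees of freedom.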

Although the multiple charge states of a given peptide provide multiple measurements of deuterium uptake, and were included in the calculation of both the combined mean relative fractional uptake and the combined standard deviation of each residue, the number of charge states was excluded from the calculation of n. Including it would weight larger peptides, which typically appear in more charge states but offer lower structural resolution, more heavily than smaller ones in the final combined relative fractional uptake. Additionally, the N-terminal amide of each peptide has been observed to undergo rapid back exchange under quench conditions, within 1-2 minutes, leaving no deuterium on this residue [221]. The N-terminal residue of each peptide was therefore excluded from our analysis and from the calculation of relative fractional uptake.
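The two bookkeeping rules above (charge states contribute to a peptide's mean and standard deviation but not to its n, and the N-terminal residue of each peptide contributes nothing) can be sketched as follows. The nested dictionary layout and replicate labels are hypothetical and do not reflect the DynamX export format; the output is the list of $(n_i, \bar{X}_i, S_i)$ per residue that Equations 4.3 and 4.4 require.

```python
from statistics import mean, stdev

def covered_residues(start, end):
    """Residues a peptide contributes to, skipping its N-terminal residue."""
    return range(start + 1, end + 1)

def peptide_summary(measurements):
    """(n_i, mean_i, sd_i) for one peptide, state and time point.

    measurements: {replicate_id: [uptake per observed charge state]}.
    All charge states enter the mean and SD; n_i counts replicates only.
    """
    values = [v for per_rep in measurements.values() for v in per_rep]
    sd = stdev(values) if len(values) > 1 else 0.0
    return len(measurements), mean(values), sd

def per_residue_inputs(peptides):
    """Map residue -> list of (n_i, mean_i, sd_i) for the peptides covering it."""
    table = {}
    for (start, end), measurements in peptides.items():
        summary = peptide_summary(measurements)
        for r in covered_residues(start, end):
            table.setdefault(r, []).append(summary)
    return table

# Hypothetical relative fractional uptake values at one time point.
peptides = {
    (10, 14): {"rep1": [0.40, 0.42], "rep2": [0.41], "rep3": [0.39, 0.40]},
    (12, 18): {"rep1": [0.50], "rep2": [0.52], "rep3": [0.49]},
}
table = per_residue_inputs(peptides)
print(sorted(table))                      # residues 11-18; 10 is excluded
print(len(table[12]), len(table[13]))     # 1 2
```

Residue 12 receives data from only one peptide, because it is the N-terminal residue of peptide 12-18, while peptide 10-14 still reports n = 3 despite contributing five charge-state measurements.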

This processing algorithm was designed to process output files from DynamX and was initially written in R (the original code can be found in Section 8.1), before being further developed by James Ault (MS facility manager) to improve processing speed and add a graphical user interface (Figure 4.13). This software, named PAVED (Positional Averaging for Visualising Exchange Data), is now available for free download at:


Figure 4.13 PAVED graphical user interface. PAVED software for visualising and presenting peptide level HDX-MS data. Software and GUI created by James Ault.

In document The development of structural mass spectrometry based techniques for the study of aggregation-prone proteins. (Page 145-151)