Materials and methods - Development and Application of Chemometric Methods for Modelling Metabo

3.3.1 Sample preparation

Samples were collected and analysed previously as part of the COMET-2 project¹⁴¹,¹⁴² and were stored at -80 °C. Rats were administered 10 mL/kg vehicle (0.9% saline, n = 8) or galN (galactosamine hydrochloride dissolved in 0.9% saline to give a free base concentration of 41.5 mg/mL, n = 8).¹⁴⁰ Urine specimens were collected for 24 hr after administration of galactosamine or dosing vehicle, and plasma was obtained upon sacrifice, 24 hr after administration, as described previously.¹⁴⁰ Urine samples were thawed, vortexed and allowed to stand for 10 minutes prior to mixing aliquots (400 µL) with phosphate buffer (200 µL, 0.2 M, containing 10% deuterium oxide (D₂O), 3 mM TSP (3-(trimethylsilyl)-[2,2,3,3-²H₄]-propionic acid sodium salt) and 3 mM sodium azide); the mixtures were then centrifuged at 13000 rpm for 10 minutes. Samples (550 µL) were placed in 5 mm outer diameter NMR tubes. Stored plasma samples, previously prepared in 5 mm outer diameter NMR tubes,¹⁴⁰ were thawed and used for analysis.

3.3.2 ¹H NMR spectroscopy

Proton NMR spectra were acquired on a Bruker Avance-800 spectrometer operating at an 800.32 MHz ¹H frequency and a temperature of 27 °C. D₂O provided a field frequency lock and TSP provided a chemical shift reference for the urine samples. Urine and plasma ‘1D’ ¹H NMR spectra were acquired with water peak pre-saturation using the pulse sequence (d1–90x–4 µs–90x–tm–90x–acquire FID). For each sample, 64 transients were collected into 64k data points using a spectral width of 16025 Hz, with a relaxation delay (d1) of 2 s, an acquisition time of 2.04 s and a mixing time (tm) of 100 ms. The water resonance was selectively irradiated during d1 and tm. CPMG ¹H NMR spectra of urine and plasma, acquired with the pulse sequence (d1–90x–[τ–180y–τ]n–acquire FID) and a spin–spin relaxation delay 2nτ of 102.4 ms (τ = 0.4 ms, n = 128), were collected with 64 transients and a d1 of 2 s into 64k data points with a spectral width of 16025 Hz and an acquisition time of 2.04 s. The water peak was irradiated during d1. The 2D JRES urine and plasma ¹H NMR spectra were acquired with the pulse sequence (d1–90y–τ–180–τ–acquire FID, with suppression of the water resonance during d1) into 64k data points in F2 with a d1 of 2 s and 8 transients using 32 increments of τ; the spectral widths in F2 and F1 were 16025 and 50 Hz, respectively.
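
As a quick consistency check on the acquisition parameters quoted above, the total CPMG spin–spin relaxation delay and the spectral width in ppm can be recomputed from the stated values. This is an illustrative sketch only; the variable names are hypothetical and are not taken from any acquisition software.

```python
# Values quoted in the text (variable names are illustrative assumptions).
tau_ms = 0.4      # half-echo delay tau in ms
n_loops = 128     # number of [tau-180y-tau] loops
sw_hz = 16025.0   # spectral width in Hz
sf_mhz = 800.32   # 1H observation frequency in MHz

# Total spin-spin relaxation delay 2*n*tau.
total_echo_ms = 2 * n_loops * tau_ms

# Spectral width in ppm: Hz divided by the observation frequency in MHz.
sw_ppm = sw_hz / sf_mhz

print(f"2n*tau = {total_echo_ms:.1f} ms")
print(f"spectral width = {sw_ppm:.2f} ppm")
```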

Initially, the magnet could not be shimmed adequately for the urine samples; this problem was reduced by equilibrating each sample in the NMR spectrometer for about 30 minutes before acquisition. Some samples showed poor shimming and peak splitting, especially evident in the JRES projections, which was resolved by shimming in the ‘gs mode’ of the JRES experiment. The replicate spectra were plotted and the highest quality spectra were selected. Two of the urine samples were reprepared (with an extra centrifugation step and careful pipetting, resulting in a dilution). The repreparation, however, did not reduce the need for the sample to settle. Its main effect was dilution of the sample, which can be compensated for by appropriate normalisation. All samples had gone through a number (3–5) of freeze-thaw cycles.

3.3.3 Data processing

A line broadening function of 1 Hz and one level of zero filling were applied to all 1D and CPMG spectra prior to Fourier transformation (FT). Automated phase and baseline correction and referencing to the TSP (δ 0.000) or α-glucose (δ 5.233) resonances for urine and plasma, respectively, were performed using in-house software (NMRproc v0.3, Drs. T.M.D. Ebbels and H.C. Keun, private communication). JRES data were zero filled to 128k data points in F2, and up to 256 increments in F1. Apodisation of the JRES data was carried out using an unshifted sine-bell function in both dimensions prior to FT. As is necessary, the absolute value spectra were calculated. Data were tilted, symmetrised with respect to the horizontal through the centre of F1, and skyline or sum projected as indicated. The JRES projections were referenced and baseline corrected as above.

All spectra were imported to MATLAB (R2008a, The Mathworks, MA, USA) and data were interpolated to form a vector running from δ -1.0 to δ 10.0 in steps of δ 0.00025. This maintains the resolution of the spectral data while allowing for increased accuracy in chemical shift referencing. Following removal of the TSP, water and urea regions, the data between δ 0.1–4.5 and δ 6.0–10.0 were used for further analysis (resulting in 44001 data points per spectrum) and subjected to probabilistic quotient normalisation (see §2.3.1).⁸²
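
The common chemical shift axis and the normalisation step described above can be sketched as follows. This is a minimal illustration and not the MATLAB code used in the study. The function names are hypothetical, and the use of the median spectrum as the reference for probabilistic quotient normalisation is an assumption, since the text does not state which reference was used.

```python
from statistics import median

def ppm_grid(start=-1.0, stop=10.0, step=0.00025):
    """Common chemical shift axis; integer indexing avoids float drift."""
    n = round((stop - start) / step) + 1
    return [start + i * step for i in range(n)]

def pqn(spectra):
    """Probabilistic quotient normalisation (reference: median spectrum).

    `spectra` is a list of equally long intensity lists, one per sample.
    The median-spectrum reference is an assumption of this sketch.
    """
    # Step 1: scale every spectrum to unit total intensity.
    scaled = [[v / sum(s) for v in s] for s in spectra]
    # Step 2: build the reference as the median intensity per variable.
    ref = [median(col) for col in zip(*scaled)]
    normalised = []
    for s in scaled:
        # Step 3: the median quotient estimates the most probable dilution.
        q = median(v / r for v, r in zip(s, ref) if r > 0)
        # Step 4: divide the spectrum by that dilution factor.
        normalised.append([v / q for v in s])
    return normalised

grid = ppm_grid()
print(len(grid))  # number of points on the interpolated axis
```

For instance, a spectrum and a twofold-diluted copy of it map onto the same normalised profile, which is the behaviour the dilution correction relies on.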

There were six data sets in total: urine and plasma data for each of the three types of NMR experiment (1D, CPMG and JRES). To investigate the effect of peak alignment, recursive segment-wise peak alignment¹¹⁶ was performed, where the reference spectrum for each set of 16 samples was chosen to be the one with maximum correlation with the other spectra for that particular experiment and biofluid type.
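
One way to pick the alignment reference described above is sketched below. It interprets "maximum correlation with the other spectra" as the highest mean Pearson correlation with all remaining spectra in the set; that interpretation, and the function names, are assumptions of this sketch. The alignment algorithm itself is not reproduced here.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equally long intensity lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0  # undefined for a flat spectrum; reported as 0 here
    return sab / sqrt(saa * sbb)

def pick_reference(spectra):
    """Index of the spectrum with the highest mean correlation to the rest."""
    best_idx, best_score = 0, float("-inf")
    for i, s in enumerate(spectra):
        others = [pearson(s, t) for j, t in enumerate(spectra) if j != i]
        score = sum(others) / len(others)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx

# Toy example: the middle spectrum shares a peak with each neighbour.
toy = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0]]
print(pick_reference(toy))
```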

3.3.4 Signal-to-noise ratio and line width analyses

The signal-to-noise ratio (S/N) for different processing options of the JRES projections was calculated as the mean, across the 16 samples, of the ratio of a given peak maximum to the standard deviation of the noise (δ 0.2–0.5 for JRES (1201 data points), δ 9.8–10.0 for CPMG (800 data points)). The S/N was calculated for the alanine methyl signal at δ 1.47 and the lactate methyl signal at δ 1.33 in both urine and plasma, and for the TSP singlet at δ 0.00 and the succinate singlet at δ 2.41 in urine; for the CPMG and untilted JRES spectra, the maximum value of the lactate and alanine doublet signals was used. It should be noted that the term noise is not strictly correct for JRES spectra: in absolute value mode, prior to any baseline correction, the signal has no negative components and is therefore not truly random.
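
The S/N definition above can be sketched as follows. The function names and the peak search window are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, as the text does not specify the estimator.

```python
from statistics import stdev

def signal_to_noise(ppm, spectrum, peak_window, noise_window):
    """Peak maximum divided by the standard deviation of the noise region."""
    p_lo, p_hi = peak_window
    n_lo, n_hi = noise_window
    peak = max(y for x, y in zip(ppm, spectrum) if p_lo <= x <= p_hi)
    noise = [y for x, y in zip(ppm, spectrum) if n_lo <= x <= n_hi]
    return peak / stdev(noise)  # sample standard deviation (assumption)

def mean_signal_to_noise(ppm, spectra, peak_window, noise_window):
    """Mean S/N across all samples of one data set."""
    ratios = [signal_to_noise(ppm, s, peak_window, noise_window)
              for s in spectra]
    return sum(ratios) / len(ratios)
```

For a JRES projection, a call might look like `mean_signal_to_noise(ppm, spectra, (1.32, 1.34), (0.2, 0.5))` for the lactate methyl signal, where the ±0.01 peak window is an illustrative choice rather than a value from the study.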

The line widths of the JRES projections were approximated as the median of the full width at half height. The line widths were calculated for the lactate methyl singlet and the alanine methyl singlet in plasma and urine, in addition to the TSP and succinate singlets in urine; for the untilted data, the higher frequency component of the lactate and alanine doublets was used.
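
A possible way to estimate the full width at half height is sketched below, using linear interpolation between the data points on either side of the half-height crossing. The function names are hypothetical, taking the median across samples is an assumed reading of the text, and the sketch assumes an isolated peak that falls below half height on both sides within the data.

```python
from statistics import median

def _crossing(x0, y0, x1, y1, level):
    """Linearly interpolate the x position where the segment reaches level."""
    if y1 == y0:
        return x0
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)

def fwhm(ppm, spectrum, peak_index):
    """Full width at half height of the peak at peak_index (in axis units)."""
    half = spectrum[peak_index] / 2.0
    # Walk left until the intensity drops to half height or below.
    i = peak_index
    while i > 0 and spectrum[i] > half:
        i -= 1
    left = _crossing(ppm[i], spectrum[i], ppm[i + 1], spectrum[i + 1], half)
    # Walk right in the same way.
    j = peak_index
    while j < len(spectrum) - 1 and spectrum[j] > half:
        j += 1
    right = _crossing(ppm[j - 1], spectrum[j - 1], ppm[j], spectrum[j], half)
    return abs(right - left)

def median_fwhm(ppm, spectra, peak_window):
    """Median line width across samples for the tallest point in a window."""
    lo, hi = peak_window
    idx = [k for k, x in enumerate(ppm) if lo <= x <= hi]
    widths = []
    for s in spectra:
        peak_index = max(idx, key=lambda k: s[k])
        widths.append(fwhm(ppm, s, peak_index))
    return median(widths)
```

A width obtained in ppm can be converted to Hz by multiplying by the ¹H observation frequency in MHz (800.32 here).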

3.3.5 Statistical and chemometric analyses

Various multivariate modelling methods and univariate correlation approaches were employed:

• PCA⁸⁶ models (see §2.3.3) were generated using mean-centred spectral data to evaluate the data before and after alignment. The direction of differentiation between the scores of the two groups was used to evaluate the PC at which separation occurred.

• To evaluate whether alignment removed orthogonal variation, OPLS⁹³ models (see §2.3.5) were constructed with in-house software on mean-centred data scaled to unit variance, with a dummy vector representing the two classes (encoded as 0 = control, 1 = dosed).

• Pearson’s correlation coefficient (r) was calculated between spectra to estimate the similarity between different samples and experiments. The correlation r(s1, s2) is calculated based on all data points in two spectra s1 and s2.

• Statistical total correlation spectroscopy (STOCSY)⁶⁷ displays the Pearson correlation between the signal intensities Xδ at each resonance frequency δ and a reference variable of interest, see §2.3.7. The reference values used were: the alanine CH₃ resonance (for doublets the high frequency peak was used) and the creatinine CH₃ resonance at δ 3.05 (for figure 3.12), or a dummy vector representing the class (in figures 3.6 and 3.11).
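
The STOCSY calculation in the last item can be sketched as a column-wise Pearson correlation against the reference variable. The function names are hypothetical, and the data matrix is assumed to hold one spectrum per row. The PCA and OPLS models are not reproduced here.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0  # undefined for a constant variable; reported as 0 here
    return sab / sqrt(saa * sbb)

def stocsy(X, reference):
    """Correlation of every spectral variable with a reference vector.

    X: list of spectra (rows = samples, columns = chemical shift points).
    reference: one value per sample, e.g. the intensity of a driver peak
    or a 0/1 class dummy vector.
    """
    return [pearson(col, reference) for col in zip(*X)]

# Toy data: three samples, three variables; variable 0 is the driver peak.
X = [[1, 3, 5], [2, 2, 5], [3, 1, 5]]
driver = [row[0] for row in X]
print(stocsy(X, driver))
```

The same function covers both uses named in the text: passing a peak intensity column gives a peak-driven STOCSY, while passing the class vector gives the class-driven version.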

3.3.6 STORSY: statistical total regression spectroscopy

Peak intensities from the same molecule should, in principle, be proportional to the number of contributing protons. Whether this proportionality can be used for improved correlation analysis is investigated by regressing two signal intensities from the same molecule across a series of samples, a method termed statistical total regression spectroscopy (STORSY). It is hypothesised that calculating a regression slope of two resonances, similar to the correlation in STOCSY, would aid peak identification and could differentiate significant correlations into inter- and intra-molecular correlations.¹⁴³

For example, if STORSY is performed for the CH proton of lactate, one would expect the methyl group to have a relative regression slope of 3, whereas other highly correlated signals, for instance from pathway-related metabolites, may have regression values high or low enough to indicate that they cannot arise from the same molecule; e.g. a peak ratio exceeding 10 is unlikely to arise from signals within the same metabolite.

The most straightforward way in which STORSY could be implemented in standard metabonomic research is by performing this calculation on JRES projections, since that eliminates the need to take into account the multiplicity patterns present in typical one-dimensional proton spectra. Theoretically, the sum projection retains the quantitative information whereas the skyline projection does not. The projections are performed along F1, which is the J-coupling direction (the vertical direction in figure 3.3). In the skyline projection, the maximum signal along F1 is retained and will be a fraction of the original intensity, dependent on the splitting pattern, hence peak integrals of the skyline projection will not correspond to the original proton ratios.¹²¹ With a CH₃ integral set to 3, the ratio of the lactate peaks for the sum projection, which sums all signals along F1, should remain 1:3, whilst the ratio would be 0.75:3 for the skyline projection (the relative intensities for the CH quartet at δ 4.11 are 0.125:0.375:0.375:0.125, and for the CH₃ doublet at δ 1.33 these are 1.5:1.5; the skyline projection retains 0.375 and 1.5, respectively, see also figure 2.4).
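
The lactate ratios quoted above can be reproduced with a few lines of arithmetic. The sketch idealises each multiplet as a set of discrete component intensities and ignores line shape; the variable names are illustrative.

```python
# Relative component intensities along F1, as given in the text.
ch_quartet = [0.125, 0.375, 0.375, 0.125]  # lactate CH, 1 proton in total
ch3_doublet = [1.5, 1.5]                   # lactate CH3, 3 protons in total

# Sum projection: all components along F1 are added.
sum_ch, sum_ch3 = sum(ch_quartet), sum(ch3_doublet)

# Skyline projection: only the largest component along F1 is kept.
sky_ch, sky_ch3 = max(ch_quartet), max(ch3_doublet)

# Express both ratios relative to a CH3 value of 3.
sum_ratio = 3 * sum_ch / sum_ch3
sky_ratio = 3 * sky_ch / sky_ch3

print(f"sum projection     CH:CH3 = {sum_ratio:.2f}:3")
print(f"skyline projection CH:CH3 = {sky_ratio:.2f}:3")
```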

Inclusion of an intercept in the regression equation can be used to allow for baseline offset and overlapping peaks, although both are diminished in JRES spectra. A verification of the proportions is to perform a regression from peak B to peak A: the slope should be the inverse of the slope from peak A to peak B. Before interpretation of the STORSY values, it should be verified that the results are not affected by outliers in the peak intensity data, as these can have a large leverage effect on the regression slope value.
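
The reverse-regression check described above can be sketched as follows. For an ordinary least-squares fit with intercept, the product of the A-to-B and B-to-A slopes equals r², so a product close to 1 indicates that the two slopes are close to being inverses. The function names are hypothetical, and the threshold at which the check would be considered failed is left open.

```python
def ols(x, y):
    """Least-squares fit y = b*x + a; returns (slope b, intercept a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    return b, my - b * mx

def slope_product(x, y):
    """Product of forward and reverse slopes (equals r squared)."""
    b_forward, _ = ols(x, y)
    b_reverse, _ = ols(y, x)
    return b_forward * b_reverse

peak_a = [1.0, 2.0, 3.0, 4.0]
clean_b = [3.0, 6.0, 9.0, 12.0]    # perfectly proportional intensities
outlier_b = [3.0, 6.0, 9.0, 30.0]  # one high-leverage outlier

print(slope_product(peak_a, clean_b))
print(slope_product(peak_a, outlier_b))
```

In the toy data, a single outlying intensity is enough to pull the product well below 1, which illustrates why the intensities should be screened for outliers before the slopes are interpreted.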

STORSY analysis of glucuronidation time-series

To evaluate STORSY, data from a previous study were used;⁸⁸ the data were reprocessed with and without symmetrisation as indicated and zero filled to 128k in F2 and 256 in F1. Data were sum projected, baseline corrected (zero-order polynomial correction using Topspin, Bruker Biospin), referenced to TSP (by NMRproc v0.3, Drs. T.M.D. Ebbels and H.C. Keun) and interpolated from δ -0.1 to δ 9.0 in steps of δ 0.00025 (36401 data points). As all data were acquired from one sample, normalisation was performed using the TSP area (δ -0.02 to δ 0.02), and peak positional variability was minimal, alleviating the need for alignment.
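
Normalisation to the TSP area can be sketched as below. The area is approximated by the summed intensity inside the window, which on a uniform chemical shift grid differs from the integral only by a constant factor shared by all spectra. The function name is hypothetical.

```python
def tsp_normalise(ppm, spectrum, window=(-0.02, 0.02)):
    """Divide a spectrum by its summed intensity inside the TSP window."""
    lo, hi = window
    area = sum(y for x, y in zip(ppm, spectrum) if lo <= x <= hi)
    if area <= 0:
        raise ValueError("TSP area must be positive")
    return [y / area for y in spectrum]

# Toy spectrum: a small TSP peak around 0 ppm and one signal at 1 ppm.
ppm = [-0.04, -0.02, 0.0, 0.02, 0.04, 1.0]
spectrum = [0.0, 1.0, 2.0, 1.0, 0.0, 8.0]
print(tsp_normalise(ppm, spectrum))
```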

STORSY for galactosamine toxicity study

Correlation and regression analyses were performed on the peak maxima (peak height) or peak integrals (peak area) of the CH and CH₃ peaks of lactate. Pearson correlations were calculated between the CH and CH₃ data, and a standard linear regression for the CH versus CH₃ data was calculated with and without intercept, i.e. y = bx and y = bx + a.
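
The two regression models named above, y = bx and y = bx + a, can be sketched as follows. The function names are hypothetical. The toy data take the CH intensity as x and the CH₃ intensity as y, which is an assumed direction chosen so that the expected slope is 3.

```python
def fit_through_origin(x, y):
    """Slope b of the model y = b*x (no intercept)."""
    return sum(u * v for u, v in zip(x, y)) / sum(u * u for u in x)

def fit_with_intercept(x, y):
    """Slope b and intercept a of the model y = b*x + a."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    return b, my - b * mx

# Toy lactate intensities with a constant baseline offset of 0.5 on CH3.
ch = [1.0, 2.0, 3.0, 4.0]
ch3 = [3.5, 6.5, 9.5, 12.5]

b0 = fit_through_origin(ch, ch3)
b1, a1 = fit_with_intercept(ch, ch3)
print(f"y = bx:     b = {b0:.3f}")
print(f"y = bx + a: b = {b1:.3f}, a = {a1:.3f}")
```

In this toy case the model with intercept recovers the 1:3 proton ratio, whereas forcing the line through the origin absorbs the offset into the slope.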

In document Development and Application of Chemometric Methods for Modelling Metabolic Spectral Profiles (Page 53-57)