Numerical methods
4.4. BASELINE CORRECTION 40
Figure 4.1: An example of the modified polynomial plus glass signal algorithm being applied for background subtraction - whereby the recorded background signal is combined with a fifth order polynomial until a value of C is found such that the modelled background fits directly under the Raman peaks of the original cell spectrum.
4.4.3 Extended multiplicative signal correction
An extended multiplicative signal correction (EMSC) algorithm can also be applied to Raman spectra to remove known background contaminants along with the slowly varying baseline signal. (79) This algorithm computes an optimum baseline made up of an Nth-order polynomial and a weighted contaminant signal that is recorded at the beginning of each experiment (e.g. signal from a glass substrate), as described in further detail in Chapter 8. The EMSC algorithm fits each raw spectrum, in a least squares sense, with a linear combination of (i) a reference Raman spectrum, (ii) the contaminant signal, and (iii) an Nth-order polynomial. The weights of (i) and (ii), as well as the coefficients of the polynomial, are returned by the EMSC algorithm. The reference spectrum provides a basis against which all other spectra are fitted, and for the purposes of this thesis, it is represented by the spectrum of an epithelial cell type recorded on a CaF2 substrate. Following an ordinary least squares (OLS) determination of the “best fit” for a given Raman spectrum, components (ii) and (iii) are subtracted from the raw spectrum. The value of N is dataset dependent, with higher order polynomials required for accurate modeling of the baseline signal across some datasets. It has been shown elsewhere that the use of high values of N (up to N = 7) does not result in over-fitting with EMSC (80), although for many of the cases presented here, only a straight line (N = 1) is required (see Chapter 8).
EMSC can also be applied for the removal of Mie scattering artifacts from FTIR datasets, as applied in Chapter 10. This algorithm is based on a reference spectrum either generated from Matrigel or from the mean spectrum for a given dataset. Further information on the application of EMSC for resonant Mie scatter correction is available elsewhere. (81; 82; 83)
Based on the same notation as Section 4.4.2, EMSC can be defined as follows: a raw spectrum, X0(nδλ), can be described as a linear superposition of the Raman spectrum of interest, R(nδλ), the baseline signal, P(nδλ), and the contaminant signal, B(nδλ):
$$X_0(n\delta\lambda) = R(n\delta\lambda) + B(n\delta\lambda) + P(n\delta\lambda) \tag{4.10}$$
The goal is to estimate the values of B(nδλ) and P(nδλ) such that they may be subtracted from the recorded spectrum. Although noise will always be present in the raw spectrum (84), it is assumed that the SNR is sufficiently high such that the noise signal may be ignored.
A reference spectrum, r(nδλ), is first obtained such that it may be assumed that R(nδλ) can be approximated by the product of this reference spectrum and a certain weight:
$$R(n\delta\lambda) \approx c_r \times r(n\delta\lambda) \tag{4.11}$$
where c_r is a scalar for a given spectrum.
Similarly, by recording a spectrum directly from a pure contaminant (e.g. a glass slide), b(nδλ), it is possible to represent the spectral contribution of glass in the recorded cell spectrum, B(nδλ), as the product of the pure glass spectrum and a certain weight:
$$B(n\delta\lambda) = c_b \times b(n\delta\lambda) \tag{4.12}$$
It should be noted that both c_r and c_b are scalar values that are unique to each cell spectrum, and are dependent on experimental parameters such as the Raman acquisition time.
The slowly varying baseline P(nδλ) can be represented using an appropriate Nth-order polynomial:
$$P_N(n\delta\lambda) = c_0 + c_1 (n\delta\lambda) + c_2 (n\delta\lambda)^2 + \dots + c_N (n\delta\lambda)^N \tag{4.13}$$
where N is the order of the polynomial, and c_m for m = 0, ..., N represents the coefficients of the polynomial. (76)
The raw spectrum, X0(nδλ), the reference spectrum, r(nδλ), the contaminant spectrum, b(nδλ), and the order of the polynomial, N, are all input to the EMSC algorithm, which returns estimates for c_r, c_b, and c_m for m = 0, ..., N. These estimates are based on an optimal fit of the various vectors in Equation 4.14 in an OLS sense. (80; 79)
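$$X_0(n\delta\lambda) \approx [c_r \times r(n\delta\lambda)] + [c_b \times b(n\delta\lambda)] + [c_0 + c_1 (n\delta\lambda) + \dots + c_N (n\delta\lambda)^N] \tag{4.14}$$
The background corrected cell spectrum, X_final(nδλ), is obtained by subtracting the estimated contaminant and polynomial terms from the raw spectrum:
$$X_{final}(n\delta\lambda) = X_0(n\delta\lambda) - [c_b \times b(n\delta\lambda)] - [c_0 + c_1 (n\delta\lambda) + \dots + c_N (n\delta\lambda)^N]$$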
For consistency, the same notation has been used throughout Sections 4.4.2 and 4.4.3; however, if the reader prefers vector notation to describe this algorithm, this is available in Ref. (85).
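The fitting and subtraction steps described in this section can be illustrated with a minimal NumPy sketch. It is not the implementation used in this thesis: the function name `emsc_correct`, the synthetic Gaussian test spectra, and the rescaling of the spectral axis to [-1, 1] (chosen here only to keep the polynomial columns well conditioned) are all assumptions made for illustration. Because of the rescaling, the returned polynomial coefficients are expressed on the rescaled axis rather than on nδλ, although the fitted baseline itself spans the same set of Nth-order polynomials.

```python
import numpy as np


def emsc_correct(raw, reference, contaminant, order=1):
    """Hypothetical helper: OLS fit of raw ~ c_r*reference + c_b*contaminant
    + polynomial, followed by subtraction of the contaminant and polynomial."""
    raw = np.asarray(raw, dtype=float)
    reference = np.asarray(reference, dtype=float)
    contaminant = np.asarray(contaminant, dtype=float)

    # Assumption: the spectral axis is rescaled to [-1, 1] for conditioning.
    x = np.linspace(-1.0, 1.0, raw.size)
    poly = np.vander(x, order + 1, increasing=True)  # columns 1, x, ..., x^N

    # Design matrix: one column per fitted component.
    design = np.column_stack([reference, contaminant, poly])
    coeffs, *_ = np.linalg.lstsq(design, raw, rcond=None)
    c_r, c_b, c_poly = coeffs[0], coeffs[1], coeffs[2:]

    # Subtract only the contaminant and baseline terms from the raw spectrum.
    background = c_b * contaminant + poly @ c_poly
    return raw - background, c_r, c_b, c_poly


# Synthetic, noise-free example (all shapes and weights are made up).
x = np.linspace(-1.0, 1.0, 200)
reference = np.exp(-((x - 0.2) / 0.05) ** 2) + 0.5 * np.exp(-((x + 0.4) / 0.08) ** 2)
contaminant = np.exp(-((x + 0.1) / 0.5) ** 2)  # broad, glass-like band
raw = 2.0 * reference + 3.0 * contaminant + (0.5 + 0.3 * x)

corrected, c_r, c_b, c_poly = emsc_correct(raw, reference, contaminant, order=1)
```

In this noise-free case the corrected spectrum should coincide with the weighted reference term; with real data the residual between the fit and the raw spectrum remains in the corrected spectrum.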
4.4.4 Other background correction methods
Various other background correction algorithms and techniques exist that are not applied within this thesis.
The modified polynomial method can sometimes lead to unstable behavior at the end points of the baseline of a spectrum, which led to the development of the “rubberband” method. This method fixes the endpoints of the dataset in order to prevent such instabilities from occurring. (74)
The presence of contaminant signals is highly undesirable in Raman spectra. These signals can arise from unwanted chemical artifacts or from the sample substrate (see Chapter 7). They can be removed using the methods highlighted in Sections 4.4.2 and 4.4.3, or using an independent component analysis (ICA) approach. Tfayli et al. applied ICA along with non-negative least squares to remove residual wax contributions from Raman spectra of biological samples that had previously been embedded in paraffin. (86) A similar method was applied for the removal of pharmaceutical drug components (87), and the signal from polystyrene nanoparticles (88), present in Raman cell spectra.
Alternatively, background signals can be reduced at the acquisition stage by using a different Raman system design. Examples include systems based on the ultrafast optical Kerr effect (89; 90) or on a modulated laser source (9; 91). However, both approaches are expensive and experimentally complicated.
4.5 Normalisation
Normalisation in the intensity axis is performed to adjust peak intensity values from each spectrum in order to provide a common scale for comparing Raman peaks across a range of spectra.
Normalisation is achieved by dividing each variable in a spectrum by some constant. There are a few normalisation methods that can be utilised for Raman spectra; peak normalisation and vector normalisation are most commonly applied, but area normalisation or min-max normalisation could also be applied. For peak normalisation, the constant is measured as the height difference between the baseline and the maximum point of a chosen peak; vector normalisation obtains the constant by calculating the sum of the squared intensity values for each variable in the spectrum, and finding the square root of this value; for area normalisation, the constant corresponds to the sum of the intensity values for each variable in the spectrum; and min-max normalisation involves rescaling the spectrum such that the maximum intensity value is set to 1 and the lowest intensity value to 0. (92; 93)
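The four normalisation schemes can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the code used in this thesis: the function names and the toy spectrum are hypothetical, vector normalisation is taken here to mean division by the Euclidean (L2) norm, and peak normalisation assumes a previously baseline-corrected spectrum, so the baseline level defaults to zero.

```python
import numpy as np


def peak_normalise(spectrum, peak_index, baseline=0.0):
    """Divide by the height of a chosen peak above the baseline level."""
    s = np.asarray(spectrum, dtype=float)
    return s / (s[peak_index] - baseline)


def vector_normalise(spectrum):
    """Divide by the square root of the sum of squared intensities (L2 norm)."""
    s = np.asarray(spectrum, dtype=float)
    return s / np.sqrt(np.sum(s ** 2))


def area_normalise(spectrum):
    """Divide by the sum of the intensity values."""
    s = np.asarray(spectrum, dtype=float)
    return s / np.sum(s)


def min_max_normalise(spectrum):
    """Shift and scale so that the minimum maps to 0 and the maximum to 1."""
    s = np.asarray(spectrum, dtype=float)
    return (s - s.min()) / (s.max() - s.min())


# Toy spectrum with a single peak at index 2 (values are made up).
spectrum = np.array([1.0, 2.0, 4.0, 2.0, 1.0])

by_peak = peak_normalise(spectrum, peak_index=2)
by_vector = vector_normalise(spectrum)
by_area = area_normalise(spectrum)
by_min_max = min_max_normalise(spectrum)
```

Note that min-max normalisation, unlike the other three, involves an offset as well as a scaling, so it does not preserve the ratio between intensities.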