2.4.1. Background
FTIR is a method commonly used for the structural analysis of lignocellulosic biomass, particularly because of its simple sample preparation, fast analysis, non-destructive nature and the possibility of investigating more than one compound at a time from the same spectrum (Xu et al., 2013a).
The principle of FTIR is the detection of the radiation absorbed when IR radiation interacts with vibrating chemical bonds. Different bonds absorb in different regions (wavenumbers) of the IR spectrum (Xu et al., 2013a).
FTIR spectra can be used to identify and evaluate biomass structure, composition and modifications during different processing steps (Park et al., 2010). There have been several attempts to generate calibration curves to predict lignocellulose composition from FTIR spectra; however, the lack of standards makes quantification challenging (Xu et al., 2013a).
2.4.2. Limitations
Although FTIR provides fast analysis and comparison among samples, it usually generates only qualitative results unless calibrated with known standards (Park et al., 2010). In addition, spectra are challenging to interpret, particularly for biomass, because peaks from different compounds overlap (Barnette et al., 2012).
2.4.3. Principal Component Analysis (PCA) on FTIR data
Principal Component Analysis is a multivariate statistical technique that decomposes the original data into orthogonal components in order to investigate possible correlations in the data. PCA is a powerful tool for analysing large amounts of data because it reduces the number of variables to a few principal components (uncorrelated variables) (Xu et al., 2013a).
The principle of PCA is to ‘plot’ a data matrix of X variables and Y samples in a multidimensional space, called the variable space, and to find ‘hidden correlations’. This variable space has one axis per variable (z axes in total), i.e., x1 = variable 1, x2 = variable 2 and so on. Although only spaces with z ≤ 3 can be visualised, the number of variables in multivariate data analysis is usually much higher (Esbensen, 2002).
After ‘hidden correlations’ are found in the variable space, a central axis can be created in the direction of maximum variance. This axis is obtained by the principle of least squares, i.e., by minimising the squared distances between each data point (sample) and the axis. This axis becomes a new variable, called PC1, which partially describes the data (Esbensen, 2002). An illustration of the data plot and the determination of PC1 is shown in Figure 2-1.
Figure 2-1 - The creation of PC1 in data plotted in space. Source: (Esbensen, 2002).
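As a minimal illustration of the least-squares idea above, the Python sketch below (assuming NumPy, with hypothetical two-variable toy data) computes PC1 from the singular value decomposition (SVD) of the mean-centred data and checks that no other direction on a grid of angles gives a smaller sum of squared perpendicular distances. The function name `perpendicular_ss` is illustrative only.

```python
import numpy as np

def perpendicular_ss(Xc, direction):
    """Sum of squared perpendicular distances from mean-centred points to
    the line through the origin along `direction`."""
    d = direction / np.linalg.norm(direction)   # unit vector along the axis
    along = Xc @ d                              # coordinate of each point on the axis
    # Pythagoras: |x|^2 = along^2 + perpendicular^2 for every point
    return np.sum(Xc**2) - np.sum(along**2)

# Hypothetical correlated data: 200 samples, 2 variables (x1, x2)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1, 0.6 * x1 + rng.normal(scale=0.3, size=200)])
Xc = X - X.mean(axis=0)                         # centre each variable

# PC1 = first right singular vector of the centred data
pc1 = np.linalg.svd(Xc, full_matrices=False)[2][0]

# Compare against 181 candidate directions between 0 and 180 degrees
angles = np.linspace(0.0, np.pi, 181)
candidates = np.column_stack([np.cos(angles), np.sin(angles)])
grid_best = min(perpendicular_ss(Xc, c) for c in candidates)
pc1_ss = perpendicular_ss(Xc, pc1)
```

Because the total sum of squares of the centred data is fixed, minimising the perpendicular distances is the same as maximising the variance along the axis, which is why both descriptions of PC1 coincide.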
PC2 is determined using the same technique: a new axis, orthogonal to PC1, is found that minimises the distance from each data point to the axis. PC3 is then determined in the same way, and so on (Esbensen, 2002). The number of possible PCs is limited by the smaller of the number of variables and the number of samples; however, usually only the first few PCs (PC1-PC3) are needed for data interpretation, as they describe most of the data variance (Hori and Sugiyama, 2003, Sim et al., 2012).
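The sequence PC1, PC2, PC3, ... can be sketched with an SVD of the mean-centred data matrix, a standard way of computing PCA (not necessarily the algorithm used by the software in this work). In this NumPy sketch, rows are samples (e.g. spectra) and columns are variables (e.g. wavenumbers); the function name `pca` and the random stand-in data are hypothetical.

```python
import numpy as np

def pca(X, n_components=3):
    """Minimal PCA via SVD.
    X: (n_samples, n_variables) array, e.g. one spectrum per row.
    Returns scores (n_samples, k), loadings (n_variables, k) and the
    fraction of total variance explained by each retained PC."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # SVD yields at most min(n_samples, n_variables) components
    k = min(n_components, len(s))
    explained = s**2 / np.sum(s**2)           # variance fraction, PC1 first
    scores = U[:, :k] * s[:k]                 # sample coordinates on each PC
    loadings = Vt[:k].T                       # orthonormal PC directions
    return scores, loadings, explained[:k]

# Usage with random stand-in data: 10 "spectra" x 5 variables
rng = np.random.default_rng(0)
data = rng.normal(size=(10, 5))
scores, loadings, explained = pca(data, n_components=3)
```

Each successive column of `loadings` is orthogonal to the previous ones, and `explained` is sorted in decreasing order, mirroring the PC1, PC2, PC3 construction described above.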
Plotting the projections (scores) of all data points onto the plane defined by any two PCs gives a scores plot (Figure 2-2).
Figure 2-2 - Scores plot for PC1 and PC2. Source: (Esbensen, 2002).
The scores plot is widely used to find relations among samples, identify sample groupings (clusters), recognise possible outliers, etc. (Esbensen, 2002), and it was the main tool used in this work.
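A scores plot is simply the coordinates of each sample on two PCs. The sketch below assumes NumPy and hypothetical toy "spectra" made of noise, where half of the samples carry an extra band; projecting the centred data onto PC1 and PC2 gives the x and y coordinates of the points in a PC1-PC2 scores plot. The helper name `scores_for` is illustrative, and the plotting step itself is omitted.

```python
import numpy as np

def scores_for(samples, loadings, mean):
    """Project samples onto the PCs: scores = (samples - mean) @ loadings."""
    return (np.asarray(samples, dtype=float) - mean) @ loadings

# Hypothetical toy data: 10 "spectra" x 50 variables; samples 5-9 carry an
# extra band centred on variable 20, samples 0-4 do not.
rng = np.random.default_rng(1)
x = np.arange(50)
band = np.exp(-0.5 * ((x - 20) / 3.0) ** 2)
X = rng.normal(0.0, 0.02, size=(10, 50))
X[5:] += band

mean = X.mean(axis=0)
loadings = np.linalg.svd(X - mean, full_matrices=False)[2][:2].T  # (50, 2): PC1, PC2
scores = scores_for(X, loadings, mean)   # column 0 = PC1 axis, column 1 = PC2 axis
```

In this toy case the two groups fall on opposite sides of PC1 = 0, which is the kind of clustering a scores plot makes visible. The same `scores_for` call can place new samples on an existing model, provided the same mean and loadings are reused.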
The loading plot is related to the scores plot and provides further information for sample analysis (Plácido and Capareda, 2014): it shows how much each original variable (here, each wavenumber) contributes to a given PC. Figure 2-3 shows an example of the loading plot for PC1 and PC2 from FTIR data.
Figure 2-3 - Loading plot for PC1 and PC2 from FTIR data. Source: modified from (Hori and Sugiyama, 2003)
In the loading plot, the highest peaks (positive and negative) correspond to the variables that contribute most to the variability determining the position of the samples and, therefore, the clusters observed in the scores plot (Kline et al., 2010). In this work, the analysis of loading plots focused on PC1, as this is the PC that explains the highest data variability.
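Reading the highest loading peaks can be sketched as ranking wavenumbers by the absolute value of their PC1 loading. The example below assumes NumPy, a hypothetical wavenumber axis from 4000 to 600 cm-1 and toy spectra in which half of the samples carry an extra band at an assumed position of 1730 cm-1; `top_loadings` is an illustrative name, not a library function.

```python
import numpy as np

def top_loadings(loading_vector, wavenumbers, n=5):
    """Return the n wavenumbers with the largest absolute loading on one PC,
    together with the signed loading values."""
    order = np.argsort(np.abs(loading_vector))[::-1][:n]   # largest |loading| first
    return wavenumbers[order], loading_vector[order]

# Hypothetical wavenumber axis: 4000 to 600 cm-1 in 2 cm-1 steps (1701 points)
wavenumbers = np.linspace(4000.0, 600.0, 1701)
rng = np.random.default_rng(2)
# Assumed toy data: 8 noisy spectra, the last 4 with an extra band at 1730 cm-1
band = np.exp(-0.5 * ((wavenumbers - 1730.0) / 10.0) ** 2)
X = rng.normal(0.0, 0.01, size=(8, wavenumbers.size))
X[4:] += band

Xc = X - X.mean(axis=0)
pc1_loadings = np.linalg.svd(Xc, full_matrices=False)[2][0]
peaks, values = top_loadings(pc1_loadings, wavenumbers, n=3)
```

The sign of a PC is arbitrary, so positive and negative loadings should be interpreted together with the sign of the scores on the same PC.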
2.4.3.1. PCA for FTIR analysis and FTIR data manipulation
FTIR spectra can easily contain thousands of variables, and visual comparison among samples is unlikely to lead to definitive conclusions (Sim et al., 2012). Therefore, multivariate analysis is increasingly used, especially in biomass analysis, to obtain more conclusive results (Xu et al., 2013a).
Prior to PCA, one or more data treatments, such as smoothing, normalisation, second-derivative transformation and baseline correction, are commonly applied to decrease noise and increase spectral resolution (Hori and Sugiyama, 2003, Michell, 1990, Xu et al., 2013a).
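The text does not state which smoothing or derivative algorithm is used; a common choice for spectra is the Savitzky-Golay filter, available in SciPy as `scipy.signal.savgol_filter`. The sketch below wraps it for smoothing (`deriv=0`) and second-derivative spectra (`deriv=2`), and adds a deliberately simple straight-line baseline correction. The window length, polynomial order, baseline model and function names are assumptions for illustration, not the settings used in this work.

```python
import numpy as np
from scipy.signal import savgol_filter

def sg_filter(spectra, window=11, polyorder=3, deriv=0, spacing=1.0):
    """Savitzky-Golay filter along the last axis (one spectrum per row).
    deriv=0 smooths; deriv=2 returns the second-derivative spectrum.
    `spacing` is the distance between data points (e.g. in cm-1)."""
    return savgol_filter(np.asarray(spectra, dtype=float), window_length=window,
                         polyorder=polyorder, deriv=deriv, delta=spacing, axis=-1)

def linear_baseline(spectrum):
    """Subtract a straight line joining the first and last points
    (a deliberately simple baseline model)."""
    y = np.asarray(spectrum, dtype=float)
    return y - np.linspace(y[0], y[-1], y.size)

# Usage on a noisy hypothetical spectrum of 200 points with a sloping baseline
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 200)
raw = np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2) + 0.2 * x + rng.normal(0, 0.01, 200)
corrected = linear_baseline(raw)
smoothed = sg_filter(corrected)
second_derivative = sg_filter(corrected, deriv=2)
```

A second-derivative spectrum turns shoulders of overlapping bands into separate minima, which is why it is often described as increasing apparent resolution; it also amplifies noise, so it is usually combined with smoothing through the filter window.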
Normalisation of FTIR data is commonly applied before PCA. It can be performed using a variety of reference peaks (height or area), such as those at 1162, 1800 and 2900 cm-1 (Chen et al., 2015, Monrroy et al., 2015, Ryden et al., 2014). In this work, each spectrum was normalised to its own highest peak (whose position varies from spectrum to spectrum). Hence, the values used for PCA at each wavenumber are absorbances relative to that peak within the same spectrum rather than absolute absorbance values, which makes comparison among spectra easier.
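The highest-peak normalisation described above amounts to dividing every spectrum by its own maximum absorbance. A minimal NumPy sketch follows, assuming spectra are stored as rows of an array and are already baseline-corrected so that the maximum is a genuine peak; for comparison, it also shows the fixed-band alternative cited above (peak height at a chosen wavenumber). Both function names are hypothetical.

```python
import numpy as np

def normalise_to_max(spectra):
    """Divide each spectrum (row) by its own highest absorbance, so the
    tallest peak becomes 1 and all other values are relative to it."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra / spectra.max(axis=1, keepdims=True)

def normalise_to_band(spectra, wavenumbers, target):
    """Alternative: divide each spectrum by its absorbance at the data point
    nearest `target` (in cm-1), i.e. peak-height normalisation at a fixed band."""
    spectra = np.asarray(spectra, dtype=float)
    i = np.argmin(np.abs(np.asarray(wavenumbers, dtype=float) - target))
    return spectra / spectra[:, [i]]

# Usage with two hypothetical 3-point "spectra"
wn = np.array([2900.0, 1800.0, 1162.0])
s = np.array([[1.0, 2.0, 4.0],
              [3.0, 6.0, 1.5]])
by_max = normalise_to_max(s)              # [[0.25, 0.5, 1.0], [0.5, 1.0, 0.25]]
by_band = normalise_to_band(s, wn, 1162)  # [[0.25, 0.5, 1.0], [2.0, 4.0, 1.0]]
```

With highest-peak normalisation the reference position can differ between spectra, whereas the fixed-band approach always divides by the same band, so the two can give different relative values for the same data.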
Finally, although PCA can be used for a variety of purposes, including prediction and calibration curves (Monrroy et al., 2015), the main focus of this work was to understand the FTIR data and to answer simple questions, such as whether the spectra show significant differences. Therefore, the scores and loading plots were the main tools used in the PCA study, and the analysis focused on PC1 and PC2.
2.4.4. Material and method
FTIR was performed on a Jasco FTIR 6300 spectrometer. Samples (a few milligrams) were analysed without prior preparation. Spectra were recorded at a resolution of 4 cm-1 with 32 scans over the range 4000-600 cm-1. A background scan, without sample, was performed before each sample using the same parameters.