Challenges selecting the model for predictions

In document Screening genetic variation for photosynthetic capacity and efficiency in wheat (Page 107-112)


4.5.2 Challenges selecting the model for predictions

PLSR can generate multiple models for a given training set, and one challenge that remains is how to choose the best of them. Here, the model was selected using the lowest Root Mean Square Error of Prediction (RMSEP) from leave-one-out cross-validation (LOO-CV). LOO-CV generates artificial test data from the training data to try to predict the behaviour of the test data, and RMSEP is then used to choose the model that best predicts these artificial data. In this project, the artificial data indicated that the model with 5 components was the best for predicting Vcmax25. However, when the real test data were used, 3 components fitted the test data better (Figure 4.18). Two parts of the prediction could be improved: the prediction of the artificial data from the LOO-CV, and the criterion used to choose the model or number of components (in this case RMSEP). At present LOO-CV remains the best option, since it iterates over the training set to create artificial test data. In fact, both LOO-CV and RMSEP are widely used in chemometrics, where they give very accurate predictions (Kalivas, 1997; Swierenga et al., 1999; Mevik and Wehrens, 2007; Martin et al., 2013). The quality of those predictions is probably due to measuring optically homogeneous solutions such as oil, petrol or sugars, which is much easier than measuring a leaf. Light striking a leaf can be scattered, absorbed and reflected from different depths within the leaf, depending on the leaf thickness (Markwell et al., 1995; Jones and Vaughan, 2010), causing noise in the spectra. With leaves, a further challenge results from the leaf surface, which contains water, waxes, veins and trichomes.
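As a rough illustration of this selection procedure, the sketch below runs LOO-CV over a range of component numbers and compares the cross-validated RMSEP with the RMSEP on a held-out test set. It is not the analysis code used in this project: the PLS1 routine is a generic NIPALS-style implementation written with NumPy, the data are synthetic, and all function and variable names are assumptions made for the example.

```python
import numpy as np


def pls1_coefs(X, y, max_comp):
    """Fit PLS1 (single response) and return the centring terms plus one
    regression-coefficient vector for each model size 1..max_comp."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # centred; deflated in the loop
    W, P, q, coefs = [], [], [], []
    for _ in range(max_comp):
        w = Xr.T @ yr                          # weights: covariance with response
        w = w / np.linalg.norm(w)
        t = Xr @ w                             # scores
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        c = yr @ t / tt                        # y loading
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - c * t                        # deflate y
        W.append(w); P.append(p); q.append(c)
        Wm, Pm = np.column_stack(W), np.column_stack(P)
        coefs.append(Wm @ np.linalg.solve(Pm.T @ Wm, np.array(q)))
    return x_mean, y_mean, coefs


def rmsep_per_component(X_fit, y_fit, X_new, y_new, max_comp):
    """RMSEP on new data for models with 1..max_comp components."""
    x_mean, y_mean, coefs = pls1_coefs(X_fit, y_fit, max_comp)
    pred = np.column_stack([y_mean + (X_new - x_mean) @ b for b in coefs])
    return np.sqrt(((y_new[:, None] - pred) ** 2).mean(axis=0))


def loo_cv_rmsep(X, y, max_comp):
    """Leave each observation out in turn, refit, and predict it."""
    n = len(y)
    sq_err = np.zeros((n, max_comp))
    for i in range(n):
        keep = np.arange(n) != i
        x_mean, y_mean, coefs = pls1_coefs(X[keep], y[keep], max_comp)
        for a, b in enumerate(coefs):
            sq_err[i, a] = (y[i] - (y_mean + (X[i] - x_mean) @ b)) ** 2
    return np.sqrt(sq_err.mean(axis=0))


# Synthetic "spectra" driven by three latent factors (illustrative only).
rng = np.random.default_rng(0)
n_train, n_test, n_bands = 40, 20, 60
latent = rng.normal(size=(n_train + n_test, 3))
loadings = rng.normal(size=(3, n_bands))
spectra = latent @ loadings + 0.3 * rng.normal(size=(n_train + n_test, n_bands))
trait = latent @ np.array([1.0, -0.5, 0.25]) + 0.2 * rng.normal(size=n_train + n_test)
X_train, y_train = spectra[:n_train], trait[:n_train]
X_test, y_test = spectra[n_train:], trait[n_train:]

max_comp = 8
cv = loo_cv_rmsep(X_train, y_train, max_comp)
held_out = rmsep_per_component(X_train, y_train, X_test, y_test, max_comp)
print("components chosen by LOO-CV:", int(np.argmin(cv)) + 1)
print("components best on test set:", int(np.argmin(held_out)) + 1)
```

The two printed values correspond to the two choices discussed above; on real leaf spectra they need not agree.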

In the literature, alternatives to RMSEP have been reported for choosing the best model to use in PLSR predictions. The predicted residual sum of squares (PRESS) is an alternative criterion that has been used to select a model or number of components generated from the PLSR LOO-CV from reflectance measured in plants (Feret et al., 2011; Serbin et al., 2012; Ainsworth et al., 2014; Serbin et al., 2014). PRESS uses the training data to predict new values; it sets each observation aside in turn, re-estimates the model each time, and accumulates the error generated for the data point that was removed. PRESS is automatic, unlike RMSEP, which requires the user to look for the lowest value. Its orthogonal design makes it computationally affordable and efficient (Chen et al., 2004). Thus, future models can be selected with both RMSEP and PRESS minimized, and both criteria will be used in the next chapter to calibrate the traits.
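For reference, the two criteria can be written as follows when both are computed from the same LOO-CV residuals. The notation is introduced here for illustration only: y_i is the observed trait value for observation i, \hat{y}_{(i),a} is its prediction from a model with a components fitted without observation i, and n is the number of training observations. Some software additionally reports bias-adjusted variants, which are not shown.

```latex
\[
\mathrm{PRESS}(a) = \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i),a} \right)^{2},
\qquad
\mathrm{RMSEP}(a) = \sqrt{\frac{\mathrm{PRESS}(a)}{n}}
\]
```

Under these definitions RMSEP is a monotonic function of PRESS, so the practical difference between the two lies in how the minimum is identified rather than in the residuals themselves.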


Reflectance spectra can be used to predict photosynthetic traits in wheat. The protocol developed in this chapter proposes using reflectance measurements from 400 to 2400 nm, since changes in reflectance occur in both the visible and infrared regions of the electromagnetic spectrum, and better results were obtained than with a smaller number of discrete wavelengths.

A measurement can be made in 3 s with a leaf clip attached to the spectroradiometer. For narrow leaves it is recommended to use a mask that creates a small aperture, thereby reducing the standard error across measurements. A foam gasket attached to the mask prevented damage to the leaves when they were placed in the leaf clip and kept stray light from interfering. A pre-treatment of the spectra also eliminated outliers due to low signal intensity. During the analysis of the reflectance spectra, it is recommended that both RMSEP and PRESS be used to select the number of components in the PLSR LOO-CV.



Validation of reflectance spectra for predicting the main photosynthetic characters in wheat

Centro Experimental Norman E. Borlaug, Cd. Obregón, Mexico. 2013.

Chapter 5 Validation of reflectance spectra for predicting the main photosynthetic characters in wheat



This study investigates whether having a larger number of observations helps to determine the best models to predict Vcmax, Vcmax25, J, Narea, SPAD, LMA and Vcmax25/Narea from hyperspectral reflectance and partial least squares regression (PLSR). The Aus1, Aus2, Aus3 and Mex experiments were used to calibrate the model for each trait in the PLSR, and the Root Mean Square Error of Prediction (RMSEP) and the Predicted Residual Sum of Squares (PRESS) were used to select the model for each trait. SPAD values were calibrated against direct measurements of chlorophyll content. Reflectance spectra were used to predict both SPAD and chlorophyll content. Using all experiments to calibrate the PLSR with RMSEP and PRESS improved the choice of a model to accurately predict each trait. In the validation, when observed data were compared against predictions from reflectance spectra, coefficients of determination (R² values) of 0.62 (Vcmax25), 0.71 (J), 0.82 (SPAD), 0.77 (chlorophyll content), 0.89 (LMA) and 0.93 (Narea) were obtained. Regions of the spectra were identified that may help to explain mechanistically how the model predicts each trait.


Screening large populations in the field for phenotypic variation is challenging when measuring photosynthetic parameters and traits requiring destructive harvesting. One conclusion from Chapter 3 is that accurate detection of genetic variation for photosynthetic parameters requires multiple measurements: on different leaves of the same plant, on different plants, and throughout the plant life cycle. In order to detect and understand genetic variation in crops, it is important to develop reliable and fast phenotyping tools that act as a bridge between genomics, plant function and agricultural traits (Furbank and Tester, 2011).

Leaf chemical properties such as nitrogen, chlorophyll a and b, carotenoids and leaf mass per unit area (LMA) from trees and crops have been successfully predicted from hyperspectral reflectance and partial least squares regression (PLSR) (Asner and Martin, 2008; Townsend et al., 2008; Asner et al., 2009; Asner et al., 2011a; Asner et al., 2011b; Doughty et al., 2011; Ecarnot et al., 2013). The same method has been used to predict photosynthetic parameters such as Vcmax and J in tropical trees, aspen, cotton and soybean (Doughty et al., 2011; Serbin et al., 2012; Ainsworth et al., 2014).

PLSR is a more robust analysis than classical multiple regression and principal component regression (Geladi and Kowalski, 1986). PLSR is a method that relates a response variable to multiple measured values, in this case reflectance values at many wavelengths. To deal with this dimensionality problem it uses a latent decomposition (components) of the response matrix and the predictor matrix. In this process, matrices containing linear combinations (scores), loadings and random errors are created. Scores and loadings are used to create the regression coefficients, which represent the model that is used to predict the traits (Mevik and Wehrens, 2007). PLSR consists of two steps: calibration (training) and prediction (test). This chapter shows both steps for each trait measured.
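In a commonly used notation (introduced here for illustration, not taken from this thesis), the decomposition for a single trait can be summarised as follows, where X is the centred matrix of reflectance values (observations by wavelengths), y the centred trait vector, T the scores, P and q the loadings, W the weights, and E and f the residuals:

```latex
\[
X = T P^{\top} + E, \qquad y = T q + f
\]
\[
\hat{b} = W \left( P^{\top} W \right)^{-1} q, \qquad \hat{y} = X \hat{b}
\]
```

With a components, T, P and W each have a columns, so every choice of a yields a different coefficient vector and hence a different candidate model.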

It is possible to calculate many PLSR models because there are a number of possible solutions. However, some models only describe noise; for this reason the "leave-one-out" cross-validation approach (LOO-CV) is used with the Root Mean Square Error of Prediction (RMSEP) and the Predicted Residual Sum of Squares (PRESS). RMSEP and PRESS identify the smallest error or residual across the number of components, a criterion that evaluates the predictive power of the model and permits the identification of the best model to predict the test data (Geladi and Kowalski, 1986).

There have been several attempts to detect key wavelengths involved in predicting the traits. In Chapter 4, sixteen wavelengths proposed in the literature to predict Vcmax in aspen (Serbin et al., 2012) were used unsuccessfully to predict Vcmax in wheat. Therefore, the models obtained in this study would need to be analysed further to assess their utility in other species.

The objective of this study was to calibrate and test predictions for Vcmax, Vcmax25, J, Narea, SPAD, chlorophyll content, LMA and Vcmax25/Narea obtained from hyperspectral reflectance and PLSR analysis of measurements collected from wheat in four experiments (Aus1, Aus2, Aus3 and Mex).


5.3.1 Plant Material and experiment conditions

The germplasm and experiments used in this chapter have been described in detail in Chapter 3 (Tables 3.1, 3.2, 3.3). The first glasshouse experiment, Aus1, was designed to achieve a range in leaf colour, with a drastic reduction of nitrogen levels in one treatment and high fertilizer in the other. The second glasshouse experiment, Aus2, was also designed to vary nitrogen, but over a shorter treatment duration, which resulted in smaller differences in leaf nitrogen content per unit leaf area and in photosynthetic parameters. The field experiments Aus3 and Mex were designed to test whether reflectance can differentiate between wheat genotypes grown under moderate fertilization, and to evaluate whether the ASD FieldSpec®3 can be used in the field to screen quickly for traits that could be important in breeding for improved photosynthetic performance.

In order to calibrate the SPAD chlorophyll meter against chlorophyll content, another experiment was performed, called Sun and Shade (S&S). Experiment S&S was carried out in a glasshouse at CSIRO Black Mountain, Canberra (-35.271875, 149.113982), where temperature was controlled at 25/15 °C (day/night). Six seeds of each wheat genotype (Table 5.1) were sown in 5 L pots with a 50:50 loam:vermiculite soil mix containing basal fertilizer on December 22nd, 2011. Emergence occurred on December 26th, and during the following week the plants were thinned to keep 3 plants per pot. The experiment was organized in a randomized block design with 4 blocks of 10 wheat genotypes, one genotype per pot. At 10 days after emergence (DAE), when the plants had 2.5 leaves, two blocks were kept under the usual glasshouse sun conditions (900 to 1250 μmol quanta m⁻² s⁻¹) and two blocks were shaded (230 to 300 μmol quanta m⁻² s⁻¹).
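The layout described above can be sketched as a small randomization script. This is an illustration only: the thesis does not state how pots were ordered within blocks or how the shaded blocks were chosen, so the random shuffling, the random choice of shaded blocks, the seed and all names below are assumptions. The genotype acronyms are those listed in Table 5.1.

```python
import random

# Acronyms of the ten genotypes listed in Table 5.1.
GENOTYPES = ["V6", "V58", "V61", "V62", "V77", "V78", "V79", "V80", "V81", "V82"]


def randomized_block_layout(genotypes, n_blocks=4, n_shaded=2, seed=0):
    """Return one record per pot: every genotype appears once per block,
    in an independently shuffled order, and whole blocks are assigned
    to the sun or shade treatment (assumed here to be chosen at random)."""
    rng = random.Random(seed)
    blocks = list(range(1, n_blocks + 1))
    shaded = set(rng.sample(blocks, n_shaded))
    layout = []
    for block in blocks:
        order = list(genotypes)
        rng.shuffle(order)                     # randomize pot order within block
        for position, genotype in enumerate(order, start=1):
            layout.append({
                "block": block,
                "position": position,
                "genotype": genotype,
                "light": "shade" if block in shaded else "sun",
            })
    return layout


layout = randomized_block_layout(GENOTYPES)
for pot in layout[:5]:
    print(pot)
```

Each of the 40 records represents one pot, which in the experiment held three plants after thinning.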

Table 5.1 List of wheat genotypes used in the S&S experiment. Some genotypes are shared in experiments described in Chapter 3.

Names | Acronym | Characteristics
Sunstar | V6 (Table 3.2) | Condon et al., 1990
Hartog (Pavon 76) | V58 (Table 3.3) | CIMMYT Historic (Condon et al., 1990)
Seri M82 | V61 (Table 3.3) | CIMMYT Historic
Siete Cerros 66 | V62 (Table 3.3) | CIMMYT Historic
Bd 912 | V77 | Bodallin + Semi-dwarf + Tin gene; Very high LMA
Bodallin | V78 | Tall, low LMA
Chinese Spring | V79 | Tall, low LMA
Red Egyptian | V80 | Semi-dwarf, low LMA
Songlen | V81 | Parent of DH crossed with Sundor
Sundor | V82 | Condon et al., 1990
