5.4.6. Comparison between data fusion methods

By coupling the outputs of both sensors, SMLR predicted several soil properties with improved accuracy, although not in all fields. The main drawback of SMLR is that it cannot be used when the number of predictor variables exceeds the number of observations. Selecting predictor variables is also difficult when hundreds of highly collinear wavelengths relate to a soil property with similar correlations. Compared with PLSR, SMLR deals poorly with multi-collinearity among the predictor variables, and this can lead to over- or under-estimation of the accuracy of predictions. A further issue was that SMLR eliminated the ECa-H and ECa-V values from the regression models during analysis.

This can be explained in two ways: either the ECa measurements are highly correlated with the spectral data, or they make only a subtle or non-significant contribution to the fusion model. SMLR retains only the few variables that have a significant effect in the model and discards all others, whether relevant or not. Adding more variables to a regression model may increase its calibration accuracy, but it can decrease the ability of the model to predict new samples.
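The elimination of a collinear predictor can be illustrated with a minimal sketch of forward stepwise selection. The selection rule (a partial F statistic with an entry threshold of 4.0), the function names and the synthetic data are all assumptions made for illustration; the entry and removal criteria used for SMLR in this study are not restated here.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_stepwise(X, y, f_enter=4.0):
    """Forward selection: repeatedly add the predictor with the largest
    partial F statistic, stopping when no candidate exceeds f_enter."""
    n, p = X.shape
    selected = []
    current = rss(np.empty((n, 0)), y)  # intercept-only model
    while len(selected) < min(p, n - 2):
        best = None
        for j in range(p):
            if j in selected:
                continue
            new = rss(X[:, selected + [j]], y)
            df = n - len(selected) - 2  # residual d.f. once j is added
            f = (current - new) / (max(new, 1e-12) / df)
            if f > f_enter and (best is None or f > best[1]):
                best = (j, f, new)
        if best is None:
            break
        selected.append(best[0])
        current = best[2]
    return selected

# Synthetic data (hypothetical): two collinear "wavelengths" and one "ECa"
# reading that all respond to the same latent soil factor z.
rng = np.random.default_rng(0)
n = 60
z = rng.normal(size=n)
band_a = z + 0.05 * rng.normal(size=n)
band_b = z + 0.05 * rng.normal(size=n)
eca = 0.8 * z + 0.5 * rng.normal(size=n)
y = 2.0 * z + 0.3 * rng.normal(size=n)

X = np.column_stack([band_a, band_b, eca])
names = ["band_a", "band_b", "ECa"]
selected = forward_stepwise(X, y)
print("retained:", [names[j] for j in selected])
```

Once a predictor that carries the shared information has entered the model, a second predictor carrying largely the same information reduces the residual sum of squares very little, so its partial F statistic tends to stay below the entry threshold and it is left out.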

PLSR predicted many soil properties with the highest accuracy, and the results were greatly improved over those of the individual sensors. Adding new predictor variables to a model may increase the multi-collinearity among them, yet it gave a modest improvement in the accuracy of the PLSR analysis. PLSR can deal with multi-collinearity among the predictor variables and is not adversely affected by the addition of new predictor variables, because it constructs a strong predictor variable from several weak ones. Finally, PLSR detects the main latent structure in the predictor variables that maximises the explained variance in the corresponding response variable.
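How PLSR builds a few strong latent predictors from many weak, collinear ones can be sketched with the NIPALS algorithm for a single response (PLS1). This is a minimal illustration, not the software used in this study: the function name pls1_fit, the two-component model and the synthetic "spectra" and "ECa" data are assumptions. The example fuses 200 collinear bands and one ECa column into a single predictor block with more variables than calibration samples.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 (one response) by NIPALS; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weights: covariance of each predictor with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                    # latent score (one "strong" predictor)
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X and y before the next component
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # coefficients in the original predictor space
    return coef, x_mean, y_mean

# Synthetic data (hypothetical): 200 collinear bands driven by two latent
# factors, plus one ECa-like column that is only weakly informative.
rng = np.random.default_rng(0)
n, n_bands = 30, 200
T = rng.normal(size=(n, 2))
L = rng.normal(size=(2, n_bands))
spectra = T @ L + 0.05 * rng.normal(size=(n, n_bands))
eca = 0.5 * T[:, 0] + 0.3 * rng.normal(size=n)
y = 3.0 * T[:, 0] - 2.0 * T[:, 1] + 0.1 * rng.normal(size=n)

X = np.column_stack([spectra, eca])          # fused block: 201 predictors
cal, val = slice(0, 20), slice(20, 30)       # 20 calibration, 10 validation samples
coef, x_mean, y_mean = pls1_fit(X[cal], y[cal], n_components=2)
y_val_pred = (X[val] - x_mean) @ coef + y_mean
rmse_val = float(np.sqrt(np.mean((y[val] - y_val_pred) ** 2)))
print("predictors:", X.shape[1], "calibration samples:", 20)
print("validation RMSE:", round(rmse_val, 3))
```

Because the regression is carried out on a small number of orthogonal scores rather than on the original variables, the fit remains well defined even though the predictors outnumber the calibration samples, a case in which SMLR cannot be applied to the full predictor set.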

PCA+SMLR yielded results comparable to those of PLSR in Field 1, but lower accuracies in the other fields for some soil properties. The slightly lower accuracies of PCA+SMLR can be attributed to the difference in the data analysis procedures of the two methods. PCA is an unsupervised analysis method because it is performed without consideration of the target variable. In contrast, PLSR is a supervised analysis method because it maximises the covariance between the latent variables extracted from the predictors and the target response variable. The elimination of ECa variables from the model by SMLR occurred here as well: in many cases, the regression models did not use the ECa measurements as predictor variables together with the PCA scores during fusion. This might be because the ECa is collinear with the PCA scores of the soil spectra, or because it added only subtle or non-significant information compared with the PCA scores. It also points to a limited ability of the SMLR technique to handle the outputs of different sensors.
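The difference between an unsupervised and a supervised projection can be made concrete with a small synthetic sketch. All data and names below are hypothetical: three "bands" respond strongly to a nuisance factor unrelated to the soil property, and three respond more weakly to the factor that drives it. The first principal component is computed from the predictors alone, whereas the first PLS weight vector is proportional to the covariance of each predictor with the response.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
nuisance = rng.normal(size=n)            # hypothetical factor unrelated to the soil property
signal = rng.normal(size=n)              # hypothetical factor driving the soil property
y = signal + 0.2 * rng.normal(size=n)

a = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])   # bands responding to the nuisance factor
b = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # bands responding to the signal factor
X = 2.0 * np.outer(nuisance, a) + np.outer(signal, b) + 0.1 * rng.normal(size=(n, 6))

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# PCA (unsupervised): the first right singular vector of the centred
# predictors gives the direction of largest variance; y is never used.
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pc1_scores = Xc @ vt[0]

# PLS (supervised): the first weight vector is proportional to X'y.
w = Xc.T @ yc
w = w / np.linalg.norm(w)
pls1_scores = Xc @ w

def abs_corr(u, v):
    """Absolute Pearson correlation between two vectors."""
    return abs(float(np.corrcoef(u, v)[0, 1]))

print("|corr(PC1 score, y)|  :", round(abs_corr(pc1_scores, y), 3))
print("|corr(PLS1 score, y)| :", round(abs_corr(pls1_scores, y), 3))
```

In this constructed case the largest source of variance in the predictors is the nuisance factor, so the first PCA score carries little information about the response, while the first PLS score is steered towards the bands that covary with it. Real spectra need not behave this way; the sketch only shows why the two procedures can lead to different accuracies.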

In summary, the proposed data fusion methods have improved the accuracy of predictions for clay, silt, sand, EC and pH significantly, although not in all fields. It should be noted that the performance of data fusion is largely affected by the type of sensors used for data fusion.

Better results from data fusion are expected where the individual sensors also show good correlation with soil properties. In this study, despite very low correlations between ECa measurements and soil properties, predictions of the measurable soil properties were significantly improved. This is consistent with the results of Schirrmann et al. (2011), who reported that the accuracy of a measurable soil property (e.g. pH) was improved using a sensor fusion approach combining a pH sensor, a vis-NIR spectrometer and an ECa sensor. In our study, besides the measurable soil properties, the accuracy for other soil properties that showed lower correlation with ECa, such as TOC, TN and CN, was also improved in some cases. This indicates that even ECa data that are poorly correlated with soil properties can enhance the accuracy of predictions during fusion. Although the results of the three data fusion methods were comparable, PLSR outperformed the SMLR and PCA+SMLR methods. This implies that PLSR is better able to deal with multi-collinearity among the predictor variables and to handle data from different sensors effectively. The SMLR method sometimes gave low errors (RMSE) for some soil properties in calibration, but those models failed to generalise and produced large errors when validated on a separate set of samples. In such cases, although SMLR with fewer wavelengths may seem to yield predictions comparable to those of PLSR, over-fitting is likely. Over-fitted models perform very well in calibration but have very little ability to predict soil properties in new samples.
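The gap between calibration and validation error that signals over-fitting can be sketched as follows. The example uses plain least squares rather than SMLR, and the data, sample sizes and helper names (rmse, ols_fit, ols_predict) are assumptions for illustration only: one predictor is truly related to the response and 24 are pure noise.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def ols_fit(X, y):
    """Ordinary least squares with intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def ols_predict(X, beta):
    return beta[0] + X @ beta[1:]

# Synthetic data (hypothetical): column 0 drives y, the other 24 columns are noise.
rng = np.random.default_rng(0)
n_cal, n_val, n_noise = 30, 30, 24
x_cal = rng.normal(size=(n_cal, 1 + n_noise))
x_val = rng.normal(size=(n_val, 1 + n_noise))
y_cal = 2.0 * x_cal[:, 0] + 0.5 * rng.normal(size=n_cal)
y_val = 2.0 * x_val[:, 0] + 0.5 * rng.normal(size=n_val)

results = {}
for label, cols in [("1 predictor", [0]), ("25 predictors", list(range(25)))]:
    beta = ols_fit(x_cal[:, cols], y_cal)
    results[label] = (
        rmse(y_cal, ols_predict(x_cal[:, cols], beta)),   # calibration RMSE
        rmse(y_val, ols_predict(x_val[:, cols], beta)),   # validation RMSE
    )
    print(label, "calibration RMSE:", round(results[label][0], 3),
          "validation RMSE:", round(results[label][1], 3))
```

The larger model can never have a higher calibration RMSE than the smaller one nested within it, so a low calibration error alone says little; it is the error on the separate validation set that shows whether the model generalises.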

From the above results, it is clear that sensor data fusion has advantages over soil property prediction based on a single sensor. Fusion offers many possible benefits, such as robust accuracy, extended attribute coverage and complementary information on certain soil properties. Despite these potential benefits, the approach may be hampered by the difficulty of handling large volumes of sensor data from multiple sensors or sources, the insufficient accuracy of positioning systems for operating multiple sensors in real time, and the complexity of the statistical methods required for data fusion.

5.5. Conclusions

Models based on data fusion of an EM38 and a vis-NIR spectrometer predicted clay, silt, sand, EC and pH better than those based on the output of the individual sensors. The accuracy of prediction of other soil properties that show low correlation with the output of one of the sensors, such as TOC and CN, was also improved in some cases. The highest accuracy was found for clay, silt and sand content. The performance of sensor data fusion is expected to be largely affected by the type of sensors used for fusion, and hence the selection of sensors is crucial.

The three statistical methods tested yielded comparable results and can be used for data fusion. However, PLSR outperformed SMLR and PCA+SMLR in predicting some soil properties. The reason is that PLSR is better able to deal with multi-collinearity among the predictor variables and can handle the data from both sensors. SMLR and PCA+SMLR yielded similar results. The best results were found in a clayey field and the worst in a sandy field. The clayey field showed improved accuracy of predictions for almost all soil properties with all fusion methods.

It is concluded that sensor data fusion can enhance the quality of soil sensing in precision agriculture. More efficient statistical data analysis methods are needed to handle large volumes of data from multiple sensors effectively.

5.6. Acknowledgements

We acknowledge the Higher Education Commission (HEC), Pakistan, for supporting this research by providing a PhD scholarship under the Overseas Scholarship Programme, Phase-II, Batch-I.

Chapter 6

Mapping clay content using geostatistical