Image classification approach

CHAPTER 2: RESEARCH DESIGN AND METHODS

2.2. Methodology

2.2.5. Data processing in laboratory

2.2.5.2. Remote sensing

2.2.5.2.3. Image classification approach

The satellite images were interpreted with Erdas Imagine. The classification approach began by running Principal Component Analysis (PCA) on the subset images and reducing the variables to the first meaningful components (Appendix E).

Principal Component Analysis (PCA) is a variable reduction procedure that is useful when a data set contains a large number of variables and some redundancy is believed to exist among them. Redundancy means that some of the variables are correlated with one another, possibly because they measure the same construct. Because of this redundancy, it should be possible to reduce the observed variables to a smaller number of principal components (artificial variables) that account for most of the variance in the observed variables (Byrne et al. 1980; Eklund and Singh 1993; Holden and LeDrew 1998; Lindsay 2002).

Technically, a principal component can be defined as a linear combination of optimally weighted observed variables. “Linear combination” refers to the fact that scores on a component are created by adding together scores on the observed variables being analyzed. “Optimally weighted” refers to the fact that the observed variables are weighted in such a way that the resulting components account for a maximal amount of variance in the data set (Byrne et al. 1980; Eklund and Singh 1993; Lindsay 2002).
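As a sketch in notation that is assumed here rather than taken from the thesis, let $X_1, \dots, X_p$ be the observed variables (the $p = 12$ band values of a pixel in this study) and $b_{1j}$ the weight of variable $j$ on the first component. The score on that component can then be written as:

```latex
C_1 = b_{11} X_1 + b_{12} X_2 + \dots + b_{1p} X_p = \sum_{j=1}^{p} b_{1j} X_j
```

The weights are chosen so that the variance of $C_1$ is as large as possible subject to $\sum_{j} b_{1j}^2 = 1$. Each later component is defined in the same way, with the added constraint that it is uncorrelated with the components before it.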

A type of equation called an eigenequation produces the weights (Lindsay 2002). These weights are optimal in the sense that, for a given set of data, no other set of weights could produce components that account for more of the variance in the observed variables. The weights are created so as to satisfy a principle of least squares similar (but not identical) to the principle of least squares used in multiple regression (Byrne et al. 1980; Eklund and Singh 1993; Holden and LeDrew 1998; Lindsay 2002).
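The way an eigenequation yields the weights can be illustrated with a minimal NumPy sketch. It is not the Erdas Imagine implementation: the function name `pca_eig` and the synthetic four-band "pixels" are assumptions made for illustration only. The eigenvectors of the band covariance matrix serve as the weights, and the eigenvalues give the variance carried by each component.

```python
import numpy as np

def pca_eig(pixels):
    """PCA by eigendecomposition of the covariance matrix.

    pixels: array of shape (n_pixels, n_bands).
    Returns eigenvalues (descending), eigenvectors (one per column),
    and the component scores of every pixel.
    """
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    scores = centered @ eigenvectors
    return eigenvalues, eigenvectors, scores

# Synthetic, strongly correlated "bands" (hypothetical data, not Landsat).
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 1))
pixels = np.hstack([base + 0.1 * rng.normal(size=(500, 1)) for _ in range(4)])

eigenvalues, eigenvectors, scores = pca_eig(pixels)
print(eigenvalues / eigenvalues.sum())  # share of variance per component
```

Because the four synthetic bands are nearly copies of one another, almost all of the variance falls on the first component, which is the redundancy that PCA is meant to remove.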

Erdas Imagine version 9.2 was used to perform PCA and 12 orthogonal (uncorrelated) eigenvectors or components were obtained (Appendix E).

The matrices and tables of eigenvalues show that the first eigenvector (total brightness) has the highest eigenvalue and accounts for a large share of the total variance. The second eigenvector (total greenness) has the second highest eigenvalue, the third (Vigor 1) the third highest, the fourth (Vigor 2) the fourth highest, and so on (Eklund and Singh 1993; Holden and LeDrew 1998). From component 1 to component 12, the eigenvalues decrease and the noise in the components becomes more apparent (Fig. 2.9).

Fig. 2.9. Land cover image using 12 bands (A), PCA 1 (B), and PCA 12 (C). As we move from PCA 1 to PCA 12, the noise becomes more apparent (PCA 12)

Source: Landsat images of November 23, 2006 and May 18, 2007

The choice of meaningful components is based on the combination of several criteria: the eigenvalue-one, the proportion of variance accounted for, and the interpretability criteria (Byrne et al. 1980; Eklund and Singh 1993; Lindsay 2002).

Under the eigenvalue-one criterion, any component with an eigenvalue greater than 1.00 is retained and interpreted. The rationale is that each observed variable contributes one unit of variance to the total variance in the data set. Any component with an eigenvalue greater than 1.00 therefore accounts for more variance than was contributed by one variable. Such a component accounts for a meaningful amount of variance and is worth retaining (Anonymous n.d.).
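The eigenvalue-one rule can be sketched in a few lines of Python. The function name `eigenvalue_one` and the eigenvalues below are hypothetical; they are not the values reported in Appendix E. The rule assumes standardized variables, so that the eigenvalues sum to the number of variables.

```python
def eigenvalue_one(eigenvalues, threshold=1.0):
    """Return the 1-based indices of components whose eigenvalue exceeds the threshold."""
    return [i + 1 for i, value in enumerate(eigenvalues) if value > threshold]

# Hypothetical eigenvalues for four standardized variables (they sum to 4).
example_eigenvalues = [2.6, 1.1, 0.2, 0.1]
retained = eigenvalue_one(example_eigenvalues)
print(retained)  # components kept under the eigenvalue-one rule
```

In this made-up case only the first two components carry more variance than a single variable contributes, so only those two would be retained.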

The proportion of variance accounted for criterion involves retaining a component if it accounts for a specified proportion (%) of the variance in the data set (Anonymous n.d.). Interpretability is perhaps the most important criterion. It consists of interpreting the substantive meaning of the retained components and verifying that this interpretation makes sense in terms of what is known about the constructs under investigation. Four questions guide this check. (1) Are there at least three variables (items) with significant loadings on each retained component? (2) Do the variables that load on a given component share the same conceptual meaning? (3) Do the variables that load on different components seem to be measuring different constructs? (4) Does the rotated factor pattern demonstrate “simple structure”? Simple structure means that the pattern possesses two characteristics: (a) most of the variables have relatively high factor loadings on only one component and near-zero loadings on the other components, and (b) most components have relatively high factor loadings for some variables and near-zero loadings for the remaining variables (Anonymous n.d.).
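The proportion of variance criterion described above can be illustrated with a short sketch. The helper names and the eigenvalues are hypothetical and are not taken from Appendix E; the 95% target is likewise an assumed threshold chosen only for the example.

```python
def proportion_of_variance(eigenvalues):
    """Share of the total variance carried by each component."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]

def components_for_target(eigenvalues, target=0.95):
    """Smallest number of leading components whose cumulative share reaches the target."""
    cumulative = 0.0
    for k, share in enumerate(proportion_of_variance(eigenvalues), start=1):
        cumulative += share
        if cumulative >= target:
            return k, cumulative
    return len(eigenvalues), cumulative

# Hypothetical eigenvalues, already sorted in descending order.
example_eigenvalues = [7.6, 1.2, 0.6, 0.3, 0.2, 0.1]
k, reached = components_for_target(example_eigenvalues, target=0.95)
print(k, round(reached, 4))
```

With these made-up values the first three components stop just short of the target, so a fourth is needed. In practice the cut-off is weighed against the other two criteria rather than applied mechanically.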

The first four “meaningful” components were selected for the image interpretation (Appendix E). They account for 96.70% of the total variance for the land cover map, 96.97% for the early dry season, and 98.89% for the middle dry season. For the late dry season, the first three components account for 98.61%.

The application of these criteria to the land cover map PCA is described below as an example. Based on the eigenvalue-one criterion, only components 1-10 could be chosen (Appendix E). However, components 5-10 show some noise when displayed. The second criterion shows that only components 1 to 4 individually account for a large amount of the total variance and do not contain noise. These four components, which account for 96.7% of the total variance, were therefore kept. In addition, according to the factor pattern matrix, these four components are the most interpretable of the 12 components (Appendix E).

The rows of the factor pattern matrix represent the variables being analyzed (blue, green, red, NIR, MIR 1, and MIR 2 for the November 23, 2006 image, and the same six bands for the May 18, 2007 image) and the columns represent the components (components 1-12). The entries of the matrix are factor loadings. A factor loading is a general term for a coefficient that appears in a factor pattern matrix. Component 1 is correlated with brightness in the middle infrared channels 5 (0.58), 6 (0.35), 11 (0.52), and 12 (0.37) and accounts for 76.23% of the variance. This component is a measure of the total brightness in the infrared channels (Eklund and Singh 1993; Holden and LeDrew 1998).

Component 2 is highly correlated with greenness in MIR 1 (0.58) and MIR 2 (0.30) for the November 23, 2006 image and in MIR 1 (-0.50) and MIR 2 (-0.42) for the May 18, 2007 image. The MIR 1 and MIR 2 loadings are positive for the November 23, 2006 image and negative for the May 18, 2007 image, so the two axes point in opposite directions.

Components 3 and 4 are correlated with vigor in NIR (-0.55 and -0.79) for the November 23, 2006 image and in NIR (-0.76 and 0.52) for the May 18, 2007 image. For component 4, the NIR loading changes sign (-/+) between the two images.

For the May 18, 2007 image, the factor loadings of component 1 and component 2 have opposite signs in NIR (C1 = -0.11 and C2 = 0.22), MIR 1 (C1 = 0.52 and C2 = -0.50), and MIR 2 (C1 = 0.37 and C2 = -0.42). Also for the May 18, 2007 image, components 3 and 4 have opposite signs for all six bands: blue (C3 = -0.06 and C4 = 0.0), green (C3 = -0.07 and C4 = 0.04), red (C3 = 0.04 and C4 = -0.05), NIR (C3 = -0.76 and C4 = 0.52), MIR 1 (C3 = -0.28 and C4 = 0.20), and MIR 2 (C3 = 0.01 and C4 = -0.06).

The opposite signs (+/-) may reflect the different seasons (rainy versus dry) in which the images were captured. They also express differences in absorption and reflectance in the visible, NIR, and MIR bands between the two contrasting seasons.

After running the PCA and choosing the most meaningful components, unsupervised clustering with the Iterative Self-Organizing Data Analysis Technique (ISODATA) algorithm and supervised classification with maximum likelihood as the parametric rule were performed on the four selected components. This produced 255 spectral classes that were finally grouped into nine land cover categories for the land cover map (human activities (croplands and villages), water, fallow field, grass savanna, shrub savanna, savanna woodlands, dry forest, gallery forest) and three land cover categories for the burned area maps (water, no burning, burning).
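The maximum likelihood parametric rule mentioned above can be illustrated with a minimal NumPy sketch. It is a simplified Gaussian classifier with equal prior probabilities, not the Erdas Imagine routine; the function names, the two synthetic classes, and the four-dimensional "component" vectors are all assumptions made for illustration.

```python
import numpy as np

def fit_classes(samples, labels):
    """Estimate a mean vector and a covariance matrix for each class."""
    stats = {}
    for c in np.unique(labels):
        x = samples[labels == c]
        stats[int(c)] = (x.mean(axis=0), np.cov(x, rowvar=False))
    return stats

def classify(pixels, stats):
    """Assign each pixel to the class with the highest Gaussian log-likelihood."""
    classes = sorted(stats)
    scores = []
    for c in classes:
        mean, cov = stats[c]
        diff = pixels - mean
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every pixel to the class mean.
        mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)
        scores.append(-0.5 * (logdet + mahal))
    return np.array(classes)[np.argmax(np.vstack(scores), axis=0)]

# Two synthetic, well-separated classes described by four component scores.
rng = np.random.default_rng(1)
class0 = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
class1 = rng.normal(loc=5.0, scale=1.0, size=(200, 4))
samples = np.vstack([class0, class1])
labels = np.array([0] * 200 + [1] * 200)

stats = fit_classes(samples, labels)
predicted = classify(np.array([[0.0, 0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]]), stats)
print(predicted)
```

In a workflow like the one described, the class statistics would come from the spectral signatures of the clusters or training areas rather than from simulated samples.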

In document Up in smoke: biomass burning and atmospheric emissions in the Sudanian savanna of Côte d’Ivoire (Page 72-76)