Characterizing the Feature Matrix

Chapter 3 Results

3.2 Duluth Harbor

3.2.1 Characterizing the Feature Matrix

Using the bathymetric surface in Figure 3.2, terrain measures for this subset were calculated using the processing methodology outlined in the methods section. This subset was chosen to test the clustering procedure because of the large ridge feature that stands out from the smoother regional bathymetry. To determine the effects of data resolution and analysis window size, the data were processed at multiple resolutions (0.25, 0.5, 1, and 1.5 meters) and multiple tile sizes (N=7, 9, 13, 15, 17, 21, 33, and 65). An example of features calculated using the processing methodology is shown in Figure 3.4 at 0.5-meter resolution and a tile size of N=13.

The processed data in Figure 3.4 show two of the measures calculated for the analysis: slope and profile curvature. In total, 12 terrain measures were calculated and subjected to PCA and cluster analysis. The slope calculation illustrates the ability of the processing code to identify features in a bathymetric surface. Areas in which topography changes sharply are clearly defined in the calculated measures. The process not only identifies the ridge-like feature in Figure 3.4, but also characterizes the underlying regional bathymetry, which includes artifacts linked to problems with instrument mounting. These can be seen in the bathymetry and in the calculated measures as a series of parallel lineations or ridges in the regional bathymetric signal, oriented approximately normal to the ridge features. This error is present in all the calculated measures and will be discussed further in later sections.
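As a rough illustration of how a tile-based measure depends on both the cell size and the tile size N, the following Python sketch computes slope in degrees from the least-squares plane fitted to an N x N tile. The function name, the plane-fit formulation, and the synthetic surface are assumptions made for illustration only; the processing code used in this study is described in the methods section and may compute slope differently.

```python
import math


def tile_slope_degrees(surface, row, col, n, cell_size):
    """Slope (degrees) of the least-squares plane fitted to the n x n tile
    centred on (row, col).

    `surface` is a list of rows of depths in metres and `cell_size` is the
    grid resolution in metres. Hypothetical helper, not the thesis code.
    """
    if n % 2 == 0:
        raise ValueError("tile size n must be odd")
    half = n // 2
    if (row - half < 0 or col - half < 0
            or row + half >= len(surface) or col + half >= len(surface[0])):
        raise IndexError("tile extends beyond the surface")

    sum_xz = sum_yz = sum_xx = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            z = surface[row + dr][col + dc]
            x = dc * cell_size  # tile-centred easting offset
            y = dr * cell_size  # tile-centred northing offset
            sum_xz += x * z
            sum_yz += y * z
            sum_xx += x * x
    # On a symmetric square tile sum(x) = sum(y) = sum(x*y) = 0 and
    # sum(x^2) = sum(y^2), so the normal equations decouple.
    dz_dx = sum_xz / sum_xx
    dz_dy = sum_yz / sum_xx
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Synthetic check: a plane rising 1 m per metre in x has a 45 degree slope.
cell = 0.5
plane = [[c * cell for c in range(21)] for _ in range(21)]
slope = tile_slope_degrees(plane, 10, 10, n=13, cell_size=cell)
```

A larger tile averages the gradient over a wider footprint, which is one reason the same surface can yield different measure values at different tile sizes.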

[Figure 3.4 panels: Slope (left) and Profile Curvature (right); 25 m scale bar and north arrows.]

Figure 3.4: Two examples of features calculated on the Duluth Harbor subset data. On the left is an example of the slope measurement calculated in degrees. On the right is an example of profile curvature in degrees. The location of the subset is shown by the highlighted box in Figure 3.2.

3.2.2 PCA and Clustering

The series of terrain measures described in section 2.6 was computed for every cell in the interpolated bathymetric surface. Principal component analysis (PCA) was applied to the set of full feature vectors (FFV). The reduced data were clustered using K-means clustering. Initial clustering into eight classes did not prove advantageous, as the morphology and geology of the surface are relatively simple at this location. Subsequent clustering analysis used five classes to match the ideal number of ground-truth classes to be used in the accuracy assessment. Figure 3.5 shows an example of clustering analysis applied to a data set reduced to its first three principal components. The cluster map of the Duluth Harbor subset in Figure 3.5 illustrates the ability of the K-means algorithm to distinguish different terrains in the bathymetry.
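The clustering step can be sketched with a minimal K-means (Lloyd's algorithm) written in plain Python to keep it dependency-free. This is an illustrative sketch, not the code used in this study: the function name, the seeded random initialization, and the toy points standing in for three-component PCA scores are all assumptions.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (equal-length tuples) into k classes.

    Returns (labels, centroids). Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


# Toy stand-in for cells described by their first three principal components.
scores = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
          (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0)]
labels, centroids = kmeans(scores, k=2)
```

The class indices returned are arbitrary, so they carry no substrate meaning until they are interpreted against ground-truth samples.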

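[Figure 3.5 panels: A, shaded relief of the bathymetric surface; B, Duluth Harbor Subset Cluster Map with a legend of cluster classes; 25 m scale bar and north arrows.]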

Figure 3.5: Duluth Harbor subset cluster map. Data are clustered into five classes. A) shows the original bathymetric surface as a shaded relief map. Note the large ridge features in this area and the roll artifacts visible in the surrounding surface. B) shows the clustered data. The bathymetry data were processed at 0.5-meter resolution and the measures calculated using a tile size of N=13. The location of the subset is shown by the highlighted box in Figure 3.2.

The classes in the cluster map shown in Figure 3.5 are arbitrarily colored and do not represent any substrate type at this point. There are similarities between the cluster map in Fig. 3.5B and the bathymetric surface in Fig. 3.5A. The clustering accurately outlines the ridges and captures their spatial extents and shapes. This initial clustering will later be interpreted and used in conjunction with the ground-truth samples to conduct an accuracy assessment.

One parameter that can be calculated during the PCA is the percent variance explained. This number is derived from the eigenvalue decomposition of the input data during the PCA and describes how much of the total variance each principal component accounts for (Kaiser, 1958). For these data, around 85 percent of the cumulative variance could be captured by the first five principal components. Adding further components to the analysis did not significantly increase the amount of explained variance in the data being tested.
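The percent variance explained follows directly from the eigenvalues: each eigenvalue divided by the sum of all eigenvalues. The short Python sketch below shows that arithmetic and how a cumulative threshold selects the number of components to keep. The function names are hypothetical, and the eigenvalues are made-up round numbers chosen for easy checking; they are not values from this study.

```python
def explained_variance(eigenvalues):
    """Return (percent, cumulative_percent) per component, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    percent = [100.0 * value / total for value in ordered]
    cumulative = []
    running = 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


def components_needed(eigenvalues, target_percent):
    """Smallest number of leading components reaching the target percent."""
    _, cumulative = explained_variance(eigenvalues)
    for count, value in enumerate(cumulative, start=1):
        if value >= target_percent:
            return count
    return len(cumulative)


# Illustrative eigenvalues only (not results from the Duluth Harbor data).
example_eigenvalues = [4.0, 3.0, 2.0, 1.0]
percent, cumulative = explained_variance(example_eigenvalues)
kept = components_needed(example_eigenvalues, target_percent=85.0)
```

For these illustrative values the components explain 40, 30, 20, and 10 percent, so three components are needed to pass an 85 percent threshold.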

Figure 3.6 shows a plot of the weighted contribution of each of the measures calculated for the Duluth Harbor subset. These values were taken from the analysis of the data at 0.5-meter resolution and a tile size of N=13. Similar plots were constructed for each of the data resolutions and tile sizes tested. Each variable used in the analysis contributes between five and ten percent of the cumulative variance. These contributions will be used in the interpretation as a way to help determine which terrain variable(s) are most

Figure 3.6: Plot showing the weighted contribution of each terrain measure for the Duluth Harbor subset. Before weighting, the input of each variable was normalized to the total cumulative contribution value of ~85 percent.

The results of the analysis of the Duluth Harbor data suggest that the processing methodology is accurately calculating the measures for a given bathymetric surface. The survey data were analyzed at several resolutions and tile sizes, and were classified using several different numbers of clusters in the K-means algorithm. Figure 3.6 suggests that no single measure dominates the PCA. This will be discussed in Chapter 4.

Figure 3.5A illustrates a significant data quality issue that became apparent when processing this data set: a roll artifact that affects the entire bathymetric surface. Figure 3.5B does not show this error in the cluster map, but those data were re-classified in post-processing to show the ideal number of classes; using a higher number of classes readily identifies the roll artifact and segments it as its own class(es).

Figure 3.5B also shows several red squares that are present in the cluster map in all iterations of the PCA and clustering, as a result of the processing code’s resolving abilities. An investigation of these areas on the bathymetric surface shows significant topographic features, most likely large boulders that exceed the data resolution. The effect is that the processing code smooths the surface over the tile centered on each such point. The measures calculated there take on the characteristics of the topographic feature, which significantly oversaturates the surrounding bathymetry.

The clustering procedure cannot resolve features larger than the data resolution and tile size being used. Examples of this effect are given in the Supplementary Material, which contains plots of all the measures calculated for this surface. This artifact cannot be avoided, as it is an inherent limitation of the underlying mathematics used in the calculations as well as of the multibeam sonar’s maximum resolving abilities.
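The square footprint of this artifact can be reproduced with a toy example: any measure computed over an N x N tile is affected by a single anomalous cell wherever that cell falls inside the tile, so one spike marks an N x N block of output cells. The Python sketch below uses local relief (maximum minus minimum depth in the tile) purely as a stand-in measure; the function name, the measure, and the synthetic surface are assumptions for illustration and are not taken from the processing code.

```python
def local_relief(surface, n):
    """Max-minus-min depth over the n x n tile centred on each cell.

    Cells whose tile would extend past the edge are left as None.
    Hypothetical stand-in for a tile-based terrain measure.
    """
    half = n // 2
    rows, cols = len(surface), len(surface[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = [surface[rr][cc]
                      for rr in range(r - half, r + half + 1)
                      for cc in range(c - half, c + half + 1)]
            out[r][c] = max(window) - min(window)
    return out


# Flat synthetic seabed with one boulder-like spike at the centre.
size, tile = 41, 13
seabed = [[10.0] * size for _ in range(size)]
seabed[20][20] = 8.0  # a single shallow cell, 2 m above its surroundings

relief = local_relief(seabed, tile)
affected = sum(1 for row in relief for value in row if value)
# Every tile containing the spike reports nonzero relief, so the
# affected cells form a tile-sized square around it.
```

Here `affected` equals `tile * tile`, which matches the square patches that appear around isolated features in the cluster map regardless of the number of classes chosen.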

In document Textural Analysis and Substrate Classification in the Nearshore Region of Lake Superior Using High-Resolution Multibeam Bathymetry (Page 58-64)