
2.9 Experimental setup

2.9.2 Performance measures

We now describe the performance measures used for evaluating the spectral-spatial classification schemes proposed in this thesis in terms of accuracy, as well as in terms of execution time and speedup.

Assessing the Accuracy

The classification process consists of labelling the pixel vectors of a hyperspectral image, creating a final classification map. The accuracy is assessed against a reference map, also known as ground truth, from which the training samples for the SVM training phase are taken. In order to quantitatively evaluate the classification accuracies, we have used the Overall Accuracy (OA), the class-specific accuracy (CS), the Average Accuracy (AA) and the Kappa coefficient of agreement ($\kappa$) [167] as the criteria for assessing the accuracy of the spectral-spatial classification schemes.

Let $C_i$ represent class $i$, $C_{ij}$ the number of pixels classified as class $j$ whose reference class is $i$, and $K$ the number of classes.

– The OA is the percentage of correctly classified pixels:

$$\mathrm{OA} = \frac{\sum_{i=1}^{K} C_{ii}}{\sum_{i=1}^{K}\sum_{j=1}^{K} C_{ij}} \times 100\%. \tag{2.26}$$

– The CS (or producer’s accuracy) is the percentage of correctly classified pixels for a given class $i$:

$$\mathrm{CS}_i = \frac{C_{ii}}{\sum_{j=1}^{K} C_{ij}} \times 100\%. \tag{2.27}$$

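– The AA is the mean of the class-specific accuracies over all the classes (each $\mathrm{CS}_i$ is already expressed as a percentage):

$$\mathrm{AA} = \frac{1}{K}\sum_{i=1}^{K} \mathrm{CS}_i. \tag{2.28}$$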

– $\kappa$ is the proportion of agreement corrected by the amount of agreement that could be expected by chance alone; in practice it ranges from 0 (chance agreement) to 1 (perfect agreement). A value of $\kappa$ below 0.6 is interpreted as moderate agreement, whereas a value above 0.8 can be understood as almost perfect agreement [167].
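The list above gives no explicit formula for $\kappa$. For reference, a standard formulation of Cohen's kappa written with the same confusion-matrix entries $C_{ij}$ is sketched below; the auxiliary symbols $N$, $p_o$ (observed agreement) and $p_e$ (chance agreement) are introduced here for illustration and are not part of the original notation.

```latex
\[
N = \sum_{i=1}^{K}\sum_{j=1}^{K} C_{ij}, \qquad
p_o = \frac{1}{N}\sum_{i=1}^{K} C_{ii}, \qquad
p_e = \frac{1}{N^2}\sum_{i=1}^{K}\Big(\sum_{j=1}^{K} C_{ij}\Big)\Big(\sum_{j=1}^{K} C_{ji}\Big),
\]
\[
\kappa = \frac{p_o - p_e}{1 - p_e}.
\]
```

For the confusion matrix of Table 2.4, $p_o = 0.63$ and $p_e = (30 \cdot 57 + 30 \cdot 21 + 40 \cdot 22)/100^2 = 0.322$, which gives $\kappa = 0.308/0.678 \approx 0.454$.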

These criteria are computed from a confusion matrix. In the field of supervised learning, a confusion matrix is a table in which each column represents the instances of a predicted class, while each row represents the instances of an actual class (reference map) [173]. Table 2.4 shows an example of a confusion matrix for a problem with three classes: Asphalt, Water and Trees.

Reference data \ Predicted class   Asphalt   Water   Trees   ∑_j C_ij    CS_i
Asphalt                                 28       1       1         30   93.3%
Water                                   14      15       1         30   50.0%
Trees                                   15       5      20         40   50.0%
∑_i C_ij                                57      21      22        100

Table 2.4: Confusion matrix for a problem with three classes and class-specific accuracy (CS).

The correctly classified values lie on the main diagonal. The OA is 63% and the AA is 64%; the class-specific accuracy of each class is shown in the last column of the table. The values below the main diagonal show that the scheme is “confusing” the classes Water and Trees with the class Asphalt: 14 samples of the class Water and 15 samples of the class Trees were incorrectly classified as Asphalt. The $\kappa$ coefficient is only 0.454, so the agreement in this result is moderate.
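As a cross-check of the figures quoted for Table 2.4, the sketch below computes OA, CS, AA and $\kappa$ from a confusion matrix stored as a list of rows (rows are reference classes, columns are predicted classes). The function name `accuracy_measures` is hypothetical, and the $\kappa$ computation uses the standard Cohen's kappa formulation rather than anything stated explicitly in the text. It assumes every reference class has at least one pixel, otherwise $\mathrm{CS}_i$ is undefined.

```python
def accuracy_measures(cm):
    """Return (OA, CS list, AA, kappa) for a K x K confusion matrix.

    cm[i][j] = number of pixels of reference class i classified as class j.
    OA, CS and AA are percentages; kappa is a plain ratio.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)                      # total number of pixels
    diag = [cm[i][i] for i in range(k)]                  # correctly classified pixels
    row_tot = [sum(cm[i]) for i in range(k)]             # sum_j C_ij (reference totals)
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # sum_i C_ij

    oa = 100.0 * sum(diag) / n                           # Eq. (2.26)
    cs = [100.0 * diag[i] / row_tot[i] for i in range(k)]  # Eq. (2.27)
    aa = sum(cs) / k                                     # Eq. (2.28)

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_o = sum(diag) / n
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (p_o - p_e) / (1.0 - p_e)
    return oa, cs, aa, kappa


# Confusion matrix of Table 2.4 (Asphalt, Water, Trees)
table_2_4 = [[28, 1, 1],
             [14, 15, 1],
             [15, 5, 20]]
oa, cs, aa, kappa = accuracy_measures(table_2_4)
print(f"OA = {oa:.1f}%, AA = {aa:.1f}%, kappa = {kappa:.3f}")
```

Run on the matrix of Table 2.4, this reproduces OA = 63%, AA ≈ 64.4% and $\kappa \approx 0.454$.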

In order to obtain a reliable evaluation of the results, the accuracies are calculated excluding the samples used for training: from the reference map, one group of samples is used to train the SVM and the remaining samples are used to test the classification accuracy.
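The text does not state how the training samples are drawn from the reference map. The sketch below assumes, purely for illustration, that a fixed number of labelled pixels per class is picked at random and that all remaining labelled pixels form the test set; the function name `split_reference_map`, the flat label list and the convention that label 0 means "unlabelled" are all hypothetical.

```python
import random


def split_reference_map(labels, n_train_per_class, seed=0):
    """Split the labelled pixels of a reference map into disjoint train/test sets.

    labels: flat list of class labels, where 0 means 'unlabelled' (assumption).
    Returns two sorted lists of pixel indices (train, test).
    """
    rng = random.Random(seed)          # fixed seed for a reproducible split
    by_class = {}
    for idx, c in enumerate(labels):
        if c != 0:                     # skip pixels without a reference label
            by_class.setdefault(c, []).append(idx)

    train, test = [], []
    for _, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        n = min(n_train_per_class, len(idxs))
        train.extend(idxs[:n])         # used only to train the SVM
        test.extend(idxs[n:])          # used only to measure accuracy
    return sorted(train), sorted(test)


labels = [0, 1, 1, 2, 2, 2, 0, 1, 3, 3]
train, test = split_reference_map(labels, n_train_per_class=1)
```

Keeping the two index sets disjoint is what guarantees that OA, AA and $\kappa$ are measured only on samples the classifier has not seen during training.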

Speedup

The proposed GPU implementations are also evaluated in terms of execution times and speedups compared to optimized multi-threaded CPU implementations using OpenMP. The speedup is a metric of relative performance improvement, calculated as the ratio between the old execution time (the baseline for comparison) and the new (improved) execution time:

$$S = \frac{T_{\text{old}}}{T_{\text{new}}}. \tag{2.29}$$

The execution time is measured in seconds. The times $T_{\text{old}}$ and $T_{\text{new}}$ are wall-clock times (elapsed time from the start to the end of the computation), excluding the file I/O time for both the CPU and the GPU. Hard-disk reads are excluded because the implementations are part of a scheme in which the different stages are concatenated in a processing pipeline, where the output data of one stage are used as input to the following stage; therefore, the data are kept in memory during the whole process. For GPU timing, the CPU–GPU data transfer times are included in the analysis as an overhead associated with using that architecture.
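The timing rules above can be sketched as follows: the clock is started only after the input data are already in memory, so file I/O falls outside the measured interval, and the speedup of Eq. (2.29) is the ratio of the two wall-clock times. The helper names `wall_clock` and `speedup` are hypothetical, and the times in the usage lines are made-up illustrative values, not results from this work.

```python
import time


def wall_clock(compute, *args):
    """Run compute(*args) on data already in memory and return (result, seconds).

    time.perf_counter() is a monotonic wall clock, so the interval covers only
    the computation. For a GPU version, compute would have to include the
    CPU-GPU copies and wait for the device to finish before returning, because
    kernel launches are asynchronous.
    """
    start = time.perf_counter()
    result = compute(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def speedup(t_old, t_new):
    """Eq. (2.29): baseline execution time divided by improved execution time."""
    return t_old / t_new


data = list(range(1_000_000))          # loaded before timing starts
total, seconds = wall_clock(sum, data)
print(speedup(12.0, 1.5))              # hypothetical times in seconds
```

The last line prints `8.0`: a hypothetical 12 s baseline reduced to 1.5 s corresponds to an 8× speedup.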

Occupancy

When launching a kernel, different configurations are possible, and performance may vary depending on the chosen configuration and on the hardware resources required by the kernel. A kernel can reach the hardware resource limits in several ways. For example, if a kernel is configured to launch blocks of 512 threads, the maximum number of concurrent blocks per SM will be 3 in Fermi and 4 in Kepler, although the theoretical maximum is 8 (Fermi) and 16 (Kepler), as detailed in Table 2.3. The reason is that the maximum number of threads per SM is 1536 in Fermi and 2048 in Kepler (1536/512 = 3 and 2048/512 = 4 blocks). Therefore, this limit is reached before the limit on concurrent blocks, 8/16 in Fermi/Kepler, respectively. Another example is the number of registers per thread. If we create a kernel that requires 63 registers per thread, that is, the maximum allowed in both architectures, the maximum number of threads per SM will be 512/1024 in Fermi/Kepler, because the register file of an SM (32K/64K 32-bit registers in Fermi/Kepler) cannot hold the registers of more threads.
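The limits described above can be combined into a small calculator of resident blocks per SM. This is a simplified sketch: the names `SMLimits` and `active_blocks` are hypothetical, the thread and block limits are the Fermi/Kepler values quoted in the text, the register-file sizes (32K and 64K 32-bit registers per SM) are added here, and register allocation granularity and shared memory are deliberately ignored, so it is not a replacement for NVIDIA's occupancy tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    max_threads: int      # resident threads per SM
    max_blocks: int       # resident blocks per SM
    registers: int        # 32-bit registers per SM
    warp_size: int = 32


FERMI = SMLimits(max_threads=1536, max_blocks=8, registers=32 * 1024)
KEPLER = SMLimits(max_threads=2048, max_blocks=16, registers=64 * 1024)


def active_blocks(sm, threads_per_block, regs_per_thread):
    """Resident blocks per SM under the block, thread and register limits.

    Simplified: ignores register allocation granularity and shared memory.
    """
    warps_per_block = -(-threads_per_block // sm.warp_size)   # ceiling division
    by_blocks = sm.max_blocks
    by_threads = (sm.max_threads // sm.warp_size) // warps_per_block
    warps_by_regs = sm.registers // (regs_per_thread * sm.warp_size)
    by_regs = warps_by_regs // warps_per_block
    return min(by_blocks, by_threads, by_regs)


# 512-thread blocks: the thread limit binds (3 blocks in Fermi, 4 in Kepler)
print(active_blocks(FERMI, 512, 20), active_blocks(KEPLER, 512, 20))
# 63 registers per thread with 128-thread blocks: the register file binds
print(128 * active_blocks(FERMI, 128, 63), 128 * active_blocks(KEPLER, 128, 63))
```

The two printed lines are `3 4` and `512 1024`, matching the block and thread counts given in the text for the two examples.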

One factor used to measure the performance on the GPU is the occupancy, which is defined as:

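$$\text{Occupancy} = \frac{\text{number of active warps per SM}}{\text{maximum number of possible active warps per SM}}. \tag{2.30}$$

In Kepler, the maximum number of active warps per SM is 64 (see Table 2.3). For a kernel configured with 128 threads per block and with no other limit in its hardware requirements, 16 blocks of 128 threads (2048 threads in total) can be active at once, so the occupancy for this kernel is:

$$\text{Occupancy} = \frac{2048\ \text{threads max.} / 32\ \text{threads per warp}}{64} = 1.$$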

This metric can also be studied on the basis of the total number of concurrent blocks per SM:

$$\text{Occupancy}_{\text{by blocks}} = \frac{\text{number of active blocks per SM}}{\text{maximum number of possible active blocks per SM}}. \tag{2.31}$$

For the same example, with 128 threads per block and without any other limit in the hardware requirements, this occupancy is also 1:

$$\text{Occupancy}_{\text{by blocks}} = \frac{2048\ \text{threads max.} / 128\ \text{threads per block}}{16} = 1.$$

In both (2.30) and (2.31), the occupancy is 100%.
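Both definitions can be evaluated with a few lines of code. The sketch below assumes, as in the worked example, that only the warp (thread) and block limits of the SM apply; the function name `occupancy` is hypothetical. It also shows that the two metrics need not agree once the block size changes.

```python
def occupancy(threads_per_block, max_warps, max_blocks, warp_size=32):
    """Return (occupancy by warps, Eq. 2.30; occupancy by blocks, Eq. 2.31).

    Assumes that only the warp and block limits of the SM restrict the
    number of active blocks (no register or shared-memory limit).
    """
    warps_per_block = -(-threads_per_block // warp_size)      # ceiling division
    active = min(max_blocks, max_warps // warps_per_block)    # active blocks per SM
    by_warps = active * warps_per_block / max_warps
    by_blocks = active / max_blocks
    return by_warps, by_blocks


# Kepler: at most 64 active warps and 16 active blocks per SM
print(occupancy(128, max_warps=64, max_blocks=16))   # (1.0, 1.0)
print(occupancy(512, max_warps=64, max_blocks=16))   # (1.0, 0.25)
print(occupancy(96, max_warps=64, max_blocks=16))    # (0.75, 1.0)
```

With 512-thread blocks, all 64 warps are active but only 4 of the 16 block slots are used; with 96-thread blocks the opposite happens, since 16 blocks of 3 warps fill only 48 of the 64 warp slots.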

The highest possible number of concurrent blocks is desired, as it maximizes the occupancy and helps to hide the latency of memory accesses. The hardware resources that usually limit the occupancy on the GPU are register usage, shared memory requirements and block size [104].


Figure 2.21: 2D / 3D datasets used in this work. (a) Lena, (b) CT Scan Head, (c) simulated MRI volume.

In document Spectral-spatial classification of n-dimensional images in real-time based on segmentation and mathematical morphology on GPUs (Page 74-77)