• No results found

Implementation details

4.2 Materials and Methods

4.3.2 Implementation details

The networks are implemented with Caffe2 [165]. As explained in Section 4.2.1, different architectures are developed to adapt the T-CNNs to the large difference of image sizes in the various experiments. The results are reported without distinction since the same approach is used for all the experiments as described in Section 4.2.2. Two methods are reported, namely DT-AlexNet and DT-GoogleNet depending on the T-CNN architecture employed.

The hyperparameters of the networks slightly depend on the size of the datasets, setups and input sizes and are summarised in Table 4.2. The number of slices per sequence is equal in the three planes and the results are relatively robust to variations of this number. For Dyntex++ and UCLA, the number of slices is equal to the height, width, and depth of the sequences, i.e. 50 and 48 respectively. The sequences of the DynTex database being much larger (352×288×250), only 10 slices are extracted per plane. All the slices from DynTex are resized after extraction to 227×227. The networks are trained using stochastic gradient descent with softmax and multinomial logistic loss.

4.4

Results and discussion

4.4.1

Results

The results of the proposed DT-CNN approach are compared with the state of the art in Table 4.3. The methods from the literature that obtain the best results on each dataset are reported, namely 9-Plane mask Directional Number transitional Graph with SVM classifier (DNG) [152], Dynamic Fractal Spectrum (DFS) [177], spatial Transferred ConvNet Features (s-TCoF) and concatenation of spatial and

2An implementation of the method is available at the following GitHub repository: https://github.com/v-andrearczyk/DT-CNN

Table 4.3: Accuracy results (%) of the proposed DT-CNN approaches and of the state of the art on multiple DT datasets.

Dyntex++ Dyn-alpha Dyn-beta Dyn-gamma UCLA-50 UCLA-9 UCLA-8

LBP-TOP[14, 178, 41] 71.2 96.67 85.8 84.85 86.1 - 96.8 DNG[152] 93.8 - - - - 99.6 99.4 DFS[177] 89.9 - - - 100 97.5 99 s-TCoF[41] - 100 99.38 95.83 - - - st-TCoF[41] - 98.33 98.15 98.11 - - - MEWLSP[137] 98.48 - - - 96.5 98.55 98.04 DT-AlexNet 98.18 100 99.38 99.62 99.5 98.05 98.48 DT-GoogleNet 98.58 100 100 99.62 99.5 98.35 99.02

temporal Transferred ConvNet Features (st-TCoF) [41] and Multiresolution Edge Weighted Local Structure Pattern (MEWLSP) [137]. Finally, the LBP-TOP [14] is also reported as it was tested on most of the datasets and also combines an analysis of three orthogonal planes. The results of the LBP-TOP are provided in [178] and [41] with the original author’s implementation [14] for the former and their own implementation for the latter. The proposed DT-CNN methods (DT-AlexNet and DT-GoogleNet) consistently obtain high accuracy results on all the datasets and overall outperform all other methods in the literature. The UCLA dataset contains few training samples per class and therefore the DT-CNNs do not outperform the current state of the art, but a significant improvement is obtained on the large DynTex and Dyntex++ datasets. The consistency of high accuracy results across all the datasets demonstrates the robustness of the developed approach. The deep DT-GoogleNet approach outperforms the shallower DT-AlexNet even though more challenging datasets would be required to fully demonstrate the power of using deep architectures. In the following paragraphs, the results of each experiment are discussed in more detail.

DynTex alpha: The classification rates of DT-AlexNet and DT-GoogleNet are 100% on this dataset which is not challenging as it contains high inter-class separabil- ity and low intra-class variability and each sequence has at least one other sequence of the same class which is very similar. The spatial TCoF (s-TCoF) in [41] also obtains 100% on this dataset.

DynTex beta: The recognition rate on DynTex-beta is 99.38% with DT-AlexNet and 100% with DT-GoogleNet, which outperforms the state of the art s-TCoF [41] (99.38%). The only sequence misclassified with DT-AlexNet is illustrated in Figure 4.4a with some sequences of the true class (4.4b) and of the detected class (4.4c). This misclassification is due to the background of this sequence containing mainly a blue sky and trees and occupying the major spatial part of the sequence.

DynTex gamma: On the DynTex-gamma dataset, only one sequence is misclassi- fied out of 264 by the DT-CNNs, improving the state of the art [41] from 98.11% to 99.62%. The 79th sequence of true class “naked trees” is misclassified as class

(a)misclassified sequence (b)true class (c)detected class

Figure 4.4:A misclassified sequence of the DynTex beta dataset and examples from

the true class and from the detected one. (a) misclassified sequence, (b) true class “rotation” and (c) detected class “trees”.

(a)misclassified sequence (b)true class (c)detected class

Figure 4.5: A misclassified sequence of the DynTex gamma dataset and examples

from the true class and from the detected one. (a) misclassified sequence, (b) true class “naked trees” and (c) detected class “foliage”.

“foliage” with a very similar appearance and dynamic. As depicted in Figure 4.5, the tree in the misclassified sequence is denser as compared to other trees in the same class “naked trees”, which causes the confusion. This result is satisfying as DynTex-gamma contains relatively high intra-class variations (e.g. “fountains” and “flags”) and inter-class similarities (e.g. “naked trees” vs. “foliage” and “sea” vs. “calm water”) both in terms of spatial appearance and dynamics.

Dyntex++: In this experiment, the DT-GoogleNet (98.58%) also outperforms the best results in the literature [137] (98.48%). The classification rate of each of the 36 classes with DT-AlexNet (98.18%) is detailed in Figure 4.6. The most misclassifications occur for the classes with high intra-class variation “water fountain” and “smoke”, with sequences which, with a closer look, do not always exhibit the expected DT due to the automatic splitting process of the original sequences from DynTex described in [39]. The confusion matrix is not illustrated due to the large number of classes, yet one may notice by visualising it that there are no dominant categories with which these misclassified sequences are confused (i.e. the confusions are spread over several classes). This experiment shows the effectiveness of the

100 90 80 70 60 xy + xt + yt

1-textured cloth 2-artificial hair 3-blossoming tree 4-escalator 5-waves on beach 6-grass swaying 7-boiling water 8-evap. water/fumes 9-river water 10-faucet water 11-fish swimming 12-underwater life 1 13-underwater life 2 14-underwater life 3 15-underwater life 4 16-ants 17-waterfall 18-candles 19-rain on water 20-flushing water 21-water in sink 22-CD in CD player 23-wash cycle 24-water pouring 25-lamp globes 26-lights blinking 27-leaves 28-birds flying in sky 29-pond water 30-rotating wind orn. 31-vehicle traffic 32-flag 33-branches 34-water fountain 35-clouds 36-smoke

Figure 4.6: Classification rates of individual classes of the Dyntex++ dataset with

the proposed DT-CNN approach (DT-AlexNet).

proposed approach with a relatively high number of training samples and classes as well as high intra-class variation.

The UCLA database contains only four sequences per primary category (50 classes setup). Therefore, the number of training sequences per primary category is three, which is much lower than the other datasets. CNNs require a high number of training samples to adjust the weights in order to learn meaningful features, generalise the recognition task to new unknown data, and avoid overfitting the training data. The database is not highly challenging because of low intra-class variation; hence, the proposed DT-CNNs are able to reach nearly 100% classification accuracy and are close to the state of the art. As a result of the training size, however, they do not outperform the best shallow hand-crafted methods in the literature on the three sub-datasets as described below.

UCLA-50: A classification rate of 99.5% is achieved with only one sequence misclassified out of 200 with both DT-CNNs. The best performance in the literature is 100%, achieved by DFS in [177]. However, DFS performs poorly with larger and more challenging datasets and is largely outperformed by the DT-CNN on Dyntex++ by over 8%.

UCLA-9: The proposed DT-AlexNet and DT-GoogleNet obtain 98.05% and

98.35% classification rate respectively, close to the best result 99.6% in the literature [152]. The confusion matrix obtained with DT-AlexNet is detailed in Table 4.4. The number of training samples of a class largely influences the recognition rate of the test samples of that same class. The “smoke” class contains only four samples and obtains the lowest 75% classification. These results confirm that the DT-CNNs perform best on large datasets with a high number of training samples per class.

UCLA-8: The proposed approach achieves a classification rate of 98.48% with DT-AlexNet and 99.02% with DT-GoogleNet, close to the state of the art on this

Table 4.4: Confusion matrix of the proposed DT-AlexNet on UCLA 9-class.

boil fire flow. pl. fount sea sm. wat. wfall

boil 1 0 0 0 0 0 0 0 0 fire 0 0.97 0 0 0 0 0.03 0 0 flower 0 0 0.92 0 0.08 0 0 0 0 fount 0 0 0 0.96 0 0 0 0 0.04 plants 0 0 0 0 1 0 0 0 0 sea 0 0 0 0 0 0.97 0 0.03 0 smoke 0 0 0 0 0 0 0.75 0.25 0 water 0 0 0 0 0 0 0.03 0.97 0 wfalls 0 0 0 0.01 0 0 0 0 0.99

Table 4.5: Confusion matrix of the proposed DT-AlexNet on UCLA 8-class.

boil fire flower fount sea smoke water wfalls

boil 0.99 0.01 0 0 0 0 0 0 fire 0.01 0.99 0 0 0 0 0 0 flower 0 0 1 0 0 0 0 0 fount 0 0 0 1 0 0 0 0 sea 0 0 0 0 0.95 0.02 0.02 0.01 smoke 0 0 0 0 0 1 0 0 water 0 0 0 0 0 0 1 0 wfalls 0 0 0 0.01 0 0.02 0.01 0.96

Table 4.6:Accuracy results (%) of the proposed DT-AlexNet on multiple DT datasets

using various combinations of planes.

Dyntex++ DynTex-alpha DynTex-beta DynTex-gamma UCLA-50 UCLA-9 UCLA-8

xy 94.28 100 98.77 98.11 99 95.7 98.04 xt 96.57 100 95.68 97.35 98.5 97.4 95.76 yt 96.28 100 95.06 97.73 93 97.15 95.65 xy+xt 97.57 100 99.38 99.24 99.5 97.4 98.26 xy+yt 97.71 100 99.38 99.62 99 97.8 98.59 xt+yt 97.84 100 97.53 98.11 98 97.95 96.85 xy+xt+yt 98.18 100 99.38 99.62 99.5 98.05 98.48

sub-dataset of 99.4% [152]. The confusion matrix in Table 4.5 shows that the few confusions with DT-AlexNet mainly occur for classes which exhibit high similar- ities in the spatial appearance and/or dynamics (e.g. “sea”, “smoke”, “water” and “waterfalls”).