5.4 Components
5.4.2 Feature Extraction: Low-Level and Medium-Level Features
Feature extraction is a widely used Computer Vision approach, especially in Image Processing problems, for working efficiently with large numbers of images. Since all the images in the database, no matter how similar or different their characteristics, are described using the same parameters, feature extraction also serves as a homogenization process. Moreover, it can be seen as a method of dimensionality reduction, which helps to mitigate the “curse of dimensionality” [109].
The main aim of extracting features is to collect the most descriptive but compact information from an image. It is important to note that the selection of features is a decisive task, but also a highly problem-dependent one [56]. Different types of problems call for different types of features, and extracting and combining a vast number of diverse features will not necessarily yield better accuracy than extracting a small but representative number of features, as will be demonstrated in Chapter 7 and in Chapter 8. For example, low-level shape features are especially suited to tasks such as face recognition [203], while colour features might be better suited to problems such as bird classification [25]. Therefore, the aim in extracting features is to find a balance between the dimensionality of the features extracted and the quality of the information collected.
In order to work more efficiently with the ground-taken photographs, we extract low-level features from them. Moreover, we have created a new type of feature, referred to as Medium-Level Features, with the aim of extracting more relevant information from the images. Consequently, the second element of our framework is the set of features we extract from our annotated ground-taken database.
5.4.2.1 Low-Level Feature Extraction
Low-level features collect local or global statistics about different aspects of an image. They are among the most commonly extracted types of features in Image Processing problems. Extracting low-level visual features enables us to work with a large number of high-definition photographs in an efficient and accurate manner. Moreover, they also allow for an easier comparison between images with different characteristics.
Commonly, low-level visual features can be divided into at least three groups [109]: colour features, such as colour histograms; texture features, such as the Tamura coefficients [175]; and shape features, such as the Hough transform [98]. However, there are many other features which extract other types of relevant information, such as pattern features [148].
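As an illustration of the simplest of these groups, the following minimal sketch computes a joint RGB colour histogram. It is not the extraction code used in this thesis: the function name `colour_histogram`, the choice of four bins per channel and the toy pixel lists are assumptions made purely for the example. It also shows the homogenization effect of feature extraction, since images of different sizes are described by vectors of the same length.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Return a normalised joint RGB histogram as a flat list.

    pixels: iterable of (r, g, b) tuples with 8-bit channel values (0-255).
    The result always has bins_per_channel ** 3 entries, whatever the
    number of pixels in the image.
    """
    if not 1 <= bins_per_channel <= 256:
        raise ValueError("bins_per_channel must be between 1 and 256")
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 8-bit channel value to a bin index in [0, bins_per_channel).
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    # Normalise so that the histogram does not depend on image size.
    return [h / count for h in hist]


# Two toy "images" of different sizes, given as flat pixel lists (hypothetical data).
small_image = [(0, 0, 255)] * 4                         # 4 blue pixels
large_image = [(0, 0, 255)] * 50 + [(0, 255, 0)] * 50   # 100 pixels, half blue, half green

h_small = colour_histogram(small_image)
h_large = colour_histogram(large_image)
print(len(h_small), len(h_large))
```

Both images are reduced to 64-dimensional vectors that can be compared directly. Increasing `bins_per_channel` captures finer colour detail at the cost of a longer vector, which is the trade-off between dimensionality and information discussed above.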
As mentioned in the previous section, feature selection depends on the problem to be solved. In our case, since we are aiming to classify different types of natural habitats, we focus on extracting colour, texture and pattern features. This is because examining colour, texture and pattern similarities between habitats is similar to the process followed by trained ecologists when surveying a site. In particular, we have used pattern features [148] as a guideline for the behaviour of our classifier under different testing scenarios. We chose to do this because the pattern features we extract, called Colour Pattern Appearance Model (CPAM) features, have two main advantages over colour and texture features: they are more compact, with a feature vector of only 128 dimensions, and, at the same time, they collect a large amount of information on both the colour and pattern texture of the images. Moreover, they have obtained successful results in image classification tasks [148,151].
Low-level visual features are one of the components of the ground-taken photograph databases we have created as part of this thesis. Consequently, low-level feature extraction will be described in more detail in Chapter 6.
5.4.2.2 Medium-Level Feature Extraction
While low-level features have been proven to be effective for image classification and image annotation tasks [169], they have some limitations with regard to the type of information they can effectively extract. In particular, low-level features are not suitable for the extraction of higher-level or semantic information, which can be crucial in FGVC problems. This means that objects that are easily identifiable to humans might be difficult for computers to differentiate due to their similar visual properties. This is normally referred to as the “semantic gap” problem [18]. For example, a human can easily differentiate between a water habitat (class G) and the sky. However, given their similar colour, texture and pattern properties, it might be more difficult for a computer to classify both correctly, as will be shown in Chapter 7. Semantic features were developed as a means to bridge the semantic gap and to include higher-level information in the decision-making process.
In our case, semantic features can be very useful when automatically classifying habitats. In order to include higher-level information, we create and extract a second type of feature: medium-level features, which are the third contribution of this thesis. We also refer to them as medium-level knowledge. We follow the method described in [151] to incorporate medium-level information in the classification process using a Human-in-the-Loop approach.
To collect this medium-level knowledge, users were shown photographs from the Habitat 1K or Habitat 3K datasets and were asked twenty-three yes-or-no questions about the different types of natural objects that they could identify within the images. These natural objects included: trees with leaves, trees without leaves, trees with and without leaves, bushes, grass with flowers or non-uniform grass, uniform grass, reed, fern, herbs, heath, water, crops, boundaries, walls, fences, the sky, and other (e.g. cars, people, buildings, animals). Along with the answer to each question, users were asked to rate the degree of confidence they had in their own assessment, which ranged from 0 (not sure at all) to 5 (completely sure).
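To make the structure of this information concrete, the sketch below shows one possible way of turning such answers into a fixed-length vector. The encoding is an assumption made for illustration only, and is not the method of [151] or the one described in Chapter 8: each answer is mapped to a signed value, positive for "yes" and negative for "no", scaled by the reported confidence. The function name `encode_answers` and the shortened list of four questions are likewise hypothetical.

```python
MAX_CONFIDENCE = 5  # the confidence scale in the text runs from 0 to 5


def encode_answers(questions, answers):
    """Encode yes/no answers with confidence as a fixed-length vector.

    questions: ordered list of question keys (fixes the vector layout).
    answers: dict mapping key -> (is_present: bool, confidence: int in 0..5).

    Assumed encoding (illustrative only): +confidence/5 for "yes" and
    -confidence/5 for "no", so an answer given with confidence 0
    contributes 0.0 whichever way it was answered.
    """
    vector = []
    for key in questions:
        is_present, confidence = answers[key]
        if not 0 <= confidence <= MAX_CONFIDENCE:
            raise ValueError(f"confidence for {key!r} must be in 0..{MAX_CONFIDENCE}")
        sign = 1.0 if is_present else -1.0
        vector.append(sign * confidence / MAX_CONFIDENCE)
    return vector


# Hypothetical answers for one photograph, using four of the objects listed above.
questions = ["water", "sky", "uniform grass", "fences"]
answers = {
    "water": (True, 5),          # sure that water is present
    "sky": (False, 4),           # fairly sure there is no sky
    "uniform grass": (True, 0),  # answered "yes" but not sure at all
    "fences": (False, 5),        # sure there are no fences
}
features = encode_answers(questions, answers)
print(features)
```

Under this assumed encoding, the full questionnaire would yield one value per question for each photograph, so every image is described by a vector of the same length regardless of its content.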
Medium-level knowledge and medium-level features will be described in full detail in Chapter 8.