Location Based Voting System - Automatic image annotation applied to habitat classification

5.4 Components

5.4.4 Location Based Voting System

Traditionally in Random Forest, each tree in the ensemble casts a unit vote, also referred to as a prediction, on the classes present in the unseen test image. These unit votes are commonly linearly combined to create the final prediction. This voting system method assumes that all trees in the ensemble are equally accurate classifiers. However, literature has shown that not all trees in a random forest are equally good at classifying unseen samples images [152].

In our case, we take advantage of the geographical properties of habitats to determine which trees might be more accurate in the classification process. Geographically close areas have similar ecological characteristics, since habitat properties do not generally change abruptly. For example, ground properties will not change from calcareous (class B.2) to neutral (class B.3) suddenly. Therefore, near regions will have similar habitats. Moreover, even if abrupt changes in habitat types were to occur, for example the sudden change between an inland cliff (class I.1.1) and acid grassland (class B.1) at the bottom of the cliff, a robust annotated database, such as the Habitat 1K database we have created, would have enough geo-referenced photographs of the site to accurately reflect that particular combination of habitats.

Since all the images in the database are geo-referenced, we benefit from this premise and we use their GPS coordinates to assign weight to the predictions. Weights are assigned according to the distance between the test sample and the images that are in the leaf node the sample has reached. By minimizing the distance and assigning weight, the predictions of trees with closer leaves influence the final classification more. However, it is important to notice that while some trees’ predictions will weight more than others, all predictions are taken into consideration in our framework. Therefore, the final contribution of this thesis is a novel voting system which takes into consideration the geographical location of the photographs during the testing phase.

The inclusion of geographical location in the testing phase will be described in more detail in Chapter9.

5.5 Concluding Remarks

In this chapter we have described what Automatic Image Annotation is and how it can be successfully applied to automatic habitat classification. Additionally, we have discussed some of the main challenges that AIA methods present. Moreover, we have presented our first contribution: an image-annotation framework for the automatic classification of habitats. We have given an overview of the whole framework and we have briefly introduced its components: its source data, feature extraction, the Machine Learning classifier used to annotate the photographs and the weighted voting system.

In the following chapters, we will describe in detail how each element of the framework works and how it relates to the other components of our system. The next chapter will describe the first element of our framework: the ground-taken photograph database, which is the first fully annotated database created for the classification of habitats.

te r 5. A u to m a tic Im a ge- A n n o ta tio n F ra m ew o rk 72 CPAM

Random Projection

Forests

Low-Level

Features

Medium-Level

Features

TAMURA SIFT

Human in the

Loop

T

n Habitats Habitats

Voting System

According to GPS

Location

Habitats

Annotations

Ground-Taken

Photographs

Figure 5.4: _{Image Annotation-Based Habitat Classification. Our framework consists of four elements: the photographs, the features extracted,}

Figure 5.5: _{Ground-Taken Photographs Used In Our Framework. Photographs (a)} and (b) belong to Habitat 1K and (c) and (d) belong to Habitat 3K.

Ground-Taken Photograph

Database

A

s_{shown and discussed in Chapter}₃_{, remote-sensed data has proven to be insufficient} to accurately classify Phase 1 habitats. In this thesis, we study the use of an alterna- tive source of data: ground-taken photographs. Our image annotation framework uses these types photographs to automatically classify habitats. For this purpose, we have created two different annotated datasets: Habitat 1K and Habitat 3K. These are, to our knowledge, the first ground-taken photograph datasets specially created and used for the purpose of automatic habitat classification. Moreover, our framework is also, to our knowledge, the first type of system which uses these types of photographs for the problem of habitat classification.

This chapter is divided into five sections. Section6.1describes the overall characteristics of the photographs we will be using and Section 6.2 gives a brief description of the three components of the databases we have created and annotated: the ground-taken photographs, the annotations and the low-level visual features we have extracted from them. Section 6.3 gives a detailed description of the first component: the ground- taken photographs from our datasets. Section 6.4 describes how the annotations were collected, created and stored and how they can be used in conjunction with the visual datasets. Additionally, Section 6.5 describes the third element of the databases, the low-level visual features extracted from the ground-taken photographs, which are also publicly available. We finish this chapter with concluding remarks and a brief summary in Section3.4.

6.1 Ground-Taken Imagery: Definition

In this thesis, we use ground-taken photographs to automatically classify Phase 1 habitats. We define the term “ground-taken photograph” formally as:

A ground-taken photograph is a digital photograph taken by a human on the ground. There are no limitations to the subject of the photograph. Additional- ly, there are no limitations to its layout in terms of orientation or perspective. These photographs may be taken with any type of digital or mobile camera. These photographs may or may not be geo-referenced.

This definition of ground-taken photograph is very broad, as it includes both indoors and outdoors photographs. Moreover, it does not restrict its subject. Multiple examples of ground-taken photographs are shown in Figure6.1.

However, given our goal, the ground-taken photographs that we will be working with need to have some additional restrictions. We are interested in outdoor ground-taken photographs, specifically those taken in rural and coastal areas in the United Kingdom, Europe, for which Phase 1 was specifically designed by the JNCC [102]. There must be at least one discernable habitat instance, either natural (i.e. grasslands, dunes, etc.) or artificial (i.e. walls, fences, parks). Moreover, these instances must be the focus of the photograph.

Thus, for example, in Figure6.1, we are not interested in working with any of the ground- taken photographs from the first row: (a) would not be used in our database because it contains an indoor scene, (b) is an outdoor scene but it is not a photograph taken in the United Kingdom and contains no habitats and (c), even though it was taken in the United Kingdom, does not have the habitats as the main focus of the photograph. It is important to notice that there are no restrictions as to the layout of the photographs. Consequently, all three photographs from the second row in Figure 6.1 can be used in our framework: (d) shows a ground shot of New Forest, (e) shows a landscape shot of Titchfield Haven which includes three habitats and (f) shows an artificial or man-made boundary habitat, a wall, taken in rural England. These photographs are, in fact, part of our Habitat 3K database.

As can be seen, there are no limitations to the number of habitat classes within a photograph, nor to how they might appear on said photographs. This has been done purposely with the aim of including as much variety and as much information as possible in our database. Additionally, by including the same habitats under many different circum- stances (i.e. different times of the year, different perspectives, different orientations, etc.) our database will become more representative and robust.

(a) (b) (c)

(d) (e) (f)

Figure 6.1: _{Ground-taken Photographs.}

We have chosen to work with ground-taken photographs in our framework for two main reasons. First, remote-sensed imagery has been proved to be insufficient for the automatic classification of Phase 1 habitats. Second, satellite and aerial photographs are more difficult to obtain. On the other hand, ground-taken photographs can be obtained more easily. Web sites such as Geograph [154] or Flickr [125], can be used to obtain ground-taken imagery with habitats on them. By using these crowd-sourcing sites, we benefit from their large collection of photographs to create a vast and robust database in a relatively effortless manner.

6.2 Annotated Ground-Taken Databases For Automatic

In document Automatic image annotation applied to habitat classification (Page 92-98)