done in three steps: classification, organization and selection. Traditional active learning methods perform these steps in every learning iteration making active learning impracti- cal for interactive applications. The methods described in this section improve the time complexity by reducing or pre-organizing the dataset. In Table 2.1, we resume the char- acteristics, concerning organization and dataset reduction, of the active learning methods described in this section and the traditional active learning methods. Additionally, we also present the characteristics of the proposed method MCLU-Reduc (Section 3.3), based on a supervised dataset reduction. The attribute organization is divided in two: a priori and on-the-fly. A priori means that the organization is done before the active learning process. On-the-fly means that organization is excecuted in every learning iteration.
2.7
Markov random fields for image classification
The theory of Markov random field (MRF) is part of the probability theory and describes the spatial or contextual dependencies of physical phenomena [38]. MRF provides a con- sistent framework to model contextual dependencies between pixels or correlated features. The MRF framework assumes that the physical properties within a neighborhood do not change abruptly, this is also known as the smoothness prior model, that is used to pro- duce a smooth image classification [65]. The contextual information can be derived from spectral, spatial and temporal dimensions [65]. The MRF formulation is presented bellow as described in [65].
Let a set of random variables r = {r1, r2, . . . , rn} defined on a set S containing n sites
(i.e. pixels) in which each random variable takes a label from a label set W . The family
r is called random field. The set S contains the image pixels; r is a set of pixel DN
(Digital Numbers) values and the set of labels W depends on a specific application (i.e.
W = { Forest, Non-forest }). A random field with respect to its neighborhood system
is a Markov random field if its probability density function satisfies the following three properties:
Positivity: P (r) > 0, all possible configurations of r have a non-zero probability.
Markovianity: P (ri|rS\{i}) = P (ri|rNi), this defines the neighborhood system. This
property states that the membership value of a pixel i depends only on its neighboring pixels.
Homogeneity: P (ri|rNi) is the same for all sites i. For all pixels, its probability is
2.7. Markov random fields for image classification 22
(a) First order (b) Second order (c) Fifth order
Figure 2.5: Pixel neighborhood system [65]: a), b) and c) depict the first, second and fifth order neighborhood system
Usually in the MRF based image analysis, the first neighborhood system is composed by the four pixels sharing the same boundary with the pixels of interest. The second order neighborhood system contains additionally the four pixels having same corner boundary with the central pixel. Figure 2.5 shows three neighborhood systems.
MRF allows to express a global model of the contextual information by using relations between the pixels and its neighbouring pixels. Thanks to the relation with Gibbs random field MRF can be used in practical applications [65]. The Hammersley-Clifford theorem proved that Gibbs Random Field is equivalent to MRF [65]. The Gibbs probability density function is given by [65]:
P (r) = 1
Zexp(−U (r)/T ) (2.19)
where P (r) is the probability of r, U (r) is the energy function, T is the temperature and Z is a normalizing constant, also called partition function. We can observe that the maximization of the posterior probability is equivalent to the minimization of the energy function. The formulation of the energy function depends on the application. We describe an example of energy function for image classification in the next subsection.
2.7.1
Energy minimization
In image analysis many problems are formulated in terms of energy minimization (i.e. image restoration, classification). The MRF framework has been used largely to regularize classification maps [38, 65]. In this framework, it is assumed that a pixel belonging to a class wi has neighboring pixels belonging to the same class. We use Simulated
Annealing, first proposed in [45] and applied to image segmentation in [21], to compute the maximum a posteriori probability (MAP) estimate of a classification map given an
2.7. Markov random fields for image classification 23
initial classification. This method minimizes the global energy in the image, by iterative minimization of local energies at the pixels of the image. This can be extended easily to the case of superpixels.
Let W = {w1, w2, . . . , wn} a set of class labels. We use the next function in order to
compute the local energy of a pixels xi for a label wi:
U (xi, wi) = Udata(xi, wi) + Uspatial(xi, wi), (2.20)
where Udata(xi, wi) is the energy of the observed data and Uspatial(xi, wi) is the energy
computed over the local neighborhood. The energy of the observed data is computed as follows:
Udata(xi, wi) = −ln(P (wi|xi)), (2.21)
where P (wi|xi) is the probability of label assignment provided by a classifier. We use the
spatial energy function described in [63]
Uspatial(xi, wi) =
X
xj∈Ni
β(1 − δ(wi, wj)), (2.22)
where β is the parameter that control the influence of spatial energy and δ is the Kronecker function:
δ(wi, wj) =
(
1 if wi = wj
0 otherwise (2.23)
The algorithm 2 describes Simulated Annealing for image labeling. The temperature starts high and is decreased until the system is frozen (T → 0). At high temperature, T can increase the probability of the label w being replaced by a new label w0 (because when ∆ > 0, large T indicates large exp(−∆/T ). At lower temperatures, only small increases of U are accepted. The main idea of Simulated Annealing is to introduce noise in the system to shake the search process away from the local minimum [65].
2.7. Markov random fields for image classification 24
Algorithm 2 Simulated Annealing algorithm.
1: Formulate energy function U
2: Initialize T (temperature)
3: Initialize the classification map L
4: repeat
5: for j = 1 to NumIterations do 6: for all pixels xi do
7: Randomly select a new label w0 from W
8: Compute the energy difference between w (current label of xi) and w0
9: ∆ = U (xi, w0) − U (xi, w)
10: Both energy functions are computed using Equation 2.20
11: if ∆ < 0 then
12: Replace w by w0 at the position of xi in L
13: else
14: if exp(−∆/T ) >= random[0, 1] then
15: Replace w by w0 at the position of xi in L
16: end if 17: end if 18: end for 19: end for 20: Decrease T 21: until T → 0
Chapter 3
Methodology
3.1
Introduction
In this chapter, we present the proposed methodology for superpixel-based remote sensing image classification. We start by over-segmenting the image into superpixels via the Sim- ple Linear Interactive Clustering (SLIC) algorithm. In order to characterize the generated superpixels, we propose a scheme for contextual superpixel description based on the bag of visual words (BOW) model. The experiments described in Section 4.2 show that the proposed contextual descriptors present better performance as compared to widely used region descriptors. The computation of the superpixel contextual descriptors is described in Section 3.2.
In order to create an effective classifier with a limited number of annotated samples, we use the active learning method Multi-Class Level Uncertainty (MCLU) described in Section 2.6. For the selection of the most informative samples, MCLU computes Equation 2.18 for all unlabeled samples. This process is slow for large datasets. To overcome this problem, we propose a supervised dataset reduction method, denoted as MCLU-Reduc that diminishes the number of unlabeled samples for active learning, reducing the time for sample selection in MCLU. The details of the proposed supervised dataset reduction method are described in Section 3.3. We execute the active learning process until a certain amount of samples is annotated. Then, with the final set of annotated samples, we train an SVM classifier to create an initial classification map.
Finally, we use the Markov Random Field (MRF) framework to further improve classi- fication accuracy by producing a relaxed classification map. The results of the experiments in Section 4.5 show that the relaxation process can improve the classification performance. The details of the proposed method for relaxation of the superpixel-based classification is described in Section 3.4.