4.1 Problem Statement
4.2.2 Finding the Correspondence Set
4.2.2.2 Graph Cut Methods
Graph cut methods are inspired by the discovery that energy minimisation (or the optimisation of some objective function) can be performed efficiently by modelling the problem as a graph of vertices and edges. Typically the pixels in the image are represented by vertices in the graph, with neighbouring pixels connected by edges called neighbour links. Special terminal vertices, which represent pixel labels, are also added to the graph, and these are connected to the pixel vertices with special edges called terminal links.
Each edge has a weight associated with it that is set by the energy function. When the problem is correctly modelled as a graph, combinatorial optimisation theory shows that the optimal labelling, that is, the one that minimises the energy function, is represented by the minimal graph cut [112]. The minimal graph cut is a partition of the graph for which the sum of the weights of the edges severed by the partition is minimised [113]. An example of a pixel labelling problem modelled as a graph is shown in Figure 4.3.
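As a minimal sketch of this construction, the Python fragment below builds the graph for a row of four pixels and finds its minimal cut. Several things here are assumptions made for illustration only: the problem is restricted to two labels (so that the source terminal stands for label 0 and the sink terminal for label 1), the data costs and the n-link weight `K` are hypothetical numbers, and the Edmonds-Karp max-flow routine is a simple stand-in rather than the method used in [112].

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow. Returns (cut value, vertices on the source side)."""
    # Residual graph: copy the capacities and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def reachable_from_source():
        # Breadth-first search along edges with spare residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            # No augmenting path left: the reached vertices form the source side.
            return flow, set(parent)
        # Find the bottleneck capacity along the path, then push flow along it.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical data costs for four pixels in a row: (cost of label 0, cost of label 1).
data_cost = [(1, 9), (2, 8), (8, 3), (9, 1)]
K = 2  # hypothetical weight of every n-link

capacity = {"s": {}, "t": {}}
for p, (cost0, cost1) in enumerate(data_cost):
    capacity["s"][p] = cost1     # t-link severed when p takes label 1 (sink side)
    capacity[p] = {"t": cost0}   # t-link severed when p takes label 0 (source side)
for p in range(len(data_cost) - 1):
    capacity[p][p + 1] = K       # n-links between adjacent pixels, both directions
    capacity[p + 1][p] = K

cut_value, source_side = min_cut(capacity, "s", "t")
labels = [0 if p in source_side else 1 for p in range(len(data_cost))]
print(cut_value, labels)
```

In this two-label setting the value of the minimal cut equals the energy of the labelling it induces: each severed t-link contributes a data cost and each severed n-link contributes one smoothness penalty.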
Graph cut minimisation has been used successfully in image segmentation and clustering applications and has been shown to be more efficient than most state-of-the-art methods in these applications [114, 115]. The visual correspondence problem is also essentially a pixel labelling problem that can be solved through energy minimisation.
Figure 4.3: In this diagram an image labelling problem is represented as a graph. The square vertices are pixels in the image and the circular vertices are image labels. All the image labels are connected to all the pixel vertices with terminal links (t-links) (some links are omitted for clarity) and pixels are connected to their neighbours with neighbour links (n-links). The partitions pk to Pk result from a single graph cut that associates pixels with labels and represent a specific pixel labelling. This image was taken from [112].
4.2 Overview of Current Approaches
input:  C_RA = the set of matched features (observations)
input:  M = a camera motion model that can be fitted to matches
input:  n = the minimum size of an inlier set
input:  G_max = the total number of iterations
input:  t = threshold value to decide when a match fits the model
output: M_best = the overall best model
output: S_best = the best inlier set
output: e_best = the fitting error from the best model

i = 0
M_best = {}
S_best = {}
e_best = ∞
while i < G_max do
    S_inliers = n random matches
    M_inliers = the model fitted to S_inliers
    S_consensus = S_inliers
    foreach match ∈ C_RA and ∉ S_inliers do
        if match fits M_inliers with error < t then
            add match to S_consensus
        end
    end
    if |S_consensus| > |S_best| then   // this is a good model
        S_best = S_consensus
    end
    i = i + 1
end
M_best = model fitted to all matches in S_best
e_best = fitting error from M_best
RETURN M_best, S_best, e_best
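The listing above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the camera motion model is replaced by a hypothetical 2D translation between matched points, the fitting error is taken to be the mean residual distance, and the synthetic matches, the function names and the parameter values are all invented for the example.

```python
import math
import random


def fit_translation(matches):
    """Least-squares 2D translation: the mean displacement of the matches."""
    count = len(matches)
    tx = sum(q[0] - p[0] for p, q in matches) / count
    ty = sum(q[1] - p[1] for p, q in matches) / count
    return tx, ty


def match_error(model, match):
    """Distance between the predicted and the observed position of a match."""
    (px, py), (qx, qy) = match
    return math.hypot(px + model[0] - qx, py + model[1] - qy)


def ransac(matches, n, g_max, t, rng):
    best_set = []
    for _ in range(g_max):
        # Draw n random matches and fit a candidate model to them.
        sample = set(rng.sample(range(len(matches)), n))
        model = fit_translation([matches[i] for i in sample])
        # Consensus set: the sample plus every other match that fits the model.
        consensus = [m for i, m in enumerate(matches)
                     if i in sample or match_error(model, m) < t]
        if len(consensus) > len(best_set):
            best_set = consensus
    # Refit to the whole best consensus set and report its mean residual.
    best_model = fit_translation(best_set)
    best_error = sum(match_error(best_model, m) for m in best_set) / len(best_set)
    return best_model, best_set, best_error


# Synthetic data (hypothetical): 20 matches that follow a translation of
# (5, -3) with small noise, plus 8 unrelated outlier matches.
rng = random.Random(0)
matches = []
for _ in range(20):
    p = (rng.uniform(0, 100), rng.uniform(0, 100))
    q = (p[0] + 5.0 + rng.uniform(-0.2, 0.2), p[1] - 3.0 + rng.uniform(-0.2, 0.2))
    matches.append((p, q))
for _ in range(8):
    p = (rng.uniform(0, 100), rng.uniform(0, 100))
    q = (rng.uniform(0, 100), rng.uniform(0, 100))
    matches.append((p, q))

best_model, best_set, best_error = ransac(matches, n=2, g_max=50, t=1.0, rng=rng)
print(best_model, len(best_set), best_error)
```

As in the listing, the final model is fitted only once, to the largest consensus set found, rather than being refitted inside the loop.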
In traditional energy minimisation the aim is to minimise an energy function of the following form:
$$E(C_{RA}) = K \cdot E_{\mathrm{smooth}}(C_{RA}) + E_{\mathrm{data}}(C_{RA}), \tag{4.2}$$
where $C_{RA}$ is a correspondence set between the features of the reference image and the alternate image (see Figure 4.1). The first term is a smoothness term that penalises solutions that violate spatial smoothness, while the second term penalises solutions that are inconsistent with the observed data. $K$ is a weighting constant that sets the relative importance of the two terms. When searching for a correspondence set, the data term is typically built around a feature matching measure, and the SSD measure introduced in Equation (4.1) is often used. The smoothness term is more difficult to choose and many variations have been proposed [112].
A smoothness term that is often used in graph cut minimisation is the Potts model. The Potts model preserves discontinuities in the image and has various other features that make it preferable for use in graph cut minimisation [112]. The Potts model for the smoothness term is defined as
$$E_{\mathrm{smooth}}(C_{RA}) = \sum_{p,q \in N} T(f_p \neq f_q), \tag{4.3}$$
where $f_p$ denotes the label assigned to pixel $p$ and $T(\cdot)$ is 1 if its argument is true and 0 otherwise. The term is only calculated for pixel pairs in $N$, the set of interacting pixels, which typically consists of adjacent pixels.
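To make the two energy terms concrete, the following Python sketch evaluates Equation (4.3) and the total energy of Equation (4.2) for one labelling of a small image. The 2 x 3 image, its 4-connected neighbourhood, the table of data costs and the value of `K` are hypothetical, and the sketch assumes that each neighbouring pair is counted once.

```python
def potts_smoothness(labels, neighbours):
    """Equation (4.3): count the neighbouring pairs whose labels differ."""
    return sum(1 for p, q in neighbours if labels[p] != labels[q])


def total_energy(labels, neighbours, data_cost, K):
    """Equation (4.2): K * E_smooth + E_data for a given labelling."""
    e_data = sum(data_cost[p][label] for p, label in enumerate(labels))
    return K * potts_smoothness(labels, neighbours) + e_data


# A 2 x 3 image with pixels indexed row by row:
#   0 1 2
#   3 4 5
# Hypothetical 4-connected neighbourhood, each pair listed once.
neighbours = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]

# Hypothetical data costs: data_cost[p][l] is the cost of giving pixel p label l.
data_cost = [[0, 5], [1, 4], [4, 1], [0, 5], [3, 2], [5, 0]]

labels = [0, 0, 1, 0, 1, 1]
e_smooth = potts_smoothness(labels, neighbours)
energy = total_energy(labels, neighbours, data_cost, K=2)
print(e_smooth, energy)
```

For this labelling three neighbouring pairs carry different labels, so the smoothness term is 3. A constant labelling would score 0, which is why the Potts model favours regions of equal labels.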
The Potts model encourages solutions consisting of regions where pixels in the same region have equal labels. This type of term is sometimes informally called a piecewise constant model [112]. The Potts model is especially useful when the labels are unordered and the number of labels is small. It has been shown that the Potts model based energy minimisation problem can be directly reduced to the multiway graph cut problem [112]. It can therefore be proven that the global minimum for a Potts model based energy function can be computed by finding the minimum cost graph cut on an appropriately constructed graph.
Therefore, one can solve the visual correspondence problem by first modelling it as an appropriately constructed graph and then using known combinatorial optimisation