
word lists all landmark co-occurrences that exist in the database with this particular word triplet combination. In this case, two exist, $z^4_{xy}$ and $z^5_{xy}$, and each has an associated weight $\omega_z = p(z \mapsto w \mid \tau_w)$

5.5 Parameter Learning

The discrete nature of the scene models presented in this chapter makes it tractable to learn a deep probabilistic model of landmark pairs, as a joint distribution across both appearance and geometry. Rather than treating the visual and geometry words assigned to a pair as independent, we propose to compute a full joint distribution across all three words in a triplet. In this way, a change in one word is modelled together with its knock-on effect on the other words. For example, if a particular illumination condition causes the visual word of one landmark to change, then we model the corresponding effect on the visual word of the other landmark in the pair. Furthermore, if the viewpoint on the scene changes such that the pair’s geometry word is affected, then the effect on the visual words due to the apparent change in scene appearance can be modelled.

Returning to Equation 5.9, we now consider the probability that an observation of landmark pair z is assigned to the word triplet τ. This triplet is composed of two visual words, πx and πy, together with a geometric word φxy, and we factorise the joint distribution as follows:

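$$
\begin{aligned}
p(\tau \mid T_z) &= p(\pi_x, \pi_y, \phi_{xy} \mid z) \\
&= p(\pi_x \mid z)\, p(\pi_y \mid \pi_x, z)\, p(\phi_{xy} \mid \pi_x, \pi_y, z)
\end{aligned}
\tag{5.12}
$$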

As such, y’s visual word is dependent on x’s visual word, and the geometric word is dependent on both these visual words.
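As a minimal sketch of the factorisation in Equation 5.12, the following Python estimates each factor by maximum likelihood from the word triplets observed for a single landmark pair. The function name `triplet_probability` and the representation of each observation as a tuple of integer word identifiers are assumptions made for illustration only.

```python
from collections import Counter


def triplet_probability(observations, triplet):
    """Maximum-likelihood p(pi_x, pi_y, phi_xy | z) via the chain rule.

    observations: list of (pi_x, pi_y, phi_xy) tuples seen for one landmark pair z.
    triplet:      the (pi_x, pi_y, phi_xy) tuple whose probability is wanted.
    Both representations are hypothetical, chosen for this sketch.
    """
    pi_x, pi_y, phi_xy = triplet
    count_x = Counter(obs[0] for obs in observations)
    count_xy = Counter((obs[0], obs[1]) for obs in observations)
    count_xyphi = Counter(observations)

    # An unseen conditioning event means the triplet was never observed.
    if count_x[pi_x] == 0 or count_xy[(pi_x, pi_y)] == 0:
        return 0.0

    p_x = count_x[pi_x] / len(observations)                       # p(pi_x | z)
    p_y_given_x = count_xy[(pi_x, pi_y)] / count_x[pi_x]          # p(pi_y | pi_x, z)
    p_phi_given_xy = count_xyphi[triplet] / count_xy[(pi_x, pi_y)]  # p(phi_xy | pi_x, pi_y, z)
    return p_x * p_y_given_x * p_phi_given_xy
```

With raw counts such as these, any triplet that was not observed exactly receives a probability of zero.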

[Figure 5.3 image. Column labels: "Visual word for landmark x", "Visual word for landmark y", "Geometric word for landmark co-occurrence z_xy", "Vote weight for landmark co-occurrence z_xy". Level labels: "First layer", "Second layer".]

Figure 5.3: The index structure used to search for instances of word triplets in the database.

effects of overfitting. We remedy this in a similar way to learning the generative model of landmark appearances in Chapter 4, based on the use of alternative words. For each geometric word, we now define a set of alternative geometric words which may also represent the landmark pair on a subsequent observation. These alternative words are defined as all the adjacent words in the dictionary across all four geometries, and their alternative word probabilities are set to equal values that sum to 1.
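The construction of alternative geometric words can be sketched as follows. This is an illustration under several assumptions that the text does not confirm: a geometric word is represented as a 4-tuple of quantisation bin indices (distance, angle, scale, orientation), "adjacent" means differing by at most one bin in each dimension, angular bins do not wrap around, and the word itself is included among its alternatives unless `include_self` is set to `False`. The function name is hypothetical.

```python
from itertools import product


def alternative_geometric_words(word, bins, include_self=True):
    """Return {alternative_word: probability} with equal probabilities summing to 1.

    word: 4-tuple of bin indices (assumed representation of a geometric word).
    bins: 4-tuple giving the number of bins in each geometry's dictionary.
    """
    # For each geometry, keep the neighbouring bin indices that exist in the dictionary.
    ranges = [
        [i for i in (index - 1, index, index + 1) if 0 <= i < size]
        for index, size in zip(word, bins)
    ]
    neighbours = [w for w in product(*ranges) if include_self or w != word]
    if not neighbours:
        return {}
    weight = 1.0 / len(neighbours)  # equal values, so the weights sum to 1
    return {w: weight for w in neighbours}
```

A narrower notion of adjacency, such as changing one geometry at a time, would only change how `ranges` are combined; the equal weighting would stay the same.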

Given a word triplet $\tau = \{\bar{\pi}_x, \bar{\pi}_y, \bar{\phi}_{xy}\}$, the smoothed distribution is then the product of the terms in Equation 5.12, but with each term reflected by the distribution of alternative words, rather than the maximum-likelihood distribution:

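$$
\begin{aligned}
p(\tau \mid T_z) &= p(\bar{\pi}_x, \bar{\pi}_y, \bar{\phi}_{xy} \mid T_z) \\
&= \sum_{\pi_x \in \Pi_x} p(\pi_x \mid \Pi_x)\, p(\bar{\pi}_x \mid \pi_x) \sum_{\pi_y \in \Pi_y} p(\pi_y \mid \pi_x)\, p(\bar{\pi}_y \mid \pi_y) \sum_{\phi_{xy} \in \Phi_{xy}} p(\phi_{xy} \mid \pi_x, \pi_y)\, p(\bar{\phi}_{xy} \mid \phi_{xy})
\end{aligned}
\tag{5.13}
$$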

Here, Πx is the observed set of visual words for landmark x, and similarly Πy for y. Φxy is then the observed set of geometry words for this landmark pair.
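The nested sums of Equation 5.13 can be written out directly. The sketch below assumes the maximum-likelihood terms are available as dictionaries and the alternative-word probabilities as callables; these data structures, and all names in the code, are hypothetical choices for illustration.

```python
def smoothed_triplet_probability(target, p_x, p_y_given_x, p_phi_given_xy,
                                 alt_visual, alt_geometric):
    """Evaluate Equation 5.13 for target = (bar_pi_x, bar_pi_y, bar_phi_xy).

    p_x:            {pi_x: p(pi_x | Pi_x)} over the observed visual words of x.
    p_y_given_x:    {pi_x: {pi_y: p(pi_y | pi_x)}}.
    p_phi_given_xy: {(pi_x, pi_y): {phi_xy: p(phi_xy | pi_x, pi_y)}}.
    alt_visual(bar, observed), alt_geometric(bar, observed):
        alternative-word probabilities p(bar | observed).
    """
    bar_x, bar_y, bar_phi = target
    total = 0.0
    for pi_x, prob_x in p_x.items():
        sum_over_y = 0.0
        for pi_y, prob_y in p_y_given_x.get(pi_x, {}).items():
            # Innermost sum over the observed geometric words of the pair.
            sum_over_phi = sum(
                prob_phi * alt_geometric(bar_phi, phi)
                for phi, prob_phi in p_phi_given_xy.get((pi_x, pi_y), {}).items()
            )
            sum_over_y += prob_y * alt_visual(bar_y, pi_y) * sum_over_phi
        total += prob_x * alt_visual(bar_x, pi_x) * sum_over_y
    return total
```

Because each inner sum depends on the words chosen in the outer sums, the loops cannot be evaluated independently and multiplied afterwards; the nesting mirrors the dependencies between the three words of the triplet.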

This smoothing is illustrated in Figure 5.4. In (a), landmark x is observed in a feature track of three features, with three different visual words. The dark shading shows the features’ visual words, whilst the light shading shows the corresponding alternative visual words. Similarly in (b), co-occurring landmark y is observed with three different features. Then, (c) shows the geometric word for the pair, together with associated alternative geometric words. The yellow words in each image are representative of the triplet $\tau = \{\bar{\pi}_x, \bar{\pi}_y, \bar{\phi}_{xy}\}$, and although they have not been observed directly, they are incorporated into the distribution due to the effects of the alternative words. Furthermore, as we progress from (a) to (c), the distribution on the right is not only dependent on the alternative words in that image, but also on the distribution of alternative words in the image to its left.

[Figure 5.4 image, three panels:]

(a) The distribution of visual words for three observations of landmark x.

(b) The distribution of visual words for three observations of landmark y.

(c) The distribution of geometric words for three observations of landmark co-occurrence z_xy.

Figure 5.4: Learning a deep probabilistic model of word triplets. The dark shade is the observed word, and the lighter shades are the associated alternative words. Computing the likelihood of the triplets represented by the yellow shades involves consideration of both the alternative words, and the dependencies between each word in the triplet.

5.6 Geometric Cliques for Global Consistency

Whilst the pairwise geometry embedded in the inverted index offers strong constraints on local configurations, there is as yet no enforcement of global geometric consistency. As such, each feature pair voting for a scene may individually be representative of a landmark pair observed in that scene, yet when the relationships between pairs are considered, the overall configuration may be incompatible. Consider Figure 5.5, which depicts three sets of pairwise matches, all of which agree locally by definition. When we consider the geometric relationships between each pair, we observe that the red and blue pairs are consistent with each other across the two images, because the relative distances, angles, scales and orientations between the red and blue features are similar in each image. However, the green pair is not consistent with either the red or blue pairs. In this example it disagrees in all of these geometries, although disagreement in any one of distance, angle, scale or orientation is sufficient to define incompatibility. The goal is now to eliminate pairwise matches that may be locally plausible, but are globally inconsistent when considered against all others. The proposed solution, which we denote the method of Geometric Cliques, is based on finding a maximum clique in an adjacency matrix, whose elements indicate compatibility