6.1.2 The Global Configuration Coefficient, $G_c$
This section continues the discussion of the previous section by formulating a novel measure of similarity known as the Global Configuration Coefficient, $G_c$.
This similarity metric addresses the shortcomings of using the number of matches alone by incorporating the global landmark/keypoint configuration information into its design. This configuration information exploits the previously introduced concepts of rank correlation measures. Section 3.3.4 highlighted the usefulness of an ordinal scale for viewpoint-invariant scene recognition: as long as the viewpoint change is not too extreme, the spatial configuration of the matched keypoints is preserved. Using rank correlations of the ordinal positions of the matched keypoints (section 3.3.2), a measure of similarity sensitive to the rank orders of the spatial configuration is introduced.
Suppose two scenes are presented, represented by their individual Scene matrix cells ($M_{1s}$, $M_{2s}$). The first step is to match the salient-SURF keypoints separately in each of the three colour spaces (H, S, V).
6.1 A novel scene similarity metric 131
The initial matches are then validated by RANSAC so as to remove any erroneous matches that do not respect the epipolar constraint (section 5.2.2):
$$\dot{m}^{j}_{kp} = m^{j}_{1s} \leftrightarrow m^{j}_{2s}, \quad j \in \{H, S, V\} \tag{6.1}$$
where the scene matrices of the first and second scenes from the $j$th colour space are denoted by ($m^{j}_{1s}$, $m^{j}_{2s}$) respectively. The symbol $\leftrightarrow$ denotes the salient-SURF matching procedure together with verification by RANSAC. Next, the matched keypoints in the three colour spaces, denoted as $\dot{m}^{j}_{kp}$ for the $j$th colour space, are grouped together to form the Matching matrix, $\dot{M}_{kp}$:
$$\dot{M}_{kp} = \begin{bmatrix} \dot{m}^{H}_{kp} \\ \dot{m}^{S}_{kp} \\ \dot{m}^{V}_{kp} \end{bmatrix} \tag{6.2}$$
Grouping the matches together into one matrix loses all information concerning the colour space from which the keypoints originate. This is justifiable as the main reason for separating the keypoints and scene matrices into separate cells is to prevent mismatches of the keypoints across incompatible colour spaces (section 5.1.1). Since no more matching is required after this step, combining the matches together into $\dot{M}_{kp}$ simplifies the implementation of the proposed SRS significantly.
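The grouping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_matching_matrix`, the dictionary keyed by colour space, and the sample coordinates are all hypothetical, and the per-colour matches are assumed to be already RANSAC-verified rows of the form [x1, y1, d1prox, x2, y2, d2prox] that are stacked row-wise.

```python
def build_matching_matrix(matches_by_channel):
    """Stack verified matches from the H, S and V colour spaces into one list of rows.

    Each row is assumed to be [x1, y1, d1prox, x2, y2, d2prox]. The colour
    space a row came from is deliberately discarded, as described in the text.
    """
    rows = []
    for channel in ("H", "S", "V"):
        rows.extend(matches_by_channel.get(channel, []))
    return rows


# Hypothetical matches, for illustration only.
matches = {
    "H": [[10, 20, 1.5, 12, 21, 1.4], [30, 40, 2.0, 33, 41, 2.1]],
    "S": [[50, 60, 3.0, 52, 63, 2.9]],
    "V": [],
}
matching_matrix = build_matching_matrix(matches)
n_match = len(matching_matrix)  # total number of matches across colour spaces
```

Because the colour-space label is dropped at this point, every later step can treat the result as a single table of matched positions.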
The structure of $\dot{M}_{kp}$ is defined as follows. The Matching matrix is an $N_{match} \times 6$ matrix, where $N_{match}$ denotes the number of matches between two scenes, with the following structure:
$$\dot{M}_{kp} = [\, x_1 \;\; y_1 \;\; d_{1\mathrm{prox}} \;\; x_2 \;\; y_2 \;\; d_{2\mathrm{prox}} \,]$$
The matrix retains only the localisation information of the matched keypoints of the two scenes, denoted by the subscripts (1, 2) respectively.
Next, a novel similarity metric known as the Global Configuration Coefficient, $G_c$, is defined with $\dot{M}_{kp}$ as the input:
$$G_c(\dot{M}_{kp}) = \frac{N_{\%test}}{200} \times (S_\rho + K_\tau) \tag{6.3}$$
where $N_{\%test}$ is the percentage of matches with respect to the test (first) scene, given by:
$$N_{\%test} = \frac{N_{match}}{N_{1d}} \times 100 \tag{6.4}$$
and $N_{1d}$ denotes the original number of salient-SURF keypoints in the test scene. ($S_\rho$, $K_\tau$) are the means of the positive Spearman's $\rho$ (3.1) and Kendall's $\tau$ (3.2) rank correlations in the three spatial ($x$, $y$, $z$) directions (see Fig. 3.14 for an illustration), given as:
$$S_\rho = \frac{1}{3} \sum_{i} S^{i}_{\rho}, \qquad K_\tau = \frac{1}{3} \sum_{i} K^{i}_{\tau}, \qquad i \in \{x, y, z\} \tag{6.5}$$
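where Spearman's $\rho$ and Kendall's $\tau$ of a particular direction are denoted as ($S^{i}_{\rho}$, $K^{i}_{\tau}$), $i \in \{x, y, z\}$. The rank correlations are computed from the elements of the Matching matrix, $\dot{M}_{kp}$ (definition 6.11), given as:
$$\begin{aligned} S^{x}_{\rho} &= S_\rho(\dot{M}_{kp}(x_1), \dot{M}_{kp}(x_2)) \\ S^{y}_{\rho} &= S_\rho(\dot{M}_{kp}(y_1), \dot{M}_{kp}(y_2)) \\ S^{z}_{\rho} &= S_\rho(\dot{M}_{kp}(d_{1\mathrm{prox}}), \dot{M}_{kp}(d_{2\mathrm{prox}})) \end{aligned} \tag{6.6}$$
and
$$\begin{aligned} K^{x}_{\tau} &= K_\tau(\dot{M}_{kp}(x_1), \dot{M}_{kp}(x_2)) \\ K^{y}_{\tau} &= K_\tau(\dot{M}_{kp}(y_1), \dot{M}_{kp}(y_2)) \\ K^{z}_{\tau} &= K_\tau(\dot{M}_{kp}(d_{1\mathrm{prox}}), \dot{M}_{kp}(d_{2\mathrm{prox}})) \end{aligned} \tag{6.7}$$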
Using the mean values penalises the overall rank correlation whenever one (or more) of the spatial directions does not preserve the ordering of the matched keypoints. This is usually the case when the scenes are dissimilar. Although mismatches occur in all cases, the degradation in the rank correlations is expected to be less pronounced when two scenes are similar.
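Equations (6.3) to (6.7) can be sketched in plain Python as follows. Several choices here are assumptions rather than the definitions given in (3.1) and (3.2): Spearman's ρ is computed as the Pearson correlation of average ranks, Kendall's τ is the tau-a variant, and the "positive" rank correlations are interpreted as clipping negative values to zero. All function names are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman_rho(a, b):
    """Pearson correlation of the ranks (assumed form of Spearman's rho)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def kendall_tau(a, b):
    """Tau-a (assumed variant): (concordant - discordant) / number of pairs."""
    n = len(a)
    if n < 2:
        return 0.0
    score = 0
    for i, j in combinations(range(n), 2):
        product = (a[i] - a[j]) * (b[i] - b[j])
        score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


def global_configuration_coefficient(matching_matrix, n_test_keypoints):
    """G_c per (6.3); rows are [x1, y1, d1prox, x2, y2, d2prox]."""
    if not matching_matrix or n_test_keypoints <= 0:
        return 0.0
    cols = list(zip(*matching_matrix))
    n_pct_test = 100.0 * len(matching_matrix) / n_test_keypoints  # (6.4)
    directions = [(cols[0], cols[3]), (cols[1], cols[4]), (cols[2], cols[5])]
    # Assumption: negative correlations are clipped to zero before averaging.
    s_rho = sum(max(0.0, spearman_rho(a, b)) for a, b in directions) / 3  # (6.5)
    k_tau = sum(max(0.0, kendall_tau(a, b)) for a, b in directions) / 3   # (6.5)
    return n_pct_test / 200.0 * (s_rho + k_tau)


# Hypothetical matches whose ordering is preserved in x, y and z.
perfect = [
    [1, 5, 2.0, 11, 15, 12.0],
    [2, 7, 4.0, 12, 17, 14.0],
    [3, 6, 1.0, 13, 16, 11.0],
    [4, 9, 3.0, 14, 19, 13.0],
]
# Same matches with the x ordering reversed in the second scene.
flipped_x = [[r[0], r[1], r[2], -r[3], r[4], r[5]] for r in perfect]
```

Comparing `global_configuration_coefficient(perfect, 4)` with `global_configuration_coefficient(flipped_x, 4)` shows the penalising effect of the mean: losing the ordering in a single direction removes one third of both averaged correlations.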
The formulation of $G_c$ (6.3) combines both the local keypoint similarity and the global configuration of the matched keypoints into a single measure of similarity. The local similarity is indirectly captured by $N_{\%test}$, which measures the percentage of the keypoint matches stored in $\dot{M}_{kp}$. The global configuration of the matched keypoints is captured in the mean rank correlations, $S_\rho$ and $K_\tau$. $G_c$ is thus close to 1 for a perfect match with a high $N_{\%test}$ that preserves the overall spatial configuration. For dissimilar scenes, $G_c$ is close to zero: there are very few matches (small $N_{\%test}$), and the rank correlations of the mismatched keypoints are also likely to be small because the spatial configuration is not preserved.
The incorporation of $N_{\%test}$ is important as the usefulness of rank correlations is highly dependent on the number of matched keypoints. This is because a small number of matches is often not statistically significant for the computed rank correlations to be useful. For example, if only three keypoints are matched in the test scene, it is very likely that the three keypoints will have a similar configuration with many scenes in the reference database. One can thus view the formulation of (6.3) as weighting the confidence of the rank correlations by $N_{\%test}$.
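The weighting effect can be made concrete with a short worked calculation based on (6.3) and (6.4). The keypoint counts and correlation values below are hypothetical and chosen only for illustration.

```latex
% Case A (hypothetical): 3 of 150 test keypoints matched,
% ordering perfectly preserved, so S_rho = K_tau = 1.
N_{\%test} = \frac{3}{150} \times 100 = 2, \qquad
G_c = \frac{2}{200} \times (1 + 1) = 0.02

% Case B (hypothetical): 90 of 150 test keypoints matched,
% ordering only partly preserved, with S_rho = 0.8 and K_tau = 0.7.
N_{\%test} = \frac{90}{150} \times 100 = 60, \qquad
G_c = \frac{60}{200} \times (0.8 + 0.7) = 0.45
```

Although the three matches in Case A are perfectly ordered, their rank correlations carry little evidence, and the small $N_{\%test}$ keeps $G_c$ far below that of Case B.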
In practice, the effects of wrong correspondences and occlusions due to image distortions often degrade $G_c$ significantly (between 0.3 and 0.4), even for positive scenes. This degradation is, nonetheless, usually more pronounced in negative scenes, and it worsens as the amount of image distortion increases. This means that using a fixed threshold for the scene decision is not feasible in practice. An adaptive threshold, estimated from (6.3) and (6.5), is presented in section 6.2.2 as a reliable alternative.