
Matching of Large Images Through Coupled Decomposition

Panagiotis Sidiropoulos, Member, IEEE, and Jan-Peter Muller, Member, IEEE

Abstract— In this paper, we address the problem of fast and accurate extraction of points that correspond to the same location (named tie-points) from pairs of large-sized images. First, we conduct a theoretical analysis of the performance of the full-image matching approach, demonstrating its limitations when applied to large images. Subsequently, we introduce a novel technique to impose spatial constraints on the matching process without employing subsampled versions of the reference and the target image, which we name coupled image decomposition. This technique splits images into corresponding subimages through a process that is theoretically invariant to geometric transformations, additive noise, and global radiometric differences, as well as being robust to local changes. After presenting it, we demonstrate how coupled image decomposition can be used both for image registration and for automatic estimation of epipolar geometry. Finally, coupled image decomposition is tested on a data set consisting of several planetary images of different sizes, varying from less than one megapixel to several hundreds of megapixels. The reported experimental results, which include comparisons with full-image matching and state-of-the-art techniques, demonstrate the substantial computational cost reduction that can be achieved when matching large images through coupled decomposition, without at the same time compromising the overall matching accuracy.

Index Terms— Image matching, high-resolution imaging, image registration, image decomposition.

I. INTRODUCTION

Over the last few years, smartphones and digital cameras have made tens-of-megapixel images available to the general public, while new domains, such as high-resolution planetary mapping, have resulted in the release of images that reach up to several gigapixels (see [1]). As expected, the size increase raises scalability and computational complexity issues that make imperative the re-design of a number of image processing approaches.

Image matching is an essential step in multiple image processing applications, including image registration, image retrieval, stereo and 3D reconstruction. When dealing with large images, feature-based matching techniques are typically used. The latter employ extracted features (either in the form of corresponding tie-points [?], [?], [2], [3] or characteristic image curves [4], [5]) and perform image matching using characteristics of these features, thus limiting the computational cost required to accomplish this task.

Manuscript received August 12, 2014; revised December 15, 2014; accepted February 24, 2015. Date of publication March 4, 2015; date of current version April 6, 2015. This work was supported in part by the Science and Technology Facilities Council through the Mullard Space Science Laboratory under Grant ST/K000977/1 and in part by the European Union's Seventh Framework Programme (FP7/2007-2013) through the IBM Multimedia Analysis and Retrieval System under Grant 607379. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Quanzheng Li.

The authors are with the Mullard Space Science Laboratory, University College London, London WC1E 6BT, U.K. (e-mail: p.sidiropoulos@ucl.ac.uk; j.muller@ucl.ac.uk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2015.2409978

The number of features is expected to increase with the image size; thus, even the feature-based matching of large images becomes a computationally cumbersome task. State-of-the-art techniques designed to improve image matching speed are mostly focused on the orthogonal issue, i.e. handling large datasets of "normal-sized" images [8]–[10]. All these techniques perform full-image matching, i.e. they do not impose any spatial constraints on the matching process. On the contrary, they ignore the actual image areas from which the tie-points are extracted, even though these could guide the matching process by defining the range and size of the tie-point neighbourhood, thus reducing the time required for matching large images.

When dealing with large images, the most common way to circumvent the substantial computational cost is through pyramidal schemes. These are techniques that employ a number of sub-sampled versions of the reference and the target image to propagate matching results from the coarsest to the finest resolution. However, pyramidal schemes are fully dependent on the successful matching of all intermediate resolution versions, which is a non-trivial task for a large range of images (e.g. images of low contrast, images with a small number of distinctive features, etc.). Any mistakes made at one level of the pyramid are greatly amplified at the finest resolution level.

Additional challenges arise when images have partially changed. For example, when dealing with planetary images of the same area acquired at different times, natural processes such as dust deposition or geological phenomena imply that the images may not be perfectly co-aligned. In such applications, it is desirable to have a sub-image ranking according to the matching quality that was achieved. In this way, a measure is attained that enables discrimination between unchanged and potentially-changed areas. The latter may subsequently either be discarded from the matching output or become the input to a semantic analysis stage aiming to recognise the change (e.g. this may signify some unidentified dynamic natural process [9], [10]).

In order to address the aforementioned issues, we present a novel adaptive image manipulation technique, which we name "coupled image decomposition". This is a two-step, iterative, global-to-local algorithm. At each iteration, the corresponding images (or sub-images) are initially coupled, before a concurrent image decomposition process generates multiple corresponding sub-images.


Fig. 1. Flowchart of the image matching pipeline from image input until tie-point extraction. The solid lines in the flowchart correspond to the typical processing used in image matching. The dashed lines show the newly introduced parts of the flowchart.

The output of the coupled decomposition algorithm defines an image neighbourhood for each feature in the first image and restricts matching to the corresponding neighbourhood in the other image. As a result, image matching is performed at the sub-image level, i.e. on pairs of sub-images that are matched independently from each other, thus significantly reducing the required computational time. Moreover, as will be shown both theoretically and experimentally, this approach achieves an increase in the number of correctly identified tie-points, which can be used to enhance the overall matching accuracy. Finally, on top of the coupled decomposition algorithm, we introduce a technique that exploits the information redundancy of sub-image matching to improve the automatic estimation of the epipolar geometry as well as to assess the sub-image matching quality.

The rest of the paper is organized as follows: related work is presented in Section II, followed by a theoretical analysis of the shortcomings of full-image matching of large images in Section III. Subsequently, the coupled decomposition method is presented in Section IV. In Section V, image matching and fundamental matrix estimation using coupled decomposition are introduced, while experimental validation is performed in Section VI. Section VII concludes the work.

II. RELATED WORK

A. Tie-Point Matching

A standard tie-point matching pipeline for two input images, $W$ and $Z$ [11], consists of two independently executed parts, i.e. the extraction of descriptive points $p$, $q$ from $W$ and $Z$ respectively, and their subsequent matching (solid-line flowchart in Fig. 1). The latter can be considered as the definition of a function $f$ from the set $S_p$ ($p \in S_p$) to the set $S_q \cup \{\emptyset\}$ ($q \in S_q$) [6], based exclusively on local descriptors estimated at each point of $S_p$ and $S_q$. Note that apart from the subset of $S_p$ that is mapped to $\emptyset$, which signifies points of $S_p$ that do not have a match in $S_q$, this function is bijective (i.e. one-to-one).

The most commonly used local descriptor in image matching tasks is the Scale-Invariant Feature Transform (SIFT) [11], which is a 128-dimensional local orientation histogram. Popular recent SIFT extensions and variations, such as SURF [12], GLOH [13] and DAISY [2], have been found to suffer from accuracy, compactness or speed loss, respectively, when compared with SIFT. As a matter of fact, there is evidence that SURF is not as accurate as SIFT in image matching applications [2], while GLOH and DAISY are slower than SIFT [13], [14], the latter being also less compact since it employs 200-dimensional feature vectors.

No matter which descriptor is employed, when dealing with either large datasets of small images or very large images, the most time-consuming stage of the pipeline is point matching. Indeed, the computational cost of a brute-force search (i.e. one that examines all $(p, q) \in S_p \times S_q$) may become prohibitive, while the use of efficient algorithms such as kd-tree search [15] and ball-tree search [16] is undermined by the high SIFT dimensionality [11], due to the curse of dimensionality [17].

The approaches developed for reducing the computational cost of point matching are designed either for applications in which the target image is compared to a large number of reference images or for applications in which the target image and the (single) reference image are exceptionally large. The former approaches typically perform matching through fast approximate nearest neighbour algorithms, such as priority search [18]. While recent variations limit the approximation error either by employing multiple kd-trees having a constant angle offset between them [7] or through a bi-directional image matching scheme [8], such techniques have non-trivial approximation errors, which may or may not be tolerable depending on the application.

On the other hand, matching of large images usually involves some sort of pyramidal scheme in order to reduce the computational cost. Pyramidal schemes employ a set of sub-sampled versions of the reference and the target image. Matching is performed by propagating results from the coarsest to the finest level, in order to limit the computational time. In [19] the full images are matched only at the coarsest resolution, with the results being propagated and updated at progressively finer resolutions. In a recent pyramidal technique, which was introduced in [20], coarse level registration results are used to decompose the images into 4 corresponding sub-images (using the center as the splitting point), which are subsequently independently matched at a finer level. This iterative algorithm starts from small sub-sampled images (128×128 or 256×256 pixels) and doubles the resolution at each step, until reaching the actual image dimensions. Finally, a similar approach was proposed in [21]. Matching is performed at progressively finer resolutions, while the matchings that are established at the coarsest levels are used to decompose the image planes into corresponding sub-images through Voronoi tessellation.

Pyramidal approaches follow the assumption that both the images and the image descriptors are robust to extreme scale differences (which in the case of [20] may reach up to $10^7$). Unfortunately, this is not generally true, since theoretical scale invariance can be achieved only when the image resolution satisfies the Nyquist sampling criterion [22]. For a large range of images, including the majority of remote sensing products, this criterion is far from being satisfied, since it would require spatial frequencies much coarser than the actual ones. Additionally, this approach generates a sparser set of tie-points than matching at full resolution, since tie-points are typically extracted from extrema (e.g. corners) and the generation of coarse resolution versions is equivalent to image blurring. As a result, pyramidal schemes often fail to approximate the correct registration at the coarsest levels, an error that is propagated to the finer ones, thus contaminating the final matching outcome.

However, a technique that decomposes the reference and the target image into corresponding sub-images would be very useful not only for alleviating the overall computational cost but also for controlling the matching false acceptance and false rejection rates. This becomes apparent from the fact that, typically, a match is declared if the ratio of the distance of a query point $p$ from its second nearest neighbour $q_2$ to the distance from its nearest neighbour $q_1$ ($p \in S_p$, $q_1, q_2 \in S_q$) is above a threshold. This approach raises scalability issues, since the ratio of the second-nearest-neighbour distance to the nearest-neighbour distance depends on the size of $S_q$. When $S_q$ is large enough and point $p$ matches $q_1$, there is a high probability that, by chance, $q_2$ is so similar to $p$ that the match will be erroneously omitted. On the contrary, when the set $S_q$ is small and point $p$ does not match $q_1$, it is possible that $q_2$ will be less similar than is required to correctly discard the match. Consequently, an increased false rejection rate is expected when the size of $S_q$ is large, while an increased false acceptance rate occurs when the size of $S_q$ is rather small. A global ratio threshold may bias the algorithm towards achieving a low false acceptance ratio even when the false rejection ratio is of major importance, e.g. when dealing with textureless or low-contrast images. This is especially the case when RANSAC [23] (or an analogous algorithm) can successfully prune erroneous matches in a post-processing step, where the false rejection-false acceptance trade-off must be carefully examined so as to limit the number of correct matches that are missed (errors of omission).
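For concreteness, here is a minimal sketch of this ratio test (our own illustration in Python/NumPy, with hypothetical names; $H$ plays the role of the threshold above and is assumed greater than 1):

```python
import numpy as np

def nndr_match(desc_p, desc_q, H=1.5):
    """NNDR test: declare a match for p only if the distance to its second
    nearest neighbour exceeds H times the distance to its nearest one.
    Assumes desc_q holds at least two descriptors."""
    matches = []
    for i, d in enumerate(desc_p):
        dist = np.linalg.norm(desc_q - d, axis=1)  # distances to all candidates
        j1, j2 = np.argsort(dist)[:2]              # nearest, second nearest
        if dist[j2] > H * dist[j1]:
            matches.append((i, j1))                # accept p_i <-> q_j1
    return matches
```

The scalability issue described above is visible directly in the code: the larger `desc_q` is, the more likely a spurious near neighbour makes `dist[j2]` small enough to veto a correct match.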

B. Matching of Remote Sensing Spacecraft Images

Remotely sensed images of large areas of the Earth and planetary surfaces constitute an image sub-class with several special characteristics, which render their matching a very challenging task. Firstly, matching accuracy is of fundamental importance: any matching error is likely to result in distortions being reprojected into geocoded images and/or maps, which are going to be used in tasks that require high accuracy. For example, images taken by Mars orbiters, with a resolution on the order of metres per pixel, have been extensively used as part of the selection of landing sites for Mars lander and rover missions [24], [25]. Any error in the employed maps could jeopardize the mission outcome. As a result, approaches that might compromise accuracy are avoided as far as possible.

On the other hand, a typical remote sensing image is on the order of tens-to-hundreds of megapixels, and can reach up to 10 gigapixels (e.g. the 25 cm high-resolution Mars images from the HiRISE camera [1]). Thus, brute-force tie-point matching is very likely to result in a prohibitive computational cost. The combined requirements for high accuracy and reliability with a realistic computational time lead to the manual decomposition of the input images into corresponding, non-overlapping sub-images. However, such a task is cumbersome, unscalable and subject to human errors. Alternatively, a pyramidal scheme may be employed (see [19]).

Fig. 2. An example of corresponding image pairs from Martian imagery.

However, pyramidal schemes induce severe performance degradations when dealing with textureless areas, which are very common in planetary data, since the scenery often involves extended flat or dusty areas such as valleys, with a small number of distinctive features (Fig. 2).

Moreover, remote sensing images arise from sampling of a 2D projection of the 3D land surface. As a result, the remote sensing image is independent of the sampling frequency of the input image (i.e. the pixel resolution) only if the latter satisfies the Nyquist sampling criterion [22]. In a typical remote sensing scenario, this implies very fine pixel resolution, i.e. on the order of millimetres. Such a resolution is not currently available. Consequently, in this type of image, a scaling variance due to aliasing is to be expected.

Image matching of remote sensing images, especially planetary images of bodies with atmospheres, also has to deal with the fact that the planet's surface is exposed to unpredictable local changes, due to weather or dust/aerosol/pollution phenomena [26], seasonal processes [9], clouds, possible impacts by large meteorites which do not burn up in the thin atmosphere, etc. These are often impossible to model due to the currently limited understanding of the natural mechanisms that trigger them. Clearly, it would be desirable to identify such local changes, not only to discard them from the image matching process but also because they may signify natural processes whose nature and extent are unknown.

Finally, remote sensing images are commonly acquired using pushbroom cameras, i.e. line cameras, mounted perpendicular to the spacecraft flight direction, which scan distinct surface areas as the spacecraft flies forward [27]. It has been shown that images from pushbroom cameras only partially satisfy typical linear epipolar constraints, i.e. the epipolar curves may not be considered linear but only partially linear [27]. This property provides additional merit for an algorithm that automatically decomposes image pairs into corresponding sub-images, such as the one described in detail in the next section.

III. FULL-IMAGE MATCHING OF LARGE IMAGES

Full-image matching of large images is associated with an increased computational cost, since image matching by default has quadratic computational complexity. In this section, we show that full-image matching of large images is also associated with degraded accuracy, which cannot be improved through parameter tuning.

This analysis derives from the fact that matches are typically identified through the nearest neighbour distance ratio (NNDR), i.e. the ratio of the distance to the nearest neighbour over the distance to the second nearest neighbour. For example, in the original SIFT publication [11], a point in the reference image was declared matched if the NNDR was less than a threshold $H = 0.8$, while it was further claimed that the exact threshold value is not critical to the matching performance.

When matching two sets of feature points coming respectively from the reference and the target image, it is fair to assume that the distances between a feature point in the reference image that does not have any match in the target image and all $N_W$ feature points in the target image are random, i.e. drawn from a distribution $f_D(d)$ (the corresponding cdf is $F_D(d)$). Then the joint distribution of the distance to the nearest neighbour $d_1$ and to the second nearest neighbour $d_2$ is [28]:

$$f_{D_1,D_2}(d_1,d_2) = N_W(N_W-1)\,F_{D_1}^{N_W-2}(d_1)\,f_{D_1}(d_1)\,f_{D_2}(d_2) \quad (1)$$

where $d_1 \le d_2$. The probability $P$ of correct rejection is the probability that $d_2 \le H d_1$, i.e.,

$$P = N_W(N_W-1)\int_{-\infty}^{\infty} F_{D_1}^{N_W-2}(d_1)\,f_{D_1}(d_1)\left(\int_{d_1}^{H d_1} f_{D_2}(d_2)\,dd_2\right)dd_1 = N_W(N_W-1)\int_{-\infty}^{\infty} F_{D_1}^{N_W-2}(d_1)\big(F_{D_1}(H d_1)-F_{D_1}(d_1)\big)\,f_{D_1}(d_1)\,dd_1 \quad (2)$$

By expanding Eq. (2) we get:

$$P = N_W(N_W-1)\left[\int_{-\infty}^{\infty} F_{D_1}(H d_1)\,F_{D_1}^{N_W-2}(d_1)\,f_{D_1}(d_1)\,dd_1 - \int_{-\infty}^{\infty} F_{D_1}^{N_W-1}(d_1)\,f_{D_1}(d_1)\,dd_1\right] \quad (3)$$

Since $\int_{-\infty}^{\infty} N_W F_{D_1}^{N_W-1}(d_1)\,f_{D_1}(d_1)\,dd_1 = 1$, the above equation is rewritten as:

$$P = N_W(N_W-1)\int_{-\infty}^{\infty} F_{D_1}(H d_1)\,F_{D_1}^{N_W-2}(d_1)\,f_{D_1}(d_1)\,dd_1 - (N_W-1) \quad (4)$$

Finally, $\big[F_{D_1}(H d_1)\,F_{D_1}^{N_W-1}(d_1)\big]' = H f_{D_1}(H d_1)\,F_{D_1}^{N_W-1}(d_1) + (N_W-1)\,F_{D_1}(H d_1)\,F_{D_1}^{N_W-2}(d_1)\,f_{D_1}(d_1)$, so

$$P = N_W\left(1 - H\int_{-\infty}^{\infty} f_{D_1}(H d_1)\,F_{D_1}^{N_W-1}(d_1)\,dd_1\right) - (N_W-1) \quad (5)$$

or

$$P = 1 - N_W H \int_{-\infty}^{\infty} f_{D_1}(H d_1)\,F_{D_1}^{N_W-1}(d_1)\,dd_1 \quad (6)$$

Fig. 3. The F-score dependence on the tie-point number $N_W$ and the NNDR threshold $H$ as a function of the non-matching distance standard deviation $\sigma$. Black-coloured pixels correspond to F-score = 1, white-coloured pixels to F-score = 0, and grey-coloured pixels to all intermediate values, using a linear colour stretch.

Consequently, the mis-detection ratio $E_{md}$ is

$$E_{md} = N_W H \int_{-\infty}^{\infty} f_{D_1}(H d_1)\,F_{D_1}^{N_W-1}(d_1)\,dd_1 \quad (7)$$

Eq. (7) implies that the mis-detection ratio is determined by the distribution $f_D(d)$, the threshold $H$ and the number of tie-points in the target image, $N_W$. Following a similar rationale, we can conclude that the false rejection ratio and the mis-classification ratio are also determined by the above parameters, along with the distribution of the matching feature point distances, $g_D(d)$. Consequently, this parameterisation can be employed to examine the dependence of matching performance on the number of points $N_W$ and the NNDR threshold $H$.
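For illustration, Eq. (7) can be evaluated numerically once a form for $f_D$ is assumed. The sketch below (ours, with SciPy and illustrative parameter values) uses the Gaussian non-matching distance model that the synthetic experiment below also adopts:

```python
# Numerical evaluation of Eq. (7) for a hypothetical non-matching distance
# distribution N(0.5, sigma); all parameter values are illustrative.
import numpy as np
from scipy.stats import norm

def mis_detection_ratio(n_w, h, sigma, mu=0.5):
    d = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 20001)
    f = norm.pdf(h * d, mu, sigma)           # f_{D1}(H d1)
    F = norm.cdf(d, mu, sigma) ** (n_w - 1)  # F_{D1}^{N_W - 1}(d1)
    return n_w * h * np.trapz(f * F, d)      # E_md = N_W * H * integral

print(mis_detection_ratio(n_w=1000, h=1.3, sigma=0.2))
```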

In this implementation, it is assumed that correct matches show consistently low distance values, thus $g_D(d)$ was selected as a uniform distribution with minimum value 0.01 and maximum value 0.1. On the other hand, $f_D(d)$ was assumed to be a normal distribution, with mean value 0.5 (since distance takes a value within the range [0, 1]) and variable standard deviation $\sigma$. The above selections reduce the number of degrees of freedom to 3: the standard deviation $\sigma$, the threshold $H$ and the number of tie-points in the target image, $N_W$. The matching quality measure in this case is the F-score, i.e. the harmonic mean of the information retrieval measures Recall and Precision [29]. F-scores were estimated using the above model by sampling $N_W$, $H$ and $\sigma$. More specifically, the F-score was estimated for $N_W$ equal to 100, 300, 1000, 3000, 10000, 30000, 100000 and 300000, while $\sigma$ samples were taken in the interval [0.05, 0.4] using a step of 0.005 and $H$ in the interval [1.1, 2] using a step of 0.01.

As a result, for each NW value, a matrix M of dimension

71×91 was retrieved. M(i,j)value is the estimated F-score when σ =0.05+0.005(i−1)and H =1.1+0.01(j−1). These tables were quantised and visualised in Figure 3. Black-coloured pixels correspond to F = 1, whille white-coloured pixels to F = 0. All intermediate F-score values are drawn using a linear grey-scale stretch, i.e. F-score x corresponds to grey level 255x . We selected to demonstrate these results as grey-level images because the purpose of this synthetic experiment is not to give quantitative F-score estimates but rather to examine the qualitative dependence of F from NW and H .
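The experiment can also be reproduced in spirit by direct Monte Carlo simulation. The sketch below is our reconstruction from the description above, not the authors' code: matching distances are drawn from $U(0.01, 0.1)$, non-matching distances from $N(0.5, \sigma)$ clipped to remain positive, and a match is accepted when the second-to-first distance ratio exceeds $H$.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_score(n_w, h, sigma, trials=2000):
    """Monte Carlo F-score under the synthetic model: one true match
    ~ U(0.01, 0.1) hidden among n_w non-matching distances ~ N(0.5, sigma)."""
    tp = fp = fn = 0
    for _ in range(trials):
        # Query point that has a true match in the target set.
        d = np.append(rng.normal(0.5, sigma, n_w).clip(1e-3, 1.0),
                      rng.uniform(0.01, 0.1))
        d.sort()
        if d[1] > h * d[0]:
            tp += 1              # accepted (almost surely the true match)
        else:
            fn += 1              # erroneously rejected
        # Query point without a true match: any acceptance is spurious.
        d = np.sort(rng.normal(0.5, sigma, n_w).clip(1e-3, 1.0))
        if d[1] > h * d[0]:
            fp += 1
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)

print(f_score(n_w=1000, h=1.3, sigma=0.2))
```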


As expected, Fig. 3 confirms that a higher standard deviation of the non-matching distance scores results in lower performance. As a matter of fact, $\sigma$ expresses the inherent matching ambiguity, which varies according to the content of the image, the camera parameters, the imaging conditions, etc., as well as the capability of the employed descriptor to index the local image content. A rough discrimination can be made by labeling matchings with low $\sigma$ values as "inherently easy" and matchings with high $\sigma$ values as "inherently hard".

Fig. 3 shows that while the former class seems rather robust to the number of tie-points $N_W$, the latter's performance is severely degraded when the number of tie-points increases. Consequently, full-image matching of large images is exposed to unpredictable parameters (such as those modeled through $\sigma$) and, for a significant range of images (e.g. remote sensing spacecraft images), is expected to lead to poor matching performance. Moreover, the matching performance cannot be tuned through the NNDR threshold $H$, since $H$ values do not seem to play a critical role in the matching performance, especially for large $N_W$ values. As a matter of fact, Fig. 3 implies that avoiding the selection of a very low $H$ value seems to be the only necessary $H$ tuning.

This inability of H to effect accurate matching of large images suggests the need to limit the number of candidate matches. The latter can be achieved by imposing spatial constraints on the matching process. In the next section we introduce such a technique, named coupled decomposition.

IV. COUPLED IMAGE DECOMPOSITION

The image matching pipeline is illustrated in Fig. 1. The solid lines denote the previously established process, while the dashed ones represent the parts introduced in this work for the coupled decomposition step. As already discussed in Section II-A, in the state-of-the-art approach, interest point detection is initially conducted on the reference and the target image (e.g. through a Harris-Laplace detector [30]), before the extracted informative image points are processed to generate local descriptors (usually SIFT [11]). Thus, a point is projected from image space to descriptor space. During matching, each target image descriptor is matched with a reference image descriptor or discarded if no match is found (possibly using a tree structure to facilitate efficient queries), and the matches are reprojected from the descriptor space to the image space. This approach does not impose any spatial constraints on the matching process, which is fully conducted in the descriptor space, while pyramidal schemes impose spatial constraints based on sub-sampled versions of the images, a strategy that is problematic when dealing with large images. Instead, in this section we introduce a novel technique, named coupled decomposition, which imposes spatial constraints on pairs of images using only the full resolution images, which are subsequently matched.

A. Line and Point Invariants

We assume that $Z$ and $W$ are the reference and the target input images, respectively. $Z$ (as well as $W$) is used to represent both the reference (target) image per se and its luminance at a point $p_i$ ($q_j$), in which case this is denoted as $Z(p_i)$ ($W(q_j)$), or $Z(x_i, y_i)$ ($W(x_j, y_j)$) if the point coordinates are given. Moreover, for a pair of points $p_i$ and $p_j$, $\theta_{ij}$ is the angle between the vector $p_i p_j$ and the horizontal line that passes through $p_i$.

Using the above nomenclature, the profile $\bar{S}(p_i, \theta)$ is defined as the set of points $p_j$ of $Z$ for which $\theta_{ij} = \theta$ or $\theta_{ij} = \theta + \pi$, i.e. the set of points of a line passing through $p_i$. We further define the mean profile $\bar{Z}(p_i, \theta, \Pi)$ as the mean luminance of the profile points:

$$\bar{Z}(p_i, \theta, \Pi) = E[Z(p_j)], \quad p_j \in Z,\; \theta_{ij} = \theta \text{ or } \theta_{ij} = \theta + \pi \quad (8)$$

In the above equation, $E[\cdot]$ is the expected value operator. Moreover, since the only available information is the image luminance at integer coordinates, the mean profile depends on the employed interpolation method $\Pi$. Nevertheless, since the luminance variation caused by different interpolation methods is usually small and is further reduced through averaging, we assume that $\bar{Z}(p_i, \theta, \Pi_1) \approx \bar{Z}(p_i, \theta, \Pi_2) = \bar{Z}(p_i, \theta)$ for two different interpolation methods $\Pi_1$, $\Pi_2$.

Through the mean profile, we define for each point $p$ a function $f_{\bar{Z},p}(\theta)$ that maps each angle $\theta$ to the mean profile $\bar{Z}(p, \theta)$. If two images, $Z$ and $W$, differ only in terms of geometric transformations and additive noise (i.e. $Z$ can be reproduced from $W$ after a set of geometric transformations and noise addition) and if $p$ and $q$ are two corresponding points in $Z$ and $W$ (i.e. $p$ is where point $q$ is mapped through the geometric transformations), then the following equation stands:

$$f_{\bar{Z},p}(\theta) = f_{\bar{W},q}(\theta + \phi) \quad (9)$$

In the above equation, $\phi$ signifies the orientation angle difference of $W$ and $Z$.

Finally, in order to find two corresponding points in $Z$ and $W$, two different approaches are proposed, depending on the image matching scenario. More specifically, when $Z$ and $W$ completely overlap, the $Z$ and $W$ centroids are selected as corresponding points, since geometric transformations and additive noise leave the image centroid invariant, i.e.:

$$\left(\frac{\sum_i x_i Z(x_i,y_i)}{\sum_i Z(x_i,y_i)}, \frac{\sum_i y_i Z(x_i,y_i)}{\sum_i Z(x_i,y_i)}\right) = \left(\frac{\sum_j x_j W(x_j,y_j)}{\sum_j W(x_j,y_j)}, \frac{\sum_j y_j W(x_j,y_j)}{\sum_j W(x_j,y_j)}\right) \quad (10)$$
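To make these constructions concrete, the following sketch (our illustration in Python/NumPy; all function names are ours, not the paper's) computes the luminance-weighted centroid of Eq. (10), a quantised profile vector, and the angle offset $\phi$ of Eq. (9) by maximising the circular correlation of the two profile vectors. Since the profile aggregates $\theta$ and $\theta + \pi$ together, $\phi$ is recovered modulo $\pi$; 720 bins correspond to a quantization step of $\pi/720$.

```python
import numpy as np

def centroid(img):
    """Luminance-weighted centroid, as in Eq. (10)."""
    ys, xs = np.indices(img.shape, dtype=float)
    s = img.sum()
    return (xs * img).sum() / s, (ys * img).sum() / s

def profile_vector(img, cx, cy, n_bins=720):
    """Quantised profile function: mean luminance per angle bin around
    (cx, cy). Pixels are parsed directly, so no interpolation is needed;
    theta and theta + pi fall in the same bin, as in the profile definition."""
    ys, xs = np.indices(img.shape, dtype=float)
    theta = np.arctan2(ys - cy, xs - cx) % np.pi
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins.ravel(), weights=img.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)

def angle_offset(fz, fw):
    """Angle phi maximising the correlation of the two profile vectors
    (Eq. 9), found by exhaustive circular shift; returned modulo pi."""
    shifts = [np.dot(fz, np.roll(fw, -s)) for s in range(len(fw))]
    return np.pi * int(np.argmax(shifts)) / len(fw)
```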

However, when the reference and the target image overlap only partially, the reference and the target image centroids are not expected to correspond. When dealing with partially overlapping image pairs, a single-element sample of the set of matches is proposed to be used instead. More specifically, this is estimated by matching, one by one, the feature points extracted from $Z$ with the complete set of feature points extracted from $W$, until a unique match is identified. The matching point coordinates in $Z$ and $W$ determine the pair of corresponding points. In the rest of this paper, the two variations are going to be named mean-based coupled decomposition and match-based coupled decomposition, respectively.

B. Coupled Image Decomposition Algorithm

The coupled image decomposition algorithm is presented in Alg. 1. The input of the algorithm consists of the reference image, $Z$, and the target image, $W$.

Algorithm 1 Coupled Image Decomposition Algorithm

The coupled decomposition algorithm starts by estimating the corresponding points $p$ and $q$ and the profile functions $f_{\bar{Z},p}(\theta)$ and $f_{\bar{W},q}(\theta)$ at them. Since the profile functions are continuous, we employ a quantised version of them, which aggregates the mean profiles over angle ranges of width $\Delta\theta$. As $\Delta\theta$ gets smaller, the quantised profile function approaches the profile function introduced in the previous subsection.

From Eq. 9, $f_{\bar{W},q}(\theta)$ is assumed to be a (circularly) translated version of $f_{\bar{Z},p}(\theta)$. The angle translation parameter $\phi$ is the one that maximizes the profile vector correlation. Consequently, a polar coordinate system fitted at $p$ with its axis at angle $\theta$ (relative to a horizontal line passing through $p$) corresponds to a polar coordinate system fitted at $q$ with its axis at angle $\theta + \phi$. This property is used to decompose images into corresponding pairs of sub-images, $Z_i$ and $W_i$.

The sub-images $Z_i$ (as well as $W_i$) are actually radial sections of the polar coordinate system that is defined at the point $p$ ($q$). For simplicity, we use a constant angular range for each radial section. For example, if the corresponding point is the point $p_1$ and 4 radial sections are used, then the range of each radial section is $\pi/2$ and the first, second, third and fourth sub-image would include the image points $p_i$ for which $0 \le \theta_{1i} \le \pi/2$, $\pi/2 \le \theta_{1i} \le \pi$, $\pi \le \theta_{1i} \le 3\pi/2$ and $3\pi/2 \le \theta_{1i} \le 2\pi$, respectively.

For each of the corresponding sub-images, the coupled decomposition process is iterated, until the maximum number of iterations is reached. The algorithm output is a set of flags that determine, for each $W$ and $Z$ pixel, the sub-image that it belongs to. If $W$ and $Z$ differ only in terms of geometric transformations and additive noise, then points in $Z_i$ would correspond to points in $W_i$ and vice versa. Consequently, instead of matching the complete images $Z$ and $W$, image matching can be performed on a set of corresponding sub-images $Z_i$ and $W_i$.
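Continuing the earlier sketch, one decomposition level can be rendered as follows (illustrative code, not the authors' implementation): each pixel is assigned to one of $M$ radial sections around the corresponding point, with the target image's polar axis rotated by $\phi$.

```python
import numpy as np

def radial_sections(img, cx, cy, m, phi=0.0):
    """Label every pixel with the index (0..m-1) of the radial section it
    falls in, for a polar system centred at (cx, cy) with axis rotated by phi."""
    ys, xs = np.indices(img.shape, dtype=float)
    theta = (np.arctan2(ys - cy, xs - cx) - phi) % (2 * np.pi)
    return np.minimum((theta / (2 * np.pi) * m).astype(int), m - 1)

# One level of mean-based coupled decomposition; K iterations would recurse
# on each pair of corresponding sections (helpers from the previous sketch):
# cz, cw = centroid(Z), centroid(W)
# phi = angle_offset(profile_vector(Z, *cz), profile_vector(W, *cw))
# labels_z = radial_sections(Z, *cz, m=4)           # M = 4 radial sections
# labels_w = radial_sections(W, *cw, m=4, phi=phi)
```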

The parameters that control the coupled decomposition output are the quantization step $\Delta\theta$, the number of iterations $K$ and the number of radial sections $M$. In order to tackle cases in which the differences between $Z$ and $W$ are not limited to geometric transformations and additive noise, the sub-images $Z_i$, $W_i$ are enlarged by a factor $(1+a)$ before matching. The overlap ratio $a$ is the fourth parameter of this algorithm.

Fig. 4. (a) The $Z$ image plane and the centroid perturbations when decomposing $Z$ coupled to $W'$ instead of $W$. The grey rectangle in the bottom-right corner represents the image region in which $W$ and $W'$ differ, while the solid and the dashed lines show the sub-image boundaries when decomposing $Z$ coupled to $W$ and $W'$, respectively. (b) Histogram of the mean $|R|$ value when using image pair 1 (Table I) and $10^5$ runs. At each run, $\lambda_1$ was randomly selected ($\lambda_{max} = 0.1$) and pixels were randomly changed in the bottom-right corner until this value was reached. Then $\lambda_2$, $\lambda_3$ were estimated from the image, and finally $R$ from Eq. 14.

C. Algorithm Properties

1) Robustness and Invariance: Mean-based coupled decomposition is by default invariant to image rotation, translation and scaling. Moreover, the estimation of both the centroid points and the profile vectors in Algorithm 1 involves exclusively mean pixel values. As a result, the mean-based coupled decomposition outcome is expected to be invariant to both additive white noise and global radiometric differences due to different lighting conditions or transparency of the atmosphere. Similar conclusions can be drawn for match-based coupled decomposition, except that its invariance to geometric transforms is not theoretically guaranteed. However, since the corresponding points are established through a single matching pair of points, elaborate features and matching can be employed to ensure that the corresponding points are robust to geometric transforms.

Match-based coupled decomposition is additionally robust to local changes, i.e. changes that happen only to a limited image area, since such changes are expected to have limited effects both on the determination of the corresponding points and on the points’ mean profiles. On the other hand, robustness to local changes is not straightforward for the mean-based coupled decomposition variation.

Actually, in order to examine this property, we assume that two target images, $W$ and $W'$, are to be matched with a reference image, $Z$. We further assume that $Z$ and $W$ differ only in terms of geometric transformations, additive noise and global radiometric differences, i.e. that by using $Z$ and $W$ we can achieve perfectly accurate mean-based coupled decomposition (meaning that no points in $Z_i$, $W_j$, $i \ne j$, refer to the same point in world coordinates). Additionally, $W'$ is considered identical to $W$, except for an area in the bottom-right corner. In Fig. 4a, the reference image plane is shown ($M = 4$), along with the $Z$ sub-image boundaries for both input pairs, i.e. $Z$-$W$ and $Z$-$W'$ (solid and dashed lines, respectively). We focus this analysis on the top-left sub-image $Z_1$, examining whether the boundary "error" triggered by the modification in the bottom-right sub-image $Z_2$ is propagated to the sub-images into which $Z_1$ is decomposed in further coupled decomposition iterations.

Local change robustness can be examined via the ratio of the distances of corresponding sub-image centroids. More specifically, if $p_0(m_1, m_2)$ ($p'_0(m_1+d_1, m_2+d_2)$) is the image centroid of the reference image estimated through coupled decomposition of $Z$ and $W$ ($Z$ and $W'$) and $p_1(m'_1, m'_2)$ ($p'_1(m'_1+d'_1, m'_2+d'_2)$) is the corresponding top-left sub-image centroid, then a local change robustness measure $R$ is

$$R = \sqrt{d_1'^2+d_2'^2}\,/\,\sqrt{d_1^2+d_2^2} \quad (11)$$

$R$ depends on several parameters. In order to simplify the analysis, it is assumed that (i) due to the modification in the bottom-right area, the centroid is displaced evenly in both coordinates, i.e. $d_1/m_1 = d_2/m_2 = \lambda_1$, and (ii) the top-left centroid $p_1$ coordinates are analogous to the image centroid $p_0$, i.e. $m'_1/m_1 = m'_2/m_2 = \lambda_2$.

The number of pixels that lie in the top-left sub-image is $N_1 = m_1 m_2$ when decomposing $Z$ coupled with $W$, while $N'_1 = (1+\lambda_1)^2 m_1 m_2$ when decomposing $Z$ coupled with $W'$, i.e. in the second case $(\lambda_1^2 + 2\lambda_1) m_1 m_2$ additional pixels are assigned to the top-left sub-image. Consequently,

$$m'_1 + d'_1 = \lambda_2 m_1 \frac{1}{(1+\lambda_1)^2} + \lambda_{31}(m_1+d_1)\frac{\lambda_1^2+2\lambda_1}{(1+\lambda_1)^2} \quad (12)$$

where $\lambda_2 m_1 = m'_1$ is the top-left sub-image centroid longitude of the $N_1$ pixels that are common with the top-left sub-image corresponding to $W$, and $\lambda_{31}(m_1+d_1)$ is the centroid longitude of the pixels that are assigned to the top-left sector only when $Z$ is decomposed coupled with $W'$. $\lambda_{32}(m_2+d_2)$ is the corresponding latitude. By further assuming that $\lambda_{31} = \lambda_{32} = \lambda_3$,

$$d_1'^2+d_2'^2 = (m_1^2+m_2^2)\,\lambda_1^2(\lambda_1+2)^2\,\frac{\big(\lambda_2-\lambda_3(\lambda_1+1)\big)^2}{(1+\lambda_1)^4} \quad (13)$$

$$R = (\lambda_1+2)(\lambda_2-\lambda_3\lambda_1-\lambda_3)/(1+\lambda_1)^2 \quad (14)$$

In Eq. 14, $\lambda_1$ represents the relative image centroid perturbation due to a local change, $\lambda_2$ the normalized original top-left sub-image centroid position (relative to the image centroid position) and $\lambda_3$ the centroid position of the changed pixels relative to the perturbed image centroid position. We have experimentally confirmed the validity of the assumptions of Eq. 11, finding that, in the testset employed in this work, 98% of the values of $(d_1 m_2)/(m_1 d_2)$ vary between 0.93 and 1.07 and 98% of the values of $(m'_2 m_1)/(m'_1 m_2)$ vary from 0.96 to 1.04. Moreover, the experimentally determined value ranges of $\lambda_1$, $\lambda_2$, $\lambda_3$ are [0, 0.1], [0.3, 0.7] and $[\lambda_2, 2\lambda_2]$, respectively. It can be straightforwardly computed that for this range $|R| < 1$. For example, by selecting $\lambda_1$ randomly in a range from 0 to $\lambda_{max}$, where $\lambda_{max} = 0.01, 0.02, 0.05, 0.1$, and averaging over $10^5$ runs, the estimated mean $|R|$ value was 0.368, 0.372, 0.383 and 0.4, respectively. The results of a small experiment on a planetary image (Fig. 4b) are in line with the reported mean $|R|$ values. Such values signify that the disturbance of the image centroid position caused by a local change is reduced by a factor of more than 2.5 at each iteration. Consequently, even when the number of iterations is small, the mean-based coupled decomposition outcome is robust to moderate local changes.

Additional evidence for the method's robustness is given by the fact that image matching is typically robust to small displacements of the image boundaries. As a matter of fact, if a decomposition at sector centroid $p$ generates a sub-image $Z_i$ with $N_i$ pixels, $C_i$ of which are correctly matched with pixels of $W_i$, and $p$ moves by $d$ pixels, then the number of pixels $N'_i$ in the new sub-image $W'_i$ would satisfy $(N_i - d^2) \le N'_i \le (N_i + d^2)$. Consequently, the expected number of correctly matched pixels that will be missed is less than $C_i d^2/N_i$. If $Z_i$ is assumed square with side $d'$, then the expected percentage of missed pixels is less than $(d/d')^2$. In mean-based coupled decomposition, $d'$ depends on the image size as well as the total number of iterations, $K$. Thus, in the case of large-sized images, the generated sub-image sizes are on the order of hundreds or thousands of pixels, which implies that even a misplacement of tens of pixels is not expected to significantly degrade the algorithm output.

However, when severe local changes are expected, match-based coupled decomposition should be employed instead.

2) Computational Concerns: The most computationally demanding part of Algorithm 1 is the interpolation required for the mean profile estimation. However, when dealing with large images, as in this work, the digital image pixel grid provides a rather dense image representation, which can be used to circumvent interpolation in the mean profile estimation step. More specifically, in order to generate the mean profile, all image pixels are parsed sequentially and the angle $\theta_{1i}$ with the corresponding image or sub-image centroid $p_1$ is estimated and quantized, before the pixel illumination is accumulated into the corresponding mean profile bin.

In this way, each pixel is visited only twice at each iteration in mean-based coupled decomposition (once for the profile vector estimation and once for sector centroid estimation) and only once at each iteration in match-based coupled decomposition, since the corresponding points are estimated through matching. The overall computational complexity of mean-based and match-based coupled decomposition is $O(2K(N_Z+N_W)+T_{ang})$ and $O(K(N_Z+N_W)+T_{ang}+T_{mat})$, respectively. In the above expressions, $K$ is the number of iterations, $N_Z$ and $N_W$ the number of pixels in the reference and target image, respectively, $T_{ang}$ the computational time related to the estimation of the angle that maximizes the correlation of the profile vectors, and $T_{mat}$ the computational time needed to identify the match-based corresponding points in $Z$ and $W$. In large images $T_{ang}$ and $T_{mat}$ are negligible, i.e. $O(2K(N_Z+N_W)+T_{ang}) \approx O(2K(N_Z+N_W))$ and $O(K(N_Z+N_W)+T_{ang}+T_{mat}) \approx O(K(N_Z+N_W))$. As is subsequently shown, even for very large images only a small number of iterations is required (typically $K \le 5$), while both $N_Z$ and $N_W$ are much larger. Consequently, the computational complexity of the coupled decomposition algorithm is linear in the sum of the total number of pixels in the reference and the target image, independently of whether their sizes are almost equal or not. This small increase in the computational cost, caused by the coupled image decomposition, results in a significant decrease in the overall computational cost, as will be demonstrated in the experimental results section.

V. IMAGE MATCHING USING COUPLED DECOMPOSITION

The set of corresponding sub-images, which is the output of coupled decomposition, can be used to develop several image matching solutions that focus either on the decrease of the computational complexity required for image matching or on exploiting the information redundancy associated with the fact that all sub-images belong to the same image. In this section, we introduce two such techniques.

A. Tie-Point Extraction Using Coupled Decomposition

In this rather straightforward technique, the coupled decomposition outcome is used to impose spatial constraints on the tie-points of the target image that can be matched to the tie-points of the reference image. When no overlap is used (i.e. $a = 0\%$), this is equivalent to decomposing both the reference and the target image into corresponding sub-images and performing tie-point extraction independently on each pair of sub-images, before aggregating the identified tie-points. A non-zero overlap rate $a$ can be used to determine the trade-off between computational complexity and the number of identified tie-points. The actual algorithm is presented in Algorithm 2.

Algorithm 2 Tie-Point Extraction Using Coupled Decomposition
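A schematic rendering of this per-sub-image matching follows (our sketch, not the paper's listing; `nndr_match` is the hypothetical helper from the example in Section II-A, and the label arrays give, for each interest point, the sub-image index produced by coupled decomposition):

```python
import numpy as np

def matched_tiepoints(kp_z, desc_z, labels_z, kp_w, desc_w, labels_w, H=1.5):
    """Aggregate NNDR matches computed independently on each pair of
    corresponding sub-images, so each point is only compared against
    candidates in its own neighbourhood."""
    tiepoints = []
    for s in np.unique(labels_z):
        iz = np.flatnonzero(labels_z == s)  # interest points in sub-image s of Z
        iw = np.flatnonzero(labels_w == s)  # ... and in its counterpart in W
        if len(iw) < 2:                     # the NNDR test needs two candidates
            continue
        for i, j in nndr_match(desc_z[iz], desc_w[iw], H):
            tiepoints.append((kp_z[iz[i]], kp_w[iw[j]]))
    return tiepoints
```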

The above algorithm achieves a significant reduction in the required computational time. Supposing that the total number of interest points extracted in the reference image (the target image) is $C_Z$ ($C_W$), the candidate matches that need to be checked are reduced from $C_Z C_W$ to $\sum_i C_{Z_i} C_{W_i}$, where $i$ is the sub-image index. In large images, in which $C_Z$ and $C_W$ may be on the order of millions, this constitutes a substantial computational gain. In this section, we demonstrate that the computational gain increases with $K$ (i.e. the total number of iterations of Algorithm 1) and decreases with the overlap rate $a$.

In order to establish this property for mean-based coupled decomposition, it is initially assumed that $a = 0$, $M$ is the number of radial sections per iteration, $K$ the total number of iterations, and $C_Z$ and $C_W$ the total number of tie-points in $Z$ and $W$, respectively. Then the average number of tie-points in each sub-image is $C_Z/M^K$ and $C_W/M^K$. Moreover, the fact that the image centroid usually lies near the image center implies that, generally speaking, $C_{Z_i}$ and $C_{W_i}$ are more likely to take values near their average than extremely low or high values, i.e. the probability density function maximum and the average value coincide. It is also fair to assume that the distribution is symmetric around its mean value, since there is no generally applicable reason for $P(C_{Z_i} \ge C_Z/M^K) \ne P(C_{Z_i} \le C_Z/M^K)$ or $P(C_{W_i} \ge C_W/M^K) \ne P(C_{W_i} \le C_W/M^K)$. Finally, $C_{Z_i}$ and $C_{W_i}$ are not expected to be mutually independent, since coupled decomposition generates pairs of matching sub-images.

All of the above specifications are satisfied if the distribution of $C_{Z_i}$ and $C_{W_i}$ is a bivariate normal distribution with mean vector $(C_Z/M^K, C_W/M^K)$. In this case, the mean value of the product $C_{Z_i}C_{W_i}$ is

$$E[C_{Z_i}C_{W_i}] = \rho_{ZW}\,\sigma_Z\sigma_W + (C_Z C_W)/M^{2K} \quad (15)$$

where $\sigma_Z$ ($\sigma_W$) is the marginal standard deviation of $C_{Z_i}$ ($C_{W_i}$) and $\rho_{ZW}$ is the correlation coefficient. If the reference and the target image match, $\rho_{ZW} \approx 1$ and the achieved computational gain $G$ can be expressed as

$$G = \frac{C_Z C_W}{\sum_i C_{Z_i}C_{W_i}} \approx \frac{M^{2K}\,C_Z C_W}{C_Z C_W + M^{2K}\sigma_Z\sigma_W} \quad (16)$$

If $a > 0$ and assuming that the number of extracted interest points is proportional to image size, $G$ becomes

$$G \approx \frac{M^{2K}}{(1+a)^2}\,\frac{C_Z C_W}{C_Z C_W + M^{2K}\sigma_Z\sigma_W} \quad (17)$$

Eq. (17) can be refined by considering that when an already sampled population is sub-sampled, resulting in $B$ output bins for each input bin, the output standard deviation is $\sqrt{B}$ times smaller than the input one [31]. If $\sigma_{Z_0}$ ($\sigma_{W_0}$) is the standard deviation of the feature point distribution of the reference image (target image) at the first iteration, Eq. 17 is rewritten as:

$$G \approx \frac{M^{2K}}{(1+a)^2}\,\frac{C_Z C_W}{C_Z C_W + M^{K+2}\sigma_{Z_0}\sigma_{W_0}} \quad (18)$$

Since the mean number of points assigned to each sub-image in the first iteration is $C_Z/M$ and $C_W/M$, the final expression of the computational gain $G$ is

$$G \approx \frac{M^{2K}}{(1+a)^2(1+M^K\,CV_Z\,CV_W)} \approx \frac{M^K}{(1+a)^2\,CV_Z\,CV_W} \quad (19)$$

where $CV_Z = \sigma_{Z_0} M/C_Z$ and $CV_W = \sigma_{W_0} M/C_W$ are the coefficients of variation [31] of the feature point distributions of the reference and target image, respectively.
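Plugging representative numbers into Eq. (19) gives a sense of the scale of the gain (the coefficient-of-variation values here are assumed for illustration, not measured):

```python
# Gain approximation of Eq. (19) for M = 4 radial sections, K = 3 iterations
# and a 20% overlap, with hypothetical coefficients of variation of 1.
M, K, a = 4, 3, 0.2
cv_z = cv_w = 1.0
G = M**(2 * K) / ((1 + a)**2 * (1 + M**K * cv_z * cv_w))
print(round(G, 1))  # ~43.8: brute-force candidate matches cut by that factor
```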

While the above analysis holds for the mean-based coupled decomposition variation, it may be extended to the match-based variation if the single corresponding point needed for decomposition is identified near the center of $Z$ and $W$. This can be achieved if the $Z$ feature points are not parsed randomly but according to their distance from the image center, favouring feature points near the center of $Z$.

Note that the computational gain expressed in Eq. (19) refers to the number of candidate matches that are examined in a brute-force approach that does not employ any data structure to organize the interest points. However, coupled decomposition is orthogonal to data structuring, thus the two can be combined to further improve the computational gain if deemed necessary.

B. Partial Fundamental Matrix Estimation and Sub-Image Matching Assessment

The previous sub-section examined the use of coupled decomposition for tie-point extraction. However, coupled decomposition modifies only the tie-point extraction part of image matching. Consequently, it may be incorporated into multiple matching schemes, each with a different architecture, objectives, etc.

In this sub-section, we introduce an algorithm that deals with one of the most common image matching objectives, i.e. the estimation of the fundamental matrix [32] that describes the epipolar geometry between two images. Being a sub-image matching scheme, coupled decomposition may comply with the partial epipolarity of image pairs acquired by pushbroom cameras [27], since a distinct fundamental matrix can be estimated for each sub-image. The algorithm output also includes a ranking of the corresponding sub-images according to the reprojection error. Since the coupled decomposition output is robust to local changes (Section IV-C1), large reprojection errors are assumed to correspond not just to sub-image misalignments but also to grey-level image changes.

The algorithm is presented in Algorithm 3.

Algorithm 3 Coupled Decomposition Enhanced Fundamental Matrix Computation

In the first 6 algorithmic steps, the matching quality of each corresponding sub-image is estimated through a common state-of-the-art practice, used for validation when no ground-truth is available. More specifically, the tie-point set is split into training and validation sub-sets. The latter is assumed to include actual matches that are going to validate the accuracy of the fundamental matrix that is computed through the former. A fixed number of iterations ensures a decent approximation of the minimum reprojection error.

The intermediate output of the first 6 steps is a reprojection error vector, $d_i$. If $d_i$ is a sample from a normal distribution, then the mean and the median reprojection error concur, while the 84.1% quantile error value corresponds to the sum of the mean (i.e. the median) and the standard deviation. Taking into account that 97.7% of the values of a normal distribution $N(\mu, \sigma)$ are smaller than $T_0 = \mu + 2\sigma$, $T_0$ is employed as an outlier detection threshold ($\mu$ in the above expression being the median value).

Summarising, each sub-image pair with reprojection error higher than $T_0$ is identified as a region with grey-scale change (and may be further examined in applications that require the characterisation of the change), while the fundamental matrices for the rest of the sub-images are retrieved.
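A compact sketch of this robust threshold (our illustration): $\mu$ is taken as the median and $\sigma$ as the gap between the 84.1% quantile and the median, which is exact for normally distributed errors.

```python
import numpy as np

def outlier_threshold(reproj_errors):
    """Robust T0 = mu + 2*sigma: mu is the median and sigma the gap between
    the 84.1% quantile and the median (exact for normally distributed errors)."""
    mu = np.median(reproj_errors)
    sigma = np.quantile(reproj_errors, 0.841) - mu
    return mu + 2 * sigma

# Sub-image pairs whose mean reprojection error exceeds T0 are flagged as
# regions with grey-scale change:
# errors = np.array([...])  # one mean reprojection error per sub-image pair
# changed = errors > outlier_threshold(errors)
```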

VI. EXPERIMENTAL RESULTS

A. Datasets and Experimental Setup

The experimental dataset consists of 30 pairs of remote sensing images from orbiter missions around Mars and Phobos (Table I). The image pairs differ in terms of camera and data type, resolution, image size, etc. More specifically, the 30 pairs can be classified into 6 sub-sets, named "HiRISE JPG", "CTX", "Phobos SRC", "HRSC", "Mixed" and "HiRISE".

The first and the last sub-sets come from the HiRISE camera [1] of the NASA Mars Reconnaissance Orbiter (MRO) mission. The former (pairs 1−6) includes 6 pairs of subsampled images of 7−12 megapixels, which constitute a dataset with a size similar to what is currently available from the latest smartphones, digital cameras, etc. The latter (pairs 28−30) includes 3 pairs of raw HiRISE images, which depict large areas of the Martian surface at 25 cm per pixel resolution [1], reaching up to 2.4 gigapixels. The second sub-set (pairs 7−12) consists of 6 image pairs taken by the NASA MRO mission Context Camera (CTX) [33]. The resolution in this dataset is on the order of a few metres per pixel (typically 6−12 metres [33]), while the sizes vary from 18 to 87 megapixels. The third sub-set (pairs 13−18) consists of 6 pairs of small planetary images, which are mainly used to examine the size limits under which coupled decomposition becomes redundant. These are less-than-one-megapixel Phobos images taken at a maximum resolution of 2.3 metres per pixel by the Super Resolution Channel (SRC) of the High Resolution Stereo Camera (HRSC) on board the ESA Mars Express mission [34]. The latter mission is also the source of the fourth sub-set (pairs 19−24), which consists of 6 pairs of HRSC [34] imagery, with a resolution of 12.5−25 metres per pixel and a size of 27 to 220 megapixels. Finally, the fifth sub-set (pairs 25−27) includes 3 mixed pairs, i.e. one image acquired by CTX and the other by HRSC.

TABLE I: IMAGE PAIRS DATASET (SEE TEXT FOR FURTHER DETAILS)

In the current experimental setup, four measures of comparison are employed: the computational time $T$, the number of extracted tie-points $C$, the rigid mean reprojection error $Err_R$ and the warping mean reprojection error $Err_W$.

Rigid mean reprojection errors are used to contrast fundamental matrix estimation versus partial fundamental matrix estimation (as discussed in Section V-B), while warping mean reprojection error is used to compare the tie-point quality that is achieved from the introduced techniques with the quality achieved from full-image matching and two techniques taken from the literature, namely, [22], [23].

The evaluation of the fundamental matrix estimation is based on the fact that for the employed dataset (as well as most planetary image datasets) there is no unambiguous ground-truth from which to estimate the achieved accuracy. In order to circumvent this inherent impediment, we estimate the accuracy of the fundamental matrix estimated from a tie-point set, $S_1$, relative to another set of tie-points, $S_2$. This is achieved by initially splitting the sets of tie-points into a training and a testing set, $S_i^{tr}$ and $S_i^{te}$, $i \in \{1, 2\}$. The training and testing sub-sets are randomly selected so that $S_i^{tr} \cup S_i^{te} = S_i$, $S_i^{tr} \cap S_i^{te} = \emptyset$, $\#(S_i^{tr}) = \#(S_i^{te})$. Subsequently, the fundamental matrix computed from $S_1^{tr}$ is used to project $S_1^{te} \cup S_2$ and the fundamental matrix computed from $S_2^{tr}$ is used to project $S_2^{te} \cup S_1$. The two reprojection errors are linearly combined using weights proportional to the cardinality (i.e. the number of matched tie-points) of the sets $S_1$, $S_2$.

However, when comparing more than 2 methods, the above measure is not practicable, since it requires pairwise comparison of all methods. Consequently, the tie-point quality achieved by each image matching method was evaluated using the warping reprojection error. More specifically, for each image pair of Table I, full-image matching was performed. The total number of estimated tie-points, $N$, is a constant parameter for all methods. Since sub-image matching consistently estimates more tie-points than full-image matching, RANSAC was employed to prune these sets to $N$ tie-points. Subsequently, the tie-point sets were randomly split into a training and a testing sub-set, the former used to estimate a 3D warping function and the latter to evaluate the mean reprojection error caused by the warping function. Finally, the reported error value is the average of the mean reprojection errors, estimated over 1000 iterations.
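The protocol can be sketched as follows (illustrative; `fit_warp` stands in for the least-squares warping function estimation mentioned below):

```python
import numpy as np

def warping_error(pts_src, pts_dst, fit_warp, runs=1000, seed=0):
    """Average mean reprojection error over random 50/50 train/test splits;
    fit_warp(src, dst) must return a callable warp: src -> predicted dst."""
    rng = np.random.default_rng(seed)
    n, errs = len(pts_src), []
    for _ in range(runs):
        idx = rng.permutation(n)
        tr, te = idx[: n // 2], idx[n // 2:]
        warp = fit_warp(pts_src[tr], pts_dst[tr])       # e.g. least squares
        residual = warp(pts_src[te]) - pts_dst[te]
        errs.append(np.linalg.norm(residual, axis=1).mean())
    return float(np.mean(errs))
```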

Apart from the parameters and algorithms that are examined in this section, the rest of the implementation is based on state-of-the-art techniques. Interest point detection was conducted through Harris-Laplace corner detection [30], SIFT [11] was selected as the local descriptor, the “gold standard” method of [32] was used for fundamental matrix estimation and least squares for warping function parameter estimation.

B. Parameter Tuning

In this subsection, we test the dependence of the tie-point matching outcome on the coupled decomposition algorithm parameters. As explained in Section IV-B, there are 4 parameters involved in coupled decomposition: $K$, $M$, $a$ and $\Delta\theta$. The following analysis is conducted on mean-based coupled decomposition; however, the examination of match-based coupled decomposition (which is not presented here due to length limitations) led to similar conclusions, the only difference being that overlap seems redundant (i.e. $a = 0\%$ is the default value for match-based coupled decomposition). The number of sub-images that coupled decomposition generates is $M^K$; consequently, these two parameters jointly determine the decomposition coarseness. $M$ is also associated with the grid shape, while $K$ is associated with the robustness to local changes (Section IV-C1). When dealing with planetary images, the robustness to local changes is very important, while no specific symmetric pattern is expected. Consequently, in the following experiments, we selected a fixed, small value for $M$ ($M = 4$), and used $K$ to examine the dependence of tie-point performance on the number of generated sub-images.

TABLE II: $\Delta\theta$ IMPACT ON THE TIE-POINT MATCHING ALGORITHM. EACH SCORE IS THE RATIO OF THE MEAN SCORE WHEN $\Delta\theta = \pi/720$ DIVIDED BY THE MEAN SCORE WHEN $\Delta\theta = \pi/180$

Fig. 5. A log-linear plot of the coupled decomposition runtime ratio versus the image size. Red dashed line: $T_{CD}/T_0$. Black solid line: $T_{CD}/T_1$.

In the experiments, we examined five alternatives for $K$ ($1 \le K \le 5$), two profile vector quantization angles ($\Delta\theta \in \{\pi/180, \pi/720\}$) and three overlap ratios ($a \in \{0\%, 20\%, 50\%\}$). The parameter tests were made mutually independent by neutralizing the effect of the other two parameters. For example, when the computational time $T$ relative to $\Delta\theta$ was examined, then (for each image pair of Table I) $T(K, a, \Delta\theta)$ was estimated for all valid parameter combinations, and the ratio $\hat{T}$ of the mean $T$ value of combinations with $\Delta\theta = \pi/720$ over the mean $T$ value of combinations with $\Delta\theta = \pi/180$ was the evaluation output. Table II summarizes the algorithm performance as a function of the profile vector quantization angle $\Delta\theta$. The reported scores are the mean and the maximum computational time ($E[\hat{T}]$, $Max(\hat{T})$), the maximum computational time if the Phobos image pairs are ignored ($Max(\hat{T}^*)$), the mean, maximum and minimum number of tie-points ($E[\hat{C}]$, $Max(\hat{C})$, $Min(\hat{C})$), the rigid mean reprojection error ($E[\hat{Err}_R]$) and the number of image pairs (out of 30) for which the rigid reprojection error of $\Delta\theta = \pi/720$ is smaller than that of $\Delta\theta = \pi/180$ ($\#(\hat{Err}_R < 1)$). Each score is the ratio of the mean score when $\Delta\theta = \pi/720$ divided by the mean score when $\Delta\theta = \pi/180$.

Table II provides evidence that $\Delta\theta$ does not play a critical role in the algorithm performance. As a matter of fact, all three metrics (i.e. computational cost, tie-point quantity and reprojection error) seem to fluctuate randomly around 1 for all large-image pairs. Following these clues, we decided in our implementation to use a fixed value of $\Delta\theta = \pi/720$.

TABLE III: THE IMPACT OF THE OVERLAP RATIO $a$ ON THE IMAGE MATCHING ALGORITHM. THE RESULTS ARE NORMALIZED BY DIVIDING THE ACHIEVED SCORES BY THOSE CORRESPONDING TO 0% OVERLAP

TABLE IV: INCREASE OF COMPUTATIONAL TIME AND NUMBER OF TIE-POINTS VERSUS OVERLAP RATIO $a$

It should be noted that in the small-image pair sub-set (the Phobos sub-set) the relative computational time seems to increase with $\Delta\theta$. As a matter of fact, the mean relative computational time for this sub-set is 1.364, while all 6 pairs exhibit a ratio higher than 1.199. The reason for this difference between small-image and large-image pairs is that the quantization density has an effect on the required coupled decomposition time, which constitutes a significant part of the computational workload only in small-image pairs.

This is further shown in Fig. 5, in which the ratio of the coupled decomposition runtime T_CD over the total required image matching time is plotted. When no coupled decomposition is used, the total time is denoted by T_0 (T_CD/T_0 is shown as the red line), while when coupled decomposition is used it is denoted by T_1 (T_CD/T_1 is shown as the black line). Runtime is plotted against the "mean pair size" (i.e., the harmonic mean of the pixel counts of the two images) for the first five datasets. The HiRISE dataset is excluded, since its T_0 is prohibitively large. As an indication, the "largest" included pair is No. 21, for which image matching without coupled decomposition requires more than 10 days on a 2.7 GHz Intel Core i7, 16 GB RAM workstation. The size scale represented in Fig. 5 covers images from less than 1 megapixel to 100 megapixels. It can be deduced that, as the size increases, the coupled decomposition stage constitutes a progressively smaller part of the image matching workload. Finally, the red line shows that T_CD > T_0 for images smaller than 1-2 megapixels. Consequently, image matching computational time can benefit from coupled decomposition only when dealing with images larger than 1-2 megapixels.
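For reference, the "mean pair size" of the x-axis can be computed as follows (a trivial helper; the function name is ours):

    def mean_pair_size(n_pixels_ref, n_pixels_tgt):
        # Harmonic mean of the pixel counts of the two images.
        return 2.0 * n_pixels_ref * n_pixels_tgt / (n_pixels_ref + n_pixels_tgt)

    # Example: a 4-megapixel reference against a 16-megapixel target
    print(mean_pair_size(4e6, 16e6))  # 6400000.0, i.e. a 6.4-megapixel pair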

As already noted, three overlap levels were tested: 0%, 20% and 50%. Table III summarizes the impact of the overlap ratio on the employed experimental dataset. The reported scores in Table III are the mean computational times, the mean number of estimated tie-points and the mean reprojection error, for both 20% and 50% overlap. These scores follow the same convention as in Table II, i.e., they are normalised by the corresponding scores when no overlap is used. The first conclusion that can be drawn from this table is that a high overlap ratio enhances both the quality (i.e., the reprojection error) and the quantity of the extracted tie-points, but at a considerable cost in computational resources.

Fig. 6. Image pair 12 (Table I). The right image is a close-up of the encircled area of the left image. The image sizes are asymmetrically adjusted for printing reasons. The actual size of the right image is 1/3 of the left image.

In order to gain more complete insight, the computational time and tie-point quantity scores are reported separately for each dataset in Table IV. This table provides evidence that the overlap triggers a quadratic increase in the required computational time when dealing with images of size on the order of tens of megapixels. As the image size exceeds 1 gigapixel, the computational time increase becomes approximately cubic. On the other hand, the increase in the quantity of tie-points does not depend on the image size but on whether the specific pair setup favours a correct coupled decomposition. For example, if the local changes on the target image are so extensive that the assumptions of Algorithm 1 do not hold, then the resulting discrepancy in the coupled decomposition output may be alleviated through the sub-image enlargement implied by a non-zero overlap ratio. Such a case is shown in Fig. 6, which depicts image pair No. 12. The two images of pair 12 only partially match, thus the increase in the number of tie-points achieved with a 50% overlap exceeds 120%. Conversely, when the reference and the target image differ mostly in terms of geometric transformations, additive noise and global radiometric differences, coupled decomposition benefits less from a non-zero degree of overlap. Consequently, selecting the employed overlap ratio a depends on an a priori estimate of the extent of the non-invariant transformations required to match the reference and the target image. In our implementation we selected a = 20% as a tradeoff between accuracy and speed.
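To make the cost effect of a concrete, the sketch below enlarges regular-grid windows by an overlap ratio a. Coupled decomposition does not operate on a regular grid, so this is only an analogue illustrating why the per-sub-image area, and hence the matching workload, grows with a:

    def overlapped_windows(width, height, rows, cols, a=0.2):
        # Yield (x0, y0, x1, y1) windows; each side of a cell is padded by
        # a fraction a/2 of the cell size, clipped to the image borders.
        cell_w, cell_h = width / cols, height / rows
        pad_x, pad_y = a * cell_w / 2, a * cell_h / 2
        for r in range(rows):
            for c in range(cols):
                x0 = max(0.0, c * cell_w - pad_x)
                y0 = max(0.0, r * cell_h - pad_y)
                x1 = min(float(width), (c + 1) * cell_w + pad_x)
                y1 = min(float(height), (r + 1) * cell_h + pad_y)
                yield (x0, y0, x1, y1)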

The final parameter that is analysed is the number of iterations, K. As already mentioned, this is related to the required local change robustness and the false rejection ratio. Additionally, as shown in Fig. 7, the computational complexity gain is controlled mainly through the tuning of K. Fig. 7 plots the logarithm of the computational time required for tie-point matching (counted in seconds) as a function of the image pair size, for different K values.

Fig. 7. The logarithm of the required computational time, counted in seconds, as a function of the pair size.

Fig. 7 implies that there is a certain "size zone" in which coupled decomposition achieves optimal speed for a fixed value of K. For images on the order of a few megapixels, 1-2 iterations achieve the optimal speed, while K = 3 is proposed for images of size around 10 megapixels, K = 4 for images between 30 and 100 megapixels, and K = 5 for images on the order of hundreds of megapixels. Finally, K = 6 gives the fastest image matching pipeline for image pairs of several gigapixels or more. In this work, K is tuned to the maximum value that allows each sub-image to contain, on average, 1,000 SIFT points. This choice, which is consistent with the analysis of Section III, implies that 2 ≤ K ≤ 6.
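A minimal sketch of this tuning rule, assuming the total SIFT keypoint count of the pair is known (the function name is ours; the paper reports that the resulting values fall in the range 2 to 6):

    def select_K(total_keypoints, M=4, K_max=6, target_per_subimage=1000):
        # Largest K (up to K_max) such that the average number of keypoints
        # per sub-image, total_keypoints / M**K, stays at or above the target.
        K = 1
        while K < K_max and total_keypoints / M ** (K + 1) >= target_per_subimage:
            K += 1
        return K

    # Example: ~5 million keypoints across the image -> K = 6
    print(select_K(5_000_000))  # 6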

C. Method Comparison

After tuning the coupled decomposition algorithms, we continue with the evaluation of their performance in comparison both with full-image matching and with other state-of-the-art techniques. More specifically, the included techniques are:

• The full-image matching algorithm (named F), which performs straightforward tie-point extraction without involving coupled decomposition or any other sub-image matching approach. This is the baseline technique.
• The mean-based coupled decomposition algorithm (named CD_c), using the parameters implied by the previous section.
• The match-based coupled decomposition algorithm (named CD_m), which also uses the parameters implied by the previous section.
• The pyramidal scheme of [21] (named P_v), which establishes matches at the coarsest levels and uses them to decompose the image planes into corresponding sub-images through Voronoi tessellation.
• The pyramidal scheme of [20] (named P_r), which registers the images at a coarse level and decomposes the registered images into 4 corresponding sub-images, using the centre as the splitting point.

As already mentioned, for all methods the number of retrieved tie-points was the same and was determined by F. More specifically, full-image matching was followed by RANSAC [23] to prune any erroneous matches. The number of tie-points retrieved by this method for each image pair of the dataset is shown in the second column of Table V. The exception is the mixed pairs (i.e., pairs 25-27), for which full-image matching (as well as the state-of-the-art methods) fails; the number of tie-points in this case is the number of tie-points estimated from CD_m.
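A hedged sketch of this pruning step, using OpenCV's RANSAC-based homography estimation (the homography model and the threshold are our assumptions; the paper only states that RANSAC [23] is used):

    import numpy as np
    import cv2

    def prune_matches(src_pts, dst_pts, reproj_thresh=3.0):
        # src_pts, dst_pts: (N, 2) arrays of putative tie-point coordinates.
        # Keep only the matches that are inliers to a RANSAC-fitted homography.
        H, mask = cv2.findHomography(
            src_pts.reshape(-1, 1, 2).astype(np.float32),
            dst_pts.reshape(-1, 1, 2).astype(np.float32),
            cv2.RANSAC, reproj_thresh)
        inliers = mask.ravel().astype(bool)
        return src_pts[inliers], dst_pts[inliers], H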


TABLE V
Tie-point accuracy and computational cost of coupled decomposition methods (CD_c and CD_m) in comparison to full-image matching (F) and pyramidal schemes (P_v and P_r).

Apart from the first two columns, Table V summarises the performance of the 5 examined techniques regarding both computational time and warping mean reprojection error. A first conclusion that can be drawn from Table V is the extreme computational cost of full-image matching when dealing with large images, which is not justified by its performance. As a matter of fact, full-image matching may be tens of times slower than all of the included sub-image matching techniques, while not being more accurate than them. For example, full-image matching of image pair 9 requires more than 3 days and its mean reprojection error is 2.2742 pixels, while CD_m requires almost 3 hours and its mean reprojection error is 0.8797 pixels.

In summary, F is less accurate than CD_c in 16/27 image pairs, than CD_m in 17/27 image pairs, than P_v in 13/27 image pairs and than P_r in 5/27 image pairs. When focusing on large images the full-image accuracy is even worse, as expected from the analysis of Section III. Of the 4 largest image pairs (pairs 9, 12, 20 and 21), only for pair 21 does full-image matching perform better than sub-image matching. On the other 3 pairs its performance is far worse than the best achieved by sub-image matching. More specifically, for pair 9 its mean reprojection error is 2.59 times larger than that of CD_m, for pair 12, 7.98 times larger than that of CD_c, and for pair 20, 2.03 times worse than that of CD_m. The most characteristic example is pair 9, which consists of large and noisy images (due to dust in the atmosphere at the time they were acquired). The low quality of the images, along with their large size, causes full-image matching to be much worse, both in terms of accuracy and of computational cost, than all examined sub-image matching techniques.

Moving on to the comparison of individual techniques, from the 27 image pairs the maximum accuracy was achieved by CD_m on 12 pairs, by CD_c on 7 pairs, by F on 4 pairs, by P_v on 3 pairs and by P_r on 1 pair, i.e., the coupled decomposition techniques outperformed the pyramidal schemes as well as full-image matching on 19/27 image pairs. Additionally, while coupled decomposition is consistently much faster than full-image matching for large images (reaching a factor of 26.5 for image pair 9 and match-based coupled decomposition), it is faster than the pyramidal schemes in half of the examined pairs. More specifically, a coupled decomposition variant required the least computational time of all 5 techniques in 13/27 image pairs, usually CD_m, since the non-zero overlap ratio of mean-based coupled decomposition increased the corresponding computational cost. Even when CD_m did not require the least time, its cost was near the optimal value. Actually, CD_m required less than 1.25 T_opt (where T_opt is the optimal time) for all but 2 image pairs (pairs 5 and 16). On the other hand, while the computational cost of CD_c was usually larger than that of the other sub-image matching methods, it never reached prohibitive levels. For the 23 image pairs on which CD_c did not fail, 16 required less than 1 hour to match the corresponding images.

