Extensions to the Standard RANSAC Algorithm for Efficiency and Robustness


*Corresponding author: Wenyan Ci

ISSN: 0976-3031

Research Article


Wenyan Ci, Jue Wang, Mingxiang Zhu and Aiyu Dou

School of Electric Power Engineering, Nanjing Normal University Taizhou College, Taizhou, China

DOI: http://dx.doi.org/10.24327/ijrsr.2018.0912.2942

ABSTRACT

Over the past decade, many improvements to RANSAC have been proposed, each addressing specific weaknesses of the original algorithm. However, there are relatively few comprehensive studies of these developments. The purpose of this paper is to fill this gap by surveying the related techniques, so as to promote the development of new algorithms. The paper first introduces the standard RANSAC algorithm and discusses its mechanism and limitations. It then introduces several extensions of RANSAC in detail. These extensions address the limitations in robustness and efficiency. By combining these ideas, the real-time performance and robustness of the algorithm can be further improved.

INTRODUCTION

The RANSAC algorithm proposed by Fischler and Bolles (Fischler and Bolles, 1987) is a well-known method for estimating model parameters in the presence of outliers. It is built on an iterative hypothesize-and-verify framework and is robust to contaminated observation data. The basic idea is to randomly draw a sample set from the complete data set, compute the model parameters from it, and then verify the remaining data points against those parameters. After many iterations, the model parameters that achieve the highest consistency among the data points are taken as the solution of the model, while the data points inconsistent with these parameters are regarded as outliers. The size of the sample set is usually set to the minimum number of points needed to solve for the model parameters. For example, the value is usually 3 in stereo vision.

The standard RANSAC framework consists of a hypothesis stage and a verification stage, which are iterated over a set of observations. In the first stage, a set of randomly selected points is used to instantiate the model. The cardinality of this sampling set equals the minimum number of points needed to determine the model parameters, so it is also called the minimal set. In the second stage, the algorithm determines the observations that are consistent with the model, which constitute the consistent set (also known as the consensus set). Consistency is usually defined by a predefined error threshold. After several iterations, the algorithm returns the model that obtains the largest consistent set.
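The two stages described above can be sketched in a few lines. The following is a minimal illustration rather than a reference implementation: it assumes a 2D line model (so the minimal set has 2 points, not the 3 mentioned for stereo vision), a vertical-distance error, and hypothetical names such as `ransac_line`; the synthetic data and the threshold are also assumptions made for the example.

```python
import numpy as np

def ransac_line(points, threshold, iterations, rng):
    """Fit y = a*x + b with basic RANSAC; returns (a, b) and the inlier mask."""
    best_model, best_mask = None, np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        # Hypothesis stage: draw a minimal set (2 points determine a line).
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:  # degenerate sample for this parameterisation
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Verification stage: points within the error threshold form the consistent set.
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        mask = residuals < threshold
        if mask.sum() > best_mask.sum():
            best_model, best_mask = (a, b), mask
    return best_model, best_mask

# Synthetic data (assumed for illustration): 50 noisy points on y = 2x + 1 plus 20 outliers.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
inliers = np.column_stack([x, 2.0 * x + 1.0 + rng.normal(0.0, 0.05, x.size)])
outliers = rng.uniform(0.0, 30.0, size=(20, 2))
data = np.vstack([inliers, outliers])
model, mask = ransac_line(data, threshold=0.3, iterations=200, rng=rng)
```

The returned mask marks the consistent set of the best model; a practical implementation would usually refit the model to all points in that set.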

RANSAC is a non-deterministic algorithm, i.e., the result may differ from one execution to the next. It produces a reasonable result only with a certain probability, and this probability grows with the number of iterations. RANSAC has become a general method for rejecting outliers in visual navigation systems (Maimone et al., 2007; Deigmoeller and Eggert, 2016). In recent years, several improved versions of RANSAC have appeared. This paper discusses the proposed methods, which focus on the consensus measure, the sampling strategy, and hypothesis verification.
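The link between the number of iterations and the success probability can be made concrete with the standard iteration-count estimate, which is not stated in this paper and is added here as an illustration. It assumes that every data point is an inlier independently with the same ratio; the function name and the example values are hypothetical.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence):
    """Smallest k such that 1 - (1 - w**m)**k >= confidence."""
    p_good_sample = inlier_ratio ** sample_size  # all m drawn points are inliers
    if p_good_sample <= 0.0:
        raise ValueError("inlier_ratio must be positive")
    if p_good_sample >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))

# Example: 50% inliers, minimal set of 3 points, 99% confidence.
k = ransac_iterations(inlier_ratio=0.5, sample_size=3, confidence=0.99)  # 35
```

Under these assumptions, lowering the inlier ratio or enlarging the minimal set increases the required number of iterations sharply, which is why the sampling and verification strategies discussed later matter.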

Consensus Measure

The loss function of the RANSAC algorithm assigns a cost of 0 to every inlier, regardless of how the error differs between inliers. When the threshold $T^2$ is set larger, more solutions share the same cost $C$, which makes the estimate of the model worse. This section introduces two robust consensus measures.


International Journal of Recent Scientific Research

Vol. 9, Issue, 12(A), pp. 29842-29846, December, 2018

Copyright © Wenyan Ci et al, 2018, this is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited.

DOI: 10.24327/IJRSR

CODEN: IJRSFP (USA)

Article History:

Received 6th September, 2018

Received in revised form 15th October, 2018

Accepted 12th October, 2018

Published online 28th December, 2018

Key Words:


MSAC

Inspired by M-estimators, Torr and Zisserman (Torr and Zisserman, 2000) proposed a new form of loss function:

$$\rho_{MSAC}(e_i^2) = \begin{cases} e_i^2, & e_i^2 < T^2 \\ T^2, & e_i^2 \ge T^2 \end{cases} \tag{1}$$

where $e_i$ is the error of data point $i$ and $T$ is the error threshold. The function value of the outliers is still a constant, but the function value of the inliers now depends on how well each data point fits the model. This new estimator is called MSAC (M-estimator Sample Consensus). Torr and Zisserman (2000) report that the accuracy of MSAC is 5-10% higher than that of standard RANSAC for epipolar geometry problems.
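A small sketch can show why the truncated loss of formula (1) separates hypotheses that the plain RANSAC loss cannot. The threshold is written `t2` (the squared threshold), the outlier constant of the RANSAC cost is assumed to be the same `t2` so that both costs share one scale, and the squared errors are made-up values.

```python
import numpy as np

def ransac_cost(sq_errors, t2):
    # Inliers cost 0, outliers cost a constant (assumed here to be t2).
    return np.where(sq_errors < t2, 0.0, t2).sum()

def msac_cost(sq_errors, t2):
    # Inliers cost their squared error, outliers cost the constant t2.
    return np.minimum(sq_errors, t2).sum()

t2 = 1.0
tight_fit = np.array([0.01, 0.02, 0.01, 9.0])  # three well-fitted inliers, one outlier
loose_fit = np.array([0.90, 0.80, 0.95, 9.0])  # three barely-inside inliers, same outlier

same_ransac = ransac_cost(tight_fit, t2) == ransac_cost(loose_fit, t2)  # True
msac_tight = msac_cost(tight_fit, t2)   # 1.04
msac_loose = msac_cost(loose_fit, t2)   # 3.65
```

Both hypotheses have three inliers, so RANSAC ranks them equally, while MSAC prefers the hypothesis whose inliers fit more tightly.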

MLESAC

MLESAC (Maximum Likelihood Estimation Sample Consensus) (Torr and Zisserman, 2000) develops the idea of MSAC further. Instead of counting inliers, it maximizes the following objective function:

$$C_{MLE} = \sum_i \log p(e_i \mid z) \tag{2}$$

where $z$ denotes the model parameters and $p(e_i \mid z)$ is the probability distribution model of the error. To account for both noise and outliers, the error distribution of the data points is expressed as a mixture of a Gaussian distribution and a uniform distribution. Assuming that the errors of the inliers follow the Gaussian distribution $N(0, \sigma)$ with standard deviation $\sigma$ and the errors of the outliers follow the uniform distribution $U(0, \nu)$, the following distribution model is established:

$$p_{MLE}(e_i \mid z) = \gamma \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{e_i^2}{2\sigma^2}\right) + (1 - \gamma)\frac{1}{\nu} \tag{3}$$

where the mixing parameter $\gamma$ is the proportion of inliers in the observation set, whose value can be estimated by the EM (Expectation Maximization) algorithm, and $\nu$ is a constant reflecting the size of the search window in the matching algorithm. Like MSAC, MLESAC improves accuracy by 5-10% compared with standard RANSAC, but at an increased computational cost.
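The mixture model and the EM estimate of the mixing parameter can be sketched as follows. This is an illustrative sketch under stated assumptions: the mixing parameter is written `gamma`, the Gaussian standard deviation `sigma` and the uniform width `nu` are treated as known and held fixed, only `gamma` is updated by EM, and the residuals are synthetic.

```python
import numpy as np

def mixture_terms(errors, gamma, sigma, nu):
    """Inlier (Gaussian) and outlier (uniform) terms of the mixture for each error."""
    inlier = gamma * np.exp(-errors**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
    outlier = np.full_like(errors, (1.0 - gamma) / nu)
    return inlier, outlier

def estimate_gamma(errors, sigma, nu, gamma=0.5, steps=20):
    """EM updates for the mixing parameter only (sigma and nu are held fixed)."""
    for _ in range(steps):
        p_in, p_out = mixture_terms(errors, gamma, sigma, nu)
        responsibility = p_in / (p_in + p_out)  # E-step: posterior of being an inlier
        gamma = responsibility.mean()           # M-step: new inlier proportion
    return gamma

def mlesac_score(errors, gamma, sigma, nu):
    """Log-likelihood of the errors under the mixture; larger is better."""
    p_in, p_out = mixture_terms(errors, gamma, sigma, nu)
    return np.log(p_in + p_out).sum()

# Synthetic residuals (assumed): 80 Gaussian inliers and 20 uniform outliers.
rng = np.random.default_rng(1)
errors = np.concatenate([rng.normal(0.0, 1.0, 80), rng.uniform(-50.0, 50.0, 20)])
gamma_hat = estimate_gamma(errors, sigma=1.0, nu=100.0)
score_good = mlesac_score(errors, gamma_hat, sigma=1.0, nu=100.0)
score_bad = mlesac_score(errors + 5.0, gamma_hat, sigma=1.0, nu=100.0)  # a worse model
```

The EM loop runs once per hypothesis in this form, which is the source of the extra computational cost noted above.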

Sampling Strategy

In many implementations, it may be possible to incorporate prior information that allows observations to be scored by how likely they are to be inliers. This can have a dramatic effect on the efficiency of RANSAC, particularly for low inlier ratios. Several approaches take advantage of this fact to generate better sample sets.

Guided-MLESAC

One disadvantage of MLESAC is the need to calculate the mixing parameter $\gamma$. However, this parameter reflects a prior property of the data and does not depend on any single model. It is reasonable to assume that a better hypothesis yields a larger $C_{MLE}$ for any value of the mixing parameter (Tordoff and Murray, 2002). Therefore, the mixing parameter need not be solved in each iteration. To select the best hypothesis, it is sufficient to use a fixed mixing parameter (e.g., $\gamma = 0.5$) for all hypotheses.

Furthermore, if an independent prior on whether each matching pair is correct is available, better results can be obtained than with a global mixing parameter. Formula (3) can be rewritten as follows:

$$p_{G\text{-}MLE}(e_i \mid z) = p(v_i)\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{e_i^2}{2\sigma^2}\right) + (1 - p(v_i))\frac{1}{\nu} \tag{4}$$

where $v_i$ is the indicator variable of whether data point $i$ is an inlier, and $p(v_i)$ is its prior probability.

In two-view geometry, the prior probability can be obtained from the matching scores of feature matching. Assume that feature point $i$ has $n_i$ possible matches, whose correctness is denoted by $v_{ij}$ and whose matching scores are denoted by $s_{ij}$, where $j = 1 \ldots n_i$. For small-baseline image pairs, the distribution of the matching score $s_{ij}$ of a mismatch $\bar{v}_{ij}$ can be expressed empirically as follows:

$$p(s_{ij} \mid \bar{v}_{ij}) \approx \frac{3}{4}(1 - s_{ij})^2 \tag{5}$$

For a correct match, we have:

$$p(s_{ij} \mid v_{ij}) \approx a\,\frac{1 - s_{ij}}{\alpha^2}\exp\left[-\left(\frac{1 - s_{ij}}{\alpha}\right)^2\right] \tag{6}$$

where $a$ is a normalization constant and $\alpha$ is a "compactness" parameter. Suppose that all possible matches of feature $i$ have the same prior, i.e., $p(v_{ij}) = 1/(n_i + 1)$ and $p(\bar{v}_{ij}) = n_i/(n_i + 1)$. Finally, given all matching scores, the probability that match $ij$ is correct can be obtained:

$$p(v_{ij} \mid s_{i,1 \ldots n_i}) = \frac{p(s_{ij} \mid v_{ij}) \prod_{k \ne j}^{n_i} p(s_{ik} \mid \bar{v}_{ik})}{\prod_{k}^{n_i} p(s_{ik} \mid \bar{v}_{ik}) + \sum_{j}^{n_i}\left[p(s_{ij} \mid v_{ij}) \prod_{k \ne j}^{n_i} p(s_{ik} \mid \bar{v}_{ik})\right]} \tag{7}$$

$p(v_{ij} \mid s_{i,1 \ldots n_i})$ can be computed from $p(s_{ij} \mid \bar{v}_{ij})$ and $p(s_{ij} \mid v_{ij})$, and it can then be used as an estimate of $p(v_i)$ in $p_{G\text{-}MLE}(e_i \mid z)$ of formula (4).
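The posterior of formula (7) is easy to evaluate once the two score likelihoods are available for every candidate match of a feature. The sketch below takes those likelihood values as plain inputs instead of evaluating formulas (5) and (6), so it does not depend on the particular score model; the function name and the example numbers are hypothetical.

```python
import numpy as np

def match_posteriors(p_correct, p_wrong):
    """Posterior that each candidate match of one feature is the correct one.

    p_correct[j] is the score likelihood of candidate j given that it is correct,
    p_wrong[j] is the score likelihood of candidate j given that it is a mismatch.
    """
    p_correct = np.asarray(p_correct, dtype=float)
    p_wrong = np.asarray(p_wrong, dtype=float)
    n = len(p_correct)
    # Likelihood of "candidate j is correct and all other candidates are mismatches".
    joint = np.array([p_correct[j] * np.prod(np.delete(p_wrong, j)) for j in range(n)])
    # Likelihood of "no candidate is correct".
    none_correct = np.prod(p_wrong)
    return joint / (none_correct + joint.sum())

# One feature with two candidate matches (made-up likelihood values).
posteriors = match_posteriors(p_correct=[0.9, 0.1], p_wrong=[0.2, 0.6])
```

The posteriors of one feature sum to less than 1 because the event that none of the candidates is correct keeps some probability mass; the uniform priors cancel and therefore do not appear in the code.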

PROSAC

The PROSAC (Progressive Sample Consensus) algorithm (Chum and Matas, 2005) uses the quality of the data points as a metric to prioritize the generation of hypotheses that are more likely to be valid. Unlike RANSAC, which draws samples from all the data, PROSAC draws its samples from a progressively larger subset of the highest-quality data. In fact, the samples drawn by PROSAC are the same as those drawn by RANSAC, only in a different order. PROSAC begins by testing the most promising hypotheses. As the quality score of the data points decreases, its sampling strategy gradually converges to that of RANSAC. In PROSAC, matching pairs are sorted in descending order of similarity. If the number of matching pairs is $N$ and the corresponding ordered set is $U_N$, then:

$$u_i, u_j \in U_N:\; i < j \Rightarrow q(u_i) \ge q(u_j) \tag{8}$$

where $q(\cdot)$ is a similarity function.

Consider randomly drawing $T_N$ samples from all $N$ matching pairs, and denote the resulting sequence by $\{M_i\}_{i=1}^{T_N}$. Suppose that the sequence $\{M_{(i)}\}_{i=1}^{T_N}$ contains the same samples as $\{M_i\}_{i=1}^{T_N}$, but sorted by sample quality from high to low. A subset of the complete set $U_N$ is denoted by $U_n$, which contains the $n$ highest-quality data points. Let $m$ be the size of a sample. If the sequence $\{M_i\}_{i=1}^{T_N}$ contains on average $T_n$ samples whose data points all come from $U_n$, then:

$$T_n = T_N \frac{\binom{n}{m}}{\binom{N}{m}} = T_N \prod_{i=0}^{m-1} \frac{n - i}{N - i} \tag{9}$$

Thus:

$$\frac{T_{n+1}}{T_n} = \frac{T_N}{T_N} \prod_{i=0}^{m-1} \frac{n + 1 - i}{N - i} \prod_{i=0}^{m-1} \frac{N - i}{n - i} = \frac{n + 1}{n + 1 - m} \tag{10}$$

Finally, the recursive formula for $T_{n+1}$ is obtained:

$$T_{n+1} = \frac{n + 1}{n + 1 - m} T_n \tag{11}$$

$T_n$ samples come from $U_n$, and $T_{n+1}$ samples come from $U_{n+1}$. Since the subsets $U_{n+1}$ and $U_n$ satisfy $U_{n+1} = U_n \cup \{u_{n+1}\}$, there are $T_{n+1} - T_n$ samples that contain the data point $u_{n+1}$ and $m - 1$ data points from $U_n$. Therefore, to generate the sequence $M_{(i)}$ in descending order of sample quality, $T_{n+1} - T_n$ samples can be drawn, each consisting of the data point $u_{n+1}$ and $m - 1$ data points randomly sampled from $U_n$, where $n = m \ldots N$.

Because good hypotheses are generated early in the sampling process, PROSAC is significantly more efficient than RANSAC. However, when the quality scores are not very informative (e.g., in scenes with significant repetitive structure), the improvement is small.
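The progressive sampling order implied by formulas (9) and (11) can be sketched with a generator. This is a simplified illustration, not the full algorithm of Chum and Matas: it rounds the growth $T_{n+1} - T_n$ up to an integer, omits the stopping criteria and the fallback to uniform sampling, and assumes that the data indices are already sorted by descending quality (index 0 is the best match). All names are hypothetical.

```python
import itertools
import math
import random

def prosac_samples(num_points, m, t_total, rng):
    """Yield index tuples in PROSAC order; indices are sorted by descending quality."""
    # T_m from formula (9) with n = m.
    t_n = float(t_total)
    for i in range(m):
        t_n *= (m - i) / (num_points - i)
    # The only sample that lies inside U_m is the set of the m best points.
    yield tuple(range(m))
    for n in range(m, num_points):
        t_next = t_n * (n + 1) / (n + 1 - m)   # formula (11)
        count = math.ceil(t_next - t_n)        # samples that introduce the next point
        for _ in range(count):
            rest = rng.sample(range(n), m - 1)  # m-1 points from the n best points
            yield tuple(sorted(rest)) + (n,)    # plus the newly added point (0-based index n)
        t_n = t_next

# First ten samples for N = 10 points, sample size m = 3, and an assumed T_N = 1000.
generator = prosac_samples(num_points=10, m=3, t_total=1000, rng=random.Random(0))
first_samples = list(itertools.islice(generator, 10))
```

With these values, the first sample uses the three best points, and the samples that follow all contain the fourth-best point together with two of the top three, which is the early concentration on high-quality data that the text describes.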

GroupSAC

GroupSAC (Kai et al., 2009) assumes that the data contain natural groupings, some of which have a higher proportion of inliers while others have a higher proportion of outliers. Such natural groupings exist in most problems of interest, e.g., feature pairs can be grouped according to optical flow in wide-baseline matching.

In the standard RANSAC algorithm and many of its variants, it is assumed that the probability of a data point being an inlier is independent of the other data points. For any minimal set $S$ with $m$ data points, the number of its inliers $I_S$ follows the distribution $I_S \sim B(m, \varepsilon)$, where $B(m, \varepsilon)$ denotes the binomial distribution and $\varepsilon$ is the Bernoulli trial parameter, i.e., the probability that a point in $S$ is an inlier. Therefore, the probability that all points in $S$ are inliers can be expressed as follows:

$$p(I_S = m) = \prod_{x_i \in S} p(v_i = 1) \tag{12}$$

where $v_i$ is the indicator variable of whether data point $x_i$ is an inlier. In the absence of prior information, this probability is $\varepsilon^m$. Although many works (Chum and Matas, 2005; Tordoff and Murray, 2002) recognize that the inlier probability is not necessarily the same for different data points, they still treat the inlier probabilities as independent.
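As a brief numerical illustration of formula (12) under the independence assumption, the sketch below compares a minimal set drawn without prior information against one whose points carry individual inlier probabilities. The probability values are made up, and the function name is hypothetical.

```python
import math

def all_inlier_probability(inlier_probs):
    """Probability that every point of a minimal set is an inlier, assuming independence."""
    return math.prod(inlier_probs)

# No prior information: every point has the same inlier probability, so the result is 0.5**3.
uniform_case = all_inlier_probability([0.5, 0.5, 0.5])   # 0.125
# Per-point priors, still treated as independent of each other.
guided_case = all_inlier_probability([0.9, 0.8, 0.7])    # 0.504
```

GroupSAC departs from this product form by letting the inlier probabilities of points in the same group depend on each other.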


Hypotheses Verification

The efficiency of the RANSAC algorithm largely depends on the number of outliers and the total number of data points. The larger the proportion of outliers, the more contaminated samples are drawn. The models generated from contaminated samples have arbitrary parameters that carry no useful information, yet standard RANSAC tests each of these wrong models against all data points. A lot of computing time is therefore spent on meaningless tests. Some methods attempt to optimize the verification stage by reducing the time spent on evaluating incorrect models.

Although the number of samples required by RANSAC depends on the proportion of inliers and the confidence level, the total number of data points $N$ also affects the computation time, because each hypothesized model has to be tested against every data point. In particular, the total run time of RANSAC can be expressed as follows:

$$t = k(t_G + N t_v) \tag{13}$$

where $k$ is the number of samples, $t_G$ is the time required to calculate the model parameters, i.e., the time required to generate a hypothesis, and $t_v$ is the time required to verify the model against one data point. The formula shows that the value of $N$ directly affects the time required for the verification stage. Uncontaminated samples generated by RANSAC are usually very few. If we assume that the consistent sets of models from contaminated samples are very small, these bad models can be abandoned early in the verification stage. This assumption holds except for degenerate data.

In recent years, many works have focused on making the verification stage of RANSAC more efficient. The proposed methods share the same basic idea: perform a statistical test on a small number of data points, and discard or accept the model based on the test result. In this case, Formula (13) can be rewritten as follows:

$$t = k(t_G + \bar{t}) \tag{14}$$

The time required to test one data point is taken as the unit of time, and $\bar{t}$ is the average number of data points against which a model is verified. The purpose of the statistical test is to reduce the number of data points that need to be tested, thus reducing the time complexity of the verification stage. However, because the test may mistakenly reject a "good" model, the number of samples must be increased to maintain a given level of confidence in the solution. Generally speaking, the more data points are tested, the less likely a "good" model is to be rejected. A balance therefore has to be found between the number of test points and the number of samples.

R-RANSAC Based on the $T_{d,d}$ Test

R-RANSAC (Matas and Chum, 2004) first performs a pre-test on a small subset of $d$ data points, where $d \ll N$. Only when all $d$ data points pass the pre-test does the algorithm continue to test the remaining $N - d$ points; otherwise the verification stops.

Two situations need to be considered. First, a "good" sample is drawn with probability $P_I$. If this sample passes the pre-test, which happens with probability $\alpha$, then all $N$ data points are tested. Otherwise, with probability $1 - \alpha$, only $\bar{t}_\alpha$ data points need to be tested on average. Second, a "bad" sample is drawn with probability $1 - P_I$. If this sample passes the pre-test, which happens with probability $\beta \ll \alpha$, then all $N$ data points are tested. Otherwise, with probability $1 - \beta$, only $\bar{t}_\beta$ data points need to be tested on average. The average number of data points to be tested for each sample can be expressed as a function of $d$:

$$\bar{t}(d) = P_I\left(\alpha N + (1 - \alpha)\bar{t}_\alpha\right) + (1 - P_I)\left(\beta N + (1 - \beta)\bar{t}_\beta\right) \tag{15}$$

$\alpha$ and $\beta$ can be calculated by the following formula:

$$\alpha = \varepsilon^d, \quad \beta = \delta^d \tag{16}$$

where $\varepsilon$ denotes the proportion of inliers and $\delta$ denotes the probability that a data point is consistent with a random model. The average time of the pre-test can be expressed as:

$$\bar{t}_\alpha = \sum_{i=1}^{d} i(1 - \varepsilon)\varepsilon^{i-1}, \quad \bar{t}_\beta = \sum_{i=1}^{d} i(1 - \delta)\delta^{i-1} \tag{17}$$

The R-RANSAC algorithm derives the optimal setting $d = 1$ by minimizing the function $t = k(t_G + \bar{t})$. The $T_{1,1}$ test first randomly draws one data point to test the model. If the model is consistent with this data point, verification continues; otherwise, the model is abandoned and a new sample is generated. As mentioned above, R-RANSAC may mistakenly reject valid hypotheses and therefore needs more samples than standard RANSAC. However, because each hypothesis takes less time to verify, the pre-test generally reduces the running time of the algorithm.
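To get a feeling for the saving, the expected number of tested points per model from formulas (15) to (17) can be evaluated numerically. In this sketch the inlier proportion is written `eps`, the probability that a point is consistent with a random model is written `delta`, and `p_good` stands for the probability of drawing a good sample; the concrete values (1000 points, 50% inliers, a 7-point minimal set) are assumptions chosen for the example.

```python
def expected_tested_points(d, n_points, p_good, eps, delta):
    """Average number of data points verified per model with a d-point pre-test."""
    alpha, beta = eps**d, delta**d                                            # formula (16)
    t_alpha = sum(i * (1 - eps) * eps**(i - 1) for i in range(1, d + 1))      # formula (17)
    t_beta = sum(i * (1 - delta) * delta**(i - 1) for i in range(1, d + 1))
    return (p_good * (alpha * n_points + (1 - alpha) * t_alpha)
            + (1 - p_good) * (beta * n_points + (1 - beta) * t_beta))         # formula (15)

# Assumed setting: N = 1000, eps = 0.5, delta = 0.05, minimal set of 7 points.
p_good = 0.5**7
t_bar = expected_tested_points(d=1, n_points=1000, p_good=p_good, eps=0.5, delta=0.05)
# About 54 points per model on average, versus all 1000 without the pre-test.
```

The number of samples has to grow to compensate for good models that fail the pre-test, so the per-model saving does not translate one-to-one into total run time.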

R-RANSAC Based on SPRT Test

In 2005, Matas and Chum proposed R-RANSAC based on the Sequential Probability Ratio Test (SPRT) (Matas and Chum, 2005). This method uses a likelihood ratio to decide whether the estimated model is a "good" model $H_g$ or a "bad" model $H_b$. The calculation formula is as follows:

$$\lambda_j = \prod_{r=1}^{j} \frac{p(x_r \mid H_b)}{p(x_r \mid H_g)} = \lambda_{j-1}\frac{p(x_j \mid H_b)}{p(x_j \mid H_g)} \tag{18}$$

where $p(x_r \mid H_g)$ and $p(x_r \mid H_b)$ are the probabilities of the observation $x_r$ under a "good" and a "bad" model, respectively. If the $r$-th data point is consistent with the model parameters, then $x_r = 1$, otherwise $x_r = 0$. $p(1 \mid H_g)$ denotes the probability that a randomly drawn data point is consistent with a "good" model. It can be approximated by the proportion of inliers $\varepsilon$ among the data points. Similarly, $p(1 \mid H_b)$ denotes the probability that a randomly drawn data point is consistent with a "bad" model, which can be modeled by a Bernoulli distribution with parameter $\delta$. The SPRT adds data points one at a time and updates the likelihood ratio $\lambda_j$ of formula (18). If the ratio exceeds a threshold $A$, the model is considered a "bad" model; if the ratio does not exceed $A$ after all data points have been tested, the model is considered a "good" model. The threshold $A$ is the main parameter to be determined in the SPRT. Assuming that the parameters $\varepsilon$ and $\delta$ are known, the optimal running time can be achieved by an appropriate setting of $A$. However, these parameters are usually unknown and have to be estimated during the testing process, and the threshold $A$ should be adjusted according to the current estimates. For example, in multi-view geometry, an initial estimate of $\delta$ can be obtained from geometric constraints; an initial value of $\varepsilon$ can be estimated from the maximum number of RANSAC iterations that the user is willing to perform, and updated at any time using the current maximum consistent set. In the multi-view geometry experiments (Chum and Matas, 2008), R-RANSAC based on the SPRT is 2.8 to 10.9 times faster than standard RANSAC, and more efficient in all experiments than R-RANSAC based on the $T_{d,d}$ test.
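The sequential test can be sketched as a loop that updates the likelihood ratio of formula (18) one data point at a time. The sketch assumes that the inlier proportion (`eps`), the probability of consistency with a bad model (`delta`) and the threshold (`threshold_a`) are given; it does not derive the optimal threshold or re-estimate the parameters during the run, and all names and values are hypothetical.

```python
def sprt_verify(consistent_flags, eps, delta, threshold_a):
    """Return (accepted, points_tested) for one model.

    consistent_flags[r] is True when data point r is consistent with the model.
    """
    ratio = 1.0
    for tested, is_consistent in enumerate(consistent_flags, start=1):
        if is_consistent:
            ratio *= delta / eps                  # p(1 | H_b) / p(1 | H_g)
        else:
            ratio *= (1.0 - delta) / (1.0 - eps)  # p(0 | H_b) / p(0 | H_g)
        if ratio > threshold_a:
            return False, tested                  # rejected early as a "bad" model
    return True, len(consistent_flags)            # never exceeded the threshold

# A model that no point supports is rejected after a handful of tests ...
bad_result = sprt_verify([False] * 50, eps=0.5, delta=0.05, threshold_a=100.0)
# ... while a model supported by half of the points is tested on all of them and accepted.
good_result = sprt_verify([True, False] * 25, eps=0.5, delta=0.05, threshold_a=100.0)
```

With these assumed values each inconsistent point multiplies the ratio by 1.9, so the unsupported model is rejected after 8 points instead of 50.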

DISCUSSION AND CONCLUSIONS

As a basic step of image matching, mismatch detection not only identifies mismatched points, but also compensates for the shortcomings of the image matching algorithm and further improves matching accuracy. RANSAC methods are commonly used to detect mismatched points. A number of recent research efforts have attempted to improve the performance of the standard RANSAC algorithm. In this paper, we presented a discussion and comparative analysis of a number of important RANSAC techniques.

Although there are many methods for detecting mismatched points and good results have been achieved in applications, some problems still require further study, for example, how to balance accuracy and efficiency.

Acknowledgment

This research was supported by Taizhou Science and Technology Support (Social Development) Program (Project No. TS201701) and Taizhou Science and Technology Support (Social Development) Program (Project No. SSF20180055).

References

Fischler, M. A. and Bolles, R. C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. San Francisco (CA): Morgan Kaufmann, 1987.

Maimone, M., Cheng, Y., and Matthies, L. 2007. Two years of visual odometry on the Mars Exploration Rovers, Journal of Field Robotics, 24, 169-186.

Deigmoeller, J. and Eggert, J. 2016. Stereo visual odometry without temporal filtering, in 38th German Conference on Pattern Recognition (GCPR 2016), Hannover, Germany, 166-175.

Torr, P. H. S. and Zisserman, A. 2000. MLESAC: A New Robust Estimator with Application to Estimating Image Geometry, Computer Vision and Image Understanding, 78, 138-156.

Tordoff, B. and Murray, D. W. 2002. Guided Sampling and Consensus for Motion Estimation, in European Conference on Computer Vision, 82-98.

Chum, O. and Matas, J. 2005. Matching with PROSAC-progressive sample consensus, in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 220-226.

Kai, N., Hailin, J., and Dellaert, F. 2009. GroupSAC: Efficient consensus in the presence of groupings, in 2009 IEEE 12th International Conference on Computer Vision, 2193-2200.

Matas, J. and Chum, O. 2004. Randomized RANSAC with Td,d test, Image and Vision Computing, 22, 837-842.

Matas, J. and Chum, O. 2005. Randomized RANSAC with sequential probability ratio test, in Tenth IEEE International Conference on Computer Vision, 1727-1732.

Chum, O. and Matas, J. 2008. Optimal randomized RANSAC, IEEE Trans Pattern Anal Mach Intell, 30, 1472-1482.

How to cite this article:

Wenyan Ci et al.2018, Extensions To The Standard Ransac Algorithm For Efficiency And Robustness. Int J Recent Sci Res. 9(12), pp. 29842-29846. DOI: http://dx.doi.org/10.24327/ijrsr.2018.0912.2942