Feature detection and matching - Increasing the Position Precision of a Navigation Device by a

Feature detection and matching are an important part of computer vision, so this part of the document discusses them in terms of the classical computer vision steps. The topics of interest in this section are feature detectors, descriptors, extraction and matching. The main goal of feature matching is to detect interesting regions in an image and match them across images. Feature matching is part of systems used for object recognition, image stitching, 3D reconstruction, motion tracking and robot navigation.

Figure 3.1 gives an example of feature matching. During the project, four properties are considered very important: locality, distinctiveness, quantity and efficiency. The reasons are straightforward. The positioning signs, which serve as landmarks in the project, are local and therefore robust to occlusion and clutter. Recognition must be strong enough to differentiate the signs within a large database of positioning signs. Finally, the program must be able to run in real time. Illumination invariance, scale invariance, pose invariance and intra-class invariability are also considered in order to make landmark detection robust.

Figure 3.1: Example of feature matching

For complete feature detection and matching, four steps are important to study in order to get the background knowledge. These steps are mentioned in the state of the art section; this section shows how they are studied and implemented in my concept. They are performed in every algorithm.

These steps can be implemented by different approaches, which were studied in depth in order to achieve the goal of robust detection.

3.2.1 Feature matching:

To make detection robust, a few rules must be considered in order to get better results. These points are considered during the training of the Haar cascade. In this section, the word feature refers to the positioning sign used in this concept, which is to be detected on the streets. The following points are considered:

1. The locality of the object: These positioning signs are local and subject to clutter and occlusion.

2. The distinctiveness of the object: The object to be detected must be distinguishable within a large database of objects.

3. Efficiency of the system: Haar cascade classification is chosen because it can achieve real-time performance.

Two important steps are performed in this detection part:

1. Find the interesting object which needs to be detected.

2. Define the nearest pixels around this object.

These steps can be implemented with the help of Harris corner detection, LoG detection and a DoG pyramid.

For detection, the similarity between pixels is determined by computing the weighted sum of squared differences (WSSD):

$$E_{\mathrm{WSSD}}(u, v) = \sum_{x, y} w(x, y) \, \bigl( I_1(x + u, y + v) - I_0(x, y) \bigr)^2 \tag{3.1}$$

This equation measures how similar the pixels of two different images are at the given locations. In this equation, $I_0$ and $I_1$ are the two images, (x, y) is the pixel location, (u, v) is the displacement between the compared locations, and w(x, y) is the window function that limits the similarity check to a specific window, as can be seen in figure 3.2.

Figure 3.2: Example of Window function [46]
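Equation 3.1 can be illustrated with a minimal sketch in plain Python. It assumes that images are stored as lists of rows (so a pixel is read as `img[y][x]`) and that the window function is a small 2-D list of weights anchored at a top-left corner `(x0, y0)`. Summing only over the window is equivalent to summing over the whole image with a weight of zero outside the window. The function name `wssd` and the toy images are hypothetical and only serve the illustration.

```python
def wssd(img0, img1, x0, y0, u, v, weights):
    """Weighted SSD (Eq. 3.1) for the window whose top-left corner is (x0, y0).

    Images are lists of rows, so a pixel is read as img[y][x].
    `weights` plays the role of the window function w(x, y).
    """
    total = 0.0
    for dy, weight_row in enumerate(weights):
        for dx, w in enumerate(weight_row):
            x, y = x0 + dx, y0 + dy
            diff = img1[y + v][x + u] - img0[y][x]
            total += w * diff * diff
    return total


# Toy data (hypothetical): img1 is img0 shifted one pixel to the right.
img0 = [
    [0, 0, 0, 0],
    [0, 5, 9, 0],
    [0, 7, 3, 0],
    [0, 0, 0, 0],
]
img1 = [
    [0, 0, 0, 0],
    [0, 0, 5, 9],
    [0, 0, 7, 3],
    [0, 0, 0, 0],
]
box_window = [[1, 1], [1, 1]]  # uniform 2x2 window

# Evaluate two candidate displacements (u, v) and keep the most similar one.
candidates = [(0, 0), (1, 0)]
scores = {s: wssd(img0, img1, 1, 1, s[0], s[1], box_window) for s in candidates}
best_shift = min(scores, key=scores.get)
```

The displacement with the smallest WSSD is the one under which the two windows look most alike; here that is the one-pixel shift to the right.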

3.2.2 Feature description:

Feature description plays an important role in making a feature highly distinctive.

Pixel patches are taken into account to check whether they are the right description of a feature. Two important steps are performed during the construction of a feature description:

1. Extract the region defined around each detected point and normalize its content.

2. Compute a local descriptor on the basis of the normalized region.
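The two steps above can be sketched with the simplest possible descriptor: the raw intensities of a patch, normalized to zero mean and unit length. This is an assumption made for illustration only; it is not SIFT, SURF or any of the descriptors used in this work, and the helper names `extract_patch` and `normalized_descriptor` are hypothetical. The sketch shows why normalization matters: a patch and a brighter, higher-contrast copy of it yield the same descriptor.

```python
import math


def extract_patch(image, cx, cy, radius):
    """Step 1 (extract): cut out the square region centred on (cx, cy)."""
    return [row[cx - radius:cx + radius + 1]
            for row in image[cy - radius:cy + radius + 1]]


def normalized_descriptor(patch):
    """Step 1 (normalize) and step 2: flatten the normalized patch into a vector."""
    values = [float(p) for row in patch for p in row]
    mean = sum(values) / len(values)
    centered = [p - mean for p in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        return centered  # flat patch: nothing to normalize
    return [c / norm for c in centered]


# Toy image (hypothetical) and a copy with doubled contrast and added brightness.
image = [
    [1, 2, 3, 4],
    [5, 9, 7, 8],
    [2, 6, 4, 1],
    [3, 5, 8, 2],
]
brighter = [[2 * p + 10 for p in row] for row in image]

desc_a = normalized_descriptor(extract_patch(image, 1, 1, 1))
desc_b = normalized_descriptor(extract_patch(brighter, 1, 1, 1))
```

Both descriptors agree up to rounding, which is the kind of illumination invariance a descriptor is expected to provide. Real descriptors add further invariances, such as rotation and scale, that this sketch does not attempt.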

These steps of feature detection and matching can be performed by different approaches such as SIFT, SURF, GLOH, HOG and PCA. These are all valid and widely used descriptors, as described above in the state of the art section. SURF is faster but less accurate; SIFT is slower but more accurate than SURF. The main function of a feature descriptor is to make detection invariant to illumination, pose, rotation, scaling and intra-class variations. When two images are compared, the Euclidean distance is computed. More precisely, the comparison is made between the feature descriptions, as can be seen in this equation [46]:

$$d(\mathrm{patch}_1, \mathrm{patch}_2) = \lVert \mathrm{SIFT}(\mathrm{patch}_1) - \mathrm{SIFT}(\mathrm{patch}_2) \rVert_2 \tag{3.2}$$

A good feature description results in good feature detection and matching. Feature extraction is the part of feature description whose main goal is to learn about a feature rather than to create one. For this process, one can use principal component analysis (PCA). PCA reduces the number of dimensions in order to extract the principal components. An example is shown in figure 3.3:

Figure 3.3: Principal Component Analysis

While creating or learning the description, one would like to extract information from the regions that contain more information than others. PCA enables this, which is called feature selection. A principal component, which contains most of the information, is a vector in the given visual space that points in a direction of high variance. To find the information with high variance, the complete PCA algorithm consists of five steps:

1. For better results, apply mean removal to the given input image X.

2. For dimensionality reduction, compute the correlation matrix, whose eigenvectors indicate the directions of high variance.

3. Compute the eigenvectors and eigenvalues of this matrix so that the variances can be compared.

4. Order the eigenvectors by their eigenvalues, from high to low.

5. After ordering, the eigenvectors with the highest variance are easy to find [46].
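The five steps can be followed in a small sketch. To stay self-contained it makes several assumptions that are not part of the text above: the input is a list of 2-D points rather than an image, the covariance matrix of the mean-centered data is used (a common variant of step 2), and the eigen-decomposition uses the closed form for a symmetric 2x2 matrix. The function name `pca_2d` is hypothetical.

```python
import math


def pca_2d(points):
    """Return [(variance, direction), ...] ordered from high to low variance."""
    n = len(points)
    # Step 1: mean removal.
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Step 2: covariance matrix [[sxx, sxy], [sxy, syy]] (assumed variant).
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Step 3: eigenvalues and eigenvectors of the symmetric 2x2 matrix.
    spread = math.hypot(sxx - syy, 2.0 * sxy)
    high = (sxx + syy + spread) / 2.0
    low = (sxx + syy - spread) / 2.0
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    first = (math.cos(angle), math.sin(angle))
    second = (-math.sin(angle), math.cos(angle))
    # Step 4: order by decreasing eigenvalue (high >= low by construction).
    return [(high, first), (low, second)]


# Toy data (hypothetical): four points lying exactly on the line y = x.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
components = pca_2d(points)
# Step 5: the first entry is the direction of highest variance.
top_variance, top_direction = components[0]
```

For these points all of the variance lies along the diagonal, so the second component carries no information and could be dropped, which is the dimensionality reduction described above.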

3.2.3 Feature recognition:

Once an object is detected, one has to find the correspondence between the required image, according to our description, and the detected image. This correspondence can be established with the help of transformations such as the affine transformation and the projective transformation. There are some basic challenges to positioning sign recognition when working with real-time data. Some problems are listed below:

1. During detection, noise can affect anything from 2 to many pixels.

2. Some signs are hidden behind obstructions, which is called occlusion.

3. Sometimes new points are detected which are not part of our training.

Such problems cause serious difficulties during the recognition process. To handle them, the recognition process relies on learning, which helps to understand the instance. With the help of learning, feature matching takes place in order to match the local descriptors. This learning can be implemented with different approaches such as nearest neighbors and k-d trees.

To make recognition efficient, learning is used to handle newly detected data by assigning it a class. This class assignment is made carefully with the help of the training data. When several candidate matches with the training data come up, a majority vote is performed. These tasks are carried out by the k-nearest neighbors algorithm, which is a very good example of learning for this kind of problem.
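The majority vote can be sketched as a brute-force k-nearest neighbors classifier that compares descriptors by their Euclidean distance, as in equation 3.2. The two-dimensional descriptors and the labels `sign_a` and `sign_b` are hypothetical toy data, and the linear scan over the training set is an assumption made for brevity; a k-d tree would replace it for large databases.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, training, k=3):
    """Assign a class to `query` by majority vote among its k nearest neighbors.

    `training` is a list of (descriptor, label) pairs.
    """
    nearest = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training data (hypothetical descriptors and labels).
training = [
    ([0.0, 0.1], "sign_a"),
    ([0.1, 0.0], "sign_a"),
    ([0.2, 0.2], "sign_a"),
    ([1.0, 1.0], "sign_b"),
    ([0.9, 1.1], "sign_b"),
]

label_near_a = knn_classify([0.15, 0.1], training, k=3)
label_near_b = knn_classify([0.95, 1.0], training, k=3)
```

In the second query only two of the three nearest neighbors belong to `sign_b`, so the result depends on the vote rather than on the single closest match.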

3.2.4 Feature alignment:

Once the correspondences are found, how can one make sure that they are the correct ones? To answer this question, one should perform the transformation in order to learn about the detected feature's information. This is possible with different approaches such as least squares and RANSAC. These approaches are used when some points cannot find any correspondence because of occlusion, and when some correspondences are not real because they are outliers.

To avoid outliers, that is, wrong correspondences, one should apply the RANSAC approach. It avoids outliers with an iterative method that repeatedly computes a chosen transformation. Across many iterations, it determines the transformation with the most inliers, and the iteration with the most inliers is considered the winning iteration. An example can be seen in figure 3.4:

Figure 3.4: RANSAC Approach

Figure 3.5: Set of points

The number of inliers is determined with the help of a margin δ. Based on this count, the transformation with the most inliers can be chosen. In figure 3.5, a set of points is chosen at random, the computation takes place with the help of the margin δ, and the number of inliers is counted. The steps of the RANSAC algorithm are explained below:

• This is an iterative process. For a known number of iterations, repeat:

1. Choose points randomly.

2. Compute the transformation for the chosen pair of points.

3. Use the margin δ in order to count the number of inliers.

• Choose the transformation with the largest number of inliers within the margin δ.

• Fine-tune the chosen transformation by using least squares.
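The steps above can be sketched for the simplest possible transformation, a pure 2-D translation. This model is an assumption chosen to keep the example short; the affine and projective transformations mentioned earlier need larger random samples and a more involved least-squares step. The function names, the fixed random seed and the toy matches are hypothetical.

```python
import math
import random


def residual(match, tx, ty):
    """Distance between the translated source point and its matched point."""
    (x, y), (xp, yp) = match
    return math.hypot(x + tx - xp, y + ty - yp)


def ransac_translation(matches, delta, iterations, seed=0):
    """Estimate a translation from ((x, y), (x', y')) matches with outliers."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # 1. Choose a minimal sample at random (one match fixes a translation).
        (x, y), (xp, yp) = rng.choice(matches)
        # 2. Compute the transformation for the chosen sample.
        tx, ty = xp - x, yp - y
        # 3. Count the matches that agree within the margin delta.
        inliers = [m for m in matches if residual(m, tx, ty) <= delta]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Least-squares refinement: for a translation this is the mean displacement.
    n = len(best_inliers)
    tx = sum(q[0] - p[0] for p, q in best_inliers) / n
    ty = sum(q[1] - p[1] for p, q in best_inliers) / n
    return (tx, ty), best_inliers


# Toy matches (hypothetical): six near (5, -2) plus two wrong correspondences.
matches = [
    ((0, 0), (5.0, -2.0)), ((1, 2), (6.1, 0.0)), ((3, 1), (7.9, -1.0)),
    ((4, 4), (9.0, 2.1)), ((2, 5), (7.0, 2.9)), ((6, 3), (11.0, 1.0)),
    ((1, 1), (21.0, 16.0)), ((5, 2), (-2.0, 11.0)),
]
(tx, ty), inliers = ransac_translation(matches, delta=0.5, iterations=50)
```

The two wrong correspondences never gather more than one inlier, so they are excluded before the least-squares step and do not distort the final estimate.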

In document Increasing the Position Precision of a Navigation Device by a Camera-based Landmark Detection Approach (Page 50-55)