Developing an Algorithm to Process Matching of Complex Aerial Images

Khalid M. Alrajeh, Tamer A. Alzohairy

Abstract: Matching digital images is a crucial step in many image analysis applications. This paper investigates the problem and proposes a method to solve it, which is implemented and applied to real complex aerial images. The method has two main stages. In the first stage, edge elements are determined locally and then aggregated globally into better-defined lines called straight-line segments. Based on the descriptions of these segments, matching is performed between them to estimate the transformation coefficients between the images. In the second stage, points of interest with high variance are selected automatically and their corresponding points are determined in the other image. The proposed algorithm shows excellent results when applied to and tested on real complex urban images.

Index Terms: Aerial digital images, area-based approach, digital image matching, feature-based approach.

I. INTRODUCTION

Digital image matching is a basic problem in many imaging applications, including computer vision. For instance, in an industrial environment, the problem may be to detect a defect in a manufactured part by comparing it with a pre-stored model of the same part, and then to take a decision based on the result of the matching. In medical imaging, matching has been applied to the classification of chromosomes to provide important insight into disease and genetic defects [1], [6].

Remote sensing satellite images often suffer from deformation. One cause is the rotation and curvature of the earth. Another is the variation in the speed and altitude of the satellite. To obtain the maximum benefit from these images with good accuracy, the geometric distortions should be corrected. The basic solution is to match points in the distorted image against a reference image or map and then use them with a mathematical model to correct all the points in the deformed image. When mosaics are constructed from multiple views, geometric corrections are required to make sure that the common features from the different views are identically positioned and that the redundant overlapping parts of the images are removed [15], [16].

K. M. Alrajeh is with the Computer Science Department, Riyadh Community College, King Saud University, Malaz, P. O. Box 28095, Riyadh 11437, Saudi Arabia (e-mail: [email protected]). T. A. Alzohairy is with the Computer Science Department, Riyadh Community College, King Saud University, Malaz, P. O. Box 28095, Riyadh 11437, Saudi Arabia. On leave from Mathematics Department, Faculty of Science, Al-Azhar University, Nasr City (11884), Cairo.

Images taken at intervals of time provide an excellent record of how surface features of the earth change over time, and they can help to monitor boundary expansion and construction activities in urban areas. If these processes can be automated by matching the different parts of various images, a tremendous saving of time and cost would result.

3-D information plays an important role in many applications and can be obtained from multiple views by using stereo techniques. The main problem in stereo imaging is the correspondence problem: finding matching points in pairs of images of a scene. From these corresponding points the 3-D co-ordinates of the points in the scene can be determined.

One of the applications that depend on 3-D information is image understanding for optical navigation of a mobile robot. Examples of image understanding tasks [8], [10] include answering questions such as which building is the tallest, or whether a tree has grown since it was last observed. In the mobile robot case, 3-D information is vital for avoiding collisions and for choosing the best path of movement. 3-D data are also important for automating cartography to generate 3-D models of natural terrain or man-made structures [7].

In this paper the focus is on the matching of aerial digital images. Using this matching information these images can then be registered and used for generating a map which indicates height variations across the imaged scene. Some factors that may complicate the matching in aerial images will be presented in section II.

The rest of the paper is organized as follows. Section II presents the difficulties associated with image matching. Section III gives the assumptions about the two images used in our study. Section IV reviews previous matching approaches. Section V describes the proposed matching approach. Section VI presents the experiments and the results of using the proposed approach. Finally, conclusions are given in section VII.

II. DIFFICULTIES ASSOCIATED WITH IMAGE MATCHING

Several factors affect and complicate the automation of image matching [11], [12]; hence finding a robust matching algorithm is a considerable challenge. Some of these factors and their effects are described below.

1. Textural factor

levels such as tree tops and the ground below them, and hanging surfaces such as multi-level highway intersections, make it difficult to interpret the ground level.

2. Photometric factor

Lack of resolution due to atmospheric conditions and the quality of the camera's optics produces images with different sharpness. Reflectance and illumination also vary between images: sparkling of water bodies, the sun's angle, and the strength of illumination under partial cloud cover. Noise can be introduced by photographic processing: if images are digitized from photographic negatives or positives, the material may have spots or scratches due to careless treatment.

3. Geometric change factor

Occlusion problems arise when parts of one image are not visible in the other image. Such problems occur at the edges of the photographs, where the images may not overlap, and in regions where there are rapid changes in height. Another geometric factor is perspective distortion: the same object viewed from different directions may have different apparent shapes.

III. ASSUMPTIONS

It is assumed that the two images are taken by identical cameras whose optical axes are almost perpendicular to their base line. In vertical aerial photography one camera is used, and the photographs are taken at the same altitude in such a way that the area covered by each successive photograph along the aircraft path overlaps part of the coverage of the previous one. It is difficult to maintain the aircraft on a precisely straight path, and the path might be curved at some stages. This means that the positions of corresponding points in the photographs may differ in scale and in both their horizontal and vertical components. In addition, there may be some rotation between the photographs. The images can be arbitrary views of urban or rural landscape; we assume that each will have at least a few prominent features.
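The differences listed above (scale, horizontal and vertical shifts, and rotation) can be summarized, as a sketch only, by a similarity transformation between a point (x, y) in the first image and its counterpart (x', y') in the second. The paper does not state its transformation model explicitly, so the symbols s (scale), θ (rotation angle) and t_x, t_y (translations), and the choice of rotation about the image origin, are assumptions made here for illustration.

```latex
\begin{aligned}
x' &= s\,(x\cos\theta - y\sin\theta) + t_x,\\
y' &= s\,(x\sin\theta + y\cos\theta) + t_y
\end{aligned}
```

Under the equal-altitude assumption s is expected to be close to 1, which leaves θ, t_x and t_y as the main coefficients to estimate.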

IV. PREVIOUS MATCHING APPROACHES

Image matching approaches fall into two main categories: area-based and feature-based.

In area-based approaches the matching measurement is calculated by correlation equation between a template and the tested image [9], [14]; the calculation should be repeated at a variety of orientations, in order to account for possible rotational variations between the images. This method seems to be straightforward and easy to implement but, when the operation has to be repeated at each point in the second image, the amount of computer time required increases to such an extent that the method becomes unworkable.
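As a rough illustration of an area-based similarity measure, the sketch below computes a normalized cross-correlation between a template and an equally sized window. The paper does not state which correlation equation it uses, so the zero-mean normalized form and the function name `ncc` are assumptions made for illustration only.

```python
import math


def ncc(template, window):
    """Zero-mean normalized cross-correlation of two equal-sized 2-D lists.

    Returns a value in [-1, 1]; 0.0 is returned when either input is flat
    (zero variance), because the measure is undefined in that case.
    """
    t = [v for row in template for v in row]
    w = [v for row in window for v in row]
    if len(t) != len(w):
        raise ValueError("template and window must have the same size")

    mean_t = sum(t) / len(t)
    mean_w = sum(w) / len(w)

    # Covariance-like numerator and the product of the two energies.
    num = sum((a - mean_t) * (b - mean_w) for a, b in zip(t, w))
    den = math.sqrt(
        sum((a - mean_t) ** 2 for a in t) * sum((b - mean_w) ** 2 for b in w)
    )
    return num / den if den > 0 else 0.0
```

Evaluating such a measure at every position of the second image, and again for each trial orientation, is what makes the exhaustive area-based search so expensive.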

In feature-based approaches the matching is performed between descriptors of scene features. One example of such features is region segments. The scene is segmented into its constituent regions, and the matching is done on those regions using descriptors such as area, length and width. The method requires an accurate segmentation, which may not be feasible in complex scenes. It is also possible that large segments of the images are occluded, which could lead to erroneous matches. In summary, the lack of good features, noise, complexity and the high level of detail in urban aerial images make feature-based approaches alone a poor choice for the matching.

An intermediate method is to determine significant and well-defined features in all the images, and to match those features on the basis of their descriptors. It is then possible to define a transformation from the co-ordinates of one image into the co-ordinates of the other image, after which an area-based approach becomes more feasible. This is considered a suitable solution for aerial photography and meets the main objectives. The intermediate method is explained in more detail in the next section.

V. PROPOSED MATCHING APPROACH

To overcome the difficulties mentioned in the previous section, a two-phase strategy has been introduced into the authors' method. In the first phase, prominent and readily identifiable features are matched on the basis of feature likelihood coefficients. From this phase a few matching features are extracted, which are sufficient to define the overall transformation between the two images.

The extraction of such reliable features is the most challenging and important step in the matching process. Difficulties arise for several reasons, such as noise in the images, the wide variety of textures, the variations in the roofs of the buildings, and the density of small buildings and shadows. In the aerial images in particular, all these factors tend to confuse the extraction of good features. The whole matching process will be affected by the quality of the extracted features.

Several types of feature can be found in aerial images, such as corners, curved lines, straight lines, circles and closed loops. One of the most interesting types is the straight-line edge. These features have advantages over the other types: they allow a simple description, and they are usually clearer and better defined than features such as closed curves in regions or surfaces of the objects. An object surface in an image may be segmented into several regions, and several regions may be merged into a single region. In addition, the background regions in the different images are likely to segment into different sub-regions because of noise, blurring and reflectance [2], [3].

matched features are not necessarily in a single plane. The co-ordinates of the matching points are then tuned to reach sub-pixel accuracy.

The candidate points for matching are chosen from the first image. Each candidate point is surrounded by a window called the template window; using the derived transformation coefficients, the template window is rotated to match the co-ordinates in the second image. Rotating the template rather than the whole image saves time, since only the significant parts of the image are involved. Following this step, the similarity measure is calculated between the template window and the test window in the second image. The template window is chosen to be as small as possible while retaining sufficient local information to achieve unambiguous matching. A small window increases the matching speed and minimizes the effect of occlusion.

The search for corresponding points in the second image is performed within a certain area surrounding the position defined by the transformation of the template. The size of the search area should allow for the maximum expected disparity; prior knowledge of the maximum disparity allows the search area to be restricted.
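The restriction of the search area can be sketched as follows. The helper names `predict_position` and `search_bounds`, the rotation-about-the-origin convention, and the square search area of half-width `max_disparity` are all assumptions for illustration; the paper gives neither its exact transformation convention nor the search-area size.

```python
import math


def predict_position(x, y, rotation_deg, tx, ty, scale=1.0):
    """Map a point of image 1 into image 2 with an assumed similarity
    transform: rotation about the origin, optional scale, then translation."""
    theta = math.radians(rotation_deg)
    xp = scale * (x * math.cos(theta) - y * math.sin(theta)) + tx
    yp = scale * (x * math.sin(theta) + y * math.cos(theta)) + ty
    return xp, yp


def search_bounds(cx, cy, max_disparity, width, height):
    """Inclusive pixel bounds (x_min, y_min, x_max, y_max) of a square search
    area centred on the predicted position and clipped to the image."""
    cx, cy = round(cx), round(cy)
    x_min = max(0, cx - max_disparity)
    y_min = max(0, cy - max_disparity)
    x_max = min(width - 1, cx + max_disparity)
    y_max = min(height - 1, cy + max_disparity)
    return x_min, y_min, x_max, y_max
```

For example, `search_bounds(*predict_position(10, 0, 90, 5, 5), 8, 512, 512)` limits the correlation search to a 14 × 17 pixel area instead of the full 512 × 512 image.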

It is possible to discriminate against erroneous matching by setting a threshold for the similarity coefficient; if the maximum value of the similarity measure over the search area fails to exceed the threshold, this match will be excluded from the list of corresponding co-ordinates. In cases when the threshold is exceeded, sub-pixel accuracy can be achieved by interpolation over the neighborhood of the maximum.
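A minimal sketch of this acceptance test and sub-pixel refinement is given below. It assumes the similarity values over the search area are available as a 2-D list, and it uses a separate three-point parabolic fit along each axis; the paper only says "interpolation over the neighborhood of the maximum", so the parabolic fit and the names `parabolic_offset` and `best_match` are illustrative assumptions.

```python
def parabolic_offset(left, centre, right):
    """Offset (in pixels) of the vertex of the parabola through three
    equally spaced samples, measured from the centre sample."""
    denom = left - 2.0 * centre + right
    if denom == 0:
        return 0.0  # flat neighbourhood: keep the integer position
    return 0.5 * (left - right) / denom


def best_match(scores, threshold):
    """Return the (row, col) of the best match with sub-pixel refinement,
    or None when the maximum similarity does not exceed the threshold."""
    rows, cols = len(scores), len(scores[0])
    r, c = max(
        ((i, j) for i in range(rows) for j in range(cols)),
        key=lambda p: scores[p[0]][p[1]],
    )
    if scores[r][c] <= threshold:
        return None  # rejected as a possibly erroneous match

    d_row = d_col = 0.0
    # Refine only where both neighbours exist inside the search area.
    if 0 < r < rows - 1:
        d_row = parabolic_offset(scores[r - 1][c], scores[r][c], scores[r + 1][c])
    if 0 < c < cols - 1:
        d_col = parabolic_offset(scores[r][c - 1], scores[r][c], scores[r][c + 1])
    return r + d_row, c + d_col
```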

By using the matching points, one image can be registered to the other. One objective of the registration process is to align the two images so that stereo matching can then be used to compute the disparity map.

VI. EXPERIMENTS AND RESULTS

Fig. 1 shows two sequential images of a complex urban scene taken from the same height with an unknown rotation and translation. Both images are 512 × 512 pixels. In the left part of image 1, the scene contains a number of small, low buildings whose roofs have a great variety of shapes and textures. The large area in the right-hand section, on the other hand, is nearly flat and contains few buildings. This example provides a good test for finding matching points in both built-up and flat areas.

Fig. 2 to Fig. 6 illustrate the results obtained by applying the steps of the method to calculate the transformation coefficients to the original images. The result of the edge detection and thinning process is shown in Fig. 2. The edge maps for the two images are the result of applying the Laplacian filter followed by thresholding to remove the weak features. The images obtained show a large amount of detail. Much of this detail is noise, some emphasized by the application of the Laplacian operator, and some due to the fine textures commonly found in urban images. Small objects such as trees and cars are examples of fine detail which sometimes complicates the extraction of clear features. In spite of these difficulties, some straight-line features are clearly discernible.
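The edge-map step can be sketched as below. The 4-neighbour Laplacian kernel, thresholding on the absolute response, and the function name `laplacian_edge_map` are assumptions, since the paper does not give its kernel or threshold; the thinning step is not shown.

```python
def laplacian_edge_map(image, threshold):
    """Binary edge map from a 2-D list of grey levels.

    Applies the 4-neighbour Laplacian to every interior pixel and marks the
    pixel as an edge (1) when the absolute response reaches the threshold.
    Border pixels are left at 0.
    """
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            response = (
                image[r - 1][c] + image[r + 1][c]
                + image[r][c - 1] + image[r][c + 1]
                - 4 * image[r][c]
            )
            if abs(response) >= threshold:
                edges[r][c] = 1  # strong response: keep as edge element
    return edges
```

Because the Laplacian is a second-derivative operator it responds strongly to isolated noisy pixels as well as to true edges, which is consistent with the noisy detail noted above.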

Fig. 1. Original images 1 & 2.

segments. Each straight-line segment in Fig. 4 has a description record containing its slope and two end points. From this description, the length, angle and middle point were computed for each straight-line segment. These values are used in the line matching process to find the transformation coefficients. Two lines are considered similar if they satisfy three conditions: (1) the ratio of their lengths must equal 1 within a tolerance of 4%; (2) the vertical translation must be non-negative for images in an upward vertical sequence; (3) the rotation direction between the two lines must agree with the global rotation direction between the images. If many line pairs satisfy these conditions, the best estimates of the transformation coefficients are the values that occur with the highest frequency.
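The three conditions can be sketched in code as follows. The `Segment` record, the use of the midpoint difference as the translation, the sign convention for "non-negative vertical translation", and the `global_rotation_sign` argument are all assumptions introduced for illustration; the paper does not specify how these quantities are computed.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def length(self):
        return math.hypot(self.x2 - self.x1, self.y2 - self.y1)

    @property
    def angle(self):
        """Orientation in degrees, folded into [0, 180)."""
        return math.degrees(math.atan2(self.y2 - self.y1, self.x2 - self.x1)) % 180.0

    @property
    def midpoint(self):
        return (self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0


def candidate_match(a, b, global_rotation_sign=1, length_tol=0.04):
    """Return (rotation_deg, dx, dy) if segments a and b pass the three
    similarity conditions, otherwise None."""
    if a.length == 0:
        return None
    # (1) length ratio equal to 1 within the tolerance
    if abs(b.length / a.length - 1.0) > length_tol:
        return None
    # (3) rotation, wrapped to [-90, 90), must follow the global direction
    rotation = (b.angle - a.angle + 90.0) % 180.0 - 90.0
    if rotation * global_rotation_sign < 0:
        return None
    # (2) vertical translation (assumed: midpoint difference) non-negative
    dx = b.midpoint[0] - a.midpoint[0]
    dy = b.midpoint[1] - a.midpoint[1]
    if dy < 0:
        return None
    return rotation, dx, dy
```

Each accepted pair contributes one vote for a rotation and a translation; the votes from all pairs are then accumulated into histograms.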

Fig. 2. The result of the edge detection and thinning process.

The histogram for the rotation angles between the matching lines is shown in Fig. 5; the histograms for the horizontal and vertical translations are shown in Fig. 6.

From Fig. 5, the highest frequency value occurs at 13.5°, and this represents the best estimate for the rotation between the images. This process works even though some lines have not been matched, either because of occlusion or because they are clear in one image but not in the other owing to contrast changes and noise. This rotation angle is used in the next operation, which is correlation matching.
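The voting step can be sketched as a simple histogram peak search. The bin width, the function name `histogram_peak`, and the sample values in the usage note are made up for illustration and are not taken from the paper's data.

```python
import math
from collections import Counter


def histogram_peak(values, bin_width=1.0):
    """Return (bin_centre, votes) of the most populated histogram bin.

    Values are grouped into bins [k * bin_width, (k + 1) * bin_width).
    Ties are resolved in favour of the bin that was encountered first.
    """
    if not values:
        raise ValueError("no values to vote with")
    counts = Counter(math.floor(v / bin_width) for v in values)
    best_bin, votes = max(counts.items(), key=lambda item: item[1])
    return (best_bin + 0.5) * bin_width, votes
```

With hypothetical rotation votes `[13.2, 13.7, 40.1, 13.5, 2.3, 13.9, 13.1]`, the call `histogram_peak(votes)` selects the bin from 13 to 14 with five votes, so the two outlying votes from mismatched lines do not affect the estimate.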

Fig. 3. Multiple lines generated by clustering in the HT and by noise.

The peaks in Fig. 6 represent the translation distances required. Because the scene is not flat and the building heights vary, the two peaks do not necessarily represent the translation between the images precisely; however, they give a rough guide to the translations, which greatly reduces the search effort when matching points by correlation.

borders between neighboring regions. When such a cluster of points occurred, the point with highest variance was retained and the others were rejected.
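The selection of high-variance candidate points, with only the strongest point kept in each cluster, can be sketched as follows. The window half-size, the variance threshold, the minimum spacing and the function names are assumptions chosen for illustration; the paper does not report the values it used.

```python
def window_variance(image, r, c, half):
    """Grey-level variance of the (2*half+1)-square window centred on (r, c)."""
    values = [
        image[i][j]
        for i in range(r - half, r + half + 1)
        for j in range(c - half, c + half + 1)
    ]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def select_interest_points(image, half=1, min_variance=100.0, min_distance=3):
    """Return (row, col) candidate points, strongest first.

    A point is a candidate when its window variance reaches min_variance.
    Within any cluster of candidates closer than min_distance, only the
    point with the highest variance is retained.
    """
    rows, cols = len(image), len(image[0])
    candidates = []
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            var = window_variance(image, r, c, half)
            if var >= min_variance:
                candidates.append((var, r, c))

    # Visit candidates from highest to lowest variance and reject any that
    # fall too close to a point that has already been retained.
    candidates.sort(key=lambda item: item[0], reverse=True)
    kept = []
    for var, r, c in candidates:
        if all((r - kr) ** 2 + (c - kc) ** 2 >= min_distance ** 2 for _, kr, kc in kept):
            kept.append((var, r, c))
    return [(r, c) for _, r, c in kept]
```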

Next, the matching points in image 2 corresponding to the candidate points of image 1 are selected. To achieve better reliability in the matching and to exclude wrong matches due to occlusion, a high threshold has been applied to the correlation. Some of the matching points have been discarded on the basis of this criterion. All the remaining points can be seen to be well matched to the candidate points.

Fig. 4. Significant lines after removing multiple and short lines.

Fig. 5. Histogram of the rotation angles between the matching lines (horizontal axis: rotation angle; vertical axis: frequency).

Fig. 6. Histograms of horizontal and vertical translations between the matching lines (horizontal axis: vertical translation; vertical axis: frequency).

VII. CONCLUSIONS

In this paper a new method has been proposed to match sequences of complex aerial images with unknown rotation and translation. The level of detail, scene clutter, complex textures and geometric variations greatly complicate the matching process, and the available techniques have limitations in achieving this objective. Feature-based matching techniques are limited by the number of extracted features: they do not provide a large number of matching points, but they are insensitive to geometric variations such as rotation. Area-based matching techniques, on the other hand, provide a large number of matching points but fail when there is rotation or scaling between the images.

A combined matching technique has been developed to solve this problem. The technique takes advantage of both matching approaches and minimizes the difficulties caused by the nature of aerial images. First, it matches the significant straight edge segments in the images to provide the best estimate of the rotation and translation parameters. Next, it uses these parameters to guide the area-based matching approach (correlation) to obtain the highest number of matched points.

ACKNOWLEDGMENTS

The authors would like to thank Dr. Stanley Ipson and Mr. Jhon Haigh from the University of Bradford for providing the test images and for valuable discussions during this research.

REFERENCES

[1] Price, K. and R. Reddt, 'Matching Segments Of Image', IEEE Transactions On Pattern Analysis and Machine Intelligence, Vol. PAMI-1, No. 1, Jan. 1979, pp. 110-11.

[2] Nevatia, R. and K. Ramesh, 'Linear Feature Extraction and Description', Computer Graphics and Image Processing 13, 1980, pp. 257-269.

[3] Ouk Choi, In So Kweon, 'Robust feature point matching by preserving local geometric consistency', Computer Vision and Image Understanding, Volume 113, Issue 6, June 2009, pp. 726-742.

[5] Alexander Thomas, Vittorio Ferrari, Bastian Leibe, Tinne Tuytelaars, Luc Van Gool, 'Shape-from-recognition: Recognition enables meta-data transfer', Computer Vision and Image Understanding, Volume 113, Issue 12, December 2009, pp. 1222-1234.

[6] Nevatia, R. and K. Price, 'Locating Structures In Aerial Images', Pattern Recognition 4th International Conference, 1987, pp. 686-690.

[7] Barnard, S. and Thompson W., 'Disparity Analysis Of Images', IEEE Transactions On Pattern Analysis and Machine Intelligence, Vol. 1, No. 4, Jul. 1980, pp. 333-340.

[8] Chi Hau Chen, Pei-Gee Peter Ho, 'Statistical pattern recognition in remote sensing', Pattern Recognition, Volume 41, Issue 9, September 2008, pp. 2731-2741.

[9] Peli, T., 'An Method For Recognition and Localization of Rotated and Scaled Objects', Proceedings Of The IEEE, Vol. 69, No. 4, Apr. 1981, pp. 483-485.

[10] Dudani, S. and Luk A., 'Locating Straight-Line Edge Segments On Outdoor Scenes', Pattern Recognition, Vol. 10, pp. 145-157.

[11] Hongming Zhang, Wen Gao, Xilin Chen, Debin Zhao, 'Object detection using spatial histogram features', Image and Vision Computing, Volume 24, Issue 4, 1 April 2006, pp. 327-341.

[12] Yonghuai Liu, 'Automatic registration of overlapping 3D point clouds using closest points', Image and Vision Computing, Volume 24, Issue 7, 1 July 2006, pp. 762-781.

[13] Edgar Arce, J.L. Marroquin, 'High-precision stereo disparity estimation using HMMF models', Image and Vision Computing, Volume 25, Issue 5, 1 May 2007, pp. 623-636.

BIOGRAPHIES

Khalid M. AlRajeh received the B.Sc. in computer engineering from the College of Computer Science and Information, King Saud University, Riyadh, Saudi Arabia, in 1987. He received the M.Sc. in Real Time Electronic Systems in 1992 and the Ph.D. in image processing in 1995 from the University of Bradford, England. He currently works in the Computer Department, Community College, King Saud University, Saudi Arabia.