Towards License Plate Recognition: Comparing Moving Objects Segmentation Approaches
V. J. Oliveira-Neto, G. Cámara-Chávez, D. Menotti UFOP - Federal University of Ouro Preto
Computing Department Ouro Preto, MG, Brazil
Email: {vantuiljose,gcamarac,menottid}@gmail.com
Abstract—This paper reviews the most important works in the literature on motion detection and feature extraction from video. It may also support a project that aims to create a new perspective on locating and recognizing license plates extracted from an image sequence. We implement three approaches for locating the moving vehicle in order to facilitate the extraction of its license plate, so that the accuracy of automatic license plate recognition methods can be improved. Three strategies for segmenting moving objects, i.e., background subtraction, temporal differing, and optical flow, are implemented and tested on videos (still images) from three different databases. Based on the experimental results, the usefulness of these strategies for a license plate recognition system is discussed.
Index Terms—detection of moving objects, background subtraction, temporal differing, optical flow, license plate recognition
I. INTRODUCTION
Object detection and tracking is an important research area in computer vision. One of its applications is traffic scene analysis. Cameras are less costly and easier to install than most other sensors, and many are already installed on roadsides, particularly at intersections. The resulting video images are used to estimate traffic flows, to detect vehicles and pedestrians for signal timing, and to track vehicles and pedestrians for safety applications [1]. The tracking problem can be stated as determining where the objects are in space and where they are going. This type of information can be used for surveillance systems, traffic analysis, and many other fields in computer vision. This paper is part of a larger project that aims to improve the accuracy of a License Plate Recognition (LPR) system. The first step of this system consists in segmenting and tracking the moving objects, i.e., the vehicles. The idea here is not to recognize the license plate, but to locate the vehicle in the image sequence, so that the search area for the license plate is reduced. In a second step, license plate location methods are used to find the license plate in this small portion of the image. This approach is expected to increase the accuracy and speed of the full system.
The main idea of the entire project is shown in Fig. 1.
As we can see in this flowchart, the proposed project aims to provide a segmented part of the image containing the tracked object of interest, the vehicle, such that only the best frame among those tracked undergoes recognition, avoiding the computational cost of trying to identify the same vehicle several times.
Fig. 1: A License Plate Recognition System
In this work, three methods are studied and used to segment the moving objects followed by a simple tracking method. The segmentation methods are: Background Subtraction, Temporal Differing and Optical Flow. These segmentation methods allow us to use several tracking methods. In this paper, we perform the tracking by following the center of the moving objects to the nearest center in the next frame.
All of these methods are extensively used for object tracking because they are simple to implement and fast to run. Many works implement them in real time. Background Subtraction is based on a background model, which is subtracted from the current frame; the difference between the model and the frame reveals the movement.
This is one of the most used and simplest methods to find movement. Temporal Differing is based on the difference among consecutive frames, and is very adaptive to illumination
University of Ouro Preto (UFOP), which composes the main database for the entire project. Other vehicle traffic images, provided with the MATLAB installation and from PETS2000 [3], are also used in our experiments.
This paper is organized as follows. In Section II, three methods for moving object segmentation are presented. Section III explains the tracking method used in this work. In Section IV, we discuss the performance of the evaluated methods. Finally, Section V presents the conclusions of this work and points out future work.
II. SEGMENTATION
Segmentation is one of the crucial tasks in object tracking. When the moving objects are well defined, it is easier to track and find them; a bad segmentation, in contrast, may preclude tracking, since it is not possible to determine what is an object of interest and what is not.
Three approaches for object segmentation are presented. In all cases, the moving object is separated from the background.
The Background Subtraction technique is widely used and is based on a predetermined background model that represents the static scene, so the moving objects can be found by subtracting the background from the target frame. Temporal Differing is similar to Background Subtraction, but uses the difference among consecutive frames. Optical Flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and the scene. With this technique, it is possible to know the velocity of any pixel between two images.
A. Background Subtraction
Background Subtraction is one of the most common approaches to detect moving objects in image sequences and is widely used with static cameras. Many different methods have been proposed over recent years [4]. This approach is commonly used for situations where the background is relatively static.
The segmentation process starts with the selection of the background at the beginning of the video; this background should represent a static image of the place where the tracking is performed. In order to find the moving objects, the background is subtracted from the current frame. The operation is performed pixel by pixel. Therefore, this method is highly dependent on the choice of a good background model [5].
Several background subtraction strategies are reviewed in [4]. In this work, we implemented a simple version of this technique, where the background is determined by averaging corresponding pixels over n frames.
(a) First video frame (b) Background model obtained
Fig. 2: Background subtraction
Thus, let $V$ be the video and $V_n$ the n-th frame of this video. So, we can determine the background, $B$, as:
$$B = \frac{V_1 + V_2 + \cdots + V_n}{n}.$$
An example of a background generated with this approach is presented in Fig. 2b. About 60 frames are used to compute the average. The images were resized and converted to grayscale.
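The averaging step can be sketched as follows. This is a minimal illustration, assuming grayscale frames stored as equally sized nested Python lists; the function name `average_background` and the toy 2×2 frames are hypothetical and are not taken from the original implementation, which would more likely operate on array types.

```python
def average_background(frames):
    """Pixel-wise mean of equally sized grayscale frames (lists of rows)."""
    if not frames:
        raise ValueError("at least one frame is required")
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    # Accumulate the sum of corresponding pixels over all frames.
    sums = [[0.0] * width for _ in range(height)]
    for frame in frames:
        for y in range(height):
            for x in range(width):
                sums[y][x] += frame[y][x]
    # Divide by the number of frames to obtain the background B.
    return [[total / n for total in row] for row in sums]


# Two toy 2x2 frames (assumed data, for illustration only).
frames = [[[10, 20], [30, 40]], [[20, 40], [30, 0]]]
background = average_background(frames)
```

Accumulating the sums first and dividing once keeps the result independent of the order in which frames are visited.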
Once the background is calculated, the motion at each pixel can be determined by subtracting the background from the frame in which movement is to be measured. After computing the difference, the next step consists in determining which pixels are considered as moving ones, since slight movements or changes in the camera can cause a small difference between the background and the current frame. For this purpose, a threshold is defined, and all differences greater than the threshold are considered as moving pixels. A common problem is how to choose this value, because low values of the threshold $L$ can produce false moving pixels. The movement mask $M_n$ is given by:
$$M_n = D_n > L, \tag{1}$$
where $L$ is a predetermined threshold, and $D_n$ is the difference
$$D_n = V_n - B. \tag{2}$$
Based on these calculations, it is possible to obtain a binary image, where the difference between the background and the current frame after applying the threshold is similar to Fig. 3c. With this, we get an image with less noise in which the objects are better delimited, so it is easier to segment them based on pixel neighborhoods.
Fig. 3a shows the difference between one of the video frames and the background shown in Fig. 2b. Fig. 3b shows the thresholded binary image, and Fig. 3c shows the image after morphological operations. The result of the method for one frame can be seen in Fig. 3d.
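Equations (1) and (2) can be sketched in a few lines. This is only an illustration under stated assumptions: frames are nested Python lists, the name `motion_mask` is hypothetical, and the absolute value is an addition made here so that objects darker than the background are also detected, whereas Eq. (2) as written uses the signed difference.

```python
def motion_mask(frame, background, threshold):
    """Binary mask: 1 where |frame - background| exceeds the threshold."""
    return [
        [1 if abs(pixel - bg) > threshold else 0
         for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Toy 2x2 example (assumed values, for illustration only).
background = [[10, 20], [30, 40]]
frame = [[12, 90], [30, 5]]
mask = motion_mask(frame, background, threshold=25)
```

A small threshold would also mark the pixel that differs by only 2 gray levels, which is the kind of false detection mentioned above.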
(a) Difference between some frame and background.
(b) Binarization based on threshold.
(c) Morphological operations to decrease the noise and increase the size of moving objects.
(d) Result of the segmentation of one car in one frame.
Fig. 3: Background subtraction
However, some characteristics negatively affect the separation of objects in the resulting image, such as small noise, which can cause a single object to be divided into two or more objects during motion localization. In order to remove the noise, the morphological closing operation is adopted. More information about this type of operation on images can be found in [6], [7].
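As a hedged illustration of morphological closing (a dilation followed by an erosion), the sketch below uses a 3×3 square structuring element on a binary mask stored as nested lists. The structuring element, the border handling (neighbors outside the image are ignored), and all function names are assumptions made for this example; the paper does not state which settings were used.

```python
def _neighborhood(mask, y, x):
    """Values of the 3x3 neighborhood of (y, x) that lie inside the image."""
    height, width = len(mask), len(mask[0])
    return [
        mask[ny][nx]
        for ny in range(max(0, y - 1), min(height, y + 2))
        for nx in range(max(0, x - 1), min(width, x + 2))
    ]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 neighborhood is 1."""
    return [
        [1 if any(_neighborhood(mask, y, x)) else 0 for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


def erode(mask):
    """A pixel stays 1 only if every pixel in its 3x3 neighborhood is 1."""
    return [
        [1 if all(_neighborhood(mask, y, x)) else 0 for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


def closing(mask):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask))


# One object broken into two parts by a one-pixel gap at column 3.
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
closed = closing(mask)
```

In this toy mask the closing fills the gap, so the two fragments become one connected object without growing beyond their original extent.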
Even though background subtraction is a very common strategy to segment and track moving objects, many papers have discussed and proposed improvements to this method. In [8], the authors propose a more complex method using an adaptive background, where each pixel is modelled as a mixture of Gaussians and an online approximation is used to update the model. In [1], background subtraction is also used, but connected component analysis determines which parts belong to each object. The method implemented in this paper uses morphological operations, such as opening and closing (operations built from the basic morphological operators, i.e., erosion and dilation). Many other techniques are shown in [4].
B. Temporal Differing
Temporal differing uses the pixel-wise difference between two or three consecutive frames. This method is adaptive to dynamic environments, but generally does a poor job of extracting all the relevant pixels, e.g., there may be holes left inside moving entities. As an example of this method, Lipton et al. [9] detect moving targets in real video streams using temporal differing. After the absolute difference between the current and the previous frame is obtained, a threshold function is used to determine changes.
Some improved versions use three frames instead of two. In this work, we propose a technique using the previous and the next frames. The basic idea is similar to Background Subtraction: once the differences between the frames are calculated, the binary image, which contains holes in the regions that correspond to the moving objects, can be calculated as follows:
(a) The current frame (b) The previous frame
(c) The next frame (d) The difference after applying threshold = 10
Fig. 4: Temporal Differing
First, the differences are calculated for the previous and following frames:
$$D_p = f_t(x, y) - f_{t-1}(x, y), \qquad D_f = f_t(x, y) - f_{t+1}(x, y).$$
Then the threshold function is applied, where $\tau$ is a previously specified threshold:
$$I_f(x, y) = \begin{cases} 1, & D_p > \tau \ \text{or}\ D_f > \tau \\ 0, & \text{otherwise.} \end{cases}$$
Note that this calculation yields the contour of the moving objects, as shown in Fig. 4. Then, morphological operations are performed in order to enlarge the detected area and connect the parts of each object.
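The three-frame rule above can be sketched as follows. This is a minimal illustration, assuming frames stored as nested Python lists; the name `temporal_difference_mask` is hypothetical, and the absolute value is an assumption added here (in line with the absolute difference attributed to [9]) so that both brightening and darkening pixels are detected.

```python
def temporal_difference_mask(previous, current, following, threshold):
    """Binary mask: 1 where the current frame differs from the previous
    OR the following frame by more than the threshold."""
    mask = []
    for prev_row, cur_row, next_row in zip(previous, current, following):
        mask.append([
            1 if abs(c - p) > threshold or abs(c - n) > threshold else 0
            for p, c, n in zip(prev_row, cur_row, next_row)
        ])
    return mask


# A bright spot moving one pixel to the right per frame (assumed toy data).
previous = [[10, 10, 10]]
current = [[10, 50, 10]]
following = [[10, 10, 50]]
mask = temporal_difference_mask(previous, current, following, threshold=10)
```

The mask marks both the pixel the spot currently occupies and the one it moves to next, which is why the result tends to outline moving objects and benefits from the morphological post-processing described above.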
C. Optical Flow
An important technique for estimating motion in a sequence of images is called Optical Flow. Optical flow is the distribution of the apparent velocities of the intensity patterns in an image. It can arise from the relative motion between objects and the viewer. Consequently, optical flow can provide important information about the spatial arrangement of objects and the rate at which this arrangement changes. For instance, discontinuities in the optical flow can help segment images into regions that correspond to different objects [10]. Its applications include tracking, parametric and layered motion estimation, mosaic construction, medical image registration [11], and face coding [12].
Fig. 5 shows an example of optical flow. In Fig. 5a the arrows indicate the movement direction of the objects and Fig. 5b shows the objects after the displacement.
(a) Image before moving (arrows indicate the movement)
(b) Image after moving
Fig. 5: Optical Flow representation
Optical flow is of interest whenever the movement of objects within a scene is relevant. Among its applications, we can highlight scene interpretation, exploratory navigation, object tracking, estimation of the time for one body to reach another, and object segmentation. Its applicability is even more extensive, and it can also be used in other fields, such as robotic vision and surveillance applications [13].
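For reference, the Horn-Schunck method [2], which is the optical flow method used in Section IV, rests on the brightness-constancy constraint and a smoothness term. The formulation below follows the standard notation of [2]; the symbols $I_x$, $I_y$, $I_t$ (image derivatives), $u$, $v$ (flow components), and $\alpha$ (regularization weight) are introduced here for illustration and do not appear elsewhere in this paper.

$$I_x u + I_y v + I_t = 0$$

$$E(u, v) = \iint \left[ (I_x u + I_y v + I_t)^2 + \alpha^2 \left( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \right) \right] dx\, dy$$

The flow field $(u, v)$ is the one that minimizes $E$; a larger $\alpha$ favors smoother flow fields.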
III. TRACKING
Object tracking is an important task within the field of computer vision. In its simplest form, tracking can be defined as the problem of estimating the trajectory of an object in the image plane as it moves around a scene. In other words, a tracker assigns consistent labels to the tracked objects in different frames of a video. Additionally, depending on the tracking domain, a tracker can also provide object-centric information, such as orientation, area, or shape of an object [14].
We use a simple tracking strategy. In all methods, the bounding box of each moving object is known. We therefore take the center of each object, and in the next frame we presume that the nearest detected object is the same object from the previous frame. When the nearest detection is too far away, we discard the match, create a new object hypothesis, and assume that the object from the previous frame has left the scene.
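A minimal sketch of this nearest-center association is given below. It is an illustration under stated assumptions, not the authors' code: bounding boxes are `(x_min, y_min, x_max, y_max)` tuples, matching is greedy in track order, and the names `bbox_center`, `update_tracks`, `max_distance`, and `next_id` are hypothetical.

```python
import math


def bbox_center(box):
    """Center (x, y) of a bounding box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def _distance(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def update_tracks(tracks, detections, max_distance, next_id):
    """Associate previous-frame centers with current-frame detections.

    tracks: dict mapping track id -> center in the previous frame.
    detections: list of centers detected in the current frame.
    Returns the updated tracks and the next unused track id.
    """
    updated = {}
    unclaimed = list(range(len(detections)))
    for track_id, center in tracks.items():
        if not unclaimed:
            break
        nearest = min(unclaimed, key=lambda i: _distance(center, detections[i]))
        if _distance(center, detections[nearest]) <= max_distance:
            updated[track_id] = detections[nearest]
            unclaimed.remove(nearest)
        # Otherwise the track is dropped: the object is assumed to have left.
    for i in unclaimed:
        # Detections that matched no track start a new object hypothesis.
        updated[next_id] = detections[i]
        next_id += 1
    return updated, next_id


# Two objects in the previous frame; one moves slightly, one disappears,
# and a new object shows up far away (assumed toy data).
tracks = {0: (10.0, 10.0), 1: (50.0, 50.0)}
detections = [bbox_center((8, 8, 16, 14)), bbox_center((190, 190, 210, 210))]
tracks, next_id = update_tracks(tracks, detections, max_distance=20.0, next_id=2)
```

Greedy matching is the simplest choice; when objects pass close to each other, a globally optimal assignment could behave differently.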
For future work we plan to study other tracking strategies and compare them when using the entire license plate recognizer. In this way, different segmentation, license plate location, and tracking methods can be compared in order to improve their performance.
IV. EXPERIMENTS AND RESULTS
A. Database
The focus of this work is to review the main methods to segment moving objects in image sequences. Here, we present segmentation results on several videos. The first one is very easy: it is a video distributed with MATLAB. The second one is a little more difficult; it is taken from PETS [3], a workshop on tracking and surveillance systems. We also built our own dataset, which is more challenging and realistic for our purposes, especially for occlusion cases. It was recorded at the main entrance of the Federal University of Ouro Preto (UFOP), where there is an avenue behind the gate. This avenue carries heavy traffic, but those vehicles are not heading into the university, so it is not necessary to recognize them.
In contrast, these vehicles disturb the tracking of vehicles
Tests were carried out using these three databases, and the results are shown in Section IV-B. Some analysis and discussion are presented in Section IV-C.
B. Experiments
Fig. 6 shows the result of the background subtraction method. For this test on the PETS [3] database, we used the first 60 frames to determine the background.
Our implementation of Temporal Differing is similar to that proposed in [9]. We use the difference between three frames to determine the motion.
For the implementation of Optical Flow segmentation method, we use the Horn-Schunck method [2]. Some results of this segmentation process can be found in Fig. 8. The first row (Fig. 8a) shows the original frames from the video, while the second row (Fig. 8b) shows the optical flow result.
The third row (Fig. 8c) shows the thresholded Optical Flow images, where only the pixels that have a minimum velocity are considered. And, finally, the fourth row (Fig. 8d) shows the detected moving object.
The minimum velocity in our method is set to 0.5 times the mean of all velocities, so that only pixels with relevant motion are considered.
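The velocity threshold can be sketched as below, assuming the flow field is available as two nested lists `u` and `v` holding the horizontal and vertical velocity of each pixel. The function name `flow_motion_mask`, the `factor` parameter, and the strict comparison are assumptions made for this illustration.

```python
import math


def flow_motion_mask(u, v, factor=0.5):
    """Binary mask of pixels whose flow magnitude exceeds
    factor * (mean flow magnitude over the whole image)."""
    magnitudes = [
        [math.hypot(ux, vy) for ux, vy in zip(u_row, v_row)]
        for u_row, v_row in zip(u, v)
    ]
    values = [m for row in magnitudes for m in row]
    threshold = factor * sum(values) / len(values)
    # Strict comparison, so a completely static flow field gives an empty mask.
    return [[1 if m > threshold else 0 for m in row] for row in magnitudes]


# Toy 2x2 flow field (assumed values): magnitudes are 5, 0, 0 and 1.
u = [[3.0, 0.0], [0.0, 0.0]]
v = [[4.0, 0.0], [0.0, 1.0]]
mask = flow_motion_mask(u, v)
```

Because the threshold is relative to the mean velocity, it adapts to each frame rather than requiring a fixed value per video.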
C. Analysis
The three methods are powerful enough to detect the moving objects. In all three cases, no moving object went undetected. However, the methods do not deal with occlusions, so two or more objects are sometimes detected as a single object; tracking methods may solve that problem.
As we have already mentioned, the UFOP dataset is more challenging. Thus, the methods we tested present some problems, such as treating two disconnected objects as a single object. An example can be seen in Fig. 8d, where a motorcycle and a car are considered a single object even though neither occludes the other.
The background subtraction method does not cope with illumination changes; consequently, an object can be split into several objects. Some adaptations can deal with this problem; however, they can create flat objects and track unnecessary areas of the image, which is not good for the license plate recognizer.
In the MATLAB video, the bounding boxes generated for the detected objects are bigger due to the low frame rate and the fast motion of the vehicles, so the movement between frames is too abrupt. The background subtraction method cannot be tested with this dataset because the video is so short that it is not possible to generate a stable background.
Fig. 6: Background Subtraction. From the first to the last row, we have: a) Original frames; b) Difference between original frame and background image; c) Thresholded image after morphological operations; d) Segmented image
Fig. 7: Temporal Differing. From the first to the last row, we have: a) Original frames; b) Difference between original frame and neighboring frames; c) Thresholded image after morphological operations; d) Segmented image
Fig. 8: Optical Flow. From the first to the last row, we have: a) Original frames; b) Optical flow; c) Thresholded optical flow; d) Segmented image
Some objects that are not of interest for our project are also detected, such as people walking through the scene and cars passing in another direction, e.g., in the UFOP video, where some cars pass along an adjacent street but do not come to the entrance and therefore do not need to be recognized. These detections must be eliminated when the entire recognizer is built. We can select what needs to be recognized, such as objects that grow in size and moving objects with shapes similar to vehicles, and simply discard the other objects.
All three methods appear suitable for the purpose of the new license plate recognition approach. Background subtraction showed many problems, mainly related to adapting to changes in the scene. In contrast, temporal differing dealt very well with these adaptation problems, but it can be less sensitive to small movements. Optical flow was very effective and can deal with camera movements, but at a high computational cost.
V. CONCLUSIONS
Tracking is an important task in computer vision, and when segmentation fails, tracking usually fails as well. Segmentation is therefore an important task too. As mentioned above, there are many approaches for moving object segmentation in image sequences. In this paper, three methods were reviewed for this purpose. They are useful and efficient for segmenting moving objects, and each one has its pros and cons.
Background subtraction is very easy to implement and its basis is easy to understand. With a simple model that represents the empty scene, we can compute the movement by just subtracting this model from the frame in which we want to estimate the motion regions. However, it is not invariant to illumination changes. If illumination variations are not accounted for, they can be treated as moving regions; thus, this method is not appropriate for dynamic environments.
Temporal Differing has almost the same advantages as Background Subtraction, with one great additional benefit: it is adaptive. This method can deal with small illumination changes and discards objects that stop in the scene. So, due to its simplicity and efficiency, Temporal Differing is one of the best methods for finding moving objects in the context of this project.
Optical Flow is one of the most used techniques to find movement in videos. It is able to find the velocity of the pixels we want to follow, so it can be used in many other tracking strategies, such as the one proposed in [15]. It is also adaptive to illumination changes and can still track objects when the camera moves.
The next steps to conclude this project, i.e., a full License Plate Recognition System, are to track the objects and to extract and recognize the license plates. The simple tracking method discussed in Section III seems to be effective for this objective; however, other methods should be studied. If we are able to track well and thus extract the area of the scene that corresponds to the moving vehicle, then we can use a method to locate the license plate in a single image, such as the one proposed in [16]. Finally, we would apply an optical character recognizer, such as the methods proposed in [17], [18], to obtain the license plate characters and thereby recognize the vehicle.
VI. ACKNOWLEDGEMENTS
The authors would like to thank FAPEMIG, CAPES and CNPq for the financial support.
REFERENCES
[1] Z. Kim, “Real time object tracking based on dynamic feature grouping with background subtraction,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
[2] B. K. P. Horn and B. G. Schunck, “Determining Optical Flow,” Artificial Intelligence, vol. 17, no. 1–3, pp. 185–203, 1981.
[3] U. of Reading, “PETS (performance evaluation of tracking and surveillance) 2000,” available at ftp://ftp.pets.rdg.ac.uk/pub/PETS2000/test images/.
[4] M. Piccardi, “Background subtraction techniques: a review,” in IEEE International Conference on Systems, Man and Cybernetics (SMC), vol. 4, 2004, pp. 3099–3104.
[5] J. McHugh, J. Konrad, V. Saligrama, and P.-M. Jodoin, “Foreground-adaptive background subtraction,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, pp. 390–393, 2009.
[6] P. Soille, Morphological Image Analysis: Principles and Applications, 2nd ed. Springer-Verlag, Inc., 2003.
[7] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Pearson Prentice Hall Inc., 2008.
[8] C. Stauffer and W. Grimson, “Adaptive background mixture models for real-time tracking,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 1999, pp. 637–663.
[9] A. Lipton, H. Fujiyoshi, and R. Patil, “Moving target classification and tracking from real-time video,” in IEEE Workshop on Application of Computer Vision (WACV), 1998, pp. 8–14.
[10] A. W. C. Faria. (2007) Fluxo optico. Last access on June 14, 2011. [Online]. Available: http://www.verlab.dcc.ufmg.br/ media/cursos/visao/2007-1/alunos/alexandrewagner/optical flow article.pdf
[11] K. Cuppens, L. Lagae, B. Ceulemans, S. Van Huffel, and B. Vanrumste, “Automatic video detection of body movement during sleep based on optical flow in pediatric patients with epilepsy,” Medical and Biological Engineering and Computing, vol. 48, no. 9, pp. 923–931, 2010.
[12] S. Baker and I. Matthews, “Lucas-Kanade 20 years on: A unifying framework,” International Journal of Computer Vision, vol. 56, no. 3, pp. 221–255, 2004.
[13] A. Mitiche and A.-R. Mansouri, “On convergence of the Horn and Schunck optical-flow estimation method,” IEEE Transactions on Image Processing, vol. 13, no. 6, pp. 848–852, 2004.
[14] A. Yilmaz, O. Javed, and M. Shah, “Object tracking: A survey,” ACM Computing Surveys, vol. 38, no. 4, pp. 1–45, 2006.
[15] W. Du and J. Piater, “Tracking by cluster analysis of feature points using a mixture particle filter,” in Advanced Video and Signal Based Surveillance, 2005. AVSS 2005. IEEE Conference on, 2005, pp. 165–170.
[16] P. R. Mendes, J. M. R. Neves, A. I. Tavares, and D. Menotti, “Towards an automatic vehicle access control system: License plate location,” in IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2011, pp. 2916–2921.
[17] X. Pan, X. Ye, and S. Zhang, “A hybrid method for robust car plate character recognition,” Engineering Applications of Artificial Intelligence, vol. 18, no. 8, pp. 963–972, 2005.
[18] R. Smith, “An overview of the Tesseract OCR engine,” in IEEE International Conference on Document Analysis and Recognition (ICDAR), 2007, pp. 629–633.