Fork Project
The ring on the hook
Ushanthan Jeyabalan - [email protected] [030883] Computer Engineering The Mærsk Mc-Kinney Møller Institute University of Southern Denmark Autumn semester, 2007 7. January, 2008
Supervisor: Assoc. Professor Norbert Krüger
Abstract
This report covers the theoretical and experimental work done in the FORK part of the master project. The aim of the FORK is to carry out a preliminary investigation of an engineering problem defined by the student and the supervisor. The end result of the FORK should be a definition and a limitation of the problem and a choice of methods for solving it in the master thesis.
The engineering problem investigated in the FORK is to put a ring on a hook with a manipulator robot. The hook hangs from a conveyor belt, which is attached to the ceiling and moves at a constant speed. Solving this problem requires a close interplay between computer vision and robotics.
Contents
1 Introduction
1.1 Goals for the Fork project
2 Camera modeling
2.1 Camera representation
2.1.1 Calculating the camera calibration matrix
2.2 Stereo geometry
2.2.1 Epipolar geometry
2.2.2 Retrieving the depth information
2.3 Lens distortion
2.4 Rectification
2.5 Camera setup
2.6 Conclusion
3 Kalman filter
3.1 The tracking filter
3.2 State space modeling
3.3 Kalman filter
3.3.1 The derivation of the Kalman filter
3.3.2 The Kalman Filter Algorithm
3.3.3 Advantages and problems with Kalman filter
4 Vision based 3D tracking
4.1 Marker based tracking
4.2 Model-based tracking
4.3 Multi-modal visual primitives
4.4 Early marker experiments
4.4.1 The experiment
4.4.2 The marker
4.4.3 Conclusion
5 A mathematical model of the hook and the conveyor belt
5.1 The motion model
5.2 The hook
5.2.1 Discretizing the pendulum model
5.2.2 Simulation of the pendulum
6 Experiments
6.1 Robot-camera calibration
6.2 Stereo vs. single camera
6.2.1 Hardware setup
6.2.2 Test
6.2.3 Evaluation of the results
6.2.4 Sub conclusion
6.3 Tracking with Kalman filter
6.3.1 The precision of the prediction
6.3.2 The convergence of the Kalman filter
6.3.3 The influence of different noise values on the Kalman filter
6.3.4 Sub conclusion
6.4 Simulating in RobWork
6.5 Conclusion
7 Conclusion
Chapter 1
Introduction
Manipulator robots have been used in industry for the last fifty years and their number is growing rapidly every year. These first-generation systems are characterized by a pre-programmed workflow, which means that the system cannot adapt to changing environments. To overcome this, much research has been done on intelligent robotic systems that are flexible and versatile.
The Cognitive Vision and Robotics Group has the main objective to develop future vision/robotic systems that are flexible and adaptable to the dynamic environment. The methods to solve the ring on hook problem will also use this philosophy.
The ring on hook project is the result of real industrial problems. Hooks are often used to transport material, e.g. pigs or cows in meat production or metal frames in painting halls. In most cases manual labour is used to hang the objects. Robots are sometimes used as well, but in an old-fashioned way: the speed of the conveyor belt is known, so the hanging procedure can be timed precisely. This, however, requires that the hook does not swing back and forth.
The ring on the hook project is a visual servoing problem, where a robot is assisted by a vision system to sense the environment. A visual servoing system is an iterative process which tries to reduce the distance between the robot and the object to be manipulated. Since most modern manipulator robots can operate within a millimetre precision, the success of solving the ring on the hook problem depends on a fast, precise and robust visual system. To track the moving hook very precisely, a suitable object tracking algorithm has to be developed.
The report describes the results obtained so far, so some parts are not fully elaborated. However, the results obtained in the FORK will be used extensively to solve the ring on the hook problem in the master thesis.
Chapter 1: An introduction to the ring on the hook problem will be given.
Chapter 2: The mathematics behind the camera model will be described. Undistortion and rectification will be discussed, as well as problems regarding the camera setup in an environment.
Chapter 3: An introduction to tracking filters will be given, followed by a description of the Kalman filter and the particle filter.
Chapter 4: Different methods to find the object of interest in a tracking application will be described. A marker based and a marker-less object detection algorithm will be proposed.
Chapter 5: A mathematical model for the conveyor belt and the hook will be described.
Chapter 6: Experiments will be done to verify the theory.
Chapter 7: The conclusion of the FORK project will be given.
1.1
Goals for the Fork project
The main emphasis in the FORK is on the visual tracking part. The huge amount of literature on the different kinds of tracking methods has to be sorted, and the methods that best fit the problem requirements have to be chosen. Hardware-related considerations, such as choosing a suitable camera type and the ideal placement for viewing the environment, should also be briefly discussed.
The most important goal of this FORK project is to examine whether stereo vision can be used to determine the position of the tracked object with millimetre precision. The success of the project depends on this, since the other parts of the project cannot be carried out if the precision is too low.
When tracking is involved, as here in a vision context or in signal processing, different filter techniques exist. These filters have to be closely examined and experiments have to be done using the Kalman filter. At least one publicly available marker detection system has to be tested using the Kalman filter.
A simulation of the conveyor belt and the hook has to be done in the robotic simulation environment RobWork. To accomplish this, a CAD model of the conveyor belt and the hook has to be drawn, and the back and forth movement of the hook must be simulated by mathematically modelling the hook as a pendulum.
Chapter 2
Camera modeling
This chapter is meant to be a collection of commonly used mathematical theories in camera modeling. The chapter starts with a mathematical description of a camera. Then the attention is turned to the multiple view geometry, particularly the stereo vision. The limitation of the mathematical model is addressed in the form of distortion and how rectification eases the vision process. Finally some proposals are given when dealing with camera setup in a real environment.
It should be said that an in-depth treatment of the camera modeling is beyond the scope of this report and only the necessary theory needed for the project is provided.
2.1
Camera representation
The pinhole camera model is often used to represent a real camera mathematically. This model describes the relationship between a 3D point Xcam and its perspective projection x on the 2D image plane. The image plane is placed in front of the camera centre at z = f. The point x is the intersection of the image plane with the line connecting the camera centre and the point Xcam [9]. See fig. 2.1 and 2.2.
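From fig. 2.2 it can be seen that the projection, which is a scaling/linear mapping, can be expressed as:
\[ (X, Y, Z)^T \mapsto (fX/Z,\; fY/Z,\; f)^T \tag{2.1} \]
In homogeneous coordinates this linear mapping can be expressed as in equation 2.2:
\[ \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} fX \\ fY \\ Z \end{pmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \tag{2.2} \]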
Equation 2.2 assumes that the origin of the image plane is at the principal point P = (px, py) (see fig. 2.3), which is often not the case. Normally one of the corners of the image is chosen as the origin when working with images. Therefore a translation of the origin is needed, as in fig. 2.3.
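The linear mapping with the translation of the origin becomes:
\[ (X, Y, Z)^T \mapsto (fX/Z + p_x,\; fY/Z + p_y,\; f)^T \]
\[ \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} fX + Z p_x \\ fY + Z p_y \\ Z \end{pmatrix} = \begin{bmatrix} f & 0 & p_x & 0 \\ 0 & f & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \tag{2.3} \]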
Figure 2.3: Translation of the origin [9]
Until now the image plane has been modelled as having Euclidean coordinates, but a CCD camera chip is made of an array of light sensors which are represented by pixels. Therefore a transformation from Euclidean to pixel coordinates is needed. It is done by multiplying with the number of pixels per unit distance, $m_x$ and $m_y$. See equation 2.4.
\[ \alpha_x = f m_x, \qquad \alpha_y = f m_y, \qquad x_0 = m_x p_x, \qquad y_0 = m_y p_y \tag{2.4} \]
The resulting matrix K is called the calibration matrix and contains the intrinsic parameters of the camera. See equation 2.5. When these parameters are known the camera is said to be calibrated.
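As a sketch of what the definitions in equation 2.4 imply, and assuming zero skew between the pixel axes (an assumption that is not stated above; the general form in the literature carries an additional skew term), the calibration matrix would take the form:

```latex
\[
K =
\begin{bmatrix}
\alpha_x & 0        & x_0 \\
0        & \alpha_y & y_0 \\
0        & 0        & 1
\end{bmatrix}
\]
```

Here $\alpha_x$ and $\alpha_y$ are the focal length expressed in pixels along each axis and $(x_0, y_0)$ is the principal point in pixel coordinates.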
\[ x = K\,[I \mid 0]\,X_{cam} \tag{2.6} \]
The equation 2.6 describes the projection of the 3D point in the camera coordinate system which means that the camera coordinate system is regarded as the world coordinate system. However, dependent on the application, other coordinate systems could be used as the world coordinate system e.g. the robot coordinate system. A transformation matrix containing the rotation and translation of the camera in the world frame, called the external parameters matrix, is multiplied to equation 2.6 and forms the projection matrix P [9]. This transformation matrix is important when dealing with visual servoing, since it changes if the camera is mounted on a robot in a so-called eye-in-hand configuration.
\[ x = K\,[R \mid t]\,X, \qquad P = K\,[R \mid t] \tag{2.7} \]
With the projection matrix P, a point in the world frame can be projected to the image plane. See equation 2.7.
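Equation 2.7 can be illustrated with a short Python sketch. The intrinsic values (a focal scale of 800 pixels and a principal point at (320, 240)) and the identity pose are hypothetical numbers chosen for illustration, not calibration results from this project, and the helper names are likewise assumptions.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def projection_matrix(K, R, t):
    """Build the 3x4 projection matrix P = K [R | t]."""
    Rt = [R[i] + [t[i]] for i in range(3)]
    return mat_mul(K, Rt)


def project(P, X):
    """Project a 3D world point (X, Y, Z) to pixel coordinates (u, v)."""
    x, y, w = mat_vec(P, list(X) + [1.0])
    return (x / w, y / w)  # divide by the homogeneous coordinate


# Hypothetical intrinsics: alpha_x = alpha_y = 800, principal point (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
# Hypothetical pose: camera frame coincides with the world frame.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

P = projection_matrix(K, R, t)
u, v = project(P, (0.1, -0.05, 2.0))
print(u, v)
```

With an eye-in-hand camera, R and t would be updated from the robot pose at every step while K stays fixed.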
2.1.1
Calculating the camera calibration matrix
It is normally assumed that the internal parameters of the camera (e.g. the focal length) are fixed, which means that the camera is not able to zoom. This assumption is common because it is difficult to distinguish a change in focal length from a translation along the z-axis of the camera coordinate system. The fixed calibration can be found in an offline calibration process. The classical method is to use a grid pattern of known size which is seen from several views. This method is simple and flexible, since the pattern only needs to be printed out and attached to a planar plate (see footnote 1).
2.2
Stereo geometry
Humans are able to perceive depth because their spatially separated eyes view the environment from slightly different angles. The same idea is used in stereo vision by viewing the scene with two cameras placed with a certain disparity (distance between the cameras) and angle.
Stereo vision is often avoided in tracking methods, since the cameras people have at home are monocular, e.g. web cameras, and users cannot be expected to buy a stereo camera when a tracking application is commercialized. A stereo camera is also thought to be difficult to calibrate. In this project a stereo camera will be used to calculate the depth information, since the Cognitive Vision group has used stereo cameras in several previous applications (see section 4.4) and the depth acquired from stereo vision is much more accurate and robust than that from single camera vision; after all, evolution has given humans two eyes as their visual sensing organ.
2.2.1
Epipolar geometry
The stereo camera delivers two images, right and left respectively. The relationship between these two images is expressed through the epipolar geometry. The epipolar geometry of a stereo image is shown in fig. 2.4 (adapted from footnote 2). The points (C, C', M) form the epipolar plane, and the intersections of this plane with the two images are called the epipolar lines (l, l'). The line connecting the two centers of projection (C, C') intersects the image planes at the conjugate points e and e', which are called epipoles [9].
Footnote 1: http://www.vision.caltech.edu/bouguetj/calib_doc/
Figure 2.4: Epipolar geometry
The points x and x’ are the projection of the 3D point M on to the image planes. The epipolar geometry yields this relationship between x and x’:
\[ x'^T F x = 0 \tag{2.8} \]
\[ l' = F x \tag{2.9} \]
\[ l = F^T x' \tag{2.10} \]
\[ F = [K' t]_\times K' R K^{-1} \tag{2.11} \]
The matrix F is called the fundamental matrix and has rank two. F is found by using the camera parameters (K, R, t, C) of the right and left camera. The relation says that any point x' in the second image matching the point x in the first image must lie on the epipolar line l'. The match can therefore be found by searching along the line l', identifying x' by unique characteristics of the point and its surroundings. This is not always possible, e.g. when the point lies on a homogeneous surface without any texture.
Since every epipolar line goes through the epipole, the following relation holds:
\[ F e = 0, \qquad F^T e' = 0 \tag{2.12} \]
This means that the epipole e is the right null-vector of F and the epipole e' is the left null-vector of F.
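The relations 2.8 to 2.12 can be checked numerically on a simple configuration. The sketch below assumes an idealized setup that is not the camera rig of this project: K = K' = I, R = I and a translation t along the x-axis only, so that equation 2.11 reduces to F = [t]x. All numbers and function names are illustrative.

```python
def skew(t):
    """Return the skew-symmetric matrix [t]_x, so that [t]_x v equals t cross v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Assumed setup: identity intrinsics and rotation, translation along x only.
F = skew([0.1, 0.0, 0.0])

x_left = [0.30, 0.20, 1.0]    # point x in the first image (homogeneous)
x_right = [0.25, 0.20, 1.0]   # candidate x' on the same image row
x_wrong = [0.25, 0.35, 1.0]   # candidate x' on a different row

line_right = mat_vec(F, x_left)            # epipolar line l' = F x  (2.9)
residual_ok = dot(x_right, line_right)     # x'^T F x, zero for a valid match (2.8)
residual_bad = dot(x_wrong, line_right)    # non-zero: the point is off the line

epipole = [1.0, 0.0, 0.0]                  # point at infinity along the x-axis
Fe = mat_vec(F, epipole)                   # F e = 0      (2.12)
FTe = mat_vec(transpose(F), epipole)       # F^T e' = 0   (2.12)
print(residual_ok, residual_bad, Fe, FTe)
```

In this configuration the epipolar lines are the image rows, which is the situation that rectification aims to create for a general camera pair.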
2.2.2
Retrieving the depth information
The calculation of depth in computer vision is called triangulation. If a pair of corresponding points in a set of stereo images is known, the depth can be calculated by intersecting the two lines of sight from the left and right image. First an equation for these two rays has to be found. Each ray goes through the camera center C and the point $P^+x$, where $P^+$ is the pseudo-inverse of the projection matrix P and x is an image point. When $P^+x$ is projected onto the image plane it can be seen that it corresponds to x as expected ($P(P^+x) = Ix = x$) [9]. With these two points the equation for the line can be formulated.
When the equations for the two rays from the right and left image are calculated, the intersection can be found. Because of uncertainties, the probability that the lines will intersect at a point in the 3D space is very small, therefore the midpoint of the perpendicular line connecting the rays will be considered as the intersection as shown in fig. 2.5.
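The midpoint construction described above can be sketched as follows. Each ray is assumed to be given by a camera centre and a direction vector; the function name, the parallel-ray threshold and the example values are illustrative assumptions.

```python
def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def scale(a, s):
    return [x * s for x in a]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def midpoint_triangulation(c1, d1, c2, d2, eps=1e-12):
    """Return (midpoint, gap) for the rays c1 + s*d1 and c2 + u*d2.

    The midpoint lies halfway along the shortest segment connecting the
    two rays, and gap is the length of that segment.
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom   # parameter of the closest point on ray 1
    u = (a * e - b * d) / denom   # parameter of the closest point on ray 2
    p1 = add(c1, scale(d1, s))
    p2 = add(c2, scale(d2, u))
    diff = sub(p1, p2)
    gap = dot(diff, diff) ** 0.5
    return scale(add(p1, p2), 0.5), gap


# Hypothetical stereo pair: camera centres 0.2 apart, both rays aimed at
# the same 3D point (0.1, 0.05, 1.0), so the rays intersect exactly.
point, gap = midpoint_triangulation([0.0, 0.0, 0.0], [0.1, 0.05, 1.0],
                                    [0.2, 0.0, 0.0], [-0.1, 0.05, 1.0])
print(point, gap)
```

The returned gap gives a simple quality measure: a large value suggests a wrong correspondence or a poor calibration.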
Figure 2.5: Retrieving the depth
2.3
Lens distortion
The pinhole model described earlier does not account for the distortion from the camera lens, and this distortion is especially visible in wide angle cameras. Fortunately the distortion can be removed by using the radial distortion model. In this model the distortion is described by a radial and a tangential component. The first is a displacement (dr) in the radial direction from the image centre and the latter (dt) is tangential to a circle centred at the image centre [11]. See fig. 2.6.
The distortion parameters can be expressed like this:
Figure 2.6: Distortion model [11]
Where the first two are the radial distortion and the next two are used for the tangential distortion, while the last parameter is often neglected. These are given as output from the Camera Calibration Toolbox for Matlab. The undistortion is done using the forward distortion model, which describes the mapping from the undistorted to the distorted image using the radial and tangential distortion models. Then the mapping is reversed to get the mapping for the undistortion. For a detailed mathematical description of this process see [11].
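A minimal sketch of the forward model and its reversal is given below. It assumes that the five parameters are ordered as (k1, k2, p1, p2, k3), with k1, k2 and k3 radial and p1, p2 tangential, and that the commonly used polynomial distortion model applies to normalized image coordinates; [11] should be consulted for the exact formulation. The coefficient values are hypothetical, and the inversion by fixed-point iteration is one common choice rather than necessarily the method of [11].

```python
def _terms(x, y, kc):
    """Radial scale factor and tangential offsets at the point (x, y)."""
    k1, k2, p1, p2, k3 = kc
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return radial, dx, dy


def distort(x, y, kc):
    """Forward model: undistorted normalized point -> distorted point."""
    radial, dx, dy = _terms(x, y, kc)
    return (radial * x + dx, radial * y + dy)


def undistort(xd, yd, kc, iterations=20):
    """Reverse the forward model by fixed-point iteration.

    Starts from the distorted point and repeatedly removes the distortion
    evaluated at the current estimate. This converges for moderate
    distortion; strongly distorted lenses may need another solver.
    """
    x, y = xd, yd
    for _ in range(iterations):
        radial, dx, dy = _terms(x, y, kc)
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return (x, y)


# Hypothetical coefficients, last parameter neglected (set to zero).
kc = (-0.25, 0.1, 0.001, -0.0005, 0.0)
xd, yd = distort(0.3, -0.2, kc)
xu, yu = undistort(xd, yd, kc)
print((xd, yd), (xu, yu))
```

In practice the reversed mapping is usually evaluated once per pixel and stored as a lookup table, so that each new frame only needs a remapping step.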
Figure 2.7: The figure shows the radial and tangential distortion model. Each displacement of the pixels represented by the arrows is due to the lens distortion. As expected the distortion is largest at the periphery of the image
cameras can be considered if the working space is limited and the camera cannot be moved further away to keep the tracked object in the field of view for a longer time.
2.4
Rectification
The epipolar geometry reduces the correspondence problem from a 2D to a 1D problem. The corresponding epipolar lines have a certain slope depending on the camera setup, which makes the search for corresponding features difficult. The search becomes much easier with parallel epipolar lines, so that the y-coordinate is constant and only the x-coordinates have to be considered. This procedure is called rectification and can be done by first mapping the epipoles to infinity, $[1\ 0\ 0]^T$, which results in parallel epipolar lines, and then transforming the image points so that corresponding points have the same y-coordinate [9, 11].
The undistortion and rectification processes can be combined into one single mapping, which reduces memory use and rounding errors.
2.5
Camera setup
As earlier mentioned, the external parameters of the camera can be changed unlike the internal parameters. In stereo vision, not only the placement of the camera in the working environment is important but also the mutual position of the stereo cameras which can greatly impact the depth calculation.
There are several parameters that contribute to uncertainty in triangulation, and most of them are mutually exclusive, i.e. improving one tends to worsen another.
• The resolution of the cameras.
• Disparity - The distance between cameras.
• Vergence angle - The angle of view.
• Distance to the object from cameras.
• The intersection of the principal rays in the 3D environment.
A camera captures the world in a 2D image plane. The plane is discretized into pixels with a certain height and width e.g. 2048x2048. Consequently this limits the precision of a feature point that can be found on the left image and right image which again contributes to an uncertainty in the depth calculation. One way to increase the precision is to increase the resolution of the camera. However this is not possible when the camera is already bought. The solution is to find the features with subpixel precision which is a method to increase the resolution by interpolating between the pixel values.
The vergence angle θ describes the angle of view of the camera setup as shown in fig. 2.8. The vergence angle is defined by a rotation about the y-axis of the camera coordinate system; the rotations around the remaining two axes, namely the x- and z-axis, are normally zero [11].
The vergence angle together with the disparity creates the stereo effect of seeing an object from slightly different views. The resulting area, where stereo is possible, is called the stereo field of view.
Figure 2.8: The vergence angle [11]
reducing the disparity. Another advantage of this procedure is that the correspondence problem (epipolar geometry) becomes easier to solve, since the left and right images look more similar as the disparity decreases. Here the mutually exclusive nature of the parameters comes in: when the disparity is low, the depth inaccuracy becomes higher because of the increased similarity of the left and right images, and the stereo effect is lost (see fig. 2.9). The increased field of view comes at another price: every pixel now has to cover a greater area, which reduces the resolution and in turn affects the accuracy of the depth calculation as described earlier.
Figure 2.9: Uncertainty of reconstruction, The gray uncertainty region depends on the angle between the rays. [9]
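The trade-off described above can be made concrete for the special case of two parallel (rectified) cameras, which is a simplifying assumption and not the verged setup of fig. 2.8. Let b be the distance between the cameras (called disparity in this report), f the focal length in pixels and d the pixel offset between corresponding image points. Then:

```latex
\[
Z = \frac{f\,b}{d},
\qquad
\left|\frac{\partial Z}{\partial d}\right| = \frac{f\,b}{d^{2}} = \frac{Z^{2}}{f\,b},
\qquad
\delta Z \approx \frac{Z^{2}}{f\,b}\,\delta d
\]
```

As an illustration with hypothetical numbers, f = 1000 pixels, b = 0.1 m, Z = 1 m and a matching error of δd = 0.5 pixel give δZ ≈ 5 mm; doubling b to 0.2 m halves this to about 2.5 mm. The depth error thus grows quadratically with the distance to the object and shrinks with a larger camera separation and with more precise (e.g. subpixel) feature localization.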
image [14]. See fig. 2.10.
Figure 2.10: The left figure shows a plot of the uncertainty for x- and y image coordinates. The uncertainty is lowest in the middle of the image and increases when moving to the periphery of the image. The right figure indicates that the error becomes smaller when the object moves closer to the camera (decreasing z-value). [14]
2.6
Conclusion
Chapter 3
Kalman filter
Visual tracking is a field of computer vision in which the location of an object is tracked over time using image sequences. Visual tracking has a wide range of uses, e.g. monitoring traffic and people with surveillance cameras, or in robotic systems as a part of visual servoing. There are two main types of visual tracking, namely online and offline tracking. Online tracking, often called real time tracking, gets its image sequences directly from the camera, and the object is tracked at a certain frame rate depending on the application. This form of tracking is the most challenging one, since it limits the processing time available for every frame and demands specialized tracking algorithms and hardware. Offline tracking is used on already recorded image sequences, e.g. to find shoplifters in a surveillance video. It is easier to handle, since a delay of a couple of seconds or even minutes is tolerable, in contrast to online tracking. There also exist methods which use both offline and online information to track an object. The offline image sequence is used to identify properties and behavior of the object, and these are then used during the real time processing [15].
In this project, a real time tracking algorithm is needed to track the moving hook. A real time tracking algorithm usually has two parts: the tracking filter and the feature extraction algorithm. The tracking filter helps to find the object in the global image space, and then a feature extraction method is used to find the exact position of the object. In the following sections two different tracking filters will be described, followed by methods for feature extraction. By combining a tracking filter and a visual extraction method, an appropriate visual tracking algorithm suitable for this project will be described, and experiments are made to examine the limitations of these algorithms.
3.1
The tracking filter
The tracking filter is a recursive algorithm which tries to follow an object of interest. The main purpose of the tracking filter is to reduce the size of the image region used during the tracking. Typically, a rectangular region is chosen around the point where the object is believed to be. This reduction in data is necessary for real time tracking and makes the feature extraction process easier, since it could otherwise be distracted by similar looking objects in the image.
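The rectangular search region can be sketched in a few lines. The function below is a hypothetical helper, not part of any library used in the project; it centres a window on the expected position and clips it to the image borders.

```python
def search_window(cx, cy, half_w, half_h, img_w, img_h):
    """Return (x0, y0, x1, y1) for a window centred on (cx, cy).

    The window is clipped to the image; x1 and y1 are exclusive bounds,
    so the region covers columns x0..x1-1 and rows y0..y1-1.
    """
    col, row = int(round(cx)), int(round(cy))
    x0 = max(0, col - half_w)
    y0 = max(0, row - half_h)
    x1 = min(img_w, col + half_w + 1)
    y1 = min(img_h, row + half_h + 1)
    return (x0, y0, x1, y1)


# Hypothetical example: a 101x101 window in a 2048x2048 image, with the
# expected position close to the upper image border.
x0, y0, x1, y1 = search_window(1000.4, 20.7, 50, 50, 2048, 2048)
fraction = (x1 - x0) * (y1 - y0) / (2048 * 2048)
print((x0, y0, x1, y1), fraction)
```

In this example only a small fraction of the pixels has to be searched, which is where the saving for real time tracking comes from.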
The simplest tracking filter approach is to search for the object in the last known position but this method fails if the object has moved outside the rectangular area e.g. when the object moves too fast, if the frame rate is too low or if the environment changes. The problem could be avoided by increasing the frame rate but due to other constraints in the algorithm and hardware, it is not always possible. In addition the environment plays a vital role in the form of changing orientation, illumination and occlusion which changes the appearance over time.
It is clear that using the last known position gives poor results; the solution is to predict the next position of the object. An object moving in an environment obeys certain physical laws, e.g. those governing gravity, motion, torque or momentum. These laws can be used to model the object mathematically and predict its position at a given time. The prediction cannot stand alone for tracking the object, since the mathematical model uses past knowledge and assumes certain conditions about the object to make a qualified guess. Some condition may change, e.g. the speed of the object, and therefore the model has to be updated through measurements of the actual position of the object. This recursive loop of predict and update (correct) is a vital part of modern tracking systems. See fig. 3.1.
Figure 3.1: Predict and update
3.2
State space modeling
Most tracking algorithms use a mathematical model of the object's physics to predict its position. State space modeling is a well-known method for modeling a physical system. The model consists of an input vector, an output state vector and a transition matrix which maps the input to the output. The transition matrix (see equation 3.2) is often constant, but it can also change with time depending on the physical system. The state vector (see equation 3.1) consists of state variables, which are the parameters that completely describe the system, e.g. position or speed.
The state can be described as a continuous system, but if the system is to be implemented on a computer a discrete time linear system model is necessary. In the discrete system, the input is the state of the model at time t and the transition matrix A transforms the model to time t+1. When modeling a system, the task is then to choose the appropriate state variables and use physical laws to define the transition matrix.
\[ x_t = \begin{pmatrix} x_t^1 \\ x_t^2 \\ x_t^3 \\ \vdots \\ x_t^n \end{pmatrix} \tag{3.1} \]
\[ x_{t+1} = A x_t \tag{3.2} \]
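As a small illustration of equations 3.1 and 3.2, the sketch below uses a constant-velocity model with the state variables position and velocity. The time step (0.04 s, i.e. 25 frames per second) and the initial state are assumed values, not parameters of the conveyor belt in this project.

```python
def step(A, x):
    """Apply the transition x_{t+1} = A x_t (equation 3.2)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


dt = 0.04              # assumed time between frames in seconds
A = [[1.0, dt],        # position += velocity * dt
     [0.0, 1.0]]       # velocity stays constant

x = [0.0, 0.5]         # assumed initial state: position 0 m, velocity 0.5 m/s
trajectory = [x]
for _ in range(25):    # propagate the state over one second
    x = step(A, x)
    trajectory.append(x)

print(trajectory[-1])
```

After 25 steps the position has advanced by roughly 0.5 m while the velocity is unchanged, which is the behaviour expected of an object moving at constant speed.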
3.3
Kalman filter
if the object one wants to track is a car. The position, velocity and acceleration can be modelled. However, there are also other parameters, such as air friction, which depends on the shape of the car, and friction in the wheels. These parameters cannot be modelled so easily and are often omitted. Not only the prediction but also the measurement will be corrupted by noise/uncertainty, e.g. in the camera sensor, the camera calibration or the depth calculation during stereo vision. These deviations in the prediction and the measurements can be modelled as system noise and measurement noise with Gaussian distributions. The Kalman filter does exactly this: it combines the noisy physical model and the noisy measurements in an optimal recursive data processing algorithm that predicts the next state and then uses the measurement to correct this prediction. The corrected prediction is used as input to the next prediction.
The signal graph for Kalman filter is shown in fig. 3.2 and a detailed description will be given in the following sections [16, 5].
Figure 3.2: The signal graph for Kalman filter. Adapted from [5]
The properties of the Kalman filter described above can be expressed mathematically as follows.
The system equation:
\[ x_k = A x_{k-1} + \varepsilon_k \tag{3.3} \]
Explanation of the parameters:
1. $x_k$ is an $n \times 1$ state vector, which is to be estimated
2. $A$ is an $n \times n$ system matrix
3. $\varepsilon_k$ is the Gaussian system noise
The measurement equation:
\[ y_k = C x_k + \delta_k \tag{3.4} \]
Explanation of the parameters:
1. $y_k$ is the $n \times 1$ measurement vector
2. $x_k$ is the $n \times 1$ system vector which is to be estimated
3. $C$ is the measurement matrix
4. $\delta_k$ represents the Gaussian measurement noise
The noise terms $\varepsilon_k$ and $\delta_k$ are characterized by the covariance matrices Q and R respectively. The errors are assumed to be white and to have zero cross-correlation, which means that only the diagonal elements of the covariance matrices are non-zero. The Gaussian/normal probability distributions of the errors have zero mean:
\[ p(\varepsilon_k) \sim N(0, Q) \]
\[ p(\delta_k) \sim N(0, R) \]
The system matrix A and the measurement matrix C are regarded as constant. However, it is possible that both matrices change with time. The covariance matrices Q and R are also assumed to remain constant.
3.3.1
The derivation of the Kalman filter
In this section the internal parameters of the Kalman filter algorithm are explained [16].
The a priori estimate
The filter starts at time $t_0$ and is provided with an initial estimated state vector $\hat{x}_0$, or generally $\hat{x}_k$, which is then used to predict $\hat{x}^-_{k+1}$. Since the prediction is made before incorporating the measurements, it is called the a priori estimate. The hat indicates that the quantity is an estimate and the '-' indicates that it is an a priori estimate:
\[ \hat{x}^-_{k+1} = A \hat{x}_k \tag{3.5} \]
The error term of the a priori estimate is called the error covariance matrix $P^-_k$ and it is calculated using Q:
\[ P^-_{k+1} = A P_k A^T + Q \tag{3.6} \]
The a posteriori estimate
The a priori estimate described earlier must now be updated/corrected using the measurement $y_k$; the resulting estimate is called the a posteriori estimate.
\[ \hat{x}_k = \hat{x}^-_k + K_k (y_k - C \hat{x}^-_k) \tag{3.7} \]
The a posteriori estimate $\hat{x}_k$ is calculated as a linear combination of the a priori estimate and a weighted difference between the actual measurement $y_k$ and a measurement prediction $C \hat{x}^-_k$. The difference $(y_k - C \hat{x}^-_k)$ is called the residual. It reflects the discrepancy between the actual measurement $y_k$ and the predicted measurement. If the residual is zero, the predicted measurement and the actual measurement are the same.
The Kalman gain matrix $K_k$ expresses to what degree the measurement and the prediction should be trusted and is calculated according to the equation:
\[ K_k = P^-_k C^T (C P^-_k C^T + R)^{-1} \tag{3.8} \]
From this it can be seen that when the measurement error covariance matrix R approaches zero, the Kalman gain weights the residual more heavily:
\[ \lim_{R \to 0} K_k = C^{-1} \tag{3.9} \]
Inserting this in equation 3.7:
\[ \hat{x}_k = \hat{x}^-_k + C^{-1} (y_k - C \hat{x}^-_k) = C^{-1} y_k \tag{3.10} \]
In this case the actual measurement $y_k$ is trusted more and more, and the predicted measurement $C \hat{x}^-_k$ is trusted less and less.
If the a priori estimate error covariance matrix $P^-_k$ approaches zero, the Kalman gain weights the residual less heavily:
\[ \lim_{P^-_k \to 0} K_k = \lim_{P^-_k \to 0} P^-_k C^T (C P^-_k C^T + R)^{-1} = 0 \tag{3.11} \]
Again inserting in equation 3.7:
\[ \hat{x}_k = \hat{x}^-_k + 0\,(y_k - C \hat{x}^-_k) = \hat{x}^-_k \tag{3.12} \]
When the covariance matrix $P^-_k$ approaches zero, the actual measurement is trusted less and less, while $\hat{x}^-_k$ is trusted more and more.
The last update equation is the error covariance update equation.
\[ P_k = (I - K_k C) P^-_k \tag{3.13} \]
The error covariance matrix $P_k$ reflects a statistical measure of the uncertainty, more precisely the variance of the state distribution of $x_k$ around $\hat{x}_k$.
3.3.2
The Kalman Filter Algorithm
In the previous sections the different steps of the Kalman filter were described, and in this section the steps will be combined into a recursive algorithm. As mentioned earlier, the Kalman filter is an ongoing process where the state is first estimated and then corrected using the feedback from the measurement.
Fig. 3.3 shows an expanded version of the recursive loop. In the prediction step, the a priori estimate and the a priori error covariance matrix are calculated. If the Kalman filter has just been started, an initial value for the state and the error covariance must be given.
The newly obtained a priori estimates are used in the measurement update equations. First the Kalman gain has to be calculated to weight the residual. When the Kalman gain is known, it is possible to obtain a new a posteriori state estimate. The last step of the algorithm is to obtain the a posteriori error covariance.
In a 2D visual tracking algorithm the prediction/priori estimate is used to obtain image coordinates where the object is believed to be. Then the feature extraction is used to get the measured position of the object and this value is used in the measurement update to correct the prediction.
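The recursive loop of equations 3.5 to 3.8 and 3.13 can be sketched in plain Python. The example assumes a constant-velocity state (position, velocity) with a scalar position measurement, so the matrix inverse in equation 3.8 reduces to a division. The noise values, the time step and the alternating measurement error are illustrative assumptions, not values from the experiments in this report.

```python
def mat_mul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def predict(x, P, A, Q):
    """Time update: a priori state (3.5) and error covariance (3.6)."""
    x_prior = mat_mul(A, x)
    P_prior = mat_add(mat_mul(mat_mul(A, P), transpose(A)), Q)
    return x_prior, P_prior


def correct(x_prior, P_prior, y, C, R):
    """Measurement update for a scalar measurement: (3.8), (3.7), (3.13)."""
    Ct = transpose(C)
    S = mat_add(mat_mul(mat_mul(C, P_prior), Ct), R)       # 1x1 matrix
    K = [[row[0] / S[0][0]] for row in mat_mul(P_prior, Ct)]  # Kalman gain
    residual = mat_sub(y, mat_mul(C, x_prior))
    x_post = mat_add(x_prior, mat_mul(K, residual))
    n = len(P_prior)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    P_post = mat_mul(mat_sub(I, mat_mul(K, C)), P_prior)
    return x_post, P_post


dt = 0.04                                   # assumed frame interval in seconds
A = [[1.0, dt], [0.0, 1.0]]                 # constant-velocity system matrix
C = [[1.0, 0.0]]                            # only the position is measured
Q = [[1e-6, 0.0], [0.0, 1e-6]]              # assumed system noise covariance
R = [[1e-4]]                                # assumed measurement noise covariance

x = [[0.0], [0.0]]                          # initial state guess
P = [[1.0, 0.0], [0.0, 1.0]]                # initial error covariance

for k in range(1, 51):
    true_pos = 0.5 * k * dt                 # object moving at 0.5 m/s
    noise = 0.01 if k % 2 == 0 else -0.01   # deterministic stand-in for sensor noise
    x, P = predict(x, P, A, Q)
    x, P = correct(x, P, [[true_pos + noise]], C, R)

print(x, P)
```

Although only the position is measured, the filter also estimates the velocity, because the system matrix couples the two state variables. Calling `correct` with R set to zero returns the measurement itself as the position estimate, which corresponds to the limit in which the measurement is fully trusted.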
3.3.3
Advantages and problems with Kalman filter
Figure 3.3: The Kalman Filter Algorithm. Adapted from [16]
resolution image e.g. 2048x2048. Consequently, even if the initial position is successfully localized, the tracking task cannot start because the target object may already have moved away from the initial position. Another disadvantage, that comes with the localized behavior of the Kalman filter is that it can’t automatically recover from a tracking failure, which describes the situations where the tracking process loses track of the target object.
The Kalman filter uses a unimodal Gaussian distribution to represent a single state of the system. The problems mentioned above, i.e. the initial position and tracking failure, could be solved by using multimodal Gaussian distributions where several states exist at the same time. One of the multimodal methods is Markov Localization. Markov Localization uses a probabilistic algorithm: instead of maintaining a single hypothesis as to the best estimate of the pose of an object, this technique keeps a probability density over the space of several possible positions of the object. In other words, while the Kalman filter has one prediction, Markov Localization has multiple predictions. Not only does it solve the initialization and local failure problems, but it also makes it possible to track two or more objects of the same type, e.g. multiple persons. In recent times, however, another probabilistic algorithm has gained much interest, namely the particle filter. In contrast to Markov Localization, the particle filter approximates the density by a discrete set of samples drawn from the continuous state space (see fig. 3.4 and fig. 3.5). This significantly reduces the memory used during the computation [10].
Chapter 4
Vision based 3D tracking
We assume that a tracking filter, e.g. the Kalman filter, gives a predicted position of the object in the 3D world, which can then be projected onto the image plane. The next step in the tracking procedure is to make the measurement by finding the object in the image plane and calculating its pose via stereo vision. In the following sections, different ways of finding the pose of the object are described.
4.1
Marker based tracking
Identifying an object in an image is not an easy task. However, in some cases it is possible to mark the object with a reliable and easy-to-find feature, which makes finding the object and its pose easier and more precise than relying on natural features of the object, e.g. edges. Markers, also called fiducials, have been used for many years and they come in two different forms. Point/circular fiducials give a point-wise identification, but they can also be arranged in a distinctive geometric pattern. Professional solutions use retro-reflective spherical fiducials, which are commonly used during motion capture for CGI effects [13, 12, 4]. See fig. 4.1 and 4.2.
The point marker can be extended to a planar marker with corners and a pattern inside. With a planar marker, it is possible to retrieve all six parameters of the marker pose, namely the 3D position and the rotations around the x-, y- and z-axes. Planar markers have become very popular for AR (augmented reality) applications since they are robust against partial occlusion of the marker and illumination changes, and provide a low-cost solution for real-time tracking. AR is the research field where computer-generated data (virtual reality) and the real world are combined to form a 3D interactive platform.
Most of the markers have a black border on a white background and an inner pattern to identify different types of markers if there is more than one in the environment. See fig. 4.3.
Figure 4.3: Planar markers [4]
The marker is found in the image by first thresholding the image to find its black and white regions. To find the black blobs, the algorithm looks for connected regions of black pixels. Since the marker has a rectangular contour, each black region is fitted with four line segments. These rectangular regions are corrected to remove the perspective distortion, and template matching is done against a stored marker. See fig. 4.4.
The mapping between the corner coordinates (M) of the marker and the template (m) is done by calculating the homography matrix H. The pose of the marker can be retrieved from H and the internal parameters of the camera [12]. See equation 4.1.
m = H · M (4.1)
The marker detection and the pose estimation run in real time, above 30 frames/s for a 512x512 resolution image, but this of course depends on the hardware platform, the image resolution and the number of markers. There exist many free AR libraries, such as ARToolkit, ARTag and MXRToolkit. When installed, an AR library can be used with a web camera to project a 3D computer model onto the moving marker.
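The homography of equation 4.1 can be estimated from the four corner correspondences with the direct linear transform (DLT). The sketch below is a generic NumPy illustration and does not show how any of the AR libraries named above implement it; the function names and the example matrix H_true are hypothetical.

```python
import numpy as np

def estimate_homography(M, m):
    """Estimate H with m ~ H * M (homogeneous coordinates) from >= 4 point pairs."""
    rows = []
    for (X, Y), (u, v) in zip(np.asarray(M, float), np.asarray(m, float)):
        rows.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        rows.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.array(rows))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map 2D points through H and convert back from homogeneous coordinates."""
    pts = np.asarray(pts, float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = (H @ pts_h.T).T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical example: corners of a unit square marker and an arbitrary homography.
M = [(0, 0), (1, 0), (1, 1), (0, 1)]
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.2, 0.9, 3.0],
                   [0.001, 0.002, 1.0]])
m = apply_homography(H_true, M)
H_est = estimate_homography(M, m)
```

With exactly four points the system has a one-dimensional null space, so H is determined up to scale; the normalization by H[2, 2] fixes that scale.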
4.2
Model-based tracking
Model-based tracking is a method to track an object using a 3D CAD model of the object. As step 1) of fig. 4.5 shows, a 3D model is used to simulate the object in a virtual environment. Then the edges of the model are sampled to get a set of 3D points (step 2). These points are projected onto the image plane using the projection matrix, and the real image is read from a video feed. The image edges are detected by searching along the direction perpendicular to the projected model edge at each model point. When the edges are found, the pose of the object can be calculated using stereo vision (step 3). The pose is fed into the tracking filter, e.g. the Kalman filter, and used as the measurement to correct the prediction (step 4). Then a new prediction of the pose is made and the 3D model is rendered accordingly [2, 6].
Figure 4.4: Fiducial detection algorithm [7]
However the correct edges can be confused with other edges which occur due to shadows and illumination changes.
4.3
Multi-modal visual primitives
Figure 4.5: The model-based tracking algorithm. Adapted from [6]
orientation, phase, colour and optical flow. The depth of the 2D primitives in the scene is calculated by finding the correspondences in the left and right image [1]. These extracted visual modalities represent the 3D primitive with the following representation:
Π = {Λ, Θ, Ω, (cl, cm, cr)} (4.2)
where Λ is the 3D position, Θ is the 3D orientation, Ω is the phase (contrast transition), and (cl, cm, cr) are the colours on the left side, in the middle and on the right side of the primitive.
This compact encoding means that the primitives have the further advantage of compressing the amount of information in the scene. The MoInS (Modality Integration Software) library implements the multi-modal visual primitive framework; it is written in C++ and runs on Linux.
4.4
Early marker experiments
The first experiment done in the FORK project was to test a marker algorithm developed by PhD student Lars Baunegaard With Jensen. The purpose of this experiment was to get an initial feeling for the Moins software. Moins contains not only the multi-modal visual primitive framework but also commonly used image processing functions. In the following sections the marker and the results gained from this experiment are discussed.
4.4.1
The experiment
Figure 4.6: The primitive extraction process [1], The left and right image in a) and b); The extracted 2D primitives in c), d) and e); f) The 2D primitive with 1.Orientation 2.Phase 3.Colour 4.Optic flow; g) From the left and right 2D primitives, the 3D position Λ and orientation Θ is calculated; h) The end result with 3D primitives.
4.4.2
The marker
The marker itself is shown in fig. 4.7 A) and B). The colors of the marker at the three axes are chosen to have high contrast so the axes can be distinguished. The axes of the marker are found by finding the primitives with black on one side and one of the other three colors on the other side. At least two axes are needed to calculate the pose. The marker has three axes because an axis that is horizontal in the image cannot be reconstructed in 3D due to the correspondence problem in stereo vision. When at least two axes are available, the center of the marker can be calculated. A plane is fitted to the 3D primitives and the normal of the plane is regarded as the x-axis of the marker. The 3D center of the marker is found by calculating the intersection of the plane with the line going through the 2D center in the left image. The 3D black/green primitives are chosen to be the z-axis. The missing y-axis is found as the cross product of the x- and z-axes. As the author also mentions, the biggest drawback of this marker detection is that the algorithm uses only color to identify the markers (see fig. 4.7).
The pose calculation part of the algorithm worked fine and it is suitable for use in the further development of the ring on the hook project.
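The plane fit and the cross product described for the marker can be illustrated as follows. This is only a sketch of the geometry, not the Moins marker algorithm: the function names are hypothetical, the plane is fitted by SVD (an assumed choice), and the cross product is taken in the order z × x so that the resulting frame is right-handed.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points: returns (centroid, unit normal)."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The normal is the direction of least variance of the centred points.
    _, _, Vt = np.linalg.svd(pts - centroid)
    return centroid, Vt[-1]

def marker_rotation(points, z_direction):
    """Rotation matrix whose columns are the marker's x-, y- and z-axes."""
    _, x_axis = fit_plane(points)
    # Project the measured z direction into the plane so the axes are orthogonal.
    z = np.asarray(z_direction, dtype=float)
    z = z - np.dot(z, x_axis) * x_axis
    z_axis = z / np.linalg.norm(z)
    y_axis = np.cross(z_axis, x_axis)  # right-handed frame: y = z x x
    return np.column_stack([x_axis, y_axis, z_axis])

# Hypothetical 3D primitive positions lying in the plane x = 0.
points = [(0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 1), (0, 2, 1)]
R = marker_rotation(points, z_direction=(0.0, 0.1, 1.0))
```

Note that the sign of the fitted normal is arbitrary; a real implementation would need an extra rule (for example, pointing the normal towards the camera) to resolve it.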
4.4.3
Conclusion
The marker-based tracking algorithms were discussed and different marker types were described. One of the publicly available marker algorithms can be used to track the hook as a preliminary step before designing a marker-less tracking algorithm. The model-based algorithm was proposed as the marker-less tracking algorithm.
Chapter 5
A mathematical model of the hook
and the conveyor belt
The Kalman filter needs a mathematical model of the object to be tracked. In the ring on the hook scenario, two separate systems can be clearly distinguished. The first one is the conveyor belt with a constant velocity. The belt is easy to model since the physical equations for linear motion are well known. The second one is the hook, whose physics is very similar to that of a pendulum. The dynamics of a 2D pendulum have been studied for centuries and are covered in every physics class. In our case the hook moves in 3D, which means that the pendulum system is chaotic and thus cannot be modeled so easily. However, in our scenario, the movement of the hook is limited by its suspension point on the conveyor belt (see fig. 5.1). The movement of the hook can be approximated by motion in two directions or planes, and even if the hook gets into a chaotic state, it will quickly stabilize to these two planes. The hook moves because of the movement of the conveyor belt. In this project, however, the hook and the belt are modeled separately, which is believed to be a satisfactory approximation of the system.
5.1
The motion model
As described in section 3.3, the Kalman filter requires a mathematical model represented in state space. Therefore the first task is to identify the individual state variables. The essential parameter of any 3D tracking is the position $(p^x_t, p^y_t, p^z_t)$ of the object [5]. The more precise the model is, the more precise the Kalman prediction is. So in addition to the position, the velocity $(v^x_t, v^y_t, v^z_t)$ and the acceleration $(a^x_t, a^y_t, a^z_t)$ of the object are also included. The state of the model at any given point in time can be defined by the vector:
\[ x_t = (p^x_t, p^y_t, p^z_t, v^x_t, v^y_t, v^z_t, a^x_t, a^y_t, a^z_t) \tag{5.1} \]
The mutual relation between the position, velocity and acceleration can be expressed with the equation 5.2.
\[ p_{t+1} = p_t + v_t \Delta t + \frac{a_t \Delta t^2}{2} \tag{5.2} \]
\[ v_{t+1} = v_t + a_t \Delta t \tag{5.3} \]
\[ a_{t+1} = a_t \tag{5.4} \]
Note that the equation for the velocity is the derivative of the position equation with respect to ∆t, and likewise the equation for the acceleration is the derivative of the velocity equation, in agreement with the general theory of motion.
Figure 5.1: The movement of the hook can be approximated to two directions
The model assumes that the acceleration is constant, which is also obvious from the last equation of the motion model. This has no consequence for our scenario since both the velocity and the acceleration of the belt are constant.
∆t is the time between the samples (the sampling period), and in the experiments it is assumed to be 1 second.
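The final transition matrix of the Kalman filter can be derived by collecting the constants in a matrix (see equation 5.5, written for ∆t = 1):
\[ x_{t+1} = A x_t \]
\[
\begin{bmatrix} p^x_{t+1} \\ p^y_{t+1} \\ p^z_{t+1} \\ v^x_{t+1} \\ v^y_{t+1} \\ v^z_{t+1} \\ a^x_{t+1} \\ a^y_{t+1} \\ a^z_{t+1} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 0 & \tfrac{1}{2} & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & \tfrac{1}{2} & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & \tfrac{1}{2} \\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\cdot
\begin{bmatrix} p^x_t \\ p^y_t \\ p^z_t \\ v^x_t \\ v^y_t \\ v^z_t \\ a^x_t \\ a^y_t \\ a^z_t \end{bmatrix}
\tag{5.5}
\]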
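For a general sampling period, the same matrix can be built from equations 5.2 to 5.4. The following is a small NumPy sketch; the function name and the example state (a velocity of 30 mm per step in y) are assumptions made for illustration.

```python
import numpy as np

def constant_acceleration_matrix(dt):
    """9x9 transition matrix for the state (p, v, a), each with x, y, z components."""
    block = np.array([[1.0, dt, 0.5 * dt**2],
                      [0.0, 1.0, dt],
                      [0.0, 0.0, 1.0]])
    # The Kronecker product expands each scalar entry into a 3x3 identity block,
    # which gives the ordering (px, py, pz, vx, vy, vz, ax, ay, az).
    return np.kron(block, np.eye(3))

A = constant_acceleration_matrix(1.0)   # dt = 1 reproduces equation 5.5

# Hypothetical state: at the origin, moving 30 mm per step in y, no acceleration.
x = np.array([0.0, 0.0, 0.0, 0.0, 30.0, 0.0, 0.0, 0.0, 0.0])
x_next = A @ x
```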
system can be included in the prediction equation through system noise. As described earlier, the system noise is very difficult to model, but an appropriate value can be found by running the Kalman filter with different noise covariance matrices.
5.2
The hook
As mentioned earlier, the hook is modeled as a pendulum. A pendulum is an object with a mass m attached to the end of a massless rod of length L. When the mass is moved away from the resting position and released, the nonlinear pendulum swings back and forth. See fig. 5.2.
Figure 5.2: The forces affecting the pendulum. Adapted from [8]
The mass experiences a gravitational force in the downward direction, and the rod exerts a constraining force which forces the mass to follow a circular path. If a damping force is present in the form of friction from the air or at the pivot, the amplitude of the swing will get smaller and smaller, while the period, i.e. the time of a single back and forth motion, stays nearly constant for light damping. At some point the pendulum will stop its motion.
Here a model for the pendulum is derived by describing how the angle φ varies with time [8]. First the equilibrium of forces has to be found for the pendulum. According to Newton's second law, the resulting inertia force FI is defined as mass times acceleration. The inertia force FI is balanced by an equal and opposite restoring force FR caused by gravity (see fig. 5.2):
FR= −G · sin(φ) = −mg · sin(φ) (5.6)
To obtain the inertia force FI, Newton’s second law is used again:
\[ F_I = m a_T \tag{5.7} \]
where $a_T$ is the tangential acceleration. Since the pendulum is modeled using the angle φ, $a_T$ must be expressed using the angular acceleration.
First the arc length S traveled by the pendulum from the resting position must be calculated. The circumference of a circle of radius L is:
O = 2πL (5.8)
The arc length for a given angle φ can be found as:
\[ S = \frac{\varphi}{2\pi} O = \frac{\varphi}{2\pi}(2\pi L) = L\varphi \tag{5.9} \]
The tangential acceleration $a_T$ can be found by differentiating equation 5.9 twice with respect to time (see equation 5.10).
\[ a_T = L \frac{d^2\varphi}{dt^2} \tag{5.10} \]
Then the FI becomes:
\[ F_I = m L \frac{d^2\varphi}{dt^2} \tag{5.11} \]
The final equation describing the undamped pendulum is given as follows:
\[ F_I = F_R \]
\[ m L \frac{d^2\varphi}{dt^2} = -mg \sin(\varphi) \]
\[ \frac{d^2\varphi}{dt^2} + \frac{g}{L}\sin(\varphi) = 0 \tag{5.12} \]
The last force to be added to the pendulum model is the friction. The damping force is proportional to the velocity, so the equation for the damped pendulum is given as:
\[ \frac{d^2\varphi}{dt^2} + D\frac{d\varphi}{dt} + \frac{g}{L}\sin(\varphi) = 0 \tag{5.13} \]
The D is the damping constant and is dependent on the friction at the pivot point and air friction.
The pendulum model described above only holds for a point mass, where the center of gravity (C.G.) is the middle of the point mass (see fig. 5.2). The hook will have a distributed mass, so the model must be altered by finding the C.G. of the hook and an effective length L, which in general differs from the distance between the center of mass and the pivot point. This will be examined in the master project when the hook is available. A lot of valuable properties of the pendulum model can be learned from equation 5.13. The model is described by a second-order differential equation, so a unique solution can only be found when the angle and the velocity are known at a given time. This suggests that the state variables, which describe the state of the pendulum model, must be the angle and the velocity.
xt= (φt, vt) (5.14)
Before the differential equation can be discretized, it has to be linearized because the Kalman filter only works for linear models. The nonlinear term of the equation is $\frac{g}{L}\sin(\varphi)$. For small angles the nonlinear term can be approximated using $\sin(\varphi) \approx \varphi$.
The linearized version of equation 5.13:
\[ \frac{d^2\varphi}{dt^2} + D\frac{d\varphi}{dt} + \frac{g}{L}\varphi = 0 \tag{5.15} \]
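How good the small-angle approximation behind equation 5.15 is can be checked numerically. The snippet below is a simple illustration with arbitrarily chosen test angles; the function name is hypothetical.

```python
import math

def small_angle_error(phi):
    """Relative error made by replacing sin(phi) with phi (phi in radians)."""
    return abs(phi - math.sin(phi)) / abs(math.sin(phi))

for phi in (0.1, 0.5, 1.0):
    print(f"phi = {phi:.1f} rad: relative error = {small_angle_error(phi):.2%}")
```

The error stays well below one percent at 0.1 rad but grows to almost a fifth at 1.0 rad, so the linearized model should be expected to deviate noticeably from the nonlinear pendulum for large swings.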
5.2.1
Discretizing the pendulum model
The continuous linear differential equation for the pendulum was derived in the last section. To use the model in the Kalman filter, one needs the discrete version. The state variables of the state space model were also chosen to be the angle and the velocity in the last section. The discretization can be derived from equation 5.12. First FR and FI have to be rewritten.
FR is linearized as described in the last section:
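\[ F_R = -mg \sin(\varphi_t) \]
\[ F_R = -mg\,\varphi_t \tag{5.16} \]
FI is discretized as follows:
\[ S_t = L\varphi_t \]
\[ v_{t+1} = \frac{S_{t+1} - S_t}{\Delta t} \]
\[ a_{t+1} = \frac{v_{t+1} - v_t}{\Delta t} \]
\[ F_I = m a_{t+1} \]
\[ F_I = m\frac{v_{t+1} - v_t}{\Delta t} \tag{5.17} \]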
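The resulting equation for the velocity is given as equation 5.18. Note that the damping force $F_D$ is also added.
\[ F_I + F_D = F_R \]
\[ m\frac{v_{t+1} - v_t}{\Delta t} + D v_t = -mg\,\varphi_t \]
\[ v_{t+1} = v_t - \frac{D}{m} v_t \Delta t - g\varphi_t \Delta t \]
\[ v_{t+1} = \left(1 - \frac{D}{m}\Delta t\right) v_t - g\Delta t\,\varphi_t \tag{5.18} \]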
This equation tells that, the velocity at the next time step can be found by using the current velocity and angle.
Next the last state variable- namely the angle, has to be predicted. Again the angle can be derived by using one of the previous equations.
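\[ S_t = L\varphi_t \]
\[ v_{t+1} = \frac{S_{t+1} - S_t}{\Delta t} \]
\[ v_{t+1} = \frac{L\varphi_{t+1} - L\varphi_t}{\Delta t} \]
\[ \varphi_{t+1} = \varphi_t + v_{t+1}\frac{\Delta t}{L} \]
\[ \varphi_{t+1} = \varphi_t + \left(\left(1 - \frac{D}{m}\Delta t\right)v_t - g\Delta t\,\varphi_t\right)\frac{\Delta t}{L} \]
\[ \varphi_{t+1} = \left(\left(1 - \frac{D}{m}\Delta t\right)\frac{\Delta t}{L}\right)v_t + \left(1 - g\frac{\Delta t^2}{L}\right)\varphi_t \tag{5.19} \]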
The angle at the next time step can be found using the velocity and angle. Now the transition matrix can be found using the equation 5.18 and equation 5.19:
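\[ x_{t+1} = A x_t \tag{5.20} \]
\[
\begin{bmatrix} \varphi_{t+1} \\ v_{t+1} \end{bmatrix}
=
\begin{bmatrix}
1 - \frac{g\Delta t^2}{L} & \left(1 - \frac{D}{m}\Delta t\right)\frac{\Delta t}{L} \\
-g\Delta t & 1 - \frac{D}{m}\Delta t
\end{bmatrix}
\cdot
\begin{bmatrix} \varphi_t \\ v_t \end{bmatrix}
\tag{5.21}
\]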
5.2.2
Simulation of the pendulum
The pendulum model was simulated with damping and without damping.
\[ g = 9.81\,\mathrm{m/s^2}, \quad \Delta t = 0.05\,\mathrm{s}, \quad M = 1.0\,\mathrm{kg}, \quad D = 0, \quad L = 1\,\mathrm{m} \]
Initial position: φ = 1.0 radian
This is a pendulum with a mass of 1 kg and a rod length of 1 m. The sampling frequency is 1/0.05 = 20 Hz. The damping coefficient D is set to 0.
When damping is not present, the period can be calculated with this equation:
\[ T = 2\pi\sqrt{\frac{L}{g}} = 2\pi\sqrt{\frac{1\,\mathrm{m}}{9.81\,\mathrm{m/s^2}}} \approx 2\,\mathrm{s} \tag{5.22} \]
From fig. 5.3 it can be seen that the period is indeed about 2 s and, as expected, the pendulum does not lose energy since no friction is present.
The fig. 5.4 shows the velocity. It can be clearly seen that the velocity is maximum in the resting position of the pendulum and is zero in the extremum positions.
The damped pendulum has the same parameters, except that the damping coefficient D is set to 0.5.
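A simulation of this kind can be reproduced by iterating the transition matrix of equation 5.21 with the parameters listed above. The sketch below is not the original simulation code: the function names, the 20 s duration and the zero-crossing period estimate are assumptions made for this illustration.

```python
import numpy as np

def pendulum_matrix(g, L, m, D, dt):
    """Transition matrix of equation 5.21 for the state (phi, v)."""
    damp = 1.0 - (D / m) * dt
    return np.array([[1.0 - g * dt**2 / L, damp * dt / L],
                     [-g * dt,             damp]])

def simulate(A, phi0, steps):
    """Iterate x_{t+1} = A x_t from (phi0, 0); returns an array of shape (steps + 1, 2)."""
    x = np.array([phi0, 0.0])
    history = [x]
    for _ in range(steps):
        x = A @ x
        history.append(x)
    return np.array(history)

def estimate_period(phi, dt):
    """Average time between successive downward zero crossings of phi."""
    idx = np.where((phi[:-1] > 0) & (phi[1:] <= 0))[0]
    # Linear interpolation of the crossing instant between samples idx and idx + 1.
    t_cross = (idx + phi[idx] / (phi[idx] - phi[idx + 1])) * dt
    return float(np.mean(np.diff(t_cross)))

g, L, m, dt = 9.81, 1.0, 1.0, 0.05
A_undamped = pendulum_matrix(g, L, m, D=0.0, dt=dt)
A_damped = pendulum_matrix(g, L, m, D=0.5, dt=dt)

undamped = simulate(A_undamped, phi0=1.0, steps=400)   # 20 s of simulated time
damped = simulate(A_damped, phi0=1.0, steps=400)
period = estimate_period(undamped[:, 0], dt)
```

For D = 0 the determinant of the matrix is exactly 1, which is why the discrete undamped model neither gains nor loses amplitude over time; for D = 0.5 the determinant drops below 1 and the oscillation decays.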
Figure 5.3: x (red) and y (blue) coordinates of the undamped pendulum
In the master project, experiments have to be made to verify the model. One could fit the hook on a robot and record images or image sequences while the robot is moving. Since the exact robot position is known, the tracking precision of the hook can be found.
5.3
Conclusion
Figure 5.4: Velocity of the undamped pendulum
Chapter 6
Experiments
Different experiments were done to verify the theory described in the earlier sections. The following tests have been chosen, as they are vital preliminary investigations of the ring on the hook problem.
• Robot-camera calibration
• Stereo vs. single camera
• Tracking with Kalman filter
• Simulating in Robowork
In the first test, the transformation matrix from the robot coordinate system to the camera coordinate system is found. During visual servoing the pose of the hook is found in the camera coordinate system, so one needs to transform these Cartesian camera coordinates to robot coordinates in order to control the robot. Another advantage of knowing the transformation is being able to verify the precision of the found hook/marker pose, since the movement of the robot is known with millimeter precision from the robot controller. The second and third tests compare the pose of a marker obtained from a single camera and from stereo vision in different image sequences, and then apply the Kalman filter to track the marker in these sequences. The last experiment is to draw a CAD model of the whole hook/conveyor scene and then simulate it by approximating the hook as a pendulum.
6.1
Robot-camera calibration
The transformation matrix from robot- to camera coordinate system, is found using one of the programs in Moins. The program needs a set of 3D points in the camera coordinate system and the same 3D points in the robot coordinate system. Then the rigid body motion is found between these points and a transformation matrix is returned. Eight points were chosen so they formed the corners of a cube. A marker, which was designed as a cross, was attached on the top of the robot tool (see fig.6.1). The eight points in the camera coordinate system can be found using the cross and the same points in robot coordinate system can be obtained by making a translation to the top of the robot tool from flange points read from the robot controller. With the transformation matrix, an error value describing the precision of the transformation matrix was also returned. The lowest error value obtained so far by the Covig group is about 4, and the error value for this case is 2, which is considered to be a satisfying result.
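The rigid body motion between two corresponding point sets can be computed in closed form with an SVD (often called the Kabsch method). The sketch below illustrates that standard technique; it is an assumption that the Moins program works in a similar way, and the RMS residual used here is not necessarily the error value that Moins reports. All names and the test transformation are hypothetical.

```python
import numpy as np

def rigid_transform(P_robot, P_camera):
    """Least-squares R, t such that P_camera is approximately R @ P_robot + t."""
    P = np.asarray(P_robot, dtype=float)
    Q = np.asarray(P_camera, dtype=float)
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    Hc = (P - cP).T @ (Q - cQ)                  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(Hc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

def rms_error(R, t, P_robot, P_camera):
    """Root-mean-square distance between transformed robot points and camera points."""
    residual = (np.asarray(P_robot, float) @ R.T + t) - np.asarray(P_camera, float)
    return float(np.sqrt(np.mean(np.sum(residual**2, axis=1))))

# Hypothetical data: eight cube corners (mm) and a known test transformation.
cube = np.array([[x, y, z] for x in (0, 100) for y in (0, 100) for z in (0, 100)], float)
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([-400.0, -200.0, 1100.0])
cube_camera = cube @ R_true.T + t_true

R_est, t_est = rigid_transform(cube, cube_camera)
err = rms_error(R_est, t_est, cube, cube_camera)
```

With measured points the residual is no longer zero, and it gives a direct indication of how consistent the two point sets are.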
The robot-camera transformation matrix:
Figure 6.1: The robot-camera calibration, a) one of the eight points of the cubic movement; b) The marker used for the calibration. The red and blue artificial crosses are drawn by the program. The blue cross indicated the center of the marker which was found manually. The corresponding position of the tool in robot coordinate system is transformed to camera coordinate system using the calculated transformation matrix and projected to the image plane afterwards. This point is shown with a red cross. As it can be seen, the red and blue crosses are very close to each other.
\[
T^C_R =
\begin{bmatrix}
0.951261 & 0.308166 & -0.0116405 & -441.051 \\
0.21047 & -0.676353 & -0.705868 & -226.211 \\
-0.225397 & 0.669015 & -0.708248 & 1118.58
\end{bmatrix}
\tag{6.1}
\]
6.2
Stereo vs. single camera
The idea of this experiment is to show that the depth derived from stereo vision is more precise than that obtained from a single camera. At the same time, the precision of the stereo vision is examined. A marker (see fig. 6.2) is fixed on a robot and a stepwise movement is made.
The AR library MXRToolkit is used to obtain the pose of the marker using a single view. The general mode of operation of an AR application has already been described in section 4.1. The current version of the MXRToolkit works with Visual Studio C++ under Windows and has a detailed user guide. The MXRToolkit returns the rotation matrix and the 3D coordinate of the middle of the marker.
The pose of the marker using stereo is obtained using the Moins library. From MXRToolkit it is possible to get eight corner points of the marker see fig. 6.2. When the corner points are known in the right and left image, one can reconstruct the eight coplanar points in 3D. The plane fitting algorithm described in section. 4.4 is used to fit a plane to the eight points and from the plane one can calculate the rotation of the marker.
Figure 6.2: The marker used in the experiments. The eight corner points (red circles) returned by the MXRToolkit.
6.2.1
Hardware setup
The robot used in the experiments is a six axes Staubli RX60 robot. The robot can be controlled using a client-server program, where the server runs on the robot controller and the client is on a PC. The robot is moved by changing the tool position of the robot.
A high-resolution camera pair arranged in a stereo configuration is used to record the pictures in 2048x2048 resolution. The cameras are calibrated, so the external and internal parameters are available. Furthermore, the images are undistorted (and rectified) after recording (see fig. 6.3).
Figure 6.3: The high resolution camera
separated, therefore a socket communication was programmed between the programs so the image capturing and the robot controlling could be automated.
6.2.2
Test
In this test the marker is moved stepwise towards the camera. The distance between the steps was set to 29.34 mm and the rotation of the marker was kept constant. The movement was done only in the y-direction of the robot coordinate system. In total 21 images were recorded (see fig. 6.4 and fig. 6.5). First the rotation is calculated in the camera coordinate system as described above and compared with the reference rotation obtained from the robot controller. The rotation in the robot coordinate system is transformed to the camera coordinate system using the robot-camera transformation matrix.
The 3x3 rotation matrices calculated using stereo vision and single view (MXRToolkit) were transformed to quaternions [12]. A unit quaternion is a way to represent rotations in 3D space and is often used to avoid the gimbal lock problem. The quaternion can be expressed as a + bi + cj + dk or as a scalar plus a vector:
\[ q = \left(a, \begin{bmatrix} b \\ c \\ d \end{bmatrix}\right) \tag{6.2} \]
The four parameters of the quaternion do not give an intuitive understanding of the rotation, contrary to the Euler angles. However, it is possible to express the quaternion as a rotation about the unit vector $\vec{\omega}$ by an angle θ [12]:
\[ q = \left(\cos\left(\tfrac{1}{2}\theta\right), \vec{\omega}\sin\left(\tfrac{1}{2}\theta\right)\right) \tag{6.3} \]
Figure 6.4: The first image of the sequence
Figure 6.5: The last image of the sequence
Figure 6.6: Rotation obtained by using stereo, plotted for every image in the sequence. The red plot is the reference rotation from the robot. a) The x-coordinate of $\vec{\omega}$; b) The y-coordinate of $\vec{\omega}$; c) The z-coordinate of $\vec{\omega}$; d) The angle θ
Figure 6.7: Rotation obtained by using single view, plotted for every image in the sequence. The red plot is the reference rotation from the robot. a) The x-coordinate of $\vec{\omega}$; b) The y-coordinate of $\vec{\omega}$; c) The z-coordinate of $\vec{\omega}$; d) The angle θ
as the movement will only be in the y-axis of the robot coordinate system. Moreover the distance between the steps is calculated. The distance gives a measure of the uncertainty without involving the error in the robot-camera calibration and the stability of the stereo and single view method can be directly compared. All position measurements are in millimeters. (see fig. 6.8 and fig. 6.9)
To get an overview of the plots, the mean and the standard deviation of the error were found. The error is the absolute difference between the reference value and the calculated value. If the calculated and the reference values are close to each other, the mean and the standard deviation of the error will be close to zero. The standard deviation measures the spread of the data about the mean, which means that it can be used to analyze the stability of a method. This procedure is applied to the rotation and position plots separately (see table 6.1 and 6.2).
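The error statistics described above amount to a few NumPy calls. The sketch below uses made-up numbers purely for illustration, and it assumes the population standard deviation (NumPy's default); the report does not state which variant was used.

```python
import numpy as np

def error_statistics(reference, calculated):
    """Mean and standard deviation of the absolute error, per column."""
    err = np.abs(np.asarray(calculated, float) - np.asarray(reference, float))
    return err.mean(axis=0), err.std(axis=0)

# Hypothetical values, not taken from the experiments.
reference = np.array([10.0, 20.0, 30.0])
calculated = np.array([11.0, 19.0, 32.0])
mean_err, std_err = error_statistics(reference, calculated)
```

A constant offset between the two series raises the mean error but leaves the standard deviation small, which is why both numbers are reported.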
6.2.3
Evaluation of the results
Figure 6.8: The 3D position obtained by using stereo is plotted for every image in the sequence. The red plot is the reference position from the robot. a) The x-coordinate of position; b) The y-coordinate of position; c) The z-coordinate of the position; d) The distance between the steps.
Rotation calculated in camera CS               Rot. using stereo    Rot. using single view
Mean of x error                                0.038                0.050
Standard deviation of x error                  0.020                0.061
Mean of y error                                0.020                0.033
Standard deviation of y error                  0.013                0.034
Mean of z error                                0.132                0.069
Standard deviation of z error                  0.007                0.003
Mean of angle error                            0.557                0.611
Standard deviation of angle error              0.289                0.674
Mean of mean error (x, y, z and angle)         0.187                0.191
Mean of standard dev. error (x, y, z, angle)   0.082                0.193
Table 6.1: The mean and standard deviation of the error for the rotation obtained by using stereo and the single view method. The error for the x, y, z coordinates of $\vec{\omega}$ and the angle θ is shown. The last two rows give the mean and standard deviation of the error for all four parameters together. CS stands for coordinate system.
acquired from single view.
Figure 6.9: The 3D position obtained by using single view is plotted for every image in the sequence. The red plot is the reference position from the robot. a) The x-coordinate of position; b) The y-coordinate of position; c) The z-coordinate of the position; d) The distance between the steps.
Position calculated in robot CS (mm)     Pos. using stereo    Pos. using single view
Mean of x error                          1.164                6.528
Standard deviation of x error            0.186                0.277
Mean of y error                          3.194                63.692
Standard deviation of y error            1.302                16.321
Mean of z error                          2.752                32.110
Standard deviation of z error            0.867                0.918
Mean of mean error (x, y, z)             2.370                34.110
Mean of standard dev. error (x, y, z)    0.785                5.838
Mean of distance error                   0.637                2.693
Standard deviation of distance error     0.567                1.056
Table 6.2: The mean and standard deviation of the error for the position obtained by using stereo and the single view method. The error for the x, y, z coordinates of the position and for the distance is shown. The seventh and eighth rows give the mean and standard deviation of the error for the x, y, z coordinates together.
Figure 6.10: A 3D plot of the movement in the robot coordinate system. The positions are calculated using stereo.
seen on the y- direction (3. row) since the greatest displacement is in the y-direction.
The position acquired from single view is far from accurate compared to stereo. It seems that the x, y and z coordinates are subject to a constant displacement, which is clearly visible in the plots and in the large mean error of 34.11 mm. However, the positions are relatively stable, with a standard deviation of 5.838 mm. This constant displacement does not affect the distance between the steps, which has a mean error of 2.693 mm and a standard deviation of 1.056 mm. The distance error for stereo is 0.637 mm for the mean and 0.567 mm for the standard deviation, which is better than for the position calculated from single view. An interesting point seen from the analysis of the rotation and the position is that, for single view, the rotation is far more accurate than the position. This is again due to the fact that there is a constant displacement in every axis, and the rotation is unaffected by this displacement.
It can be concluded from above analysis that the stereo vision is more accurate and stable than the single view method.
6.2.4
Sub conclusion
The analysis of the rotation and 3D position obtained from the stereo and single view methods shows that stereo vision is indeed the best choice for accurate depth perception. The decision to use stereo vision in the ring on the hook project is therefore well justified.
6.3
Tracking with Kalman filter
From the last test, the pose of the marker in the sequence is available. Since the mathematical model for a continuous motion is already known (see section 5.1), the Kalman filter can be applied to track the marker in the image sequence.
• The precision of the predicted value compared with the robot value
• The convergence of the Kalman filter
• The influence of different noise values on the Kalman filter
A brief description of the three cases:
The most important property one expects from the Kalman filter in a tracking application is that it predicts the next position of the object precisely enough that the object is inside the rectangular tracking area. Since the precise position of the marker is known through the robot position, one can evaluate the precision of the predicted value.
In the beginning of the tracking process, the Kalman gain is a predefined value. In every subsequent iteration, the Kalman filter adapts to the dynamics of the system being tracked and adjusts the Kalman gain accordingly. As described earlier (see section 3.3.3), the initialization of the Kalman filter is one of the critical moments when starting a tracking procedure. Consequently, the faster the Kalman filter converges, the greater the chance of successful tracking.
Noise modeling is a very difficult task, as described earlier, since it depends on the nature of the application and there are often several noise sources. The literature on this subject is also sparse. In most cases the authors just mention that the noise is Gaussian and do not describe how to model it. In this project a noise model was not developed. However, an appropriate noise value can be found by trying out different noise values and then choosing the one which gives the best result.
6.3.1
The precision of the prediction
This test is used to analyze the precision of the Kalman filter predictions. Fig. 6.11 shows the plots for all nine parameters of the Kalman filter. The x, y and z coordinates are close to the reference value. The plots for the velocities show that the Kalman filter correctly predicts the velocity in the y-direction to be about 30 mm per step. It is clearly seen in the plots for the accelerations that they are close to zero, as expected. If the velocity and acceleration in the x- and z-directions are compared, one can see that both are more stable in the x-direction than in the z-direction. This is due to the fact that the x-coordinate of the position (a) of fig. 6.11) is stable throughout the whole sequence while the z-coordinate is decreasing. This decrease is seen as fluctuations in the velocity and acceleration predictions for the z-direction. Table 6.3 shows the mean and standard deviation of the position error from stereo and from the Kalman filter. The values from both cases are very close to each other. The prediction has a mean error of 2.5 mm, which is very close to zero. The result has a satisfactory precision, especially considering that the prediction is a qualified guess using the system model and the last measurement.
Kalman filter                                                         Pos. using stereo    Pos. predicted by the Kalman filter
Mean of mean error (x, y, z)                                          2.370                2.499
Mean of standard dev. error (x, y, z)                                 0.785                1.243
Mean of mean error (x, y, z, vx, vy, vz, ax, ay, az)                  -                    1.117
Mean of standard deviation error (x, y, z, vx, vy, vz, ax, ay, az)    -                    1.014