
CHAPTER 3. METHODOLOGY

3.3 Adaptive Particle Filter using Optical Flow based Sampling (APF-OFS) tracking

3.3.1 Object tracking

3.3.1.3 Re-sampling and updating

Among all N samples existing in each frame, the $N_s$ samples with the highest probabilities (weights) are selected. Thus, the current sample set $S_t$ is determined from $N_s$ samples centered on $c_t^i$ with probability $\pi_t^i$ at time t,

$$S_t = \left\{ \left( c_t^i,\ h_t^i,\ \mathit{wid}_t^i,\ \pi_t^i \right) \right\}_{i=1}^{N_s} \qquad \text{(3-17)}$$

where $c_t^i$ is the coordinates of the ith sample, with height $h_t^i$ and width $\mathit{wid}_t^i$ at time t, and $\pi_t^i$ is the weight (score) of the ith sample. The sample set $S_t$ is an approximation of the posterior distribution of the target state at time t.
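The selection of the highest-weight samples can be illustrated with a minimal Python sketch. The `Sample` class, its field names, and the function `select_top_samples` are hypothetical names introduced here for illustration only; they are not taken from the implementation described in this work.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One element of the sample set (hypothetical container, see eq. 3-17)."""
    cx: float      # x-coordinate of the sample centre
    cy: float      # y-coordinate of the sample centre
    h: float       # ellipse height
    wid: float     # ellipse width
    weight: float  # score of the sample


def select_top_samples(samples, n_s):
    """Return the n_s samples with the largest weights, best first."""
    return sorted(samples, key=lambda s: s.weight, reverse=True)[:n_s]


if __name__ == "__main__":
    candidates = [
        Sample(10, 20, 80, 40, 0.10),
        Sample(12, 22, 82, 41, 0.70),
        Sample(15, 18, 78, 39, 0.20),
    ]
    for s in select_top_samples(candidates, n_s=2):
        print(s)
```

Sorting is used here only for brevity; a partial selection would be enough when N is large.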

The particle filter state does not change significantly between two consecutive frames if the camera does not move. Here, the change consists of a translation of the sample coordinates around their previous position and a scaling of the previous sample size. No rotation is applied in our application, because people walk upright.

At each time t, the motion of the target is assumed to follow a first-order auto-regressive dynamical model given by:

$$P_t = P_{t-1} + \omega_t \qquad \text{(3-18)}$$

$P_t$ and $P_{t-1}$ are the particle filter states at time t and t-1 respectively. $\omega_t$ is a multivariate Gaussian random variable; it corresponds to the random translation of the sample center coordinates and the scaling of the previous sample size.

Thus, in the resampling step, N samples are generated by a Gaussian random function in a circular region of radius rg (e.g. 1/5th of the image height) around the centroid of the ROI, using the motion assumption of equation 3-18. The size of these samples (width and height of the ellipses) is varied randomly by ss (e.g. 5%) of the previous target size, to account for the target approaching or moving away from the camera. This choice is based on experiments and on the normal speed of people walking toward or away from the camera. This value is changed proportionally to the camera zooming parameter (e.g. two times zooming results in 25% scaling changes). In short, our particle filter uses scaling from the previous target size and translation around a pixel of interest. The scaling and translation values are obtained experimentally and are explained in the results section.
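The propagation step described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: the function name `propagate`, the choice of a standard deviation of rg/3, the rejection of draws falling outside the circle, the uniform distribution of the scale change within ±ss, and the use of one common scale factor for height and width are all assumptions made here.

```python
import math
import random


def propagate(cx, cy, h, wid, n, rg, ss, rng=random):
    """Draw n samples around (cx, cy), i.e. previous state plus random noise.

    Assumptions (not from the source): sigma = rg / 3, draws outside the
    circle of radius rg are rejected, and the scale change is uniform in
    [-ss, +ss] and shared by height and width.
    """
    samples = []
    while len(samples) < n:
        dx = rng.gauss(0.0, rg / 3.0)
        dy = rng.gauss(0.0, rg / 3.0)
        if math.hypot(dx, dy) > rg:
            continue  # keep only translations inside the circular region
        scale = 1.0 + rng.uniform(-ss, ss)
        samples.append((cx + dx, cy + dy, h * scale, wid * scale))
    return samples


if __name__ == "__main__":
    # 640 x 480 image: rg = 480 / 5 = 96 pixels, ss = 5 %
    rng = random.Random(0)
    for s in propagate(320, 240, 100, 50, n=5, rg=96, ss=0.05, rng=rng):
        print(s)
```

With a zooming camera, ss would simply be passed a larger value, in line with the proportional adjustment mentioned above.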

In our APF-OFS, the current sample set $S_t$ is composed of two sample sets, $S_t^{target}$ and $S_t^{motion}$:

$$S_t = S_t^{target} \cup S_t^{motion} \qquad \text{(3-19)}$$

$S_t^{target}$ is the part of the current sample set derived from the previous sample set at time t-1, and $S_t^{motion}$ is the part composed of moving areas extracted by optical flow. Using only the previous sample set is not appropriate in our application, since the camera moves and the position of the previous state changes. Indeed, in this work we do not make any 3D position estimation of the scene, so we do not know how the samples should be translated or scaled when the camera is moving or zooming. Because we do not have the 3D position of the samples, and the camera motion vector is itself only a 2D approximation, we cannot accurately determine how the sample positions transform from one frame to the next when the camera displacement is large. We therefore adjusted the particle filter to our application. From the previous sample set, we take only the target sample with the highest probability (e.g. $N_s = 1$) if the camera does not move. But if the camera moves, since the camera is supposed to center on the target, we take only the image center position with the last target sample size. During camera movement, the image center is used to generate N samples around it for the current sample set.

Therefore, we sample the image with ellipses around two types of ROI and model them:

1. Area around the previous target position coordinates or around the image center,

2. Moving areas extracted by optical flow.

Figure 3.10 Target sample in two consecutive frames (a) before camera movement (b) after camera movement.

Thus, re-sampling is done based on two types of samples: samples based on the previous target position and motion-based samples. While the camera is moving, the sampling process is done around the image center and around the moving areas extracted by optical flow. During camera movement, the image center serves as a prediction of the target position, since ideally the target should always be at the image center.
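The choice of the region around which target-based samples are drawn, and the combination of the two sample types, can be sketched as below. The function names `target_roi` and `current_sample_set`, the tuple layout, and the boolean `camera_moving` flag are hypothetical and serve only to illustrate the rule stated above.

```python
def target_roi(prev_target, image_size, camera_moving):
    """Return (cx, cy, h, wid) around which target-based samples are drawn.

    prev_target is the best previous sample as (cx, cy, h, wid);
    image_size is (width, height) in pixels.
    """
    cx, cy, h, wid = prev_target
    if camera_moving:
        img_w, img_h = image_size
        # The camera is assumed to centre on the target, so the image
        # centre replaces the previous position; the size is kept.
        return (img_w / 2.0, img_h / 2.0, h, wid)
    return (cx, cy, h, wid)


def current_sample_set(target_samples, motion_samples):
    """Combine target-based and motion-based samples into one set."""
    return list(target_samples) + list(motion_samples)


if __name__ == "__main__":
    prev = (100, 120, 80, 40)
    print(target_roi(prev, (640, 480), camera_moving=False))
    print(target_roi(prev, (640, 480), camera_moving=True))
```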

Figure 3.10 shows two consecutive frames during camera movement. If we still performed re-sampling around the previous target position instead of the image center, the samples would be drawn in the wrong place. In the following, the re-sampling process for the motion-based samples is explained.

The second type of samples, $S_t^{motion}$, is detected by estimating the motion of the target from two consecutive images $I_t$ and $I_{t+1}$, using pyramidal Lucas-Kanade optical flow (Bouguet, 2000). In optical flow (Shi & Tomasi, 1994), strong corners in the image, which have large eigenvalues, are detected for comparison. To solve the pixel correspondence problem for a given pixel in $I_t$, we look for nearby pixels of the same color in $I_{t+1}$. The basic concept in optical flow is to define the motion vector d for two consecutive images $I_t$ and $I_{t+1}$ by minimizing the following residual function ε:

$$\varepsilon(\mathbf{d}) = \varepsilon(d_x, d_y) = \sum_{x=u_x-w_x}^{u_x+w_x} \ \sum_{y=u_y-w_y}^{u_y+w_y} \left( I_t(x, y) - I_{t+1}(x + d_x,\ y + d_y) \right)^2 \qquad \text{(3-20)}$$

where $(u_x, u_y)$ is the pixel being tracked and $(2w_x + 1) \times (2w_y + 1)$ is the size of the integration window over which ε is evaluated. Usually $w_x$ and $w_y$ are equal to 2, 3, ..., 7 pixels (here we select 3, as recommended in (Bouguet, 2000)). For the pyramidal representation of the images, this residual function is minimized at each level. We use 4 pyramid levels with 10 iterations. The stopping threshold for the iterative minimization of the residual function is 0.3. Optical flow extracts motion-based pixels with their related motion vectors, which result from camera movement or object movement.
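To make the residual function concrete, the sketch below evaluates it for integer displacements on images stored as nested lists, and finds the displacement with the smallest residual by exhaustive search. This is not the pyramidal Lucas-Kanade algorithm, which minimizes the same quantity iteratively with sub-pixel accuracy across pyramid levels; the brute-force search, the function names, and the `search` parameter are assumptions made purely for illustration.

```python
def residual(img_t, img_t1, u, d, w=3):
    """Sum of squared differences over a (2w+1) x (2w+1) window.

    img_t, img_t1: images as lists of rows (indexed [y][x]).
    u: (ux, uy) pixel in img_t; d: (dx, dy) integer displacement.
    """
    ux, uy = u
    dx, dy = d
    total = 0.0
    for y in range(uy - w, uy + w + 1):
        for x in range(ux - w, ux + w + 1):
            diff = img_t[y][x] - img_t1[y + dy][x + dx]
            total += diff * diff
    return total


def best_displacement(img_t, img_t1, u, search=2, w=3):
    """Exhaustively test integer displacements in [-search, search]^2."""
    candidates = [(dx, dy)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    return min(candidates, key=lambda d: residual(img_t, img_t1, u, d, w))


if __name__ == "__main__":
    # Synthetic intensity pattern, shifted by (1, 2) pixels in the second image.
    def f(x, y):
        return x * x + 3 * y * y + x * y

    img_a = [[f(x, y) for x in range(20)] for y in range(20)]
    img_b = [[f(x - 1, y - 2) for x in range(20)] for y in range(20)]
    print(best_displacement(img_a, img_b, u=(10, 10)))
```

The caller must keep the window and the search range inside the image bounds; no border handling is included in this sketch.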

As found experimentally, the detected motion vectors are noisy. In addition, the camera motion vectors affect the object motion vectors. Thus, to remove this effect, the camera motion vectors are extracted. To calculate them, a radial histogram of the motion vectors is computed. In a radial histogram, each bin is based on the quantized length (r) and angle (θ) of the motion vector. Our radial histogram h(r, θ) has 36180 bins (201 × 180). r has 201 values and varies, depending on the image size, between 0 and the image diameter (e.g. for an image size of 640 × 480, the radial bin number is N_r = 201 and the radius is quantized by a factor of 4 pixels). θ has 180 values and varies between 0° and 360° (e.g. the angular bin number is N_θ = 180 and the angle is quantized by a factor of 2°). These values are obtained experimentally and are explained in the results section. The r and θ of the bin that contains the maximum number of vectors are taken as the camera's motion vector length and angle. The detected motion vectors that have this length and angle are removed, and the camera motion vector is then subtracted from the remaining motion vectors using the estimated bin values. The bin value is the lower limit of the quantized range; for example, for r between 0 and 3 the bin value is 0. Motion vectors are then grouped according to their distances from each other and their lengths and orientations (Chung et al., 2005). Motion-based samples are extracted around the groups of object motion vectors.
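The camera motion estimation described above can be sketched with a sparse histogram. The function names `bin_of` and `remove_camera_motion`, the representation of motion vectors as (vx, vy) tuples, and the use of a `Counter` instead of a dense 201 × 180 array are assumptions made for this illustration; the quantization steps of 4 pixels and 2° follow the example values given in the text.

```python
import math
from collections import Counter


def bin_of(vx, vy, r_step=4, theta_step=2):
    """Quantize a motion vector to the lower limits of its (r, theta) bin."""
    r = math.hypot(vx, vy)
    theta = math.degrees(math.atan2(vy, vx)) % 360.0
    r_bin = int(r // r_step) * r_step
    theta_bin = (int(theta // theta_step) * theta_step) % 360
    return (r_bin, theta_bin)


def remove_camera_motion(vectors, r_step=4, theta_step=2):
    """Estimate the camera motion as the fullest bin and compensate for it.

    Returns ((cam_r, cam_theta), object_vectors). Vectors falling in the
    camera bin are dropped; the camera vector is subtracted from the rest.
    """
    bins = [bin_of(vx, vy, r_step, theta_step) for vx, vy in vectors]
    camera_bin, _ = Counter(bins).most_common(1)[0]
    cam_r, cam_theta = camera_bin
    cam_vx = cam_r * math.cos(math.radians(cam_theta))
    cam_vy = cam_r * math.sin(math.radians(cam_theta))
    object_vectors = [(vx - cam_vx, vy - cam_vy)
                      for (vx, vy), b in zip(vectors, bins)
                      if b != camera_bin]
    return camera_bin, object_vectors


if __name__ == "__main__":
    flow = [(8.5, 0.1)] * 5 + [(0.0, 12.0)]
    print(remove_camera_motion(flow))
```

Because the lower limit of each bin is used as the bin value, the subtracted camera vector is only accurate to within one quantization step in length and angle.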

In document UNIVERSITÉ DE MONTRÉAL LOW AND VARIABLE FRAME RATE FACE TRACKING USING AN IP PTZ CAMERA PARISA DARVISH ZADEH VARCHEIE (Page 83-87)