Position prediction - Adaptive Particle Filter using Optical Flow based Sampling (APF-OFS) trac

CHAPTER 3. METHODOLOGY

3.3 Adaptive Particle Filter using Optical Flow based Sampling (APF-OFS) tracking

3.3.2 Position prediction

Position prediction block shown in Figure 3.1 is explained here. As discussed before, because of the three delays in the control loop, the system does not put the camera center on the target at the right time using only the determined object position found by the object tracking block. To compensate for motion of the target during the delays, a position prediction based on the D last motion vectors of the target (i.e. when the camera is not moving) has been designed. This motion predictor considers the angles between D consecutive motion vectors. If the angle difference is smaller than (i.e.  is obtained experimentally), it is assumed that the target is moving in the

same direction and its average speed can be estimated. The average speed of the target, v , during the D last target motion vectors is thus, equal to:



     _D i i i D i i x v 1 3 1 1 ) (  (3-21)

Where Δxi, is the ith target displacement vectors (i.e. target motion vectors). Motion vectors are

calculated when the camera is not moving.

 , is the ith delay ₁ and ₃i, is the ith delay ₃for the D last captured images. The system will put the camera center on the predicted position which is:

v c

cp  E 2 (3-22)

Where cp is the predicted target coordinate and c is the extracted target coordinate from the E

object tracking block. ₂ is the average delay time ₂obtained proportionally to displacement required to reach c . For example each camera movement creates a delay of _E 2i where i is the i

movement of the camera with displacements of c_c-c , where _Ei c_cis the image center. ₂is calculated based on average of all previous camera displacements (e.g. all cc-

i E c ).



    i i i i E c x c c 2 2 ( )   (3-23)

Thus, the camera is controlled and moved adaptively with the speed of the network and camera response time, and the motion vector of the target.

The face tracking algorithm using APF-OFS tracking can be summarized as the following steps:

1. Initialization

 Automatically fit an ellipse over the target head and torso. 2. Resampling

 To construct Sttarget :

- If the camera does not move, choose best sample with highest score from samples at time t-1, sf, based on the probabilities determined by equation (3-15).

- If the camera is moving, choose the image center with previous target size.

 To construct motion

S :

- For two consecutive images, apply optical flow and extract the group of object motion vectors from camera motion vectors.

 Compose the current sample set of S_t from S_ttargetand S_tmotion using equation (3-19) .

 Reproducing samples using translation and scaling changes equation (3-19) . 3. Update

 Update new samples weights from obtained observation measurement using equation (3- 15).

4. Feature scoring

 Select the best sample using equation (3-16). 5. Position predication

 Predict the position of the target based on the D last motion vectors of the target.

 Send the predicted position to the camera. The predicted target coordinate is now ready to send to the camera if the camera is free for moving.

 Process next frame and iterate from step 2 to step 5. The camera control is presented in the next section.

3.4 Camera control

Then the predicted position is sent to camera control block to center the PTZ camera on the target. Camera control block shown in Figure 3.1 adjusts pan and tilt values based on the

predicted position and zoom on the target with an appropriate zooming value. Figure 3.11 shows the perspective transform from 3D world coordinates to the 2D image plane coordinates.

Figure 3.11 Perspective transform geometry

To obtain which image coordinate cP=(Cx,Cy) corresponds to a world point, P, the following

relations are used:

z z x x P C P C   _(3-24) z z y y P C P C   _(3-25)

In our case, the camera is controlled based on the target motion on the 2D image plane. That is, we assume that the object is moving on a plane parallel to the image plane (

z z

P C

is assumed constant). Because of the low frame rate, this is not exact, but it is an acceptable approximation for a walking human.

To follow the target, the PTZ motors are commanded based onc . As discussed earlier, in our p

servo control loop, cp is computed and camera is moved to keep well tracking of target in the

case of variable delays. Sometimes, we have good tracking and sometimes not. Indeed sometimes the camera continuously tracks the true target with a high score but sometimes the target score is small. Thus, the zooming on the target is done if the target is well tracked. We should be careful about moving the camera or zooming to avoid losing the object from observed camera FOV. To achieve this goal, some criteria should be verified to ensure we are correctly moving the camera to follow the target or to zoom the camera on the target. We will describe the moving and zooming criteria with their corresponding camera functions in the following.

3.4.1 Moving criteria

As a first criterion, a moving command is sent to the camera if it is not moving or zooming. This is to avoid mixing two different commands that is sent consecutively to network. Camera performs commands sequentially and not in parallel. In addition, each camera zooming or moving corresponds to a delay because a mechanical task needs time to be done. If we want to send commands consecutively without considering these time periods, the probability that we lose the target is high. Thus, communication with camera is controlled by a mutex. The word mutex comes from MUTual Exclusion, which assigned to an object, arranges MUTual EXclusion between threads. A mutex is applied between threads to make sure that only one of the threads is simultaneously permitted to execute a specific application code at a time. Therefore, a moving or a zooming command will be sent to the camera if the mutex shows that the previous task of the thread has been finished and now it is ready for another command.

As discussed before, sometimes we have good tracking and sometimes not. Thus, the moving on the target is done if the target score is at least  percent similar to the previous target scores. The tracking is fine if the sample score remains the same. To evaluate the similarity of samples, the mean of the sample scores are calculated and compared with the current sample score. Therefore, in each frame we are calculating the average

 of all target sample scores that are available from the first frame up to the current frame. After approbation of the criterion, a

moving command is sent if the score of target sample, sf that is obtained by APF-OFS, is higher

than η% of the average score, f

 . The value of  is obtained experimentally and is explained in the results part (e.g. no zooming, = 85).

f s f s  if _(3-26)

3.4.2 Zooming criteria

Similarly, zooming should be done under certain zooming criteria to prevent losing the target. Zooming is done when the program is sure about tracking results. Like moving criterion, a zooming command is sent to the camera if it is not currently moving or zooming.

As explained earlier, to avoid losing the object from observed camera FOV, zooming should be done if the target is well tracked. Therefore, the zooming is done under a more precise constraint on target score similarity. The same as for moving, in each frame, the average

f s  and variance f s

 of all target sample scores from first frame up to current frame are calculated. Then, a zoom in command is sent if the following constraint is respected:

f f s

s f

s   _(3-27)

Also the zoom out command is sent if this constraint is true: f

f s

s f

s   _(3-28)

The target size is changed adaptively to the zooming parameter. For example if we do zooming two times, the target size is enlarged two times (e.g. h_it andw_it will be

2 and

2h_it w_it respectively). Also zooming has effect on the score value. For example for four times zooming  is 70% instead of 85% which is explained in the results part.

3.4.3 Stopping criteria

After doing zooming, moving and tracking it is required that the camera tracking is stopped somewhere. As discussed earlier the goal is to find the target face, track it, zoom on it, and take

the target face image. Therefore, some criteria should be applied to verify that if the camera correctly zooms on the target face or not. To do so, after camera zoom in, the Viola and Jones face detector (Viola & J. Jones, 2004) is applied on a search region on the top part of the target sample to verify if a face is found or not. If a face is found the target face image is captured and tracking process is stopped but if not the camera continues the tracking process until the stopping criteria is satisfied.

3.4.4 Camera functions

To control the camera, to move/zoom it, we have implemented three methods. The angles and zoom can be computed by the camera using the on-board AreaZoom function, or the camera can be controlled by computing the angles and zoom on a workstation and sending computed values to the camera using the RelativePanTiltZoom function. For zooming case, we have specifically a function which is called MultipleZoom that only does the zooming m times (e.g. 1X, 2X, 3X, etc) larger than the original target size. In the following only the best choices are explained in details.

3.4.4.1 RelativePanTiltZoom function

In this case, the angles required to move the camera are computed by the workstation. This way, computations are faster than using the on-board function. The target pixel coordinates (x,y) in the current frame is converted in pan-tilt values for centering the camera. The pan and tilt values are obtained according to the FOV of the lens and the range of pan and tilt angles for each camera (i.e. it is dependent on the camera model.). The angles and zoom values are sent to the camera by sending an HTTP POST request using the CGI scripts of the camera (Sony corporation, 2005, Online accessed 15-February-2010) and the RelativePanTiltZoom function. For this function, it is possible to specify the speed of motion. It is set to the maximum value.

3.4.4.2 MultipleZoom

MultipleZoom is used only for the zooming case. It will be used if we are interested to enlarge the target m times (e.g. 1X, 2X, 3X, etc.) larger than the original target size. m is the input of this function which is send by an HTTP POST request using the CGI scripts to the camera.

In document UNIVERSITÉ DE MONTRÉAL LOW AND VARIABLE FRAME RATE FACE TRACKING USING AN IP PTZ CAMERA PARISA DARVISH ZADEH VARCHEIE (Page 87-94)