5.3 Experiment Results

5.3.1 Visual-Tracking Results

In the image-domain, the ground-truth visual-tracking trajectory was established by human annotation of the image sequence. An alternative approach would be to project the Qualisys 3D world trajectory onto the individual camera frames via the estimated camera matrices detailed in Section 5.2.2. However, doing so would carry the uncertainty of the camera matrix estimation into the image-domain benchmark.

The image-domain tracking results for the proposed visual-tracker (Fig. 5.2) were obtained using the IMM-variant with the integrated MeanShift algorithm applied in HSV color-space [103].

Since the ground-truth (i.e. the Qualisys system) is not synchronized with the PGR XB3, the visual-tracking trajectories had to be aligned with the ground-truth results in post-processing in order to benchmark the algorithms. The performance of the three visual-trackers tested is shown in Table 5.1. The column labeled Delay gives the optimal delay (i.e. the delay yielding the smallest error with respect to ground-truth) between the estimated visual-tracking trajectory and the ground-truth trajectory, expressed in periods of the camera sampling rate. The delay is computed independently for each visual-tracker.
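The delay search described above can be sketched as a brute-force scan over integer frame offsets. This is a minimal illustration, not the procedure used to produce Table 5.1: it assumes that both trajectories are already resampled to the camera frame rate, that the delay is a non-negative whole number of frames, and that the RMSE is the root-mean-square Euclidean distance (the exact form of (3.37) is not reproduced here). The names `rmse` and `best_delay` are hypothetical.

```python
import numpy as np

def rmse(est, ref):
    """Root-mean-square Euclidean error between two (N, D) trajectories."""
    diff = np.asarray(est, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def best_delay(est, ref, max_delay):
    """Scan integer delays 0..max_delay and return (delay, rmse) of the best one.

    A delay of k pairs est[i + k] with ref[i], i.e. the estimate is assumed
    to lag the reference by k camera frames. Only the overlapping samples
    are compared.
    """
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    best = (0, np.inf)
    for k in range(max_delay + 1):
        n = min(len(est) - k, len(ref))  # overlap length at this delay
        if n <= 0:
            break
        err = rmse(est[k:k + n], ref[:n])
        if err < best[1]:
            best = (k, err)
    return best
```

Running the scan separately for each tracker mirrors the statement that the delay is computed independently per visual-tracker.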

Table 5.1: RMSE performance of the visual-trackers on the real-world image sequence.

The image-domain tracking results for the proposed tracker, STRUCK [63], and TLD [78] are shown in Figs. 5.2, 5.3, and 5.4, respectively. Each tracker's result is drawn as the red curve alongside the human-annotated ground-truth trajectory in blue. The proposed tracker performs best among the trackers tested in terms of image-domain RMSE, as defined in (3.37). It handles the fast-motion segments of the sequence as well as the out-of-plane rotations at the start of the stationary period at index 400, owing to the homogeneous make-up of the target. STRUCK produces strong results over the majority of the sequence, with minor lag during the fast-motion segments. The lag corresponds to the tracking window moving parallel to the target motion, albeit slightly behind the centroid. The STRUCK tracker is applied with its default settings, notably a search-radius of 30 pixels and an SVM support-vector budget of 100.

The search-radius parameter dictates the extent of the search-space for translational transformations from the previous frame’s estimated image-domain position. Since STRUCK uses a polar grid to explore the set of transformations, enlarging the search radius would increase computation time, and the positional accuracy decreases with increasing radial distance. The online learner of STRUCK handles the out-of-plane rotations better than TLD, which locks onto a false-positive target detection during the stationary period of the sequence. TLD is unable to find sufficient features to track over the target surface; hence, its results are slightly skewed toward a sub-region of the target object.
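The trade-off tied to the search radius can be illustrated with a small sketch of a polar sampling grid. The number of radial and angular divisions used here (5 and 16) are illustrative assumptions and are not taken from the text; `polar_grid` and `ring_spacing` are hypothetical helper names. With a fixed number of angular divisions, the spacing between neighbouring samples on a ring grows linearly with its radius, so a larger search radius either coarsens the outer samples or requires more samples to evaluate.

```python
import numpy as np

def polar_grid(radius, n_radial=5, n_angular=16):
    """Return a (K, 2) array of (dx, dy) translation offsets on a polar grid.

    The grid contains the origin plus n_radial rings of n_angular samples,
    with the outermost ring at the given search radius (in pixels).
    """
    offsets = [(0.0, 0.0)]
    for i in range(1, n_radial + 1):
        r = radius * i / n_radial
        for j in range(n_angular):
            theta = 2.0 * np.pi * j / n_angular
            offsets.append((r * np.cos(theta), r * np.sin(theta)))
    return np.array(offsets)

def ring_spacing(r, n_angular=16):
    """Chord length (pixels) between adjacent samples on a ring of radius r."""
    return 2.0 * r * np.sin(np.pi / n_angular)

if __name__ == "__main__":
    grid = polar_grid(30.0)
    print("number of candidate translations:", len(grid))
    for r in (6.0, 18.0, 30.0):
        print(f"ring radius {r:5.1f} px -> sample spacing {ring_spacing(r):.2f} px")
```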

Figure 5.2: Proposed Visual-Tracker Image-Domain Trajectory (in pixels): the proposed IMM-variant visual-tracker (red) is compared against the ground-truth trajectory (blue) established via human annotation of the video sequence.

Figure 5.3: STRUCK [63] Image-Domain Trajectory (in pixels): the state-of-the-art STRUCK visual-tracker (red) is compared against the ground-truth trajectory (blue) established via human annotation of the video sequence.

Figure 5.4: TLD [78] Image-Domain Trajectory (in pixels): the state-of-the-art TLD visual-tracker (red) is compared against the ground-truth trajectory (blue) established via human annotation of the video sequence.

The triangulation of the world-domain position from the visual-tracking results uses the camera matrices computed with the scheme outlined in Section 5.2.2. An array of 60 retro-reflective dots is positioned on two 3D calibration targets in view of each camera. A cascade optimization approach is then performed to compute the camera matrices. The RMSE between the measured image positions of the dots and the projection of the Qualisys 3D position of each point using the estimated camera matrices is calculated as 0.58570 and 0.71193 pixels.
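The reprojection RMSE quoted above can be sketched as follows. This is a minimal illustration that assumes a standard 3x4 pinhole camera matrix with lens distortion already corrected, and an RMSE taken over Euclidean pixel distances; the function names `project` and `reprojection_rmse` are hypothetical and do not come from the calibration scheme of Section 5.2.2.

```python
import numpy as np

def project(P, X):
    """Project (N, 3) world points through a 3x4 camera matrix P.

    Returns the (N, 2) image positions in pixels after dehomogenization.
    """
    X = np.asarray(X, dtype=float)
    Xh = np.hstack([X, np.ones((len(X), 1))])   # homogeneous world points
    x = Xh @ np.asarray(P, dtype=float).T       # homogeneous image points
    return x[:, :2] / x[:, 2:3]

def reprojection_rmse(P, X_world, x_measured):
    """RMSE (pixels) between measured dots and their reprojected 3D positions."""
    residual = project(P, X_world) - np.asarray(x_measured, dtype=float)
    return float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))
```

Evaluating `reprojection_rmse` once per camera, with that camera's matrix and dot measurements, would yield one value per view.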

The world-domain tracking results are a direct by-product of the image-domain results, since the tracker operates independently on each view prior to triangulation. TLD’s world-domain tracking results are consistent with its difficulty on certain image-domain tracking segments. STRUCK and the proposed tracker perform similarly in the world-domain; notably, the world-domain results in the y-z plane are much stronger than those along the x-axis.
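To make the dependence of the world-domain result on the per-view image positions concrete, the following sketch triangulates one point from two views with the linear (DLT) method. The text does not state which triangulation algorithm was used, so DLT is an assumption chosen for brevity, and `triangulate_dlt` is a hypothetical name.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of a single point from two views.

    P1, P2 : 3x4 camera matrices.
    x1, x2 : (u, v) image positions of the target in each view, in pixels.
    Returns the 3D point in the world frame of the camera matrices.
    """
    P1 = np.asarray(P1, dtype=float)
    P2 = np.asarray(P2, dtype=float)
    # Each view contributes two linear constraints on the homogeneous point.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]
    return Xh[:3] / Xh[3]
```

Because the tracker output in each view enters the linear system directly, any image-domain tracking error in either view propagates into the triangulated position.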

Figure 5.5: Proposed Visual-Tracker World-Domain Trajectory (in m): the triangulation results of the IMM-variant of the proposed visual-tracker (red) compared against the ground-truth trajectory established by the Qualisys system (blue).

The y-z plane of the navigation reference frame is almost parallel to the camera image-plane, which accounts for the stronger results in that plane. The x-axis is aligned with the depth direction (i.e. the optical axis) of the camera, which explains the magnified uncertainty in the world-domain tracking results along that axis. Qualitatively, the x-axis displays an offset between the Qualisys position and the triangulation of the visual-tracking results. The offset may be attributable to a number of factors. Foremost, the systematic error discussed in Section 3.6.1 would explain the possible bias, since the visual-trackers are in effect tracking the center of the apparent contour of the object and not its centroid. The enclosing sphere of the Xsens IMU device has an approximate radius of 0.053 m, which would account for a large degree of the error. The camera calibration is a further source of error for the triangulation results: any inaccuracies in the camera matrix computation or the lens distortion correction would be compounded by the triangulation process, resulting in erroneous position estimates. A possible solution would be to incorporate a model-based tracking approach to obtain a better centroid estimate.
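The magnified uncertainty along the optical axis noted above can be made explicit with a first-order argument. The sketch below assumes an idealized rectified stereo pair, which is a simplification of the actual camera configuration; the symbols are introduced here for illustration only: $f$ is the focal length in pixels, $b$ the baseline, $d$ the disparity, $Z$ the depth along the optical axis (the navigation-frame x-axis in the text), $Y$ a lateral coordinate, and $\delta d$, $\delta v$ the image-domain errors in disparity and lateral pixel position.

```latex
Z = \frac{f\,b}{d}
\quad\Longrightarrow\quad
\delta Z \approx \left|\frac{\partial Z}{\partial d}\right| \delta d
         = \frac{Z^{2}}{f\,b}\,\delta d ,
\qquad
\delta Y \approx \frac{Z}{f}\,\delta v ,
\qquad
\frac{\delta Z}{\delta Y} \approx \frac{Z}{b}\,\frac{\delta d}{\delta v} .
```

For image-domain errors of comparable size, the depth error therefore exceeds the lateral error by a factor of roughly $Z/b$, which is greater than one whenever the target is farther from the cameras than the baseline length.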

In document Visual-Inertial Sensor Fusion for Tracking in Ambulatory Environments (Page 146-150)