Ground plane constraint validation - Exploiting the ground plane constraint

5.5 Exploiting the ground plane constraint

5.5.3 Ground plane constraint validation

For our evaluation of the ground constraint, we use three datasets, and evaluate

multiple detectors on them. Figure 5.12 visualises the accuracy of multiple

detectors on the Town Centre dataset using a Precision vs. Recall curve. We obtain a consistent improvement in accuracy by using the ground constraint. The same result holds for the ETH dataset and the Caltech dataset, as shown

in figure5.13and5.14respectively, using the constraints we discussed in section

5.5.2.

Note that the accuracy gain is obtained by the pruning of false detections that fall outside the cropped regions. Each point on an accuracy curve (Precision

vs. Recall or Miss rate vs. FPPI ) corresponds to the use of a certain threshold.

Note that the use of the ground plane constraint will have no effect on the

Recall or the Miss rate, such that when more detections need to be found, we

still have to lower the detection threshold, possibly at the cost of processing speed.

100 200 300 400 500 600 700 800 900 1000 0 50 100 150 200 250 300 350 400 y−position (px) pedestrian height a (px)

Ground plane constraint on Town Centre dataset Height margin

Ground plane estimation

(a) Ground plane constraint

(b) Example frame

Figure 5.8: The ground plane estimation on the Town Centre dataset. The black line is our estimation function, while the other solid lines indicate the variance allowed on the scaling. The further to the right of the graph, the lower in the frames (higher y-position), and the larger the pedestrians.

EXPLOITING THE GROUND PLANE CONSTRAINT 101 150 200 250 300 350 400 450 0 50 100 150 200 250 300 350 400 y−position (px) pedestrian height a (px)

Ground plane constraint on Caltech

Horizon

Tilt margin Height margin Ground plane estimation

(a) Ground plane estimation

(b) Example frame

Figure 5.9: The ground plane estimation on the Caltech dataset. The black line is our estimation function, while the other solid lines indicate the variance allowed on the scaling. The further to the right of the graph, the lower in the frames (higher y-position), and the larger the pedestrians.

200 250 300 350 400 450 0 50 100 150 200 250 300 350 400 450 500 y−position (px) pedestrian height a (px)

Ground plane constraint on ETH

Horizon

Tilt margin Height margin Ground plane estimation

(a) Ground plane estimation

(b) Example frame

Figure 5.10: The ground plane estimation on the ETH dataset. The black line is our estimation function, while the other solid lines indicate the variance allowed on the scaling. The further to the right of the graph, the lower in the frames (higher y-position), and the larger the pedestrians.

EXPLOITING THE GROUND PLANE CONSTRAINT 103

Figure 5.11: The region we crop to search on for a pedestrian size of 150px. The dashed lines indicate the boundaries on the ground plane, on top of this the expected object height has to be taken into account. An example detection is indicated with a bounding box.

We see that we obtain a considerable accuracy gain by using our ground plane constraint for most detectors, but it is not guaranteed. For the Deformable

Part model detector on the ETH dataset for example, the accuracy gain is quite

limited. This tells us that this detector has already most detections according to our estimated ground plane. The same holds for the expected processing

time of the detectors (which we discuss later using table 5.3 and table 5.4).

When most of the processing time is already spend according to the ground plane estimation, pruning parts of the search space will have less benefit for the processing speed.

In our experiments on the Caltech dataset, we added the results we obtained

using an ACF-model we trained on the Caltech training dataset. These

results where also published in our recent paper "On-board real-time tracking of

pedestrians on a UAV" [22]. As a first observation, we can see a large accuracy

gain by training on a dataset similar as the target dataset, a statement that is supported by literature [8, 94, 98].

As we described in chapter 2, the Aggregate Channel Features and SqrtICF

implementation make use of feature approximation to obtain a higher processing speed. Instead of calculating all the layers of the feature pyramid from the

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.4 0.5 0.6 0.7 0.8 0.9 1 Recall Precision

Accuracy comparison of ground plane constraint on Town Centre dataset

DPM−fullscale DPM−groundconstraint ICF−fullscale ICF−groundconstraint ACF−fullscale ACF−groundconstraint SqrtICF−fullscale SqrtICF−groundconstraint

Figure 5.12: Accuracy improvement when using the ground plane constraint on the Town Centre dataset.

10−3 10−2 10−1 100 101 .20 .30 .40 .50 .64 .80 1

false positives per image

miss rate

Accuracy comparison of ground constraint on ETH

57.65% Ours−ICF 54.17% Ours−ACF 53.54% Ours−SqrtICF 51.46% Ours−ICF−Ground 48.46% Ours−ACF−Ground 48.31% Ours−SqrtICF−Ground 42.15% Ours−DPM 41.75% Ours−DPM−Ground

Figure 5.13: Accuracy improvement when using the ground plane constraint on the ETH dataset.

EXPLOITING THE GROUND PLANE CONSTRAINT 105 10−3 ₁₀−2 ₁₀−1 ₁₀0 ₁₀1 .20 .30 .40 .50 .64 .80 1

false positives per image

miss rate

Accuracy comparison of ground constraint on Caltech−30

56.55% Ours−ICF 53.83% Ours−DPM 51.88% Ours−SqrtICF 51.26% Ours−ICF−Ground 51.04% Ours−ACF 46.91% Ours−DPM−Ground 46.41% Ours−SqrtICF−Ground−Approx 45.99% Ours−ACF−Ground−Approx 44.95% Ours−SqrtICF−Ground−NoApprox 43.74% Ours−ACF−Ground−NoApprox 43.49% Ours−ACF−Cal 37.32% Ours−ACF−Cal−Ground−Approx 34.64% Ours−ACF−Cal−Ground−NoApprox

Figure 5.14: Accuracy improvement when using the ground plane constraint on the Caltech dataset.

corresponding image data in the scale-space layer, only a fraction (originally one layer per octave) of the layers is calculated this way, while the others are approximated to improve speed. For our ground plane experiments, we make a distinction between using this approximation or not.

To integrate feature approximation in combination with our ground plane constraint, we cluster a number of scales. For each cluster we determine the surrounding region. For this region we can fall back to the original method to calculate one time the real features per octave, and approximate the others from this layer. For our experiments, we cluster five scales, such that 80% of the feature layers can be approximated. Note that the larger the clusters, the less effective the ground plane constraint becomes in pruning the search space, but the more approximation we can apply. When applying this technique for an application, a balance needs to be made between these options. At one side the clusters contain only one layer, which means that no approximation is used, while on the other end all scales belong to the same cluster such that we have fallen back on full scale evaluation with no search space pruning.

In table5.3and table5.4we compare the detection speed of our implementations

on the Town Centre dataset and the Caltech dataset respectively. As we can observe, the ground plane constraint brings a significant improvement in

evaluation speed considering the combined accuracy improvement. The choice of using approximation of features is dependent on the importance of speed versus accuracy. Note that the speed can be further improved by combining a tracking- by-detection framework with our ground plane constraint. This allows for each location, determined by a running tracker instance, to accurately determine the expected pedestrian’s size. In the tracking-by-detection framework we described in section5.4.2, the pedestrian’s size is a tracked variable, but this is not always as accurately determined as when using a ground plane estimation. The more accurate the scale can be determined, the less scales need to be evaluated for each track, and so the less computation time is consumed. This combination

is applied in the case study we describe in chapter6, which uses the Warping

Window approach we describe briefly in section5.6.

Technique Full scale Ground constraint Speed-up Ours-DPM 0.237 fps 0.778 fps 3.28×

Ours-ICF 0.414 fps 1.27 fps 3.07×

Ours-ACF 1.12 2.33 fps 2.08×

Ours-SqrtICF 1.17 fps 2.27 fps 1.94×

Table 5.3: Comparison of the detection speed on the Town centre dataset when using the ground plane constraint.

Technique Full scale Ground constraint Speed-up Ground constraint

+ approximation Speed-up Ours-DPM 0.32 fps 0.95 fps 2.97× / / Ours-ICF 0.68 fps 2.4 fps 3.53× / / Ours-ACF 6.9 fps 8.2 fps 1.19× 11.6 fps 1.68× Ours-ACF-Cal 8.1 fps 26.1 fps 3.22× 31.7 fps 3.91× Ours-SqrtICF 6.78 fps 8.4 fps 1.24× 12 fps 1.77×

Table 5.4: Comparison of the detection speed on the Caltech dataset when using the ground plane constraint.

In document Persoonsdetectie voor reële toepassingen (Page 127-134)