3.4 The Harmony Filter
3.4.6 Computational complexity
The main loop of the harmony filter is the improvise-and-update loop, whose body is executed once per iteration. Since only one evaluation of the objective function is performed in each iteration, the time spent optimising depends almost entirely on the number of iterations until convergence. This number cannot easily be determined in advance, because it depends on the state of the target and its surrounding environment at that time. Consequently, the time until convergence differs from frame to frame, and the tracker’s frame rate is not constant.
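The link between iterations and cost can be illustrated with a minimal sketch of an improvise-and-update loop. It is not the harmony filter's actual code. The frame size, the synthetic `objective` (a smooth peak at an assumed target position, standing in for the real histogram comparison), the textbook harmony-search parameters `hmcr`, `par` and `bw`, and all function names are hypothetical. The sketch also assumes the HM is freshly evaluated at the start of the frame. The sketch runs a fixed number of iterations and counts objective evaluations.

```python
import math
import random

# Hypothetical stand-ins: a 320x240 frame and a synthetic weight that peaks
# at an assumed target position. The real objective compares histograms.
FRAME_W, FRAME_H = 320.0, 240.0
TARGET = (120.0, 80.0)


def objective(h):
    """Synthetic weight in (0, 1]; highest at TARGET."""
    dx, dy = h[0] - TARGET[0], h[1] - TARGET[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * 30.0 ** 2))


def improvise(hm, rng, hmcr=0.9, par=0.3, bw=3.0):
    """Build one new (x, y) hypothesis with textbook harmony-search rules."""
    new = []
    for dim, upper in enumerate((FRAME_W, FRAME_H)):
        if rng.random() < hmcr:
            value = rng.choice(hm)[0][dim]      # memory consideration
            if rng.random() < par:
                value += rng.uniform(-bw, bw)   # pitch adjustment
        else:
            value = rng.uniform(0.0, upper)     # random selection
        new.append(min(max(value, 0.0), upper))
    return tuple(new)


def run_frame(iterations, hm_size=10, seed=0):
    """Run a fixed number of improvise-and-update iterations; count evaluations."""
    rng = random.Random(seed)
    hm, evaluations = [], 0
    for _ in range(hm_size):                    # initial HM (assumed fresh per frame)
        h = (rng.uniform(0.0, FRAME_W), rng.uniform(0.0, FRAME_H))
        hm.append((h, objective(h)))
        evaluations += 1
    for _ in range(iterations):
        new = improvise(hm, rng)
        w = objective(new)                      # the single evaluation per iteration
        evaluations += 1
        worst = min(range(hm_size), key=lambda i: hm[i][1])
        if w > hm[worst][1]:
            hm[worst] = (new, w)                # update: replace the worst hypothesis
    best = max(hm, key=lambda item: item[1])
    return best, evaluations


best, evals = run_frame(100)
print(evals)  # 110: 10 for the initial HM plus one per iteration
```

Apart from the fixed cost of filling the HM, the number of evaluations grows by exactly one per iteration. The running time of a frame is therefore governed by how many iterations pass before the search stops.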
An example of this inconsistent convergence speed is shown in Figure 3.10. It plots the number of iterations needed for convergence at every frame of a challenging tracking sequence, which is investigated further in Section 3.4.8. The tracker converges quickly at the beginning of the sequence, but later it frequently runs for the full 500 generations, which was set as the maximum allowed in this example. This is because the target starts in an area where it is relatively easy to track and later moves away from the camera into an area where it is often occluded and where there are more visual distractors. Moving away from the camera also makes the target appear smaller, so its histogram is built from fewer pixels and less data is available to identify it. As the conditions for accurate tracking deteriorate, the number of iterations needed for convergence increases.
Figure 3.10: The graph illustrates the inconsistent number of iterations required for convergence that is typical of the harmony filter. In many frames fewer than 100 iterations are required, but under more challenging conditions the maximum of 500 is reached.
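A rough model shows how this variation affects the frame rate. Assume, as a simplification, that every improvise-and-update iteration costs about the same time t_iter, dominated by the single objective evaluation. Ignoring any fixed per-frame overhead, the optimisation time for a frame that needs N_iter iterations is then approximately:

```latex
T_{\text{frame}} \approx N_{\text{iter}} \, t_{\text{iter}},
\qquad
\frac{T_{\text{frame}}(500)}{T_{\text{frame}}(100)}
  \approx \frac{500 \, t_{\text{iter}}}{100 \, t_{\text{iter}}} = 5 .
```

Under these assumptions, a frame that reaches the 500-iteration limit takes roughly five times as long to optimise as a frame that converges in 100 iterations. It takes proportionally longer still than frames that converge in fewer.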
It is therefore important that the convergence criteria are chosen carefully so that the number of iterations until convergence is kept to a minimum. Recall that three tests are used to detect convergence; this example illustrates all three. When the target is easily identified, as in the first few frames, convergence is usually detected by the first test, since all the candidates in the HM quickly converge to the same solution. This test often correctly detects convergence in fewer than 100 iterations, as seen in Figure 3.10. However, when tracking becomes more difficult, the candidates do not all converge to the same point and the first test cannot detect convergence. This is seen later in the example, when the target starts moving away from the camera. If the target is occluded or has moved out of frame, the tracker will not find any good solution. The second test eventually detects this state and declares convergence because no progress has been made for a prolonged number of iterations. This always leads to slower convergence, since the algorithm only decides that no further progress can be made after numerous idle iterations. In the example of Figure 3.10 the maximum number of idle iterations was set to 100, and the graph often shows frames in which convergence took more than 300 iterations. These are likely frames in which the search was abandoned after reaching the maximum number of idle iterations.
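The three tests can be sketched as a single check that runs once per iteration. This is a minimal sketch, not the filter's actual code, and the names are hypothetical. The first test is approximated by the distance between the best and worst hypotheses, the quantity examined in Figure 3.14, with an assumed 5-pixel threshold. The limits of 100 idle and 500 total iterations are the ones used in the example of Figure 3.10. What counts as "progress" is left to the caller.

```python
import math
from dataclasses import dataclass


@dataclass
class ConvergenceState:
    iteration: int = 0   # improvise-and-update iterations so far in this frame
    idle: int = 0        # consecutive iterations without progress


def check_convergence(best_pos, worst_pos, progressed, state,
                      max_spread=5.0, max_idle=100, max_iterations=500):
    """Apply the three tests once per iteration; return which one fired, or None."""
    state.iteration += 1
    state.idle = 0 if progressed else state.idle + 1
    # Test 1: the HM has collapsed onto one location (best and worst agree).
    if math.dist(best_pos, worst_pos) < max_spread:
        return "spread"
    # Test 2: no progress for a prolonged number of iterations.
    if state.idle >= max_idle:
        return "idle"
    # Test 3: hard cap on the search for this frame.
    if state.iteration >= max_iterations:
        return "max_iterations"
    return None


state = ConvergenceState()
print(check_convergence((10, 10), (12, 11), True, state))  # spread
```

The tests are ordered from cheapest to most expensive outcome. A collapsed HM can stop the search after a few iterations. The idle test needs at least `max_idle` wasted iterations before it can fire. The cap is the last resort.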
When multiple local distractors are present, however, it may appear to the algorithm that it is slowly making progress as it converges towards several weaker local optima. This is the worst possible situation: the second test is never triggered, and the search is terminated only when the maximum number of iterations is reached. This happens a few times in the example of Figure 3.10, where 500 iterations are needed for convergence.
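The following sketch shows why slowly improving distractors defeat the second test. It replays a trace of best-hypothesis weights through an idle counter that any strict increase resets. The traces are synthetic and the function name is hypothetical. The limits of 100 idle and 500 total iterations follow the example of Figure 3.10.

```python
def stopping_iteration(best_weights, max_idle=100, max_iterations=500):
    """Iteration at which the idle test, or failing that the cap, ends the search.

    best_weights[i] is the best weight in the HM after iteration i + 1;
    any strict increase counts as progress and resets the idle counter.
    """
    idle, previous = 0, float("-inf")
    for i, w in enumerate(best_weights[:max_iterations], start=1):
        idle = 0 if w > previous else idle + 1
        previous = max(previous, w)
        if idle >= max_idle:
            return i
    return min(len(best_weights), max_iterations)


# Lost tracker: the best weight stalls immediately, so the idle test fires.
stalled = [0.40] * 500
# Distractors: a tiny gain every 50 iterations keeps resetting the counter.
creeping = [0.40 + 0.001 * (i // 50) for i in range(500)]
print(stopping_iteration(stalled), stopping_iteration(creeping))  # 101 500
```

Even gains as small as 0.001 are enough to keep the counter below its limit, so the creeping trace runs all the way to the cap.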
Accurately detecting convergence is therefore the key to minimising the computational complexity of the harmony filter. In particular, when the tracker is lost and cannot find any good solutions, it is better to move on to the next frame in the hope that the situation improves than to search wastefully for a solution that does not exist.
Consider the example of Figure 3.11, which illustrates the evolution of the HM by plotting the weight of the best candidate at every iteration. First, notice that the initial best weight is less than 0.36 and the final weight is less than 0.48. These low weights indicate that the tracker is clearly lost and that, even after 500 iterations of searching, the target still could not be found. The target is most likely occluded, and the best strategy would be to move on to the next frame.
According to the graph, the search should have been abandoned after about 120 iterations, since no further improvement to the best hypothesis was made. However, this is not as simple as it seems. Consider the more detailed view of the HM shown in Figure 3.12, which plots the locations of both the best and the worst hypotheses in the HM in pixel coordinates. Notice that the lowest-weight hypothesis changes frequently and is still changing after 450 iterations. This indicates that the HM was still being optimised, even though the final result, the best hypothesis, reached its final value in fewer than 100 generations.
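The difficulty can be made concrete by asking what resets the idle counter. In the hypothetical sketch below, the same counter is driven either by improvements to the best hypothesis only, or by any change to the HM, such as the worst hypothesis being replaced. The traces are synthetic, chosen only to mimic the qualitative behaviour described for Figures 3.11 and 3.12. The function name and the replacement pattern are assumptions.

```python
def idle_stop(progress, max_idle=100, max_iterations=500):
    """First iteration after `max_idle` consecutive iterations without progress,
    or the iteration cap if that never happens."""
    idle = 0
    for i, made_progress in enumerate(progress[:max_iterations], start=1):
        idle = 0 if made_progress else idle + 1
        if idle >= max_idle:
            return i
    return min(len(progress), max_iterations)


n = 500
# Synthetic trace: the best hypothesis stops improving after iteration 20 ...
best_improved = [i < 20 for i in range(n)]
# ... while the worst hypothesis keeps being replaced every 7 iterations.
hm_changed = [i % 7 == 0 for i in range(n)]

print(idle_stop(best_improved))  # 120: stops once the best has stalled
print(idle_stop(hm_changed))     # 500: runs to the cap
```

Watching only the best hypothesis would have ended this search early. It may, however, also stop searches in which the rest of the HM is still moving towards a better solution. Watching the whole HM avoids that risk, at the cost seen in Figure 3.11.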
[Figure 3.11 plot: weight of the best hypothesis in the HM (0.34 to 0.48) against generations (0 to 500)]
Figure 3.11: In this example the tracker is lost but continues to search until the maximum number of iterations is reached. This is not the desired behaviour; it would be better to abandon the search much earlier.
Contrast this result with the example of Figures 3.13 and 3.14. In this case the HM is initialised with good hypotheses, as indicated by the high initial weight. After 207 iterations the HM converges to a solution with a weight higher than 0.7, indicating that the target was most likely identified correctly. Notice also from Figure 3.14 that the final locations of the best and worst hypotheses are less than 5 pixels apart, indicating that it was the first convergence test that correctly detected convergence here.
Deciding when to stop the search and move on to the next frame is a trade-off between speed and accuracy. The harmony filter would benefit greatly from a better way of detecting occlusion and of determining when the hypotheses in the HM can no longer be improved. Better convergence tests could significantly improve the overall frame rate of the tracker, but this is a topic for future research.