Particle exchange - Multicore Parallel PF

4.3 Multicore Parallel PF

4.3.2 Particle exchange

The particle exchange step is very similar to the one introduced in [27], except that here full advantage is taken of the Starburst MPSoC. Before the exchange can begin, the produced particles are sorted in descending order of their weights, such that

$w_i \geq w_j, \quad 1 \leq i < j \leq N_{local}.$

The idea is to separate “weak” particles from “strong” particles, i.e. those with a higher likelihood of the state. The top $D$ “strong” particles are then sent through a dedicated software FIFO buffer to $A$ forward neighbors, as explained earlier. Once a PF task finishes sending its particles (which usually costs a small time overhead), it goes on to receive the sets of exchanged particles from all of its previous neighbors. The received set is “merged” with the local particle set into a new one, which is used for resampling.

Assuming that the particles are received from $A$ neighboring tasks, there are several exchange strategies that a local PF algorithm can employ when merging the local set with the exchanged set:

1. Append the particles to the local set, such that $N_{new} = N_{local} + A \cdot D$.

2. Replace $A \cdot D$ particles from the local population with the newly arrived batch.¹

3. Do both, by introducing an overlap of $D_{overlap}$ particles, such that $N_{new} = N_{local} + A \cdot D - D_{overlap}$.

All of these strategies are illustrated in Fig. 4.17. In the latter two strategies, one may wish to replace only the “weakest” particles, which is another reason to sort at the beginning of the exchange step.
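The three strategies differ only in how many of the weakest local particles are overwritten by the received ones. The following Python sketch illustrates this under stated assumptions: particles are represented as `(state, weight)` tuples, the local list is already sorted by descending weight, and the function name `merge_particles` and its parameter `d_overlap` are hypothetical names chosen for this illustration, not part of the described implementation.

```python
def merge_particles(local, received, d_overlap):
    """Merge received particles into a local set (illustrative sketch).

    local     -- list of (state, weight), sorted by descending weight
    received  -- list of (state, weight) from all neighbors (A*D items)
    d_overlap -- number of weakest local particles that are overwritten
                 (0 = append, len(received) = replace, in between = both)
    """
    # Assumption: the overlap cannot exceed either set.
    if not 0 <= d_overlap <= min(len(local), len(received)):
        raise ValueError("d_overlap out of range")
    # Keep the strongest local particles, drop the d_overlap weakest ones.
    kept = local[:len(local) - d_overlap]
    # The new set has N_local + A*D - D_overlap particles.
    return kept + list(received)


local = [("a", 0.4), ("b", 0.3), ("c", 0.2), ("d", 0.1)]
received = [("r1", 0.5), ("r2", 0.45)]
appended = merge_particles(local, received, 0)   # append strategy
replaced = merge_particles(local, received, 2)   # replace strategy
mixed = merge_particles(local, received, 1)      # overlap of one particle
```

With `d_overlap` equal to the number of received particles, the size of the local population stays constant, which corresponds to the replacing strategy.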

Since only the top $D$ particles are exchanged, the particle set does not need to be sorted completely. Instead, a partial sorting algorithm, such as partial heap sort or partial bitonic sort, can be used to extract the $D$ highest-weighted particles directly.
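As a rough illustration of partial selection, the sketch below uses the heap-based `heapq.nlargest` from the Python standard library as a stand-in for a partial heap sort. It is not the implementation used on the target platform; the `(state, weight)` tuple layout and the helper name `top_d` are assumptions made for this example.

```python
import heapq


def top_d(particles, d):
    """Return the d highest-weighted particles in descending weight order.

    particles -- iterable of (state, weight) tuples (assumed layout)
    """
    # nlargest keeps a heap of at most d items, so the full set
    # is never sorted completely.
    return heapq.nlargest(d, particles, key=lambda p: p[1])


particles = [("a", 0.10), ("b", 0.40), ("c", 0.05), ("d", 0.30), ("e", 0.15)]
best = top_d(particles, 2)
```

For small $D$ relative to the local population, this kind of selection avoids the cost of ordering particles that are never exchanged.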

¹ Note that $A \cdot B \leq N$

Figure 4.18: Particle exchange by passing particles around the ring topology. The purple boxes represent the top $D$ particles of task $\tau_i$, $i = 1, \ldots, P$, while the green boxes represent exchanged particles from a neighbor.

Algorithm 8: Algorithm for particle exchange function

function $\{x^*_{k,j}, w^*_{k,j}\}_{j=1,\ldots,N_{new}} = \mathrm{xchg}(\{x_{k,i}, w_{k,i}\}_{i=1,\ldots,N_{local}}, D, A)$

Input: Local particle set of the current iteration, the amount of particles to exchange and the amount of neighbors.

Output: New set of particles

1: Partially sort local particles: $\{x_{k,i}, w_{k,i}\} = \mathrm{psort}(\{x_{k,i}, w_{k,i}\}, D)$

2: for $i = 1 : N_{local} - D_{overlap}$ do

3:     Copy particles: $\{x^*_{k,i}, w^*_{k,i}\} = \{x_{k,i}, w_{k,i}\}$

4: end for

5: Let $p \in \mathbb{N}_{<P_{tot}}$ be the processor this task is executed on

6: Let $q = p - 1$ when $p > 1$, else $P$, be a previous neighbor processor

7: Send top $D$ particles: $\mathrm{write\_fifo}(\{x_{k,i}, w_{k,i}\}_{i=1,\ldots,D}, \mathrm{mod}(p + 1, P))$

8: Start from overlap location: $a = N_{local} - D_{overlap} + 1$

9: for $i = 1 : A - 1$ do

10:     Read particles from previous neighbor: $\{x^*_{k,j}, w^*_{k,j}\}_{j=a+(i-1) \cdot D,\ldots,a+i \cdot D} = \mathrm{read\_fifo}(q)$

11:     Forward particles to closest neighbor: $\mathrm{write\_fifo}(\{x^*_{k,j}, w^*_{k,j}\}_{j=a+(i-1) \cdot D,\ldots,a+i \cdot D}, \mathrm{mod}(p + 1, P))$

12: end for

13: Read final set of particles from previous neighbor:

{x∗

From Fig. 4.16, one may notice that a PF task $\tau_i$, $i = 1, \ldots, P$, forms only one FIFO data channel, to its closest neighbor¹, which contradicts the intuitive notion that a task is connected to all of its “exchange” neighbors. One reason for this arrangement is that a processor may run out of FIFO memory when allocating space for the channels. The other is the additional time overhead introduced by the larger distance to further neighbors. Thus, during particle exchange, a PF task not only sends its own best particle set to the closest neighbor, but also forwards the sets received from its previous neighbor(s). In essence, each task “propagates” sets of exchanged particles through its closest neighbor, until each task has received all $A$ batches of $D$ particles. This scheme is illustrated in Fig. 4.18, where arrows represent particle exchanges. It is assumed that an appending exchange strategy is used, although any other strategy is possible. Each exchange is performed by its respective source task $\tau_i$, $i = 1, \ldots, P$, in a top-down sequence, through a single FIFO channel.
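The propagation scheme can be mimicked with a small sequential simulation. The Python sketch below is only an illustration under several assumptions: tasks are indexed from 0 rather than 1, each FIFO channel is modeled by a `collections.deque`, all tasks proceed in lock-step rounds, and $1 \leq A \leq P - 1$ so that no task receives its own batch. The function name `ring_exchange` is hypothetical.

```python
from collections import deque


def ring_exchange(batches, a):
    """Simulate ring propagation of top-D batches (illustrative sketch).

    batches -- batches[t] is the top-D batch of task t (any object)
    a       -- number of previous neighbors each task receives from
    Returns received[t]: the batches task t received, nearest neighbor first.
    """
    p = len(batches)
    if not 1 <= a <= p - 1:
        raise ValueError("a must satisfy 1 <= a <= P - 1")
    # fifo[t] is written by task t and read by task (t + 1) % p.
    fifo = [deque() for _ in range(p)]
    received = [[] for _ in range(p)]
    # Every task first sends its own batch to its closest neighbor.
    for t in range(p):
        fifo[t].append(batches[t])
    # A - 1 rounds: read from the previous neighbor, then forward.
    for _ in range(a - 1):
        incoming = [fifo[(t - 1) % p].popleft() for t in range(p)]
        for t in range(p):
            received[t].append(incoming[t])
            fifo[t].append(incoming[t])
    # The final batch is read but not forwarded.
    for t in range(p):
        received[t].append(fifo[(t - 1) % p].popleft())
    return received


result = ring_exchange(["b0", "b1", "b2", "b3"], 2)
```

In this simulation each task only ever touches two channels, the one it reads and the one it writes, regardless of how many neighbors it exchanges with.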

The operation of the exchange algorithm is thus described in Algorithm 8. This description accommodates any exchange strategy by setting the $D_{overlap}$ parameter accordingly. For example, for $D_{overlap} = 0$, an appending strategy is used.

¹ An exception to this is task $\tau$

Chapter 5

Analysis and experimental results

So far, the internals of the HOG detector and PPF implementations have been described and discussed in detail. This chapter focuses on the analysis and evaluation of the two algorithms. In particular, the functional and temporal behavior of both implementations is evaluated. Specific performance criteria, such as detection miss rate, hardware resource usage and tracking accuracy, are also discussed.

5.1 HOG-SVM detector evaluation

The HOG-SVM detector is difficult to analyze in hardware on Starburst. Since the detection performance of the HOG-SVM algorithm has already been extensively studied and documented in various works, such as [1, 3, 8, 11, 4], this section focuses on findings related to the previously described hardware implementation, such as hardware resource usage and simulation. Therefore, the results presented here do not focus on detection accuracy, but rather on more specific features of the hardware implementation. Nevertheless, detection results computed from the simulation of the hardware implementation are presented to demonstrate its operation.

In document Implementation and Analysis of Real time Object Tracking on the Starburst MPSoC (Page 73-77)