Evaluation on computational complexity - A parallel windowing approach to the Hough transform f

3.4.1 Parallel implementation of the MLFRFT

As noted in Section 3.2.3, each fractional layer of the MLFRFT is computed independently of the others, so the layers can be computed in parallel on a multi-core CPU. Although parallel computation of the MLFRFT is theoretically feasible and has been highlighted as one of the notable strengths of the method in both [109] and [2], empirical studies conducted in this research show that a parallel implementation can increase the computational time significantly.

Given the pseudo-code for the MLFRFT in Algorithm 2, the operations inside the for loop can be carried out in parallel on a multi-core computer system. Depending on the number of available cores, the client splits the task within the loop and allocates a number of parallel workers to take over the iterations. Each iteration is carried out by one worker, which receives the required data from the client. The number of iterations equals the number of fractional layers, L, which is usually 3 or 4. Using MATLAB’s Parallel Computing Toolbox to process a 512 × 512 greyscale image on a CPU with two physical cores, a four-layer MLFRFT requires 0.194 seconds, whereas the serial implementation with the same configuration requires 0.051 seconds. The difference has two causes: 1) the for loop has only a small number of iterations, e.g. four in this case; 2) the code executed within the for loop is not computationally heavy enough to outweigh the communication overhead between the client and the workers. Data transfer from the client to the workers is normally expensive. To observe a real benefit of parallel over serial computing, either the number of iterations or the complexity of the task inside the loop has to be increased. In our case, the complexity of the code cannot be increased, as it is a Fourier transform operation. Only by significantly increasing L, e.g. L > 100, do we start to observe improvements of parallel computing over serial computing. However, the MLFRFT has no need for more than 100 layers of FRFT.

Algorithm 2 Pseudo-code for the MLFRFT.

Require: I(x, y) /* An input image

Require: γ = [γ₁, ..., γ_L] /* L = maximum number of layers

1: for all i, i = 1 → L do /* given γ

2: $F^{\gamma_i}(k_1, k_2) \leftarrow \sum_{x=-N/2}^{N/2-1} \sum_{y=-N/2}^{N/2-1} f(x, y)\, e^{-j 2\pi (x \gamma_i k_1 + y \gamma_i k_2)/N}$

3: $P_i \leftarrow F^{\gamma_i}$

4: end for

5: $P \leftarrow \bigcup_{i=1}^{L} P_i$ /* MLFRFT frequency grid
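
As an illustration of Algorithm 2, the following is a minimal sketch in Python (not the MATLAB code used for the timings reported in this section) that evaluates the double sum directly. The function names `fractional_dft2` and `mlfrft`, the storage convention `image[x + N/2][y + N/2]`, and the choice to return the layers as a dictionary keyed by γ are assumptions made for this example. Direct evaluation costs O(N⁴) per layer, so it is only suitable for very small images.

```python
import cmath


def fractional_dft2(image, gamma):
    """Directly evaluate one fractional layer (line 2 of Algorithm 2).

    `image` is an N x N list of lists with N even. Sample (x, y) is stored
    at image[x + N // 2][y + N // 2], so x and y run from -N/2 to N/2 - 1.
    Returns a dict mapping (k1, k2) to the complex coefficient.
    """
    n = len(image)
    half = n // 2
    coords = range(-half, half)
    layer = {}
    for k1 in coords:
        for k2 in coords:
            total = 0j
            for x in coords:
                for y in coords:
                    phase = -2j * cmath.pi * gamma * (x * k1 + y * k2) / n
                    total += image[x + half][y + half] * cmath.exp(phase)
            layer[(k1, k2)] = total
    return layer


def mlfrft(image, gammas):
    """Lines 1-5 of Algorithm 2: one layer per gamma, collected together.

    Keying the result by gamma is a choice made for this sketch only.
    """
    return {gamma: fractional_dft2(image, gamma) for gamma in gammas}


# A 4 x 4 unit impulse at (x, y) = (0, 0): every coefficient of every
# layer equals 1, because the only non-zero term has zero phase.
impulse = [[0.0] * 4 for _ in range(4)]
impulse[2][2] = 1.0
layers = mlfrft(impulse, [0.5, 0.7, 0.9, 1.0])
```

Each call to `fractional_dft2` depends only on the image and its own γ, which is the independence between layers that the parallel implementation relies on.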

It is easy to observe the communication overhead between a client and the workers in MATLAB by running a simple experiment and comparing the computational time in serial and parallel mode. Summing the numbers in a specified range is one of the most basic operations a program can perform. Suppose we want to add all of the natural numbers in a range from x1 to xn. For instance, running this experiment for x1 = 1 to xn = 1,000,000 requires 0.017 seconds in serial mode and 0.67 seconds in parallel mode with two workers. It can be seen that even for such a simple operation there is a considerable communication cost between the client and the workers. This gives a measure of how complicated the code inside the loop has to be in order to take advantage of the parallel mode.
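
The experiment can be approximated outside MATLAB. The sketch below uses Python's standard `concurrent.futures.ProcessPoolExecutor` as a stand-in for MATLAB workers, which is an assumption of this example, and the helper names are hypothetical. The timings it prints will differ from the figures quoted above, not least because the cost of starting the worker processes is included in the parallel measurement.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def range_sum(bounds):
    """Add every integer from lo to hi (inclusive) with an explicit loop."""
    lo, hi = bounds
    total = 0
    for value in range(lo, hi + 1):
        total += value
    return total


def split_range(lo, hi, workers):
    """Cut [lo, hi] into `workers` contiguous, nearly equal sub-ranges."""
    size, extra = divmod(hi - lo + 1, workers)
    chunks, start = [], lo
    for w in range(workers):
        stop = start + size + (1 if w < extra else 0) - 1
        chunks.append((start, stop))
        start = stop + 1
    return chunks


def parallel_sum(lo, hi, workers=2):
    """Sum the range with one worker process per sub-range."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(range_sum, split_range(lo, hi, workers)))


if __name__ == "__main__":
    # Time the serial loop first, then the two-worker version.
    start = time.perf_counter()
    serial = range_sum((1, 1_000_000))
    print(f"serial:   {serial} in {time.perf_counter() - start:.3f} s")

    start = time.perf_counter()
    parallel = parallel_sum(1, 1_000_000, workers=2)
    print(f"parallel: {parallel} in {time.perf_counter() - start:.3f} s")
```

Both versions return the same sum; only the elapsed time differs, and for a loop body this cheap the time spent creating and communicating with the workers can be expected to dominate.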

Based on the above discussion, parallel computing cannot be recommended for the MLFRFT in practical applications, because the number of required FRFT layers is small, typically three or four.

3.4.2 Computational time comparison

Empirical studies have shown that the MLFRFT-based HT outperforms the SHT in both detection accuracy and computational time. An evaluation of the detection accuracy of the 3-layer MLFRFT was reported in [2]. Here the MLFRFT method is tested against the SHT in terms of processing time. A set of twenty-five greyscale images, each of size 512 × 512, was used for this experiment. The computing configuration for the experiments is as follows: MATLAB 8 running on an Intel Core i5 CPU, 2.4 GHz with 8 GB RAM, under Microsoft Windows 7. A 4-layer MLFRFT was used rather than a 3-layer one, with γ1 = 0.5, γ2 = 0.7, γ3 = 0.9, and γ4 = 1, to achieve higher accuracy in the sinogram. Table 3.1 shows the results.
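
To make the effect of the four γ values concrete: according to line 2 of Algorithm 2, layer i evaluates the spectrum at the scaled frequency indices γ_i·k. The sketch below lists these indices along one axis for a small, hypothetical N = 8 and counts the distinct sample positions in the union of the layers. The function name and the use of exact fractions are assumptions of this example.

```python
from fractions import Fraction


def layer_frequencies(gamma, n):
    """Indices gamma * k sampled by one layer, for k = -n/2 .. n/2 - 1."""
    half = n // 2
    return {gamma * k for k in range(-half, half)}


# gamma values from the experiment, written as exact fractions so that
# coinciding sample positions are detected without rounding error.
gammas = [Fraction(1, 2), Fraction(7, 10), Fraction(9, 10), Fraction(1)]
n = 8  # small illustrative size, not the 512 used in the experiment

grid = set().union(*(layer_frequencies(g, n) for g in gammas))
print(len(layer_frequencies(Fraction(1), n)))  # 8 positions for one layer
print(len(grid))                               # 26 distinct positions
```

With these values the union holds 26 distinct positions along one axis, compared with 8 for the single layer γ = 1, since the layers coincide only at a few points such as the origin.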

It can be seen that the whole process of a typical 4-layer MLFRFT-based HT requires approximately 0.125 seconds, including 0.059 seconds to compute the four fractional layers, 0.065 seconds for the Cartesian to polar mapping plus conjugate mirror, and 0.0012 seconds for the inverse 1D-DFT. The SHT, by comparison, requires approximately 0.373 seconds, including 0.31 seconds to detect the edge map with a Canny edge detector plus 0.063 seconds for the HT voting process. The results indicate that the MLFRFT is approximately three times faster than the SHT. This comes in addition to the four times greater accuracy than the SHT that the 4-layer MLFRFT provides. The main cause of the difference in computing time is the compulsory edge detection required prior to the SHT, which takes the majority of the computation. In addition to imposing a significant computational cost on the system, edge detection may increase false detections. When the noise level in an image is high, some of the true feature points in the image can be ignored as noise and some false points may be viewed as true feature points. Thus, it can be concluded that an MLFRFT-based HT with a sufficient resolution is more efficient than the SHT.

Table 3.1: Computational time comparison between the MLFRFT-based HT (with four layers) and the SHT. Twenty-five greyscale images of size 512 × 512 were tested in MATLAB. The MLFRFT-based HT is approximately 3 times faster than the SHT. All times are in seconds.
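
The totals and the speed-up quoted above follow directly from the per-stage times. The short calculation below reproduces them; the dictionary keys are labels chosen for this example.

```python
# Per-stage times in seconds, as reported in the text.
mlfrft_stages = {
    "four fractional layers": 0.059,
    "Cartesian to polar mapping + conjugate mirror": 0.065,
    "inverse 1D-DFT": 0.0012,
}
sht_stages = {
    "Canny edge detection": 0.31,
    "HT voting": 0.063,
}

mlfrft_total = sum(mlfrft_stages.values())  # about 0.125 s
sht_total = sum(sht_stages.values())        # about 0.373 s
speedup = sht_total / mlfrft_total
edge_share = sht_stages["Canny edge detection"] / sht_total

print(f"MLFRFT-based HT: {mlfrft_total:.4f} s")
print(f"SHT:             {sht_total:.3f} s")
print(f"speed-up:        {speedup:.2f}x")
print(f"edge detection share of SHT time: {edge_share:.0%}")
```

The ratio comes out at roughly 2.98, which matches the stated factor of approximately three, and edge detection accounts for about 83% of the SHT time.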

Although HT-based line detection is mostly reported for off-line image understanding problems, it has also been proposed for many real-time applications. Lane marker detection in driver assistance systems is one such application. The MLFRFT-based HT has been applied to benchmark road images [13] to test its performance in response to real-world challenges.
