2.4 Summary
This chapter has described the basic concept of vision-based robot tracking systems and the related work in detail. In particular, it showed that most of the established computing systems for vision-based robot tracking focus on developing the software architecture and algorithm implementation on a general purpose processor (CPU) rather than investigating alternative hardware accelerators. Therefore, this chapter has shown the need for a computing system that uses the benefits of the CPU and hardware accelerators (e.g., FPGA and GPU) to enhance the computing performance of a vision-based multi-robot tracking algorithm. Additionally, this chapter has discussed the architectures of the multi-core CPU, GPU, and FPGA to give a complete overview of their costs and benefits. The qualitative comparison between them is summarized in Table 2.1. To obtain an appropriate comparison, similar technology processes were taken into account as important parameters during the comparison.
Table 2.1: Qualitative comparison between CPU, GPU and FPGA, based on[15; 27;
106].
CPU GPU FPGA
Parallelism Limited by the num-ber of cores
Table 2.1 shows that each device technology has different advantages and disadvan-tages in term of its parallelism, power efficiency, interfaces, and development time. The technology selection depends on the architectural design considerations and application requirements[15]. Therefore, the next chapter shows how heterogeneous computing
systems are used in vision-based multi-robot tracking applications. In heterogeneous computing systems, different types of processors or hardware accelerators cooperate to accelerate the computation tasks. The discussions in chapter 3 and the rest of this thesis will focus on the implementations of two distinct heterogeneous computing systems for vision-based multi-robot tracking applications, encompassing the use of the FPGA and GPU as hardware accelerators for a CPU.
3 Vision-based Multi-Robot Tracking with Heterogeneous Computing Systems
As presented in chapter 1, the development of a vision-based multi-robot tracking system using multiple cameras as the video input source began with the objective to deal with a larger robot arena and support various applications. Additionally, the computational intensity of vision processing algorithms increases with the number of tracked robots, video frame size, and number of used cameras. This chapter delineates FPGA-CPU and GPU-CPU heterogeneous (hybrid) computing systems as alternative approaches for vision-based multi-robot tracking applications. It includes a comprehensive elaboration on heterogeneous computing systems, both the FPGA-CPU and GPU-CPU architectures, and vision-based multi-robot tracking algorithms.
3.1 Heterogeneous Computing System
The mainstream computing system for a high-performance computer platform has been rapidly evolving to combine more than one type of processor or hardware accelerator over the last decade, changing from homogeneous into heterogeneous systems. This development means a dynamic breakthrough in the field of high-performance com-puting systems. For instance, conventional homogeneous comcom-puting uses only one or more processors with the same architecture; heterogeneous computing alternately combines the benefits from the different types of processors or hardware accelerators to enhance the computation tasks.
Heterogeneous computing systems have been widely used in many highly compu-tationally intensive applications and successfully provide significant improvements compared to conventional computing systems. At this point, the FPGA and GPU are the most prevalent hardware accelerators for heterogeneous computing systems. An FPGA offers customized design features and a parallel structure, while a GPU provides massively parallel processing using its thousands of cores. As a result, there have
been various implementations of FPGA-CPU heterogeneous computing systems, such as for big data applications[29; 108; 121], neural networks [69; 94], and image processing applications. These image processing applications include an avionic test application[2], a face detection algorithm [81], an optical flow algorithm [26], and medical image processing for ultrasound computer tomography[19; 20].
GPU-CPU heterogeneous computing systems are more adaptable and can be imple-mented in various applications. This is different from FPGA-based systems. One of the reason is because the development time in a GPU is relatively much faster than in an FPGA. Accordingly, many applications have been implemented on GPU-CPU heterogeneous computing systems, including data processing[13; 71], numerical method[3; 82; 109], chemistry [16; 49; 76], bioinformatics [25], electromagnet-ics[38], physics [51; 56], and image processing applications. These image processing applications include biomedical imaging[68; 73; 96; 110], face detection [47; 88], and optical flow algorithm[89].
The FPGA and GPU hardware accelerators possess their own unique features and advantages. To some extent, experts and researchers have been encouraged to investi-gate these factors, particularly to identify their benefits and drawbacks. Consequently, some prominent studies on various applications and algorithms have been conducted by experts and researchers, including a heterogeneous computing platform[27; 113], medical imaging[12; 18; 21], and pedestrians detection [22].
From the perspective of heterogeneous computing systems, the CPU and hardware accelerator (e.g., FPGA or GPU) work together to improve the computing performance.
As illustrated in Figure 3.1, the hardware accelerator enhances the computing per-formance of the system by executing massive parallel processing or computationally intensive tasks. Meanwhile, the CPU has an essential task within the application, espe-cially in executing complicated operations on a single or a few streams of data. A CPU also easily accommodates integration between the computing system and operating system (OS) and provides I/O port access to a sensor or device (e.g., camera, display).
3.1 Heterogeneous Computing System
Computationally intensive tasks Hardware accelerators
(FPGA / GPU)
CPU
Sequential tasks Application algorithm
Figure 3.1: Tasks partitioned in heterogeneous computing system.
At the architectural level, there are two types of architectures, both of which are based on the integration between the CPU and hardware accelerator[80]. They are defined as discrete and integrated heterogeneous systems. The former (discrete) consists of a multi-core CPU and hardware accelerator, both connected to a high-speed bus, and each of them has a distinct memory, as shown in Figure 3.2. In contrast, as illustrated in Figure 3.3, the latter integrates both a CPU and hardware accelerator in a single chip, and it shares the same memory between the CPU and hardware accelerator.
This integrated heterogeneous system is also known as a programmable system-on-a-chip (SoC). A low data transfer overhead is the main advantage of an integrated heterogeneous computing system[52; 100]. However, this system normally uses a
simpler CPU and hardware accelerator (GPU) architectural design. Hence, it generates a lower performance on computationally intensive tasks than a discrete system.
Hardware Accelerators
System Memory Memory
High-speed system bus CPU
Core-1 Core-2
Core-3 Core-4
Memory Controller Memory Controller
Figure 3.2: Discrete heterogeneous computing system architecture.
Hardware Accelerators System Memory
Memory Controller CPU
Core-1 Core-2
Core-3 Core-4
Figure 3.3: Integrated heterogeneous computing system or programmable SoC archi-tecture.
In discrete heterogeneous computing systems, designers can independently combine certain types of CPUs and GPUs to enhance the computing performance. Therefore, this work uses a discrete architecture for the implementation of vision-based multi-robot