Chapter 6: Parallelism Implementation in FPGA-based ES
6.3 Methodologies of Processing in Parallel
6.3.3 Expansion on Parallelism Decomposition
This section discusses two ways of extending parallelism decomposition. The first is a broad extension to distributed systems, in which the operators of the parallel framework are independent computers connected by a network infrastructure, whether wireless or wired, local or Internet-wide. The second extends downward to the lowest level of the system, the hardware platform, and examines how parallelism is handled among hardware units.
The co-processors commonly used in ESs and FPGA-based systems are also discussed.
6.3.3.1 Expansion of Parallelism Decomposition to Distributed Systems
Parallel processing also occurs in distributed systems. Distributed computation is widely used in database systems (Jiang et al 2006, Kallman et al 2008, Mohan et al 1986, and Thomson et al 2012). In these systems, each computer performs the same kind of task as the others but may be located remotely. For example, in transaction processing systems, many clients submit requests to the database service. The client computers carry out their transactions in parallel, each executing its own independent transaction. Each client machine can be a simple but complete computer system with its own CPU, memory, and operating system.
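The client side of this arrangement can be sketched in a few lines. The sketch below is illustrative only: threads stand in for physically separate client computers, and the names `run_transaction` and `run_clients`, as well as the starting balance of 100, are hypothetical and not taken from the cited systems.

```python
from concurrent.futures import ThreadPoolExecutor

def run_transaction(client_id, amount):
    """One client's independent transaction; it touches only its own state."""
    balance = 100              # hypothetical per-client starting balance
    balance += amount          # the transaction itself
    return client_id, balance

def run_clients(requests):
    """Run every client's transaction concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=len(requests)) as pool:
        futures = [pool.submit(run_transaction, cid, amt) for cid, amt in requests]
        return dict(f.result() for f in futures)

# Three clients, each with its own transaction and no shared data.
results = run_clients([("A", 10), ("B", -30), ("C", 5)])
```

Because no transaction reads or writes another client's state, the order in which the threads run does not affect the result.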
At the database server end, on the other hand, each client query can be translated and broken into finer-grained jobs, which are batched and passed to the database server for processing. In effect, the tasks from each client are streamed into the database server, which itself also uses parallel processing techniques.
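One way to picture the server side is as a splitter followed by a worker pool. The following is a minimal sketch under simplifying assumptions: a query is modelled as a client identifier plus a list of keys, the "database" is an in-memory dictionary, and the function names `split_query`, `execute_job`, and `serve` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_query(query):
    """Break one client query (client id, list of keys) into per-key jobs."""
    client_id, keys = query
    return [(client_id, key) for key in keys]

def execute_job(job, table):
    """Execute one fine-grained job: a single key lookup."""
    client_id, key = job
    return client_id, key, table.get(key)

def serve(queries, table, workers=4):
    """Batch the jobs of all clients and process them on a pool of workers."""
    jobs = [job for query in queries for job in split_query(query)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(lambda job: execute_job(job, table), jobs))
    # Regroup the per-job results by the client that asked for them.
    answers = {}
    for client_id, key, value in rows:
        answers.setdefault(client_id, {})[key] = value
    return answers

table = {"x": 1, "y": 2, "z": 3}
answers = serve([("A", ["x", "y"]), ("B", ["z"])], table)
```

The jobs of different clients are interleaved in one batch, which is the streaming effect described above; a real server would also have to handle conflicting writes, which this sketch ignores.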
Heterogeneous distributed systems can offer even more opportunities for parallelism at the system level. They can involve not only co-processor architectures but also networks of various types of computers.
6.3.3.2 Expansion of Parallelism Decomposition to Hardware Building
As mentioned above, at the high level, computation decomposition techniques are based on the data-dependent relationships between sub-problems. For parallelism in hardware, the operation-dependent relationships between sub-units are another important factor that influences decomposition: the result of one sub-unit can determine when the next sub-unit starts. In digital circuits, this result may be a signal transition from a high voltage level to a low one, or vice versa, or a predefined number of system clock cycles, rather than a meaningful numerical value.
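An operation-dependent relationship of this kind can be illustrated with a small clock-stepped software model. This is only a sketch, not a hardware description: the two sub-units, the `a_busy` signal, and the cycle counts are hypothetical. Sub-unit A holds `a_busy` high for a predefined number of clocks, and sub-unit B starts on the high-to-low transition of that signal.

```python
def simulate(clocks_for_a=3, clocks_for_b=2, total_clocks=8):
    """Return a per-clock trace of (clock, a_busy, b_running)."""
    a_busy = 1           # A starts busy, so its signal is high
    a_count = 0
    b_running = False
    b_count = 0
    trace = []
    for clock in range(total_clocks):
        prev_a_busy = a_busy
        # Sub-unit A finishes after a predefined number of clocks.
        if a_busy:
            a_count += 1
            if a_count == clocks_for_a:
                a_busy = 0
        # A falling edge on a_busy (high -> low) is what starts sub-unit B.
        if prev_a_busy == 1 and a_busy == 0:
            b_running = True
        elif b_running:
            b_count += 1
            if b_count == clocks_for_b:
                b_running = False
        trace.append((clock, a_busy, b_running))
    return trace

trace = simulate()
```

Note that B never inspects a data value produced by A; it reacts only to the signal transition, which is the distinction from data-dependent decomposition drawn above.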
In general, a parallel hardware platform may suit one type of problem perfectly but not others. Sometimes its speedup is not apparent because the problems processed on the platform do not fit the operator configuration of the system. Therefore, for a given parallel hardware platform, high-level applications must be designed to make the best use of the parallel resources in the system.
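A toy calculation shows how a mismatch between problem and operator configuration erodes speedup. The model is an assumption made purely for illustration: all tasks are independent and take one time unit, and the platform has a fixed number of identical operators, so the tasks run in rounds. The function name `speedup` is hypothetical.

```python
import math

def speedup(num_tasks, num_operators):
    """Speedup over sequential execution for equal, independent unit-time tasks."""
    rounds = math.ceil(num_tasks / num_operators)  # parallel time in units
    return num_tasks / rounds                      # sequential time / parallel time

fits = speedup(8, 4)      # 8 tasks fill 2 rounds exactly
misfit = speedup(9, 4)    # 9 tasks need a third, mostly idle round
small = speedup(5, 4)     # 5 tasks leave 3 operators idle in round 2
```

Under these assumptions, one extra task beyond a multiple of the operator count costs a whole additional round, so four operators give a speedup of 4 on eight tasks but only 3 on nine.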
Hardware builders, on the other hand, usually face the problem of making the existing hardware units work well together under parallel operation in a flexible way, rather than in one fixed model. The problems to be solved are more detailed, physical, and practical.
At the low level of the system, parallelism decomposition must be done physically and exactly, for each bit, each signal, and each system clock cycle, rather than logically and approximately on a group of data over an acceptable period of time. It is detailed, fine-grained, and precise work.
6.3.3.3 Co-processor’s Role in Parallelism Hardware Building
processors while the co-processors handle tasks that require a long execution time (El-Ghazawi et al 2008). Co-processors have designated hardware implementations, which can be fine-grained architectures such as SIMD engines, pipelines, or others. The system can invoke the co-processors to execute specific tasks.
With ESs and FPGA technologies, multiple processors can be expected to exist in a system, each with a designated role in assisting a host processor to fulfil a number of functions. Some processors may act as explicit or less obvious co-processors and carry out their specific tasks automatically, with little intervention from the host processor. While these processors perform their tasks, the host processor does not need to stop its own work to control or communicate with them.
The host processor may transmit a small set of instructions to a co-processor to launch it and then continue with its own main tasks. The co-processor then carries out its pre-designed assignment automatically. For example, there are co-processors designed for Fast Fourier Transforms (FFTs), two-dimensional Discrete Cosine Transforms (2D DCTs), convolution filters, MPEG-4 main profile visual compositing, image processing, or image registration (Berekovic et al 2000, Dubois and Mattavelli 2003, Huang et al 2009, Kalomiros and Lygouras 2007, and MacLean 2005). These co-processors can share a pre-designated memory space in SRAM or DDR SDRAM banks with the host processor. A co-processor may write its processing results to this memory, where the host processor can read them if necessary. Alternatively, the host processor can place data in the memory for a co-processor to use as input.
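The launch-and-continue pattern with a shared memory region can be mimicked in software. The sketch below is an analogy only, not a model of any particular FPGA design: a thread plays the co-processor, a dictionary plays the shared SRAM or SDRAM region, a `threading.Event` plays the launch command, and squaring the input is a placeholder for a real kernel such as an FFT. All names are hypothetical.

```python
import threading

def coprocessor(shared, start, done):
    """Hypothetical co-processor: waits for launch, then works on shared memory."""
    start.wait()                                         # launched by the host
    shared["output"] = [x * x for x in shared["input"]]  # placeholder kernel
    done.set()                                           # results are ready

def host():
    """Host: place input in shared memory, launch, and keep doing its own work."""
    shared = {"input": [1, 2, 3, 4], "output": None}     # shared memory region
    start, done = threading.Event(), threading.Event()
    worker = threading.Thread(target=coprocessor, args=(shared, start, done))
    worker.start()
    start.set()                    # the small launch command
    host_result = sum(range(10))   # host continues with its own main task
    done.wait()                    # host reads the results only when needed
    worker.join()
    return host_result, shared["output"]

host_result, coprocessor_output = host()
```

The host blocks only at `done.wait()`, that is, at the point where it actually needs the co-processor's results; until then the two run side by side.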
Since co-processors carry out their tasks without interrupting the host processor, they work in parallel with it. They do not share the same large task with the host processor; their procedures are entirely different from those of the host processor, and so are their hardware structures. Co-processor systems are therefore one form of heterogeneous parallel architecture.
In this research, the graphics hardware sub-system works independently of the Nios II, the core processor of the FPGA-based ES. It acts as a co-processor in the ES.