Chapter 4: FPGA-based Embedded Hardware System for Graphics Applications

4.2 FPGA Device Evolution and Applications

In recent years, increasing chip density has made it possible to build complex, high-performance embedded systems (ESs) on FPGA devices. FPGAs are now considered an appropriate solution for many applications that must deliver high performance at low cost.

4.2.1 FPGA Evolution History

FPGA products first came onto the market in 1986, when Xilinx Inc. developed the first commercial SRAM-based FPGA (Awad 2009). Many manufacturers were active in the FPGA field until the early 2000s, but after several acquisitions and mergers only a few remain. Among the biggest are Altera, Actel, Lattice, Quicklogic and Xilinx. Because competition among these companies is strong, their FPGA products cover a wide range of applications and functional architectures.

In the 1990s, FPGAs were small devices with low computational throughput, a simple internal structure, and few components, and they could not meet the needs of complex computation and functional applications (Constantinides and Nicolici 2011, and Qasim et al 2009). With the sustained progress of VLSI technology, FPGA devices have grown to contain multi-million gates and diverse logic components, and architectural innovations have further improved device density. Hardware units dedicated to specific operations, such as multipliers and embedded memory blocks, have gradually been integrated into FPGA devices. In the newest generation of FPGA devices, complex blocks such as multipliers, microprocessors, embedded memory, and fast routing matrices can all be integrated on one silicon die.

4.2.2 FPGA Features and Reprogrammable Technologies

technologies and each company has several series of FPGA devices. For example, Altera has the Cyclone (low-cost), Arria (midrange), and Stratix (high-end) series. The range of FPGAs is therefore wide and varied: vendors provide different solutions for their devices, but the basic idea is the same.

4.2.2.1 FPGA Features

An FPGA device consists of a matrix of configurable logic blocks, configurable input/output (I/O) banks and an interconnect network that is reprogrammable to connect logic blocks and I/O blocks according to a design target. The configurations of logic blocks and interconnection network are dependent on memory cells. By changing the contents of memory cells, the FPGA can be made to fulfill the required applications.

The configurable logic blocks can be used to build combinatorial, sequential or mixed circuits. Each block includes several logic elements (LEs), also called logic cells. Each LE contains a four-input lookup table (LUT), which can be configured as a combinatorial function or as a 16 × 1 RAM or ROM, as shown in Figure 4.1. A carry-lookahead data path is also included in order to build efficient arithmetic operators. A D-type flip-flop, with control inputs for synchronous or asynchronous set/reset and enable, allows the output of the LE to be registered. When its registered output is fed back as one of its inputs, an LE can function as a small state machine.
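The LE described above can be sketched in software. The following is a minimal Python model, not a vendor's actual cell: it assumes a four-input LUT whose 16 configuration bits form a truth table, followed by a D-type flip-flop. The carry path and the set/reset and enable inputs are left out, and the names `LogicElement` and `make_truth_table` are hypothetical.

```python
class LogicElement:
    """Toy model of one LE: a 4-input LUT followed by a D-type flip-flop."""

    def __init__(self, truth_table):
        # The 16 configuration bits play the role of the memory cells:
        # bit i is the LUT output for the input pattern whose value is i.
        if not 0 <= truth_table < (1 << 16):
            raise ValueError("a 4-input LUT holds exactly 16 bits")
        self.truth_table = truth_table
        self.q = 0  # registered output of the flip-flop

    def combinatorial(self, a, b, c, d):
        """Look up the output for inputs a (least significant) to d."""
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.truth_table >> index) & 1

    def clock(self, a, b, c, d):
        """Clock edge: capture the LUT output in the flip-flop."""
        self.q = self.combinatorial(a, b, c, d)
        return self.q


def make_truth_table(func):
    """Build the 16-bit LUT contents from a Python function of four bits."""
    table = 0
    for index in range(16):
        bits = [(index >> k) & 1 for k in range(4)]
        table |= (func(*bits) & 1) << index
    return table


# Same structure, two configurations: a 4-input AND and a 4-input XOR.
and4 = LogicElement(make_truth_table(lambda a, b, c, d: a & b & c & d))
xor4 = LogicElement(make_truth_table(lambda a, b, c, d: a ^ b ^ c ^ d))

# Feeding the registered output back into input a, with the LUT set to
# "a XOR b", gives a toggle flip-flop: a two-state machine driven by b.
toggle = LogicElement(make_truth_table(lambda a, b, c, d: a ^ b))
states = [toggle.clock(toggle.q, 1, 0, 0) for _ in range(4)]
```

Only the stored truth table differs between `and4`, `xor4` and `toggle`, which illustrates how changing the contents of the memory cells changes the function of the same logic element.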

Figure 4.1 LE’s Composition

The configurable I/O blocks can have different I/O elements. Each I/O block may contain a bidirectional I/O buffer, one input register, two output registers, and two output-enable registers. Each I/O element can be configured as an input, output, or bidirectional data path.

I/O pins support single-ended and differential I/O standards. Single-ended signalling uses only one signal line, whose voltage is referenced to ground. The signal line provides just the forward path, and ground provides the return path for the signal. Differential signalling uses two wires to send two complementary signals, which improves resistance to electromagnetic noise compared with single-ended signalling but occupies one more pin.
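The noise advantage of differential signalling can be illustrated with a small numeric sketch. It assumes idealised receivers and noise that couples equally onto both wires of the pair (common-mode noise); the voltage levels, threshold and noise samples are illustrative values, not taken from any I/O standard.

```python
def single_ended_receive(levels, noise, threshold=0.5):
    """Compare one noisy line against a ground-referenced threshold."""
    return [1 if (v + n) > threshold else 0 for v, n in zip(levels, noise)]


def differential_receive(levels, noise, swing=1.0):
    """Decide each bit from the difference between the two wires."""
    bits = []
    for v, n in zip(levels, noise):
        positive = v + n            # wire carrying the signal
        negative = (swing - v) + n  # wire carrying its complement
        # The common noise term n cancels in the subtraction.
        bits.append(1 if (positive - negative) > 0 else 0)
    return bits


sent = [1, 0, 1, 1, 0]
levels = [float(b) for b in sent]   # 1.0 for a one, 0.0 for a zero
noise = [0.0, 0.7, -0.6, 0.2, 0.9]  # assumed common-mode noise samples

single = single_ended_receive(levels, noise)
differential = differential_receive(levels, noise)
```

With these samples the single-ended receiver misreads three of the five bits, while the differential receiver recovers all of them, at the cost of the second wire and pin.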

The programmable interconnect network consists of switch matrices and paths, as shown in Figure 4.2. Each element of a switch matrix can be programmed to connect a row path to a column path, so that an output of one LE is linked to an input of another LE. A good layout of the interconnect network in an FPGA design can decrease area, delay and power consumption. Figure 4.3 shows a generic structure of an FPGA device.
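A switch matrix of this kind can be modelled as a set of programmed crosspoints. The sketch below is a simplified assumption rather than a real routing architecture: row paths carry LE outputs, column paths feed LE inputs, and each column may be driven by at most one row. The class name `SwitchMatrix` and its methods are hypothetical.

```python
class SwitchMatrix:
    """Toy crossbar: a programmed crosspoint joins a row path to a column path."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.crosspoints = set()  # programmed (row, col) pairs

    def program(self, row, col):
        """Close the switch at (row, col), rejecting conflicting drivers."""
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError("crosspoint outside the matrix")
        if any(c == col and r != row for r, c in self.crosspoints):
            raise ValueError("column already driven by another row")
        self.crosspoints.add((row, col))

    def propagate(self, row_values):
        """Return the value seen on each column (None if unconnected)."""
        col_values = [None] * self.cols
        for row, col in self.crosspoints:
            col_values[col] = row_values[row]
        return col_values


# Two LE outputs on the rows, three LE inputs on the columns.
matrix = SwitchMatrix(rows=2, cols=3)
matrix.program(0, 2)  # output of LE 0 drives input column 2
matrix.program(1, 0)  # output of LE 1 drives input column 0
routed = matrix.propagate([1, 0])
```

Reprogramming the interconnect corresponds to changing the contents of `crosspoints`; the paths themselves stay fixed.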

Figure 4.2 Programmable Interconnect Network

4.2.2.2 FPGA Reprogrammable Technologies

Several configuration technologies exist, such as Flash, EPROM, SRAM and antifuse. The EPROM and SRAM technologies work in the same way as the EPROM and SRAM memories commonly used in a microprocessor system.

Figure 4.3 Generic Structure of an FPGA

SRAM is by far the most widespread technology in the FPGA field. An SRAM-based FPGA stores its LE configuration data in static memory. Since SRAM is volatile and cannot keep data without a power source, an SRAM-based FPGA reads its configuration data from an external Flash memory chip, which is called master mode. It can also be configured by an external processor via a boundary-scan (JTAG, Joint Test Action Group) interface, which is called slave mode.

A Flash-based FPGA uses flash memory as its primary resource for configuration storage. The advantages of Flash-based FPGAs are lower power consumption and greater tolerance to radiation effects. When the power is off, the flash memory preserves the configuration of the FPGA. Flash-based FPGAs are therefore well suited to applications in the space and aircraft industries.

An antifuse-based FPGA adopts antifuse technology. An antifuse does not conduct current initially, but can be fused to conduct current. Once an antifuse-based FPGA is programmed, the process cannot be reversed. Unlike the previous two technologies, which allow an FPGA to be programmed many times, an antifuse-based FPGA can be programmed only once.
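The properties of the three technologies discussed above can be summarised in a small lookup structure. This is only a sketch of the trade-offs stated in the text (EPROM is omitted because it is not described in detail); the names `ConfigTechnology`, `needs_boot_source` and `select` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigTechnology:
    name: str
    volatile: bool        # loses its configuration when power is removed
    reprogrammable: bool  # can be programmed more than once


TECHNOLOGIES = [
    ConfigTechnology("SRAM", volatile=True, reprogrammable=True),
    ConfigTechnology("Flash", volatile=False, reprogrammable=True),
    ConfigTechnology("antifuse", volatile=False, reprogrammable=False),
]


def needs_boot_source(tech):
    """A volatile device must reload its configuration after power-up."""
    return tech.volatile


def select(field_updates, keeps_config_without_power):
    """Return the names of technologies that satisfy both requirements."""
    return [
        t.name
        for t in TECHNOLOGIES
        if (t.reprogrammable or not field_updates)
        and (not t.volatile or not keeps_config_without_power)
    ]


boot_dependent = [t.name for t in TECHNOLOGIES if needs_boot_source(t)]
updatable_and_nonvolatile = select(field_updates=True,
                                   keeps_config_without_power=True)
```

Here `boot_dependent` contains only SRAM, matching the need for an external configuration source, and `updatable_and_nonvolatile` contains only Flash.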

The manufacturers listed above share the FPGA product market with their different programming technologies. Altera, Lattice, and Xilinx mainly use SRAM-based FPGAs; Altera, Lattice, and Xilinx also offer Flash-based FPGAs; Actel and Quicklogic adopt antifuse-based technology.

4.2.3 Architecture Diversity in FPGA-based Systems

The following are some of the current architectures of FPGA-based systems.

4.2.3.1 Stand-alone FPGA-based Systems

Stand-alone FPGA-based systems can be ESs, which are often employed in consumer electronics, industrial control systems, and portable applications. They have the typical ES structure described in Chapter 3. Once the system power is switched on, they can operate by themselves or interact with their environment, including the user. With a real-time operating system, they can also handle several tasks concurrently. These systems are complete and independent.

4.2.3.2 General-Purpose Computer Systems with FPGA Supports

General-purpose computer systems with FPGA support have either a tightly-connected or a loosely-coupled co-processor architecture.

In the co-processor architecture, the general-purpose computer is the host CPU, whereas the FPGAs can have their own processors that carry out specific tasks without the host's intervention. The tightly-connected co-processor model has a board-level connection between the host and the FPGAs, which gives a fast communication rate between the host and the co-processors. The loosely-coupled co-processor model allows direct communication between the host and the FPGAs over a fast interconnection, such as a point-to-point network.

Since co-processors in FPGAs work simultaneously with the host CPU, this architecture can provide more parallelism than a general-purpose computer alone. It can also lead to heavy traffic on the serial bus when data are transmitted between the co-processors and the CPU. As long as the serial bus is not overloaded, this architecture will be computationally more effective than the model of a CPU with simple I/O peripherals.
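The trade-off between co-processor speed and bus traffic can be made concrete with a rough estimate. The model below is an assumption introduced for illustration: it treats an offloaded task as FPGA compute time plus transfer time over the bus, with no overlap between the two, and all numbers are hypothetical.

```python
def offload_time(fpga_compute_s, bytes_moved, bus_bytes_per_s):
    """Time for an offloaded task: FPGA compute plus bus transfer.

    Assumes computation and transfer do not overlap.
    """
    return fpga_compute_s + bytes_moved / bus_bytes_per_s


def offload_pays_off(cpu_compute_s, fpga_compute_s, bytes_moved, bus_bytes_per_s):
    """True when offloading beats running the task on the host CPU."""
    return offload_time(fpga_compute_s, bytes_moved, bus_bytes_per_s) < cpu_compute_s


# Hypothetical task: 10 ms on the CPU, 1 ms on the FPGA, 1 MB of data moved.
fast_bus = offload_pays_off(0.010, 0.001, 1_000_000, 500e6)  # 2 ms transfer
slow_bus = offload_pays_off(0.010, 0.001, 1_000_000, 50e6)   # 20 ms transfer
```

With the faster bus the offload takes about 3 ms and is worthwhile; with the slower bus the transfer alone takes longer than the CPU would, so the co-processor no longer helps.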

4.2.3.3 Reconfigurable Computing Systems

Reconfigurable computing (RC) systems have become increasingly attractive over the last two decades (El-Ghazawi et al 2008, Green and Edwards 2000, and Huang et al 2009). A reconfigurable computing architecture can provide parallel processing at the instruction or task level. A central microprocessor is connected to several FPGA-based boards, and the architecture allows scalable connections between the parallel systems on these boards to be set up dynamically. Such systems offer more flexibility in terms of system layout, but they need more complicated design techniques to produce a good target design and implementation.

In document A novel parallel algorithm for surface editing and its FPGA implementation (Page 80-84)