5. Hardware platform
5.2. Virtex-II Pro FPGA
The FPGA on the XUPV2P board is a Virtex-II Pro (xc2vp30ff896-7), see [123]. Its central building blocks are 13,696 slices, which make up the reconfigurable logic. Furthermore, the FPGA features on-chip memory arranged as block RAM, dedicated word multipliers, 3-state buffers, Digital Clock Managers (DCM), which make it easy to use different clock domains, and fast I/O ports called RocketIO with a speed of up to 3.125 Gbit/s.
Not all of these basic components present in the FPGA were utilized in the prototype implementation. Therefore, only the required components are described in more detail below. On a higher abstraction level the basic components are utilized to form the elements of a SoC architecture, which are explained subsequently.
Figure 5.1.: XUP Virtex-II Pro Development System (from [122])
5.2.1. Basic components
The basic components are those components that are native to the FPGA, i.e., they are available independently of the concrete application. Here, only those components that are utilized in the prototype implementation are explained.
Slice This is perhaps the most important building block of the reconfigurable logic. Each slice consists mainly of two function generators realized as look-up tables (LUT) and two storage elements, which may be used in different ways. Furthermore, it contains several 2:1 multiplexers (MUX) and special arithmetical logic.
Besides their basic application of realizing Boolean functions, the LUTs may also be used as distributed RAM and shift registers. If configured as distributed memory, 16 bit of storage with a data bus width of 1 bit can be stored in 1 or 2 LUTs, depending on whether the memory should be single- or dual-ported, respectively. However, note that distributed memory does not allow two truly independent ports, as only one of them may be used for write access. Similarly, each LUT may be configured as a shift register with a length of 16 bit and a width of 1 bit. The multiplexers may be used to interconnect the LUTs into larger function generators or to form larger multiplexers. The arithmetical logic allows the realization of a 2-bit full adder in one slice³. As each slice also contains special ports for the creation of fast carry chains, several slices may be configured to compose fast and wide adders.
Finally, the reconfigurable building block on the next abstraction level is the configurable logic block (CLB). Each CLB consists of four slices together with two 3-state buffers, see below.
3-state buffer For the realization of on-chip buses, the FPGA contains 3-state buffers with a width of 1 bit. Each 3-state buffer may be controlled independently, because each features its own 3-state control port and its own input port. As described above, a CLB contains two 3-state buffers together with four slices.
Block RAM The Virtex-II Pro features 2,448 Kbit of block RAM (BRAM) arranged in 136 blocks, each with a size of 18 Kbit. Each of these blocks may be accessed via two truly independent ports. The access ports can be configured to exhibit different data word sizes ranging from 1 to 36 bit.
Dedicated multiplier The FPGA features 136 dedicated word multipliers. Each multiplier has two inputs with a width of 18 bit each and an output with a width of 36 bit. Thus, it may compute the product of two signed 18-bit numbers or of two unsigned numbers with a width of up to 17 bit each. If both inputs and the output are connected to registers, the multipliers can be used in a pipelined version, computing one multiplication in each clock cycle.
PowerPC core The Virtex-II Pro FPGA contains two embedded PowerPC PPC405 cores. Especially for SoC realizations, this is very advantageous, as it provides a fast general purpose processor, which does not take up resources of the reconfigurable components.
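The figures given for the block RAM and the dedicated multipliers above can be cross-checked with a small model. The following Python sketch is only an illustration and not part of the prototype: the names `total_bram_kbit`, `to_signed` and `mult18x18` are hypothetical, and the port width/depth table is an assumption based on the usual Virtex-II block RAM aspect ratios (the widths 9, 18 and 36 are assumed to include parity bits).

```python
# Figures taken from the text above.
BRAM_BLOCKS = 136
BRAM_KBIT_PER_BLOCK = 18

# ASSUMPTION: usual aspect ratios of one block RAM port (width -> depth).
ASSUMED_PORT_DEPTH = {1: 16384, 2: 8192, 4: 4096, 9: 2048, 18: 1024, 36: 512}


def total_bram_kbit(blocks=BRAM_BLOCKS, kbit_per_block=BRAM_KBIT_PER_BLOCK):
    """Total block RAM capacity in Kbit."""
    return blocks * kbit_per_block


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mult18x18(a_bits, b_bits):
    """Model of a signed 18 x 18 bit multiplier.

    Both inputs are raw 18-bit patterns; the result is the raw 36-bit
    output pattern in two's complement.
    """
    product = to_signed(a_bits, 18) * to_signed(b_bits, 18)
    return product & ((1 << 36) - 1)


if __name__ == "__main__":
    print(total_bram_kbit())  # capacity of all 136 blocks in Kbit
    # Unsigned 17-bit operands have bit 17 cleared, so they are read as
    # non-negative signed values and multiply as expected.
    print(to_signed(mult18x18(0x1FFFF, 0x1FFFF), 36) == 0x1FFFF * 0x1FFFF)
    # A full 18-bit pattern is read as a negative number instead.
    print(to_signed(mult18x18(0x3FFFF, 0x3FFFF), 36))
```

The last call shows why unsigned operands are limited to 17 bit: the pattern `0x3FFFF` is read as -1 by the signed multiplier, so the model returns 1 rather than the square of the unsigned value.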
Because of the dedicated word multipliers and the BRAM, which also allows memory access on a word-basis, the prototype implementation is realized on a word-basis. This allows the calculation of numbers with several bits in one clock cycle. Furthermore, the arithmetical logic in the slices together with the carry-chains also allows the creation of word adders.
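How LUTs and carry chains combine into a word adder can be sketched with a simplified behavioural model. The Python code below is an illustration under stated assumptions, not the circuit used in the prototype: a LUT is modelled as a 16-entry truth table addressed by four inputs, and each adder bit uses the LUT for the propagate signal, a carry multiplexer and a dedicated XOR gate. The names `lut4`, `XOR2_INIT`, `adder_bit`, `word_add` and `slices_needed` are hypothetical.

```python
def lut4(init, a, b, c=0, d=0):
    """Read a 16-entry truth table; the four inputs form the address.

    The same lookup also models a 16 x 1 bit distributed RAM read.
    """
    address = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> address) & 1


# Truth table of a XOR b, independent of the inputs c and d.
XOR2_INIT = 0x6666


def adder_bit(a, b, carry_in):
    """One bit of a carry-chain adder (simplified model)."""
    propagate = lut4(XOR2_INIT, a, b)         # LUT used as function generator
    carry_out = carry_in if propagate else a  # carry multiplexer
    total = propagate ^ carry_in              # dedicated XOR gate
    return total, carry_out


def word_add(x, y, width=32):
    """Add two `width`-bit words bit by bit along the carry chain."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = adder_bit((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


def slices_needed(width, bits_per_slice=2):
    """Slices for a `width`-bit adder, with 2 adder bits per slice as stated above."""
    return -(-width // bits_per_slice)  # ceiling division


if __name__ == "__main__":
    print(word_add(1234, 4321))
    print(slices_needed(32))
```

With two adder bits per slice, a 32-bit word adder occupies 16 slices in this model. The loop only mirrors the structure of the carry chain; in hardware, all bits are evaluated within one clock cycle.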
5.2.2. Derived SoC architecture
The basic components of the FPGA may be used to implement nearly every functionality. For applications where the FPGA plays the role of a fast computational unit mainly executing data flow operations, e.g., utilized as digital signal processor (DSP), it is possible to design the circuits in VHDL and to use the conventional design flow. But as mentioned above, today’s FPGAs are large enough to realize complete (MP)SoCs. For the design of such complex systems, more powerful design tools with components on higher abstraction levels are needed.
³ Note that the arithmetical logic also contains elements to improve the efficiency of multiplier implementations. This is, however, not used in this work, because the dedicated multipliers are utilized instead.
Xilinx also provides such design tools, as described in Section 5.3. With them, it is possible to instantiate an architecture consisting of general purpose processors, their instruction and data storage, and additional pre-configured IP-cores like buses and interfaces. It is even possible to integrate custom hardware cores into this architecture. In the following, the two available processor types and the available bus types are briefly described. For more details and information about other available IP-cores, consult the documentation available at [121].
PowerPC
The Virtex-II Pro FPGA features two PowerPC PPC405 cores. These are powerful general purpose processors also suitable for complex applications requiring a full-blown operating system. The PPC405 implements the 32-bit subset of the 64-bit PowerPC architecture and may be clocked with up to 300 MHz.
MicroBlaze
The second type of general purpose processor available for the design of SoCs is a soft-core processor type called MicroBlaze. It is built from the reconfigurable elements of the FPGA and may be instantiated in customized variants providing different amounts of computing power, e.g., with or without dedicated word multiplication. The MicroBlaze processors support 32 bit operations and may be clocked with up to 100 MHz.
Buses
The Xilinx design environment also provides several bus types allowing the general purpose processors to communicate with the memory, other cores, and/or each other. These are explained in the following.
On-Chip Memory Bus The On-Chip Memory Bus (OCM) may be utilized by a PowerPC core to communicate with its instruction or data memory. Each PowerPC core possesses one port for the Data-Side On-Chip Memory Bus (DSOCM) with a width of 32 bit and one port for the Instruction-Side On-Chip Memory Bus (ISOCM) with a width of 64 bit.
Processor Local Bus The Processor Local Bus (PLB) is a general bus for the PowerPC cores. Each PowerPC possesses one port to connect to a PLB, which may be used to communicate with other IP-cores or memory attached to the PLB. The PLB has a width of 64 bit, which, however, is only fully exploited if the bus is cached. To access components attached to the PLB, the PowerPC uses memory-mapped I/O.
Local Memory Bus The Local Memory Bus (LMB) may be utilized by a MicroBlaze core to communicate with its instruction or data memory. Each MicroBlaze core possesses one port for the Instruction Local Memory Bus (ILMB) and one port for the Data Local Memory Bus (DLMB) with a width of 32 bit each, providing access to instruction and data storage, respectively.
On-chip Peripheral Bus The On-chip Peripheral Bus (OPB) is a general bus for the MicroBlaze cores. Each MicroBlaze core possesses one port to connect to an OPB and, thus, to communicate with other IP-cores or memory attached to the OPB. The OPB has a width of 32 bit. The access of components by the MicroBlaze is realized with memory-mapped I/O. Communication between cores attached to a PLB and cores connected to an OPB is possible using a PLB-OPB bridge.
Fast Simplex Link The Fast Simplex Link (FSL) is a relatively simple bus for communication between a MicroBlaze core and a custom IP-core. Each MicroBlaze possesses up to 8 FSL ports. This is a powerful tool to realize instruction code extensions. The communication is port-mapped and there are specific processor instructions for the access of the FSL. Those instructions allow the MicroBlaze to write and read 32 bit integers to and from an attached core. The communication may be synchronous or asynchronous, facilitated by a FIFO queue.
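The difference between the memory-mapped access used on the PLB and OPB and the port-mapped, FIFO-based access of the FSL can be illustrated with a toy model. The Python sketch below is only a behavioural illustration: the classes `MemoryMappedBus` and `FslLink`, the base address `0x40000000` and the FIFO depth are hypothetical and do not correspond to the Xilinx IP-cores or to the address map of the prototype.

```python
from collections import deque

WORD_MASK = 0xFFFFFFFF  # 32 bit data words


class MemoryMappedBus:
    """Toy bus: a peripheral is selected by the address of the access."""

    def __init__(self):
        self._regions = []  # list of (base, size, register storage)

    def attach(self, base, size):
        """Attach a peripheral occupying [base, base + size)."""
        storage = {}
        self._regions.append((base, size, storage))
        return storage

    def _decode(self, address):
        for base, size, storage in self._regions:
            if base <= address < base + size:
                return storage, address - base
        raise ValueError(f"no peripheral at address {address:#010x}")

    def write(self, address, value):
        storage, offset = self._decode(address)
        storage[offset] = value & WORD_MASK

    def read(self, address):
        storage, offset = self._decode(address)
        return storage.get(offset, 0)


class FslLink:
    """Toy model of one FSL channel: a bounded FIFO of 32 bit words."""

    def __init__(self, depth=16):  # depth is an assumed value
        self._fifo = deque()
        self._depth = depth

    def put(self, word):
        """Non-blocking write; returns False if the FIFO is full."""
        if len(self._fifo) >= self._depth:
            return False
        self._fifo.append(word & WORD_MASK)
        return True

    def get(self):
        """Non-blocking read; returns None if the FIFO is empty."""
        return self._fifo.popleft() if self._fifo else None


if __name__ == "__main__":
    bus = MemoryMappedBus()
    bus.attach(0x40000000, 0x100)    # hypothetical custom core
    bus.write(0x40000004, 0xCAFE)    # ordinary store to a mapped address
    print(hex(bus.read(0x40000004)))

    link = FslLink(depth=2)          # hypothetical link to a custom core
    link.put(7)
    print(link.get())
```

In the memory-mapped case, the attached core is reached through ordinary load and store operations on an address range, whereas the FSL model needs no address at all: the processor names a port, and the FIFO decouples the producer from the consumer.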