2.2 SRAM Based FPGAs
2.2.3 Zynq-7000 All Programmable SoC
Zynq-7000 All Programmable SoC [6] is a remarkable example of the newest devices based on the SoC concept. It integrates an ARM-based processor system with the reconfigurable fabric logic of the Xilinx 7 series FPGAs. Two Zynq models are available: the single-core Zynq-7000S and the dual-core Zynq-7000.
The main difference between the Zynq architecture and previous FPGAs with embedded hard processors (e.g. the PowerPC) is that, while previous devices were FPGA-centric, Zynq is a processor-centric platform. Moreover, thanks to the large number of connections (over 10,000) between the Processing System (PS) and the FPGA fabric, the device offers a wide-bandwidth link between the Processing System and the Programmable Logic (PL).
Hence, less external infrastructure is required. Zynq also includes several other features, such as two 12-bit analog-to-digital converters, three Phase-Locked Loops (PLLs), two JTAG debug ports, an integrated block for PCI Express designs and low-power serial transceivers. In this way, the Zynq SoC makes it possible to exploit the programmable logic as an auxiliary resource that can increase the performance of deployed applications. Zynq also offers different strategies to reduce power consumption: shutting down the PL, dynamically reducing the clock speed of the PS, or placing the ARM processor in standby.
As shown in Figure 2.15, extracted from [6], Zynq is divided into two main regions (PS and PL) with separate power domains. Because the two regions are powered independently, the PL can be selectively powered down when required. Since the PL uses the same FPGA technology as the Xilinx 7 series devices, it includes all of their previously presented logic resources (CLBs, BRAMs, DSP48E1, etc.).
On the other hand, the PS can run stand-alone programs and operating systems such as Linux, and manages both the boot and configuration processes. The PS can be divided into four main parts:
• Application Processor Unit (APU). It consists of a single- or dual-core ARM Cortex-A9 MPCore that boots immediately at power-up and can run several operating systems independently of the programmable logic. It can operate in single-processor mode or in asymmetric or symmetric dual-processor modes, and supports single- and double-precision floating-point operations.
• I/O Peripherals. They support several industry-standard interfaces for external communication, including General Purpose Inputs/Outputs (GPIOs), two Gigabit Ethernet controllers, two SD/SDIO controllers, USB controllers, two SPI controllers, two CAN controllers, two UART controllers, two I2C controllers and PS MIO I/Os. Up to 54 GPIO signals are available for device pins routed through the Multiplexed I/O (MIO), and 192 GPIO signals connect the PS and the PL via the Extended MIO (EMIO).
• Memory interfaces. They include static and dynamic memory interface controllers. While the static memory controller supports NAND and Quad-SPI flash, parallel NOR flash and parallel data bus interfaces, the dynamic memory controller supports DDR2, LPDDR2, DDR3 and DDR3L memories.
• Interconnection elements. These elements provide communication between the PL, the APU, the memory interfaces and the I/O peripherals via a non-blocking multilayered ARM AMBA AXI interconnect, which supports simultaneous master-slave operations.
Figure 2.15: Diagram of the functional blocks that constitute the Zynq-7000.
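The GPIO counts given for the I/O peripherals can be related to the pin numbering of the PS GPIO controller. The sketch below assumes the numbering used by Xilinx's standalone GPIO driver (MIO pins 0 to 53 in banks 0 and 1, EMIO pins 54 to 117 in banks 2 and 3) and assumes that the 192 EMIO signals correspond to 64 EMIO pins, each with an input, an output and an output-enable line. Both are assumptions to be checked against the Zynq-7000 Technical Reference Manual.

```python
# Sketch of the PS GPIO pin numbering, assuming the layout used by Xilinx's
# standalone GPIO driver: banks 0-1 serve the MIO, banks 2-3 the EMIO.
MIO_PINS = 54              # GPIO pins routed to device pins via the MIO
EMIO_PINS = 64             # GPIO pins routed to the PL via the EMIO (assumption)
SIGNALS_PER_EMIO_PIN = 3   # input, output and output enable (assumption)

# (first pin, last pin, interface) for each GPIO bank
BANKS = [
    (0, 31, "MIO"),
    (32, 53, "MIO"),
    (54, 85, "EMIO"),
    (86, 117, "EMIO"),
]


def gpio_bank(pin: int) -> tuple[int, str]:
    """Return (bank index, interface) for a PS GPIO pin number."""
    for bank, (first, last, interface) in enumerate(BANKS):
        if first <= pin <= last:
            return bank, interface
    raise ValueError(f"pin {pin} is outside 0..{MIO_PINS + EMIO_PINS - 1}")


def emio_signal_count() -> int:
    """Number of PS-PL GPIO signals carried by the EMIO."""
    return EMIO_PINS * SIGNALS_PER_EMIO_PIN
```

For example, `gpio_bank(60)` returns `(2, "EMIO")`, meaning that pin reaches the PL rather than a package pin.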
Zynq Reset
The system reset process is a sequence that initializes the system and executes the First Stage Boot Loader (FSBL) from the selected boot memory. This process gives the user the possibility to customize the PS and PL. The Zynq supports several reset types within the PS: for instance, a peripheral reset, which resets a subsystem under software control, or a power-on reset, which resets the complete system. These reset sources make up the system reset, which drives reset signals to each module and system. Several alternatives are available to initiate a reset:
• A hardware reset driven by the system reset signal (PS_SRST_B) and the power-on reset signal (PS_POR_B).
• A software reset, able to generate either a system-level or a sub-module reset.
• A reset generated by the JTAG controller, which can reset either the entire system or a debug portion of the PS.
• A reset generated by any of the three available watchdog timers.
Xilinx also provides the LogiCORE IP Processor System Reset Module, which can reset the complete PS, including the processor and peripherals. This core allows several parameters to be customized, enabling or disabling different features to adapt it to the user's specifications.
Zynq Boot
The boot of Zynq-7000 devices is a two-stage process managed by the PS. It allows choosing between non-secure boot and secure boot (with JTAG disabled), which supports 256-bit AES, 256-bit SHA and 2048-bit public key decryption/authentication. During the boot process, the ARM core (or one of the two cores) reads the boot program from the on-chip ROM, executes it, and copies the FSBL code from one of the flash memories (or downloads it through JTAG) into the on-chip memory. The FSBL can be entirely controlled by the user, enabling customization of the boot code. Once loaded, the FSBL is executed by the ARM, which provides the possibility of loading the bitstream to configure the PL.
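The two-stage boot flow can be summarized as an ordered list of steps. The sketch below is only a model of the sequence described above: the `BootConfig` fields, the step wording and the `"qspi"`/`"jtag"` source labels are hypothetical, and no Xilinx API is implied. It encodes one consequence stated in the text: since secure boot disables JTAG, the FSBL cannot be downloaded through JTAG in that mode.

```python
# Minimal model of the two-stage Zynq-7000 boot flow described above.
# The step descriptions and BootConfig fields are illustrative, not a Xilinx API.
from dataclasses import dataclass


@dataclass
class BootConfig:
    secure: bool = False          # secure boot: JTAG disabled, FSBL decrypted/authenticated
    fsbl_source: str = "qspi"     # boot device holding the FSBL, or "jtag" (hypothetical labels)
    load_bitstream: bool = True   # whether the FSBL configures the PL


def boot_sequence(cfg: BootConfig) -> list[str]:
    """Return the ordered boot steps for a given configuration."""
    if cfg.secure and cfg.fsbl_source == "jtag":
        # Secure boot disables JTAG, so the FSBL cannot be downloaded through it.
        raise ValueError("secure boot cannot fetch the FSBL through JTAG")
    steps = ["stage 0: an ARM core executes the boot program in on-chip ROM"]
    if cfg.secure:
        steps.append("stage 0: decrypt and authenticate the FSBL image")
    steps.append(f"stage 0: place the FSBL from {cfg.fsbl_source} in on-chip memory")
    steps.append("stage 1: the ARM core executes the user-customizable FSBL")
    if cfg.load_bitstream:
        steps.append("stage 1: the FSBL loads the bitstream that configures the PL")
    return steps
```

Calling `boot_sequence(BootConfig(secure=True))` adds the authentication step before the FSBL is placed in on-chip memory, while `BootConfig(load_bitstream=False)` models a boot that leaves the PL unconfigured.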
Reconfiguration on the Zynq
One of the most remarkable features of 7 series FPGAs, like most SRAM-based FPGAs, is that they can be reconfigured dynamically (while the system is running) [108]. The configuration and reconfiguration processes are carried out by loading specific configuration files, named configuration bitstreams or .BIT files.
The reconfiguration process can be performed by external systems (e.g. a microprocessor or a PC), or the FPGA can load bitstreams itself from an external non-volatile memory module.
Xilinx provides two datapaths, connected to dedicated configuration pins, to configure its 7 series FPGAs: the serial datapath, which requires few pins, and the parallel datapath (8-bit, 16-bit or 32-bit), which provides higher performance and access to several standard interfaces. The pins of these datapaths serve as an interface for different configuration modes: master and slave serial configuration, master and slave SelectMAP parallel configuration, JTAG/boundary-scan configuration, and master Serial Peripheral Interface (SPI) and Byte Peripheral Interface (BPI) flash configuration. Moreover, the Zynq can reconfigure itself, either from the programmable logic or from the processing system through the device configuration interface (DevC). This interface is supported by a dedicated DMA controller capable of transferring bitstreams from an external memory through the Processor Configuration Access Port (PCAP).
The 7 series FPGAs also have an Internal Configuration Access Port (ICAPE2) [3], which gives the user logic access to the 7 series FPGA configuration interface. In [109], both existing internal configuration interfaces, PCAP and ICAPE2, have been extensively evaluated in order to select the most convenient alternative. An additional architecture has been implemented to dynamically reconfigure the FPGA without the Processing System at the maximum bandwidth of 400 MB/s.
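The theoretical throughput of a configuration port is its word width multiplied by its clock frequency, which is why the parallel datapath outperforms the serial one. The sketch below assumes one word is transferred per clock cycle and ignores protocol overheads; under that assumption a 32-bit port at 100 MHz gives the 400 MB/s figure quoted above. Applying the same 100 MHz clock to every width in the loop is an assumption made only for comparison, not the rated frequency of each configuration mode.

```python
def config_throughput_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical throughput in MB/s (1 MB = 10**6 bytes) of a configuration
    port that transfers one width_bits-bit word per clock cycle."""
    if width_bits not in (1, 8, 16, 32):
        raise ValueError("datapath width must be 1 (serial), 8, 16 or 32 bits")
    bytes_per_cycle = width_bits / 8
    # bytes per cycle x millions of cycles per second = MB/s
    return bytes_per_cycle * clock_mhz


for width in (1, 8, 16, 32):
    print(f"{width:2d}-bit @ 100 MHz: {config_throughput_mb_s(width, 100):6.1f} MB/s")
```

The loop makes the scaling explicit: each doubling of the datapath width doubles the theoretical throughput at a fixed clock.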
Besides reprogramming the entire device, most SRAM-based FPGAs support partial reconfiguration, which allows a specific region to be reconfigured while the remaining regions of the FPGA continue running. This feature makes it possible to reduce both cost and area usage. Moreover, since a partial bitstream contains only the information of the target reconfigurable region, it is much smaller than a complete bitstream. Hence, partial reconfiguration reduces both the bitstream storage requirements and the reconfiguration time.
Designing partially reconfigurable implementations is analogous to designing several complete designs that share certain resources. The first step is to designate a reconfigurable region, defining both a proper placement and an adequate size for it. Proper floorplanning requires considering several aspects, such as the resources demanded by the different designs to be loaded, and the size and placement of the static region and of the other reconfigurable regions. Many 7 series FPGA resources, such as CLBs, BRAMs, DSPs and routing elements, are available for reconfiguration purposes. Nevertheless, several components, such as clocks and clock management blocks (e.g. BUFG, BUFR, MMCM and PLL), I/O and I/O-related components, serial transceivers and some dedicated elements (e.g. BSCAN, STARTUP, ICAP and XADC), do not support reconfiguration. These non-reconfigurable elements must be placed in the static region of the device.
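The placement rule for non-reconfigurable elements can be expressed as a simple check during floorplanning. In the hypothetical sketch below, the resource names are simplified category labels taken from the text, not exact Vivado primitive names (for example, the 7 series MMCM primitive is MMCME2_ADV), and the checker is not part of any Xilinx tool.

```python
# Resource categories that, per the text above, do not support reconfiguration.
# These are simplified labels, not exact Vivado primitive names; the checker
# itself is hypothetical.
NON_RECONFIGURABLE = {
    "BUFG", "BUFR", "MMCM", "PLL",        # clocks and clock management
    "IO", "TRANSCEIVER",                  # I/O and serial transceivers
    "BSCAN", "STARTUP", "ICAP", "XADC",   # dedicated elements
}


def static_only(module_resources: list[str]) -> list[str]:
    """Return the resources of a reconfigurable module that must be moved to
    the static region, sorted and without duplicates."""
    return sorted({r for r in module_resources if r in NON_RECONFIGURABLE})
```

For instance, a module using `["LUT", "BUFG", "BRAM", "XADC"]` would be flagged for its `BUFG` and `XADC`, which would have to be moved to the static region and connected to the module through its boundary.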
The usual way to configure an FPGA for partial reconfiguration is to first load a complete bitstream that contains one of the partially reconfigurable designs. After the entire device has been configured, partial bitstreams can be loaded into the partially reconfigurable regions while the rest of the FPGA continues operating uninterrupted. After reconfiguration, it is advisable to initialize the reconfigurable modules to ensure a predictable starting state. Another recommended practice is to disconnect the reconfigurable and static regions during the reconfiguration process by means of decoupling logic.
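The recommended practices above (decoupling during reconfiguration and initializing the new module afterwards) can be sketched as a fixed sequence of operations. The three callbacks are hypothetical hooks standing in for platform-specific code, such as a decoupler in the static region and a PCAP or ICAP driver. Resetting the module before releasing the decoupler is a design choice of this sketch, so that no transient state of the new module reaches the static logic.

```python
from typing import Callable


def reconfigure_region(
    partial_bitstream: bytes,
    set_decouple: Callable[[bool], None],
    load_partial: Callable[[bytes], None],
    reset_module: Callable[[], None],
) -> None:
    """Sketch of the partial reconfiguration sequence recommended above.

    The three callbacks are hypothetical hooks onto platform code, for example
    a decoupler in the static region and a PCAP or ICAP driver.
    """
    set_decouple(True)               # isolate the region from the static logic
    load_partial(partial_bitstream)  # only this region changes; the rest keeps running
    reset_module()                   # start the new module from a known state
    set_decouple(False)              # reconnect only after the reset
    # If load_partial raises, the region is intentionally left decoupled.
```

Passing callbacks that simply record their calls is a convenient way to check the ordering without hardware.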
Partial reconfiguration can also be used to reduce power consumption by disabling certain regions. In [110], partial reconfiguration is used to replace circuits with power-saving circuits during idle periods. Similarly, [111] proposes using empty bitstreams to blank partially reconfigurable regions.
Figure 2.16 illustrates an outstanding benefit of partial reconfiguration: the possibility of adapting the design in the field by loading different circuits into the reconfigurable region. This improves fault tolerance, accelerates configurable computing and provides real-time flexibility, making it possible to develop new design-security techniques.
Figure 2.16: Replacing reconfigurable modules with the dynamic partial reconfiguration.
One of the most significant drawbacks of the reconfiguration process is its duration, which is limited by the maximum frequencies of the configuration interfaces.
For instance, the PCAP's maximum frequency is 100 MHz. However, although its theoretical reconfiguration speed is 400 MB/s (32 bits per cycle at 100 MHz), the real speed is lower because of the internal ARM interconnect architecture [112]. In addition, the duration of the reconfiguration process depends on the size of the bitstream to be configured; hence, partial bitstreams take less time to load than complete ones. Typical durations range from microseconds to milliseconds [113]. A possible solution to reduce the reconfiguration time is to compress the bitstream; however, this requires preprocessing the bitstream and limits its usability. In [114], an alternative reconfiguration controller called ZyCAP was presented, which improves the reconfiguration throughput on Zynq compared to standard methods. This controller allows overlapped execution, enhancing system performance. ZyCAP can be used with soft processors, but driver software modifications are required. In [112], an adaptive partially reconfigurable system that maximizes output performance is proposed, reducing the reconfiguration time to 12% of the full configuration time.
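Because the reconfiguration time scales with bitstream size, a first-order estimate divides the size by the effective port throughput. In the sketch below, the default of 400 MB/s is the theoretical PCAP figure given above, and `efficiency` is an assumed factor for losses such as interconnect overhead; no value for it is taken from the cited works, and the bitstream sizes in the usage note are hypothetical.

```python
def reconfig_time_ms(bitstream_bytes: int,
                     throughput_mb_s: float = 400.0,
                     efficiency: float = 1.0) -> float:
    """First-order estimate of the time (ms) needed to load a bitstream.

    throughput_mb_s is the theoretical port throughput (1 MB = 10**6 bytes);
    efficiency in (0, 1] is an assumed factor for losses such as interconnect
    overhead.
    """
    if bitstream_bytes < 0:
        raise ValueError("bitstream size must be non-negative")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bytes_per_ms = throughput_mb_s * 1e6 / 1e3 * efficiency
    return bitstream_bytes / bytes_per_ms
```

For a hypothetical 4,000,000-byte full bitstream, `reconfig_time_ms(4_000_000)` gives 10 ms at the theoretical 400 MB/s, while a hypothetical partial bitstream of 480,000 bytes (12% of that size) gives 1.2 ms. Lowering `efficiency` models the gap between the theoretical and the achieved throughput.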