The first step of the design validation has been to simulate all elements of the PICDiY (registers, IDC, memories, etc.). Then, the entire processor design has been checked by simulating all the instructions, which have subsequently been validated in a physical implementation by utilizing the ILA analyser. Finally, the processor has been validated for distinct applications implemented in ZedBoard, including the TestApp.
After the validation process the PICDiY has been compared with other 8-bit soft-core processors in order to evaluate it. Two processors have been chosen for this task: a widely used soft-core processor and a PIC based one.
The first selected soft-core processor has been the PicoBlaze, since it is a prevailing IP for Xilinx based designs [58, 67, 69, 70]. Being optimized for 7-series devices, it presents a very reduced size (as small as 26 slices, depending on the device family) and high performance (up to 240 MHz). In addition, it is well suited to implementing non-time-critical state machines and presents a predictable, fast interrupt response. All these characteristics make the PicoBlaze an ideal reference for a comparison.
Regarding the second soft-core processor, different IP cores based on the PIC16's architecture are available (CQPIC, PPX16mcu, RISC5X, RISC16F84, MINI-RISC, Synthetic PIC, UMASS, etc.). All of these cores differ slightly in their architectures, but they all implement the main features of a PIC16 microcontroller. The selected PIC16 based soft-core processor has been the PPX16mcu [331]. The main reason to select the PPX16mcu has been that it is the closest alternative to the PICDiY, since PPX16mcu is a minimalist open source version that has neither a Watchdog nor an EEPROM. PPX16mcu is a single-cycle, four-times-faster VHDL implementation of the 16F84.
For the comparison tests, the three soft-core processors have been implemented with the same timing constraint (100 MHz) and without including the logic resources necessary to implement the TestApp. Additional implementations have also been performed in order to determine the maximum frequency achievable with each soft-core processor. Implementation results (at 100 MHz) are shown in Table 5.1. As can be observed, PICDiY presents a significantly lower device utilization than PPX16mcu, while PicoBlaze is the one with the lowest overhead. As this table shows, those results are closely related to the dynamic power consumption of each soft-core. Due to the limited resource usage of the analysed soft-cores, there is only a slight difference between them. In any case, these results prove that the power consumption of the three soft-cores is moderate. Regarding the maximum achievable frequencies, the tests have shown that PicoBlaze provides the best result (150 MHz). On the other hand, the maximum frequencies for PICDiY and PPX16mcu are similar (107 MHz and 110 MHz, respectively).
Furthermore, Table 5.2 shows a detailed list of the primitives utilized by each implementation. Apart from the resource overhead information given by this table, it is interesting to note that PICDiY uses distributed memory resources (RAMS64E and RAMS32 primitives) instead of BRAM blocks to implement the data memory module. This is because Vivado changes the initial BRAM based implementation defined by the VHDL design in order to meet the timing constraints. Additional tests with more moderate timing constraints (e.g. 90 MHz) have shown that Vivado then implements the data memory by using a RAMB18E1 primitive. A similar situation has been observed in the case of PPX16mcu: while in a 100 MHz design all memories are implemented with distributed memory resources, a design with a lower frequency utilizes a RAMB18E1 primitive.
Table 5.1: Implementation results summary of the soft-core processors (@100 MHz).
Resource            PICDiY   PPX16mcu   PicoBlaze
Slice LUTs          266      478        130
Slice Registers     82       223        114
F7 Muxes            17       2          16
F8 Muxes            8        -          8
Block RAM Tile      0.5      -          0.5
fmax (MHz)          107      110        150
Dynamic power (W)   0.129    0.120      0.123
Static power (W)    0.125    0.125      0.125
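The relative differences summarized in Table 5.1 can be made explicit with a short calculation. The following is only a sketch: the figures are copied from the table, while the dictionary layout and the helper name `relative_to` are illustrative choices and not part of the original work.

```python
# Values copied from Table 5.1 (implementations constrained to 100 MHz).
TABLE_5_1 = {
    "PICDiY":    {"luts": 266, "registers": 82,  "dynamic_w": 0.129},
    "PPX16mcu":  {"luts": 478, "registers": 223, "dynamic_w": 0.120},
    "PicoBlaze": {"luts": 130, "registers": 114, "dynamic_w": 0.123},
}


def relative_to(reference, metric):
    """Return each core's value for `metric` divided by the reference core's value."""
    base = TABLE_5_1[reference][metric]
    return {core: values[metric] / base for core, values in TABLE_5_1.items()}


if __name__ == "__main__":
    # Express every core as a fraction of PPX16mcu, the other PIC16 based core.
    for metric in ("luts", "registers", "dynamic_w"):
        ratios = relative_to("PPX16mcu", metric)
        print(metric, {core: round(ratio, 2) for core, ratio in ratios.items()})
```

With these figures, PICDiY uses roughly 56% of the LUTs and 37% of the registers of PPX16mcu, whereas the dynamic power values of the three cores differ by less than 10%.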
Table 5.2: Primitive utilization of the soft-core processors (@100 MHz).
Primitive   PICDiY   PPX16mcu   PicoBlaze
FDRE        80       119        111
FDCE        -        66         3
FDSE        2        -          -
FDPE        -        38         -
CARRY4      8        9          10
LUT1        1        16         1
LUT2        32       53         1
LUT3        11       54         1
LUT4        18       55         1
LUT5        37       100        42
LUT6        131      173        76
MUXF7       17       2          16
MUXF8       8        -          8
RAMB18E1    1        -          1
RAMD32      -        12         24
RAMS32      7        4          8
RAMD64E     -        24         -
RAMS64E     32       -          32
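As a cross-check between Tables 5.1 and 5.2, the flip-flop primitives (FDRE, FDCE, FDSE and FDPE) of each core can be added up and compared with the Slice Registers row. The sketch below assumes that the FDSE row reads 2, -, - and the FDPE row reads -, 38, -, and that a "-" entry means zero instances; the variable and function names are illustrative.

```python
# Flip-flop primitive counts from Table 5.2; "-" entries are taken as 0 (assumption).
FLIP_FLOPS = {
    "PICDiY":    {"FDRE": 80,  "FDCE": 0,  "FDSE": 2, "FDPE": 0},
    "PPX16mcu":  {"FDRE": 119, "FDCE": 66, "FDSE": 0, "FDPE": 38},
    "PicoBlaze": {"FDRE": 111, "FDCE": 3,  "FDSE": 0, "FDPE": 0},
}

# "Slice Registers" row of Table 5.1.
SLICE_REGISTERS = {"PICDiY": 82, "PPX16mcu": 223, "PicoBlaze": 114}


def register_totals():
    """Sum the flip-flop primitives of each core."""
    return {core: sum(counts.values()) for core, counts in FLIP_FLOPS.items()}


if __name__ == "__main__":
    for core, total in register_totals().items():
        status = "matches" if total == SLICE_REGISTERS[core] else "differs from"
        print(f"{core}: {total} flip-flops, {status} Table 5.1 ({SLICE_REGISTERS[core]})")
```

Under this reading, the three totals (82, 223 and 114) coincide with the Slice Registers values reported in Table 5.1.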
Further aspects must be considered when comparing the three soft-core processors.
Although the functionality and programming of PPX16mcu are quite similar to those of PICDiY, PPX16mcu requires extra instructions to configure its ports, since they can work as inputs or outputs. Thus, compared to that of PICDiY, the program code of PPX16mcu needs six extra instructions to configure both input and output ports. Besides, while PPX16mcu needs eight clock cycles to execute instructions with conditional jumps, PICDiY needs only four clock cycles for all instructions, which increases performance. This aspect is especially relevant in FSM based designs.
On the other hand, there are several differences between PICDiY and PicoBlaze that deserve mention. One of the most relevant is that PicoBlaze is platform dependent and is only suitable for Xilinx devices, while PICDiY is platform independent and can be implemented in FPGAs from any vendor. Considering that the design of PicoBlaze is highly optimized for Xilinx devices, any attempt to export its architecture results in an increase in resource utilization and a lower maximum frequency. This can be observed by analysing the different efforts made to obtain platform independent soft-cores based on the PicoBlaze [26, 322, 323]. Another interesting aspect is that, as Figure 5.4 shows, the output port of PicoBlaze commonly has to be registered in order to synchronize the reading with the strobe signal, causing an increase in resource utilization.
Regarding programming aspects, PicoBlaze uses only two clock cycles to execute each instruction, and data loading is done with a single instruction (LOAD), while PICDiY needs two (MOVW + MOVWF). However, PICDiY offers better characteristics for dealing with multiple conditional branches, because none of the PicoBlaze instructions gives access to the program counter. The example presented in Table 5.3 demonstrates that PicoBlaze needs more instructions than PICDiY to carry out these programming structures. In this example, PicoBlaze needs to go through the complete list of instructions to reach the last case. Depending on the number of programmed cases, this can be a critical aspect in certain applications. For instance, a program containing an FSM with many states would demand a large number of instructions, affecting performance.
Figure 5.4: Output operation and FPGA interface for PicoBlaze. [Block diagram; labels: PicoBlaze IP Core, Register sX, Register sY / Literal kk, OUT_PORT[7:0], WRITE_STROBE, PORT_ID[7:0], FPGA Logic, register with D, En, Q and Clk.]
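The registered output of Figure 5.4 can be illustrated with a small behavioural model. This is a minimal sketch rather than the actual FPGA logic: the class name, the method name and the decoding of a specific port address from PORT_ID are assumptions introduced only for illustration.

```python
class RegisteredOutputPort:
    """Behavioural model of one clock-enabled output register in the FPGA logic."""

    def __init__(self, port_address):
        self.port_address = port_address  # hypothetical address decoded from PORT_ID
        self.q = 0                        # register output seen by the rest of the design

    def clock(self, out_port, port_id, write_strobe):
        """Simulate one rising clock edge; capture OUT_PORT only when enabled."""
        enable = write_strobe and port_id == self.port_address
        if enable:
            self.q = out_port & 0xFF  # OUT_PORT is 8 bits wide
        return self.q


if __name__ == "__main__":
    port = RegisteredOutputPort(port_address=0x01)
    port.clock(out_port=0xA5, port_id=0x01, write_strobe=True)   # captured
    port.clock(out_port=0x3C, port_id=0x01, write_strobe=False)  # ignored: no strobe
    port.clock(out_port=0x3C, port_id=0x02, write_strobe=True)   # ignored: other port
    print(hex(port.q))  # prints 0xa5
```

The model makes the extra cost visible: each output port that must hold its value requires one 8-bit register with an enable, in addition to the resources of the PicoBlaze core itself.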
Considering these aspects, it has been concluded that the PICDiY is an adequate candidate to be used as a target processor for different approaches, especially taking into account that it is an adaptation-friendly processor thanks to its modularity.
Table 5.3: Comparison of FSM coding examples for PICDiY and PicoBlaze processors.
N. Instr.   PICDiY           PicoBlaze
1           MOVF state, w    COMP state,0
2           ADDWF PLC, F     JUMP Z, state0
3           GOTO state0      COMP state,1
4           GOTO state1      JUMP Z, state1
5           GOTO state2      COMP state,2
6           GOTO state3      JUMP Z, state2
7           GOTO state4      COMP state,3
8           GOTO state5      JUMP Z, state3
9           GOTO state6      COMP state,4
10          GOTO state7      JUMP Z, state4
11          GOTO state8      COMP state,5
12          GOTO state9      JUMP Z, state5
13          GOTO state10     COMP state,6
14                           JUMP Z, state6
15                           COMP state,7
16                           JUMP Z, state7
17                           COMP state,8
18                           JUMP Z, state8
19                           COMP state,9
20                           JUMP Z, state9
21                           COMP state,10
22                           JUMP Z, state10
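The cost of reaching a given state with each coding style of Table 5.3 can be estimated with a simple count. The sketch below assumes that only the dispatch instructions are counted (up to and including the jump that is taken) and that every instruction takes the number of clock cycles stated in the text (four for PICDiY, two for PicoBlaze); the function names are illustrative.

```python
PICDIY_CYCLES_PER_INSTR = 4     # stated in the text for all PICDiY instructions
PICOBLAZE_CYCLES_PER_INSTR = 2  # stated in the text for all PicoBlaze instructions


def picdiy_dispatch_instructions(state):
    """Jump table: MOVF, ADDWF on the program counter, then a single GOTO."""
    return 3


def picoblaze_dispatch_instructions(state):
    """Compare chain: one COMP/JUMP pair per state tested, up to and including `state`."""
    return 2 * (state + 1)


def dispatch_cycles(state):
    """Return (PICDiY cycles, PicoBlaze cycles) needed to reach `state`."""
    return (
        picdiy_dispatch_instructions(state) * PICDIY_CYCLES_PER_INSTR,
        picoblaze_dispatch_instructions(state) * PICOBLAZE_CYCLES_PER_INSTR,
    )


if __name__ == "__main__":
    for state in range(11):
        picdiy, picoblaze = dispatch_cycles(state)
        print(f"state{state}: PICDiY {picdiy} cycles, PicoBlaze {picoblaze} cycles")
```

Under these assumptions, the PICDiY dispatch takes a constant 12 cycles, while the PicoBlaze dispatch grows linearly from 4 cycles for state0 to 44 cycles for state10, with both approaches tied at state2. The gap therefore widens as the number of states increases.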