SORU simulation platform - Securing implementations of feedback-shift-register-based ciphers us

The main goal of this system is to obtain a simulator that lets us develop and test the design of the co-processor and evaluate the resistance against SCA of algorithms executed in it. The structure of the simulator we propose is as general as possible, and our particular case is derived from it.

In this section we give a description of the proposed structure, introducing the basic concepts of the complete system simulator. After showing the simulator general structure, we focus on the particular case of the SORU2 co-processor simulation.

4.3.1 General structure

Our solution allows the simulation of any system with any co-processor and any main processor, provided it has the structure given in figure 4.1: a main processor, an independent co-processor, and the system memory shared by both. As mentioned above, SORU2 communicates directly with the system memory through an arbiter, which resolves possible access conflicts. The co-processor communicates with the main processor only to receive orders and configurations.

4.3. SORU simulation platform

To design the simulation system, we use three basic tools: SystemC 2.2, the TLM 2.0 libraries and LLVM. The SystemC library allows us to easily design the low-level hardware blocks of the co-processor, offering cycle-accurate simulation results. The TLM 2.0 standard provides a higher level of abstraction, suitable for designing the remaining blocks of the system, which only need an approximately-timed (not cycle-accurate) simulation.

Using TLM 2.0 considerably reduces simulation time, as we only use SystemC cycle-accurate simulation for the critical components of our system. We use LLVM to define the instructions that the compiler needs to translate into orders, which are sent to the co-processor at runtime.

The general simulator implementation defines the following modules and their connections. The behavior of each module, though, is not fixed, and can be described in any way and with the functionalities chosen by the designer. More modules can be added to the system depending on the particular implementation, but the following should always be present:

• Main Processor: gives orders to the co-processor, configuring the execution. It is simulated with the LLVM interpreter and communicates with the co-processor through the arbiter.

• Arbiter: this module grants the main processor access to memory when the co-processor is not accessing it and forwards the configurations received from the main processor to the co-processor. This module acts as an interface between high-level LLVM and lower level modules. It is described in TLM 2.0.

• System bus: this is the interconnect component that routes all TLM 2.0 transactions through the system from their source to their destination, according to the specified addresses.

• System memory: memory shared by both main processor and co-processor, also described in TLM 2.0. As this module is not likely to need a low-level description, system memory is implemented as a memory-mapped file, which the other modules read from and write to.

• Co-processor: this module is implemented in SystemC with cycle-accurate simulation. The top level of the co-processor is described in TLM 2.0, so that it can connect to the system bus. Also, a translation layer is included, to convert TLM 2.0 to SystemC.
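The interplay between the system bus and the arbiter described in the list above can be illustrated with a minimal sketch. It is written in plain Python rather than TLM 2.0, and the class names, the address map (system memory at 0x0000, co-processor configuration at 0x1000) and the `coprocessor_using_memory` flag are assumptions made for illustration only, not part of the original simulator.

```python
class Memory:
    """A flat, byte-addressable target (stands in for any bus target)."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(size)

    def write(self, offset, value):
        self.data[offset] = value & 0xFF

    def read(self, offset):
        return self.data[offset]


class Bus:
    """Routes each access to the target whose address range contains it."""

    def __init__(self):
        self.targets = []  # list of (base address, target)

    def attach(self, base, target):
        self.targets.append((base, target))

    def route(self, address):
        for base, target in self.targets:
            if base <= address < base + target.size:
                return target, address - base
        raise ValueError(f"no target mapped at address {address:#x}")


class Arbiter:
    """Denies CPU access to system memory while the co-processor uses it.

    Writes addressed to other targets (here: the co-processor
    configuration space) are always forwarded.
    """

    def __init__(self, bus, system_memory):
        self.bus = bus
        self.system_memory = system_memory
        self.coprocessor_using_memory = False  # hypothetical status flag

    def cpu_write(self, address, value):
        target, offset = self.bus.route(address)
        if target is self.system_memory and self.coprocessor_using_memory:
            return False  # access not granted; the CPU has to retry
        target.write(offset, value)
        return True


# Hypothetical address map, chosen only for this example.
system_memory = Memory(0x1000)
coprocessor_config = Memory(0x100)
bus = Bus()
bus.attach(0x0000, system_memory)
bus.attach(0x1000, coprocessor_config)
arbiter = Arbiter(bus, system_memory)

granted_idle = arbiter.cpu_write(0x0010, 0xAB)   # memory is free
arbiter.coprocessor_using_memory = True
granted_busy = arbiter.cpu_write(0x0011, 0xCD)   # memory is in use
forwarded = arbiter.cpu_write(0x1004, 0x01)      # configuration write
```

In this sketch a denied access simply returns `False`; a real TLM 2.0 arbiter would more likely delay the transaction, which is a design choice the text does not specify.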

Given the simulator general structure, designers would have to describe the behavior of all modules and design the co-processor, with the particular solutions for their cases. To check and debug the particular implementations, the general simulator provides the debugging tools explained next:

• Co-Processor: the low-level synchronization and behavior of the SystemC co-processor requires cycle-accurate debugging that can be accomplished by trace files.

Chapter 4. Countermeasure proposal I: reconfigurable co-processor

Trace files provide a chronogram with the signals of the SystemC modules the designer wants to check. The file follows the Verilog VCD format and can be viewed once the simulation has ended with GTKWave or any other similar tool.

• TLM 2.0 modules: TLM 2.0 reporting tools are used to generate reports. Reports can be configured and activated to print the execution time of every TLM 2.0 module at every simulation step, together with information about the modules' synchronization and behavior, in a similar way to a log console. If a deeper level of debugging is needed, GDB can be used for the whole TLM 2.0 simulator.

• LLVM and communication: Debugging of the LLVM virtual machine is accomplished by the use of LLVM debugging tools. Debugging of the communication between LLVM main processor and the SystemC co-processor and TLM 2.0 simulator is done by monitoring the socket activity between the main processor and the arbiter.
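To make the trace-file mechanism more concrete, the following sketch writes a minimal VCD file by hand. In SystemC itself this is normally done with `sc_create_vcd_trace_file` and `sc_trace`; the Python function below, its parameters and the signal names are hypothetical and only illustrate the layout of the file that GTKWave reads. Identifier codes are taken from printable ASCII starting at `!`, so the sketch supports at most 94 signals.

```python
import os
import tempfile


def write_vcd(path, signals, samples, scope="top", timescale="1 ns"):
    """Write a minimal VCD file.

    signals: dict mapping signal name -> bit width
    samples: list of (time, {name: integer value}), sorted by time
    Only values that changed since the previous sample are dumped.
    """
    ids = {name: chr(33 + i) for i, name in enumerate(signals)}
    last = {}
    with open(path, "w") as f:
        # Header: timescale, scope and one $var declaration per signal.
        f.write(f"$timescale {timescale} $end\n")
        f.write(f"$scope module {scope} $end\n")
        for name, width in signals.items():
            f.write(f"$var wire {width} {ids[name]} {name} $end\n")
        f.write("$upscope $end\n")
        f.write("$enddefinitions $end\n")
        # Body: a "#time" line followed by the value changes at that time.
        for time, values in samples:
            changes = [(n, v) for n, v in values.items() if last.get(n) != v]
            if not changes:
                continue
            f.write(f"#{time}\n")
            for name, value in changes:
                if signals[name] == 1:
                    f.write(f"{value}{ids[name]}\n")        # scalar: 0! or 1!
                else:
                    f.write(f"b{value:b} {ids[name]}\n")    # vector: b1010 "
                last[name] = value


# Hypothetical signals: a clock and one 8-bit register.
signals = {"clk": 1, "reg0": 8}
samples = [
    (0, {"clk": 0, "reg0": 0x00}),
    (5, {"clk": 1, "reg0": 0x3C}),
    (10, {"clk": 0, "reg0": 0x3C}),
]
fd, vcd_path = tempfile.mkstemp(suffix=".vcd")
os.close(fd)
write_vcd(vcd_path, signals, samples)
with open(vcd_path) as f:
    vcd_lines = f.read().splitlines()
os.remove(vcd_path)
```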

Our simulator platform also provides designers with information about data changes at any point of the architecture (memory, co-processor internal registers, etc.). This information is the basis for estimating the power consumption of the system, for energy or security reasons. With this scheme we separate the data from the consumption model. During simulation, all modules are able to generate traces, which are saved to text files. When the simulation is completed, a power consumption model chosen by the designer can be applied to those traces.
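The separation between recorded data and consumption model can be sketched as follows. The record format (time, location, old value, new value, one record per line) and the two example models are assumptions made for illustration; the source does not specify the layout of the trace files. An in-memory `io.StringIO` stands in for the text file.

```python
import io


def log_change(stream, time, location, old, new):
    """Record one data change; called by the modules during simulation."""
    stream.write(f"{time} {location} {old:#x} {new:#x}\n")


def apply_model(stream, model):
    """Apply a consumption model to recorded changes, after simulation.

    model: function (old value, new value) -> estimated consumption.
    Returns a list of (time, total estimate at that time), sorted by time.
    """
    power = {}
    for line in stream:
        time, _location, old, new = line.split()
        estimate = model(int(old, 16), int(new, 16))
        power[int(time)] = power.get(int(time), 0) + estimate
    return sorted(power.items())


def hamming_weight_model(old, new):
    """Example model: number of set bits in the value being written."""
    return bin(new).count("1")


def write_count_model(old, new):
    """Example model: every recorded change costs one unit."""
    return 1


# Simulation phase: only data changes are recorded (hypothetical locations).
trace_file = io.StringIO()
log_change(trace_file, 0, "mem[0x10]", 0x00, 0xFF)
log_change(trace_file, 0, "reg3", 0x0F, 0x0E)
log_change(trace_file, 1, "reg3", 0x0E, 0x0E)

# Post-processing phase: the same trace is evaluated under two models.
trace_file.seek(0)
by_weight = apply_model(trace_file, hamming_weight_model)
trace_file.seek(0)
by_count = apply_model(trace_file, write_count_model)
```

Because the trace stores both the old and the new value of each location, models that depend on transitions can be applied to the same file without rerunning the simulation.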

4.3.2 Particular structure for the SORU2 co-processor

To design and evaluate the SORU2 co-processor, we add a new module to the general simulator platform described above: the VLSU mentioned in the SORU2 description. The reconfigurable vector co-processor SORU2 accesses memory through the VLSU. The VLSU is split into two different kinds of blocks, the Vector Load Unit (VLU) and the Vector Store Unit (VSU). SORU2 can work with any number of these units, although we have implemented a system with 4 VLUs and 1 VSU. VLUs are responsible for reading data from memory and loading it into SORU2, whilst VSUs store data from SORU2 to main memory. The VLSU is also described in TLM 2.0 and is attached to the system bus.

Our particular system consists of the CPU, arbiter, memory, VLSU and SORU2 co-processor, as can be seen in figure 4.3. Particularization includes the definition and design of all modules according to our particular working principles. The simulator operation is as follows.

When the simulation is first launched, the main processor (CPU) configures the internal memory of the SORU2 co-processor and the VLSU through the arbiter. Then, the CPU gives the order to begin executing the operation. At this point, the VLSU starts loading all data needed for the current operation into SORU2, while SORU2 begins its execution. Even though the SORU2 internal memory is initially filled with an operation configuration, this can be updated at runtime. Load units need to stay synchronized with SORU2, so that data arrive at SORU2 when needed. Operation results are stored in SORU2 internal registers and synchronously read by the VSU. Store units must save data to memory in the right order, so that a consistent result is obtained. Once the operation is finished, SORU2 signals the arbiter which, in turn, notifies the main processor that execution has ended. The main processor can then read the operation results from memory or reuse them for the next operation. The simulator can execute any number of operations. The simulation terminates only when the appropriate signal is sent by the main processor program. Reconfiguration of the SORU2 internal memory can be performed at any moment during execution, taking care not to write to a memory position while SORU2 is reading from it. Like the system memory, the SORU2 internal memory is a memory-mapped file.
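The use of a memory-mapped file as shared storage can be illustrated with Python's standard `mmap` module. This is only a sketch of the idea: the file size, the offsets and the view names are hypothetical, and the actual simulator is built on SystemC and TLM 2.0 rather than Python. Two independent mappings of the same file play the roles of the main processor and the co-processor.

```python
import mmap
import os
import tempfile

SIZE = 64  # bytes; hypothetical memory size for this example


def open_shared_memory(path, size):
    """Map `size` bytes of the file at `path`; returns (file, mapping)."""
    f = open(path, "r+b")
    return f, mmap.mmap(f.fileno(), size)


# Create a zero-filled backing file for the memory.
fd, path = tempfile.mkstemp()
os.write(fd, bytes(SIZE))
os.close(fd)

# Each module opens its own view of the same file.
cpu_file, cpu_view = open_shared_memory(path, SIZE)
cop_file, cop_view = open_shared_memory(path, SIZE)

# The CPU writes a configuration word; the co-processor view sees it.
cpu_view[0:4] = bytes([0xDE, 0xAD, 0xBE, 0xEF])
seen_by_coprocessor = bytes(cop_view[0:4])

# The co-processor stores a result; the CPU reads it back.
cop_view[8:12] = (1234).to_bytes(4, "little")
result_read_by_cpu = int.from_bytes(cpu_view[8:12], "little")

for view, f in ((cpu_view, cpu_file), (cop_view, cop_file)):
    view.close()
    f.close()
os.remove(path)
```

The sketch does not enforce the rule that a position must not be written while SORU2 is reading it; in the simulator that responsibility lies with the designer of the reconfiguration logic.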

In order to evaluate the security and energy consumption of the SORU2 co-processor, we use the simulator platform tracing capabilities. In our case, changes in SORU2 internal registers, basic reconfigurable units and system memory will be traced, and a track of the execution time of each operation will be kept.

For security evaluation, Section 2.2.1 presented existing proposals to protect combinational FPGA designs against SCA. Among the available countermeasures, we propose the use of BCDL [134], DPL-noEE [25] or PA-DPL [88], which can be generated automatically from standard synthesis. With the combinational logic protected in this way, we can ignore the BRU leakage information and its contribution to power consumption. Moreover, in FSR-based algorithms, the only memory accesses are loading the input data and storing the output data, which we consider the attacker already knows. Therefore, we can also ignore memory access contributions to power consumption. We focus only on the leakage of the BRU pipeline internal registers. As these registers are updated synchronously on the positive edge of the SORU2 internal clock, an attacker would be able to extract this information by preprocessing the power traces. When the simulation is completed, we apply the Hamming Distance model to the traces to generate the input traces for the side-channel attack. The side-channel attack also uses the Hamming Distance model.
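As an illustration of the Hamming Distance model, the sketch below turns a sequence of register snapshots into a simulated power trace with one sample per clock edge. The snapshot layout (one tuple of register values per positive edge) and the function names are assumptions; only the model itself, the number of bits that flip between consecutive register states, comes from the text.

```python
def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def hd_trace(snapshots):
    """Build a simulated power trace from register snapshots.

    snapshots[i] holds the values of all traced registers after clock
    edge i. Each output sample is the total number of bit flips across
    all registers between two consecutive edges.
    """
    return [
        sum(hamming_distance(p, c) for p, c in zip(prev, curr))
        for prev, curr in zip(snapshots, snapshots[1:])
    ]


# Hypothetical snapshots of two pipeline registers over four clock edges.
snapshots = [
    (0x00, 0x00),
    (0x0F, 0x01),
    (0xF0, 0x01),
    (0xF0, 0x03),
]
trace = hd_trace(snapshots)  # [5, 8, 1]
```

In an attack, the same function would be applied to the register values predicted under each key hypothesis, so that predictions and simulated traces are expressed in the same units.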

In document Securing implementations of feedback-shift-register-based ciphers using compiler optimizations and co-processors (Page 88-92)