
or alternatively the main datapath can be left in a hibernate state until the SORU2 unit finishes (leading to a greater energy reduction). SORU2 is well suited to wireless nodes that must execute part of a distributed algorithm.

FSR-based algorithms are very different from the benchmarks originally used for SORU2. They consist of a loop that performs multiple operations in rounds; the data output of one iteration is the data input of the next. However, when the resource is available in a wireless node, we propose using it to increase the security of the implementation, as its structure and reconfigurability are well suited to protecting implementations.

4.2 SORU2 organization

The SORU2 functional unit is a 32-bit dynamically reconfigurable datapath. The most prominent features of the SORU2 RFU are:

• It provides a flexible dynamically reconfigurable pipeline, with multiple configuration contexts, that can be easily configured to chain a large number of vector-oriented operations.

• An internal general purpose register file reduces the coupling with the main datapath and helps to reduce memory accesses.

• The whole operation execution control is embedded into the SORU2 functional unit, so the main processor is free to do other tasks.

The SORU2 internal structure is shown in figure 4.2. This reconfigurable datapath is designed as a three-stage pipeline, described below.

4.2.1 SORU2 Decode stage

The first stage of the pipeline reads the SORU operations from an internal control memory. An internal control unit manages all aspects of operation execution, directed by operation-specific information stored in an internal memory by the compiler or the run-time execution support system.

SORU operations are called instructions. Each instruction can be executed several times (rounds), and the instruction flow is specified with the jump attribute. The SORU memory specification thus enables the implementation of complex loops. In addition to the execution flow, an instruction specifies the Basic Reconfigurable Unit (BRU) context, the identifier of the BRU configuration, the registers to load as inputs and the write back process. An XML file describes the control unit configuration, including a set of instructions, and is compiled and saved in the SORU control memory. The format of an instruction in the XML configuration file is shown in Listing 4.1.

Listing 4.1: SORU internal memory specification

Chapter 4. Countermeasure proposal I: reconfigurable co-processor

[Figure 4.2: SORU2 internal structure. Labels: decode stage, execution stage, write back stage, controller, general purpose registers, BRU 1 to BRU 4, output buffer.]


<!ELEMENT soru_map (instruction+)>
<!ELEMENT instruction (rounds?, jump?, bru*, message?, end?)>
<!ATTLIST instruction id CDATA #IMPLIED>
<!ELEMENT rounds (#PCDATA)>
<!ELEMENT jump (#PCDATA)>
<!ELEMENT bru (conf, input0?, input?, output?)>
<!ATTLIST bru id (1|2|3|4) #REQUIRED>
<!ELEMENT conf (#PCDATA)>
<!ELEMENT input0 (#PCDATA)>
<!ELEMENT input (#PCDATA)>
<!ELEMENT output (#PCDATA)>
<!ATTLIST output mode (last|always) "always">
<!ELEMENT message (#PCDATA)>
<!ELEMENT end EMPTY>
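As an illustration of the format above, the following Python sketch parses a small instance document that follows the element order of the DTD. The configuration names (xor_shift, and_mask), the register names (r0 to r3), the message text and the helper load_instructions are hypothetical and chosen only for this example; treating a missing rounds element as a single round is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical soru_map instance. Configuration names, register names
# and the message text are illustrative assumptions.
SORU_MAP = """
<soru_map>
  <instruction id="0">
    <rounds>16</rounds>
    <bru id="1">
      <conf>xor_shift</conf>
      <input0>r0</input0>
      <input>r1</input>
      <output mode="last">r2</output>
    </bru>
    <bru id="2">
      <conf>and_mask</conf>
      <input>r3</input>
    </bru>
  </instruction>
  <instruction id="1">
    <rounds>1</rounds>
    <message>done</message>
    <end/>
  </instruction>
</soru_map>
"""


def load_instructions(xml_text):
    """Return one dict per <instruction>, applying the DTD default for mode."""
    root = ET.fromstring(xml_text)
    instructions = []
    for ins in root.findall("instruction"):
        brus = []
        for bru in ins.findall("bru"):
            out = bru.find("output")
            brus.append({
                "id": int(bru.get("id")),
                "conf": bru.findtext("conf"),
                "input0": bru.findtext("input0"),
                "input": bru.findtext("input"),
                "output": None if out is None else out.text,
                # The DTD declares "always" as the default output mode.
                "mode": None if out is None else out.get("mode", "always"),
            })
        instructions.append({
            "id": ins.get("id"),
            # Assumption: a missing <rounds> element means one round.
            "rounds": int(ins.findtext("rounds", default="1")),
            "jump": ins.findtext("jump"),
            "brus": brus,
            "end": ins.find("end") is not None,
        })
    return instructions


instructions = load_instructions(SORU_MAP)
```

The sketch only reads the description; it does not model how the control unit interprets rounds and jump at run time.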

Besides the control unit, this stage has a general purpose register file, which stores new data from the main processor and partial results from the running SORU2 operations. It significantly reduces the register pressure on the main register file and the coupling between SORU2 operations and normal operations.

4.2.2 SORU2 Execution stage

As shown in figure 4.2, the SORU2 execution stage is divided into four cascade-connected BRUs. Each BRU receives three 32-bit operands: 1) the last result computed by itself, 2) the result from the previous BRU, and 3) a new data item from the SORU2 register file.

The result of a BRU operation can be used by the next BRU at the next clock cycle. Moreover, that result can be stored in the SORU2 register file.
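The cascade described above can be sketched as a small behavioural model in Python. This is only an illustration under stated assumptions: the class BRU, the function step, the example operations, and the choice of feeding 0 as the "previous result" of the first BRU are all hypothetical and not taken from the SORU2 design.

```python
MASK = 0xFFFFFFFF  # results are truncated to 32 bits


class BRU:
    """One Basic Reconfigurable Unit with a registered 32-bit result."""

    def __init__(self, op):
        self.op = op      # callable(own, prev, reg) -> int
        self.result = 0   # value latched at the end of the last cycle


def step(brus, reg_inputs):
    """Advance the cascade by one clock cycle and return the new results.

    Every BRU sees the values latched in the previous cycle, so all new
    results are computed first and then committed together.
    """
    # Assumption: the first BRU has no predecessor and receives 0.
    prev_results = [0] + [b.result for b in brus[:-1]]
    new = [b.op(b.result, p, r) & MASK
           for b, p, r in zip(brus, prev_results, reg_inputs)]
    for b, value in zip(brus, new):
        b.result = value
    return new


# Four hypothetical one-cycle operations, one per BRU.
brus = [
    BRU(lambda own, prev, reg: reg),          # pass the register value
    BRU(lambda own, prev, reg: prev + 1),     # increment previous result
    BRU(lambda own, prev, reg: prev ^ reg),   # XOR with register value
    BRU(lambda own, prev, reg: own + prev),   # accumulate previous result
]
trace = [step(brus, [5, 0, 0xF, 0]) for _ in range(3)]
```

In this model a value produced by one BRU reaches the next BRU one call to step later, which mirrors the one-cycle delay between neighbouring units.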

Internally, each BRU is reconfigurable. In this Ph.D. thesis we have used low-power Spartan-6 FPGA slices with 6-input LUTs. The control inputs of these elements and the FPGA configuration memories form a BRU context, which is replicated so that the BRU can perform a different operation in each clock cycle.

As mentioned before, BRU reconfiguration is done dynamically, according to the program needs. When explicitly called, the new SCONF operation of the main processor performs a block memory transfer from main memory to an inactive BRU context, changing the functionality of the SORU2 unit.
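The context mechanism can be pictured with the following minimal Python sketch. The class BRUContexts, the number of contexts, the context size in words and the decision to reject a write to the active context are assumptions made for illustration; the text does not specify them.

```python
class BRUContexts:
    """Toy model of one BRU with several configuration contexts."""

    def __init__(self, num_contexts=4, words_per_context=8):
        # Both sizes are hypothetical placeholders.
        self.contexts = [[0] * words_per_context for _ in range(num_contexts)]
        self.active = 0

    def sconf(self, context, main_memory, base):
        """Block-copy one context worth of words from main memory."""
        if context == self.active:
            # Assumption: only inactive contexts may be rewritten.
            raise ValueError("SCONF must target an inactive context")
        words = len(self.contexts[context])
        block = list(main_memory[base:base + words])
        if len(block) != words:
            raise ValueError("configuration block exceeds main memory")
        self.contexts[context] = block

    def select(self, context):
        """Switch the context used by the BRU from now on."""
        self.active = context


main_memory = list(range(100, 132))   # stand-in for stored configurations
bru = BRUContexts()
bru.sconf(1, main_memory, 8)          # load context 1 while context 0 runs
bru.select(1)                         # later, switch to the new function
```

The point of the sketch is that loading a new configuration and using the current one do not interfere, because they touch different contexts.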

Different configurations for one-cycle operations can be prepared off-line and stored in an external memory. The compiler is in charge of inserting configuration code to write the required contexts while the program is loaded. Additionally, the run-time support system can use dynamic information to re-optimize parts of the program by using a different set of configuration contexts.

4.2.3 SORU2 Write back stage

The write back stage writes the intermediate results into the SORU2 internal register file. Data can be stored from any BRU, not necessarily from the last one.

Data can be stored from the SORU2 register file to main memory using vector store units that are configured from the main processor.


Data is read from the register file on positive clock edges, while the write back operation is performed on negative clock edges.

4.2.4 Programming interface

The SORU2 co-processor can be used from the main processor through a very simple memory-mapped interface. We have defined SORU2 operations for:

• Configuring a context of one of the BRU stages of the SORU2 RFU (SCONF).

• Programming the vector load units to iteratively load a new element into a SORU2 register every n clock cycles (LDV).

• Programming the vector store unit to extract the output data from the SORU2 unit when it is ready (STV).

• Scalar data movement between the main register file and the SORU2 register file (overloaded as MOVE in assembler code).

• Executing SORU2 SIMD operations (EXECV).

The instructions include timing attributes that are used to schedule them. Both vector load/store unit (VLSU) instructions, LDV and STV, include attributes to specify the shape of the vector, similar to RSVP [46]: stride, span and skip.
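To make the vector shape attributes concrete, the following Python sketch generates the element addresses for one plausible reading of stride, span and skip: the address advances by stride between consecutive elements, and by skip instead of stride after every span elements. This interpretation, the function name vector_addresses and its base and count parameters are assumptions for illustration; the exact SORU2 semantics are not given here.

```python
def vector_addresses(base, count, stride, span, skip):
    """Yield `count` element addresses of a strided access pattern.

    Assumed semantics: advance by `stride` between consecutive elements,
    and by `skip` instead of `stride` after every `span` elements.
    """
    if span < 1:
        raise ValueError("span must be at least 1")
    addr = base
    for i in range(count):
        yield addr
        addr += skip if (i + 1) % span == 0 else stride


# First three columns of two rows of a row-major matrix with 8-word rows.
addresses = list(vector_addresses(base=0, count=6, stride=1, span=3, skip=6))
```

Under these assumptions the example visits words 0, 1, 2 and then 8, 9, 10, that is, a 3-element window at the start of each 8-word row.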

If properly configured, the SORU2 co-processor can activate an interrupt line to signal the end of a complex SIMD operation.