4.1.2 OSIP Architecture
A typical ASIP design consists of finding repetitive patterns of consecutive instructions that can later be implemented as custom instructions to improve the application runtime. Unfortunately, the instruction-level profile of the OSIP application does not display such dominant patterns, which makes the ASIP design more challenging. As a consequence, the architecture optimization focused on providing efficient memory access and on reducing control overhead in combination with arithmetic operations.
An overview of the final OSIP architecture is given in Figure 4.3. In addition to the actual core, two interfaces are provided: a slave register interface and a master interrupt interface. Through the register interface, OSIP can be integrated as a standard peripheral on any memory-mapped communication architecture. The interrupt interface is composed of a set of interrupt ports that serve to trigger task execution on the processors. The core itself is a load-store architecture with a 6-stage pipeline, consisting of a prefetch stage (PFE), a fetch stage (FE), a decode stage (DC), an execute stage (EX), a memory stage (ME) and a write-back stage (WB). The last five stages correspond to the original stages of the baseline LTRISC architecture. The additional PFE stage serves to prefetch the Program Counter (PC) upon arrival of a system event, e.g., task creation or task synchronization.
When non-operational, OSIP stays in the idle state, potentially using a low-power mode. Once a request arrives through the register interface, the PC generator in the PFE stage determines the handler and sets a busy flag that protects OSIP from additional requests. To avoid request loss, a software client must not issue requests while OSIP is in the busy state. Depending on the request, the handler decoder at the top of the program memory in Figure 4.3 selects a handler routine, e.g., task creation or task synchronization. During request processing, interrupt control signals can be generated in the execute stage in order to control system-wide task execution. Finally, after executing the handler, OSIP releases the busy flag, enters the idle state and waits for upcoming requests. The main new instructions are:
[Figure: block diagram of OSIP. Slave register interface (Request, Arg. 1, Arg. 2, ..., Busy); OSIP core (Idle state, PC generator, PFE, FE, DC, EX, ME, WB); program memory (Handler decoder, Handler 1, Handler 2, ...); data memory (OSIP-DT array, other data); master interface with interrupt generator.]
Figure 4.3: OSIP Architecture.
Compare and Branch: Typically, conditional branches are implemented with two instructions, one that operates on the registers and one that modifies the PC. In OSIP, these two steps are merged into a single instruction, with two possible syntaxes:
enh_b cond R[idx1], R[idx2]/imm4, BRANCH_ADDR
The second operand is either a register or a 4-bit immediate. The latter form is used to check the status or type of an OSIP-DT against some constants defined by the algorithms, e.g., scheduling policy type and mapper policy type.
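The effect of fusing the comparison and the branch can be illustrated with a small behavioral model. The Python sketch below is not the OSIP definition: the function name `enh_b`, the condition mnemonics other than `eq`, the unsigned interpretation of the 4-bit immediate and a PC that counts whole instructions are all assumptions made for illustration.

```python
import operator

# Hypothetical condition table; only "eq" appears in the listings of this section.
CONDITIONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "ge": operator.ge,
}

def enh_b(cond, regs, idx1, operand2, target, pc, is_imm=False):
    """Return the next PC after a fused compare-and-branch.

    operand2 is a register index, or a 4-bit immediate when is_imm is True.
    The PC is assumed to advance by one per instruction.
    """
    if is_imm:
        if not 0 <= operand2 <= 0xF:
            raise ValueError("immediate does not fit in 4 bits")
        rhs = operand2
    else:
        rhs = regs[operand2]
    taken = CONDITIONS[cond](regs[idx1], rhs)
    return target if taken else pc + 1
```

In this model, a single call replaces the usual pair of a compare instruction followed by a conditional jump, which is the control overhead the instruction is meant to remove.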
Memory Access: Accelerating memory access is one of the key techniques to improve OSIP performance. Single access is accelerated by performing index computation to fields in an OSIP-DT in hardware. To allow efficient address generation, all OSIP-DTs are allocated consecutively in a static array at the top of the memory (see Figure 4.3). This memory region is invisible to the OSIP C compiler and can only be accessed by special instructions. The syntax of the load/store instructions is:
sp_load/sp_store R[value], R[idx], W, W_MASK
where R[idx] holds the index to be accessed. W and W_MASK are a word offset and a bit mask within an OSIP-DT. R[value] contains the value to be loaded/stored.
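The address generation that these instructions perform in hardware can be sketched as follows. This is a minimal model under stated assumptions: 32-bit words, a descriptor size of `WORDS_PER_DT` words, an array base of `DT_BASE`, and `W_MASK` given as an explicit bit mask whose field is right-aligned on load. None of these values or encodings is specified in this section.

```python
WORDS_PER_DT = 8      # assumed descriptor size in 32-bit words
DT_BASE = 0           # assumed word address of the first OSIP-DT

def _shift_of(mask):
    """Bit position of the least significant set bit of a non-zero mask."""
    return (mask & -mask).bit_length() - 1

def dt_word_address(idx, w):
    """Word address of word `w` inside descriptor `idx`."""
    return DT_BASE + idx * WORDS_PER_DT + w

def sp_load(mem, idx, w, w_mask):
    """Read the field selected by (w, w_mask) from descriptor idx."""
    word = mem[dt_word_address(idx, w)]
    return (word & w_mask) >> _shift_of(w_mask)

def sp_store(mem, value, idx, w, w_mask):
    """Write `value` into the field (w, w_mask), leaving the other bits intact."""
    addr = dt_word_address(idx, w)
    field = (value << _shift_of(w_mask)) & w_mask
    mem[addr] = (mem[addr] & ~w_mask & 0xFFFFFFFF) | field
```

Because the descriptors form a static array of fixed-size elements, the address reduces to one multiply-add on the index, which is what makes a single-instruction field access feasible.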
Update and Continue: While traversing the hierarchy, it is common to update the information in the OSIP-DT of a node and then move to the next level of the hierarchy, e.g., loading its parent node. This is supported by a custom instruction with syntax:
update R[idx2], R[idx1], W, W_MASK, K
1 _start:
2 enh_b eq R[it], R[head], _end
3 update R[it], R[it], w=4, hw=0, 1
4 b _start
5 _end:
Listing 4.1: Update a list.
This instruction increments by K the field determined by (W, W_MASK) of the OSIP-DT indexed by R[idx1]. At the same time, the node at the next hierarchical level is pre[…] […] (located at halfword 0 in word 4 of an OSIP-DT) is incremented by one within a cyclic list (see Line 3 of Listing 4.1). The termination condition is checked in Line 2, by comparing for equality against the head of the list.
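The loop of Listing 4.1 can be mirrored in a short behavioral sketch. Several details are assumptions rather than facts from the text: the descriptor size `WORDS_PER_DT`, the interpretation of `hw` as a halfword selector within a 32-bit word, wrap-around of the field on overflow, and in particular `NEXT_WORD`, the word from which the next node index is taken. This section does not state where that link is stored.

```python
WORDS_PER_DT = 8          # assumed descriptor size in 32-bit words
NEXT_WORD = 6             # assumed location of the link to the next node
HALFWORD_MASKS = (0x0000FFFF, 0xFFFF0000)  # assumed meaning of hw=0 / hw=1

def update(mem, idx1, w, hw, k):
    """Add k to the halfword field (w, hw) of descriptor idx1 and return
    the index stored in its (assumed) link word, i.e. the new R[idx2]."""
    mask = HALFWORD_MASKS[hw]
    shift = 16 * hw
    addr = idx1 * WORDS_PER_DT + w
    field = (mem[addr] & mask) >> shift
    field = (field + k) & 0xFFFF              # wrap within the halfword
    mem[addr] = (mem[addr] & ~mask & 0xFFFFFFFF) | (field << shift)
    return mem[idx1 * WORDS_PER_DT + NEXT_WORD]

def update_list(mem, head, first):
    """Mirror of Listing 4.1: update every node from `first` until the
    iterator reaches `head` again. Returns the visited node indexes."""
    it = first
    visited = []
    while it != head:                          # enh_b eq R[it], R[head], _end
        visited.append(it)
        it = update(mem, it, w=4, hw=0, k=1)   # update R[it], R[it], w=4, hw=0, 1
    return visited
```

The point of the model is that the read-modify-write of the field and the fetch of the next index happen in one step, so each loop iteration costs one `update` plus the fused compare-and-branch.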
Compare Nodes: A basic node-comparison instruction is provided, with syntax:
cmp_node R[result], R[rule], R[idx1], R[idx2]
This instruction directly compares two in-memory OSIP-DTs with indexes R[idx1] and R[idx2] according to a comparison rule given in R[rule]. This allows the comparison operator to be taken from memory, according to the configuration of the scheduling algorithm. The result of the comparison is stored in register R[result].
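A rule-driven comparison of this kind might be modeled as below. The encoding of R[rule] is not given in this section, so the `RULES` table (rule id mapped to a word offset, a bit mask and an operator), the descriptor size and the convention that a result of 1 means "the first descriptor wins" are purely hypothetical.

```python
import operator

WORDS_PER_DT = 8  # assumed descriptor size in 32-bit words

# Hypothetical rule table: rule id -> (word offset, bit mask, comparison).
RULES = {
    0: (2, 0x0000FFFF, operator.lt),   # e.g. "smaller key wins"
    1: (2, 0x0000FFFF, operator.gt),   # e.g. "larger key wins"
}

def field(mem, idx, w, mask):
    """Extract the right-aligned field (w, mask) of descriptor idx."""
    shift = (mask & -mask).bit_length() - 1
    return (mem[idx * WORDS_PER_DT + w] & mask) >> shift

def cmp_node(mem, rule, idx1, idx2):
    """Return 1 if descriptor idx1 beats descriptor idx2 under `rule`, else 0."""
    w, mask, op = RULES[rule]
    return int(op(field(mem, idx1, w, mask), field(mem, idx2, w, mask)))
```

Since the rule is data rather than code, the same handler routine can serve different scheduling policies by changing only the value held in the rule register.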
Compare Nodes and Continue: Often, the best OSIP-DT within a list has to be found according to a given rule. A typical implementation would use an iterator and a variable that contains the current best element. To accelerate this, an enhanced version of the cmp_node instruction is provided, with syntax:
cmp_node_e R[result], R[rule], R[curr_best], R[it]
In addition to the comparison, this instruction automatically updates the index of the current best element R[curr_best]. An example for finding the best candidate in a list is shown in Listing 4.2. The code in Line 2 checks whether the list has been entirely traversed. In Line 3, the iterator is compared against the current best descriptor. At the same time, the best descriptor is updated if the candidate turns out to be better. The code in Line 4 retrieves the index of the next candidate in the list. In this example, the index of the next element is located at the seventh word of the descriptor.
1 _start:
2 enh_b eq R[it], R[head], _end
3 cmp_node_e R[rslt], R[rule], R[best], R[it]
4 sp_load R[it], R[it], w=6, hw=0
5 b _start
6 _end:
Listing 4.2: Find the best candidate.
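For comparison, the loop of Listing 4.2 corresponds roughly to the following sketch of the "iterator plus current best" pattern. The descriptor size, the word holding the compared key (`KEY_WORD`), the representation of the rule as a Python callable `better`, and the initialization of the current best with the first candidate are assumptions; only the link at word 6 is taken from the listing.

```python
WORDS_PER_DT = 8   # assumed descriptor size in 32-bit words
KEY_WORD = 2       # assumed word holding the value the rule compares
NEXT_WORD = 6      # link to the next element (w=6 in Listing 4.2)

def word(mem, idx, w):
    """Word `w` of descriptor `idx`."""
    return mem[idx * WORDS_PER_DT + w]

def cmp_node_e(mem, better, best, it):
    """Return (result, new_best); result is 1 when `it` beats `best`."""
    wins = better(word(mem, it, KEY_WORD), word(mem, best, KEY_WORD))
    return int(wins), (it if wins else best)

def find_best(mem, head, first, better):
    """Mirror of Listing 4.2; `first` is the initial iterator and initial best."""
    best = it = first
    while it != head:                                  # Line 2
        _, best = cmp_node_e(mem, better, best, it)    # Line 3
        it = word(mem, it, NEXT_WORD)                  # Line 4
    return best
```

A call such as `find_best(mem, head, first, lambda a, b: a < b)` would return the index of the element with the smallest key. In software, the body of the loop needs a comparison, a conditional move of the best index and a load; with `cmp_node_e` the first two collapse into one instruction.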