
■ I Bit (Interrupt Enable): Whenever this bit is set, an interrupt can occur. If this bit is disabled, then even if an interrupt occurs, the exception will not be accepted. The

BIT SYMBOL BIT NAME DESCRIPTION R/W

2. What area of Flash is available for the user program on the RX62N board? List the range of addresses and the exact size in bytes (not Kbytes).

In the RX62N, the memory addresses of the on-chip ROM or data flash range from 0010 0000h to 0010 8000h. The total number of bytes available for the user program is 32768 bytes.

3.4 ADVANCED CONCEPTS

3.4.1 Pipelining

Pipelining is an important technique used to make fast CPUs. It is implemented to increase the instruction throughput (i.e., the number of instructions that can be executed in a unit of time), but it does not reduce the time taken to execute an individual instruction. The execution time of an individual instruction actually increases due to the overhead of implementing the pipeline. This overhead comes from register delay (setup time) and clock skew. Improved instruction throughput means programs run faster and have lower execution time, though no single instruction runs faster. Performance is reduced if there is an imbalance between the stages. For instance, if an instruction stays in the execution stage longer than in the memory access stage, performance degrades because the clock cannot be faster than the slowest stage. Pipelining would provide optimal CPU performance if every instruction were independent of every other instruction but, unfortunately, most instructions depend on one another.
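The effect of an unbalanced stage on the clock can be checked with a short calculation. The following is a minimal sketch, not taken from the Hardware Manual: the function names, the stage delays, and the overhead value are all hypothetical, chosen only to illustrate that the clock period is set by the slowest stage plus the register overhead.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum clock period: the slowest stage plus the per-stage overhead
 * (register setup time and clock skew). Units are arbitrary, e.g. ns. */
double min_clock_period(const double *stage_delay, size_t stages, double overhead)
{
    assert(stage_delay != NULL && stages > 0);
    double slowest = stage_delay[0];
    for (size_t i = 1; i < stages; i++) {
        if (stage_delay[i] > slowest) {
            slowest = stage_delay[i];
        }
    }
    return slowest + overhead;
}

/* Time for one instruction to pass through every stage of the pipeline. */
double pipelined_latency(const double *stage_delay, size_t stages, double overhead)
{
    return (double)stages * min_clock_period(stage_delay, stages, overhead);
}

/* Time for one instruction without pipelining: the sum of the stage delays. */
double unpipelined_latency(const double *stage_delay, size_t stages)
{
    assert(stage_delay != NULL && stages > 0);
    double total = 0.0;
    for (size_t i = 0; i < stages; i++) {
        total += stage_delay[i];
    }
    return total;
}
```

With assumed stage delays of 2, 2, 3, 2, and 2 ns and an assumed overhead of 0.5 ns, the clock period is 3.5 ns. A single instruction then takes 17.5 ns instead of 11 ns, but a full pipeline completes one instruction every 3.5 ns rather than one every 11 ns.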

The RX CPU is based on the classical five-stage pipeline. During execution, an instruction is converted into micro-operations. Only the Instruction Fetch (IF) stage operates in terms of instructions, while the other stages operate in terms of micro-operations.

The operation of the pipeline and its stages is described as follows:

1. Instruction Fetch Stage (IF Stage):

In the IF stage, the CPU fetches 32- or 64-bit instructions from the memory. The program counter (PC) supplies the address of the instruction to fetch and is then incremented by 4 or 8, since the instructions are 4 or 8 bytes long. The RX CPU has four 8-byte instruction queues and keeps fetching instructions until the queues are filled.

2. Instruction Decode Stage (ID Stage):

The main function of this stage is decoding. Instructions are decoded in the ID stage and converted into micro-operations. In addition, the values of registers (operands) are read from the register file. If a needed register value is the result of the preceding instruction, the CPU executes a bypass process (BYP), also called forwarding.

3. Execution Stage (E Stage):

Two main types of calculation take place in this stage: normal ALU operations and memory address calculations for the memory access stage. Normal ALU operations are register-to-register operations, which include add, subtract, compare, and logical operations. The other calculations are for memory reference operations, which include all load operations from memory. During the execute stage, the ALU adds the two arguments (a register and a constant offset given in the instruction) to produce an address by the end of this stage.

4. Memory Access (M Stage):

Memory is accessed either to fetch an operand from memory or to store an operand in memory. The address of the operand is calculated in the previous execution stage. This stage (M stage) is divided into two sub-stages, M1 and M2. The RX CPU enables respective memory accesses for M1 and M2.

■ M1 stage (memory-access stage 1):

In this sub-stage, operand memory access OA1 is processed. A store operation is processed when a write request is received via the bus. During a load operation, the operation proceeds to the M2 stage only when a read request is received via the bus. If the load data is received at the same time as the request (i.e., no-wait memory access), the operation proceeds directly to the WB stage.

■ M2 stage (memory-access stage 2):

In this sub-stage, operand memory access OA2 is processed. The CPU waits for the load data and, once it is received, the operation proceeds to the WB stage.

5. Write-back stage (WB stage):

The last stage of the pipeline writes data into the register file. In this stage, the operation result calculated in the execution stage and the data read from memory in the memory access stage are written to the register (RW). The data read from memory and other data, such as an operation result, can be written to the register in the same clock cycle.

[Figure: the pipeline stages IF, D, E, M1, M2, and WB, each one cycle long, with M1 and M2 together forming the M stage. The execution processing shown for these stages is IF, DEC, RF, BYP, OP, OA1, OA2, and RW.]

Figure 3.18 Pipeline Configuration and its Operation. Source: Hardware Manual, Figure 2.10, page 2–26.

Pipeline Basic Operation

Ideally, each pipeline stage should take the same amount of time to process an instruction. Unfortunately, ideal conditions are hard to achieve, so stalls are created in the pipeline and performance is degraded. The slowest stage, which takes the maximum amount of time, becomes the bottleneck. Once the pipeline is filled, one instruction comes out of the pipeline every clock cycle. In a non-pipelined processor, if there are n tasks to handle and each task takes m clock periods, then the total time taken to process n tasks is n * m clock periods. In a pipelined processor with m stages, n tasks ideally take m + (n - 1) clock periods: m periods for the first task to pass through the pipeline, and one more period for each of the remaining n - 1 tasks. The speed-up gained in this scenario is therefore (n * m) / (m + (n - 1)).
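The ideal cycle counts above can be evaluated directly. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and the model assumes an ideal pipeline with no stalls.

```c
#include <assert.h>

/* Clock periods for n tasks on a non-pipelined processor,
 * where each task takes m clock periods. */
unsigned long nonpipelined_cycles(unsigned long n, unsigned long m)
{
    return n * m;
}

/* Clock periods for n tasks on an ideal m-stage pipeline:
 * m periods for the first task, then one period per remaining task. */
unsigned long pipelined_cycles(unsigned long n, unsigned long m)
{
    assert(n > 0 && m > 0);
    return m + (n - 1);
}

/* Ideal speed-up: (n * m) / (m + (n - 1)). */
double ideal_speedup(unsigned long n, unsigned long m)
{
    return (double)nonpipelined_cycles(n, m) / (double)pipelined_cycles(n, m);
}
```

For example, 100 tasks on a five-stage pipeline take 104 clock periods instead of 500, a speed-up of about 4.8. As n grows, the speed-up approaches m, the number of stages, but never reaches it.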

Hazards prevent the next instruction from executing at the next cycle. They reduce the performance and speed-up gained by pipelining. There are three types of hazards:

1. Structural hazard:

It arises from resource conflicts when the hardware is not capable of supporting multiple instructions simultaneously in an overlapped manner.

2. Data hazard:

It arises when an instruction depends on the result of a previous instruction. For instance, an instruction I+1 that depends on the result of instruction I will cause a data hazard, because one of the operands of instruction I+1 is the result of instruction I. In such a case, instruction I+1 cannot execute, as the data is not yet available.

3. Control hazard:

It arises from the pipelining of branches and other instructions that change the program counter; i.e., when a set of instructions is control dependent on the branch condition, and the value the PC will take is not known until the decode stage or execution stage.

Hazards cause stalls in the pipeline. Resolving a hazard requires allowing some instructions to proceed while delaying others. When an instruction is stalled, the instructions issued later than it (from the instruction stream) are also stalled, while the instructions issued earlier are allowed to proceed, so that the hazard clears with time.
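The data hazard between an instruction I and its successor I+1 comes down to a register comparison. The following is a minimal sketch under assumed names: the Instr structure, the NO_REG marker, and has_raw_hazard are hypothetical and do not describe how the RX CPU detects hazards internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_REG (-1) /* hypothetical marker: the instruction writes no register */

/* Hypothetical instruction record: register numbers only. */
typedef struct {
    int dest; /* register written, or NO_REG */
    int src1; /* first register read, or NO_REG */
    int src2; /* second register read, or NO_REG */
} Instr;

/* True when the consumer (instruction I+1) reads a register that the
 * producer (instruction I) writes, i.e. a read-after-write dependency. */
bool has_raw_hazard(const Instr *producer, const Instr *consumer)
{
    assert(producer != NULL && consumer != NULL);
    if (producer->dest == NO_REG) {
        return false;
    }
    return consumer->src1 == producer->dest || consumer->src2 == producer->dest;
}
```

If instruction I writes register 4 and instruction I+1 reads register 4, the function reports a hazard. In that case the pipeline must either forward the result (the bypass process) or stall instruction I+1 until the value is available.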

The following figures show typical cases that can occur in the pipeline. (In the following section, the abbreviation “mop” stands for micro-operations.)

1. Pipeline flow with stalls:

In Figure 3.19, the first instruction is a division operation whose execution stage takes more than one cycle to complete. The other instructions, such as the ADD instructions, would complete before the division if they were allowed to execute. Since out-of-order completion might create a problem, the pipeline has to be stalled.

[Figure: pipeline flow for DIV R1, R2 (mop: div), ADD R3, R4 (mop: add), and ADD R5, R6 (mop: add). The div stays in the E stage for several cycles, and the two ADD instructions stall until it completes.]

Figure 3.19 Stalls created due to an Instruction that requires multiple cycles to execute in the E Stage. Source: Hardware Manual, Figure 2.20, page 2–32.

[Figure: pipeline flow for MOV [R1], R2 (mop: load), MOV [R3], R4 (mop: load), and ADD R5, R6 (mop: add) for a memory access other than no-wait memory access. The first load stays in the M stage for more than one cycle, and the following instructions stall.]

Figure 3.20 Stalls created due to an Instruction that requires more than one cycle for its operand access to execute in the E stage. Source: Hardware Manual, Figure 2.21, page 2–32.

In Figure 3.20, the first instruction is a load instruction that takes more than one cycle to complete its operand access from memory. Again, since in-order completion is important, the later instructions have to be stalled and no more instructions are fetched. As soon as the memory access completes, the pipeline is no longer stalled and the instructions pass to their next stages.
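The cost of such stalls can be estimated by adding the extra cycles to the ideal pipeline time. The following is a simplified sketch with hypothetical names. It assumes an in-order pipeline in which every extra cycle an instruction spends in a stage delays all later instructions by one cycle, and in which stalls do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Total cycles for n instructions on an in-order pipeline.
 * extra_cycles[i] is the number of additional cycles instruction i holds
 * a stage beyond the usual one (e.g. a multi-cycle divide in the E stage,
 * or a load waiting for data in the M stage). */
unsigned long cycles_with_stalls(const unsigned *extra_cycles, size_t n,
                                 unsigned stages)
{
    assert(extra_cycles != NULL && n > 0 && stages > 0);
    unsigned long total = (unsigned long)stages + (unsigned long)(n - 1);
    for (size_t i = 0; i < n; i++) {
        total += extra_cycles[i];
    }
    return total;
}
```

As an illustration, take three instructions that each pass through four stages (IF, D, E, WB), where the first is a division assumed to need three extra cycles in the E stage. The ideal time is 4 + 2 = 6 cycles, and the stalls raise it to 9 cycles.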

Figure 3.21 shows how control dependencies in the code cause stalls in the pipeline. For instance, consider a simple “if statement.”

[Figure: a branch instruction (mop: jump) passes through the IF, D, and E stages. The next instruction is fetched only after the branch instruction is executed, giving a branch penalty of two cycles.]

Figure 3.21 Stalls created due to control dependency on a Branch Instruction. Source: Hardware Manual, Figure 2.22, page 2–33.

    Code0
    ...
    ...
    if (Cond1) {
        Code1
    }
    Code2
    ...
    ...

Code1 is control dependent on Cond1: it executes only if Cond1 is true. Code2 is also control dependent on Cond1. When this code is converted into assembly code, a branch instruction such as a jump is generated. As the branch instruction goes into the instruction decode stage, it is time to fetch the next instruction. But which instruction should be fetched, Code1 or Code2? Since the decision depends on Cond1 (the branch instruction), and the result of the branch instruction is available only after the execute stage (E stage), the pipeline has to be stalled for two cycles. By the end of the two cycles, the address of the next instruction is available in the program counter (PC), and the next instruction is then fetched.
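The two-cycle branch penalty can be folded into an average cycles-per-instruction (CPI) figure. The following is a minimal sketch with a hypothetical function name. The fraction of instructions that incur the penalty is an assumed input, not a measured RX62N value.

```c
#include <assert.h>

/* Average cycles per instruction when a given fraction of the instructions
 * are branches that each stall the pipeline for penalty_cycles cycles. */
double effective_cpi(double base_cpi, double branch_fraction,
                     unsigned penalty_cycles)
{
    assert(base_cpi > 0.0);
    assert(branch_fraction >= 0.0 && branch_fraction <= 1.0);
    return base_cpi + branch_fraction * (double)penalty_cycles;
}
```

With an ideal base CPI of 1, a two-cycle penalty, and an assumed 25% of instructions incurring that penalty, the effective CPI is 1.5, so the program takes 50% longer than it would on a stall-free pipeline.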
