Dezső Sima
2.2.6 Basic Alternatives and Possible Implementation Schemes of Register Renaming
In the design space of register renaming, each possible combination of the available design choices theoretically yields one possible implementation alternative. Instead of considering all possible implementation alternatives, it makes sense to focus only on those that differ from each other in relevant qualitative aspects. We designate these alternatives the basic alternatives. Possible basic alternatives can be derived from the design space in two steps: first by identifying the relevant qualitative design aspects and then by composing their possible combinations.
Concerning the selection of the relevant qualitative design aspects, we recall the design space of renaming, shown in Fig. 2.10. First, we can ignore two main aspects: the scope of register renaming, as recent processors typically implement full renaming, and the rename rate, because of its quantitative character. Thus, two main design aspects remain: the layout of the rename buffers and the implementation of register mapping. Furthermore, as Fig. 2.12 indicates, the layout of the rename buffers itself covers three design aspects: the type of rename buffers, the number of rename buffers, and the number of read and write ports. Of these, only the type of the rename buffers is of qualitative character. From the design aspect layout of the register mapping (Fig. 2.16), we consider the method of keeping track of actual mappings the only relevant aspect. It follows that the design space of register renaming includes only two relevant qualitative aspects: the type of the rename buffers and the method of keeping track of actual mappings.
The design choices available for these two relevant design aspects result in nine possible combinations, called the basic alternatives for register renaming (as shown in Fig. 2.19), if we neglect the unpromising possibility of implementing rename buffers in the shelving buffers. In addition, the operand fetch policy of the processor, which is a design aspect of shelving, significantly affects how the rename process is carried out, so the figure also takes this aspect into account. This splits the nine basic renaming alternatives into 18 feasible implementation schemes. The figure also indicates the implementation schemes that are used in relevant superscalar processors and gives some hints about their origins. As Fig. 2.19 indicates, relevant superscalar processors make use of only five of the nine possible basic alternatives of renaming. Moreover, the latest processors employ mostly the following four basic alternatives of renaming:
1. Use of merged architectural and rename register files and of mapping tables (POWER4, POWER5, Pentium 4, as well as K7 (Athlon) and K8 (Hammer) for floating-point processing)
2. Use of separate rename register files and mapping registers within the rename buffers (PA8x00 line, POWER3)
3. Renaming within the ROB and using mapping tables (Pentium Pro, Pentium II, Pentium III, Pentium M, Core)
4. Renaming within the ROB and using a future file (K7 (Athlon) and K8 (Hammer) for fixed-point processing)
We emphasize that a few processors use different basic alternatives for renaming FX- and FP-instructions, as is manifested for instance in the K7 and K8 processors. These processors use the ROB for renaming FX-instructions and a merged architectural and rename register file for renaming floating-point ones.
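The counts of nine basic alternatives and 18 implementation schemes quoted above can be reproduced by enumerating the design choices. The following is a minimal sketch, not part of the original text; the labels follow the wording of the text and of Fig. 2.19, while the variable names are illustrative assumptions.

```python
from itertools import product

# Labels follow the text; the variable names are illustrative only.
RENAME_BUFFER_TYPES = [
    "merged architectural and rename register file",
    "separate rename register files",
    "renaming within the ROB",
]
MAPPING_METHODS = [
    "mapping within the RBs",
    "using a mapping table",
    "using a future file",
]
OPERAND_FETCH_POLICIES = ["dispatch-bound", "issue-bound"]

# Basic alternatives: every buffer type combined with every mapping method.
basic_alternatives = list(product(RENAME_BUFFER_TYPES, MAPPING_METHODS))

# Implementation schemes: each basic alternative under both fetch policies.
implementation_schemes = list(product(basic_alternatives, OPERAND_FETCH_POLICIES))

print(len(basic_alternatives))      # 9
print(len(implementation_schemes))  # 18
```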
2.2.6.1 Implementation of the Rename Process
With reference to Section 2.2.2, we emphasize that the rename process can be broken down into the following subtasks:
1. Renaming the destination registers
2. Renaming the source registers
3. Fetching the renamed source operands
4. Updating the rename buffers
5. Updating the architectural registers with the content of the rename buffers
6. Reclaiming of the rename buffers
7. Recovery from wrongly performed speculative execution and handling of exceptions
FIGURE 2.19 Basic implementation alternatives of register renaming (RB designates rename buffer). [Figure not reproduced. It arranges the basic alternatives by type of rename buffer (merged architectural and rename register file, separate rename register files, renaming within the ROB) and by mapping method (mapping within the RBs, using a mapping table, using a future file), splits each alternative into dispatch-bound and issue-bound operand fetching, and lists the proposals and processors that belong to each implementation scheme. Figure footnote: ∗ The shelving buffers are also implemented in the ROB. The resulting unit is occasionally called the DRIS.]
These subtasks are carried out more or less differently in the 18 distinct implementation schemes of renaming.
Of these, in Section 2.2.2 we described the rename process for one particular basic alternative (the use of rename register files and mapping tables) in both operand-fetch scenarios, that is, in two implementation schemes. Below, instead of pointing out all differences in all further implementation schemes of register renaming, we focus on only three particular tasks of renaming and point out the significant differences encountered in different implementation schemes. In addition, we briefly discuss how inter-instruction dependencies are dealt with during renaming, how the processor recovers from misspeculations, and how it handles exceptions.
2.2.6.1.1 Remarks on Renaming Destination Registers
How the processor allocates new rename buffers depends on the type of rename buffers used. If rename buffers are realized in the ROB, a new ROB entry, and thereby a new rename buffer, is automatically allocated to each dispatched instruction. Otherwise, rename buffers need to be allocated only to those dispatched instructions that include a destination register.
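The difference between the two allocation rules can be illustrated with a short sketch. It is only an illustration under simplifying assumptions: the `Instr` record, the function name `allocate`, and the sequential numbering of buffers are hypothetical and do not come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    name: str
    dest: Optional[str]  # architectural destination register, or None


def allocate(instrs, buffers_in_rob):
    """Return {instruction name: rename buffer index} for dispatched instructions."""
    allocation = {}
    next_free = 0
    for ins in instrs:
        # In the ROB every dispatched instruction receives an entry;
        # otherwise only instructions with a destination register do.
        if buffers_in_rob or ins.dest is not None:
            allocation[ins.name] = next_free
            next_free += 1
    return allocation


group = [Instr("mul", "r2"), Instr("store", None), Instr("add", "r3")]
print(allocate(group, buffers_in_rob=True))   # {'mul': 0, 'store': 1, 'add': 2}
print(allocate(group, buffers_in_rob=False))  # {'mul': 0, 'add': 1}
```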
2.2.6.1.2 Remarks on Updating the Architectural Registers
As discussed previously, when instructions complete, their results need to be forwarded from the associated rename buffers into the originally addressed architectural registers. Where rename buffers are implemented separately from the architectural register file (as a stand-alone rename register file, in the ROB, or in the shelving buffer file), this task requires the processor to physically transfer the contents of the related rename buffers into the referenced architectural registers. By contrast, if the processor uses a merged architectural and rename register file, no physical data transfer is required; instead, only the status of the related registers needs to be changed, as indicated before and shown in Fig. 2.15.
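The contrast between a physical transfer and a mere status change can be sketched as follows. This is a simplified model, not the scheme of any particular processor; the function names and the status labels "rename" and "architectural" are assumptions made for illustration.

```python
def complete_separate(arch_regs, rename_buffers, rb_index, arch_name):
    """Separate rename buffers: the result is physically copied."""
    arch_regs[arch_name] = rename_buffers[rb_index]


def complete_merged(status, phys_index):
    """Merged register file: no data moves, only the status changes."""
    status[phys_index] = "architectural"


# Separate rename buffers (stand-alone file or ROB entries).
arch_regs = {"r2": 0}
rename_buffers = [42]
complete_separate(arch_regs, rename_buffers, 0, "r2")
print(arch_regs)  # {'r2': 42}

# Merged architectural and rename register file.
values = [0, 42]                      # physical register contents
status = ["architectural", "rename"]  # hypothetical status labels
complete_merged(status, 1)
print(values, status)  # [0, 42] ['architectural', 'architectural']
```

In the merged case the previous instance of the architectural register (index 0 here) still has to be reclaimed later, which the sketch does not model.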
2.2.6.1.3 Remarks on Reclaiming Rename Buffers
The conditions for reclaiming rename buffers that are no longer used vary with the rename scheme employed. When operands are fetched dispatch bound, the associated rename buffers may be reclaimed immediately after an instruction has been completed. On the other hand, if the processor fetches operands issue bound, the associated rename buffers may be reclaimed only after the related instruction has been completed and, in addition, it is certain that no operand fetch requests to that rename buffer are still outstanding. The latter condition can be checked in different ways. One possibility is to use a counter for each rename buffer to keep track of outstanding fetch requests, as described in Section 2.2.2. Another option is applicable with merged architectural and rename register files. In this case, however, a rename buffer becomes an architectural register in the course of instruction processing, so reclaiming concerns architectural registers that are no longer used, as discussed in Section 2.2.4.2. This method relies on keeping track of the most recent earlier instance of the same architectural register, and on reclaiming it when the instruction giving rise to the new instance completes [28].
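The counter-based check can be sketched as below. This is a minimal model under stated assumptions: the class and method names are hypothetical, and the counter is simply incremented when a consumer is renamed to the buffer and decremented when that consumer fetches its operand.

```python
class RenameBuffer:
    """Tracks whether a rename buffer may be reclaimed (illustrative model)."""

    def __init__(self):
        self.pending_fetches = 0         # outstanding operand fetch requests
        self.producer_completed = False  # has the producing instruction completed?

    def consumer_renamed(self):
        self.pending_fetches += 1

    def operand_fetched(self):
        self.pending_fetches -= 1

    def producer_completes(self):
        self.producer_completed = True

    def reclaimable(self, issue_bound):
        if not self.producer_completed:
            return False
        # Dispatch-bound fetching: completion alone is sufficient.
        # Issue-bound fetching: additionally, no fetch may be outstanding.
        return (not issue_bound) or self.pending_fetches == 0


rb = RenameBuffer()
rb.consumer_renamed()                  # a later instruction will read this buffer
rb.producer_completes()
print(rb.reclaimable(issue_bound=True))   # False
rb.operand_fetched()
print(rb.reclaimable(issue_bound=True))   # True
```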
2.2.6.1.4 Renaming of Destination and Source Registers if Inter-Instruction Dependencies Exist between the Instructions Dispatched in the Same Cycle
As we know, shelving relieves the processor of the need to check for data and control dependencies as well as for busy EUs during instruction dispatch. Nevertheless, despite shelving, instructions dispatched in the same cycle must still be checked for inter-instruction dependencies, and, if dependencies exist, the renaming must be adjusted accordingly. Let us assume, for instance, that there is a RAW dependency between two subsequent instructions dispatched in the same cycle, as in the following example:
i1: mul r2, . . . , . . .
i2: add . . . , r2, . . .
Here, i2 needs the result of i1 as r2 is one of its source operands. We will also assume that the destination register of i1 (r2) will be renamed to r33 as follows:
i1': mul r33, . . . , . . .
In this case, the RAW-dependent source operand of i2 (r2) has to be renamed to r33 rather than to the rename buffer that was allocated to r2 before i1 was renamed.
Similarly, WAW dependencies may exist among the instructions dispatched in the same cycle, as for instance between the instructions
i1: mul r2, . . . , . . .
i2: add r2, . . . , . . .
In this case, different rename buffers obviously need to be allocated to the destination registers of i1 and i2, as shown below:
i1': mul r34, . . . , . . .
i2': add r35, . . . , . . .
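Both cases can be reproduced with a small model that renames the instructions of one dispatch group in program order. This is only a sketch of the outcome: real rename logic typically compares register numbers within the group in parallel, whereas this model uses a sequential loop. The function name, the tuple format, and the source registers r1, r3, r4, and r5 are hypothetical, since the examples above leave the remaining operands unspecified.

```python
def rename_group(group, mapping, free_buffers):
    """Rename instructions dispatched in the same cycle.

    group: list of (opcode, dest, sources) in program order
    mapping: dict architectural register -> rename buffer (updated in place)
    free_buffers: list of free rename buffers, allocated front to back
    """
    renamed = []
    for opcode, dest, sources in group:
        # Sources are looked up first, so they see mappings created by
        # earlier instructions of the same group (RAW case).
        new_sources = [mapping.get(src, src) for src in sources]
        new_dest = None
        if dest is not None:
            # Every destination gets its own buffer (WAW case).
            new_dest = free_buffers.pop(0)
            mapping[dest] = new_dest
        renamed.append((opcode, new_dest, new_sources))
    return renamed


# RAW: the source r2 of add is renamed to r33, the buffer of mul.
raw = rename_group([("mul", "r2", ["r1", "r3"]), ("add", "r4", ["r2", "r5"])],
                   {}, ["r33", "r34"])
print(raw)  # [('mul', 'r33', ['r1', 'r3']), ('add', 'r34', ['r33', 'r5'])]

# WAW: both instructions write r2 but receive different buffers.
waw = rename_group([("mul", "r2", ["r1", "r3"]), ("add", "r2", ["r4", "r5"])],
                   {}, ["r34", "r35"])
print(waw)  # [('mul', 'r34', ['r1', 'r3']), ('add', 'r35', ['r4', 'r5'])]
```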
2.2.6.1.5 Recovery of the Rename Process from Wrongly Executed Speculation and Handling of Exceptions
If the processor performs speculative execution, for instance, due to branch prediction, the speculation may turn out to be wrong. In this case, the processor needs to recover from the misspeculation. This involves essentially two tasks: (i) to undo all register mappings set up during speculation, and (ii) to reclaim the rename buffers allocated, as already discussed. To invalidate established mappings there are two basic methods to choose from, independent of the actual implementation of renaming. The first option is to roll back all register mappings made during speculative execution, by using the identifiers of the faulty instructions, supplied by the ROB. With this alternative, the recovery process lasts several cycles, since the processor can cancel only a small number of instructions (two to four) per cycle. The second alternative is based on checkpointing. In this method, before the processor begins speculative execution, it saves the relevant machine state, including the actual mapping, in shadow registers. If the speculative execution turns out to be wrong, the processor restores the machine state in a single cycle by reloading the saved state. For instance, both the PM1 (SPARC64) and the R10000 use checkpointing for recovery. Both processors incorporate mapping tables for register mapping; the R10000 provides four sets of shadow registers and the PM1 provides 16 for subsequent speculations.
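The two basic recovery methods can be contrasted in a simplified model of a mapping table. This is a sketch under stated assumptions rather than a description of any processor: the class, its method names, and the per-cycle accounting are hypothetical, and the default of four checkpoints merely mirrors the four shadow register sets mentioned for the R10000.

```python
class MappingTable:
    def __init__(self, max_checkpoints=4):
        self.table = {}        # architectural register -> rename buffer
        self.history = []      # (register, previous mapping) per rename
        self.checkpoints = []  # (saved table, history length) pairs
        self.max_checkpoints = max_checkpoints

    def rename(self, arch_reg, buffer):
        self.history.append((arch_reg, self.table.get(arch_reg)))
        self.table[arch_reg] = buffer

    def checkpoint(self):
        """Save the current mapping before speculation starts."""
        if len(self.checkpoints) == self.max_checkpoints:
            raise RuntimeError("no free shadow registers")
        self.checkpoints.append((dict(self.table), len(self.history)))

    def restore_checkpoint(self):
        """Reload the most recent saved mapping in one step."""
        saved, hist_len = self.checkpoints.pop()
        self.table = saved
        del self.history[hist_len:]

    def roll_back(self, count, per_cycle=4):
        """Undo the last `count` renames; return the number of cycles used."""
        cycles = 0
        while count > 0:
            for _ in range(min(count, per_cycle)):
                arch_reg, previous = self.history.pop()
                if previous is None:
                    del self.table[arch_reg]
                else:
                    self.table[arch_reg] = previous
                count -= 1
            cycles += 1
        # Checkpoints taken after the rolled-back point are no longer valid.
        self.checkpoints = [c for c in self.checkpoints if c[1] <= len(self.history)]
        return cycles


t = MappingTable()
t.rename("r1", "r32")
t.checkpoint()               # speculation starts here
t.rename("r2", "r33")
t.rename("r1", "r34")
t.restore_checkpoint()
print(t.table)               # {'r1': 'r32'}
```

Rolling back the same two renames with `roll_back(2)` yields the same table but, in this model, costs one cycle per group of up to four cancelled renames.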
We note that beyond the two basic methods discussed above, there is also a third option in the case when the processor uses mapping tables and issue-bound operand fetching. This method relies upon shadow mapping tables, which keep track of the actual mappings of the completed instructions. The entries of the shadow tables are set up when instructions complete and are deleted when allocated rename buffers are reclaimed. In the case of misspeculation, the correct state of the mapping table can be restored by loading the content of the shadow table. For example, Cyrix’s M3 makes use of this recovery mechanism.
The second task to be done during recovery from misspeculation is to reclaim the rename buffers that are allocated to the faulty instructions. This task can easily be performed by changing the state of the rename buffers involved to available, as indicated in Figs. 2.8 and 2.14.
A situation similar to the misspeculation described above arises when exceptions occur. In this case, the exception request must wait until the associated instruction comes to completion to provide precise exceptions [46]. At this time, the processor accepts the exception and cancels all instructions that have been dispatched after the failing one. For cancellation of the rename process, the same methods can be used as discussed above. For example, in the event of an exception the R10000 rolls back all younger register mappings made, whereas the PM1 first restores the mapping state to the first checkpoint after the failing instruction in one cycle, and then rolls back the remaining mappings until the failing instruction is reached.