• No results found

Application of Compiler-Assisted Multiple Instruction Rollback Recovery to Speculative Execution

N/A
N/A
Protected

Academic year: 2021

Share "Application of Compiler-Assisted Multiple Instruction Rollback Recovery to Speculative Execution"

Copied!
19
0
0

Loading.... (view fulltext now)

Full text

(1)

July 1993

UILU-ENG-93-2229

CRHC-93-16

Center for Reliable and High-Performance Computing

APPLICATION OF

COMPILER-ASSISTED

MULTIPLE INSTRUCTION

ROLLBACK RECOVERY TO

SPECULATIVE EXECUTION

N.J. Alewine

W. K. Fuchs

W.-M. Hwu

Coordinated Science Laboratory

College of Engineering

UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Approved for Public Release. Distribution Unlimited.

(2)

UNCLASSIFIED

J a

xj_

{EduRifv C

lassification of this page

REPORT DOCUMENTATION PAGE

1a. REPORT SECURITY CLASSIFICATION

Unclassified _________

1b. RESTRICTIVE MARKINGS

None

2a. SECURITY CLASSIFICATION AUTHORITY

2b. DECLASSIFICATION/DOWNGRADING SCHEDULE

3. DISTRIBUTION/AVAILABILITY OF REPORT

Approved for public release;

distribution unlimited

4. PERFORMING ORGANIZATION REPORT NUMBER(S)

UILU-ENG-93-2229 CRHC-93-16

5. MONITORING ORGANIZATION REPORT NUMBER(S)

6a. NAME OF PERFORMING ORGANIZATION

Coordinated Science Lab

University of Illinois

6b. OFFICE SYMBOL

(If applicable)

N/A

7a. NAME OF MONITORING ORGANIZATION

National Aeronautics and Space Administration

6c ADDRESS (G'ty, State, and ZIP Code)

1101 W. Springfield Avenue

Urbana, IL 61801

7b. ADDRESS (City, State, and ZIP Code)

Moffitt Field, CA

8a. NAME OF FUNDING/SPONSORING ORGANIZATION

7a

8b. OFFICE SYMBOL

(If applicable)

9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER

8c. ADDRESS (City, State, and ZIP Code) 10. SOURCE OF FUNDING NUMBERS

PROGRAM

PROJECT

TASK

7b

ELEMENT NO.

NO.

NO.

WORK UNIT ACCESSION NO.

11. TITLE (Include Security Classification)

Application of Compiler-Assisted Multiple Instruction Rollback Recovery to Speculative

Execution

12. PERSONAL AUTHOR(S)

ALEWINE, N. J, W. K. Fuchs, and W.-M. Hwu

13a. TYPE OF REPORT

Technical__

13b. TIME COVERED FROM_________ _ TO

14. DATE OF REPORT (fear, Month, Day)

1

QQ

3

Tnlv

19

---15. PAGE COUNT

18

16. SUPPLEMENTARY NOTATION

I

17

COSATI CODES 18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number) FIELD GROUP SUB-GROUP r o l l b a c k r e c o v e r y , c o m p i l e r - a s s i s t e d m u l t i p l e i n s t r u c t i o n ,

t r a n s i e n t p r o c e s s o r f a i l u r e s , i n s t r u c t i o n a l l e v e l p a r a i -Jj3 .1 ÌS 2 L _ 19. ABSTRACT (Continue on reverse if necessary and identify by block number)

Speculative execution is a method to increase instruction level parallelism which can be exploited by both super-scalar and

V L I W

architectures. The key to a successful general speculation strategy is a repair mechanism to handle mispredicted branches and accurate reporting of exceptions for speculated instrucitons. Multiple instruction rollback is a technique developed for recovery from transient processor failure. Many of the difficulties encountered during recovery from branch misprediction or from instruction re-execution due to exception in a speculative execution arChitecute are similar to those encountered during multiple instruction rollback.

This paper investigates the applicability of a recently developed compiler-assisted multiple instruciton rollback scheme to aid in speculative exectuion repair. Extensions to the ocmpiler-assisted scheme to support branch and exception repair are presented along with performance measurements across ten application programs.

20. DISTRIBUTION /AVAILABILITY OF ABSTRACT

| 0UNCLASSIFIED/UNLIMITED □ SAME AS RPT. □ DTIC USERS

21. ABSTRACT SECURITY CLASSIFICAI U n c l a s s i f i e d

ION

I 22a. NAME OF RESPONSIBLE INDIVIDUAL 22b. TELEPHONE (Include Area Code) 22c. OFFICE SYMBOL

DD FORM 1473,

84 MAR 83 APR edition may be used until exhausted.

All other editions are obsolete. SECURITY CLASSIFICATION OF THIS PAGE

UNCLASSIFIED

(3)

UNCLASSIFIED____________

ICCUWITY C L A M I FI CATION O F THII f»AOE

(4)

TO APPEAR: WORKSHOP OH HARDWARE AND SOFTWARE ARCHITECTURES FOR FAULT

TOLERANCE: PERSPECTIVES AND TOWARDS A SYNTHESIS

JUNE 14-16, 1993

LE MONT SAINT-MICHEL, FRANCE

APPLICATION OF COMPILER-ASSISTED MULTIPLE

INSTRUCTION ROLLBACK RECOVERY TO

SPECULATIVE EXECUTION

N. J. Alewine* W . K. Fuchs, W .-M . Hwu

Center for Reliable and High-Performance Computing

Coordinated Science Laboratory

University of Illinois at Urban a-Champaign

Abstract

Speculative execution is a method to increase in­ struction level parallelism which can be exploited by both super-scalar and VLIW architectures. The key to a successful general speculation strategy is a repair mechanism to handle mispredicted branches and ac­ curate reporting o f exceptions for speculated instruc­ tions. Multiple instruction rollback is a technique developed for recovery from transient processor fail­ ures. Many o f the difficulties encountered during re­ covery from branch misprediction or from instruction re-execution due to exceptions in a speculative exe­ cution architecture are similar to those encountered during multiple instruction rollback.

This paper investigates the applicability o f a re­ cently developed compiler-assisted multiple instruc­ tion rollback scheme to aid in speculative execution repair. Extensions to the compiler-assisted scheme to support branch and exception repair are presented along with performance measurements across ten ap­ plication programs.

1

Introduction

Super-scalar and VLIW architectures have been shown effective in exploiting instruction level paral­ lelism (ILP) present in a given application [1-3]. Cre­ ating additional ILP in applications has been the sub­ ject o f study in recent years [4-6]. Code motion within

a basic block is insufficient to unlock the full potential o f super-scalar and VLIW processors with issue rates

* International Business M achines C orporation, B oca R aton, FI.

1 This research was supported in part by the N ational Aero­ nautics and Space Adm inistration (N A SA) under grant NASA NAG 1-613, in cooperation with the Illinois Com puter Labora­ tory for Aerospace Systems and Software (IC LA SS), and in part by the Departm ent o f the Navy and m anaged by the O ffice o f the C hief o f Naval Research under Contract N00014-91-J-1283.

greater than two [3]. Given a trace o f the most fre­ quently executed basic blocks, limited code movement across block boundaries can create additional ILP at the expense o f requiring complex compensation code to ensure program correctness [7]. Combining multiple basic blocks into superblocks permits code movement within the superblock without the compensation code required in standard trace scheduling [3].

General upward and downward code movement across trace entry points (joins) and general down­ ward code motion across trace exit points (branches, or forks) is permitted without the need for special hardware support [7]. Sophisticated hardware support is required, however, for unrestricted upward code mo­ tion across a branch boundary. Such code motion is referred to as speculative execution and has been shown to substantially enhance performance over non- speculated architectures [8-10]. This paper focuses on the support hardware for speculative execution, which ensures correct operation in the presence o f except­ ing speculated instructions (referred to as exception repair) and o f mispredicted branches (referred to as branch repair). It is shown that data hazards which re­ sult from exception and branch repair are very similar to data hazards that result from multiple instruction rollback, and that techniques used to resolve rollback data hazards are applicable to exception and branch repair.

The remainder o f the paper is organized as follows. Section 2 gives a brief overview o f a compiler-assisted multiple instruction rollback (M IR) scheme to be used as a base for application to speculative execution re­ pair (SER). Section 3 describes speculative execution and the requirements for exception repair and branch repaur. Section 4 introduces a schedule reconstruc­

tion scheme and extends the compiler-assisted rollback

scheme. Section 5 describes read buffer flush costs auad Section

6

presents performance impacts which result

(5)

from read buffer flushes.

2 C om p iler-A ssisted M u ltip le Instruc­

tion R ollback Recovery

2.1

Hazard Classification

Within a general error model, data hazards result­ ing from instruction retry are o f two types [11-13]. On-path hazards are those encountered when the in­ struction path after rollback is the same as the initial path and branch hazards are those encountered when the instruction path after rollback is different than the initial path. As shown in Figure 1, rx represents an on-path hazard where during the initial instruction

se-Figure 1: On-path data hazard.

quence rx is written and after rollback is read prior to being re-written. As shown in Figure 2, ry represents a branch hazard where the initial instruction sequence writes ry and after rollback ry is read prior to being re­ written however this time not along the original path.

2.2

On-path Hazard Resolution Using a

Read Buffer

Hardware support consisting o f a read buffer o f size

2

N , as shown in Figure 3, has been shown to be ef­

fective in resolving on-path hazards [11-13]. The read buffer maintains a window o f register read history. If am on-path hazard is present, then prior to writing over the old value o f the hazard register, a read of that value must have taken place within the last N instructions (else after rollback o f < N , a read o f the hazard register would not occur before a redefinition). Key to this scenario is the fact that the original path is repeated. Branch hazard resolution is left to the

Figure 2: Branch data hazard.

Figure 3: Read buffer.

compiler. At rollback, the reawl buffer is flushed back to the general purpose register file (G P R F ), restoring the register file to a restartable state. The primary advantage o f the read buffer is that it does not require an additional reawl port as with a history buffer, repli­ cation o f the GPRF as with the future file, or bypass logic as with the reorder buffer or delayed write buffer [14,15].

2.3

Branch Hazard Removal Compiler

Transformations

Compiler transformations have been shown to be effective in resolving branch hazards [11,12]. Branch hazard resolution occurs at three levels;

1

) pseudo code, 2) machine code, and 3) post-pass. Resolution at the pseudo code level would be accomplished by renaming the pseudo register ry o f instruction /,•

(6)

(Fig-ure 2) to rz . Node splitting, loop expansion and loop protection transformations aid in breaking pseudo reg­ ister equivalence relationships so that renaming can be performed. After the pseudo registers are mapped to physical registers, some branch hazards could re­ appear. This is prevented at the machine code level by adding hazard constraints to live range constraints prior to register allocation. Branch hazards that re­ main after the first two levels can be resolved by either creating a ‘‘covering’’ on-path hazard or by inserting nop (no operation) instructions ahead o f the hazard instruction until the rollback is guaranteed to be un­ der the branch. Given the branch hazard o f Figure

2

, a covering on-path hazard is created by inserting an MOV ry, ry instruction immediately before the in­ struction in which ry is defined. This guarantees that the old value o f ry is loaded into the read buffer and is available to restore the register file during rollback.

• •

Figure 5: Speculated instruction traps.

3.1

Branch Repair

3

Speculative Execution

Figures 4 and 5 illustrate the two basic problems which are encountered when attempting upward code motion across a branch. As shown in Figure 4, if the

'* - r* o v branch taken ---1 rI - r2 *

3 2

Figure 4: r\ in live_out o f taken path.

speculated instruction (i.e., an instruction moved up­ ward past one or more branches) modifies the system state, and due to the branch outcome the speculated instruction should not have been executed, program correctness could be affected. Figure 5 illustrates that if the speculated instruction causes an exception, and again due to the branch outcome, the excepting in­ struction should not have been executed, program per­ formance or even program correctness could be af­ fected.

Figure

6

shows an original instruction schedule and a new schedule after speculation. Instructions d, *, and / have been speculated above branches c and

g from their respective fall-through paths

.2

Specu­ lated instructions Me marked “ (s).” The motivation for such a schedule might be to hide the load delay o f the speculated instructions or to allow more time for the operands o f the branch instructions to become available. If c commits to the taken path (i.e., it is mispredicted by the static scheduler), some changes to the system state that have resulted from the execu­ tion o f d, *, and / , may have to be undone. No update is required for the PC; execution simply begins at j. If instead, c commits to the fall-through path but g commits to the taken path, then only i ’s changes to the system state may have to be undone.

Not all changes to the system state are equally im­ portant. If for example, d writes to register rx and

rx £ liv e.in (j) (i.e., along the path starting at j , a

redefinition o f rx will be encountered prior to a use of

rx [16]), then the original value o f rx does not have

to be restored. Inconsistencies to the system state as a result o f mispredicted branches exhibit similari­ ties to branch hazards in multiple instruction rollback [11,12]. Given this similarity between branch haz­ ards due to instruction rollback and branch hazards due to speculative execution, compiler-driven data­ flow manipulations, similar to those developed to elim­ inate branch hazards for MIR [11,12], cam be used to resolve branch hazards that result from speculation. Such compiler transformations have been proposed for

3For this exam ple it is assumed that the fall-through paths are the m ost likely outcom e o f the branch decisions at c and g.

(7)

a a RB_c: d b « d e 0 - * - j « i f d b i e («)f jum pLl f 0 j ■ 1 : 0 * k e RB_g: h h L 1 :[ ? J -^ k i i h jump L2 2: L2:

Original Speculated Recovery

Schedule Schedule

m

a

^

^--

m

Blocks

Figure

6

: Branch repair.

branch misprediction handling [9]. Since re-execution o f speculated instructions is not required for branch misprediction, compiler resolution o f branch hazards becomes a sufficient branch repair technique.

3.2

Exception Repair

Figure

6

also demonstrates the handling o f spec­ ulated trapping instructions. If d is a trapping in­ struction and an exception occurred during its execu­ tion, handling o f the exception must be delayed until c commits so that changes to the system state are mini­ mized, and in some cases to ensure that repair is pos­ sible in the event that c is mispredicted. If c commits to the taken path, the exception is ignored and d is haul died like any other speculated instruction given a branch mispredict. If c was correctly predicted, three exception repair strategies are possible. The first is to undo the effects o f only those instructions speculated above c (i.e., d, i, and / ) and then branch to a recov­ ery block R B .c [10] as shown in Figure

6

. The address of the recovery block can be obtained by using the PC value o f the excepting instruction as am index into a hash table. This strategy ensures precise interrupts [14,17] relative to the nonspeculated schedule but not relative to the original schedule. Recovery blocks can cause significamt code growth [10]. The second strat­ egy undoes the effects o f all instructions subsequent to

d (i.e., i,

6

, and / ) , handles the exception, amd resumes execution at instruction i [9]. This latter strategy pro­ vides restartable states and does not require recovery blocks. A third exception repair strategy undoes the effects o f only those subsequent instructions that are speculated above c (i.e., only i amd / ) , handles the ex­

ception, and resumes execution at instruction «, how­ ever, this time only executing speculated instructions until c is reaudied. The improved efficiency o f strategy

3

over that o f strategy

2

comes at the cost o f slightly more complex exception repair hardware.

When a branch commits and is mispredicted, the exception repair hardware must perform three func­ tions:

1

) determine whether am exception has occurred during the execution o f a speculated instruction,

2

) if an exception has occurred, determine the PC value o f the excepting instruction, amd 3) determine which changes to the system state must be undone. Func­ tions

1

and

2

aure similar to error detection amd location in multiple instruction rollback. Function 3 is similar to on-path hazard resolution in multiple instruction rollback [11,12,18]. On-path hazards assume that af­ ter rollback the initiad instruction sequence from the faulty instruction to the instruction where the error was detected is repeated.

Figure 7 illustrates the speculation o f a group of

:

Figure 7: Exception repadr.

instructions amd re-execution strategy 3. The load in­ struction traps, but the exception is not hamdled un­ til the bramch instruction commits to the fall-through path. Control is then returned to the trapping instruc­ tion. This scenario is identicad to multiple instruction rollbaudc where am error occurs during the loaui instruc­ tion and is detected during the bramch instruction. For this examiple, only r*i must be restored during rollback since amd r

5

will be rewritten prior to use during re-execution. Figure 7 shows that exception repair

(8)

hazards in speculative execution are the same as on- path hazards in multiple instruction rollback, and a read buffer as described in Section 2 can be used to resolve these hazards. The depth o f the read buffer is the maximum distance from h to In along any back­ wards walk3, where Jn is a trapping instruction that was speculated above branch instruction h

-3.3

Schedule Reconstruction

Assumed in Figures

6

and 7 are mechanisms to identify speculative instructions, determine the PC value o f excepting speculated instructions, and deter­ mine how many branches a given instruction has been speculated above. An example o f the latter case is shown in Figure

6

where instructions d, t, and / , are undone if c is mispredicted; however, only i must be undone if g is mispredicted.

If the hardware had access to the original code schedule, the design o f these mechanisms would be straightforward. Unfortunately, static scheduling re­ orders instructions at compile-time and information as to the original code schedule is lost. To enable recov­ ery from mispredicted branches and proper handling of speculated exceptions, some information relative to the original instruction order must be present in the compiler-emitted instructions. This will be referred to as schedule reconstruction.

By limiting the flexibility o f the scheduler, less in­ formation about the original schedule is required. For example, if speculation is limited to one level only (i.e., above a single branch), a single bit in the opcode fleld is sufficient to indicate that the instruction has been moved above the next branch [

8

]. The hardware would then know exactly which instruction effects to undo (i.e., the ones with this bit set). Also, remov­ ing branch hazards directly with the compiler permits general speculation with no schedule reconstruction for branch repair [9].

4

Im plicit Index Schedule Reconstruc­

tion

Implicit index scheduling supports general specula­

tion o f regular and trapping instructions. The scheme was inspired by the handling o f stores in the sentinel scheduling scheme [9] and was designed to exploit the unique properties o f the read buffer hardware design described in Section 2. Schedule reconstruction is ac­ complished by marking each instruction speculated or

3 A walk is a sequence o f edge traversals in a graph where the edges visited can be repeated [19].

nonspeculated by including a bit in the opcode field,

and using this encoding to maintain an operand his­ tory o f speculated instructions in a FIFO queue called a speculation read buffer (SRB). The SRB operates similar to a read buffer with additional provisions for exception handling.

4.1

Exception Repair Using a Speculation

Read Buffer

Figure

8

shows an original code schedule and two speculative schedules, along with the contents o f the SRB at the time branches Ie and Ig commit. Instruc­ tions Id and If have been speculated above branch instruction / c, and has been speculated above both

Ig and Ic. The encoding o f speculated instructions in­

forms the hardware that the source operands are to be saved in the SRB, along with the source operand values, corresponding register addresses, and the PC o f the speculated instruction.

Speculated instructions execute normally unless they trap. If a speculated instruction traps, the ex­ ception bit in the SRB which corresponds to the trap­ ping instruction is set and program execution contin­ ues. Subsequent instructions that use the result o f the trapping instruction are allowed to execute normally.

A chk^except(k) instruction is placed in the home block o f each speculated instruction. Only one

chk-except(k) instruction is required for a home block.

As the name implies, chL.except(k) checks for pend­ ing exceptions. The command cam simultaneously in­ terrogate each location in the SRB by utilizing the bit field k. As shown in schedule 1 o f Figure

8

,

chk.except(0011U) in J' checks exceptions for instruc­

tions Id and Id. If a checked exception bit is set, the SRB is flushed in reverse order, restoring the appropri­ ate register and PC values. Execution can then begin with the excepting instruction.

Figure

8

illustrates several on-path hazards which are resolved by the SRB. In schedule

1

, if /,• traps and the branch /<• commits to the taken path, U has cor­ rupted ri and If has corrupted r

7

. Flushing the SRB up through /, restores both registers to their values prior to the initial execution o f Ii. Note that register re is also corrupted but not restored by the SRB, since after rollback re will be rewritten with a correct value before the corrupted value is used.

As an alternative to checking for exceptions in each home block, the exception could be handled when the exception bit reaches the bottom o f the SRB. This is similar to the reorder buffer used in dynamic schedul­ ing [14] and eliminates the cost o f the chk-except(k) command, however, increases the exception handling

(9)

Original Schedule Speculated Schedule 1 Speculated Schecule 2 rl = r2 * r3

V

rL lr2*

'

L

r3 = r4 + r5

h-bne rJt r3, I.

V

r8 ri a r 7 * r8

VI

rs “ r> * 4

¥

V 0 “ r2 * r3 r7 = r7 + 4 bne r8, v7, 1^ rt - r6 * 4 r , = M E M (r, ) 1 c U o s r h d 2N Ic: bne rJt r3, Ij I’ : chk_except(001 111) V r8 rs + 4 V bne V r7> I*: chk_except(l 10000) lk- r6 r6 + 4 Except bit — 1 I Reg. No. i t . - 0 ic . valuti r?) 7 V value(rg) 8 V valuer7) 7 I.- - 0 1/ valuer2) 2

}

}

}

-I’ : chk_except(l 10011) ri + 4 lg: bne rg , r? , lk I*: chk_except(001100) V r6 = r6 * 4 Except b it— I j R e g .N o .-i I

f e

1 c “ o S r h d 2N i t . - 0 i t . value(r7) 7 I,- - 0 I,- valuer2) 2 V valuerg) 8 V valuefr7) 7

h

} J

}

-SRB Contents SRB Contents

(10)

latency which can impact performance depending on the frequency o f exceptions.

Implicit index scheduling derives its name from the ability o f the compiler to locate a particular register value within the SRB. This is possible only if the dy­ namically occurring history o f speculated instructions is deterministic at branch boundaries. Superblocks guarantee this by ensuring that the sole entry into the superblock is at the header and by limiting specula­ tion to within the superblock. For standard blocks, bookkeeping code [7] can be used to ensure this deter­ ministic behavior.

4.2

Branch Repair Using a Speculation

Read Buffer

As described in Section 2, branch repair can be han­ dled by resolving branch hazards with the compiler. Branch hazard resolution in multiple instruction roll­ back can be assisted by the read buffer when cover­ ing on-path hazards are present, reducing the perfor­ mance cost o f variable renaming [11,12]. In a similar fashion, the SRB can assist in branch repair. Figure 9 shows the original code schedule and the two spec­ ulative schedules o f Figure

8

. For this example, it is assumed that r j, r

3

, r6, and

7*7

are elements in both

live.in (Ij) and livc.in(Ik).

As shown in schedule 1, if branch instruction Ic commits to the taken path, 7*3, re, and rr, which were modified in Ii, Id, and / / , respectively, must be re­ stored. If instead, Ic commits to the fall-through path and Ig commits to the taken path, only

7*2

must be re­ stored. Registers t

*2

and

7*7

are rollback hazards that result from exception repair; therefore, the SRB con­ tains their unmodified values. By including a flush(k) command at the target o f Ie and Ig, the SRB can be used to restore r2 and/or rr given a misprediction of

Ie or Ig.

The flush(k) command selectively flushes the ap­ propriate register values given a branch misprediction. For example, in schedule 2 o f Figure 9, if Ic is predicted correctly and Ig is mispredicted, the SRB is flushed in reverse order up through /«, restoring vaiue(r2) from

Ii but not restoring valu e(rj) from I f . Since specu­

lation is always from the most probable branch path, the flush(k) command is always placed on the most improbable branch path, minimizing the performance penalty. Not all branch hazards are resolved by the presence o f on-path hazards. These remaining haz­ ards can be resolved with compiler transformations.

5

S R B Flush P enalty

The examples o f Section 4 demonstrate that compiler-assisted multiple instruction rollback cam be applied to both bramch repair amd exception repair in a speculative execution architecture. The flush penadty o f the read buffer is not a key concern in multiple in­ struction rollback applications since instruction faults are typicadly very rare. In application to exception re- pair in speculative execution, the SRB flush penalty is also not a major concern due to the infrequency o f ex­ ceptions involving speculated instructions. However, in application to bramch repadr, the SRB flush penadty could produce significamt performance impacts. Stud­ ies o f bramch behavior show a conditioned branch fre­ quency o f 11% to 17% [20]. Static bramch prediction methods result in bramch mispredictions in the ramge o f 5% to 15%. This results in a branch repadr fre­ quency as high as 2.5%. Assuming a CPI (clock cycles per instruction) rate o f one amd am average SRB flush penadty o f ten cycles, the performance overhead o f the flush mechanism would reach 22.5%. This indicates the importamce o f minimizing the amount o f redun- damt data stored in the SRB so that the flush penadty is reduced.

Recently, a technique was proposed to reduce the amount o f redundamt data in a read buffer so that the read buffer size could be reduced [12,13]. A similar technique cam be used to assure that only the data required for bramch amd exception repadr is stored in the SRB. In the implicit index scheme o f Section 4, a bit indicating whether am instruction is speculated is added to the opcode field. By expanded this field to two bits, operamd storage requirements cam be spec­ ified. Figure 10 shows the reduced contents o f the SRB given schedule 1 o f Figure 9. In the modified scheme, only the first reaui o f

7*7

must be maintained. Register r% is not required since it was not modified. The improved scheme adso eliminates blamk spaces in the SRB. For this exauuple, the misprediction o f Ie in schedule 1 o f Figure 9 results in four less variables to flush.

The coding o f the two speculation bits would be as follows:

00

) no save required,

01

) save operamd

1

,

10

) save operamd 2, amd 11) save both operamds. If neither operamd o f a speculated instruction has be saved in the SRB, the instruction is not marked as speculated. This is not a problem for bramch repair: however, if such am instruction traps, the hardware would have no way o f knowing not to hamdle the exception immedi­ ately. There would adso be no entry in the SRB for the exception bit or for the corresponding PC vadue. One solution to the problem would be to auid another bit to

(11)

Original Schedule Speculated Schedule 1 Speculated Schedule 2 V <i - r

2

* r3 V

r3 m r 4 + rS

Ie: bne rJf Tj. Iy V r< ” r7 • rS I .: ri “ r» + 4 y r7 = r7 + 4

I^i bne / j , ry, 1^

I»: r6 mr i * 4 I.: r2 = M EM (r2 ) I r 0 r2 * rJ V 0 r2 * rJ Except b it— , I Reg. No. 2 N l: valuer?) valuefrj valueffj) I^i bne ^7 » I ^ V r6 mr 6 + 4

Iy.

flu s h (lO lllO ) 1^ flush(lOOOOO) — 2 N

r8

r8 + 4 I£ bne fg I Tjt I ^ lK r6 = ri + 4 L : flu s h (lllO lO ) .

1

I*: flush(OOlOOO) — , V - 0 v v a lu er7) 7 I. - 0 I; va lu er2) 2 h valuerg) 8 h v a lu er?) 7 SRB Contents SRB Contents

(12)

Except b it— » Reg. No.

a

a

f é 1 c 2N u o s r h d

a

]f h value(r?) 7 i, vaiue(r2) 2 SRB Contents

Figure 10: SRB with reduced content.

Figure 11: Instrumentation code placement. the opcode field which marks speculated trapping in­ structions. A better solution is to code all speculated trapping instructions which have no operands to save as 01. This will indicate that exception handling is to be delayed and cause a reservation o f an entry in the SRB, and also will slightly increase the flush penalty during branch repairs.

6

Perform ance Evaluation

6.1

Evaluation Methodology

In this section, results o f a read buffer flush penalty evaluation are presented. The instrumentation code segments o f Figure 11 call a branch error procedure which performs the following functions:

1. Update the read buffer model.

2. Force actual branch errors during program exe­ cution, allowing execution to proceed along an incorrect path for a controlled number o f instruc­ tions.

3. Terminate execution along the incorrect path and restore the required system state from the simu­ lated read buffer.

4. Measure the resulting flush cycles during the branch repair.

5. Begin execution along the correct path until the next branch is encountered.

An example instrumentation code segment is shown in Figure 12. Parameters, such as operand saving in­ formation, current PC, branch fall-though PC, and branch target PC values, are passed by the instru­ mentation code to the branch error procedure. An additional miscellaneous parameter contains instruc­ tion type and information used for debugging.

Figure 13 gives a high level flow o f operation for the branch error procedure. When a branch instruction in the original application program is encountered, an

arm-branch flag is set. Prior to the execution o f the

next application instruction, the arm-branch flag is checked, and if set, the branch decision made by the application program is set aside. The branch is then predicted by the branch prediction model. Four mod­ els are used in the evaluation: 1) predict taken, 2) pre­ dict not taken, 3) dynamic prediction, and 4) static prediction from profiling information. The dynamic prediction model is derived from a two bit counter branch target buffer (BTB) design [21] and is the only model that requires updating with each predic­ tion outcome.

After the branch is predicted, the prediction is checked against the actual branch path taken by the application program. If the prediction was correct, ex­ ecution proceeds normally. If the prediction was incor­ rect, the correct branch path is loaded into the recov­ ery queue along with a branch error detection (BED) latency, and the predicted path is loaded into the PC. The BED latency indicates how long the execution of instructions is to continue along the incorrect path. The branch error time-out flag is set when the BED latency is reached. When a branch error is detected, the register file state is repaired using the read buffer contents. The PC value o f the correct branch path is obtained from the recovery queue. During branch er­ ror rollback recovery, the number o f cycles required to flush the read buffer during branch repair is recorded.

(13)

$_3 im lb _ 2 _ 2 4_0 : # i n s t r u c t i o n 24 # B e g in b rs im _ s im h o o k : s i subu l a sw l a sw l a sw l i sw l i sw move j $ s p , $ a t , $ a t , $ a t , $ a t , $ a t , $ at

t

$ a t , $ a t , $ a t , $ a t , $ a t , 44

$

16, s2 - 0 : norm al — hook address — instruction adress — next hook address — miscellaneous _s im lb _ 2 _ 2 4_0 2 0 ($ sp ) $_s im lb _ 2 _ 2 4_1 2 4 ($ sp ) $_3 im lb _ 2 _ 2 5_0 2 8 ($ sp ) 8216 ---3 2 ($ sp ) 16 « b r s im sa v e 4 0 ($ sp ) $sp # End b rs im _ s im h o o k . $ _ s im lb _ 2 _ 2 4_1 : addu $16, $16,

directs read buffer to save register 16 original instruction $_s im lb _ 2 _ 2 5_0 : # i n s t r u c t i o n 25 # B e g in brsim__sim h o o k : s i - 16, s2 - 9: b ra n ch subu $spr 44 l a $ a t , $ s im lb 2 25 0 — hook address sw $ a t , 2 0 ($sp) l a $ a t , $ s im lb 2 25 1 - e - —— instruction adress sw $ a t , 2 4 ($ sp ) l a $ a t , sw $ a t , 2 8 ($ sp ) l i $ a t , 532505 « sw $ a t , 3 2 ($ sp ) l a $ a t , sw $ a t , 3 6 ($ sp ) l i $ a t , 304 e ...— sw $ a t , 4 0 ($ sp ) registers 16 and 9 move $ a t , $sp j brsim __save # End brsim__sim h o o k . $ s im lb 2 25 1 :

bne $16 , $9, $_main__5 original instruction $ main 6:

(14)

Y w update RB model update recovery Queue return

i

• restore GPRF from RB model, record flush cycles • load PC from recovery queue PC - program counter

GPRF - general purpose register file RB - read buffer

BPM - branch prediction model

(15)

Table 1: Application programs. Program Static Size Description

QUEEN 148 eight-queen program WC 181 UNIX utility

QSORT 252 quick sort algorithm CMP 262 UNIX utility GREP 907 UNIX utility PUZZLE 932 simple game COMPRESS 1826 UNIX utility LEX 6856 lexical analyzer YACC 8099 parser-generator CCCP 8775 preprocessor for

gnu C compiler

It is assumed for this evaluation that two read buffer entries can be flushed in a single cycle. This cor­ responds to a split-cycle-save assumption o f the gen­ eral purpose register file [12]. Performance overhead due to read buffer flushes (% increase) is computed as

F lush-O H = 100 .

total-cycles

All instructions are assumed to require one cycle for execution. This assumption is conservative since the MIPS processor used for the evaluation requires two cycles for a load. The additional cycles would increase the total-cycles and thereby reduce the observed per­ formance overhead. In addition to accurately measur­ ing flush costs, the evaluation verifies the operation o f the read buffer and its ability to restore the appropri­ ate system state over a wide range o f applications.

The instrumentation insertion transformation oper­ ates on the s-code emitted by the MIPS code generator of the IMPACT C compiler [3]. The transformation determines which operands require saving in the read buffer and inserts calls to the initialization, branch er­

ror, and summary procedures. The resulting s-code

modules are then compiled and ran on a DECstation 3100. For the evaluation, BED latencies from 1 to 10 were used. Table 1 lists the ten application programs evaluated. Static Size is the number o f assembly in­ structions emitted by the code generator, not includ­ ing the library routines and other fixed overhead.

6.2

Evaluation Results

Experimental measurements o f read buffer flush overhead ( Flush Off) for various BED latencies are shown in Figures 14 through 23. The four branch

Flush OH <%) P_Taken: o -P_N_Taken:--o- Dyn_Pred: Prof_Pred: -a

-t— I— I— I— I— I— i— I— r

1 2 3 4 5 6 7 8 9 10

BED Latency

Figure 14: Flush penalty: QUEEN.

Flush OH <%)

50H P Taken: o

-P N

Taken:--o-40- Dyn_Pred: •••»•••

Prof_Pred:

- a

-

30-

20-10- _ „ s & ïS ïz l

0-

1"

**Ta

* ^

l i i l i

1— I— I— r 1 2 3 4 5 6 7 8 9 10 BED Latency

Figure 15: Flush penalty: WC.

prediction strategies used for the evaluation are: 1) predict taken (P-Taken), 2) predict not taken

(P-N-Taken), 3) dynamic prediction based on a

branch target buffer ( Dyn-Pred), and 4) static branch prediction using profiling data ( Prof-Prtd).

Flush costs were closely related to branch predic­ tion accuracies, i.e., the more often a branch was mis­ predicted, the more often flush costs were incurred. In a speculative execution architecture, branch predic­ tion inaccuracies result in performance impacts in ad­ dition to the impacts from the branch repair scheme. Branch misprediction increases the base run time o f an application by permitting speculative execution of unproductive instructions. Increased levels o f specular tion increase the performance impacts associated with branch prediction inaccuracies. Only the performance impacts associated read buffer flushes are shown in Figures 14 through 23.

(16)

Flush OH (% ) 50- 40- 30-P_Taken: o -P_N_Taken:--o- Dyn_Pred: •••*•• ProfJPred: -a

-i — -i— -i— -i— -i— -i— r

2 3 4 5 6 7 8 9 10

BED Latency

Figure 16: Flush penalty: COMPRESS.

Flush OH (% ) 50- 40-

30-20

-P Taken: o -PlN_Taken:--o- Dyn_Pred: ProfPred:

o

f

» f ■

't

f f f 2 3 4 5 6 7 8 9 10 BED Latency

Figure 17: Flush penalty: CMP.

Flush OH (* > 50H 40-

30-20

-P_Taken: o -P_N_Taken:--o- Dyn_Pred: •••*■■ Prof_Pred: -a -2 3 4 5 6 7 8 BED Latency 9 10

Figure 18: Flush penalty: PUZZLE.

Hush OH <%) 50- 40-

30-20

-1

10

-P_Taken: o -P_N_Taken:-o- Dyn_Pred: •••».. ProfPred: -a

-II--o—a

" i — i— i— i— i— i— i— r

1 2 3 4 5 6 7 8 9 10

BED Latency

Figure 19: Flush penalty: QSORT. For nine o f the ten applications, P-N.Taken was

significantly more accurate or marginally more ac­ curate in predicting branch outcomes than P - Taken. For QSORT, P-Taken was significantly more accurate than P-N-Taken. This result demonstrates that in a speculative execution architecture, it is difficult to guarantee optimal performance across a range o f ap­ plications given a choice between predict-taken and predict-not-taken branch prediction strategies.

For all but one application, Prof-Pred was more ac­ curate than either P-Taken or P-N-Taken. For CMP,

Prof-Pred, P-N-Taken, and Dyn-Pred were nearly per­

fect in their prediction o f branch outcomes. Prof-Pred marginally outperformed Dyn-Pred in all applications except LEX.

The purpose o f measuring read buffer flush costs given the recovery from injected branch errors is to establish the viability o f using a read buffer design

for branch repair for speculative execution. Although in such a speculative schedule only static prediction strategies would be applicable, the Dyn-Pred model was included to better assess how varying branch pre­ diction strategies impact flush costs. Overall, the ac­ curacy o f Dyn-Pred fell between P-Taken/P-N-Taken and Prof-Pred.

Over the ten applications studied, read buffer flush overhead ranged from 49.91% for the PJTaken strat­ egy in CCCP to .01% for the P-N-Taken strategy for CMP given a BED o f ten. It can be seen from Figures 14 through 23 that a good branch prediction strat­ egy is key to a low read buffer flush cost. The results show that given a static branch prediction strategy using profiling data, an average BED o f ten produces flush costs no greater than 14.8% a n d an average flush cost o f 8.1% across the ten applications studied. This performance overhead is comparable to the overhead

(17)

Hush OH

(% )

Figure 20: Flush penalty: GREP.

Flush OH

<%)

50H p_Taken: H P N

Taken:--o-OH— i— i— i— i— i— i— i— i— r

1 2 3 4 5 6 7 8 9 10

BED Latency

Figure 22: Flush penalty: YACC.

Flush OH

(% )

expected from a delayed write buffer scheme with a maximum allowable BED o f ten [15]. Given a max­ imum BED o f ten and an average BED o f less than ten, the flush costs o f the read buffer would be less than that o f a delayed write buffer, since a delayed write buffer is designed for a worst-case BED and the flush penalty o f a read buffer is based on the average BED. The observed flush costs are small in compari­ son to the substantial performance gain o f speculated architectures over that o f nonspeculated architectures

[

8

-

10

].

The BED for a given branch in this evaluation cor­ responds to the number o f instructions moved above a branch in a speculative schedule. The results o f the evaluation indicate that if the average number o f in­ structions speculated above a given branch is < 10, then the read buffer becomes a viable approach to handling branch repair.

Flush OH

<%)

2 3 4 5 6 7 8 9 10

BED Latency

Figure 23: Flush penalty: CCCP.

7

Sum m ary

Speculative execution has been shown to be an ef­ fective method to create additional instruction level parallelism in general applications. Speculating in­ structions above branches requires schemes to han­ dle mispredicted branches and speculated instructions that trap.

This paper showed that branch hazards resulting from branch mispredictions in speculative execution are similar to branch hazards in multiple instruction rollback developed for processor error recovery. It was shown that compiler techniques previously developed for error recovery can be used as an effective branch repair scheme in a speculative execution architecture. It was also shown that data hazards that result in rollback due to exception repair are similar to on-path hazards suggesting a read buffer approach to exception

(18)

repair.

Implicit index scheduling was introduced to exploit the unique characteristics o f rollback recovery using a read buffer approach. The read buffer design was extended to include PC values to aid in rollback from excepting speculated instructions.

Read buffer flush penalties were measured by in­ jecting branch errors into ten target applications and measuring the flush cycles required to recover from the branch errors using a simulated read buffer. It was shown that with a static branch prediction strat­ egy using profiling data, flush costs under 15% are achievable. The results o f these evaluations indicate that compiler-assisted multiple instruction rollback is viable for branch and exception repair in a speculative execution architecture.

8

Acknow ledgem ents

The authors wish to thank Shyh-Kwei Chen and C.-C. Jim Li for their help with the compiler aspects o f this paper. We would like to thank Scott Mahlke, William Chen, and John Christopher Gyllenhaal for their excellent technical suggestions and assistance with the IM PACT C compiler. Finally, we express our thanks to Janak Patel for his contributions to this research.

References

[1] R. P. Colwell, R. P. Nix, J. O ’Donnell, D. B. Par pworth, and P. K. Rodman, “A VLIW Architec­ ture for a Trace Scheduling Compiler,” in Proc.

2nd Int. Conf. Architecture Support Programming Languages and Operating Syst., pp. 105-111, Oct.

1987.

[2] J. C. Dehnert, P. Y . Hsu, and J. P. Bratt, “Over­ lapped Loop Support in the Cydra 5,” in Proc.

3rd Int. Conf. Architecture Support Programming Languages and Operating Syst., pp. 25-38, April

1989.

[3] P. Chang, W . Chen, N. Warter, and W .- M. W . Hwu, “IMPACT: An Architecture Frame­ work for Multiple-Instruction-Issue Processors,” in Proc. 18th Annu. Symp. Comput. Architecture, pp. 266-275, May 1991.

[4] B. R. Rau and C. D. Glaeser, “Some Scheduling Techniques and an Easily Schedulable Horizon- tad Architecture for High Performance Scientific

Computing,” in Proc. 20th Annu. Workshop Mi­

croprogramming Microarchitecture, pp. 183-198,

Oct. 1981.

[5] M. S. Lam, “Software Pipelining: An Effective Scheduling Technique for VLIW Machines,” in

Proc. A C M SIGPLAN 1988 Conf. Programming Language Design Implementation, pp. 318-328,

June 1988.

[6] A. Aiken and A. Nicolau, “Optimal Loop Paral­ lelization,” in Proc. A C M SIGPLAN 1988 Conf.

Programming Language Design Implementation,

pp. 308-317, June 1988.

[7] J. A. Fisher, “Trace Scheduling: A Technique for Global Microcode Compaction,” IEEE Trans.

Comput., vol. c-30, no. 7, pp. 478-490, July 1981.

[8] M. D. Smith, M. S. Lam, and M. Horowitz, “Boosting Beyond Scalar Scheduling in a Super­ scalar Processor,” in Proc. 17th Annu. Symp.

Comput. Architecture, pp. 344-354, May 1990.

[9] S. A. Mahlke, W . Y. Chen, W .-M . W . Hwu, B. R. Rao, and M. S. Schlansker, “Sentinel Scheduling for VLIW and Superscalar Processors,” in Proc.

5th Int. Conf. Architecture Support Programming Languages and Operating Syst., pp. 238-247, Oct.

1992.

[10] M. D. Smith, M. A. Horowitz, and M. S. Lam, “Efficient Superscalar Performance Through Boosting,” in Proc. 5th Int. Conf. Architecture

Support Programming Languages and Operating Syst., pp. 248-259, Oct. 1992.

[11] N. J. Alewine, S.-K. Chen, C.-C. J. Li, W . K. * Fuchs, and W .-M . W . Hwu, “Branch Recov­ ery with Compiler-Assisted Multiple Instruction Retry,” in Proc. 22th. Int. Symp. Fault-Tolerant

Comput., pp. 66-73, July 1992.

[12] N. J. Alewine, Compiler-assisted Multiple In­

struction Rollback Recovery using a Read Buffer.

PhD thesis, Tech. Rep. CRHC-93-06, University o f Illinois at Urbana-Champaign, 1993.

[13] N. J. Alewine, S.-K. Chen, W . K. Fuchs, and W .- M. W . Hwu, “Compiler-assisted Multiple Instruc­ tion Rollback Recovery using a Read Buffer,” Tech. Rep. CRHC-93-11, Coordinated Science Laboratory, University o f Illinois, May 1993. [14] J. E. Smith and A. R. Pleszkun, “Implementing

Precise Interrupts in Pipelined Processors,” IEEE

(19)

[15] Y . Tamir and M. Tremblay, “High-Performance Fault-Tolerant VLSI Systems Using Micro Roll­ back,” IEEE Trans. Comput., vol. 39, pp. 548- 554, Apr. 1990.

[16] A. V. Aho, R. Sethi, and J. D. Ullman, Compil­

ers: Principles, Techniques, and Tools. Reading,

MA: Addison-Wesley, 1986.

[17] M. Johnson, Superscalar Microprocessor Design. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1991. [18] C.-C. J. Li, S.-K. Chen, W . K. Fuchs, and W .-

M. W . Hwu, “Compiler-Assisted Multiple In­ struction Retry,” Tech. Rep. CRHC-91-31, Coor­ dinated Science Laboratory, University o f Illinois, May 1991.

[19] J. A. Bondy and U. Murty, Graph Theory with

Applications. London, England: Macmillan Press

Ltd., 1979.

[20] J. L. Hennessy and D. A. Patterson, Computer

Architecture: A Quantitative Approach. San Ma­

teo, CA: Morgan Kaufmann Publishers, Inc., 1990.

[21] J. K. Lee and A. J. Smith, “Branch Prediction Strategies and Branch Target Buffer Design,”

References

Related documents