Evolution of CISC Processors - Guide to RISC Processors

The evolution of CISC designs can be attributed to the desire of early designers to efficiently use two of the most expensive resources, memory and processor, in a computer system. In the early days of computing, memory was very expensive and small in capacity. This forced the designers to devise high-density code: that is, each instruction should do more work so that the total program size could be reduced. Because instructions are implemented in hardware, this goal could not be achieved until the late 1950s due to implementation complexity.

The introduction of microprogramming facilitated cost-effective implementation of complex instructions by using microcode. Microprogramming not only aided in implementing complex instructions but also provided some additional advantages. Microprogrammed control units use small, fast memories to hold the microcode, so the impact of memory access latency on performance could be reduced. Microprogramming also facilitates the development of low-cost members of a processor family by simply changing the microcode.

Another advantage of implementing complex instructions in microcode is that the instructions can be tailored to high-level language (HLL) constructs such as while loops. For example, the loop instruction of the IA-32 can be used to implement for loops. Similarly, memory block copying can be done by using its string instructions. Thus, by using these complex instructions, we close the “semantic gap” between HLLs and machine languages.
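As a rough illustration of how these instructions line up with HLL constructs, the C sketch below models two of them under simplifying assumptions: loop is taken to run after the loop body, decrement ECX, and branch back while ECX is nonzero, and REP MOVSB is taken to copy ECX bytes from ESI to EDI with the direction flag clear. Flags, segment registers, and interrupts are ignored, and the function names sum_with_loop and rep_movsb are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a for loop compiled to the IA-32 loop instruction:
 * the body runs, then loop decrements ECX and branches back if ECX != 0.
 * Flags are not modeled. Computes n + (n-1) + ... + 1. */
uint32_t sum_with_loop(uint32_t n)
{
    assert(n >= 1);            /* with ECX == 0, loop would wrap ECX around */
    uint32_t ecx = n;          /* count register */
    uint32_t eax = 0;          /* accumulator */
    do {
        eax += ecx;            /* loop body */
    } while (--ecx != 0);      /* loop: decrement ECX, branch if nonzero */
    return eax;
}

/* Hypothetical model of REP MOVSB with the direction flag clear:
 * copy ECX bytes from [ESI] to [EDI], advancing both pointers. */
void rep_movsb(uint8_t *edi, const uint8_t *esi, uint32_t ecx)
{
    while (ecx != 0) {         /* REP checks the count before each move */
        *edi++ = *esi++;       /* MOVSB: move one byte, advance both pointers */
        ecx--;
    }
}
```

For example, sum_with_loop(10) evaluates to 55, and a single rep_movsb call stands in for an entire block-copy loop, which is the kind of one-instruction mapping to an HLL construct the paragraph describes.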

So far, we have concentrated on the memory resource. In the early days, effective processor utilization was also important, and high code density helps here as well by improving execution efficiency. As an example, consider the VAX-11/780, the ultimate CISC processor. It was introduced in 1978 and supported 22 addressing modes, compared with 11 on the Intel 486, which was introduced more than a decade later. The VAX instruction size can range from 2 to 57 bytes, as shown in Table 3.1.

To illustrate how code density affects execution efficiency, consider the autoincrement addressing mode of the VAX processor. In this addressing mode, a single instruction can read data from memory, add the contents of a register to it, write the result back to memory, and increment the memory pointer. The actions of this instruction are summarized below:

(R2) = (R2) + R3; R2 = R2 + 1

In this example, the R2 register holds the memory pointer. To implement this CISC instruction, we need four RISC instructions:

Chapter 3 • RISC Principles 41

R4 = (R2) ; load memory contents

R4 = R4 + R3 ; add contents of R3

(R2) = R4 ; store the result

R2 = R2 + 1 ; increment memory address

Table 3.1 Characteristics of some CISC and RISC processors

| Characteristic | VAX-11/780 (CISC) | Intel 486 (CISC) | MIPS R4000 (RISC) |
|---|---|---|---|
| Number of instructions | 303 | 235 | 94 |
| Addressing modes | 22 | 11 | 1 |
| Instruction size (bytes) | 2–57 | 1–12 | 4 |
| Number of general-purpose registers | 16 | 8 | 32 |

The CISC instruction, in general, executes faster than the four RISC instructions. That, of course, was the reason for designing complex instructions in the first place. However, the execution time of a single instruction is not the only measure of performance; what ultimately matters is overall system performance.
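That the single autoincrement instruction and the four-instruction RISC sequence compute the same result can be checked with a small C model. It is a sketch under stated assumptions: memory is an array of 32-bit words indexed by R2, so "R2 = R2 + 1" advances the pointer by one element (on the actual VAX, autoincrement advances the register by the operand size in bytes), and the Machine type and both function names are hypothetical. The RISC version also leaves its intermediate value in R4, a register the CISC instruction does not touch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine state: a small word-addressed memory and a register
 * file. r[2] holds the memory pointer as an index into mem. */
typedef struct {
    int32_t mem[16];
    int32_t r[8];
} Machine;

/* One CISC-style instruction: (R2) = (R2) + R3; R2 = R2 + 1 */
void cisc_add_autoinc(Machine *m)
{
    assert(m->r[2] >= 0 && m->r[2] < 16);   /* pointer must stay in memory */
    m->mem[m->r[2]] += m->r[3];
    m->r[2] += 1;
}

/* The equivalent four-instruction RISC sequence, with R4 as scratch. */
void risc_add_autoinc(Machine *m)
{
    assert(m->r[2] >= 0 && m->r[2] < 16);
    m->r[4] = m->mem[m->r[2]];      /* R4 = (R2)    : load memory contents */
    m->r[4] = m->r[4] + m->r[3];    /* R4 = R4 + R3 : add contents of R3 */
    m->mem[m->r[2]] = m->r[4];      /* (R2) = R4    : store the result */
    m->r[2] = m->r[2] + 1;          /* R2 = R2 + 1  : increment the pointer */
}
```

Starting from the same state, both functions leave the same value in memory and the same pointer in R2; the difference the paragraph points to is that the RISC version needs four instruction fetches and decodes to get there.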

Why RISC?

Designers make choices based on the available technology. As hardware and software technology evolves, design choices evolve with it. Furthermore, as we gain more experience in designing processors, we can design better systems. The RISC proposal was a response to the changing technology and to the knowledge accumulated from CISC designs. CISC processors were designed to simplify compilers and to improve performance under constraints such as small and slow memories. The rest of this section identifies some of the important observations that motivated designers to consider alternatives to CISC designs.

Simple Instructions

The designers of CISC architectures anticipated extensive use of complex instructions because they close the semantic gap. In reality, it turns out that compilers mostly ignore these instructions, as several empirical studies have shown. One reason is that different high-level languages use different semantics. For example, the semantics of the C for loop are not exactly the same as those of the for loop in other languages. Thus, compilers tend to synthesize the code using simpler instructions.
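One concrete mismatch of this kind, offered as an illustration rather than as a finding of the studies, involves the IA-32 loop instruction. A C for loop tests its condition before the first iteration, so a trip count of zero means the body never runs. A direct translation to loop runs the body first and only then decrements and tests the count, so a zero count wraps around instead of stopping. The sketch below narrows the count register to 8 bits, a hypothetical choice that makes the wraparound cheap to observe; with the real 32-bit ECX the same code would run 2^32 times. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* C semantics: the condition is tested before the body,
 * so n == 0 gives zero trips. */
unsigned c_for_trips(uint8_t n)
{
    unsigned trips = 0;
    for (uint8_t i = n; i > 0; i--)
        trips++;                /* loop body */
    return trips;
}

/* Naive loop-instruction translation: body first, then "decrement the
 * count register and branch back if it is nonzero". The count register
 * is narrowed to 8 bits here (hypothetical) so wraparound is quick. */
unsigned loop_insn_trips(uint8_t count)
{
    unsigned trips = 0;
    do {
        trips++;                /* loop body */
        count--;                /* decrement the count (0 wraps to 255) */
    } while (count != 0);       /* branch back while the count is nonzero */
    return trips;
}
```

Both versions agree for any nonzero count, but for a count of zero c_for_trips returns 0 while loop_insn_trips returns 256. A compiler using loop must therefore add a guard (IA-32 provides jecxz for this), which is one reason it may simply emit an ordinary decrement, compare, and branch instead.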

Few Data Types

CISC ISAs tend to support a variety of data structures, from simple data types such as integers and characters to complex data structures such as records and structures. Empirical data suggest that complex data structures are used relatively infrequently. Thus, it is beneficial to design a system that efficiently supports a few simple data types, from which the missing complex data types can be synthesized.
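To illustrate how a record can be synthesized from simple types, the sketch below reads one field of an array of records using nothing but integer arithmetic and a single word-sized load: the field address is the base address plus the index times the record size plus a constant field offset. The Employee type and load_salary function are hypothetical; offsetof and memcpy are standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A hypothetical record type. The hardware needs no "record" data type:
 * each field is just an integer at a fixed byte offset from the base. */
typedef struct {
    int32_t id;
    int32_t salary;
} Employee;

/* Read table[i].salary using only integer arithmetic and one plain word
 * load, the way a compiler might on a machine with simple data types. */
int32_t load_salary(const Employee *table, size_t i)
{
    assert(table != NULL);
    const unsigned char *base = (const unsigned char *)table;
    size_t addr = i * sizeof(Employee)          /* index scaled by record size */
                + offsetof(Employee, salary);   /* constant field displacement */
    int32_t value;
    memcpy(&value, base + addr, sizeof value);  /* the single "load word" */
    return value;
}
```

The compiler computes the scaled index and the field offset with ordinary integer instructions, so the hardware only ever sees a 32-bit integer load.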

Simple Addressing Modes

CISC designs provide a large number of addressing modes. The main motivations are (i) to support complex data structures and (ii) to provide flexibility to access operands. Although this allows flexibility, it also introduces problems. First, it causes variable instruction execution times, depending on the location of the operands. Second, it leads to variable-length instructions. For example, the IA-32 instruction length can range from 1 to 12 bytes. Variable instruction lengths lead to inefficient instruction decoding and scheduling.
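The decoding cost of variable-length instructions shows up already in finding where each instruction starts. The sketch contrasts a fixed 4-byte format, as on the MIPS R4000 in Table 3.1, where the k-th instruction simply starts at byte 4k, with a hypothetical variable-length format in which the low four bits of an instruction's first byte give its total length. Real IA-32 length decoding is considerably more involved, since the length depends on prefixes, the opcode, the ModR/M and SIB bytes, and any displacement and immediate fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-length encoding: the start of instruction k is known without
 * reading any instruction bytes, so several can be located in parallel. */
size_t fixed_start(size_t k)
{
    return 4 * k;
}

/* Hypothetical variable-length encoding: the low nibble of the first byte
 * is the instruction's total length (1..15 bytes). The start of
 * instruction k depends on the lengths of all k earlier instructions,
 * so boundaries must be found one after another. */
size_t variable_start(const uint8_t *code, size_t k)
{
    size_t pos = 0;
    for (size_t i = 0; i < k; i++) {
        size_t len = code[pos] & 0x0F;   /* decode this instruction's length */
        assert(len >= 1);                /* zero-length instructions are invalid */
        pos += len;                      /* only now is the next start known */
    }
    return pos;
}
```

The loop in variable_start carries a dependency from each instruction to the next, which is what makes wide, parallel decoding of variable-length code harder than simply computing 4k.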

Large Register Set

Several researchers have studied the characteristics of procedure calls in HLLs. We cite two studies: one by Patterson and Sequin [22] and the other by Tanenbaum [28]. Several other studies support the findings of these two.

Patterson and Sequin’s study of C and Pascal programs found that procedure call/return statements constitute about 12 to 15% of HLL statements. Measured as a share of all machine language instructions, however, the code for call/return accounts for about 31 to 33%. More interesting is the fact that call/return generates nearly half (about 45%) of all memory references. This is understandable, as procedure call/return uses memory to store activation records. An activation record consists of parameters, local variables, and return values. In the IA-32, for example, the stack is used extensively for these purposes, which is why procedure call/return activities account for so many memory references. Thus, it is worth providing efficient support for procedure calls and returns.

In another study, Tanenbaum [28] found that only 1.25% of the called procedures had more than six arguments. Furthermore, more than 93% of them had fewer than six local scalar variables. These figures, supported by other studies, suggest that activation records are typically small. If we provide a large register set, we can avoid memory references for most procedure calls and returns. In this context, we note that the eight general-purpose registers available in IA-32 processors are a limiting factor in providing such support. The Itanium, for example, provides a large register set (128 registers), and most procedure calls on the Itanium can completely avoid accessing memory.
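The arithmetic behind this argument can be made concrete with a deliberately simplified cost model that counts only data memory references for one call and return. Under a stack-based convention, each argument is stored by the caller and loaded by the callee, the return address is pushed and popped, and each local scalar is assumed to be stored and loaded once. Under a register-based convention with a budget of regs registers for arguments and locals, only values that do not fit are spilled. The model, the Call type, the function names, and the budget of 12 registers used below (six arguments plus six locals, loosely following Tanenbaum's figures) are all assumptions for illustration, not measured data.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical description of one procedure call. */
typedef struct {
    unsigned args;    /* number of arguments */
    unsigned locals;  /* number of local scalar variables */
} Call;

/* Stack-based convention: the whole activation record lives in memory.
 * Counts data memory references only (instruction fetches are ignored). */
unsigned stack_refs(Call c)
{
    return 2 * c.args     /* caller stores, callee loads each argument */
         + 2              /* return address pushed on call, popped on return */
         + 2 * c.locals;  /* each local assumed stored once and loaded once */
}

/* Register-based convention: `regs` registers are available for arguments
 * and locals, and the return address is assumed to go in a link register.
 * Only values that do not fit are spilled to memory. */
unsigned register_refs(Call c, unsigned regs)
{
    assert(c.args <= UINT_MAX - c.locals);  /* avoid unsigned overflow */
    unsigned needed = c.args + c.locals;
    unsigned spilled = needed > regs ? needed - regs : 0;
    return 2 * spilled;   /* one store and one load per spilled value */
}
```

For a call with three arguments and four locals, stack_refs gives 16 memory references, while register_refs with a budget of 12 gives none. With a budget of 4, closer to what eight general-purpose registers leave free, three values spill and 6 references remain.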

