
Computer Architecture Basics

CIS 450 – Computer Organization and Architecture

Copyright © 2002 Tim Bower

The interface between a computer’s hardware and its software is its architecture. The architecture is described by what the computer’s instructions do, and how they are specified. Understanding how it all works requires knowledge of the structure of a computer and its assembly language.

What is a computer?

There are lots of machines in our world, but only some of those machines qualify as being a computer. What features make a machine a computer?

The very first machines which bore the label of a computer were designed using electro-mechanical switches. These switches were large. The computers designed from them were more like automated adding machines than today’s computers. A program written for these early machines was entered into the computer by setting an array of relays to be either an electrical short or open circuit. This was often accomplished with the aid of a panel of plug-in contact points and cables. After setting the relays, the program could be executed. To execute a new program, the cables needed to be moved to form a new network of relays.

With the invention of the vacuum tube in the 1940s, faster computers could be designed which could also run more complicated programs. The real genesis of modern computers, however, came with the practice of storing a program in memory. The possibility of storing much larger programs in memory became reality with the invention of ferrite core memory in the 1950s.

According to mathematician John von Neumann, for a machine to be a computer it must have the following:

1. Addressable memory that holds both instructions and data.

2. An arithmetic logic unit.

3. A program counter.

Put another way, it must be programmable. A computer executes the following simple loop for each program.

pc = 0;
do {
    instruction = memory[pc++];
    decode( instruction );
    fetch( operands );
    execute;
    store( results );
} while( instruction != halt );

Note:

• Instructions are the verbs and operands are the objects of this process.

• In some architectures, such as the SPARC, the program counter is advanced by a set amount after each instruction is read. In the Intel x86, however, the size of the instruction varies. So as the instruction is read and decoded, the amount by which the program counter should be advanced is also determined.

The important computer architecture components from von Neumann's stored program control computer are:

CPU Central processing unit. The engine of the computer that executes programs.

ALU Arithmetic logic unit. The part of the CPU that executes individual instructions involving data.

[Figure: Computer Architecture Proposed by John von Neumann. Memory holds both data and instructions; the CPU contains the ALU, PC, IR, and registers.]

Register A memory location in the CPU which holds a fixed amount of data. Registers of most current systems hold 32 bits or 4 bytes of data.

PC Program counter, also called the instruction pointer, is a register which holds the memory address of the next instruction to be executed.

IR Instruction register. A register which holds the current instruction being executed.

Acc Accumulator. A register designated to hold the result of an operation performed by the ALU.

Register File A collection of several registers.

Fundamental Computer Architectures

Here we describe the most common computer architectures, all of which use stored program control.

The Stack Machine

A stack machine implements a stack with registers. The operands of the ALU are always the top two registers of the stack and the result from the ALU is stored in the top register of the stack.

Examples of the stack machine include Hewlett–Packard RPN calculators and the Java Virtual Machine (JVM).

The advantage of a stack machine is it can shorten the length of instructions since operands are implicit. This was important when memory was expensive (20-30 years ago). Now, in Java, it is important since we want to ship executables (class files) over the network.

The Accumulator Machine

An accumulator machine has a special register, called an accumulator, whose contents are combined with another operand as input to the ALU, with the result of the operation replacing the contents of the accumulator.


Who is John von Neumann?

John Louis von Neumann was born 28 December 1903 in Budapest, Hungary, and died 8 February 1957 in Washington, DC.

He was a brilliant mathematician, synthesizer, and promoter of the stored program concept, whose logical design of the Institute for Advanced Studies (IAS) computer became the prototype of most of its successors, the von Neumann Architecture.

Von Neumann was a child prodigy, born into a banking family in Budapest, Hungary. When only six years old he could divide eight-digit numbers in his head.

At a time of political unrest in central Europe, he was invited to visit Princeton University in 1930, and when the Institute for Advanced Studies was founded there in 1933, he was appointed to be one of the original six Professors of Mathematics, a position which he retained for the remainder of his life. By the latter years of World War II von Neumann was playing the part of an executive management consultant, serving on several national committees, applying his amazing ability to rapidly see through problems to their solutions. Through this means he was also a conduit between groups of scientists who were otherwise shielded from each other by the requirements of secrecy. He brought together the needs of the Los Alamos National Laboratory (and the Manhattan Project) with the capabilities of the engineers at the Moore School of Electrical Engineering who were building the ENIAC, and later built his own computer called the IAS machine. Several “supercomputers” were built by National Laboratories as copies of his machine.

Following the war, von Neumann concentrated on the development of the IAS computer and its copies around the world. His work with the Los Alamos group continued and he continued to develop the synergism between computer capabilities and the needs for computational solutions to nuclear problems related to the hydrogen bomb.

His insights into the organization of machines led to the infrastructure which is now known as the “von Neumann Architecture”. However, von Neumann’s ideas were not along those lines originally; he recognized the need for parallelism in computers but equally well recognized the problems of construction and hence settled for a sequential system of implementation. Through the report entitled “First Draft of a Report on the EDVAC” [1945], authored solely by von Neumann, the basic elements of the stored program concept were introduced to the industry.

In the 1950s von Neumann was employed as a consultant to IBM to review proposed and ongoing advanced technology projects. One day a week, von Neumann “held court” with IBM. On one of these occasions in 1954 he was confronted with the FORTRAN concept. John Backus remembered von Neumann being unimpressed with the concept of high level languages and compilers.

Donald Gillies, one of von Neumann’s students at Princeton, and later a faculty member at the University of Illinois, recalled in the mid-1970s that the graduate students were being “used” to hand assemble programs into binary for their early machine (probably the IAS machine). Gillies took time out to build an assembler, but when von Neumann found out about it he was very angry, saying (paraphrased), “It is a waste of a valuable scientific computing instrument to use it to do clerical work.”

[Figure: Stack Machine Architecture. Memory holds data and instructions; the CPU contains the ALU, PC, IR, and a register stack.]

[Figure: Accumulator Machine Architecture. Memory holds data and instructions; the CPU contains the ALU, PC, IR, and an accumulator (ACC).]

[Figure: Load/Store Machine Architecture. Memory holds data and instructions; the CPU contains the ALU, PC, IR, and a register file.]


Example Machine Instructions

y = y + 10;

Notation: y’ ≡ &y, so [y’] ≡ *y’ = *&y = y.

Stack Machine        Accumulator Machine     Load/Store Machine
push [y’]            load [y’]               load r0, [y’]
push 10              add 10                  load r1, 10
add                  store y’                add r0, r1, r2
pop y’                                       store r2, y’

accumulator = accumulator [op] operand;

In fact, many machines have more than one accumulator:

Pentium: 1, 2, 4, or 6 (depending on how you count)
MC68000: 16

In order to add two numbers in memory:

1. place one of the numbers into the accumulator (load operand)

2. execute the add instruction

3. store the contents of the accumulator back into memory (store operand)

The Load/Store Machine

Registers: provide faster access but are expensive.
Memory: provides slower access but is less expensive.

A small amount of high speed memory (expensive), called a register file, is provided for frequently accessed variables and a much larger slower memory (less expensive) is provided for the rest of the program and data. (SPARC: 32 registers at any one time)

This is based on the principle of “locality” — at a given time, a program typically accesses a small number of variables much more frequently than others.

The machine loads and stores the registers from memory. The arithmetic and logic instructions operate with registers, not main memory, for the location of operands.

Since the machine addresses only a small number of registers, the instruction field to refer to a register (operand) is short; therefore, these machines frequently have instructions with three operands:

add src1, src2, dest

Machine Instructions

Machine instructions are classified into the following three categories:

1. data transfer operations (memory ⇔ register, register ⇔ register)

2. arithmetic logic operations (add, sub, and, or, xor, shift, etc.)

3. program control operations (branch, call, interrupt)


The Computer’s Software

The program instructions are stored in memory in machine code or machine language format. An assembler is the program used to translate symbolic programs (assembly language) into machine language programs.

machine language Low level computer instructions that are encoded into binary words.

assembly language The lowest level human readable programming language. All of the detailed instructions for the computer are listed. Assembly programs are directly encoded into machine code. Assembly code can be written by humans, but is more typically produced by a compiler.

high level language Humans typically write programs in a language which allows program logic to be expressed at a conceptual level, ignoring the implementation details which are required of assembly language programs.

Years ago, hardware efficiency was extracted at the expense of the programmer’s time. If a fast program was needed, then it was written in assembly language. Compilers were capable of translating programs from high level languages, but they generated assembly language programs that were relatively inefficient as compared with the same programs written by a programmer in assembly language. Programmers often found it necessary to optimize the assembly language code created by a compiler to improve the performance and reduce the memory requirements of the program.

This is no longer the case. Compilers have improved to the point that they can generate code comparable to, or better than, the code most programmers can generate. Even if hand crafted optimizations could improve the performance, there is little benefit derived from such a laborious activity. Many computers today execute so fast and have enough memory that it is not necessary to optimize code at the assembly language level.

So, since it is increasingly rare for programmers to work at the assembly language level, why is it necessary to learn assembly language? There are actually several reasons to study assembly language.

1. To understand or work on an operating system. Operating systems need to execute instructions which cannot be expressed in a high level language, so it is necessary that a portion of an operating system be written in assembly language. Some instances when an operating system needs assembly language include: initializing the hardware and data in the CPU at boot time, handling interrupts, low level interfaces with hardware peripherals, and cases when a compiler’s protection features interfere with the needed operations.

2. To understand or work on a compiler.

3. Real time or embedded systems programming where there may be critical constraints for a program related either to performance or available memory. In some cases with embedded systems, a compiler may not be available.

4. To understand the internal working of a computer. Computer architecture can best be understood when assembly language is used to supplement the study of computer architecture. Assembly language code does not hide details about what the computer is doing.

Complex Instruction Sets and Reduced Instruction Sets

Another important classification of types of computer architectures relates to the available set of instructions for the processor. Here we discuss the historical background and technical differences between two types of processors.

If memory is an expensive and limited resource, there is a large benefit in reducing the size of a program. During the 1960s and 1970s, memory was at a premium. Therefore, much effort was expended on minimizing the size of individual instructions and minimizing the number of instructions necessary to implement a program. During this time period, almost all computer designers believed that rich instruction sets would simplify compiler design and improve the quality of computer architecture.

New instructions were developed to replace frequently used sequences of instructions. For example, a loop variable is often decremented, followed by a branch operation if the result is positive. New architectures therefore introduced a single instruction to decrement a variable and branch conditionally based on the result. Some instructions came to be more like a procedure than a simple operation. Some of these powerful single instructions required four or more parameters. As an example, the IBM System/370 has a single instruction that copies a character string of arbitrary length from any location in memory to any other location in memory, while translating characters according to a table stored in memory.

Computers which feature a large number of complex instructions are classified as complex instruction set computers (CISC). Other examples of CISC computers include the Digital Equipment VAX and the Intel x86 line of processors. The DEC VAX has more than 200 instructions, dozens of distinct addressing modes and instructions with as many as six operands.

The complexity of CISC was accommodated by the introduction of microprogramming or microcode. Microcode is composed of low-level hardware instructions that implement the high-level instructions required by an architecture. Microcode was placed in ROM or control-store RAM (which is more expensive, but faster than the ferrite-core memory used in many computers).

However, not all computer designers fell in line with the CISC philosophy. Seymour Cray, for one, believed that complexity was bad, and continued to build the fastest computers in the world by using simple, register-oriented instruction sets. Cray was a proponent of the Reduced Instruction Set Computer (RISC), which is the antidote to CISC. The CDC 6600 and the Cray-1 supercomputer were the precursors of modern RISC architectures. In 1975, Cray made the following remarks about his computer design:

[Registers] made the instructions very simple. . . That is somewhat unique. Most machines have rather elaborate instruction sets involving many more memory references in the instructions than the ma-chines I have designed. Simplicity, I guess, is a way of saying it. I am all for simplicity. If it’s very complicated, I cannot understand it.

Various technological changes in the 1980s made the architectural assumptions of the 1970s no longer valid.

• Faster (10 times or more) and cheaper semiconductor memory and integrated circuits began to replace ferrite-core and transistor based discrete circuits.

• The invention of cache memories substantially improved the speed of non-microcoded programs.

• Compiler technology had progressed rapidly; optimizing compilers generated code that used only a small subset of most instruction sets.

A new set of simplified design criteria emerged:

• Instructions should be simple unless there is a good reason for complexity. To be worthwhile, a new instruction that increases cycle time by 10% must reduce the total number of cycles executed by at least 10%.

• Microcode is generally no faster than sequences of hardwired instructions. Moving software into microcode does not make it better. It just makes it harder to modify.

• Fixed-format instructions and pipelined[1] execution are more important than program size. As memory becomes cheaper and faster, the space/time tradeoff is resolved in favor of time: reducing space no longer decreases time.

• Compiler technology should simplify instructions, rather than generate more complex instructions. Instead of adding a complicated microcoded instruction, optimizing compilers can generate sequences of simple, fast instructions to do the job. Operands can be kept in registers for even greater speed.

What is RISC?

Assembly language programs occasionally use large sets of machine instructions, whereas high-level language compilers generally do not. For example, Sun’s C compiler uses only about 30% of the available Motorola 68020 instructions. Studies show that approximately 80% of the computations for a typical program require only 20% of a processor’s instruction set.

The designers of RISC machines strive for hardware simplicity, with close cooperation between machine architecture and compiler design. In order to add a new instruction, computer architects must ask:

• To what extent would the added instruction improve performance, and is it worth the cost of implementation?

• No matter how useful it is in an isolated instance, would it make all other instructions perform more slowly by its mere presence?

[1] Pipelining relates to parallelizing the steps in the loop of instruction execution. The next instruction is fetched and decoded while the current instruction is executing. We will discuss pipelining more when we study the Sun SPARC architecture.

The goal of RISC architecture is to maximize the effective speed of a design by performing infrequent functions in software and by including in hardware only features that yield a net performance gain. Performance gains are measured by conducting detailed studies of large high–level language programs.

RISC architectures eliminate complicated instructions that require microcode support.

RISC Architecture

The following characteristics are typical of RISC architectures. Although none of these are required for an architecture to be called RISC, this list does describe most current RISC architectures, including the SPARC design.

1. Single–cycle execution: Most instructions are executed in a single machine cycle.

2. Hardwired control with little or no microcode: Microcode adds a level of complexity and raises the number of cycles per instruction.

3. Load/Store, register-to-register design: All computational instructions involve registers. Memory accesses are made with only load and store instructions.

4. Simple fixed-format instructions with few addressing modes: All instructions are the same length (typically 32 bits) and have just a few ways to address memory.

5. Pipelining: The instruction set design allows for the processing of several instructions at the same time.

6. High-performance memory: RISC machines have at least 32 general-purpose registers and large cache memory.

7. Migration of functions to software: Only those features that measurably improve performance are implemented in hardware. Software contains sequences of simple instructions for executing complex functions rather than complex instructions themselves, which improves system efficiency.

8. More concurrency is visible to software: For example, branches take effect after execution of the following instruction, permitting a fetch of the next instruction during execution of the current instruction.

The real keys to enhanced performance are single-cycle execution and keeping the cycle time as short as possible. Many characteristics of RISC architectures, such as load/store and register-to-register design, facilitate single-cycle execution. Simple fixed-format instructions, on the other hand, permit shorter cycles by reducing decoding time.

Early RISC Machines

In the mid 1970s, some computer architects observed that even complex computers execute mostly simple instructions. This observation led to work on the IBM 801 – the first intentional RISC machine (even though the term RISC had yet to be coined).

The term RISC was coined as part of David Patterson’s 1980 course in microprocessor design at the University of California at Berkeley. The RISC-I chip design was completed in 1982, and the RISC-II chip design was completed in 1984. The RISC-II was a 32-bit microprocessor with 138 registers, and a 330-ns (3 MHz) cycle time. Without the aid of elaborate compiler technology, the RISC-II outperformed the VAX 11/780 at integer arithmetic.

[Figure: Memory hierarchy and system bus. The CPU’s register file (about 200 B, 5 ns) and L1 cache (128 KB, 6 ns) connect through the L2 cache (256 KB, 10 ns) to main memory (128 MB, 100 ns) and I/O devices such as disk (30 GB, 5 ms) over the memory bus.]

Memory Hierarchy Design

Memory hierarchy design is based on three important principles:

• Make the common case fast.

• Principle of locality. Spatial locality refers to the tendency of programs to access data stored near recently accessed data. Temporal locality refers to the tendency of programs to access the same data several times in a short period of time.

• Smaller is faster.

These are the levels in a typical memory hierarchy. Moving farther away from the CPU, the memory in each level becomes larger and slower.

When a memory lookup is required, the L1 cache is searched first. If the data is found, this is called a hit. If the data is not in L1 cache, this is called a miss and the L2 cache is checked. If the data is not in the L2 cache, then the data is retrieved from main memory. When there is a miss at either the L1 or L2 cache, the data retrieved from the next level is saved in the cache for future use. Cache hits make the program run much faster than if all memory accesses must go to the main memory.

The connection between the CPU and main memory is called the front-side bus. A common design is for the front-side bus to be divided into four channels. If the front-side bus speed is listed at 800 MHz, it is probably four channels each running at 200 MHz. The connection between the CPU and the L2 cache is called the backside bus.

Binary Representation of Data

Here we briefly consider the format used to store data variables in memory and in registers. If you need more details than are provided here, then check your notes from EECE 241 or other resources.

[Figure: Memory hierarchy pyramid. From top to bottom: registers, L1 cache, L2 cache, main memory, disk. Levels toward the top are faster and more expensive; levels toward the bottom hold a larger quantity of memory.]

Integer Variables

Unsigned variables that generally fall into the category of integers (char, short, int, long) are stored in straight binary format, beginning with all zeros for zero up to all ones for the largest number that can be represented by the data type.

The signed variables that generally fall into the category of integers (char, short, int, long) are stored in 2’s complement format. This ensures that the binary digits represent a continuous number line from the most negative number to the largest positive number, with zero being represented with all zero bits. The most significant bit is considered the sign bit. The sign bit is one for negative numbers and zero for positive numbers.

Decimal           int (hex)     short (hex)
-2,147,483,648    0x80000000
-2,147,483,647    0x80000001
...               ...
-32,768           0xffff8000    0x8000
-32,767           0xffff8001    0x8001
...               ...           ...
-2                0xfffffffe    0xfffe
-1                0xffffffff    0xffff
0                 0x00000000    0x0000
1                 0x00000001    0x0001
...               ...           ...
32,767            0x00007fff    0x7fff
...               ...
2,147,483,647     0x7fffffff

Any two binary numbers can thus be added together in a straightforward manner to get the correct answer. If there is a carry bit beyond what the data type can represent, it is discarded.

    1       0x0001
+ (-1)    + 0xffff
------      ------
    0       0x0000

To change the sign of any number, invert all the bits and add 1.

2 = 0x0002 = 000...010
invert:      111...101
add 1:       111...110 = 0xfffe = -2

[Figure: Big/Little Endian Memory Maps. Memory at address X contains 0x12345678.
Big Endian:    X holds 0x12, X+1 holds 0x34, X+2 holds 0x56, X+3 holds 0x78
Little Endian: X holds 0x78, X+1 holds 0x56, X+2 holds 0x34, X+3 holds 0x12]

Conversions of Integer Variables

It is often necessary to convert a smaller data type to a larger type. For this, there are either special instructions (Intel x86), or a sequence of a couple simple instructions (Sun SPARC) to promote a variable to a larger data type.

If the variable is unsigned, then extra zeros are just filled into the most significant bits (movzx, move with zero extension, for Intel x86).

For signed variables, the sign bit needs to be extended to fill the most significant bits (movsx, move with sign extension, for Intel x86).

0x6fa1 ==> 0x00006fa1 (sign extend a positive number) 0xfffe ==> 0xfffffffe (sign extend a negative number) 0x9002 ==> 0xffff9002 (sign extend a negative number)

Byte Order

Not all computers store the bytes of a variable in the same order. The Intel x86 line of processors stores the least significant byte in the lowest memory address (rightmost position in the memory map) and the most significant byte in the highest memory address. This scheme is called Little Endian.

Sun SPARC and most other UNIX platforms do the opposite. They store the most significant byte in the lowest memory address. SPARC is thus considered a Big Endian machine. In a TCP/IP packet, the first transmitted data is the most significant byte, thus the Internet is considered Big Endian.

The lowest memory address is considered the memory address for a variable. Hence we see a difference between Little Endian and Big Endian when we draw memory maps. With Little Endian (Intel) we label the location of an address on the right side of the map. With Big Endian (SPARC), labels are placed on the left side of the map.

The term is used because of an analogy with the story Gulliver’s Travels, in which Jonathan Swift imagined a never-ending fight between the kingdoms of the Big-Endians and the Little-Endians, whose only difference is in where they crack open a hard-boiled egg.

[Figure: IEEE FPS floating point formats.
Single Precision (32 bits): 1 sign bit (s), 8 exponent bits (exp), 23 mantissa bits.
Double Precision (64 bits): 1 sign bit (s), 11 exponent bits (exp), 52 mantissa bits.]

Floating Point Variables

Floating point variables have been represented in many different ways inside computers of the past, but there is now a widely adopted standard for the representation of floating point variables, known as the IEEE Floating Point Standard (FPS). Like scientific notation, FPS represents numbers with multiple parts: a sign bit, a part specifying the mantissa, and a part representing the exponent. The mantissa is represented as a signed magnitude integer (i.e., not 2’s complement), where the value is normalized. The exponent is represented as an unsigned integer which is biased to accommodate negative numbers. An 8-bit unsigned value would normally have a range of 0 to 255, but 127 is added to the exponent, giving a usable exponent range of -126 to +127 (the all-zero and all-one exponent codes are reserved).

Follow these steps to convert a number to FPS format.

1. First convert the number to binary.

2. Normalize the number so that there is one nonzero digit to the left of the binary point, adjusting the exponent as necessary.

3. The digits to the right of the binary point are then stored as the mantissa starting with the most significant bits of the mantissa field. Because all numbers are normalized, there is no need to store the leading 1. Note: Because the leading 1 is dropped, it is no longer proper to refer to the stored value as the mantissa. In IEEE terms, this mantissa minus its leading digit is called the significand.

4. Add 127 to the exponent and convert the resulting sum to binary for the stored exponent value. For double precision, add 1023 to the exponent. Be sure to include all 8 or 11 bits of the exponent.

5. The sign bit is a one for negative numbers and a zero for positive numbers.

6. Compilers often express FPS numbers in hexadecimal, so a quick conversion to hexadecimal might be desired.

Here are some examples using single precision FPS.

3.5 = 11.1 (binary)
    = 1.11 x 2^1

sign = 0, significand = 1100..., exponent = 1 + 127 = 128 = 10000000

FPS number (3.5) = 0100 0000 0110 0000 ... = 0x40600000


100 = 1100100 (binary)
    = 1.100100 x 2^6

sign = 0, significand = 100100..., exponent = 6 + 127 = 133 = 10000101

FPS number (100) = 0100 0010 1100 1000 ... = 0x42c80000

• What decimal number is represented in FPS as 0xc2508000?

Here we just reverse the steps.

0xc2508000 = 1100 0010 0101 0000 1000 0000 0000 0000 (binary)

sign = 1; exponent = 10000100; significand = 10100001000000000000000

exponent = 132 ==> 132 - 127 = 5

-1.10100001 x 2^5 = -110100.001 = -52.125

Floating Point Arithmetic

Until fairly recently, floating point arithmetic was performed using complex algorithms with an integer ALU. The main ALU in CPUs is still an integer arithmetic ALU. However, in the mid-1980s, special hardware was developed to perform floating point arithmetic. Intel, for example, sold a chip known as the 80387 which was a math co-processor to go along with the 80386 CPU. Most people did not buy the 80387 because of the cost. A major selling point of the 80486 was that the math co-processor was integrated onto the CPU, which eliminated the need to purchase a separate chip to get faster floating point arithmetic.

Floating point hardware usually has a special set of registers and instructions for performing floating point arithmetic. There are also special instructions for moving data between memory or the normal registers and the floating point registers.

Most of the discussion in this class will focus on integer operations, but we will try to show at least a couple examples of floating point arithmetic.

Role of the Operating System

The operating system (OS) is a program that allocates and controls the use of all system resources: the processor, the main memory, and all I/O devices. In addition, the operating system allows multiple, independent programs to share computer resources while running concurrently. But when we look at our programs (written in any language), we don’t see any allowance for the operating system or any other program. The code is written as if our program is the only program running. So how is this accomplished? How does the operating system get control back from user programs to do its work? The answer relates to the tight coupling between key parts of the code in the OS kernel,[2] the architecture of the CPU, and something called interrupts.

When a computer is turned on, or booted, the OS (Windows, Linux, Minix, Solaris, etc.) initializes the hardware and also builds critical data structures in memory. Most of the data structures are used by the operating system kernel. However, some of the data structures are loaded according to the specification of the CPU manufacturer. This CPU specific data is used to switch processing between user programs and the kernel.

In the Intel x86, for example, two special registers in the CPU hold pointers to memory used when an interrupt is received. When a hardware event occurs, such as a key being pressed on the keyboard, a hardware interrupt is issued. The CPU reads one register to get a pointer to a stack where it saves some of the key register values; this is not the same stack that the user program uses. The CPU then reads another register to get a pointer to a special table called the interrupt descriptor table. It also checks with the interrupt hardware to get a vector identifying which interrupt occurred. Then, based on that vector and the information in the interrupt descriptor table, the CPU switches processing from the running user level program to an interrupt handler in the kernel. All of the operations described above are done automatically by the CPU when an interrupt is received. Thus, the reception of an interrupt is how user programs are suspended and processing is switched to the kernel.

2The kernel of an OS is the critical part of the OS that handles its lowest levels, such as scheduling of processes, memory management, and device control. It is not related to the user interface or utilities provided by the OS.

Once the kernel gets control, it will want to save more registers from the user program, handle the hardware event and check if work needs to be done related to internal operations such as memory or process management. Then finally, the kernel will let a user program run again. In doing so, it will restore some registers and issue a special instruction that causes the final registers to be restored and processing to switch back to the user program. Since all the registers are restored, the user program never knows that it was interrupted.

There are three types of interrupts which the CPU recognizes.

Hardware Interrupt This is any type of hardware event such as a key pressed on the keyboard, a hard disk completing the reading or writing of data, or the reception of an ethernet packet. Many operating systems program a clock to issue interrupts at regular intervals so that the kernel is guaranteed to get control on a regular basis even if no hardware events occur and a user program never releases the CPU.

Software Interrupt When a user program needs to make a system call to the operating system, such as for I/O or to request more memory, it may issue a special instruction called a software interrupt to cause the CPU to switch processing to the kernel.

Trap A trap is issued by the CPU itself when it detects that something is wrong or needs special attention. In most cases a trap is issued when a user program performs an illegal instruction such as a divide by zero error or illegal memory reference. In the Sun SPARC, there are some traps which occur in normal processing of a program.

Most of the kernel’s code is termed reentrant, meaning that additional interrupts may be received and handled even while a previous interrupt is still being processed. There are special assembly language instructions to turn interrupts off or on. Interrupts are turned off in critical sections of the kernel where an interrupt could cause memory corruption. While interrupts are turned off, arriving interrupts are queued by the hardware and delivered when interrupts are turned on again. A critical concern in operating system design is knowing when to turn interrupts off and on; interrupts should be left on except when absolutely necessary. Thus operating systems use clever algorithms to make as much of the kernel reentrant as possible.

More will be discussed about operating systems as related to computer architecture and assembly language later in the semester after more specifics of the processors and assembly language have been covered.
