Performance evaluation

Academic year: 2021

(1)

Departamento de Ingeniería Electrónica y de Telecomunicaciones

Facultad de Ingeniería

Arquitecturas Avanzadas de Computadores - 2547021

Performance evaluation

(2)

Bibliography and evaluation

Bibliography

Lecture slides

Chapter 4: Computer Organization and Design – The hardware/software interface, D. A. Patterson and J. L. Hennessy, Morgan Kaufmann Publishers, 3rd Edition, 2005.

Chapter 1: Computer architecture – A quantitative approach, J. Hennessy and D. Patterson, Morgan Kaufmann, 5th Edition, 2011 (previous editions may be good too).

Evaluation

(3)

How good is a computer?

We can think of many parameters:

Processor’s clock rate

Power consumed by a program

Execution time for a program

Number of tasks done per second

Reliability

Aesthetic appearance

Social repercussion, etc…

How should we compare two computer systems?

These are the metrics, the things we want to estimate or measure (not all of them are easy to measure though)

(4)

Performance: Latency vs. Throughput

Latency: time to finish a fixed task

Throughput: number of tasks per unit of time

Different: exploit parallelism for throughput, not latency

Usually a trade-off: latency vs. throughput

Choose definition of performance that matches your goals

Scientific program: latency; web server: throughput?

Example: transport people 10 km

Car: capacity = 5, speed = 60 km/h

Bus: capacity = 60, speed = 20 km/h

Latency: car = 10 min, bus = 30 min
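The latency figures above can be reproduced with a short calculation. The sketch below also estimates throughput, under the assumption (not stated in the surviving slide text) that each vehicle must make an empty return trip before carrying the next group; the function names are illustrative.

```python
# Car/bus example: transport people over a fixed distance.
DISTANCE_KM = 10


def latency_min(speed_kmh: float) -> float:
    """One-way trip time in minutes for the fixed distance."""
    return DISTANCE_KM * 60 / speed_kmh


def throughput_pph(capacity: int, speed_kmh: float) -> float:
    """People delivered per hour, ASSUMING an empty return trip after each delivery."""
    round_trip_km = 2 * DISTANCE_KM
    return capacity * speed_kmh / round_trip_km


car_latency = latency_min(60)         # 10 minutes
bus_latency = latency_min(20)         # 30 minutes
car_tput = throughput_pph(5, 60)      # 15 people per hour
bus_tput = throughput_pph(60, 20)     # 60 people per hour

print(f"latency:    car = {car_latency:.0f} min, bus = {bus_latency:.0f} min")
print(f"throughput: car = {car_tput:.0f} pph, bus = {bus_tput:.0f} pph")
```

The car wins on latency and the bus wins on throughput, which is why the definition of performance has to match the goal.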

(5)

Example: latency vs. throughput

Do the following changes to a computer system increase throughput, decrease response time, or both?

a) Replacing the processor with a faster version

b) Adding more processors to a system that uses multiple processors for separate tasks (e.g., a web server)

Answer…

a) Both

(6)

Comparing Performance

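System a is x times faster than b if

$\text{latency}(a) = \dfrac{\text{latency}(b)}{x}$

$\text{throughput}(a) = \text{throughput}(b) \cdot x$

System a is x% faster than b if

$\text{latency}(a) = \dfrac{\text{latency}(b)}{1 + x/100}$

$\text{throughput}(a) = \text{throughput}(b) \cdot \left(1 + \dfrac{x}{100}\right)$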

Car/bus example

Latency? Car is 3 times (and 200%) faster than bus

Throughput? Bus is 4 times (and 300%) faster than car
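As a quick check of the car/bus comparison, here is a minimal sketch of the "x times faster" and "x% faster" definitions. The helper names are illustrative, and the throughput values (15 and 60 people per hour) assume an empty return trip after each delivery; the ratio of 4 does not depend on that assumption.

```python
def times_faster_by_latency(latency_a: float, latency_b: float) -> float:
    """x such that latency(a) = latency(b) / x."""
    return latency_b / latency_a


def times_faster_by_throughput(throughput_a: float, throughput_b: float) -> float:
    """x such that throughput(a) = throughput(b) * x."""
    return throughput_a / throughput_b


def as_percent_faster(x_times: float) -> float:
    """Convert 'x times faster' into 'x% faster': x times = 1 + percent/100."""
    return (x_times - 1) * 100


# Latency: car (10 min) versus bus (30 min)
car_vs_bus = times_faster_by_latency(10, 30)
print(car_vs_bus, as_percent_faster(car_vs_bus))   # 3.0 200.0

# Throughput: bus versus car (assumed 60 and 15 people per hour)
bus_vs_car = times_faster_by_throughput(60, 15)
print(bus_vs_car, as_percent_faster(bus_vs_car))   # 4.0 300.0
```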

(7)

Performance definitions

Let’s define our final goal as minimizing the execution time for some application; then we can define performance in terms of execution time as follows:
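Assuming the usual textbook definition (as in Patterson and Hennessy), performance is the reciprocal of execution time, so comparing two systems X and Y reduces to a ratio of execution times. The symbols X, Y and n below are illustrative labels.

```latex
\[
\text{Performance}_X = \frac{1}{\text{Execution time}_X}
\]
\[
\frac{\text{Performance}_X}{\text{Performance}_Y}
  = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n
  \quad\Longrightarrow\quad \text{X is } n \text{ times faster than Y}
\]
```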

(8)

Execution time

Execution time is affected by multiple factors in a computer system:

execution time = CPU time + disk access + memory access + I/O activities + OS overhead

We will focus on CPU time since we’ll study mostly the processor.

However, some applications depend heavily on other factors, e.g. disk access performance.

(9)

CPU time

We measure CPU time in seconds, but…

Remember that computer HW works synchronously, with a clock signal that has a period and a frequency

How to relate clock cycles with CPU time?

[Figure: data flowing through reg, logic, reg]

(10)

Clock cycles and CPU time

Just use one of the two simple formulas:

CPU time = clock cycles * cycle time

Or, using the clock rate:

CPU time = clock cycles / clock rate

Classic designer’s trade-off:

Attempting to reduce the clock cycles may lead to reducing the clock rate too, and vice versa
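The two formulas are equivalent because cycle time is the reciprocal of clock rate. A minimal sketch, using hypothetical numbers (10 billion cycles on a 2 GHz clock) chosen only for illustration:

```python
def cpu_time_from_cycle_time(clock_cycles: float, cycle_time_s: float) -> float:
    """CPU time = clock cycles * cycle time (seconds per cycle)."""
    return clock_cycles * cycle_time_s


def cpu_time_from_clock_rate(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = clock cycles / clock rate (cycles per second)."""
    return clock_cycles / clock_rate_hz


# Hypothetical program: 10e9 clock cycles on a 2 GHz processor.
cycles = 10e9
clock_rate_hz = 2e9
cycle_time_s = 1 / clock_rate_hz          # 0.5 ns per cycle

t1 = cpu_time_from_cycle_time(cycles, cycle_time_s)
t2 = cpu_time_from_clock_rate(cycles, clock_rate_hz)
print(f"{t1:.2f} s, {t2:.2f} s")          # both forms give 5.00 s
```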

(11)
(12)
(13)

How about instructions?

Since a program executes instructions, they should also play a part in the CPU performance equations

So far we had:

CPU time = clock cycles * cycle time

Now we will also say that:

clock cycles = instructions for a program * average clock cycles per instruction

CPI: Cycles Per Instruction

IC: Instruction Count
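Substituting clock cycles = IC * CPI into the CPU time formula can be sketched as follows. The instruction count, CPI and clock rate below are hypothetical values picked for illustration, not taken from the slides.

```python
def clock_cycles(instruction_count: float, cpi: float) -> float:
    """Clock cycles = instruction count (IC) * average cycles per instruction (CPI)."""
    return instruction_count * cpi


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = IC * CPI / clock rate."""
    return clock_cycles(instruction_count, cpi) / clock_rate_hz


# Hypothetical program: 2 billion instructions, CPI of 2.5, 4 GHz clock.
ic, cpi, rate = 2e9, 2.5, 4e9
print(f"cycles   = {clock_cycles(ic, cpi):.3g}")      # 5e+09
print(f"CPU time = {cpu_time(ic, cpi, rate):.2f} s")  # 1.25 s
```

Halving any one of the three inputs (IC, CPI, or cycle time) halves the CPU time, provided the other two stay fixed.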

(14)
(15)

The CPU performance equation

Finally, the classic formula that incorporates the three key factors that affect performance is:

CPU time = Instruction Count * CPI * cycle time

Or, using the clock rate:

CPU time = Instruction Count * CPI / clock rate

(16)

CPU Performance Equation

Factors affecting CPU execution time:

CPU time = Instruction Count * CPI / clock rate

Factor             Inst. count   CPI   Clock rate
Program            x             (x)
Compiler           x             (x)
ISA                x             x     (x)
Microarchitecture                x     x

(17)

Cycles per Instruction (CPI)

Depends on the instruction

$\text{CPI}_i = \text{Execution Time of Instruction}_i \times \text{Clock Rate}$

Computing the total CPI:
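Assuming the standard weighted-average form, which is what the weighted sums in the CPI example use, the total CPI combines each instruction class's CPI with its share of the dynamic instruction count. The symbols $F_i$ and $\text{IC}_i$ are notation introduced here for illustration.

```latex
\[
\text{CPI} = \sum_{i} \text{CPI}_i \times F_i ,
\qquad
F_i = \frac{\text{IC}_i}{\text{IC}}
\]
```

Here $\text{IC}_i$ is the number of executed instructions of class $i$ and $\text{IC}$ is the total instruction count, so the fractions $F_i$ sum to 1.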

(18)

Another CPI Example

Assume a processor with the following instruction frequencies and costs:

Integer ALU: 50%, 1 cycle

Load: 20%, 5 cycles

Store: 10%, 1 cycle

Branch: 20%, 2 cycles

Which change would improve performance more?

a) Faster branch prediction to reduce branch cost to 1 cycle?

b) Better data cache to reduce load cost to 3 cycles?

Compute CPI

Base = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*2 = 2

A = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*1 = 1.8

B = 0.5*1 + 0.2*3 + 0.1*1 + 0.2*2 = 1.6 (the better option)
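The comparison can be checked with a short weighted-average calculation. This is a sketch using the frequencies and cycle costs given in the example; the dictionary layout and function name are illustrative, and the speedups assume instruction count and clock rate stay unchanged.

```python
# Instruction mix: class -> (fraction of instructions, cycles per instruction)
BASE_MIX = {
    "alu":    (0.5, 1),
    "load":   (0.2, 5),
    "store":  (0.1, 1),
    "branch": (0.2, 2),
}


def total_cpi(mix: dict) -> float:
    """Weighted-average CPI: sum of fraction * cycles over all classes."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


# Option a: branch cost drops from 2 cycles to 1.
mix_a = {**BASE_MIX, "branch": (0.2, 1)}
# Option b: load cost drops from 5 cycles to 3.
mix_b = {**BASE_MIX, "load": (0.2, 3)}

base, a, b = total_cpi(BASE_MIX), total_cpi(mix_a), total_cpi(mix_b)
print(f"base = {base:.1f}, A = {a:.1f}, B = {b:.1f}")
# With IC and clock rate fixed, speedup is the ratio of CPIs.
print(f"speedup A = {base / a:.2f}, speedup B = {base / b:.2f}")
```

Option b gives the lower CPI (1.6 versus 1.8): loads are as frequent as branches here, and the load improvement saves two cycles per instruction instead of one.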

(19)
(20)
(21)

IPC, MIPS and GHz

The metrics you are most likely to see in marketing are IPC (instructions per cycle), MIPS (millions of instructions per second) and GHz

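How are they incomplete?

Back to the CPU time formula:

CPU time = Instruction Count * CPI * cycle time

CPI = 1/IPC, cycle time = 1/GHz (the clock rate), and CPI * cycle time = 1/MIPS (up to unit scaling), so none of the three accounts for the instruction count

Which processor would you buy?

Processor A: CPI = 2, clock = 5 GHz

Processor B: CPI = 1, clock = 3 GHz

Probably A, but B is faster (assuming same ISA/compiler)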

Meta-point: danger

of partial
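The processor A versus B comparison can be verified by computing the average time per instruction, which is CPI divided by clock rate. This is a sketch assuming both processors run the same instruction count (same ISA and compiler), as the slide states; the function names are illustrative.

```python
def seconds_per_instruction(cpi: float, clock_rate_hz: float) -> float:
    """Average time per instruction = CPI * cycle time = CPI / clock rate."""
    return cpi / clock_rate_hz


def mips(cpi: float, clock_rate_hz: float) -> float:
    """Millions of instructions per second = clock rate / (CPI * 1e6)."""
    return clock_rate_hz / (cpi * 1e6)


a_time = seconds_per_instruction(cpi=2, clock_rate_hz=5e9)   # 0.4 ns
b_time = seconds_per_instruction(cpi=1, clock_rate_hz=3e9)   # about 0.33 ns

print(f"A: {a_time * 1e9:.2f} ns/instr, {mips(2, 5e9):.0f} MIPS")
print(f"B: {b_time * 1e9:.2f} ns/instr, {mips(1, 3e9):.0f} MIPS")
print(f"B is {a_time / b_time:.2f} times faster than A")
```

A has the higher clock rate, but B finishes each instruction sooner on average, so GHz alone points to the wrong processor.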

(22)

Gene Amdahl

American computer architect

Born in 1922

Worked for IBM until 1970

Founded Amdahl Corporation to compete in the mainframe market against IBM

Proposed what later became known as “Amdahl’s Law” at the 1967 Spring Joint Computer Conference

(23)

Amdahl’s law

Suppose an enhancement speeds up a fraction f of a task by a factor of S
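In its usual form, Amdahl's law gives the new execution time as old time * ((1 - f) + f / S), so the overall speedup is 1 / ((1 - f) + f / S). The sketch below implements that standard formula; the function names and the sample values of f and S are hypothetical and chosen only for illustration.

```python
def amdahl_new_time(old_time: float, f: float, s: float) -> float:
    """Execution time after speeding up a fraction f of the task by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return old_time * ((1.0 - f) + f / s)


def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup = old time / new time = 1 / ((1 - f) + f / s)."""
    return 1.0 / amdahl_new_time(1.0, f, s)


# Hypothetical case: 40% of the task is made 2 times faster.
print(f"{amdahl_speedup(0.4, 2):.2f}")        # 1.25

# Even an enormous speedup of that 40% is capped at 1 / (1 - f).
print(f"{amdahl_speedup(0.4, 1e12):.2f}")     # 1.67

# "Reducing the time of a part by 20%" corresponds to s = 1 / 0.8 = 1.25.
print(f"{amdahl_new_time(100.0, 0.5, 1.25):.1f}")   # 90.0
```

The unenhanced fraction (1 - f) bounds the achievable speedup, no matter how large S becomes.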

(24)

Practicing Amdahl’s law

1. What is the percentage of time each instruction takes?

2. How much is the total time reduced if the time for FP instructions is reduced by 20%? How much is the total speedup?

3. How much is the total time reduced if the time for L/S instructions is reduced by 20%? How much is the total speedup?

4. Can the total time be reduced by 20% by reducing only the time for branch instructions?

(25)
