Departamento de Ingeniería Electrónica
y de Telecomunicaciones
Facultad de Ingeniería
Arquitecturas Avanzadas de
Computadores - 2547021
Performance evaluation
Bibliography and evaluation
Bibliography
•
Lecture slides
•
Chapter 4: Computer Organization and Design – The
Hardware/Software Interface, D. A. Patterson and J. L. Hennessy,
Morgan Kaufmann Publishers, 3rd Edition, 2005.
•
Chapter 1: Computer Architecture – A Quantitative
Approach, J. Hennessy and D. Patterson, Morgan Kaufmann,
5th Edition, 2011 (previous editions may be good too).
Evaluation
How good is a computer?
We can think of many parameters:
–
Processor’s clock rate
–
Power consumed by a program
–
Execution time for a program
–
Number of tasks done per second
–
Reliability
–
Aesthetic appearance
–
Social repercussion, etc…
How should we compare two computer systems?
These are the metrics, the things we want to estimate or measure (not all of them are easy to measure, though).
Performance: Latency vs. Throughput
•
Latency: time to finish a fixed task
•
Throughput: number of tasks per unit of time
–
Different: exploit parallelism for throughput, not latency
–
Usually a trade-off: latency vs. throughput
–
Choose definition of performance that matches your goals
•
Scientific program: latency; web server: throughput?
•
Example: transport people 10 km
–
Car: capacity = 5, speed = 60 km/h
–
Bus: capacity = 60, speed = 20 km/h
–
Latency: car = 10 min, bus = 30 min
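The car/bus figures can be checked with a short Python sketch. The throughput calculation assumes each vehicle drives back empty before taking its next load (a round trip), which the slide does not state; the function name `trip_metrics` is illustrative only.

```python
def trip_metrics(capacity, speed_kmh, distance_km=10.0):
    """Return (latency in minutes, throughput in people per hour)."""
    # Latency: time for one group to cover the distance once.
    latency_min = distance_km / speed_kmh * 60.0
    # Assumption: the vehicle returns empty before its next load,
    # so delivering `capacity` people costs a full round trip.
    round_trip_h = 2.0 * distance_km / speed_kmh
    throughput_pph = capacity / round_trip_h
    return latency_min, throughput_pph


car_latency, car_throughput = trip_metrics(capacity=5, speed_kmh=60)
bus_latency, bus_throughput = trip_metrics(capacity=60, speed_kmh=20)

print(f"car: {car_latency:.0f} min, {car_throughput:.0f} people/h")
print(f"bus: {bus_latency:.0f} min, {bus_throughput:.0f} people/h")
```

Under the round-trip assumption the car wins on latency while the bus wins on throughput, which is the point of the example.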
Example: latency vs. throughput
Do the following changes to a computer system
increase throughput, decrease response time or
both?
a) Replacing the processor with a faster version
b) Adding more processors to a system that uses multiple
processors for separate tasks (e.g. a web server)
Answer…
a) Both
b) Throughput only: no individual task finishes sooner
Comparing Performance
System a is x times faster than b if
latency(a) = latency(b) / x
throughput(a) = throughput(b) * x
System a is x% faster than b if
latency(a) = latency(b) / (1 + x/100)
throughput(a) = throughput(b) * (1 + x/100)
Car/bus example
–
Latency? Car is 3 times (and 200%) faster than bus
–
Throughput? Bus is 4 times (and 300%) faster than car
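As a rough check of the "x times" and "x%" definitions, the following Python sketch recomputes the car/bus comparison. The function names are made up for illustration, and the throughput values (15 and 60 people per hour) assume round trips; the ratio between them does not depend on that assumption.

```python
def speedup_from_latency(latency_a, latency_b):
    """x such that a is x times faster than b: latency(a) = latency(b) / x."""
    return latency_b / latency_a


def speedup_from_throughput(throughput_a, throughput_b):
    """x such that a is x times faster than b: throughput(a) = throughput(b) * x."""
    return throughput_a / throughput_b


def percent_faster(x_times):
    """Convert 'x times faster' into 'p% faster', using x = 1 + p / 100."""
    return (x_times - 1.0) * 100.0


# Latency in minutes: car = 10, bus = 30.
car_vs_bus = speedup_from_latency(latency_a=10.0, latency_b=30.0)
# Throughput in people per hour (round-trip assumption): bus = 60, car = 15.
bus_vs_car = speedup_from_throughput(throughput_a=60.0, throughput_b=15.0)

print(car_vs_bus, percent_faster(car_vs_bus))
print(bus_vs_car, percent_faster(bus_vs_car))
```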
Performance definitions
Let’s define our final goal as minimizing the execution time of some application. Then we can define performance in terms of execution time as follows:
performance = 1 / execution time
Execution time
Execution time is affected by multiple factors in a
computer system:
execution time =
CPU time
+ disk access
+ memory access
+ I/O activities
+ OS overhead
We will focus on CPU time since we’ll study mostly
the processor.
However, some applications depend heavily on other components,
e.g. disk access performance.
CPU time
•
We measure CPU time in seconds, but…
•
Remember that computer HW works synchronously,
with a clock signal, having a period and a
frequency
•
How to relate clock cycles with CPU time?
[Figure: data passing from a register through logic to another register]
Clock cycles and CPU time
Just use one of the two simple formulas:
CPU time = clock cycles * cycle time
Or using clock rate
CPU time = clock cycles / clock rate
Classic designer’s tradeoff:
Attempting to reduce the clock cycles may lead to
reducing the clock rate too, and vice versa
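A minimal Python sketch of the two equivalent forms, CPU time = clock cycles * cycle time and CPU time = clock cycles / clock rate. The cycle count and the 2 GHz clock are hypothetical values chosen for illustration, not figures from the slides.

```python
def cpu_time_from_cycle_time(clock_cycles, cycle_time_s):
    """CPU time in seconds = clock cycles * cycle time (seconds per cycle)."""
    return clock_cycles * cycle_time_s


def cpu_time_from_clock_rate(clock_cycles, clock_rate_hz):
    """CPU time in seconds = clock cycles / clock rate (cycles per second)."""
    return clock_cycles / clock_rate_hz


cycles = 10e9            # hypothetical program: 10 billion clock cycles
clock_rate_hz = 2e9      # hypothetical 2 GHz clock
cycle_time_s = 1.0 / clock_rate_hz

t_rate = cpu_time_from_clock_rate(cycles, clock_rate_hz)
t_cycle = cpu_time_from_cycle_time(cycles, cycle_time_s)
print(t_rate, t_cycle)
```

Both calls describe the same quantity, since the cycle time is the reciprocal of the clock rate.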
How about instructions?
Since a program executes instructions, they should
also play a part in the CPU performance equations
So far we had:
CPU time = clock cycles * cycle time
Now we will also say that:
clock cycles = instructions for a program * average clock cycles per instruction
CPI: Cycles Per Instruction
IC: Instruction Count
The CPU performance equation
Finally, the classic formula that incorporates the three
key factors that affect performance is:
CPU time = Instruction Count * CPI * cycle time
Or
CPU Performance Equation
Factors affecting CPU execution time:
CPU time = Instruction Count * CPI / clock rate
Factor             Inst. count   CPI   Clock rate
Program                 x        (x)
Compiler                x        (x)
ISA                     x         x       (x)
Microarchitecture                 x        x
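Because the three factors interact, a change that helps one can hurt another. The Python sketch below applies CPU time = Instruction Count * CPI / clock rate to a hypothetical compiler change that removes 10% of the instructions but raises the average CPI by 5%; all numbers are invented for illustration.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = Instruction Count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical baseline: 1 billion instructions, CPI = 2.0, 1 GHz clock.
baseline = cpu_time(instruction_count=1.0e9, cpi=2.0, clock_rate_hz=1.0e9)

# Hypothetical optimization: 10% fewer instructions, but 5% higher CPI
# (for example, because the remaining instructions are costlier on average).
optimized = cpu_time(instruction_count=0.9e9, cpi=2.1, clock_rate_hz=1.0e9)

print(baseline, optimized, baseline / optimized)
```

In this made-up case the net effect is still a gain, but smaller than the 10% drop in instruction count alone would suggest.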
Cycles per Instruction (CPI)
•
Depends on the instruction
–
CPI_i = Execution time of instruction i * Clock rate
•
Computing the total CPI:
CPI = sum over i of (CPI_i * IC_i) / Instruction Count
Another CPI Example
•
Assume a processor with instruction frequencies and
costs
Integer ALU: 50%, 1 cycle
Load: 20%, 5 cycles
Store: 10%, 1 cycle
Branch: 20%, 2 cycles
•
Which change would improve performance more?
a) Faster branch prediction to reduce branch cost to 1
cycle?
b) Better data cache to reduce load cost to 3 cycles?
•
Compute CPI
Base = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*2 = 2
A = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*1 = 1.8
B = 0.5*1 + 0.2*3 + 0.1*1 + 0.2*2 = 1.6 (B improves performance more)
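The same weighted-average CPI can be computed for the base design and for both options with a small Python sketch. The dictionary layout and the name `average_cpi` are illustrative choices; the frequencies and cycle counts are the ones given in the example.

```python
def average_cpi(mix):
    """Average CPI for a mix mapping class -> (instruction fraction, cycles)."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


base = {
    "alu": (0.5, 1),
    "load": (0.2, 5),
    "store": (0.1, 1),
    "branch": (0.2, 2),
}
option_a = {**base, "branch": (0.2, 1)}  # a) branch cost reduced to 1 cycle
option_b = {**base, "load": (0.2, 3)}    # b) load cost reduced to 3 cycles

cpi_base = average_cpi(base)
cpi_a = average_cpi(option_a)
cpi_b = average_cpi(option_b)
print(f"base={cpi_base:.1f}  A={cpi_a:.1f}  B={cpi_b:.1f}")
```

With instruction count and clock rate unchanged, the option with the lower CPI gives the shorter CPU time.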
IPC, MIPS and GHz
•
The metrics you are most likely to see in marketing are
IPC (instructions per cycle), MIPS (millions of instructions per
second) and GHz
How are they incomplete?
•
Back to the CPU time formula: CPU time = Instruction Count * CPI * cycle time
•
Which processor would you buy?
–
Processor A: CPI = 2, clock = 5 GHz
–
Processor B: CPI = 1, clock = 3 GHz
–
Probably A, but B is faster (assuming same ISA/compiler)
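Each metric covers only part of the CPU time formula: CPI = 1/IPC, cycle time = 1/clock rate (GHz), and time per instruction (CPI * cycle time) is proportional to 1/MIPS
Meta-point: danger of partial metrics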
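The processor A versus B comparison can be sketched in Python as follows. It assumes, as the slide does, the same ISA and compiler, so the instruction count is identical and only CPI and clock rate differ; the helper names are hypothetical.

```python
def time_per_instruction_ns(cpi, clock_ghz):
    """Average time per instruction in ns: cycles / (cycles per ns)."""
    return cpi / clock_ghz


def mips(cpi, clock_ghz):
    """Millions of instructions per second: (clock in MHz) / CPI."""
    return clock_ghz * 1e3 / cpi


a_ns = time_per_instruction_ns(cpi=2, clock_ghz=5)   # Processor A
b_ns = time_per_instruction_ns(cpi=1, clock_ghz=3)   # Processor B

print(f"A: {a_ns:.3f} ns/instr, {mips(2, 5):.0f} MIPS")
print(f"B: {b_ns:.3f} ns/instr, {mips(1, 3):.0f} MIPS")
```

A has the higher clock rate, yet B spends less time per instruction, which is why GHz alone is an incomplete metric.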
Gene Amdahl
•
American computer architect
•
Born in 1922
•
Worked for IBM until 1970
•
Founded Amdahl Corporation to
compete in the mainframe market
against IBM
•
Proposed what later became known as
“Amdahl’s Law” at the 1967
Spring Joint Computer Conference
Amdahl’s law
•
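Suppose an enhancement speeds up a fraction f of a task by a factor of S. Then:
time_new = time_old * ((1 - f) + f / S)
overall speedup = time_old / time_new = 1 / ((1 - f) + f / S)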
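A minimal Python sketch of Amdahl's law in its usual form, overall speedup = 1 / ((1 - f) + f / S), where f is the fraction of the original time that is enhanced and S is the speedup of that fraction. The function name and the sample values are illustrative.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the time is sped up by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# Hypothetical case: half of the task is made twice as fast.
modest = amdahl_speedup(f=0.5, s=2.0)
# Even a huge speedup of that half cannot beat the bound 1 / (1 - f) = 2.
extreme = amdahl_speedup(f=0.5, s=1e12)
print(modest, extreme)
```

The second call illustrates the law's main consequence: the part that is not enhanced bounds the overall speedup.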
Practicing Amdahl’s law
1.
What is the percentage of time each instruction takes?
2.
How much is the total time reduced if the time for FP instructions is
reduced by 20%? How much is the total speed up?
3.
How much is the total time reduced if the time for L/S instructions is
reduced by 20%? How much is the total speed up?
4.
Can the total time be reduced by 20% by reducing only the time for
branch instructions?
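One possible way to approach questions 2 to 4 is sketched below in Python. The per-class times are hypothetical placeholders, not the figures from the exercise, and the class names ("FP", "L/S", "branch", "INT") are assumed from the wording of the questions. Reducing a class's time by 20% corresponds to speeding that class up by a factor of 1 / 0.8 = 1.25 in Amdahl's law.

```python
def total_time_after(times, kind, reduction):
    """Total time after cutting the time of one class by `reduction` (0 to 1)."""
    new_times = dict(times)
    new_times[kind] = times[kind] * (1.0 - reduction)
    return sum(new_times.values())


# Hypothetical time per instruction class, in arbitrary units.
hypothetical_times = {"FP": 35.0, "L/S": 30.0, "branch": 10.0, "INT": 25.0}
total = sum(hypothetical_times.values())

# Question 2 style: FP time reduced by 20%.
new_total = total_time_after(hypothetical_times, "FP", 0.20)
speedup = total / new_total

# Question 4 style: removing ALL branch time saves at most its share of the total.
max_branch_saving = hypothetical_times["branch"] / total
can_reach_20_percent = max_branch_saving >= 0.20

print(new_total, speedup, can_reach_20_percent)
```

With these placeholder values branches take only 10% of the time, so no branch improvement alone could cut the total by 20%; with the real figures the same check applies.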